• This is a principle. Need to understand how regex engines make that choice.
  • For example: /\d+\w+\d+/
    • To apply to a filename such as 01_FY_07_report_99.xls
    • The regex engine could decide to return “01_FY_07”
    • Or “01_FY_07_report_99”
    • Which would should the regex engine return?
  • Another example is /".+", ".+"/
    • It will run through “Milton”, “Waddams”, “Initech, Inc.”
    • It could find “Milton”, “Waddams”
    • It could find “Milton”, “Waddams”, “Initech, Inc.”
  • For the above problems, we need to know how the regex engine deals with them.
  • Standard repetition quantifiers are greedy.
    • By “greedy”, the expression tries to match the longest possible string.
    • Defers to achieving overall match.
      • For example /.+\jpg/ matches “filename.jpg”
      • The + is greedy, but “gives back” the “.jpg” to make the match.
  • Greedy gives back as little as possible.
  • Another example /.*[0-9]+/ matches “Page 266”.
    • The .* portion matches “Page 26”.
    • [0-9]+ portion matches only “6”.
  • It matches as much as possible before giving control to the next expression part.
  • Regex engines are eager.
    • Want to match as quickly as possible.
  • Regex engines are greedy.

Updated: