PHP's preg-match_all causing Apache Segfault



Answers

Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.

What about size of the submission? If you pass gibberish of equal length, what will happen?

EDIT: splitting and merging will look something like this:

$strings = explode("\n", $sql);

$matches = array(array(), array(), array());

foreach ($strings AS $string) {
    preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $string, $matches_temp);
    $matches[0] = array_merge($matches[0], $matches_temp[0]);
    $matches[1] = array_merge($matches[1], $matches_temp[1]);
    $matches[2] = array_merge($matches[2], $matches_temp[2]);
}
Question

I'm using two regular expressions to pull assignments out of MySQL queries and using them to create an audit trail. One of them is the 'picky' one that requires quoted column names/etc., the other one does not.

Both of them are tested and parse the values out correctly. The issue I'm having is that with certain queries the 'picky' regexp is actually just causing Apache to segfault.

I tried a variety of things to determine this was the cause up to leaving the regexp in the code, and just modifying the conditional to ensure it wasn't run (to rule out some sort of compile-time issue or something). No issues. It's only when it runs the regexp against specific queries that it segfaults, and I can't find any obvious pattern to tell me why.

The code in question:

if ($picky)
    preg_match_all("/[`'\"]((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"] *= *'((?:[^'\\\\]|\\\\.)*)'/", $sql, $matches);
else
    preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $sql, $matches);

The only difference between the two is that the first one removes the question marks on the quotes to make them non-optional and removes the option of using different kinds of quotes on the value - only allows single quotes. Replacing the first regexp with the second (for testing purposes) and using the same data removes the issue - it is definitely something to do with the regexp.

The specific SQL that is causing me grief is available at:
http://.pastebin.com/m75c2a2a0

Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.

I'm pretty perplexed as to what's going on here. Can anyone offer any suggestions as to further debugging or a fix?

EDIT: Nothing terribly exciting, but for the sake of completeness here's the relevant log entry from Apache (/var/log/apache2/error.log - There's nothing in the site's error.log. Not even a mention of the request in the access log.)

[Thu Dec 10 10:08:03 2009] [notice] child pid 20835 exit signal Segmentation fault (11)

One of these for each request containing that query.

EDIT2: On the suggestion of Kuroki Kaze, I tried gibberish of the same length and got the same segfault. Sat and tried a bunch of different lengths and found the limit. 6035 characters works fine. 6036 segfaults.

EDIT3: Changing the values of pcre.backtrack_limit and pcre.recursion_limit in php.ini mitigated the problem somewhat. Apache no longer segfaults, but my regexp no longer matches all of the matches in the string. Apparently this is a long-known (from 2007) bug in PHP/PCRE:
http://bugs.php.net/bug.php?id=40909

EDIT4: I posted the code in the answers below that I used to replace this specific regular expression as the workarounds weren't acceptable for my purpose (product for sale, can't guarantee php.ini changes and the regexp only partially working removed functionality we require). Code I posted is released into the public domain with no warranty or support of any kind. I hope it can help someone else. :)

Thank you everyone for the help!

Adam




Links



Tags