tutorial - Regex for matching a character, but not when it's enclosed in quotes





I need to match a colon (':') in a string, but not when it's enclosed by quotes - either a " or ' character.

So the following should have 2 matches

something:'firstValue':'secondValue'    
something:"firstValue":'secondValue'

but this should only have 1 match

something:'no:match'

I've come up with the following slightly worrying construction:

(?<=^('[^']*')*("[^"]*")*[^'"]*):

It uses a lookbehind assertion to make sure you match an even number of quotes from the beginning of the line to the current colon. It allows for embedding a single quote inside double quotes and vice versa. As in:

'a":b':c::"':" (matches at positions 6, 8 and 9)

EDIT

Gumbo is right: using * within a lookbehind assertion is not allowed in many regex engines, which require lookbehind patterns to have a fixed width.
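One way around the lookbehind restriction, sketched in Python purely as an illustration (this is a different technique from the pattern in the question): let the regex consume each quoted string as a whole token, and capture a colon only when it is reached outside of one. The name `unquoted_colon_positions` is made up for this example, and unterminated quotes are not handled specially.

```python
import re

# Alternatives are tried left to right: a quoted string is consumed as one unit,
# so a colon inside it is never offered to the third alternative.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|(:)""")

def unquoted_colon_positions(text):
    """Return the 0-based indexes of colons that are not inside quotes."""
    return [m.start(1) for m in TOKEN.finditer(text) if m.group(1) is not None]

sample = "'a\":b':c::\"':\""   # the string 'a":b':c::"':" from the question
print(unquoted_colon_positions(sample))  # [6, 8, 9]
```

Because no lookbehind is involved, this form should work in engines that reject the original pattern.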


If the regular expression implementation supports look-around assertions, try this:

:(?:(?<=["']:)|(?=["']))

This will match any colon that is immediately preceded or followed by a double or single quote. So it only covers constructs like the ones you mentioned; something:firstValue would not be matched.
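A quick way to check that behaviour, assuming Python's `re` module (the lookbehind here is fixed-width, so it is accepted). The helper name `colon_positions` is invented for this sketch.

```python
import re

# The lookaround pattern from above, unchanged.
ADJACENT_QUOTE = re.compile(r""":(?:(?<=["']:)|(?=["']))""")

def colon_positions(text):
    """Return the 0-based indexes of colons that touch a quote character."""
    return [m.start() for m in ADJACENT_QUOTE.finditer(text)]

print(colon_positions("something:'firstValue':'secondValue'"))  # [9, 22]
print(colon_positions("something:'no:match'"))                  # [9]
print(colon_positions("something:firstValue"))                  # []
```

Note that this is a heuristic based on neighbouring characters: a quoted value that itself starts or ends with a colon, such as 'a:', would still produce a match.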

It would be better to build a little parser that reads the input byte by byte and remembers whether a quotation is currently open.
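A minimal sketch of such a parser in Python, assuming the input is a plain string scanned one character at a time and that there are no escape sequences inside quotes. The function name `find_unquoted_colons` is hypothetical.

```python
def find_unquoted_colons(text):
    """Scan left to right, tracking which quote character (if any) is open."""
    positions = []
    open_quote = None  # None when outside quotes, otherwise "'" or '"'
    for index, char in enumerate(text):
        if open_quote is not None:
            # Inside a quotation: only the same quote character closes it.
            if char == open_quote:
                open_quote = None
        elif char in ("'", '"'):
            open_quote = char
        elif char == ":":
            positions.append(index)
    return positions

print(find_unquoted_colons("something:'no:match'"))  # [9]
```

Keeping the open quote character in a variable is what lets a single quote appear inside double quotes and vice versa.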


Oops, I missed the point. Forget the rest. This is quite hard to do because regex is not good at counting balanced characters. The .NET implementation, for example, has an extension that can do it, but it's a bit complicated.

You can use negated character classes to do this.

[^'"]:[^'"]

You can further wrap the character classes in non-capturing groups.

(?:[^'"]):(?:[^'"])

Or you can use lookaround assertions.

(?<!['"]):(?!['"])

You can try to catch the strings within the quotes:

/(?<q>'|")([\w ]+)(\k<q>)/m

The first group defines the allowed quote types; the second group matches word characters (letters, digits and underscore) and spaces. The nice thing about this solution is that it ONLY matches strings whose opening and closing quotes are the same.
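For anyone trying this in Python rather than at regex101: Python's `re` module spells the named group as `(?P<q>...)` and the backreference as `(?P=q)`, so the pattern needs a small translation. The following is a sketch under that assumption; `quoted_values` is a made-up helper name.

```python
import re

# Python spelling of the pattern above: named group q, then a backreference to it.
QUOTED = re.compile(r"""(?P<q>'|")([\w ]+)(?P=q)""")

def quoted_values(text):
    """Return the contents of quoted strings whose two quote characters agree."""
    return [m.group(2) for m in QUOTED.finditer(text)]

print(quoted_values("something:\"firstValue\":'secondValue'"))  # ['firstValue', 'secondValue']
print(quoted_values("'mismatched\""))                           # []
```

Since `[\w ]` does not include the colon, a quoted value such as 'no:match' is not captured by this pattern as written.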

Try it at regex101.com






