vba formula substring - How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
To make use of regular expressions directly in Excel formulas the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an excel function.
How it works
It takes 2-3 parameters.
- A text to use the regular expression on.
- A regular expression.
- A format string specifying how the result should look. It can contain
$2, and so on.
$0is the entire match,
$1and up correspond to the respective match groups in the regular expression. Defaults to
Extracting an email address:
=regex("Peter Gordon: firstname.lastname@example.org, 47", "\w+@\w+\.\w+") =regex("Peter Gordon: email@example.com, 47", "\w+@\w+\.\w+", "$0")
Extracting several substrings:
=regex("Peter Gordon: firstname.lastname@example.org, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")
E-Mail: email@example.com, Name: Peter Gordon
To take apart a combined string in a single cell into its components in multiple cells:
=regex("Peter Gordon: firstname.lastname@example.org, 47", "^(.+): (.+), (\d+)$", "$" & 1) =regex("Peter Gordon: email@example.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)
How to use
To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):
- In Excel in a Macro enabled file ('.xlsm') push
ALT+F11to open the Microsoft Visual Basic for Applications Editor.
- Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
- Click on Tools -> References (please excuse the german screenshot)
- Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
- Click OK.
Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module
Regexand the function
regexcauses #NAME! errors).
In the big text window in the middle insert the following:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object Dim replaceNumber As Integer With inputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = matchPattern End With With outputRegexObj .Global = True .MultiLine = True .IgnoreCase = False .Pattern = "\$(\d+)" End With With outReplaceRegexObj .Global = True .MultiLine = True .IgnoreCase = False End With Set inputMatches = inputRegexObj.Execute(strInput) If inputMatches.Count = 0 Then regex = False Else Set replaceMatches = outputRegexObj.Execute(outputPattern) For Each replaceMatch In replaceMatches replaceNumber = replaceMatch.SubMatches(0) outReplaceRegexObj.Pattern = "\$" & replaceNumber If replaceNumber = 0 Then outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value) Else If replaceNumber > inputMatches(0).SubMatches.Count Then 'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "." regex = CVErr(xlErrValue) Exit Function Else outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1)) End If End If Next regex = outputPattern End If End Function
Save and close the Microsoft Visual Basic for Applications Editor window.
How can I use regular expressions in Excel and take advantage of Excel's powerful grid like setup for data manipulation?
- In-cell function to return matched pattern or replaced value in string.
- Sub to loop through a column of data and extract matches to adjacent cells.
- What setup is necessary?
- What are Excel's special characters for Regular expressions?
I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use
Instr type commands for similar manipulations.
Here is my attempt:
Function RegParse(ByVal pattern As String, ByVal html As String) Dim regex As RegExp Set regex = New RegExp With regex .IgnoreCase = True 'ignoring cases while regex engine performs the search. .pattern = pattern 'declaring regex pattern. .Global = False 'restricting regex to find only first match. If .Test(html) Then 'Testing if the pattern matches or not mStr = .Execute(html)(0) '.Execute(html)(0) will provide the String which matches with Regex RegParse = .Replace(mStr, "$1") '.Replace function will replace the String with whatever is in the first set of braces - $1. Else RegParse = "#N/A" End If End With End Function
I needed to use this as a cell function (like
VLOOKUP) and found that it was easy to:
- Make sure you are in a Macro Enabled Excel File (save as xlsm).
- Open developer tools Alt + F11
- Add Microsoft VBScript Regular Expressions 5.5 as in other answers
Create the following function either in workbook or in its own module:
Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant Dim regex As New VBScript_RegExp_55.RegExp Dim strInput As String strInput = myRange.Value With regex .Global = True .MultiLine = True .IgnoreCase = False .Pattern = matchPattern End With REGPLACE = regex.Replace(strInput, outputPattern) End Function
Then you can use in cell with
=REGPLACE(B1, "(\w) (\d+)", "$1$2")(ex: "A 243" to "A243")