vba formula substring - How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops





3 Answers

To make use of regular expressions directly in Excel formulas the following UDF (user defined function) can be of help. It more or less directly exposes regular expression functionality as an excel function.

How it works

It takes 2-3 parameters.

  1. A text to use the regular expression on.
  2. A regular expression.
  3. A format string specifying how the result should look. It can contain $0, $1, $2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.

Some examples

Extracting an email address:

=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+")
=regex("Peter Gordon: some@email.com, 47", "\w+@\w+\.\w+", "$0")

Results in: some@email.com

Extracting several substrings:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "E-Mail: $2, Name: $1")

Results in: E-Mail: some@email.com, Name: Peter Gordon

To take apart a combined string in a single cell into its components in multiple cells:

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 1)
=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$" & 2)

Results in: Peter Gordon some@email.com ...

How to use

To use this UDF do the following (roughly based on this Microsoft page. They have some good additional info there!):

  1. In Excel in a Macro enabled file ('.xlsm') push ALT+F11 to open the Microsoft Visual Basic for Applications Editor.
  2. Add VBA reference to the Regular Expressions library (shamelessly copied from Portland Runners++ answer):
    1. Click on Tools -> References (please excuse the german screenshot)
    2. Find Microsoft VBScript Regular Expressions 5.5 in the list and tick the checkbox next to it.
    3. Click OK.
  3. Click on Insert Module. If you give your module a different name make sure the Module does not have the same name as the UDF below (e.g. naming the Module Regex and the function regex causes #NAME! errors).

  4. In the big text window in the middle insert the following:

    Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
        Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
        Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
        Dim replaceNumber As Integer
    
        With inputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = matchPattern
        End With
        With outputRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = "\$(\d+)"
        End With
        With outReplaceRegexObj
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With
    
        Set inputMatches = inputRegexObj.Execute(strInput)
        If inputMatches.Count = 0 Then
            regex = False
        Else
            Set replaceMatches = outputRegexObj.Execute(outputPattern)
            For Each replaceMatch In replaceMatches
                replaceNumber = replaceMatch.SubMatches(0)
                outReplaceRegexObj.Pattern = "\$" & replaceNumber
    
                If replaceNumber = 0 Then
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
                Else
                    If replaceNumber > inputMatches(0).SubMatches.Count Then
                        'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                        regex = CVErr(xlErrValue)
                        Exit Function
                    Else
                        outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                    End If
                End If
            Next
            regex = outputPattern
        End If
    End Function
    
  5. Save and close the Microsoft Visual Basic for Applications Editor window.

extract examples execute

How can I use regular expressions in Excel and take advantage of Excel's powerful grid like setup for data manipulation?

  • In-cell function to return matched pattern or replaced value in string.
  • Sub to loop through a column of data and extract matches to adjacent cells.
  • What setup is necessary?
  • What are Excel's special characters for Regular expressions?

I understand Regex is not ideal for many situations (To use or not to use regular expressions?) since excel can use Left, Mid, Right, Instr type commands for similar manipulations.




Here is my attempt:

Function RegParse(ByVal pattern As String, ByVal html As String)
    Dim regex   As RegExp
    Set regex = New RegExp

    With regex
        .IgnoreCase = True  'ignoring cases while regex engine performs the search.
        .pattern = pattern  'declaring regex pattern.
        .Global = False     'restricting regex to find only first match.

        If .Test(html) Then         'Testing if the pattern matches or not
            mStr = .Execute(html)(0)        '.Execute(html)(0) will provide the String which matches with Regex
            RegParse = .Replace(mStr, "$1") '.Replace function will replace the String with whatever is in the first set of braces - $1.
        Else
            RegParse = "#N/A"
        End If

    End With
End Function



I needed to use this as a cell function (like SUM or VLOOKUP) and found that it was easy to:

  1. Make sure you are in a Macro Enabled Excel File (save as xlsm).
  2. Open developer tools Alt + F11
  3. Add Microsoft VBScript Regular Expressions 5.5 as in other answers
  4. Create the following function either in workbook or in its own module:

    Function REGPLACE(myRange As Range, matchPattern As String, outputPattern As String) As Variant
        Dim regex As New VBScript_RegExp_55.RegExp
        Dim strInput As String
    
        strInput = myRange.Value
    
        With regex
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = matchPattern
        End With
    
        REGPLACE = regex.Replace(strInput, outputPattern)
    
    End Function
    
  5. Then you can use in cell with =REGPLACE(B1, "(\w) (\d+)", "$1$2") (ex: "A 243" to "A243")






Related