[.net] Is StringBuilder.Replace() more efficient than String.Replace?


Answers

It depends if the size of the replacement is larger than the string replaced.

The StringBuilder over allocates its buffer, whereas a string only ever holds how ever many characters are in it.

The StringBuilder.Capacity property is how many characters the buffer will hold, while StringBuilder.Length is how many characters are in use.

Normally you should set StringBuilder.Capacity to a value larger then the expected resultant string. Otherwise the StringBuilder will need to reallocate its buffer. When the StringBuilder reallocates its buffer, it doubles it in size, which means after a couple reallocates it is probably significantly larger then it needs to be, by default capacity starts at 16.

By setting the Capacity value when you start (in the constructor for example) you save the reallocations of the StringBuilder's buffer. You can use StringBuilder.MaxCapacity to limit to maximum capacity that a StringBuilder can be expanded to.

Question

If you have to use String.Replace() to replace test 50 times, you essentially have to create a new string 50 times. Does StringBuilder.Replace() do this more efficiently? E.g., should I use a StringBuilder if I'm going to be replacing a lot of text, even while I won't be appending any data to it?

I'm using .NET, but I assume this would be the same as Java and possibly other languages.




I haven't found any sources but i strongly doubt that the implementation creates always new strings. I'd implement it also with a StringBuilder internally. Then String.Replace is absolutely fine if you want to replace once a huge string. But if you have to replace it many times you should consider to use StringBuilder.Replace because every call of Replace creates a new string.

So you can use StringBuilder.Replace since you're already using a StringBuilder.




Does string.Replace(string, string) create additional strings?

Well, I'm not a .NET development team member (unfortunately), but I'll try to answer your question.

Microsoft has a great site of .NET Reference Source code, and according to it, String.Replace calls an external method that does the job. I wouldn't argue about how it is implemented, but there's a small comment to this method that may answer your question:

// This method contains the same functionality as StringBuilder Replace. The only difference is that
// a new String has to be allocated since Strings are immutable

Now, if we'll follow to StringBuilder.Replace implementation, we'll see what it actually does inside.

A little more on a string objects:

Although String is immutable in .NET, this is not some kind of limitation, it's a contract. String is actually a reference type, and what it includes is the length of the actual string + the buffer of characters. You can actually get an unsafe pointer to this buffer and change it "on the fly", but I wouldn't recommend doing this.

Now, the StringBuilder class also holds a character array, and when you pass the string to its constructor it actually copies the string's buffer to his own (see Reference Source). What it doesn't have, though, is the contract of immutability, so when you modify a string using StringBuilder you are actually working with the char array. Note that when you call ToString() on a StringBuilder, it creates a new "immutable" string any copies his buffer there.

So, if you need a fast and memory efficient way to make changes in a string, StringBuilder is definitely your choice. Especially regarding that Microsoft explicitly recommends to use StringBuilder if you "perform repeated modifications to a string".




Yes, StringBuilder will give you both gain in speed and memory (basically because it won't create an instance of a string each time you will make a manipulation with it - StringBuilder always operates with the same object). Here is an MSDN link with some details.




You have 3 options:

  1. Do this in an inefficient way with strings as others have recommended here.

  2. Use the .Matches() call on your Regex object, and emulate the way .Replace() works (see #3).

  3. Adapt the Mono implementation of Regex to build a Regex that accepts StringBuilder (and please share it here!) Almost all of the work is already done for you in Mono, but it will take time to suss out the parts that make it work into their own library. Mono's Regex leverages Novell's 2002 JVM implementation of Regex, oddly enough.

In Mono:

System.Text.RegularExpressions.Regex uses an RxCompiler to instantiate an IMachineFactory in the form of an RxInterpreterFactory, which unsurprisingly makes IMachines as RxInterpreters. Getting those to emit is most of what you need to do, although if you're just looking to learn how it's all structured for efficiency, it's notable much of what you're looking for is in its base class, BaseMachine.

In particular, in BaseMachine is the StringBuilder-based stuff. In the method LTRReplace, it first instantiates a StringBuilder with the initial string, and everything from there on out is purely StringBuilder-based. It's actually very annoying that Regex doesn't have StringBuilder methods hanging out, if we assume the internal Microsoft .Net implementation is similar.

Circling back to suggestion 2, you can mimic LTRReplace's behavior by calling .Matches(), tracking where you are in the original string, and looping:

var matches = regex.Matches(original);
var sb = new StringBuilder(original.Length);
int pos = 0; // position in original string
foreach(var match in matches)
{
    sb.Append(original.Substring(pos, match.Index)); // Append the portion of the original we skipped
    pos = match.Index;

    // Make any operations you like on the match result, like your own custom Replace, or even run another Regex

    pos += match.Value.Length;
}
sb.Append(original.Substring(pos, original.Length - 1));

But, this only saves you some strings - the mod-Mono approach is the only one that really does it right.




String.Replace() vs. StringBuilder.Replace()

Using RedGate Profiler using the following code

class Program
    {
        static string data = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
        static Dictionary<string, string> values;

        static void Main(string[] args)
        {
            Console.WriteLine("Data length: " + data.Length);
            values = new Dictionary<string, string>()
            {
                { "ab", "aa" },
                { "jk", "jj" },
                { "lm", "ll" },
                { "yz", "zz" },
                { "ef", "ff" },
                { "st", "uu" },
                { "op", "pp" },
                { "x", "y" }
            };

            StringReplace(data);
            StringBuilderReplace1(data);
            StringBuilderReplace2(new StringBuilder(data, data.Length * 2));

            Console.ReadKey();
        }

        private static void StringReplace(string data)
        {
            foreach(string k in values.Keys)
            {
                data = data.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace1(string data)
        {
            StringBuilder sb = new StringBuilder(data, data.Length * 2);
            foreach (string k in values.Keys)
            {
                sb.Replace(k, values[k]);
            }
        }

        private static void StringBuilderReplace2(StringBuilder data)
        {
            foreach (string k in values.Keys)
            {
                data.Replace(k, values[k]);
            }
        }
    }
  • String.Replace = 5.843ms
  • StringBuilder.Replace #1 = 4.059ms
  • Stringbuilder.Replace #2 = 0.461ms

String length = 1456

stringbuilder #1 creates the stringbuilder in the method while #2 does not so the performance difference will end up being the same most likely since you're just moving that work out of the method. If you start with a stringbuilder instead of a string then #2 might be the way to go instead.

As far as memory, using RedGateMemory profiler, there is nothing to worry about until you get into MANY replace operations in which stringbuilder is going to win overall.