c# with Splitting a string into chunks of a certain size



split string with length in c# (24)

Using regular expressions and Linq:

List<string> groups = (from Match m in Regex.Matches(str, @"\d{4}")
                       select m.Value).ToList();

I find this to be more readable, but it's just a personal opinion. It can also be a one-liner : ).

Suppose I had a string:

string str = "1111222233334444"; 

How can I break this string into chunks of some size?

e.g., breaking this into sizes of 4 would return strings:

"1111"
"2222"
"3333"
"4444"

It's not pretty and it's not fast, but it works, it's a one-liner and it's LINQy:

List<string> a = text.Select((c, i) => new { Char = c, Index = i }).GroupBy(o => o.Index / 4).Select(g => new String(g.Select(o => o.Char).ToArray())).ToList();

Best , Easiest and Generic Answer :).

    string originalString = "1111222233334444";
    List<string> test = new List<string>();
    int chunkSize = 4; // change 4 with the size of strings you want.
    for (int i = 0; i < originalString.Length; i = i + chunkSize)
    {
        if (originalString.Length - i >= chunkSize)
            test.Add(originalString.Substring(i, chunkSize));
        else
            test.Add(originalString.Substring(i,((originalString.Length - i))));
    }

If necessary to split by few different length: For example you have date and time in specify format stringstrangeStr = "07092016090532"; 07092016090532 (Date:07.09.2016 Time: 09:05:32)

public static IEnumerable<string> SplitBy(this string str, int[] chunkLength)
    {
        if (String.IsNullOrEmpty(str)) throw new ArgumentException();
        int i = 0;
        for (int j = 0; j < chunkLength.Length; j++)
        {
            if (chunkLength[j] < 1) throw new ArgumentException();
            if (chunkLength[j] + i > str.Length)
            {
                chunkLength[j] = str.Length - i;
            }
            yield return str.Substring(i, chunkLength[j]);
            i += chunkLength[j];
        }
    }

using:

string[] dt = strangeStr.SplitBy(new int[] { 2, 2, 4, 2, 2, 2, 2 }).ToArray();

Six years later o_O

Just because

    public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront)
    {
        var count = (int) Math.Ceiling(str.Length/(double) chunkSize);
        Func<int, int> start = index => remainingInFront ? str.Length - (count - index)*chunkSize : index*chunkSize;
        Func<int, int> end = index => Math.Min(str.Length - Math.Max(start(index), 0), Math.Min(start(index) + chunkSize - Math.Max(start(index), 0), chunkSize));
        return Enumerable.Range(0, count).Select(i => str.Substring(Math.Max(start(i), 0),end(i)));
    }

or

    private static Func<bool, int, int, int, int, int> start = (remainingInFront, length, count, index, size) =>
        remainingInFront ? length - (count - index) * size : index * size;

    private static Func<bool, int, int, int, int, int, int> end = (remainingInFront, length, count, index, size, start) =>
        Math.Min(length - Math.Max(start, 0), Math.Min(start + size - Math.Max(start, 0), size));

    public static IEnumerable<string> Split(this string str, int chunkSize, bool remainingInFront)
    {
        var count = (int)Math.Ceiling(str.Length / (double)chunkSize);
        return Enumerable.Range(0, count).Select(i => str.Substring(
            Math.Max(start(remainingInFront, str.Length, count, i, chunkSize), 0),
            end(remainingInFront, str.Length, count, i, chunkSize, start(remainingInFront, str.Length, count, i, chunkSize))
        ));
    }

AFAIK all edge cases are handled.

Console.WriteLine(string.Join(" ", "abc".Split(2, false))); // ab c
Console.WriteLine(string.Join(" ", "abc".Split(2, true))); // a bc
Console.WriteLine(string.Join(" ", "a".Split(2, true))); // a
Console.WriteLine(string.Join(" ", "a".Split(2, false))); // a

static IEnumerable<string> Split(string str, int chunkSize)
{
    return Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
}

Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases and in real life the requirements might vary so they are out of scope of this answer.


This can be done in this way too

        string actualString = "1111222233334444";
        var listResult = new List<string>();
        int groupingLength = actualString.Length % 4;
        if (groupingLength > 0)
            listResult.Add(actualString.Substring(0, groupingLength));
        for (int i = groupingLength; i < actualString.Length; i += 4)
        {
            listResult.Add(actualString.Substring(i, 4));
        }

        foreach(var res in listResult)
        {
            Console.WriteLine(res);
        }
        Console.Read();

An important tip if the string that is being chunked needs to support all Unicode characters.

If the string is to support international characters like 𠀋, then split up the string using the System.Globalization.StringInfo class. Using StringInfo, you can split up the string based on number of text elements.

string internationalString = '𠀋';

The above string has a Length of 2, because the String.Length property returns the number of Char objects in this instance, not the number of Unicode characters.


class StringHelper
{
    static void Main(string[] args)
    {
        string str = "Hi my name is vikas bansal and my email id is [email protected]";
        int offSet = 10;

        List<string> chunks = chunkMyStr(str, offSet);

        Console.Read();
    }

    static List<string> chunkMyStr(string str, int offSet)
    {


        List<string> resultChunks = new List<string>();

        for (int i = 0; i < str.Length; i += offSet)
        {
            string temp = str.Substring(i, (str.Length - i) > offSet ? offSet : (str.Length - i));
            Console.WriteLine(temp);
            resultChunks.Add(temp);


        }

        return resultChunks;
    }
}

Personally I prefer my solution :-)

It handles:

  • String lengths that are a multiple of the chunk size.
  • String lengths that are NOT a multiple of the chunk size.
  • String lengths that are smaller than the chunk size.
  • NULL and empty strings (throws an exception).
  • Chunk sizes smaller than 1 (throws an exception).

It is implemented as a extension method, and it calculates the number of chunks is going to generate beforehand. It checks the last chunk because in case the text length is not a multiple it needs to be shorter. Clean, short, easy to understand... and works!

    public static string[] Split(this string value, int chunkSize)
    {
        if (string.IsNullOrEmpty(value)) throw new ArgumentException("The string cannot be null.");
        if (chunkSize < 1) throw new ArgumentException("The chunk size should be equal or greater than one.");

        int remainder;
        int divResult = Math.DivRem(value.Length, chunkSize, out remainder);

        int numberOfChunks = remainder > 0 ? divResult + 1 : divResult;
        var result = new string[numberOfChunks];

        int i = 0;
        while (i < numberOfChunks - 1)
        {
            result[i] = value.Substring(i * chunkSize, chunkSize);
            i++;
        }

        int lastChunkSize = remainder > 0 ? remainder : chunkSize;
        result[i] = value.Substring(i * chunkSize, lastChunkSize);

        return result;
    }

static IEnumerable<string> Split(string str, int chunkSize)
{
   IEnumerable<string> retVal = Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize))

   if (str.Length % chunkSize > 0)
        retVal = retVal.Append(str.Substring(str.Length / chunkSize * chunkSize, str.Length % chunkSize));

   return retVal;
}

It correctly handles input string length not divisible by chunkSize.

Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0).


static IEnumerable<string> Split(string str, double chunkSize)
{
    return Enumerable.Range(0, (int) Math.Ceiling(str.Length/chunkSize))
       .Select(i => new string(str
           .Skip(i * (int)chunkSize)
           .Take((int)chunkSize)
           .ToArray()));
}

and another approach:

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
    public static void Main()
    {

        var x = "Hello World";
        foreach(var i in x.ChunkString(2)) Console.WriteLine(i);
    }
}

public static class Ext{
    public static IEnumerable<string> ChunkString(this string val, int chunkSize){
        return val.Select((x,i) => new {Index = i, Value = x})
                  .GroupBy(x => x.Index/chunkSize, x => x.Value)
                  .Select(x => string.Join("",x));
    }
}

Using the Buffer extensions from the IX library

    static IEnumerable<string> Split( this string str, int chunkSize )
    {
        return str.Buffer(chunkSize).Select(l => String.Concat(l));
    }

public static IEnumerable<IEnumerable<T>> SplitEvery<T>(this IEnumerable<T> values, int n)
{
    var ls = values.Take(n);
    var rs = values.Skip(n);
    return ls.Any() ?
        Cons(ls, SplitEvery(rs, n)) : 
        Enumerable.Empty<IEnumerable<T>>();
}

public static IEnumerable<T> Cons<T>(T x, IEnumerable<T> xs)
{
    yield return x;
    foreach (var xi in xs)
        yield return xi;
}

I can't remember who gave me this, but it works great. I speed tested a number of ways to break Enumerable types into groups. The usage would just be like this...

List<string> Divided = Source3.Chunk(24).Select(Piece => string.Concat<char>(Piece)).ToList();

The extention code would look like this...

#region Chunk Logic
private class ChunkedEnumerable<T> : IEnumerable<T>
{
    class ChildEnumerator : IEnumerator<T>
    {
        ChunkedEnumerable<T> parent;
        int position;
        bool done = false;
        T current;


        public ChildEnumerator(ChunkedEnumerable<T> parent)
        {
            this.parent = parent;
            position = -1;
            parent.wrapper.AddRef();
        }

        public T Current
        {
            get
            {
                if (position == -1 || done)
                {
                    throw new InvalidOperationException();
                }
                return current;

            }
        }

        public void Dispose()
        {
            if (!done)
            {
                done = true;
                parent.wrapper.RemoveRef();
            }
        }

        object System.Collections.IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            position++;

            if (position + 1 > parent.chunkSize)
            {
                done = true;
            }

            if (!done)
            {
                done = !parent.wrapper.Get(position + parent.start, out current);
            }

            return !done;

        }

        public void Reset()
        {
            // per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
            throw new NotSupportedException();
        }
    }

    EnumeratorWrapper<T> wrapper;
    int chunkSize;
    int start;

    public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
    {
        this.wrapper = wrapper;
        this.chunkSize = chunkSize;
        this.start = start;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return new ChildEnumerator(this);
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

}
private class EnumeratorWrapper<T>
{
    public EnumeratorWrapper(IEnumerable<T> source)
    {
        SourceEumerable = source;
    }
    IEnumerable<T> SourceEumerable { get; set; }

    Enumeration currentEnumeration;

    class Enumeration
    {
        public IEnumerator<T> Source { get; set; }
        public int Position { get; set; }
        public bool AtEnd { get; set; }
    }

    public bool Get(int pos, out T item)
    {

        if (currentEnumeration != null && currentEnumeration.Position > pos)
        {
            currentEnumeration.Source.Dispose();
            currentEnumeration = null;
        }

        if (currentEnumeration == null)
        {
            currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
        }

        item = default(T);
        if (currentEnumeration.AtEnd)
        {
            return false;
        }

        while (currentEnumeration.Position < pos)
        {
            currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
            currentEnumeration.Position++;

            if (currentEnumeration.AtEnd)
            {
                return false;
            }

        }

        item = currentEnumeration.Source.Current;

        return true;
    }

    int refs = 0;

    // needed for dispose semantics 
    public void AddRef()
    {
        refs++;
    }

    public void RemoveRef()
    {
        refs--;
        if (refs == 0 && currentEnumeration != null)
        {
            var copy = currentEnumeration;
            currentEnumeration = null;
            copy.Source.Dispose();
        }
    }
}
/// <summary>Speed Checked.  Works Great!</summary>
public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
{
    if (chunksize < 1) throw new InvalidOperationException();

    var wrapper = new EnumeratorWrapper<T>(source);

    int currentPos = 0;
    T ignore;
    try
    {
        wrapper.AddRef();
        while (wrapper.Get(currentPos, out ignore))
        {
            yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
            currentPos += chunksize;
        }
    }
    finally
    {
        wrapper.RemoveRef();
    }
}
#endregion

List<string> SplitString(int chunk, string input)
{
    List<string> list = new List<string>();
    int cycles = input.Length / chunk;

    if (input.Length % chunk != 0)
        cycles++;

    for (int i = 0; i < cycles; i++)
    {
        try
        {
            list.Add(input.Substring(i * chunk, chunk));
        }
        catch
        {
            list.Add(input.Substring(i * chunk));
        }
    }
    return list;
}

How's this for a one-liner?

List<string> result = new List<string>(Regex.Split(target, @"(?<=\G.{4})", RegexOptions.Singleline));

With this regex it doesn't matter if the last chunk is less than four characters, because it only ever looks at the characters behind it. I'm sure this isn't the most efficient solution, but I just had to toss it out there. :D


I think this is an straight forward answer:

public static IEnumerable<string> Split(this string str, int chunkSize)
    {
        if(string.IsNullOrEmpty(str) || chunkSize<1)
            throw new ArgumentException("String can not be null or empty and chunk size should be greater than zero.");
        var chunkCount = str.Length / chunkSize + (str.Length % chunkSize != 0 ? 1 : 0);
        for (var i = 0; i < chunkCount; i++)
        {
            var startIndex = i * chunkSize;
            if (startIndex + chunkSize >= str.Length)
                yield return str.Substring(startIndex);
            else
                yield return str.Substring(startIndex, chunkSize);
        }
    }

And it covers edge cases.


In a combination of dove+Konstatin's answers...

static IEnumerable<string> WholeChunks(string str, int chunkSize) {
    for (int i = 0; i < str.Length; i += chunkSize) 
        yield return str.Substring(i, chunkSize);
}

This will work for all strings that can be split into a whole number of chunks, and will throw an exception otherwise.

If you want to support strings of any length you could use the following code:

static IEnumerable<string> ChunksUpto(string str, int maxChunkSize) {
    for (int i = 0; i < str.Length; i += maxChunkSize) 
        yield return str.Substring(i, Math.Min(maxChunkSize, str.Length-i));
}

However, the the OP explicitly stated he does not need this; it's somewhat longer and harder to read, slightly slower. In the spirit of KISS and YAGNI, I'd go with the first option: it's probably the most efficient implementation possible, and it's very short, readable, and, importantly, throws an exception for nonconforming input.


Changed slightly to return parts whose size not equal to chunkSize

public static IEnumerable<string> Split(this string str, int chunkSize)
    {
        var splits = new List<string>();
        if (str.Length < chunkSize) { chunkSize = str.Length; }
        splits.AddRange(Enumerable.Range(0, str.Length / chunkSize).Select(i => str.Substring(i * chunkSize, chunkSize)));
        splits.Add(str.Length % chunkSize > 0 ? str.Substring((str.Length / chunkSize) * chunkSize, str.Length - ((str.Length / chunkSize) * chunkSize)) : string.Empty);
        return (IEnumerable<string>)splits;
    }

Modified (now it accepts any non null string and any positive chunkSize) Konstantin Spirin's solution:

public static IEnumerable<String> Split(String value, int chunkSize) {
  if (null == value)
    throw new ArgumentNullException("value");
  else if (chunkSize <= 0)
    throw new ArgumentOutOfRangeException("chunkSize", "Chunk size should be positive");

  return Enumerable
    .Range(0, value.Length / chunkSize + ((value.Length % chunkSize) == 0 ? 0 : 1))
    .Select(index => (index + 1) * chunkSize < value.Length 
      ? value.Substring(index * chunkSize, chunkSize)
      : value.Substring(index * chunkSize));
}

Tests:

  String source = @"ABCDEF";

  // "ABCD,EF"
  String test1 = String.Join(",", Split(source, 4));
  // "AB,CD,EF"
  String test2 = String.Join(",", Split(source, 2));
  // "ABCDEF"
  String test3 = String.Join(",", Split(source, 123));

This is based on @dove solution but implemented as an extension method.

Benefits:

  • Extension method
  • Covers corner cases
  • Splits string with any chars: numbers, letters, other symbols

Code

public static class EnumerableEx
{    
    public static IEnumerable<string> SplitBy(this string str, int chunkLength)
    {
        if (String.IsNullOrEmpty(str)) throw new ArgumentException();
        if (chunkLength < 1) throw new ArgumentException();

        for (int i = 0; i < str.Length; i += chunkLength)
        {
            if (chunkLength + i > str.Length)
                chunkLength = str.Length - i;

            yield return str.Substring(i, chunkLength);
        }
    }
}

Usage

var result = "bobjoecat".SplitBy(3); // bob, joe, cat

Unit tests removed for brevity (see previous revision)


Why not loops? Here's something that would do it quite well:

        string str = "111122223333444455";
        int chunkSize = 4;
        int stringLength = str.Length;
        for (int i = 0; i < stringLength ; i += chunkSize)
        {
            if (i + chunkSize > stringLength) chunkSize = stringLength  - i;
            Console.WriteLine(str.Substring(i, chunkSize));

        }
        Console.ReadLine();

I don't know how you'd deal with case where the string is not factor of 4, but not saying you're idea is not possible, just wondering the motivation for it if a simple for loop does it very well? Obviously the above could be cleaned and even put in as an extension method.

Or as mentioned in comments, you know it's /4 then

str = "1111222233334444";
for (int i = 0; i < stringLength; i += chunkSize) 
  {Console.WriteLine(str.Substring(i, chunkSize));} 

    public static List<string> SplitByMaxLength(this string str)
    {
        List<string> splitString = new List<string>();

        for (int index = 0; index < str.Length; index += MaxLength)
        {
            splitString.Add(str.Substring(index, Math.Min(MaxLength, str.Length - index)));
        }

        return splitString;
    }




string