Parsing CSV files in C#, with header


Answers

A CSV parser is now a part of .NET Framework.

Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name)

using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        //Process row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            //TODO: Process field
        }
    }
}

The docs are here - TextFieldParser Class

Question

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.

Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?

Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.




In a business application, i use the Open Source project on codeproject.com, CSVReader.

It works well, and has good performance. There is some benchmarking on the link i provided.

A simple example, copied from the project page:

using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();

    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));

        Console.WriteLine();
    }
}

As you can see, it's very easy to work with.




Here is my KISS implementation...

using System;
using System.Collections.Generic;
using System.Text;

class CsvParser
{
    public static List<string> Parse(string line)
    {
        const char escapeChar = '"';
        const char splitChar = ',';
        bool inEscape = false;
        bool priorEscape = false;

        List<string> result = new List<string>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            switch (c)
            {
                case escapeChar:
                    if (!inEscape)
                        inEscape = true;
                    else
                    {
                        if (!priorEscape)
                        {
                            if (i + 1 < line.Length && line[i + 1] == escapeChar)
                                priorEscape = true;
                            else
                                inEscape = false;
                        }
                        else
                        {
                            sb.Append(c);
                            priorEscape = false;
                        }
                    }
                    break;
                case splitChar:
                    if (inEscape) //if in escape
                        sb.Append(c);
                    else
                    {
                        result.Add(sb.ToString());
                        sb.Length = 0;
                    }
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }

        if (sb.Length > 0)
            result.Add(sb.ToString());

        return result;
    }

}






This code reads csv to DataTable:

public static DataTable ReadCsv(string path)
{
    DataTable result = new DataTable("SomeData");
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        bool isFirstRow = true;
        //IList<string> headers = new List<string>();

        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (isFirstRow)
            {
                foreach (string field in fields)
                {
                    result.Columns.Add(new DataColumn(field, typeof(string)));
                }
                isFirstRow = false;
            }
            else
            {
                int i = 0;
                DataRow row = result.NewRow();
                foreach (string field in fields)
                {
                    row[i++] = field;
                }
                result.Rows.Add(row);
            }
        }
    }
    return result;
}



Single source file solution for straightforward parsing needs, useful. Deals with all the nasty edge cases. Such as new line normalization and handling new lines in quoted string literals. Your welcome!

If you CSV file has a header you just read out the column names (and compute column indexes) from the first row. Simple as that.

Note that Dump is a LINQPad method, you might want to remove that if you are not using LINQPad.

void Main()
{
    var file1 = "a,b,c\r\nx,y,z";
    CSV.ParseText(file1).Dump();

    var file2 = "a,\"b\",c\r\nx,\"y,z\"";
    CSV.ParseText(file2).Dump();

    var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
    CSV.ParseText(file3).Dump();

    var file4 = "\"\"\"\"";
    CSV.ParseText(file4).Dump();
}

static class CSV
{
    public struct Record
    {
        public readonly string[] Row;

        public string this[int index] => Row[index];

        public Record(string[] row)
        {
            Row = row;
        }
    }

    public static List<Record> ParseText(string text)
    {
        return Parse(new StringReader(text));
    }

    public static List<Record> ParseFile(string fn)
    {
        using (var reader = File.OpenText(fn))
        {
            return Parse(reader);
        }
    }

    public static List<Record> Parse(TextReader reader)
    {
        var data = new List<Record>();

        var col = new StringBuilder();
        var row = new List<string>();
        for (; ; )
        {
            var ln = reader.ReadLine();
            if (ln == null) break;
            if (Tokenize(ln, col, row))
            {
                data.Add(new Record(row.ToArray()));
                row.Clear();
            }
        }

        return data;
    }

    public static bool Tokenize(string s, StringBuilder col, List<string> row)
    {
        int i = 0;

        if (col.Length > 0)
        {
            col.AppendLine(); // continuation

            if (!TokenizeQuote(s, ref i, col, row))
            {
                return false;
            }
        }

        while (i < s.Length)
        {
            var ch = s[i];
            if (ch == ',')
            {
                row.Add(col.ToString().Trim());
                col.Length = 0;
                i++;
            }
            else if (ch == '"')
            {
                i++;
                if (!TokenizeQuote(s, ref i, col, row))
                {
                    return false;
                }
            }
            else
            {
                col.Append(ch);
                i++;
            }
        }

        if (col.Length > 0)
        {
            row.Add(col.ToString().Trim());
            col.Length = 0;
        }

        return true;
    }

    public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
    {
        while (i < s.Length)
        {
            var ch = s[i];
            if (ch == '"')
            {
                // escape sequence
                if (i + 1 < s.Length && s[i + 1] == '"')
                {
                    col.Append('"');
                    i++;
                    i++;
                    continue;
                }
                i++;
                return true;
            }
            else
            {
                col.Append(ch);
                i++;
            }
        }
        return false;
    }
}



If you need only reading csv files then I recommend this library: A Fast CSV Reader
If you also need to generate csv files then use this one: FileHelpers

Both of them are free and opensource.




This solution is using the official Microsoft.VisualBasic assembly to parse CSV.

Advantages:

  • delimiter escaping
  • ignores Header
  • trim spaces
  • ignore comments

Code:

    using Microsoft.VisualBasic.FileIO;

    public static List<List<string>> ParseCSV (string csv)
    {
        List<List<string>> result = new List<List<string>>();


        // To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project. 
        using (TextFieldParser parser = new TextFieldParser(new StringReader(csv))) 
        {
            parser.CommentTokens = new string[] { "#" };
            parser.SetDelimiters(new string[] { ";" });
            parser.HasFieldsEnclosedInQuotes = true;

            // Skip over header line.
            //parser.ReadLine();

            while (!parser.EndOfData)
            {
                var values = new List<string>();

                var readFields = parser.ReadFields();
                if (readFields != null)
                    values.AddRange(readFields);
                result.Add(values);
            }
        }

        return result;
    }