read - working with text file in c#




Reading double values from a text file (5)

Trying to read data from a text file using a C# application. There're multiple lines of data and each of them start with an integer and then followed by bunch of double values. A part of the text file looks like this,

   33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03
   34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03
   35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03

Here 33, 34, 35 are integer values and it's followed by 6 double values. And these double values are not guaranteed to have space or some other delimiter between them. i.e., if a double is negative then it will have a "-" before it and this will take up the space. So basically, it's possible that all 6 double values will be together.

Now the challenge is, how to extract this gracefully?

What I tried:

String.Split(' ');

This will not work as a space is not guaranteed between the initial integer values and then the rest of double values.

This can be easily solved in C++ using sscanf .

double a, b, c, d, e, f;

sscanf(string, "%d %lf%lf%lf%lf%lf%lf", &a, &b, &c, &d, &e, &f);
// here string contains a line of data from text file.

The text file containing double values are generated by a 3rd party tool and I have no control over its output.

Is there a way the integer and double values can be gracefully extracted line by line?


I just went non optimal and replaced the "E-" string to something else while I replaced all the negative sign with a space and a negative sign (" -") then reverted all the "E-" values.

Then I was able to use split to extract the values.

private static IEnumerable<double> ExtractValues(string values)
{
    return values.Replace("E-", "E*").Replace("-", " -").Replace("E*", "E-").Split(' ').Select(v => double.Parse(v));
}

If I am seeing that right, you have a "Fixed Width Data" format. Than you can simply parse on that fact.

i.e. assuming the values are in a file d:\temp\doubles.txt :

void Main()
{
    var filename = @"d:\temp\doubles.txt";
    Func<string, string[]> split = (s) =>
    {
        string[] res = new string[7];
        res[0] = s.Substring(0, 2);
        for (int i = 0; i < 6; i++)
        {
            res[i + 1] = s.Substring(2 + (i * 19), 19);
        }
        return res;
    };
    var result = from l in File.ReadAllLines(filename)
                 let la = split(l)
                 select new
                 {
                    i = int.Parse(la[0]),
                     d1 = double.Parse(la[1]),
                     d2 = double.Parse(la[2]),
                     d3 = double.Parse(la[3]),
                     d4 = double.Parse(la[4]),
                     d5 = double.Parse(la[5]),
                     d6 = double.Parse(la[6])

                 };
    foreach (var e in result)
    {
        Console.WriteLine($"{e.i}, {e.d1}, {e.d2}, {e.d3}, {e.d4}, {e.d5}, {e.d6}");
    }
}

Outputs:

33, 0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922
34, 0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369
35, 0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066

PS: With your exact data, int should be allocating more space.


Solve this with a regular expression. My first shot is:

"[\s-+]\d+\.\d+E[+-]\d\d"

I just tried it this way:

using System;
using System.Globalization;
using System.Text.RegularExpressions;

namespace ConsoleApp1 {

    class Program {
        static void Main(string[] args) {
            var fileContents =
                  "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03"
                + "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03"
                + "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03";

            var rex = new Regex(@"[\s-+]\d+\.\d+E[+-]\d\d", RegexOptions.Multiline);
            foreach (Match match in rex.Matches(fileContents)) {
                double d = double.Parse(match.Value.TrimStart(), NumberFormatInfo.InvariantInfo);
                Console.WriteLine("found a match: " + match.Value.TrimStart() + " => " + d);
            }

            Console.ReadLine();
        }
    }
}

With this output (german localization, with comma as decimal separator):

found a match: 0.573140941467E-01 => 0,0573140941467
found a match: 0.112914262390E-03 => 0,00011291426239
found a match: 0.255553577735E-02 => 0,00255553577735
found a match: 0.497192659486E-04 => 4,97192659486E-05
found a match: 0.141869181079E-01 => 0,0141869181079
found a match: -0.147813598922E-03 => -0,000147813598922
found a match: 0.570076593453E-01 => 0,0570076593453
found a match: 0.100112550891E-03 => 0,000100112550891
found a match: 0.256427138318E-02 => 0,00256427138318
found a match: -0.868691490164E-05 => -8,68691490164E-06
found a match: 0.142821920093E-01 => 0,0142821920093
found a match: -0.346011975369E-03 => -0,000346011975369
found a match: 0.715507714946E-01 => 0,0715507714946
found a match: 0.316132133031E-03 => 0,000316132133031
found a match: -0.106581466521E-01 => -0,0106581466521
found a match: -0.920513736900E-04 => -9,205137369E-05
found a match: 0.138018668842E-01 => 0,0138018668842
found a match: -0.212219497066E-03 => -0,000212219497066

The answers i have seen so far are so complex. Here is a simple one without overthinking

According to @Veljko89's comment, i have updated the code with unlimited number support

    List<double> ParseLine(string line)
    {
        List<double> ret = new List<double>();

        ret.Add(double.Parse(line.Substring(0, line.IndexOf(' '))));
        line = line.Substring(line.IndexOf(' ') + 1);

        for (; !string.IsNullOrWhiteSpace(line); line = line.Substring(line.IndexOf('E') + 4))
        {
            ret.Add(double.Parse(line.Substring(0, line.IndexOf('E') + 4)));
        }

        return ret;
    }

You could do this:

public void ParseFile(string fileLocation)
{
   string[] lines = File.ReadAllLines(fileLocation);

   foreach(var line in lines)
   {
       string[] parts = var Regex.Split(line, "(?((?<!E)-)| )");

       if(parts.Any())
       {
          int first = int.Parse(parts[0]);

          double[] others = parts.Skip(1).Select(a => double.Parse(a)).ToArray();
       }
   }
}   




c#