



Shell command to sum integers, one per line?

I am looking for a command that will accept as input multiple lines of text, each line containing a single integer, and output the sum of these integers.

As a bit of background, I have a log file which includes timing measurements, so through grepping for the relevant lines, and a bit of sed reformatting I can list all of the timings in that file. I'd like to work out the total however, and my mind has gone blank as to any command I can pipe this intermediate output to in order to do the final sum. I've always used expr in the past, but unless it runs in RPN mode I don't think it's going to cope with this (and even then it would be tricky).

What am I missing? Given that there are probably several ways to achieve this, I will be happy to read (and upvote) any approach that works, even if someone else has already posted a different solution that does the job.

Related question: Shortest command to calculate the sum of a column of output on Unix? (credits @Andrew)


Update: Wow, as expected there are some nice answers here. Looks like I will definitely have to give awk deeper inspection as a command-line tool in general!


Alternative pure Perl, fairly readable, no packages or options required:

perl -e 'map {$x += $_} <> and print $x' < infile.txt

Bash solution, if you want to make this a command (e.g. if you need to do this frequently):

addnums () {
  local total=0 val
  while read -r val; do
    (( total += val ))
  done
  echo "$total"
}

Then usage:

addnums < /tmp/nums
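One caveat with a read loop like the one above: read returns a non-zero status on a final line that has no trailing newline, so that last number is silently skipped. A minimal sketch of a variant that still counts it (the name addnums_safe is made up for illustration, and it assumes the input contains no blank lines):

```shell
# Hypothetical variant of addnums that also counts an unterminated last line.
addnums_safe () {
  total=0
  # read fails at end of input, but val is still filled if the last line
  # had no newline; the -n test lets the loop body run one more time.
  while read -r val || [ -n "$val" ]; do
    total=$(( total + val ))
  done
  echo "$total"
}

printf '1\n2\n3' | addnums_safe    # note: no newline after the 3
```

The arithmetic uses $(( )) rather than (( )) only so that the sketch also runs in a plain POSIX sh.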

C (not simplified)

seq 1 10 | tcc -run <(cat << EOF
#include <stdio.h>
int main(int argc, char** argv) {
    int sum = 0;
    int i = 0;
    while(scanf("%d", &i) == 1) {
        sum = sum + i;
    }
    printf("%d\n", sum);
    return 0;
}
EOF
)


I realize this is an old question, but I like this solution enough to share it.

% cat > numbers.txt
1 
2 
3 
4 
5
^D
% cat numbers.txt | perl -lpe '$c+=$_}{$_=$c'
15

If there is interest, I'll explain how it works.
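For anyone curious in the meantime, here is a rough sketch of what that one-liner does, as far as I understand the perlrun documentation: -p wraps the code in a while (<>) { ... } continue { print } loop, and -l chomps each line and adds a newline on output. The }{ in the middle closes that loop early and opens a bare block, so $_=$c runs once after all input is read and the implicit print emits the total. A hand-expanded approximation (not the literal code Perl generates):

```shell
# Approximate expansion of: perl -lpe '$c+=$_}{$_=$c'
# The loop accumulates every line into $c; the total is printed once at the end.
printf '1\n2\n3\n4\n5\n' | perl -e 'while (<>) { chomp; $c += $_ } print "$c\n"'
```

With the five sample numbers this prints 15, the same as the short form.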


I think AWK is what you are looking for:

awk '{sum+=$1}END{print sum}'

You can use this command either by passing the numbers list through the standard input or by passing the file containing the numbers as a parameter.
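To make the two invocation styles concrete, here is a small sketch. The scratch file created with mktemp is just a stand-in for your own file of numbers:

```shell
nums=$(mktemp)            # scratch file standing in for your data
seq 1 10 > "$nums"

# 1. Numbers arrive on standard input
awk '{sum+=$1}END{print sum}' < "$nums"     # prints 55

# 2. The file is passed as a parameter
awk '{sum+=$1}END{print sum}' "$nums"       # prints 55

# On empty input sum is never set and an empty line is printed;
# writing sum+0 forces a numeric 0 instead.
awk '{sum+=$1}END{print sum+0}' < /dev/null # prints 0

rm -f "$nums"
```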


I've done a quick benchmark on the existing answers which

  • use only standard tools (sorry for stuff like lua or Racket),
  • are real one-liners,
  • are capable of adding huge amounts of numbers (100 million), and
  • are fast (I ignored the ones which took longer than a minute).

In each case I summed the numbers from 1 to 100 million, which several solutions managed in less than a minute on my machine.

Here are the results:

Python

:; seq 100000000 | python -c 'import sys; print sum(map(int, sys.stdin))'
5000000050000000
# 30s
:; seq 100000000 | python -c 'import sys; print sum(int(s) for s in sys.stdin)'
5000000050000000
# 38s
:; seq 100000000 | python3 -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 27s
:; seq 100000000 | python3 -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 22s
:; seq 100000000 | pypy -c 'import sys; print(sum(map(int, sys.stdin)))'
5000000050000000
# 11s
:; seq 100000000 | pypy -c 'import sys; print(sum(int(s) for s in sys.stdin))'
5000000050000000
# 11s

Awk

:; seq 100000000 | awk '{s+=$1} END {print s}'
5000000050000000
# 22s

Paste & Bc

This ran out of memory on my machine. It worked for half the size of the input (50 million numbers):

:; seq 50000000 | paste -s -d+ - | bc
1250000025000000
# 17s
:; seq 50000001 100000000 | paste -s -d+ - | bc
3750000025000000
# 18s

So I guess it would have taken ~35s for the 100 million numbers.

Perl

:; seq 100000000 | perl -lne '$x += $_; END { print $x; }'
5000000050000000
# 15s
:; seq 100000000 | perl -e 'map {$x += $_} <> and print $x'
5000000050000000
# 48s

Ruby

:; seq 100000000 | ruby -e "puts ARGF.map(&:to_i).inject(&:+)"
5000000050000000
# 30s

C

Just for comparison's sake I compiled the C version and tested this also, just to have an idea how much slower the tool-based solutions are.

#include <stdio.h>
int main(int argc, char** argv) {
    long sum = 0;
    long i = 0;
    while(scanf("%ld", &i) == 1) {
        sum = sum + i;
    }
    printf("%ld\n", sum);
    return 0;
}

:; seq 100000000 | ./a.out 
5000000050000000
# 8s

Conclusion

C is of course fastest at 8s, but the PyPy solution adds only about 3s of overhead (roughly 40%), for 11s. To be fair, though, PyPy isn't exactly standard. Most people only have CPython installed, which is significantly slower (22s), exactly as fast as the popular Awk solution.

The fastest solution based on standard tools is Perl (15s).
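As a sanity check on the totals reported above, the sum of 1 to n has the closed form n(n+1)/2, which shell arithmetic can evaluate directly (this assumes a shell with 64-bit integer arithmetic, which is the norm today):

```shell
n=100000000
full=$(( n * (n + 1) / 2 ))
echo "$full"                 # 5000000050000000, the total every tool printed

m=50000000
first_half=$(( m * (m + 1) / 2 ))
second_half=$(( full - first_half ))
echo "$first_half"           # 1250000025000000, first paste | bc run
echo "$second_half"          # 3750000025000000, second paste | bc run
```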


My fifteen cents:

$ cat file.txt | xargs  | sed -e 's/\ /+/g' | bc

Example:

$ cat text
1
2
3
3
4
5
6
78
9
0
1
2
3
4
576
7
4444
$ cat text | xargs  | sed -e 's/\ /+/g' | bc 
5148

One-liner in Racket:

racket -e '(define (g) (define i (read)) (if (eof-object? i) empty (cons i (g)))) (foldr + 0 (g))' < numlist.txt

paste typically merges corresponding lines of multiple files, but with -s it joins all the lines of a single file into one line. The -d flag sets the delimiter, so -d+ produces an expression of the form x+x+x that you can pass to bc.

paste -s -d+ infile | bc

Alternatively, when piping from stdin,

<commands> | paste -s -d+ - | bc
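It can help to look at the intermediate expression before it reaches bc. A small sketch using seq as a stand-in for your own input:

```shell
# Stage 1: paste -s joins the lines, -d+ puts a "+" between them
seq 1 5 | paste -s -d+ -
# prints: 1+2+3+4+5

# Stage 2: bc evaluates the resulting expression
seq 1 5 | paste -s -d+ - | bc
# prints: 15
```

Because the whole input becomes a single expression, bc has to hold it all at once, which is worth keeping in mind for very large inputs.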

Plain bash:

$ cat numbers.txt 
1
2
3
4
5
6
7
8
9
10
$ sum=0; while read num; do ((sum += num)); done < numbers.txt; echo $sum
55

Pure and short bash.

f=$(< numbers.txt)
echo $(( ${f//$'\n'/+} ))

Real-time summing to let you monitor progress of some number-crunching task.

$ cat numbers.txt 
1
2
3
4
5
6
7
8
9
10

$ cat numbers.txt | while read new; do total=$(($total + $new)); echo $total; done
1
3
6
10
15
21
28
36
45
55

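(There is no need to set $total to zero in this case, since on the first pass the expression is simply $(( + 1 )). Note, however, that you cannot access $total after the loop finishes, because the pipeline runs the while loop in a subshell.)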
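If you do need the final total afterwards, one option is to redirect the input into the loop instead of piping it, so the loop runs in the current shell. A minimal sketch comparing the two (the variable names piped_total and redirected_total are just for illustration, and the subshell behaviour described is bash's default):

```shell
nums=$(mktemp)              # scratch file standing in for numbers.txt
seq 1 10 > "$nums"

# Piped: in bash the loop runs in a subshell, so the outer total is untouched.
total=0
cat "$nums" | while read -r new; do total=$((total + new)); done
piped_total=$total          # still 0 in bash

# Redirected: the loop runs in the current shell and total survives.
total=0
while read -r new; do total=$((total + new)); done < "$nums"
redirected_total=$total     # 55

echo "piped: $piped_total, redirected: $redirected_total"
rm -f "$nums"
```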


The following should work (assuming your number is the second field on each line).

awk 'BEGIN {sum=0} \
 {sum=sum + $2} \
END {print "tot:", sum}' Yourinputfile.txt
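For instance, with some made-up log-style lines that carry a label in the first field and a timing in the second, the same idea looks like this (the sample data is invented for illustration):

```shell
# Field 1 is a label, field 2 is the number to add up
printf 'req1 12\nreq2 30\nreq3 8\n' | awk '{sum += $2} END {print "tot:", sum}'
# prints: tot: 50
```

Changing $2 to another field number selects a different column.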

The one-liner version in Python:

$ python -c "import sys; print(sum(int(l) for l in sys.stdin))"

With jq:

seq 10 | jq -s 'add' # 'add' is equivalent to 'reduce .[] as $item (0; . + $item)'

You can still use your preferred 'expr' command; you just need to finagle the input a little first:

seq 10 | tr '[\n]' '+' | sed -e 's/+/ + /g' -e's/ + $/\n/' | xargs expr

The process is:

  • tr replaces the newline characters with a + symbol,
  • sed pads each + with spaces on both sides, then replaces the trailing + with a newline, and
  • xargs inserts the piped input into the command line for expr to consume.
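Running the stages one at a time on a short input shows what each step contributes. This sketch assumes GNU tr and GNU sed (other sed implementations may not treat \n in the replacement as a newline):

```shell
# After tr: newlines become "+", leaving a trailing "+"
seq 3 | tr '[\n]' '+'
# prints: 1+2+3+   (with no newline at the end)

# After sed: every "+" is padded, and the trailing " + " becomes a newline
seq 3 | tr '[\n]' '+' | sed -e 's/+/ + /g' -e 's/ + $/\n/'
# prints: 1 + 2 + 3

# xargs turns that line into arguments: expr 1 + 2 + 3
seq 3 | tr '[\n]' '+' | sed -e 's/+/ + /g' -e 's/ + $/\n/' | xargs expr
# prints: 6
```

The padding matters because expr expects each operand and operator as a separate argument.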

You can use num-utils, although it may be overkill for what you need. This is a set of programs for manipulating numbers in the shell, and it can do several nifty things, including, of course, adding them up. It's a bit out of date, but the programs still work and can be useful if you need to do something more.

http://suso.suso.org/programs/num-utils/


dc -f infile -e '[+z1<r]srz1<rp'

Note that negative numbers prefixed with minus sign should be translated for dc, since it uses _ prefix rather than - prefix for that. For example, via tr '-' '_' | dc -f- -e '...'.
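A small sketch of that translation on input that contains a negative number, assuming GNU dc is installed and that -f - reads standard input as it does there:

```shell
# dc would read "-3" as a subtraction operator followed by 3,
# so rewrite the sign as "_" before dc sees it.
printf '10\n-3\n5\n' | tr '-' '_' | dc -f - -e '[+z1<r]srz1<rp'
# prints: 12
```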

Edit: Since this answer got so many votes "for obscurity", here is a detailed explanation:

The expression [+z1<r]srz1<rp does the following:

[   interpret everything to the next ] as a string
  +   pop two values off the stack, add them and push the result
  z   push the current stack depth
  1   push one
  <r  pop two values and execute register r if the original top-of-stack (1)
      is smaller
]   end of the string, will push the whole thing to the stack
sr  pop a value (the string above) and store it in register r
z   push the current stack depth again
1   push 1
<r  pop two values and execute register r if the original top-of-stack (1)
    is smaller
p   print the current top-of-stack

As pseudo-code:

  1. Define "add_top_of_stack" as:
    1. Remove the two top values off the stack and add the result back
    2. If the stack has two or more values, run "add_top_of_stack" recursively
  2. If the stack has two or more values, run "add_top_of_stack"
  3. Print the result, now the only item left in the stack

To really understand the simplicity and power of dc, here is a working Python script that implements some of the commands from dc and executes a Python version of the above command:

### Implement some commands from dc
registers = {'r': None}
stack = []
def add():
    stack.append(stack.pop() + stack.pop())
def z():
    stack.append(len(stack))
def less(reg):
    if stack.pop() < stack.pop():
        registers[reg]()
def store(reg):
    registers[reg] = stack.pop()
def p():
    print(stack[-1])

### Python version of the dc command above

# The equivalent to -f: read a file and push every line to the stack
import fileinput
for line in fileinput.input():
    stack.append(int(line.strip()))

def cmd():
    add()
    z()
    stack.append(1)
    less('r')

stack.append(cmd)
store('r')
z()
stack.append(1)
less('r')
p()

perl -lne '$x += $_; END { print $x; }' < infile.txt



