sed - with - tr replace newline




How can I replace a newline (\n) using sed?

Easy-to-understand Solution

I had this problem. The kicker was that I needed the solution to work on BSD's (Mac OS X) and GNU's (Linux and Cygwin) sed and tr:

$ echo 'foo
bar
baz


foo2
bar2
baz2' \
| tr '\n' '\000' \
| sed 's:\x00\x00.*:\n:g' \
| tr '\000' '\n'

Output:

foo
bar
baz

(has trailing newline)

It works on Linux, OS X, and BSD - even without UTF-8 support or with a crappy terminal.

  1. Use tr to swap the newline with another character.

    NUL (\000 or \x00) is a good choice because it needs no UTF-8 support and is unlikely to appear in text.

  2. Use sed to match the NUL bytes.

  3. Use tr to swap any remaining NUL bytes back to newlines, if you need them.
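The same tr-then-sed idea also covers the common case of joining all lines with a multi-character separator. A minimal sketch, assuming GNU sed (matching \x00 in a regex is a GNU extension, so BSD sed may not accept it):

```shell
#!/bin/sh
# Join lines with ", " via the tr -> sed trick.
# Assumes GNU sed: the \x00 escape in a regex is a GNU extension.
printf 'foo\nbar\nbaz\n' \
  | tr '\n' '\000' \
  | sed 's/\x00$//; s/\x00/, /g'   # drop the final NUL, then replace the rest
echo   # the result has no trailing newline; add one for the terminal
```

No tr back to newlines is needed here, because no NUL bytes are left after the substitution.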

How can I replace a newline (\n) using the sed command?

I unsuccessfully tried:

sed 's#\n# #g' file
sed 's#^$# #g' file

How do I fix it?


Bullet-proof solution. Binary-data-safe and POSIX-compliant, but slow.

POSIX sed requires input that conforms to the POSIX definitions of a text file and a line: NUL bytes and overlong lines are not allowed, and each line must end with a newline (including the last line). This makes it hard to use sed for processing arbitrary input data.

The following solution avoids sed: it converts the input bytes to octal codes and then back to bytes, but intercepts octal code 012 (newline) and outputs the replacement string in its place. As far as I can tell, the solution is POSIX-compliant, so it should work on a wide variety of platforms.

od -A n -t o1 -v | tr ' \t' '\n\n' | grep . |
  while read x; do [ "0$x" -eq 012 ] && printf '<br>\n' || printf "\\$x"; done

POSIX reference documentation: sh, shell command language, od, tr, grep, read, [, printf.

read, [, and printf are all built-ins in at least bash, but POSIX probably does not guarantee that, so on some platforms each input byte may start one or more new processes, which slows things down further. Even in bash this solution only reaches about 50 kB/s, so it's not suited for large files.

Tested on Ubuntu (bash, dash, and busybox), FreeBSD, and OpenBSD.
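The same pipeline can be wrapped in a function whose replacement string is a parameter. A minimal sketch: the name replace_newlines is hypothetical, printf '%s' prints the replacement literally (so % and \ in it are harmless), and read -r is added only for hygiene, since od's output contains nothing but digits and blanks.

```shell
#!/bin/sh
# Hypothetical helper: replace every newline byte on stdin with the
# string given as $1, using only od, tr, grep, read and printf (no sed).
replace_newlines() {
  repl=$1
  od -A n -t o1 -v | tr ' \t' '\n\n' | grep . |
    while read -r x; do
      if [ "0$x" -eq 012 ]; then
        printf '%s' "$repl"   # octal 012 is the newline byte
      else
        printf "\\$x"         # turn the octal code back into its byte
      fi
    done
}

printf 'a\nb\n' | replace_newlines '<br>'
echo
```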


Fast answer:

sed ':a;N;$!ba;s/\n/ /g' file
  1. :a create a label 'a'
  2. N append the next line to the pattern space
  3. $! if not the last line, ba branch (go to) label 'a'
  4. s substitute, /\n/ regex for new line, / / by a space, /g global match (as many times as it can)

sed loops through steps 1 to 3 until it reaches the last line, so that all lines end up in the pattern space, where sed then substitutes every \n character.
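To watch the loop work, GNU sed's l command prints the pattern space unambiguously (embedded newlines as \n, the end marked with $). A small check on made-up input, assuming GNU sed because the label is followed by a ; on the same line:

```shell
#!/bin/sh
# Steps 1-3 only: show what the pattern space holds after the loop.
printf 'a\nb\nc\n' | sed -n ':a;N;$!ba;l'
# The full command, including step 4:
printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/ /g'
```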


Alternatives:

Unlike the sed loop, none of these alternatives needs to reach the last line before it starts producing output.

with bash, slow

while IFS= read -r line; do printf '%s ' "$line"; done < file

with perl, sed-like speed

perl -p -e 's/\n/ /' file

with tr, faster than sed, can replace by one character only

tr '\n' ' ' < file

with paste, tr-like speed, can replace by one character only

paste -s -d ' ' file

with awk, tr-like speed

awk 1 ORS=' ' file

Other alternatives such as "echo $(< file)" are slow, work only on small files, and must read the whole file before producing any output.
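The alternatives also differ in what happens to the final newline, which matters when the output is fed to something else. A small comparison on made-up input (od -c only makes the trailing bytes visible):

```shell
#!/bin/sh
# Three sample lines, each terminated by a newline.
sample() { printf 'a\nb\nc\n'; }

sample | tr '\n' ' '        | od -c   # "a b c" plus a trailing space, no newline
sample | awk 1 ORS=' '      | od -c   # same as tr
sample | paste -s -d ' ' -  | od -c   # "a b c" plus a final newline, no trailing space
```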


Long answer from the sed FAQ 5.10:

5.10. Why can't I match or delete a newline using the \n escape
sequence? Why can't I match 2 or more lines using \n?

The \n will never match the newline at the end-of-line because the
newline is always stripped off before the line is placed into the
pattern space. To get 2 or more lines into the pattern space, use
the 'N' command or something similar (such as 'H;...;g;').

Sed works like this: sed reads one line at a time, chops off the
terminating newline, puts what is left into the pattern space where
the sed script can address or change it, and when the pattern space
is printed, appends a newline to stdout (or to a file). If the
pattern space is entirely or partially deleted with 'd' or 'D', the
newline is not added in such cases. Thus, scripts like

  sed 's/\n//' file       # to delete newlines from each line
  sed 's/\n/foo\n/' file  # to add a word to the end of each line

will NEVER work, because the trailing newline is removed before
the line is put into the pattern space. To perform the above tasks,
use one of these scripts instead:

  tr -d '\n' < file              # use tr to delete newlines
  sed ':a;N;$!ba;s/\n//g' file   # GNU sed to delete newlines
  sed 's/$/ foo/' file           # add "foo" to end of each line

Since versions of sed other than GNU sed have limits to the size of
the pattern buffer, the Unix 'tr' utility is to be preferred here.
If the last line of the file contains a newline, GNU sed will add
that newline to the output but delete all others, whereas tr will
delete all newlines.
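The newline difference between tr and GNU sed is easy to see by counting bytes (made-up two-line input; the sed loop assumes GNU sed, as above):

```shell
#!/bin/sh
printf 'a\nb\n' | tr -d '\n' | wc -c                # 2: "ab"
printf 'a\nb\n' | sed ':a;N;$!ba;s/\n//g' | wc -c   # 3: "ab" plus the final newline
# The no-op from above: sed never sees the newlines it strips.
printf 'a\nb\n' | sed 's/\n//'                      # prints a and b unchanged
```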

To match a block of two or more lines, there are 3 basic choices:
(1) use the 'N' command to add the Next line to the pattern space;
(2) use the 'H' command at least twice to append the current line
to the Hold space, and then retrieve the lines from the hold space
with x, g, or G; or (3) use address ranges (see section 3.3, above)
to match lines between two specified addresses.

Choices (1) and (2) will put an \n into the pattern space, where it
can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
of using 'N' to delete a block of lines appears in section 4.13
("How do I delete a block of specific consecutive lines?"). This
example can be modified by changing the delete command to something
else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
or 's' (substitute).

Choice (3) will not put an \n into the pattern space, but it does
match a block of consecutive lines, so it may be that you don't
even need the \n to find what you're looking for. Since GNU sed
version 3.02.80 now supports this syntax:

  sed '/start/,+4d'  # to delete "start" plus the next 4 lines,

in addition to the traditional '/from here/,/to there/{...}' range
addresses, it may be possible to avoid the use of \n entirely.
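The three choices above, tried on tiny made-up inputs. The first two should work in both GNU and BSD sed; the +4 address is a GNU extension, so a portable /from/,/to/ range is shown next to it.

```shell
#!/bin/sh
# Choice (1): N puts two lines into the pattern space, so \n can match.
printf 'ABC\nXYZ\n' | sed 'N;s/ABC\nXYZ/alphabet/'                    # alphabet
# Choice (2): H collects lines in the hold space; at the last line x
# swaps them back. The first H adds a leading \n, removed by s/^\n//.
printf 'ABC\nXYZ\n' | sed -n 'H;${x;s/^\n//;s/ABC\nXYZ/alphabet/p;}'   # alphabet
# Choice (3): ranges select consecutive lines without any \n.
seq 10 | sed '/3/,+4d'        # GNU only: deletes lines 3 to 7
seq 10 | sed '/^3$/,/^7$/d'   # portable /from/,/to/ form, same result
```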


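@OP, if you want to convert line endings in a file, you can just use dos2unix (or unix2dos). Note that it only converts CRLF line endings to LF (or LF to CRLF); it does not replace newlines with arbitrary text.

dos2unix yourfile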

A solution I particularly like is to append all the file in the hold space and replace all newlines at the end of file:

$ (echo foo; echo bar) | sed -n 'H;${x;s/\n//g;p;}'
foobar

However, someone told me that the hold space can be limited in size in some sed implementations.
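A variant that joins with a space instead of deleting the newlines. The first H appends \n plus the line to the empty hold space, so the collected text starts with a \n; removing that one first avoids a leading space.

```shell
#!/bin/sh
# Collect every line in the hold space, then at the last line ($)
# swap it in, strip the leading \n and turn the rest into spaces.
(echo foo; echo bar) | sed -n 'H;${x;s/^\n//;s/\n/ /g;p;}'   # foo bar
```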


Another GNU sed method, almost the same as Zsolt Botykai's answer, but this uses sed's less-frequently used y (transliterate) command, which saves one byte of code (the trailing g):

sed ':a;N;$!ba;y/\n/ /'

One might hope y would run faster than s (perhaps at tr speeds, 20x faster), but in GNU sed v4.2.2 y is about 4% slower than s.


More portable BSD sed version:

sed -e ':a' -e 'N;$!ba' -e 'y/\n/ /'

I used a hybrid approach to get around the newline problem: tr replaces the newlines with tabs, then sed replaces the tabs with whatever I want, in this case " <br> ", since I'm trying to generate HTML line breaks. (Matching \t in a sed regex is a GNU sed extension.)

echo -e "a\nb\nc\n" | tr '\n' '\t' | sed 's/\t/ <br> /g'

I'm not an expert, but I guess in sed you'd first need to append the next line to the pattern space by using "N". From the section "Multiline Pattern Space" in "Advanced sed Commands" of the book sed & awk (Dale Dougherty and Arnold Robbins; O'Reilly 1997; page 107 in the preview):

The multiline Next (N) command creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The original contents of pattern space and the new input line are separated by a newline. The embedded newline character can be matched in patterns by the escape sequence "\n". In a multiline pattern space, the metacharacter "^" matches the very first character of the pattern space, and not the character(s) following any embedded newline(s). Similarly, "$" matches only the final newline in the pattern space, and not any embedded newline(s). After the Next command is executed, control is then passed to subsequent commands in the script.

From man sed:

[2addr]N

Append the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original contents. Note that the current line number changes.

I've used this to search (multiple) badly formatted log files, in which the search string may be found on an "orphaned" next line.
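For that log-search case, a common idiom is a two-line sliding window. A sketch assuming GNU sed, with made-up input and a made-up phrase ("disk" at the end of one line, "full" at the start of the next):

```shell
#!/bin/sh
# $!N appends the next line, so the regex can span the embedded \n.
# D deletes up to that \n and restarts the cycle with the second line,
# so every pair of adjacent lines gets checked once.
printf 'a\nno disk\nfull now\nb\n' | sed -n '$!N;/disk\nfull/p;D'
```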


In response to the "tr" solution above, on Windows (probably using the Gnuwin32 version of tr), the proposed solution:

tr '\n' ' ' < input

was not working for me: it either raised an error or actually replaced the \n with '' for some reason.

Using another feature of tr, the "delete" option -d did work though:

tr -d '\n' < input

or tr -d '\r\n' < input, which deletes carriage returns as well as newlines.
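If the input has Windows (CRLF) line endings, which is only my guess at why the plain tr command misbehaved, the \r has to be handled as well. A sketch on a Unix-like shell with made-up input:

```shell
#!/bin/sh
# Sample Windows-style input: every line ends in \r\n.
printf 'a\r\nb\r\n' | tr -d '\r\n'; echo               # ab
printf 'a\r\nb\r\n' | tr -d '\r' | tr '\n' ' '; echo   # "a b " (trailing space)
```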


In some situations you can change RS to some other string or character. That way, \n is available to sub/gsub:

$ gawk 'BEGIN {RS="dn" } {gsub("\n"," ") ;print $0 }' file

The power of shell scripting is that if you don't know how to do something one way, you can do it another way. And often there are more things to take into account than building a complex solution to a simple problem.

Regarding the claim that gawk is slow and reads the file into memory: I can't confirm that, but in my experience gawk processes one record at a time and is very, very fast (not as fast as some of the others, but the time to write and test also counts). Note, though, that with an RS that never occurs in the input, the whole file becomes a single record.

I process MB and even GB of data, and the only limit I found is line size.


On Mac OS X (using FreeBSD sed):

# replace each newline with a space
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g; ta'
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g' -e ta

The Perl version works the way you expected.

perl -i -p -e 's/\n//' file

As pointed out in the comments, it's worth noting that this edits in place. -i.bak will give you a backup of the original file before the replacement in case your regular expression isn't as smart as you thought.


Three things.

  1. tr (or cat, etc.) is absolutely not needed. (GNU) sed and (GNU) awk, when combined, can do 99.9% of any text processing you need.

  2. stream != line based. ed is a line-based editor; sed is not. See sed lecture for more information on the difference. Most people mistake sed for a line-based tool because, by default, it is not very greedy in its pattern matching for SIMPLE matches: for instance, when searching and replacing one or two characters, it replaces only the first match it finds unless told otherwise with the global flag. There would not even be a global flag if it were line-based rather than STREAM-based, because it would only evaluate one line at a time. Try running ed and you'll notice the difference. ed is quite useful if you want to iterate over specific lines (such as in a for-loop), but most of the time you'll just want sed.

  3. That being said,

    sed -e '{:q;N;s/\n/ /g;t q}' file
    

    works just fine in GNU sed version 4.2.1. The above command will replace all newlines with spaces. It's ugly and a bit cumbersome to type in, but it works just fine. The {}'s can be left out, as they're only included for sanity reasons.


To remove empty lines:

sed -n "s/^$//;t;p;"

Use this solution with GNU sed:

sed ':a;N;$!ba;s/\n/ /g' file

This reads the whole file in a loop, then replaces the newline(s) with a space.

Explanation:

  1. Create a label via :a.
  2. Append the next line of input to the pattern space via N (the current line is already there).
  3. If we are before the last line, branch to the created label $!ba ($! means not to do it on the last line as there should be one final newline).
  4. Finally the substitution replaces every newline with a space on the pattern space (which is the whole file).

Here is cross-platform syntax that works with BSD and OS X's sed (as per @Benjie's comment):

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' file
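One portability detail: according to the GNU sed manual, most non-GNU seds exit without printing anything when N is run on the last line, while GNU sed prints the pattern space first, so single-line input can behave differently. Guarding N with $! avoids the question. A minimal sketch as a shell function; join_lines is a hypothetical name, and the separator is inserted unescaped into the s command.

```shell
#!/bin/sh
# Hypothetical helper: join all input lines with the separator in $1.
# $1 must not contain '/', '&', '\' or a newline, because it is pasted
# into the replacement of the s command as-is.
join_lines() {
  sed -e ':a' -e '$!N' -e '$!ba' -e "s/\n/$1/g"
}

printf 'a\nb\nc\n' | join_lines ', '   # a, b, c
printf 'only\n'    | join_lines ' '    # only (single-line input is kept)
```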

As you can see, using sed for this otherwise simple problem is problematic. For a simpler and adequate solution see this answer.


Using Awk:

awk '{ o = (NR == 1 ? $0 : o " " $0) } END { print o }'

You can use xargs:

seq 10 | xargs

or

seq 10 | xargs echo -n
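With no command given, xargs runs echo, which is why this works. It also splits on any blanks and treats quotes and backslashes specially, so it is only safe for simple input. A quick illustration with made-up input (the exact error message differs between xargs implementations):

```shell
#!/bin/sh
seq 5 | xargs                   # 1 2 3 4 5
printf 'a   b\nc\n' | xargs     # a b c: runs of blanks are collapsed too
# Quotes are special to xargs, so a lone apostrophe is an error:
printf "it's\nok\n" | xargs || echo 'xargs rejected the input'
```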

You can use this method also

sed 'x;G;1!h;s/\n/ /g;$!d'

Explanation

x   - exchanges the contents of the pattern space and the hold space.
G   - appends a newline plus the hold space to the pattern space.
h   - copies the pattern space to the hold space.
1!h - copies the pattern space to the hold space on every line except the
      first. On line 1 the pattern space is \n1 (the empty hold space plus
      the line), while x has already left line 1 alone in the hold space.
$!d - deletes the pattern space (printing nothing) on every line except the
      last, then starts the next cycle.

Flow:
When the first line is read, x moves 1 into the hold space and brings the empty hold space into the pattern space; G then appends the hold space, giving \n1; the substitution is performed and the pattern space is deleted.
On the second line, x moves 2 into the hold space and brings 1 into the pattern space; G appends the hold space (1\n2), h copies that back to the hold space, and the substitution and deletion follow. This repeats until the last line, where $!d no longer applies, so the joined result is printed.
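A quick check of the flow above with seq. Tracing the commands also shows one side effect of the first-line handling: with single-line input, the \n added by G becomes a leading space.

```shell
#!/bin/sh
seq 3 | sed 'x;G;1!h;s/\n/ /g;$!d'          # 1 2 3
printf 'x\n' | sed 'x;G;1!h;s/\n/ /g;$!d'   # " x" (leading space)
```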


GNU sed has an option -z for NUL-separated records (lines). You can just call:

sed -z 's/\n/ /g'
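Since ordinary text contains no NUL bytes, -z makes the entire input a single record, so the final newline is replaced too and the output ends with a space instead of a newline. A sketch of that, plus one way around it (GNU sed only, made-up input):

```shell
#!/bin/sh
printf 'a\nb\nc\n' | sed -z 's/\n/ /g'; echo           # "a b c " (trailing space)
# Drop the final newline first, then replace the others:
printf 'a\nb\nc\n' | sed -z 's/\n$//;s/\n/ /g'; echo   # "a b c"
```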

sed '1h;1!H;$!d
     x;s/\n/ /g' YourFile

This does not work for huge files (buffer limit), but it is very efficient if there is enough memory to hold the file. (Corrected from H to 1h;1!H after a good remark by @hilojack.)

Another version that replaces newlines while reading (more CPU, less memory):

 sed ':loop
 $! N
 s/\n/ /
 t loop' YourFile