Xargs - The Separator Problem

Many Unix utilities are line oriented. Their output works with xargs only as long as the lines contain no single quote ('), double quote ("), backslash or space. Some utilities can use NUL as the record separator instead: Perl (with -0, and \0 in place of \n), locate (with -0), find (with -print0), grep (with -z or -Z) and sort (with -z). The -0 option of xargs deals with the problem for these tools, but many Unix utilities, at least in their traditional forms, cannot use NUL as a separator (e.g. head, tail, ls, echo, sed, tar -v, wc, which).

But people often forget this and assume that xargs is also line oriented, which is not the case: by default xargs separates on newlines and on blanks within lines, and substrings containing blanks must be single- or double-quoted.
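To see the default splitting without touching any files, a minimal sketch can print each argument that xargs produces in brackets. The sample name, the bracket format and the variable names are illustrative only.

```shell
#!/bin/sh
# One input line containing a blank: default xargs splits it into two arguments.
split_default=$(printf '%s\n' 'not important_file' | xargs -n 1 printf '[%s]\n')
printf '%s\n' "$split_default"

# Wrapping the same text in double quotes keeps it together as one argument.
split_quoted=$(printf '%s\n' '"not important_file"' | xargs -n 1 printf '[%s]\n')
printf '%s\n' "$split_quoted"
```

The first pipeline prints two bracketed items, the second prints one, which is the behaviour that surprises people who expect one argument per line.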

The separator problem is illustrated here:

touch important_file
touch 'not important_file'
find . -name not\* | tail -1 | xargs rm
mkdir -p '12" records'
find \! -name . -type d | tail -1 | xargs rmdir

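Running the above removes important_file, but removes neither the directory called 12" records nor the file called not important_file. xargs splits ./not important_file at the blank and passes ./not and important_file to rm, and it rejects ./12" records because of the unmatched double quote.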
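The outcome can be reproduced safely in a scratch directory. The sketch below assumes that mktemp -d is available, and it drops tail -1 because each find matches only one entry here; the variable names are illustrative.

```shell
#!/bin/sh
# Work in a throwaway directory so that nothing real is deleted.
demo_dir=$(mktemp -d)

survivors=$(
  cd "$demo_dir" || exit 1
  touch important_file 'not important_file'
  mkdir -p '12" records'

  # xargs hands "./not" and "important_file" to rm as two separate arguments.
  find . -name 'not*' | xargs rm 2>/dev/null || true
  # xargs refuses this input because of the unmatched double quote.
  find . ! -name . -type d | xargs rmdir 2>/dev/null || true

  # List what is left in the scratch directory.
  ls
)
printf '%s\n' "$survivors"

rm -rf "$demo_dir"
```

Only the directory 12" records and the file not important_file should be listed at the end; important_file is gone.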

The proper fix is to use find -print0 together with xargs -0. Because tail (like many other tools) traditionally does not support NUL-terminated strings, it is dropped from the pipeline:

touch important_file
touch 'not important_file'
find -name not\* -print0 | xargs -0 rm
mkdir -p '12" records'
find \! -name . -type d -print0 | xargs -0 rmdir

When using the syntax find -print0, entries are separated by a null character instead of an end-of-line. For file names that contain no newlines, this is equivalent to the more verbose command:

find -name not\* | tr \\n \\0 | xargs -0 rm

or, shorter, by switching xargs to line-oriented mode with the -d (delimiter) option of GNU xargs:

find -name not\* | xargs -d '\n' rm

but in general the -0 option should be preferred, since newlines in file names are still a problem.
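The difference shows up as soon as a file name contains a newline. The following sketch assumes GNU xargs (for -d) and a find that supports -print0, and it only counts the arguments that xargs would pass instead of deleting anything; the file name and variable names are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory holding a single file whose name contains a newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/two
lines"

# Line mode: the one file name is cut at its newline into two arguments.
args_line_mode=$(find "$demo_dir" -type f | xargs -d '\n' sh -c 'echo "$#"' sh)

# NUL mode: the file name arrives intact as a single argument.
args_nul_mode=$(find "$demo_dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

printf 'line mode: %s argument(s), NUL mode: %s argument(s)\n' \
  "$args_line_mode" "$args_nul_mode"

rm -rf "$demo_dir"
```

With rm in place of the counting command, line mode would try to remove two names that do not exist, or worse, names that do exist elsewhere.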

GNU Parallel is an alternative to xargs that is designed to have the same options, but to be line oriented. Thus, with GNU Parallel in place of xargs, the first example above would work as expected.

For Unix environments where xargs does not support the -0 option (e.g. Solaris, AIX), the following cannot be used, as it escapes only spaces and does not deal with ' and " (GNU Parallel would work on Solaris, though):

find -name not\* | sed 's/ /\\ /g' | xargs rm
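One possible workaround, offered here as a sketch and not as part of the original recipe, is to backslash-escape every character outside a conservative safe set, relying on the xargs rule that a backslash quotes the next character. The safe set chosen below is an assumption, and names containing newlines are still not handled.

```shell
#!/bin/sh
# Sample names containing a blank, a double quote and a single quote.
escaped_args=$(
  printf '%s\n' './not important_file' './12" records' "./it's here" |
    # Put a backslash in front of anything that is not a letter, digit, _ . / or -
    sed 's|[^A-Za-z0-9_./-]|\\&|g' |
    # Print each argument in brackets to show that every name survived intact.
    xargs -n 1 printf '[%s]\n'
)
printf '%s\n' "$escaped_args"
```

In a real pipeline the printf at the front would be a find command and the final printf would be rm or rmdir.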
