I needed to compare two large directories with thousands of similarly named PDF files and find the differing filenames between them. In the first pass, this is what I did:
Listed out the content of the first directory and saved it in a file:
ls dir1 > dir1.txt
Did the same for the second directory:
ls dir2 > dir2.txt
Compared the difference between the two outputs:
diff dir1.txt dir2.txt
This returned the name of the differing files likes this:
3c3,4
< f3.pdf
---
> f4.pdf
> f5.pdf
It does the job, but I asked BingChat if there’s a better way to accomplish the task without creating intermediate files, and it didn’t let me down. Turns out that in Bash, process substitution allows you to do just that. Instead of running three commands, you can achieve the same result with a simple one-liner:
diff <(ls dir1) <(ls dir2)
Process substitution
In Bash, process substitution is a feature that allows you to treat the output of a command or commands as if it were a file. It enables you to use the output of a command as an input to another command or perform other operations that expect file input or output.
- One important thing to point out is that process substitution is specific to Bash, Zsh, and certain versions of Ksh. Other shells and Bash in POSIX mode don’t understand it. Bash, Zsh, and Ksh (88,93) support process substitution, but pdksh derivatives like mksh don’t currently have this capability.*
The syntax for process substitution is as follows:
<(command)
: This form allows you to use the output of a command as a file-like input.>(command)
: This form allows you to use the output of a command as a file-like output.
When using process substitution, Bash creates a named pipe (FIFO) or a special file
descriptor /dev/fd/<n>
behind the scenes. The command within the parentheses is executed,
and its output is redirected to the named pipe or file descriptor. Then, the path to the
named pipe or file descriptor is substituted into the original command line.
This is different from the plain-old stdin
or stdout
redirection. Here’s how:
Input
- Plain redirection: When using plain stdin redirection (
<
), you can redirect input from a file, for example,< input.txt
. The command reads the content of the file as standard input (stdin). - Process substitution: With process substitution, you can use the output of a
command as input. For example,
command < <(echo "input")
. Here, the output of theecho
command is treated as a file-like object and used as the input tocommand
.
- Plain redirection: When using plain stdin redirection (
Output
- Plain redirection: Using plain stdout redirection (
>
or>>
), you can redirect the output (stdout) of a command to a file, for example,command > output.txt
. The command’s output is written to the specified file. - Process substitution: With process substitution, you can use the output of a
command as output. For example,
command >(process_output)
. Here, the output ofcommand
is treated as a file-like output, and it is passed as input to theprocess_output
command or operation.
- Plain redirection: Using plain stdout redirection (
By using process substitution, the output of a command can be seamlessly integrated into other commands as if it were a file, even if the command doesn’t explicitly support stdin or stdout redirection. This allows for greater compatibility and enables the use of the output in situations where direct piping or redirection may not be possible.
A few practical examples
Inspecting the descriptors involved in process substitution
You can inspect the descriptor used by a process substitution like this:
echo >(true) <(false)
This returns:
/dev/fd/13 /dev/fd/11
Here, the expression >(true)
creates a temporary file-like object, and the true
command
serves as a placeholder for its input. Similarly, <(false)
creates another temporary
file-like object with the false command serving as a placeholder for its output. When the
echo
command is executed, it displays the filenames associated with these temporary
file-like objects, which are /dev/fd/13
and /dev/fd/11
in this specific scenario. These
filenames represent the underlying descriptors of the respective process substitutions,
indicating the file descriptors associated with the temporary objects created during the
process substitution.
Calculating the total number of lines in a file
wc -l < <(cat input.txt)
This command calculates the total number of lines in the input.txt
file. Here, the
<(cat input.txt)
commad creates an input-type file descriptor containing the output of the
cat
command and wc -l
reads that content from there. The extra <
redirects the
file-like object as an input stream again. This is a roundabout way of doing the following:
cat input.txt | wc -l
Processing the content of a file line by line
while read line;
do echo $line;
done < <(cat input.txt)
This command reads each line from the file input.txt
and echoes it. It uses a while
loop
with the read command to iterate over the lines, assigning each line to the variable line
and the echo $line
command displays the line. Process substitution <()
is used to treat
the output of cat input.txt
as a temporary file, providing the input to the loop.
Comparing directory sizes
diff -r <(du -sh dir1) <(du -sh dir2)
The command compares the disk usage of two directories, dir1
and dir2
, using the diff
command. The process substitution <()
is employed to capture the output of the du -sh
command, which calculates the disk usage of each directory and provides a summary in a
human-readable format. The output of each du -sh
command, representing the disk usage of
dir1
and dir2
, is treated as temporary files and passed as arguments to the diff
command. This enables the comparison of the disk usage between the two directories,
highlighting any discrepancies in file sizes or subdirectories.
Picking or rejecting lines common between two sorted files
comm <(echo 'hello world\nhello mars' | sort) \
<(echo 'hello world\nhello venus' | sort)
This returns:
hello mars
hello venus
hello world
This performs a comparison between the sorted outputs of two separate commands using comm
.
The com
command expects two files but we’re using process substitution to make two
file-like objects from stdout.
Within the first process substitution <()
, echo
is used to generate a string containing
two lines: hello world
and hello mars
. This string is then piped to the sort
command,
which sorts the lines alphabetically.
Similarly, the second part of the command <()
uses process substitution as well. It
follows the same pattern as the first process substitution, but this time the string
contains hello world
and hello venus
.
The file-like objects containing the sorted output from the two process substitutions are
then passed as arguments to the comm
command. Then comm
compares the input files line by
line and generates three columns of output: lines unique to the first input, lines unique to
the second input, and lines common to both inputs.
Recent posts
- Hierarchical rate limiting with Redis sorted sets
- Dynamic shell variables
- Link blog in a static site
- Running only a single instance of a process
- Function types and single-method interfaces in Go
- SSH saga
- Injecting Pytest fixtures without cluttering test signatures
- Explicit method overriding with @typing.override
- Quicker startup with module-level __getattr__
- Docker mount revisited