GNU Parallel
GNU Parallel (also written Parallel or parallel) is a command-line tool that distributes tasks over multiple processes, so that several jobs run at the same time. It was written in Perl by Ole Tange.
Some impressions of where parallel can help:
First line in chapter 1 of the manual reads:
If you write shell scripts to do the same processing for different input, then GNU Parallel will make your life easier and make your scripts run faster.
From the man page:
If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
As of Nov. 2022, this article is a work in progress. That's why it reads like a brainstorming session rather than a structured article.
Installation
$ sudo apt install parallel
...
The following NEW packages will be installed:
parallel sysstat
...
Example intro chapter 1
The intro of chapter 1 of the manual contains this example:
seq 5 | parallel seq {} '>' example.{}
What it does:
- seq 5 produces a series with the numbers from 1 to 5
- seq {} takes each of those numbers as the argument for an additional sequence: a sequence with only the number 1, a sequence with the numbers 1 and 2, and so on up to a sequence with the numbers 1 to 5
- '>' example.{} writes these 5 sequences to the files example.1 ... example.5.
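To make the data flow explicit, the same files can be produced with an ordinary loop, one job after the other. This is only a sketch for comparison: the scratch directory workdir is an assumption of mine, so that no files end up in the current directory.

```shell
#!/usr/bin/env bash
# Sequential equivalent of: seq 5 | parallel seq {} '>' example.{}
# (workdir is a hypothetical scratch directory, not part of the manual's example)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for n in $(seq 5); do
    # {} in the parallel version corresponds to $n here
    seq "$n" > "example.$n"
done

cat example.3    # prints 1, 2 and 3 on separate lines
```

The parallel version creates the same five files, but may run the five seq commands at the same time.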
An even simpler example, although not very useful:
seq 5 | parallel echo {}
Here, the five echo commands are executed in parallel.
Just call a function x times
This seems like such an easy start, but no. Very instructive, definitely:
- The 'CPU consuming' part is a loop with an integer addition and a subtraction. I didn't want to use something with sleep, as that might not actually take up CPU resources
- Execution time is mentioned for various implementations. I think this was on my laptop, but that's beside the point. The essence is being able to compare the results.
Baseline: Without parallel stuff
Execution time: 24s.
# parallel_test_function()
########################################
#
function parallel_test_function()
{
    printf "parallel_test_function - Start... "
    i=0
    for ((i; i<=1000000; i++))
    do
        # These are plain string assignments ("0+1", then "0+1-1"); the
        # arithmetic context of the for loop evaluates the resulting expression
        i=$i+1
        i=$i-1
    done
    printf "Done. "
}

# Main
########################################
#
# Execution time (function: 1000000x. Here: 6x): 23, 24, 24, 24 ⇒ 24s
#
export -f parallel_test_function
start=`date +%s`
j=0
for ((j; j<=5; j++))
do
    parallel_test_function
done
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
Just call a function with 'parallel'?
This doesn't work:
# Main
########################################
#
export -f parallel_test_function
start=`date +%s`
j=0
for ((j; j<=5; j++))
do
    parallel parallel_test_function
done
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
It results in this warning, after which parallel waits for input on stdin:
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
Working or not?
This seems to work, but it isn't any faster:
# Call test function with parallel (v1)
########################################
#
# * Execution time (function: 1000000x. Here: 6x): 21, 23, 24 ⇒ 24s
# * With fewer inner loops and more outer loops, this is even slower than
#   without using ||
#
export -f parallel_test_function
start=`date +%s`
j=0
for ((j; j<=5; j++))
do
    sem parallel_test_function
done
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
Finally!
Execution time: 9s - This works:
# Call test function with parallel (v2)
########################################
#
# Execution time (function: 1000000x. Here: 6x): 10, 9, 9 ⇒ 9s
#
export -f parallel_test_function
start=`date +%s`
seq 6 | parallel parallel_test_function
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
I first suspected that the loop kills parallelisation, but that doesn't explain it: sem is meant to be called from a loop. The more likely cause is that sem runs only one job at a time by default (it acts as a mutex), so the six calls still run one after the other. With sem -j N and a closing sem --wait, as in the thread tests further down, the loop version does run in parallel.
Still, the simplest trick seems to be: when you have a loop and you want to parallelise it, get rid of the loop and feed its values to parallel instead. A bit similar to changing a select query to an update query: always a bit of puzzling, but doable.
Resources
How to get parallel stuff into parallel?
I find it difficult to understand how to get stuff parallel into parallel. It seems like the same kind of difficulties I had with understanding SQL, which is a 4GL and does stuff implicitly. This section tries to give a bit of an overview.
One operator - Multiple arguments
Example of one operator with multiple arguments. In this case, the arguments are generated on the left of the command line, and piped into parallel.
seq 10 | parallel echo {}
Is this the same as
seq 10 | parallel echo # Same as above?
It often seems that xargs implicitly picks up where to insert the piped input. Is the same true for parallel?
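According to the GNU Parallel man page, the answer is yes: when the command contains no replacement string, {} is appended to the end of it. A small sketch to check this; -k (--keep-order) is only added so that the output order is reproducible, and the variable names are mine:

```shell
#!/usr/bin/env bash
# With an explicit replacement string
explicit=$(seq 3 | parallel -k echo {})

# Without one: {} is appended to the command automatically
implicit=$(seq 3 | parallel -k echo)

# {} only has to be written out when the argument does not go at the end
middle=$(seq 3 | parallel -k echo "item {} done")

echo "$explicit"
echo "$implicit"
echo "$middle"
```

The first two variables should hold identical output; the third shows the case where the placement of {} matters.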
Multiple commands
How to include multiple commands in a GNU Parallel statement?
Use a script file
parallel < my_script.sh
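As far as I understand it, when no command is given GNU Parallel treats every input line as a complete command of its own. The lines of the file therefore run as separate jobs, not as one script from top to bottom. A sketch with a made-up temporary command file (cmdfile is a hypothetical name):

```shell
#!/usr/bin/env bash
# cmdfile is a made-up temporary file; each line in it is an independent job
cmdfile=$(mktemp)
cat > "$cmdfile" <<'EOF'
echo "first job"
echo "second job"; echo "still the second job"
echo "third job: $((1+2))"
EOF

# -k keeps the output in input order; without it, jobs print as they finish
result=$(parallel -k < "$cmdfile")
echo "$result"
rm -f "$cmdfile"
```

Commands that depend on each other have to stay on the same line, as in the second job.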
No difference between operators & arguments - Example Leo
[1]:
$ parallel "{1} {2}" ::: 'printf "%02d "' 'printf "%03d "' ::: 1 2
01 02 001 002
What it does:
- printf "%02d " prints a number padded with zeros to two digits, followed by a space
- printf "%03d " does the same with three digits
Arguments are multiplied into:
* printf "%02d " 1
* printf "%02d " 2
* printf "%03d " 1
* printf "%03d " 2
and this is executed through parallel.
Small detail: I get output as above, but when I run the printf commands separately, I get more '0's. Maybe this has to do with the single quotes around the printf statements?
Use a function
From Chapter 5 of the manual:
The command can be a script, a binary or a Bash function if the function is exported using export -f:
my_func() {
    echo in my_func $1
}
export -f my_func
parallel my_func ::: 1 2 3
Note export -f: Parallel runs each job in a new shell, so functions and variables from the invoking shell have to be exported if the job needs them. See elsewhere in this article for details.
Inline
See separate chapter below.
Multiple commands inline - Example
With the right syntax, it's perfectly possible to include multiple statements when invoking GNU Parallel.
Let's start here:
parallel echo ::: $(seq 5)
which is synonymous to
parallel echo ::: `seq 5`
And these two commands already hold the key to executing multiple commands in a parallel invocation: You have to encapsulate them, so that they get executed as one unit (subshell?) at the right moment.
Now expand this to two commands with some trial and error:
# The shell splits this line at the ';' into two separate commands:
# "parallel echo foobar" (which gets no input) and "echo {} ::: ..."
#
$ parallel echo "foobar"; echo {} ::: `seq 5`
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
Another trial with error:
# The part between $() gets evaluated first and results in "foobar 1",
# "foobar 2", etc. - lacking bash commands
#
$ parallel $(echo "foobar"; echo {}) ::: `seq 5`
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
Finally, a trial-without-error:
$ parallel echo $(echo "foobar"; echo {}) ::: `seq 5`
foobar 1
foobar 2
foobar 3
foobar 4
foobar 5
- It may look like there is one echo statement too many, but there really isn't: the ones inside $() get evaluated and therefore disappear before Parallel gets to do anything
- And Parallel does need a command to execute
- Note that the stuff between $() does get the 'distributed input'
Let's expand this with a static additional statement, just for size:
parallel echo $(echo "Hoi"; echo {}; echo $((1+1))) ::: $(seq 5)
Let's now make the sum in the last statement dynamic:
$ parallel echo $(echo "Hoi"; echo {}; echo $(({}+{}))) ::: $(seq 5)
bash: {}+{}: syntax error: operand expected (error token is "{}+{}")
...
The problem: The inner stuff gets evaluated by the invoking shell before {} gets substituted by Parallel.
Change this by putting the part that Parallel needs to substitute first between single quotes:
$ parallel echo '$(echo "Hoi"; echo {}; echo $(({}+{})))' ::: $(seq 5)
Hoi 1 2
Hoi 2 4
Hoi 3 6
Hoi 4 8
Hoi 5 10
How I tend to interpret this:
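- $() is general Bash syntax for command substitution: the commands inside are run and replaced by their output
- The single quotes are not a Parallel trick but ordinary shell quoting: they stop the invoking shell from evaluating the text, so Parallel receives it literally, substitutes {} and only then hands it to a shell.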
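One way to see who evaluates what is to replace parallel by a small function that only prints the arguments it receives from the invoking shell. show_args is a hypothetical helper for this illustration, not part of GNU Parallel:

```shell
#!/usr/bin/env bash
# Print every argument on its own line, wrapped in <> to make word boundaries visible
show_args() {
    printf '<%s>\n' "$@"
}

# Unquoted: the invoking shell runs $( ) itself, before the command is started
unquoted=$(show_args echo $(echo "Hoi"; echo {}))

# Single-quoted: the text arrives untouched, so it is still there for parallel to process
quoted=$(show_args echo '$(echo "Hoi"; echo {}; echo $(({}+{})))')

echo "$unquoted"
echo "$quoted"
```

In the unquoted case the command receives three separate words (echo, Hoi and {}); in the quoted case it receives echo plus the complete, unevaluated text.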
In this last example, let's see what happens if you remove the $() part:
$ parallel echo 'echo "Hoi"; echo {}; echo $(({}+{}))' ::: $(seq 5)
echo Hoi
1
2
echo Hoi
2
4
echo Hoi
3
6
echo Hoi
4
8
echo Hoi
5
10
What I think is happening:
Nothing gets evaluated before Parallel runs. For each argument, Parallel puts echo in front of the quoted text, so the job's shell sees three commands: echo echo "Hoi", echo 1 and echo $((1+1)). That is why the first line of each job reads "echo Hoi". Note that {} does get substituted correctly.
Splitting over multiple lines
Let's try to split the inline commands over multiple lines, to make them more readable:
parallel echo \
$( \
    echo "Foo"; \
    echo "bar"; \
    echo "One"; \
    echo {} \
) ::: $(seq 5)
Output:
Foo bar One 1
Foo bar One 2
Foo bar One 3
Foo bar One 4
Foo bar One 5
This works too:
parallel echo \
$( \
    echo "Foo"; \
    echo "bar"; \
    echo "One"; \
    echo {} \
) \
::: $(seq 5)
Hierarchy of ()
This doesn't work: the invoking shell evaluates $() first, so the quotes are gone by the time the job's shell sees the text, and that shell treats the bare parentheses as shell syntax:
parallel echo \
$( \
    sql="update wp_terms join wp_term_taxonomy using (term_id) "; \
    sql+="set slug=replace(slug, '{1}', '{2}') "; \
    sql+="where taxonomy='product_cat';"; \
    echo "$sql" \
) ::: $(seq 5) :::+ $(seq 6 10)
Output:
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 1, 6) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 2, 7) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 3, 8) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 4, 9) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 5, 10) where taxonomy=product_cat;'
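A possible way around this, sketched under the assumption that the goal is only to generate one SQL statement per argument pair: move the statement into an exported function, so that the parentheses and quotes are never parsed as shell syntax by the job's shell. build_sql is a hypothetical name; the SQL text is taken from the attempt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the statement from two arguments.
# Inside the printf format string, ( ) and ' are plain characters.
build_sql() {
    printf "update wp_terms join wp_term_taxonomy using (term_id) set slug=replace(slug, '%s', '%s') where taxonomy='product_cat';\n" "$1" "$2"
}
export -f build_sql   # make the function visible to the shells started by parallel

# :::+ pairs the two lists element by element (1 with 6, 2 with 7, ...)
statements=$(parallel -k build_sql {1} {2} ::: $(seq 5) :::+ $(seq 6 10))
echo "$statements"
```

The function only receives the two values as plain arguments, so nothing in the SQL text has to survive a second round of shell parsing.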
Multiple commands inline - More
Syntax for subshell?
The commands to be executed in parallel have to be wrapped in the right syntax, so that they are executed as one unit (subshell?) at the right moment:
- Apparently, use $(cmd1; cmd2; cmd3) to include multiple commands in a pipeline [2]
- I have better experiences with using backticks for this.
Example SO
seq 5 | parallel echo '$(j=$((2*{})); echo $(($j+100)))'
It looks a bit weird to have two echo commands, but it works.
More detailed
It seems I have to use single quotes, rather than a bare $(). E.g., the following statements all work:
seq 5 | parallel echo '$(j=$((2*{})); echo $(($j+100)))'
parallel echo '$(j=$((2*{})); echo $(($j+100)))' ::: $(seq 5)
parallel 'echo "Parallel: "; echo {1} {2} {3}' ::: $(seq 5)
But nothing works if I try to wrap stuff in an unquoted $(). E.g.:
$ parallel $(echo {}; echo {}) ::: $(seq 3)
/bin/bash: 1: command not found
/bin/bash: 2: command not found
/bin/bash: 3: command not found
But this works:
$ parallel 'echo {}; echo {}' ::: $(seq 3)
1
1
2
2
3
3
:::
Provide the 'parallel' arguments to GNU Parallel using :::. The first examples of Parallel that I came across used pipes. I think using ::: is actually the more common way.
First example
# "seq" and "5" are regarded as parallel arguments for echo ;)
$ parallel echo ::: seq 5
seq
5
The reason why this returns the arguments seq and 5, rather than the result of executing seq 5: everything after ::: is taken as a literal argument, and nothing tells the shell to execute it. Put the command within $() to have the shell evaluate it before the result is passed to Parallel.
Second example
This works! Note that output from parallel is usually on multiple lines:
$ i=$(seq 5)
$ echo $i
1 2 3 4 5
$ parallel echo ::: $i
1
2
3
4
5
:::+
With multiple ::: input sources you get the Cartesian product of the arguments. If you don't want that, use :::+, which pairs the arguments with those of the previous source, one by one. See the example elsewhere in this article about feeding an associative array with three columns into Parallel.
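A small sketch of the difference, with made-up arguments a, b and 1, 2 (-k only keeps the output in input order):

```shell
#!/usr/bin/env bash
# Two separate ::: sources: every combination (2 x 2 = 4 jobs)
product=$(parallel -k echo {1}-{2} ::: a b ::: 1 2)

# :::+ links the second source to the first: pairs only (2 jobs)
linked=$(parallel -k echo {1}-{2} ::: a b :::+ 1 2)

echo "$product"   # a-1, a-2, b-1, b-2 (one per line)
echo "$linked"    # a-1, b-2 (one per line)
```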
An array isn't parallel - Unless it is?
I have the impression that here, parallel doesn't treat the array entries as separate arguments, but the whole array as just one argument:
j=(1 2 3 4 5)
echo ${j[@]}
seq 5 | parallel echo {}
echo ${j[@]} | parallel echo {}
seq 5 | parallel 'echo $(({}+{}))'
echo ${j[@]} | parallel 'echo $(({}+{}))' # Error: Invalid arithmetic operator
But this works:
j=(1 2 3 4 5)
parallel echo ::: ${j[@]}
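The difference seems to come from how the input arrives: on stdin, GNU Parallel takes one argument per line, and echo ${j[@]} prints the whole array on a single line. A sketch that prints one element per line instead (the variable names one_job and five_jobs are mine):

```shell
#!/usr/bin/env bash
j=(1 2 3 4 5)

# One line containing "1 2 3 4 5": a single job with a single argument
one_job=$(echo "${j[@]}" | parallel -k echo "arg: {}")

# One element per line: five jobs, so the arithmetic works per element
five_jobs=$(printf '%s\n' "${j[@]}" | parallel -k 'echo $(({}+{}))')

echo "$one_job"
echo "$five_jobs"
```

With ::: the shell has already split the array into separate words, which is why that form worked right away.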
Operate on entries before parallel?
Can you first operate on an entry before it is processed by parallel?
Example using ::::
# First the argument is expanded and only then the operator applied
# (hence to the last item only)
#
$ i=$(seq 5)
$ parallel echo ::: $i+1
1
2
3
4
5+1
Here the sum does get evaluated per element: the single quotes keep the expression away from the invoking shell, Parallel substitutes {} first, and the job's shell then does the arithmetic:
$ seq 5 | parallel 'echo $(({}+{}))'
2
4
6
8
10
Rewritten in possibly a more common form, without pipeline:
parallel 'echo $(({}+{}))' ::: $(seq 5)
2
4
6
8
10
Multiple operations on entries before parallel?
For WP-CLI, it would be really cool if multiple commands could be run in parallel, each doing something with the output.
Example:
- Have a sequence 1...5
- Do this in parallel for each of the numbers:
- Multiply an entry by 2
- Add 1 to the result.
The order is important here: If 'adding 1' is done in parallel to 'multiply by 2', the results might become unpredictable.
Let's try:
seq 5 | parallel 'echo ((2*{}))' ...
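Within one job the commands run strictly one after the other; only different jobs run side by side. So the two steps can simply be written in sequence inside the quoted command. A sketch, with x as a throwaway variable name:

```shell
#!/usr/bin/env bash
# Per input value: first multiply by 2, then add 1 to that result.
# The single quotes keep $(( )) away from the invoking shell.
results=$(seq 5 | parallel -k 'x=$((2*{})); echo $((x+1))')
echo "$results"   # 3 5 7 9 11, one per line
```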
Reuse argument
A case that I sometimes encounter using WP-CLI:
- Have a sequence 1...5
- Do this in parallel for each of the numbers:
- Multiply entry by 2
- Multiply entry by 3
- Add the outcome of these two multiplications
Let's start with the last three lines and first make sure I get that part right :)
i=5
echo $((2*$i + 3*$i))
Now together:
seq 5 | parallel 'echo $((2*{} + 3*{}))'
And why stop here?
seq 5 | parallel 'echo "{} - $((2*{}+3*{}))"'
Evaluate parallel argument first
This won't work:
$ seq 20 | parallel echo $(({}+{}))
bash: {}+{}: syntax error: operand expected (error token is "{}+{}")
The reason: the part between $(( )) is evaluated by the invoking shell first, and only then is the result passed as an argument to parallel.
To change that, put the parallel argument between single quotes:
seq 20 | parallel 'echo $(({}+{}))'
BTW, this doesn't work:
$ seq 20 | parallel echo $(('{}'+'{}'))
bash: '{}'+'{}': syntax error: operand expected (error token is "'{}'+'{}'")
Reusing an argument multiple times
You can use the parallel argument multiple times: just use {} multiple times. As in the previous section, the command has to be between single quotes:
seq 20 | parallel 'echo $(({}+{}))'
sem
sem stands for semaphore, a token that is passed around to do stuff in parallel. I bumped into this here, but I am not sure it works for me like this within a loop. The GNU Parallel manual doesn't seem to be exhaustive concerning this topic. I found https://www.gnu.org/software/parallel/sem.html a much better source.
How many concurrent threads?
Questions
- Is it about threads, processors, cores, sockets or what?
- Do I need to optimize for the number of threads myself? Or just leave this up to GNU Parallel?
- What are the effects for sub-optimized cases?
Answers
- Processors (or processing units) are to a computer what cylinders are to a car. See Processors, cores & threads on this computer (Bash) for details.
- GNU Parallel clearly knows what the optimal number of threads is. See the test code below for the case with sem -j +0: here the number of threads is the same as the number of processors, and the statistics confirm this
- When optimizing manually, rather choose a number of threads that is a bit too high than one that is too low. However, this very much depends on the use case, e.g. whether CPU power or I/O is the bottleneck. I'm quite sure that for me, it's usually CPU power, though.
Test scripts
################################################################################ # Thread optimalisation ################################################################################ # # My laptop can do 8 threads. Let's see what happens to performance when I # force more or less threads: # # # parallel_test_function() ######################################## # function parallel_test_function() { printf "PTF - Start... " i=0 for ((i; i<=100000; i++)) do i=$i+1 i=$i-1 done printf "Done. " } export -f parallel_test_function # # Test - 8 threads # ######################################## # # # # * Execution time (s): 5, 5, 5, 5 ⇒ 5s # # # start=`date +%s` # # # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # sem -j 8 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# Test - 16 threads ######################################## # # * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s # # start=`date +%s` # # # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # sem -j 16 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# # Test - 32 threads # ######################################## # # # # * Execution time (s): 6, 6, 5, 5, 6 ⇒ 5.6s # # # start=`date +%s` # # # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # sem -j 32 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# # Test - 4 threads # ######################################## # # # # * Execution time (s): 5, 6, 5, 6, 6, 5 ⇒ 5.5s # # # start=`date +%s` # # # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # sem -j 4 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# # Test - 2 threads # ######################################## # # # # * Execution time (s): 7, 6, 6, 7, 7, 7 ⇒ 6.7s # # # start=`date +%s` # # # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # sem -j 2 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# # Test - 1 thread # ######################################## # # # # * Execution time (s): 12, 12, 12, 12 ⇒ 12s # # # start=`date +%s` # # # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # sem -j 1 parallel_test_function # # # sem --wait # end=`date +%s` # echo ""; echo Execution time was `expr $end - $start` seconds. 
# Test - Auto-optimized ######################################## # # * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s # start=`date +%s` # sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function sem -j +0 parallel_test_function # sem --wait end=`date +%s` echo ""; echo Execution time was `expr $end - $start` seconds.
Sources
- https://unix.stackexchange.com/questions/114672/gnu-parallel-more-than-one-per-cpu
- https://www.gnu.org/software/parallel/sem.html
Subshells & variables
I have the impression that GNU Parallel creates a subshell and that precautions have to be taken to assure that functions and variables are available in that subshell.
Note that this subshell stuff is not the same as scope within a single shell.
Without exported function or variable
################################################################################
# Subshell & vars? - Without exporting function or var
################################################################################
#
# Function
########################################
#
subfunction()
{
    echo "Subfunction - Var j: $j"
}

# Main
########################################
#
j=12
echo "Main - var j: $j"
#
parallel subfunction ::: $(seq 5)
Output:
Main - var j: 12
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
With exported function
Now the function is exported using export -f subfunction and GNU Parallel can find it. However, the variable j is not available within this function.
################################################################################
# Subshell & vars? - With exporting function
################################################################################
#
# Function
########################################
#
subfunction()
{
    echo "Subfunction - Var j: $j"
}

# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
#
parallel subfunction ::: $(seq 5)
Output:
Main - var j: 12
Subfunction - Var j:
Subfunction - Var j:
Subfunction - Var j:
Subfunction - Var j:
Subfunction - Var j:
With exported function and exported variable
Juhu! Sometimes, things are easy:
################################################################################
# Subshell & vars? - With exporting function and var
################################################################################
#
# Function
########################################
#
subfunction()
{
    echo "Subfunction - Var j: $j"
}

# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
export j
#
parallel subfunction ::: $(seq 5)
Output:
Main - var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
But not for arrays
It seems that regular arrays and associative arrays cannot be exported to subshells:
################################################################################
# Subshell, var & arrays
################################################################################
#
# Function
########################################
#
subfunction()
{
    echo "function - Var i: $i"
    echo "function - Associative array j: ${j[@]}"
    echo "function - Regular array k: ${k[@]}"
}

# Main
########################################
#
i=12
declare -gA j
j[foo,1]="Foo-1"
j[bar,2]="Bar-2"
k[1]="K1"
k[2]="K2"
echo "Main - var j: $i"
export -f subfunction
export i
export j       # Doesn't work
export j[@]    # Doesn't work
export k       # Doesn't work
export k[@]    # Doesn't work
export {k[@]}  # Doesn't work
#
parallel subfunction ::: $(seq 5)
Output:
Main - var j: 12
./parallel.sh: line 181: export: `j[@]': not a valid identifier
./parallel.sh: line 183: export: `k[@]': not a valid identifier
./parallel.sh: line 184: export: `{k[@]}': not a valid identifier
function - Var i: 12
function - Associative array j:
function - Regular array k:
function - Var i: 12
function - Associative array j:
function - Regular array k:
function - Var i: 12
function - Associative array j:
function - Regular array k:
function - Var i: 12
function - Associative array j:
function - Regular array k:
function - Var i: 12
function - Associative array j:
function - Regular array k:
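Bash has no way to put an array into the environment, which is why these export attempts have no effect. One workaround that is sometimes used is to export the array's definition as an ordinary string (declare -p prints it) and to rebuild the array inside the function. A sketch under that assumption; j_def and k_def are hypothetical variable names, and I have only reasoned this through for recent Bash versions:

```shell
#!/usr/bin/env bash
declare -A j=([foo,1]="Foo-1" [bar,2]="Bar-2")   # associative array
k=([1]="K1" [2]="K2")                            # regular array

# Serialise both definitions into plain strings, which can be exported
export j_def="$(declare -p j)"
export k_def="$(declare -p k)"

subfunction() {
    # Rebuild the arrays; declare inside a function makes them local to it
    eval "$j_def"
    eval "$k_def"
    echo "job $1 - j[foo,1]: ${j[foo,1]} - k[2]: ${k[2]}"
}
export -f subfunction

out=$(parallel -k subfunction ::: 1 2 3)
echo "$out"
```

Since eval executes whatever is in the string, this is only suitable for arrays whose content you control.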
Pass arrays to GNU Parallel
As mentioned before, you cannot pass an array (regular or associative) to subshells and therefore, in some situations, not to Parallel either. But there is hope [3]:
- Use :::+?
- Something with exporting functions?
- ?
Using :::+
This works exactly as intended:
################################################################################
# GNU Parallel & associative array
################################################################################
#
# tmps()
########################################
#
tmp2()
{
    echo ""; echo "tmp2: "
    echo $1, $2, $3
}

# Build the associative array
########################################
#
declare -gA j
j[1,tag]="_tool_"
j[1,nl]="Boormachine"
j[1,en]="Drilling machine"
j[2,tag]="_tool_"
j[2,nl]="Zaag"
j[2,en]="Saw"
j[3,tag]="_dim_"
j[3,nl]="1,3"
j[3,en]="1.3"
j_rows=3
# echo ${j[@]}

# Convert to regular array
########################################
#
# * I need a structure with "Boormachine", "Zaag" and "1,3" that can be used
#   as an argument for Parallel. That's not possible with an associative
#   array, I guess. E.g.: echo ${j[1*]} doesn't work
# * The code below to construct temporary regular arrays, is quite
#   inefficient. Would be nice to do it in a more efficient way,
#   maybe using GNU Parallel instead of a loop?
#
#
unset j_tag
unset j_nl
unset j_en
for i in $(seq $j_rows)
do
    j_tag+=("${j[$i,tag]}")
    j_nl+=("${j[$i,nl]}")
    j_en+=("${j[$i,en]}")
done
# echo "j_tag: ${j_tag[@]}"
# echo "j_nl: ${j_nl[@]}"
# echo "j_en: ${j_en[@]}"

# Invoke GNU Parallel
########################################
#
export -f tmp2
#
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" ::: "${j_en[@]}" # 27 combinations?
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" # 9 combinations
# parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" # 6 combinations
#
parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}" # 3 rows - As intended
BTW: The function that is used here could easily be replaced by a direct statement, leading to something like this:
# With a single inline operator
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"
And for something even more exciting: Now with multiple inline commands:
# With multiple inline operators
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"
Sources
- https://unix.stackexchange.com/questions/395298/gnu-parallel-two-parameters-from-array-as-parameter
- https://www.autoscripts.net/gnu-parallel-two-parameters-from-array-as-parameter/
case: collect term_ids through wp-cli
Seems like a good case for replacing a loop with parallel.
Original code:
# Collect all term_ids through a loop
#######################################
#
# * There are 1,433 terms to collect
# * Max. 100 items are returned at once
# * Hence this loop needs 15 iterations
#
i=1
echo "Loop - Collect all term_ids"
#
for ((i; i<=$number_of_iterations; i++))
do
    #
    echo " Iteration $i/$number_of_iterations"
    #
    # Store batch of term ids in tmp array j
    ########################################
    #
    mapfile -t j < <( wp wc product_attribute_term list \
        $taxonomy_id \
        --user=4 \
        --field=id \
        --offset=$((($i-1)*100)) | grep . )
    #
    # echo " j: ${j[@]}"
    #
    # Append to array term_id
    ########################################
    #
    term_id=(${term_id[@]} ${j[@]})
    echo " Length term_id: ${#term_id[@]}"
    #
done
New code:
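A possible parallel version, as a sketch only: it reuses the wp command and options from the loop above, and assumes that taxonomy_id and number_of_iterations are already set as before. collect_term_ids is a hypothetical function name, and whether the site copes well with several simultaneous requests is something I haven't verified.

```shell
#!/usr/bin/env bash
# Sketch only: fetch all batches in parallel instead of one after the other.
# taxonomy_id and number_of_iterations are assumed to be set as in the loop above.
collect_term_ids() {
    # Each job fetches one batch of (at most) 100 ids. The offset is computed
    # by the job's shell, after parallel has filled in the iteration number {}.
    seq 1 "$number_of_iterations" |
        parallel -k "wp wc product_attribute_term list $taxonomy_id --user=4 --field=id --offset=\$(( ({} - 1) * 100 ))" |
        grep .
}

# Only run when both tools are available
if command -v wp >/dev/null 2>&1 && command -v parallel >/dev/null 2>&1; then
    mapfile -t term_id < <(collect_term_ids)
    echo "Length term_id: ${#term_id[@]}"
fi
```

The loop and the temporary array j disappear: seq produces the iteration numbers, and -k keeps the batches in the same order as the original loop.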
See also
Sources
- https://en.wikipedia.org/wiki/GNU_parallel
- https://www.gnu.org/software/parallel/
- https://zenodo.org/record/1146014/files/GNU_Parallel_2018.pdf?download=1
- https://bash-prompt.net/guides/parallell-bash/
- https://medium.com/linuxstories/bash-parallel-command-execution-d4bd7c7cc1d6
- https://adamtheautomator.com/how-to-speed-up-bash-scripts-with-multithreading-and-gnu-parallel/
- https://www.baeldung.com/linux/processing-commands-in-parallel
- https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel
- https://stackoverflow.com/questions/61483185/gnu-parallel-multiple-commands