GNU Parallel

From De Vliegende Brigade

GNU Parallel (also called Parallel, or parallel after its command name) is a shell tool that runs jobs in parallel, distributing tasks over multiple CPU cores. It was written by Ole Tange in Perl.

Some impressions of where parallel can help:

The first line of chapter 1 of the manual reads:

If you write shell scripts to do the same processing for different input,
then GNU Parallel will make your life easier and make your scripts run
faster.

From the man page:

If you write loops in shell, you will find GNU parallel may be able to
replace most of the loops and make them run faster by running several
jobs in parallel.

As of November 2022, this article is a work in progress. That's why it looks like a brainstorming session rather than a structured article.

Installation

$ sudo apt install parallel

...
The following NEW packages will be installed:
  parallel sysstat
...

Example from the intro of chapter 1

The intro of chapter 1 of the manual contains this example:

seq 5 | parallel seq {} '>' example.{}

What it does:

  • seq 5 produces a series with numbers from 1 to 5
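  • seq {}: each of those numbers becomes the argument of a new seq command. This creates a sequence with only the number 1, then a sequence with the numbers 1 and 2, and so on up to a sequence with the numbers 1 to 5
  • '>' example.{}: these 5 sequences are written to the files example.1 ... example.5. The > is quoted so that the redirection is done by the shell that Parallel starts for each job, not by the shell in which you type the command.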

An even simpler example, although not very useful:

seq 5 | parallel echo {}

Here, the five echo commands are executed in parallel.

Just call a function x times

This seemed like an easy start, but it wasn't. Very instructive, though:

  • The CPU-consuming part is a loop that adds 1 to a counter and subtracts it again. I didn't want to use sleep, as that doesn't actually use CPU resources
  • The execution time is given for various implementations. I think this was on my laptop, but that's beside the point. What matters is being able to compare the results.

Baseline: Without parallel stuff

Execution time: 24s.

# parallel_test_function()
########################################
#
function parallel_test_function()
{
   printf "parallel_test_function - Start... "
   i=0
   for ((i; i<=1000000; i++))
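   do
      # Without (( )), these are string assignments ("0+1", then "0+1-1");
      # the i++ in the loop header evaluates that string arithmetically,
      # so the counter still works. The point is only to burn CPU time
      i=$i+1
      i=$i-1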
   done
   printf "Done. "
}

# Main
########################################
#
# Execution time (function: 1000000x. Here: 6x): 23, 24, 24, 24 ⇒ 24s
#
export -f parallel_test_function

start=`date +%s`
j=0
for ((j; j<=5; j++))
do
   parallel_test_function
done

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

Just call a function with 'parallel'?

This doesn't work:

# Main
########################################
#
export -f parallel_test_function
start=`date +%s`
j=0
for ((j; j<=5; j++))
do
   parallel parallel_test_function
done
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

It results in this warning, after which parallel waits for input from the terminal:

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

Working or not?

This seems to work, but it isn't any faster:

# Call test function with parallel (v1)
########################################
#
# * Execution time (function: 1000000x. Here: 6x): 21, 23, 24 ⇒ 24s
# * With fewer inner loops and more outer loops, this is even slower than
#   without parallel
#
export -f parallel_test_function

start=`date +%s`
j=0
for ((j; j<=5; j++))
do
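   # Without -j, sem runs only one job at a time (it acts as a mutex)
   sem parallel_test_function
done
# Wait until the jobs started by sem have finished before taking the end time
sem --wait
end=`date +%s`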
echo ""; echo Execution time was `expr $end - $start` seconds.

Finally!

Execution time: 9s - This works:

# Call test function with parallel (v2)
########################################
#
# Execution time (function: 1000000x. Here: 6x): 10, 9, 9 ⇒ 9s
#
export -f parallel_test_function

start=`date +%s`

seq 6 | parallel parallel_test_function

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

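I suspect that the first trial failed because parallel got no input at all: only what follows the keyword parallel is parallelised, and the loop around it is not fed into it. For the sem trial there is a more specific explanation. Without -j, sem allows only one job at a time (it acts as a mutex), so the loop still runs the function one call after another. The test scripts further down use sem -j +0 together with sem --wait, and those do get the speed-up.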

So the trick seems to be: when you have a loop that you want to parallelise, get rid of the loop and feed its values to Parallel as input instead. It's a bit like rewriting a SELECT query into an UPDATE query: always a bit of a puzzle, but doable.

Resources

How to get parallel stuff into parallel?

I find it difficult to understand how to get things into Parallel so that they actually run in parallel. It reminds me of the difficulties I had understanding SQL, which is a 4GL and does a lot implicitly. This section tries to give a bit of an overview.

One operator - Multiple arguments

Example of one operator with multiple arguments. In this case, the arguments are generated on the left of the command line, and piped into parallel. There is on

seq 10 | parallel echo {}

Is this the same as

seq 10 | parallel echo   # Same as above?

xargs appends the piped input at the end of the command. GNU Parallel does the same when the command contains no replacement string such as {}, so both lines above give the same output.

Multiple commands

How to include multiple commands in a GNU Parallel statement?

Use a script file

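When no command is given, Parallel treats every input line as a complete command. So this runs each line of my_script.sh as a separate job, rather than the script as a whole: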

parallel < my_script.sh

No difference between operators & arguments - Example Leo

[1]:

$ parallel "{1} {2}" ::: 'printf "%02d "' 'printf "%03d "' ::: 1 2
01 02 001 002

What it does:

  • printf "%02d ": print the number with at least 2 digits, padded with leading zeros, followed by a space
  • printf "%03d ": the same, with at least 3 digits

Arguments are multiplied into:

* printf "%02d " 1
* printf "%02d " 2
* printf "%03d " 1
* printf "%03d " 2

and this is executed through parallel.

Small detail: I get the output above, but when I run the printf commands separately, I get more 0s. Most likely that's because they were run without a number. printf then uses 0 for %d, so printf "%02d " prints 00. Within Parallel, {2} supplies the number.
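That explanation is easy to check in plain Bash with the small sketch below: printf uses 0 for a %d that has no argument.

```shell
#!/usr/bin/env bash
printf "%02d " ; echo       # no argument: %d becomes 0, prints "00 "
printf "%02d " 1; echo      # prints "01 "
printf "%03d " 2; echo      # prints "002 "
```

In the Parallel version, the jobs can finish in any order. Adding -k keeps the output in the order of the input.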

Use a function

From Chapter 5 of the manual:

The command can be a script, a binary or a Bash function if the function
is exported using

   export -f :

my_func()
{
   echo in my_func $1
}

export -f my_func

parallel my_func ::: 1 2 3

Note the export -f: Parallel runs each job in a separate shell, so functions from the invoking shell have to be exported to be available there. See elsewhere in the article for details.
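You can also see what export -f does without Parallel. The minimal sketch below uses bash -c to start a new shell process, which is roughly what Parallel does for every job:

```shell
#!/usr/bin/env bash
my_func()
{
   echo "in my_func $1"
}

# Not exported yet: a new bash process doesn't know the function
bash -c 'my_func 1' 2>/dev/null || echo "new shell: my_func not found"

# Exported: bash passes the function to child processes via the environment
export -f my_func
bash -c 'my_func 2'         # prints "in my_func 2"
```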

Inline

See separate chapter below.

Multiple commands inline - Example

With the right syntax, it's perfectly possible to include multiple statements when invoking GNU Parallel.

Let's start here:

parallel echo ::: $(seq 5)

which is equivalent to

parallel echo ::: `seq 5`

And these two commands already hold the key to executing multiple commands in a parallel invocation: You have to encapsulate them, so that they get executed as one unit (subshell?) at the right moment.

Now expand this to two commands with some trial and error:

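# The unquoted ; is handled by the invoking shell: it ends the parallel
# command right after echo "foobar". Parallel gets no ::: input and
# waits for the terminal; echo {} ::: ... would only run afterwards
#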
$ parallel echo "foobar"; echo {} ::: `seq 5`

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

Another trial with error:

# The part between $() is evaluated first by the invoking shell, giving
# the words foobar and {}. Parallel then runs foobar 1, foobar 2, etc.,
# and foobar is not a Bash command
#
$ parallel $(echo "foobar"; echo {}) ::: `seq 5`

/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found

Finally, a trial-without-error:

$ parallel echo $(echo "foobar"; echo {}) ::: `seq 5`

foobar 1
foobar 2
foobar 3
foobar 4
foobar 5
  • It may look like there is one echo statement too many, but there isn't. The ones between $() are evaluated by the invoking shell and disappear before Parallel gets to do anything, and Parallel does need a command to execute
  • Note that the {} between $() is simply printed as the text {}. Parallel finds it in the resulting command line (echo foobar {}) and substitutes the 'distributed input' there

Let's expand this with a static additional statement, just for size:

parallel echo $(echo "Hoi"; echo {}; echo $((1+1))) ::: $(seq 5)

Let's now make the sum in the last statement dynamic:

$ parallel echo $(echo "Hoi"; echo {}; echo $(({}+{}))) ::: $(seq 5)

bash: {}+{}: syntax error: operand expected (error token is "{}+{}")
...

The problem: The inner stuff gets evaluated before {} gets substituted by Parallel.

Change this by putting the part that Parallel has to substitute first between single quotes (apostrophes):

$ parallel echo '$(echo "Hoi"; echo {}; echo $(({}+{})))' ::: $(seq 5)

Hoi 1 2
Hoi 2 4
Hoi 3 6
Hoi 4 8
Hoi 5 10

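How I tend to interpret this:

  • $() is Bash command substitution: the commands inside are run and replaced by their output.
  • The single quotes are not a Parallel trick but plain shell quoting. They stop the invoking shell from evaluating $() and $(( )), so the text reaches Parallel unchanged. Parallel substitutes {}, and only then does the shell it starts for each job evaluate the rest.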

In this last example, let's see what happens if you remove the $() part:

$ parallel echo 'echo "Hoi"; echo {}; echo $(({}+{}))' ::: $(seq 5)
echo Hoi
1
2
echo Hoi
2
4
echo Hoi
3
6
echo Hoi
4
8
echo Hoi
5
10

What I think is happening:

Nothing gets evaluated first, because the single quotes pass the text to Parallel unchanged. Parallel substitutes {} and hands the line to a new shell, which sees three separate commands separated by ;. For input 1:

echo echo "Hoi"; echo 1; echo $((1+1))

The first of these prints the text echo Hoi, and the other two print the number and its double. Note that {} does get substituted correctly.

Splitting over multiple lines

Let's try to split the inline commands over multiple lines, to make them more readable:

parallel echo \
$(  \
	echo "Foo"; \
	echo "bar"; \
	echo "One"; \
	echo {} \
) ::: $(seq 5)

Output:

Foo bar One 1
Foo bar One 2
Foo bar One 3
Foo bar One 4
Foo bar One 5

This works too:

parallel echo \
$(  \
	echo "Foo"; \
	echo "bar"; \
	echo "One"; \
	echo {} \
) \
::: $(seq 5)

Hierarchy of ()

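This doesn't work. The $() is evaluated by the invoking shell, and its output (the SQL text) ends up in the command line without any quoting. The shell that Parallel starts for each job then interprets the ( and the ' in the SQL as shell syntax: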

parallel echo \
$( \
	sql="update wp_terms join wp_term_taxonomy using (term_id) "; \
	sql+="set slug=replace(slug, '{1}', '{2}') "; \
	sql+="where taxonomy='product_cat';"; \
	echo "$sql" \
) ::: $(seq 5) :::+ $(seq 6 10)

Output:

/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 1, 6) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 2, 7) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 3, 8) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 4, 9) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 5, 10) where taxonomy=product_cat;'

Multiple commands inline - More

Syntax for a subshell?

The commands to be executed in parallel have to be wrapped in the right syntax, so that they are executed as one unit (subshell?) at the right moment:

  • Apparently, use $(cmd1; cmd2; cmd3) to include multiple commands in a pipeline [2]
  • I have had better experiences using backticks for this.

Example SO

seq 5 | parallel echo '$(j=$((2*{})); echo $(($j+100)))'

It looks a bit weird to have two echo commands, but it works

More detailed

It seems I have to put single quotes (apostrophes) around the commands, rather than a bare $(). E.g., the following statements all work:

seq 5 | parallel echo '$(j=$((2*{})); echo $(($j+100)))'
parallel echo '$(j=$((2*{})); echo $(($j+100)))' ::: $(seq 5)
parallel 'echo "Parallel: "; echo {1} {2} {3}' ::: $(seq 5)

But nothing works if I try to wrap the commands in a bare $(). E.g.:

$ parallel $(echo {}; echo {}) ::: $(seq 3)

/bin/bash: 1: command not found
/bin/bash: 2: command not found
/bin/bash: 3: command not found

But this works:

$ parallel 'echo {}; echo {}' ::: $(seq 3)

1
1
2
2
3
3

:::

Use ::: to provide the arguments to GNU Parallel on the command line. The first examples of Parallel that I came across used pipes, but I think ::: is actually the more common way.

First example

# "seq" and "5" are regarded as parallel arguments for echo ;)
$ parallel echo ::: seq 5

seq
5

This returns the arguments seq and 5, rather than the output of seq 5, because everything after ::: is taken literally as a list of arguments, and nothing indicates that it should be executed. Put seq 5 within $() to have the invoking shell evaluate it before the result is passed to Parallel.

Second example

This works! Note that echo $i prints all numbers on one line, while Parallel runs a separate echo for each number, so its output is on multiple lines:

$ i=$(seq 5)
$ echo $i
$ parallel echo ::: $i

1 2 3 4 5
1
2
3
4
5

:::+

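With multiple ::: input sources you get every combination of their values (the Cartesian product). If you want the values paired up instead (first with first, second with second, and so on), use :::+ for the later sources. See the example elsewhere in this article about passing an associative array with three columns to Parallel.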

An array isn't parallel - Unless it is?

Here, Parallel doesn't treat the array entries as separate arguments, but the whole array as one argument. The reason: echo ${j[@]} prints all entries on one line, and Parallel reads one argument per input line:

j=(1 2 3 4 5)
echo ${j[@]}

seq 5 | parallel echo {}
echo ${j[@]} | parallel echo {}

seq 5 | parallel 'echo $(({}+{}))'
echo ${j[@]} | parallel 'echo $(({}+{}))'   # Error: Invalid arithmetic operator

But this works:

j=(1 2 3 4 5)
parallel echo ::: "${j[@]}"

Operate on entries before parallel?

Can you first operate on an entry before it is processed by Parallel?

Example using :::

# $i+1 is just text: $i expands to the numbers 1 to 5 (one per line),
# +1 is glued onto the last one, and word splitting then gives
# 1 2 3 4 5+1. No arithmetic happens at all
#
$ i=$(seq 5)
$ parallel echo ::: $i+1

1
2
3
4
5+1

To do arithmetic on every entry, put the command within single quotes. The invoking shell then leaves it alone, Parallel substitutes {}, and only then does the shell of each job evaluate the sum:

$ seq 5 | parallel 'echo $(({}+{}))'

2
4
6
8
10

Rewritten in what is possibly a more common form, without a pipeline:

parallel 'echo $(({}+{}))' ::: $(seq 5)

2
4
6
8
10

Multiple operations on entries before parallel?

For WP-CLI, it would be really useful to run multiple commands in parallel, each doing something with the output.

Example:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply an entry by 2
    • Add 1 to the result.

Order matters here: for each entry, 'add 1' has to happen after 'multiply by 2'. If the two were done in parallel, the results might become unpredictable.

Let's try:

seq 5 | parallel 'echo $((2*{}))'
...

Reuse argument

A case I sometimes encounter when using WP-CLI:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply entry by 2
    • Multiply entry by 3
    • Add the outcome of these two multiplications

Let's start with the last three lines and first make sure I get that part right :)

i=5
echo $((2*$i + 3*$i))

Now together:

seq 5 | parallel 'echo $((2*{} + 3*{}))'

And why stop here?

seq 5 | parallel 'echo "{} - $((2*{}+3*{}))"'

Evaluate parallel argument first

This won't work:

$ seq 20 | parallel echo $(({}+{}))

bash: {}+{}: syntax error: operand expected (error token is "{}+{}")

The reason: the invoking shell evaluates $(( )) before Parallel even starts, and at that point {} is not a number. Only the result would be passed to Parallel as an argument.

To change that, put the parallel argument between single quotes:

seq 20 | parallel 'echo $(({}+{}))'

BTW, this doesn't work:

$ seq 20 | parallel echo $(('{}'+'{}'))

bash: '{}'+'{}': syntax error: operand expected (error token is "'{}'+'{}'")

Reusing an argument multiple times

You can use the parallel argument multiple times: Just use {} multiple times:

seq 20 | parallel 'echo $(({}+{}))'


sem

sem stands for semaphore: a counter that limits how many jobs can run at the same time. sem is an alias for parallel --semaphore. I bumped into it, but I am not sure it works for me like this within a loop. Note that without -j, sem allows only one job at a time, so it acts as a mutex. The GNU Parallel manual doesn't seem to be exhaustive on this topic; I found https://www.gnu.org/software/parallel/sem.html a much better source.

How many concurrent threads?

Questions

  • Is it about threads, processors, cores, sockets or what?
  • Do I need to optimize the number of threads myself? Or can I just leave this up to GNU Parallel?
  • What are the effects of sub-optimal settings?

Answers

  • Processors (or processing units) are to a computer what cylinders are to a car. See Processors, cores & threads on this computer (Bash) for details.
  • GNU Parallel seems to pick a good number of jobs by itself. See the test code below for the case with sem -j +0. Here the number of jobs equals the number of processors Parallel detects (+0 adds nothing to it), and the timings are as good as the best manual setting
  • When setting the number manually, rather choose a slightly too high number than too low a number. In the tests below, 16 jobs were as fast as 8 and 32 only slightly slower, while 2 and 1 jobs were clearly slower. However, this very much depends on the use case, e.g. whether CPU power or I/O is the bottleneck. For me, I'm quite sure it's usually CPU power.

Test scripts

################################################################################
# Thread optimisation
################################################################################
#
# My laptop can do 8 threads. Let's see what happens to performance when I
# force more or less threads:
#
#
# parallel_test_function()
########################################
#
function parallel_test_function()
{
	printf "PTF - Start... "
	i=0
	for ((i; i<=100000; i++))
	do
		i=$i+1
		i=$i-1
	done
	printf "Done. "
}
export -f parallel_test_function


# # Test - 8 threads
# ########################################
# #
# # * Execution time (s): 5, 5, 5, 5 ⇒ 5s
# #
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 8 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - 16 threads
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 16 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 32 threads
# ########################################
# #
# # * Execution time (s): 6, 6, 5, 5, 6 ⇒ 5.6s
# #
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 32 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 4 threads
# ########################################
# #
# # * Execution time (s): 5, 6, 5, 6, 6, 5 ⇒ 5.5s
# #
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 4 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 2 threads
# ########################################
# #
# # * Execution time (s): 7, 6, 6, 7, 7, 7 ⇒ 6.7s
# #
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 2 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 1 thread
# ########################################
# #
# # * Execution time (s): 12, 12, 12, 12 ⇒ 12s
# #
# start=`date +%s`
# #
# for n in {1..32}
# do
#    sem -j 1 parallel_test_function
# done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - Auto-optimized
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
start=`date +%s`
#
for n in {1..32}
do
	sem -j +0 parallel_test_function
done

#
sem --wait
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

Sources

Subshells & variables

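GNU Parallel starts a new shell process for every job, so precautions have to be taken to make sure that functions and variables are available in that shell.

Note that this is not the same as scope within a single shell, nor as a Bash subshell. A subshell such as ( ... ) or $( ... ) is a copy of the current shell, and it does see variables and functions that were not exported.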
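The minimal sketch below shows the difference between a Bash subshell and a new shell process without Parallel. ( ) starts a Bash subshell, while bash -c starts a new shell process, which is roughly what Parallel does for each job:

```shell
#!/usr/bin/env bash
j=12                                   # set, but not exported

( echo "subshell: j=$j" )             # copy of the current shell: j=12
bash -c 'echo "new shell: j=$j"'      # new process, j unknown: j=

export j
bash -c 'echo "new shell: j=$j"'      # exported, so inherited: j=12
```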

Without exported function or variable

################################################################################
# Subshell & vars? - Without exporting function or var
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found

With exported function

Now the function is exported using export -f subfunction and GNU Parallel can find it. However, the variable j is not available within this function.

################################################################################
# Subshell & vars? - With exporting function
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 

With exported function and exported variable

Hooray! Sometimes, things are easy:

################################################################################
# Subshell & vars? - With exporting function and variable
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
export j
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12

But not for arrays

Bash cannot export arrays at all, neither regular nor associative ones (the Bash manual states that array variables may not (yet) be exported):

################################################################################
# Subshell, var & arrays
################################################################################
#
# Function
########################################
#
subfunction()
{
	echo "function - Var i:               $i"
	echo "function - Associative array j: ${j[@]}"
	echo "function - Regular array k:     ${k[@]}"
}


# Main
########################################
#
i=12

declare -gA j

j[foo,1]="Foo-1"
j[bar,2]="Bar-2"

k[1]="K1"
k[2]="K2"


echo "Main - var i: $i"
export -f subfunction
export i
export j		# Doesn't work
export j[@]		# Doesn't work
export k		# Doesn't work
export k[@]		# Doesn't work
export {k[@]}	# Doesn't work
#
parallel subfunction ::: $(seq 5)

Output:

Main - var i: 12
./parallel.sh: line 181: export: `j[@]': not a valid identifier
./parallel.sh: line 183: export: `k[@]': not a valid identifier
./parallel.sh: line 184: export: `{k[@]}': not a valid identifier
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:   

Pass arrays to GNU Parallel

As mentioned before, you cannot export an array (regular or associative) to the shells that Parallel starts. But there is hope [3]:

  • Use :::+?
  • Something with exporting functions?
  • ?

Using :::+

This works exactly as intended:

################################################################################
# GNU Parallel & associative array
################################################################################
#
# tmp2()
########################################
#
tmp2()
{
   echo ""; echo "tmp2: " 
   echo $1, $2, $3
}

# Build the associative array
########################################
#
declare -gA j

j[1,tag]="_tool_"
j[1,nl]="Boormachine"
j[1,en]="Drilling machine"

j[2,tag]="_tool_"
j[2,nl]="Zaag"
j[2,en]="Saw"

j[3,tag]="_dim_"
j[3,nl]="1,3"
j[3,en]="1.3"

j_rows=3

# echo ${j[@]}


# Convert to regular array
########################################
#
# * I need a structure with "Boormachine", "Zaag" and "1,3" that can be used
#   as an argument for Parallel. That's not possible with an associative
#   array, I guess. E.g.: echo ${j[1*]} doesn't work
# * The code below to construct temporary regular arrays, is quite
#   inefficient. Would be nice to do it in a more efficient way,
#   maybe using GNU Parallel instead of a loop?
#
#
unset j_tag
unset j_nl
unset j_en

for i in $(seq $j_rows)
do
   j_tag+=("${j[$i,tag]}")
   j_nl+=("${j[$i,nl]}")
   j_en+=("${j[$i,en]}")
done	

# echo "j_tag: ${j_tag[@]}"
# echo "j_nl: ${j_nl[@]}"
# echo "j_en: ${j_en[@]}"


# Invoke GNU Parallel
########################################
#
export -f tmp2
#
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" ::: "${j_en[@]}"	# 27 combinations (3 x 3 x 3)
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}"	                # 9 combinations (3 x 3)
# parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}"	                # 3 jobs: :::+ pairs instead of multiplying
#
parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"	# 3 rows - As intended

BTW: The function used here could easily be replaced by a direct statement, leading to something like this:

# With a single inline operator
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"

And for something even more exciting: Now with multiple inline commands:

# With multiple inline operators
########################################
#
parallel 'echo Tag: {1}; echo NL: {2}; echo EN: {3}' ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"

Sources

Case: collect term_ids through WP-CLI

Seems like a good case for replacing a loop with parallel.

Original code:

# Collect all term_ids through a loop
#######################################
#
# * There are 1,433 terms to collect
# * Max. 100 items are returned at once
# * Hence this loop needs 15 iterations
#
i=1
echo "Loop - Collect all term_ids"
#
for ((i; i<=$number_of_iterations; i++))	
do
	#
	echo "	Iteration $i/$number_of_iterations"
	#
	# Store batch of term ids in tmp array j
	########################################
	#
	mapfile -t j < <( wp wc product_attribute_term list \
		$taxonomy_id \
		--user=4 \
		--field=id \
		--offset=$((($i-1)*100)) | grep . )
	#
	# echo "	j: ${j[@]}"
	#
	# Append to array term_id
	########################################
	#
	term_id+=("${j[@]}")
	echo "		Length term_id: ${#term_id[@]}"
	#
done

New code:

See also

Sources