GNU Parallel

From De Vliegende Brigade
 
[[file:20221023-0902.png|thumb|Logo GNU Parallel]]
 
''GNU Parallel'', ''Parallel'' or ''parallel'' is a shell tool that distributes tasks over multiple threads. It was written by Ole Tange in Perl.
  
 
Some impressions of where parallel can help:
 
  
 
Here, the five <code>echo</code> commands are executed in parallel.
 
== Just call a function x times ==
 
 
This seems like an easy start, but it isn't. It is very instructive, though:
 
 
=== Baseline: Without parallel stuff ===
 
 
Execution time: 24s.
 
 
<pre>
# parallel_test_function()
########################################
#
# Busy work: the loop only burns CPU time.
# Note: i=$i+1 and i=$i-1 are string assignments (e.g. "0+1-1"); the
# (( )) loop header evaluates that string arithmetically, so i still
# goes up by one per iteration.
#
function parallel_test_function()
{
	printf "parallel_test_function - Start... "
	i=0
	for ((i; i<=1000000; i++))
	do
		i=$i+1
		i=$i-1
	done
	printf "Done. "
}


# Main
########################################
#
# Execution time (function: 1000000x. Here: 6x): 23, 24, 24, 24 ⇒ 24s
#
export -f parallel_test_function

start=`date +%s`

# Six calls (j = 0..5), one after the other
j=0
for ((j; j<=5; j++))
do
	parallel_test_function
done

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
</pre>
 
 
=== Just call a function with 'parallel'? ===
 
 
This doesn't work:
 
 
<pre>
# Main
########################################
#
export -f parallel_test_function

start=`date +%s`

j=0
for ((j; j<=5; j++))
do
	# No ::: and no pipe: parallel waits for arguments on standard input,
	# which here is the terminal
	parallel parallel_test_function
done

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
</pre>
 
 
This results in the following warning, after which parallel waits for input from the terminal:
 
 
<pre>
parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
</pre>
 
 
=== Working or not? ===
 
 
This seems to work, but it isn't any faster:
 
 
<pre>
# Call test function with parallel (v1)
########################################
#
# * Execution time (function: 1000000x. Here: 6x): 21, 23, 24 ⇒ 24s
# * With fewer inner loops and more outer loops, this is even slower than
#   without parallelisation
# * Possible cause: without -j, sem runs only one job at a time (the sem
#   documentation describes the default as acting like a mutex), and there
#   is no "sem --wait" before the end time is taken
#
export -f parallel_test_function

start=`date +%s`

j=0
for ((j; j<=5; j++))
do
	sem parallel_test_function
done

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
</pre>
 
 
=== Finally! ===
 
 
This works, with an execution time of 9s:
 
 
<pre>
# Call test function with parallel (v2)
########################################
#
# Execution time (function: 1000000x. Here: 6x): 10, 9, 9 ⇒ 9s
#
export -f parallel_test_function

start=`date +%s`

# seq 6 produces 6 input lines, so parallel starts 6 jobs. Each line is
# passed to the function as $1, which the function simply ignores.
seq 6 | parallel parallel_test_function

end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.
</pre>
 
 
I suspect that the previous attempts didn't work because the loop prevents parallelisation: probably only what follows the keyword <code>parallel</code> is actually parallelised. That seems quite logical.
 
 
So the trick seems to be: when you have a loop that you want to parallelise, get rid of the loop and feed its values to <code>parallel</code> instead. It is a bit like rewriting a ''select query'' as an ''update query'': always a bit of a puzzle, but doable.
 
 
=== See also ===
 
 
* https://askubuntu.com/questions/1171165/how-do-i-run-the-same-exact-command-n-number-of-times-in-parallel-using-gnu-par
 
 
== How to get parallel stuff into parallel? ==
 
 
I find it difficult to understand how to get things into parallel. It reminds me of the difficulties I had with understanding SQL, a 4GL that does a lot implicitly. This section tries to give an overview.
 
  
 
==  One operator - Multiple arguments ==
 
By default, <code>xargs</code> appends the piped input to the end of the command. Does parallel do the same?
 
==  Multiple operators ==
 
 
=== Just pipe a script file ===
 
 
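As described [https://www.msi.umn.edu/support/faq/how-can-i-use-gnu-parallel-run-lot-commands-parallel here], you can also pipe a script file into parallel; every line of the file is then run as a separate command: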
 
 
<pre>
parallel < my_script.sh
</pre>
 
 
=== No difference between operators & arguments - Example Leo ===
 
 
Example from [https://stackoverflow.com/questions/61483185/gnu-parallel-multiple-commands StackOverflow]:
 
 
<pre>
$ parallel "{1} {2}" ::: 'printf "%02d "' 'printf "%03d "' ::: 1 2
01 02 001 002
</pre>
 
 
What it does:
 
 
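* <code>printf "%02d "</code>: print a number as (at least) two digits, padded with zeros, followed by a space. Without an argument: "00 "
* <code>printf "%03d "</code>: the same with three digits. Without an argument: "000 "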
 
 
The arguments are combined into these four commands:
 
 
<pre>
printf "%02d " 1
printf "%02d " 2
printf "%03d " 1
printf "%03d " 2
</pre>
 
 
which are then executed by parallel.
 
 
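Small detail: I get the output shown above, but when I run the <code>printf</code> commands separately, I get more zeros. That is because without an argument, <code>printf</code> formats <code>%02d</code> as 0, i.e. <code>00</code>, instead of the padded number <code>01</code> or <code>02</code>.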
 
 
=== Use a function ===
 
 
From Chapter 5 of the manual:
 
 
<pre>
The command can be a script, a binary or a Bash function if the function
is exported using export -f:

my_func()
{
  echo in my_func $1
}

export -f my_func

parallel my_func ::: 1 2 3
</pre>
 
 
== Multiple commands in pipeline ==
 
 
Apparently, you can use <code>$(cmd1; cmd2; cmd3)</code> to include multiple commands in a pipeline [https://stackoverflow.com/questions/11917708/how-to-pipe-multiple-commands-into-a-single-command-in-the-shell-sh-bash].
 
 
In combination with parallel:
 
 
<pre>
seq 5 | parallel echo '$(j=$((2*{})); echo $(($j+100)))'
</pre>
 
 
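It looks a bit weird to have two <code>echo</code> commands, but it works: the inner <code>echo</code> produces the value, and the outer <code>echo</code> prints the result of the command substitution <code>$(...)</code>.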
 
 
== ::: ==
 
 
With <code>:::</code> you pass arguments to parallel on the command line, without using a pipe.
 
 
=== First example ===
 
 
<pre>
# "seq" and "5" are regarded as parallel arguments for echo ;)
$ parallel echo ::: seq 5

seq
5
</pre>
 
 
This prints the words <code>seq</code> and <code>5</code> rather than the ''result'' of <code>seq 5</code>, because the arguments after <code>:::</code> are passed to the command as plain text: nothing tells parallel (or the shell) to execute them. Put the command within <code>$()</code> to have the shell evaluate it before the result is passed to parallel, e.g. <code>parallel echo ::: $(seq 5)</code>.
 
 
=== Second example ===
 
 
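This works. Note that parallel runs a separate <code>echo</code> for each argument, so its output is on multiple lines, whereas <code>echo $i</code> prints everything on one line: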
 
 
<pre>
$ i=$(seq 5)
$ echo $i
1 2 3 4 5
$ parallel echo ::: $i
1
2
3
4
5
</pre>
 
  
 
== An array isn't parallel - Unless it is? ==
 

<pre>
...
</pre>
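The sum is computed per argument, because the command is within single quotes: the current shell leaves <code>$((...))</code> alone, parallel replaces each <code>{}</code> with the argument, and only then does the shell that runs the job evaluate the sum: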
  
 
<pre>
$ seq 5 | parallel 'echo $(({}+{}))'

2
4
6
8
10
</pre>

Rewritten in a possibly more common form, without a pipeline:

<pre>
parallel 'echo $(({}+{}))' ::: $(seq 5)

2
4
6
8
10
</pre>
 
== sem ==
 
 
 
''sem'' stands for ''semaphore'': a counter that limits how many jobs can run at the same time (<code>sem</code> is an alias for <code>parallel --semaphore</code>). I bumped into it [https://stackoverflow.com/questions/17307800/how-to-run-given-function-in-bash-in-parallel here], but I am not sure it works for me like this within a loop. The GNU Parallel manual doesn't seem to be exhaustive on this topic. I found https://www.gnu.org/software/parallel/sem.html a much better source.
 
 
 
== How many concurrent threads? ==
 
 
 
=== Questions ===
 
  
 
* Is it about ''threads'', ''processors'', ''cores'', ''sockets'' or what?
* Do I need to optimize the number of threads myself, or can I just leave this to GNU Parallel?
* What are the effects of suboptimal settings?

=== Answers ===
  
 
* Processors (or processing units) are to a computer what cylinders are to a car. See [[Processors, cores & threads on this computer (Bash)]] for details.
* GNU Parallel clearly knows what the optimal number of threads is. See the test code below for the case with <code>sem -j +0</code>: here the number of jobs equals the number of processors, and the measurements confirm this.
* When optimizing manually, rather choose a slightly too high number of threads than a too low one. However, this very much depends on the use case, e.g. on whether CPU power or I/O is the bottleneck. I'm quite sure that for me, it's usually CPU power, though.
  
=== Test scripts ===

<pre>
...
echo ""; echo Execution time was `expr $end - $start` seconds.
</pre>
=== Sources ===
 
* https://unix.stackexchange.com/questions/114672/gnu-parallel-more-than-one-per-cpu
 
* https://www.gnu.org/software/parallel/sem.html
 
 
== Subshells & variables ==
 
  
 
GNU Parallel runs every job in a new shell, so precautions have to be taken to make sure that functions and variables are available in that shell.

Note that this subshell business is ''not'' the same as ''scope'' within a single shell.

=== Without exported function or variable ===

<pre>
...
</pre>

=== With exported function ===

Now the function is exported using <code>export -f subfunction</code> and GNU Parallel can find it. However, the variable <code>j</code> is not available within this function.

<pre>
...
</pre>

=== With exported function and exported variable ===

Yay! Sometimes, things are easy:

<pre>
...
</pre>

=== But not for arrays ===

It seems that ''regular arrays'' and ''associative arrays'' cannot be exported to subshells (Bash cannot export array variables at all):

<pre>
...
subfunction()
{
  echo "function - Var i:              $i"
  echo "function - Associative array j: ${j[@]}"
  echo "function - Regular array k:    ${k[@]}"
}
...
export -f subfunction
export i
export j   # Doesn't work
export j[@]   # Doesn't work
export k   # Doesn't work
export k[@]   # Doesn't work
export {k[@]} # Doesn't work
#
parallel subfunction ::: $(seq 5)
...
function - Regular array k:
</pre>

== Pass arrays to GNU Parallel ==

As mentioned before, you cannot pass an array (regular or associative) to subshells, and therefore in some situations not to GNU Parallel either. But there is hope [https://unix.stackexchange.com/questions/395298/gnu-parallel-two-parameters-from-array-as-parameter]:

* Use <code>:::+</code>?
* Something with ''exporting functions''?
* ?

=== Using :::+ ===

This works exactly as intended:

<pre>
################################################################################
# GNU Parallel & associative array
################################################################################
#
# tmp2()
########################################
#
tmp2()
{
  echo ""; echo "tmp2: "
  echo "$1, $2, $3"
}


# Build the associative array
########################################
#
declare -gA j

j[1,tag]="_tool_"
j[1,nl]="Boormachine"
j[1,en]="Drilling machine"

j[2,tag]="_tool_"
j[2,nl]="Zaag"
j[2,en]="Saw"

j[3,tag]="_dim_"
j[3,nl]="1,3"
j[3,en]="1.3"

j_rows=3

# echo ${j[@]}


# Convert to regular arrays
########################################
#
# * I need a structure with "Boormachine", "Zaag" and "1,3" that can be used
#   as an argument for Parallel. That's not possible with an associative
#   array, I guess. E.g.: echo ${j[1*]} doesn't work
# * The code below to construct temporary regular arrays is quite
#   inefficient. It would be nice to do it in a more efficient way,
#   maybe using GNU Parallel instead of a loop?
#
#
unset j_tag
unset j_nl
unset j_en

for i in $(seq $j_rows)
do
  j_tag+=("${j[$i,tag]}")
  j_nl+=("${j[$i,nl]}")
  j_en+=("${j[$i,en]}")
done

# echo "j_tag: ${j_tag[@]}"
# echo "j_nl: ${j_nl[@]}"
# echo "j_en: ${j_en[@]}"


# Invoke GNU Parallel
########################################
#
export -f tmp2
#
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" ::: "${j_en[@]}"  # 27 combinations (3 x 3 x 3)
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}"                   # 9 combinations (3 x 3)
# parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}"                  # 3 jobs: :::+ links values by position
#
parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"  # 3 rows - As intended
</pre>

BTW: The function used here could easily be replaced by a direct command, leading to something like this:

<pre>
# With a single inline operator
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"
</pre>

And for something even more exciting: now with multiple inline commands:

<pre>
# With multiple inline operators
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"
</pre>

=== Sources ===

* https://unix.stackexchange.com/questions/395298/gnu-parallel-two-parameters-from-array-as-parameter
* https://www.autoscripts.net/gnu-parallel-two-parameters-from-array-as-parameter/
  
 
== Case: collect term_ids through WP-CLI ==

<pre>
...
</pre>

New code: ...

== WP-CLI & Parallel ==

Finally got it working - Thanks to ChatGPT! → https://wiki.devliegendebrigade.nl/Wp_wc_product_delete
  
 
== See also ==
 

Current version as of 13 Sep 2023, 15:10

Logo GNU Parallel

GNU Parallel, Parallel or parallel is a shell tool that distributes tasks over multiple threads. It was written by Ole Tange in Perl.

Some impressions of where parallel can help:

First line in chapter 1 of the manual reads:

If you write shell scripts to do the same processing for different input,
then GNU Parallel will make your life easier and make your scripts run
faster.

From the man page:

If you write loops in shell, you will find GNU parallel may be able to
replace most of the loops and make them run faster by running several
jobs in parallel.

As of Nov. 2022, this article is a work in progress. That's why it reads like a brainstorming session rather than a structured article.

Installation

$ sudo apt install parallel

...
The following NEW packages will be installed:
  parallel sysstat
...

Example intro chapter 1

The intro of chapter 1 of the manual contains this example:

seq 5 | parallel seq {} '>' example.{}

What it does:

  • seq 5 produces a series with numbers from 1 to 5
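  • seq {}: takes each of those numbers as the argument for another sequence: a sequence with only the number 1, a sequence with the numbers 1 and 2, and so on, up to a sequence with the numbers 1 to 5
  • '>' example.{}: writes these 5 sequences to the files example.1 ... example.5. The quotes around > matter: they stop the current shell from redirecting the output of parallel itself, so that the redirection is done inside each job instead.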

An even simpler example, although not very useful:

seq 5 | parallel echo {}

Here, the five echo commands are executed in parallel.

One operator - Multiple arguments

Example of one operator with multiple arguments. In this case, the arguments are generated on the left of the command line, and piped into parallel. There is on

seq 10 | parallel echo {}

Is this the same as the following?

seq 10 | parallel echo   # Same as above?

By default, xargs appends the piped input to the end of the command. Does parallel do the same?

An array isn't parallel - Unless it is?

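It seems that here, parallel doesn't treat the array entries as separate arguments, but the whole line as just one argument. That is because echo ${j[@]} prints all entries on a single line, and parallel takes one argument per input line: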

j=(1 2 3 4 5)
echo ${j[@]}

seq 5 | parallel echo {}
echo ${j[@]} | parallel echo {}

seq 5 | parallel 'echo $(({}+{}))'
echo ${j[@]} | parallel 'echo $(({}+{}))'   # Error: Invalid arithmetic operator

But this works:

j=(1 2 3 4 5)
parallel echo ::: ${j[@]}

Operate on entries before parallel?

Can you first operate on an entry before it is processed by parallel?

Example using :::

# $i is expanded and split into words first; "+1" is simply glued to the
# last word as text, so nothing is added up (hence "5+1")
#
$ i=$(seq 5)
$ parallel echo ::: $i+1

1
2
3
4
5+1

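The sum is computed per argument, because the command is within single quotes: the current shell leaves $((...)) alone, parallel replaces each {} with the argument, and only then does the shell that runs the job evaluate the sum: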

$ seq 5 | parallel 'echo $(({}+{}))'

2
4
6
8
10

Rewritten in a possibly more common form, without a pipeline:

parallel 'echo $(({}+{}))' ::: $(seq 5)

2
4
6
8
10

Multiple operations on entries before parallel?

For WP-CLI, it would be really useful if multiple commands could be run in parallel, each doing something with the output.

Example:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply an entry by 2
    • Add 1 to the result.

The order of operations is important here: if 'adding 1' is done in parallel with 'multiplying by 2', the results might become unpredictable.

Let's try:

seq 5 | parallel 'echo ((2*{}))'
...

Reuse argument

A case I sometimes encounter with WP-CLI:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply entry by 2
    • Multiply entry by 3
    • Add the outcome of these two multiplications

Let's start with the last three lines and first make sure I get that part right :)

i=5
echo $((2*$i + 3*$i))

Now together:

seq 5 | parallel 'echo $((2*{} + 3*{}))'

And why stop here?

seq 5 | parallel 'echo "{} - $((2*{}+3*{}))"'

Evaluate parallel argument first

This won't work:

$ seq 20 | parallel echo $(({}+{}))

bash: {}+{}: syntax error: operand expected (error token is "{}+{}")

The reason: the current shell evaluates $((...)) before parallel is even started, and at that point {}+{} is not a valid arithmetic expression.

To prevent that, put the command for parallel between single quotes:

seq 20 | parallel 'echo $(({}+{}))'

BTW, this doesn't work:

$ seq 20 | parallel echo $(('{}'+'{}'))

bash: '{}'+'{}': syntax error: operand expected (error token is "'{}'+'{}'")

Reusing an argument multiple times

You can use the parallel argument multiple times by simply repeating {}. As explained above, the command has to be within single quotes:

seq 20 | parallel 'echo $(({}+{}))'

Questions

  • Is it about threads, processors, cores, sockets or what?
  • Do I need to optimize the number of threads myself, or can I just leave this to GNU Parallel?
  • What are the effects of suboptimal settings?

Answers

  • Processors (or processing units) are to a computer what cylinders are to a car. See Processors, cores & threads on this computer (Bash) for details.
  • GNU Parallel clearly knows what the optimal number of threads is. See the test code below for the case with sem -j +0: here the number of jobs equals the number of processors, and the measurements confirm this.
  • When optimizing manually, rather choose a slightly too high number of threads than a too low one. However, this very much depends on the use case, e.g. on whether CPU power or I/O is the bottleneck. I'm quite sure that for me, it's usually CPU power, though.

Test scripts

################################################################################
# Thread optimisation
################################################################################
#
# My laptop can do 8 threads. Let's see what happens to performance when I
# force more or less threads:
#
#
# parallel_test_function()
########################################
#
function parallel_test_function()
{
	printf "PTF - Start... "
	i=0
	for ((i; i<=100000; i++))
	do
		i=$i+1
		i=$i-1
	done
	printf "Done. "
}
export -f parallel_test_function


# # Test - 8 threads
# ########################################
# #
# # * Execution time (s): 5, 5, 5, 5 ⇒ 5s
# #
# start=`date +%s`
# #
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# sem -j 8 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - 16 threads
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
# start=`date +%s`
# #
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# sem -j 16 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 32 threads
# ########################################
# #
# # * Execution time (s): 6, 6, 5, 5, 6 ⇒ 5.6s
# #
# start=`date +%s`
# #
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# sem -j 32 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 4 threads
# ########################################
# #
# # * Execution time (s): 5, 6, 5, 6, 6, 5 ⇒ 5.5s
# #
# start=`date +%s`
# #
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# sem -j 4 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 2 threads
# ########################################
# #
# # * Execution time (s): 7, 6, 6, 7, 7, 7 ⇒ 6.7s
# #
# start=`date +%s`
# #
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# sem -j 2 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 1 thread
# ########################################
# #
# # * Execution time (s): 12, 12, 12, 12 ⇒ 12s
# #
# start=`date +%s`
# #
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# sem -j 1 parallel_test_function
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - Auto-optimized
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
start=`date +%s`
#
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function
sem -j +0 parallel_test_function

#
sem --wait
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

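Sources

  • https://unix.stackexchange.com/questions/114672/gnu-parallel-more-than-one-per-cpu
  • https://www.gnu.org/software/parallel/sem.html

Subshells & variables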

GNU Parallel runs every job in a new shell, so precautions have to be taken to make sure that functions and variables are available in that shell.

Note that this subshell business is not the same as scope within a single shell.
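The same behaviour can be reproduced without GNU Parallel, by starting a new shell with bash -c: a new shell process only sees what has been exported into its environment. A sketch with hypothetical variables a and b:

```shell
#!/usr/bin/env bash
a=12            # ordinary shell variable: stays in this shell
b=34
export b        # exported: copied into the environment of child processes

bash -c 'echo "a=[$a] b=[$b]"'     # prints: a=[] b=[34]

show_b() { echo "show_b sees b=$b"; }
export -f show_b                   # functions need export -f
bash -c 'show_b'                   # prints: show_b sees b=34
```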

Without exported function or variable

################################################################################
# Subshell & vars? - Without exporting function or var
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found

With exported function

Now the function is exported with export -f subfunction, so GNU Parallel can find it. However, the variable j is still empty within this function, because j itself has not been exported.

################################################################################
# Subshell & vars? - With exporting function
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
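Why export -f is enough for the function but not for j: Bash passes an exported function on as a specially named environment variable, BASH_FUNC_<name>%% in current upstream Bash versions (other builds may use a slightly different name), while j never enters the environment. A quick check, as a sketch:

```shell
#!/usr/bin/env bash
# Sketch: see how Bash puts an exported function into the environment.
subfunction()
{
   echo "Subfunction - Var j: $j"
}
export -f subfunction

# Exported functions show up as BASH_FUNC_<name>%% environment variables
env | grep '^BASH_FUNC_subfunction%%='

# The plain variable j is not exported, so it is absent from the environment
j=12
env | grep -c '^j=' || true    # prints: 0
```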

With exported function and exported variable

Hooray! Sometimes things are easy. Exporting the variable as well, with export j, does the trick:

################################################################################
# Subshell & vars? - With exporting function and variable
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
export j
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
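Exporting is not the only option. If the function only needs the value, it can also be passed as an ordinary argument, so only the function has to be exported. A sketch, assuming GNU Parallel is installed; the extra parameter is an addition to the subfunction on this page. GNU Parallel runs the command template through a shell, so this simple form is only safe for values without spaces or other shell special characters.

```shell
#!/usr/bin/env bash
# Sketch: pass j as an argument instead of exporting it.
# Assumes GNU Parallel is installed.
subfunction()
{
   # $1: value of j from the calling shell, $2: argument from ":::"
   echo "Subfunction - Var j: $1 (job $2)"
}
export -f subfunction    # the function still has to be exported

j=12                     # not exported
# "$j" is expanded by the calling shell; -k keeps the output in input order
parallel -k subfunction "$j" ::: $(seq 5)
```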

But not for arrays

Bash cannot export arrays, neither regular nor associative ones (the Bash manual page mentions this under BUGS), so they never reach the shells that GNU Parallel starts:

################################################################################
# Subshell, var & arrays
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "function - Var i:               $i"
   echo "function - Associative array j: ${j[@]}"
   echo "function - Regular array k:     ${k[@]}"
}


# Main
########################################
#
i=12

declare -gA j

j[foo,1]="Foo-1"
j[bar,2]="Bar-2"

k[1]="K1"
k[2]="K2"


echo "Main - var i: $i"
export -f subfunction
export i
export j       # No error, but Bash does not pass arrays on
export j[@]    # Error: not a valid identifier
export k       # No error, but Bash does not pass arrays on
export k[@]    # Error: not a valid identifier
export {k[@]}  # Error: not a valid identifier
#
parallel subfunction ::: $(seq 5)

Output:

Main - var i: 12
./parallel.sh: line 181: export: `j[@]': not a valid identifier
./parallel.sh: line 183: export: `k[@]': not a valid identifier
./parallel.sh: line 184: export: `{k[@]}': not a valid identifier
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:   
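A common workaround, not specific to GNU Parallel, is to turn the arrays into an ordinary string with declare -p, export that string, and rebuild the arrays with eval inside the function. A sketch; the variable name arrays_def is made up, and bash -c stands in for a job started by GNU Parallel.

```shell
#!/usr/bin/env bash
# Sketch: serialise arrays with "declare -p", export the resulting string,
# and rebuild the arrays with eval in the other shell.
declare -A j=([foo,1]="Foo-1" [bar,2]="Bar-2")
k=("K1" "K2")

# Plain strings can be exported, arrays cannot
arrays_def="$(declare -p j k)"
export arrays_def

subfunction()
{
   # Rebuild j and k (as locals of this function) from their declare -p text
   eval "$arrays_def"
   echo "function - Associative array j, key foo,1: ${j[foo,1]}"
   echo "function - Regular array k: ${k[*]}"
}
export -f subfunction

# bash -c stands in for one job started by GNU Parallel
bash -c subfunction
```

With GNU Parallel, the last line would become parallel subfunction ::: $(seq 5), since both arrays_def and subfunction are exported.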

Pass arrays to GNU Parallel

As mentioned before, you cannot export an array (regular or associative), so it never reaches the shells that GNU Parallel starts. But there is hope [1]:

  • Use :::+?
  • Something with exporting functions?
  • ?
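The idea of exporting functions can work because functions are exported as text: a function that fills the array can be exported and called at the start of each job. A minimal sketch with hypothetical functions load_j and show_row and a subset of the data used in this section; declare -g makes the array global even though it is created inside a function.

```shell
#!/usr/bin/env bash
# Sketch: ship an array to other shells by wrapping its definition in a function.
load_j()
{
   # -g: create a global array, even though we are inside a function
   declare -gA j=([1,nl]="Boormachine" [2,nl]="Zaag" [3,nl]="1,3")
}

show_row()
{
   load_j                      # rebuild the array in this shell
   echo "Row $1: ${j[$1,nl]}"
}
export -f load_j show_row

# bash -c stands in for one job started by GNU Parallel
bash -c 'show_row 2'           # prints: Row 2: Zaag
```

With GNU Parallel this would be, for example, parallel show_row ::: 1 2 3.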

Using :::+

This works exactly as intended:

################################################################################
# GNU Parallel & associative array
################################################################################
#
# tmp2()
########################################
#
tmp2()
{
   echo ""; echo "tmp2: " 
   echo "$1, $2, $3"
}

# Build the associative array
########################################
#
declare -gA j

j[1,tag]="_tool_"
j[1,nl]="Boormachine"
j[1,en]="Drilling machine"

j[2,tag]="_tool_"
j[2,nl]="Zaag"
j[2,en]="Saw"

j[3,tag]="_dim_"
j[3,nl]="1,3"
j[3,en]="1.3"

j_rows=3

# echo ${j[@]}


# Convert to regular array
########################################
#
# * I need a structure with "Boormachine", "Zaag" and "1,3" that can be used
#   as an argument for Parallel. That's not possible with an associative
#   array, I guess. E.g.: echo ${j[1*]} doesn't work
# * The code below to construct temporary regular arrays is quite
#   inefficient. Would be nice to do it in a more efficient way,
#   maybe using GNU Parallel instead of a loop?
#
#
unset j_tag
unset j_nl
unset j_en

for i in $(seq $j_rows)
do
   j_tag+=("${j[$i,tag]}")
   j_nl+=("${j[$i,nl]}")
   j_en+=("${j[$i,en]}")
done	

# echo "j_tag: ${j_tag[@]}"
# echo "j_nl: ${j_nl[@]}"
# echo "j_en: ${j_en[@]}"


# Invoke GNU Parallel
########################################
#
export -f tmp2
#
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" ::: "${j_en[@]}"	# 27 combinations (3 × 3 × 3)
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}"	                # 9 combinations (3 × 3)
# parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}"	                # 3 combinations: :::+ links j_nl to j_tag
#
parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"	# 3 rows - As intended
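The comment in the code asks for a more efficient way than building three temporary arrays. One alternative, sketched under the assumption that no value contains a tab character: print one tab-separated line per row and let GNU Parallel split each line into {1}, {2} and {3} with --colsep. The helper print_rows is made up for this sketch, and -k keeps the output in row order.

```shell
#!/usr/bin/env bash
# Sketch: feed the rows of an associative array to GNU Parallel directly,
# without temporary regular arrays. Assumes GNU Parallel is installed and
# that no value contains a tab.
declare -A j=(
   [1,tag]="_tool_" [1,nl]="Boormachine" [1,en]="Drilling machine"
   [2,tag]="_tool_" [2,nl]="Zaag"        [2,en]="Saw"
   [3,tag]="_dim_"  [3,nl]="1,3"         [3,en]="1.3"
)
j_rows=3

print_rows()
{
   local i
   for ((i = 1; i <= j_rows; i++))
   do
      # One line per row: tag <TAB> nl <TAB> en
      printf '%s\t%s\t%s\n' "${j[$i,tag]}" "${j[$i,nl]}" "${j[$i,en]}"
   done
}

print_rows | parallel -k --colsep '\t' echo {1}, {2}, {3}
```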

BTW: The function used here could easily be replaced by an inline command, leading to something like this:

# With a single inline command
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"

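And for something even more exciting, now with multiple inline commands. The commands are put in single quotes, so that the semicolons are passed on to GNU Parallel instead of ending the command in the calling shell:

# With multiple inline commands
########################################
#
parallel 'echo Tag: {1}; echo NL: {2}; echo EN: {3}' ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"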
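To check how many jobs a combination of ::: and :::+ produces, and which commands GNU Parallel would run, --dry-run prints the commands without executing them. A small sketch with made-up input values, assuming GNU Parallel is installed; -k keeps the printed commands in input order.

```shell
#!/usr/bin/env bash
# Sketch: inspect job combinations without running anything.
# Assumes GNU Parallel is installed.

# ::: ::: combines every value with every value: 3 x 3 = 9 jobs
parallel -k --dry-run echo {1} {2} ::: a b c ::: x y z

# ::: :::+ links the sources value by value: 3 jobs (echo a x, echo b y, echo c z)
parallel -k --dry-run echo {1} {2} ::: a b c :::+ x y z

# Count the jobs instead of reading them: 3 x 3 x 3 = 27
parallel --dry-run echo {1} {2} {3} ::: a b c ::: x y z ::: 1 2 3 | wc -l
```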

Sources

Case: collect term_ids through WP-CLI

Seems like a good case for replacing a loop with parallel.

Original code:

# Collect all term_ids through a loop
#######################################
#
# * There are 1,433 terms to collect
# * Max. 100 items are returned at once
# * Hence this loop needs 15 iterations
#
i=1
echo "Loop - Collect all term_ids"
#
for ((i; i<=$number_of_iterations; i++))	
do
	#
	echo "	Iteration $i/$number_of_iterations"
	#
	# Store batch of term ids in tmp array j
	########################################
	#
	mapfile -t j < <( wp wc product_attribute_term list \
		$taxonomy_id \
		--user=4 \
		--field=id \
		--offset=$((($i-1)*100)) | grep . )
	#
	# echo "	j: ${j[@]}"
	#
	# Append to array term_id
	########################################
	#
	term_id+=("${j[@]}")
	echo "		Length term_id: ${#term_id[@]}"
	#
done
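One possible way to replace this loop, sketched under assumptions and not necessarily the author's new code: compute the offsets up front with seq, let GNU Parallel fetch the batches concurrently, and use -k so the ids arrive in the same order as with the loop. fetch_batch is a hypothetical stand-in that prints fake ids so the sketch runs without WordPress; the real call would be the wp wc product_attribute_term list command from the loop.

```shell
#!/usr/bin/env bash
# Sketch: fetch the batches concurrently with GNU Parallel instead of a loop.
# Assumes GNU Parallel is installed. fetch_batch is a hypothetical stand-in
# that prints fake ids, so the sketch runs without WordPress.
fetch_batch()
{
   local offset=$1
   # Assumed real version:
   #   wp wc product_attribute_term list "$taxonomy_id" --user=4 --field=id --offset="$offset"
   # taxonomy_id would then have to be exported as well.
   seq $((offset + 1)) $((offset + 3))
}
export -f fetch_batch

number_of_iterations=3

# Offsets 0, 100, 200, ...: one job per batch; -k keeps the batches in order
mapfile -t term_id < <(
   seq 0 100 $(( (number_of_iterations - 1) * 100 )) |
      parallel -k fetch_batch {} | grep .
)
echo "Length term_id: ${#term_id[@]}"    # 9 with the stand-in above
```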

New code: ...

WP-CLI & Parallel

Finally got it working - Thanks to ChatGPT! → https://wiki.devliegendebrigade.nl/Wp_wc_product_delete

See also

Sources