GNU Parallel

From De Vliegende Brigade
== Multiple commands inline - Example ==

With the right syntax, it is perfectly possible to include multiple statements when invoking GNU Parallel.

Let's start here:

<pre>
parallel echo ::: $(seq 5)
</pre>

which is equivalent to

<pre>
parallel echo ::: `seq 5`
</pre>

Both forms are command substitution: the calling shell runs <code>seq 5</code> and replaces the construct with its output, before Parallel is even started. These two commands already hold the key to executing multiple commands in a parallel invocation: you have to encapsulate the commands, so that they get executed as one unit at the right moment.
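To see what Parallel actually receives in such a command, a minimal sketch: the helper <code>show_args</code> is hypothetical (not part of GNU Parallel) and stands in for <code>parallel</code>; it only prints every argument it gets between brackets, one per line.

```shell
# show_args: hypothetical stand-in for parallel that prints each argument
# it receives between brackets, one per line
show_args() { printf '[%s]\n' "$@"; }

# In both forms the calling shell runs seq first and word-splits its output,
# so the command receives echo, ::: and the separate words 1 to 5
show_args echo ::: $(seq 5)
show_args echo ::: `seq 5`
```

Both calls print the same seven lines (<code>[echo]</code>, <code>[:::]</code>, then <code>[1]</code> to <code>[5]</code>), so the sequence already exists before the command is started.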
 
 
Now expand this to two commands with some trial and error:

<pre>
# The unquoted ; is handled by the calling shell, which splits this line into
# two separate commands: 'parallel echo "foobar"' (without :::, so parallel
# reads its input from the terminal) and 'echo {} ::: 1 2 3 4 5'
#
$ parallel echo "foobar"; echo {} ::: `seq 5`

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.
</pre>
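A minimal sketch of how the calling shell splits that line, again with the hypothetical helper <code>show_args</code> standing in for <code>parallel</code>: everything after the unquoted <code>;</code> belongs to a second, separate command.

```shell
# show_args: hypothetical stand-in for parallel that prints each argument
# it receives between brackets, one per line
show_args() { printf '[%s]\n' "$@"; }

# The unquoted ; ends the first command: show_args only gets "echo" and
# "foobar". The rest is a plain echo command, run by the calling shell.
show_args echo "foobar"; echo {} ::: `seq 5`
```

This prints <code>[echo]</code> and <code>[foobar]</code>, followed by <code>{} ::: 1 2 3 4 5</code>. A <code>;</code> only reaches Parallel when it is quoted, as in the single-quoted examples further on.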
 
 
Another trial with error:

<pre>
# The part between $() is evaluated first, by the calling shell. It yields
# the words "foobar" and "{}", so Parallel runs the commands "foobar 1",
# "foobar 2", etc. - and there is no command called foobar
#
$ parallel $(echo "foobar"; echo {}) ::: `seq 5`

/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
/bin/bash: foobar: command not found
</pre>
 
 
Finally, a trial without error:

<pre>
$ parallel echo $(echo "foobar"; echo {}) ::: `seq 5`

foobar 1
foobar 2
foobar 3
foobar 4
foobar 5
</pre>

* It may look as if there is one <code>echo</code> statement too many, but there isn't: the ones between <code>$()</code> are evaluated and therefore disappear before Parallel does anything, and Parallel does need a command to execute.
* Note that <code>{}</code> inside <code>$()</code> still gets replaced by the 'distributed input': the calling shell evaluates <code>$()</code> only once, which produces the literal text <code>foobar {}</code>, and Parallel then substitutes <code>{}</code> in that text for each job.
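The two trials above can be reproduced without GNU Parallel. The sketch below uses <code>mini_parallel</code>, a hypothetical and strongly simplified stand-in (not part of GNU Parallel): it joins the command words with spaces, replaces every <code>{}</code> by the argument (or appends the argument when there is no <code>{}</code>), and runs the result with <code>bash -c</code>, which is roughly what the <code>/bin/bash</code> error messages in this article suggest Parallel does for each job. It does not quote arguments, does not run jobs concurrently and only supports <code>:::</code>.

```shell
# mini_parallel: hypothetical, sequential stand-in for parallel (NOT the real tool)
# Usage: mini_parallel word... ::: arg...
mini_parallel() {
	local cmd=() arg line placeholder='{}'
	# Collect the command words up to the ::: separator
	while [ "$#" -gt 0 ] && [ "$1" != ":::" ]; do
		cmd+=("$1")
		shift
	done
	shift # drop the ::: separator itself
	for arg in "$@"; do
		line="${cmd[*]}" # join the command words with spaces
		if [[ $line == *"$placeholder"* ]]; then
			line=${line//"$placeholder"/"$arg"} # replace every {} by the argument
		else
			line="$line $arg" # no {}: append the argument at the end
		fi
		bash -c "$line" # run the job in a new shell
	done
}

# Works: $() leaves the command words "echo foobar {}"
mini_parallel echo $(echo "foobar"; echo {}) ::: $(seq 5)

# Fails: the command words are "foobar {}", and there is no command foobar
mini_parallel $(echo "foobar"; echo {}) ::: $(seq 5) || echo "jobs failed (exit status $?)"
```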
 
 
Let's expand this with an additional static statement, just for size:

<pre>
parallel echo $(echo "Hoi"; echo {}; echo $((1+1))) ::: $(seq 5)
</pre>

Let's now make the sum in the last statement dynamic:

<pre>
$ parallel echo $(echo "Hoi"; echo {}; echo $(({}+{}))) ::: $(seq 5)

bash: {}+{}: syntax error: operand expected (error token is "{}+{}")
...
</pre>

The problem: the calling shell evaluates the inner part, including <code>$(({}+{}))</code>, before Parallel gets a chance to substitute <code>{}</code>.
 
 
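Change this by putting the part that Parallel needs to substitute first between single quotes:

<pre>
$ parallel echo '$(echo "Hoi"; echo {}; echo $(({}+{})))' ::: $(seq 5)

Hoi 1 2
Hoi 2 4
Hoi 3 6
Hoi 4 8
Hoi 5 10
</pre>

How I tend to interpret this:

* <code>$()</code> is general Bash syntax for command substitution: the enclosed commands are run and the construct is replaced by their output.
* The single quotes are not a Parallel trick but ordinary shell quoting: they stop the calling shell from evaluating the contents, so the text reaches Parallel literally. Parallel substitutes <code>{}</code> and hands the result to a new shell, which evaluates it. The net effect is a different order of evaluation.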
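A minimal sketch of this order of evaluation without Parallel: the single quotes keep the text intact as one literal string, a bash parameter expansion stands in for Parallel's substitution of <code>{}</code> (the value 3 and the variable names <code>template</code> and <code>job</code> are just examples), and <code>bash -c</code> plays the role of the new shell that runs the job.

```shell
# The single quotes stop the calling shell from expanding anything:
# template holds the literal text, including $( ), $(( )) and {}
template='echo $(echo "Hoi"; echo {}; echo $(({}+{})))'
printf '%s\n' "$template"

# Stand-in for Parallel's substitution step: replace every {} by 3
job=${template//"{}"/3}
printf '%s\n' "$job"    # echo $(echo "Hoi"; echo 3; echo $((3+3)))

# Only now is the text evaluated, by a new shell
bash -c "$job"          # Hoi 3 6
```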
 
 
In this last example, let's see what happens if you remove the <code>$()</code> part:

<pre>
$ parallel echo 'echo "Hoi"; echo {}; echo $(({}+{}))' ::: $(seq 5)

echo Hoi
1
2
echo Hoi
2
4
echo Hoi
3
6
echo Hoi
4
8
echo Hoi
5
10
</pre>

What I think is happening:

No part gets evaluated first. Parallel joins <code>echo</code> and the quoted text into one command line and substitutes <code>{}</code>, so for input 1 the shell that runs the job receives:

<pre>
echo echo "Hoi"; echo 1; echo $((1+1))
</pre>

The semicolons split this into three commands: the first one prints the text <code>echo Hoi</code>, the other two print <code>1</code> and <code>2</code>. Note that <code>{}</code> does get substituted correctly.
 
 
=== Splitting over multiple lines ===

Let's try to split the inline commands over multiple lines, to make them more readable:

<pre>
parallel echo \
$(  \
echo "Foo"; \
echo "bar"; \
echo "One"; \
echo {} \
) ::: $(seq 5)
</pre>

Output:

<pre>
Foo bar One 1
Foo bar One 2
Foo bar One 3
Foo bar One 4
Foo bar One 5
</pre>

This works too:

<pre>
parallel echo \
$(  \
echo "Foo"; \
echo "bar"; \
echo "One"; \
echo {} \
) \
::: $(seq 5)
</pre>
 
 
=== Hierarchy of () ===

This doesn't work: the SQL text produced by <code>$()</code> is not quoted when it reaches the shell that runs each job, so that shell interprets the parentheses as shell syntax:

<pre>
parallel echo \
$( \
sql="update wp_terms join wp_term_taxonomy using (term_id) "; \
sql+="set slug=replace(slug, '{1}', '{2}') "; \
sql+="where taxonomy='product_cat';"; \
echo "$sql" \
) ::: $(seq 5) :::+ $(seq 6 10)
</pre>

Output:

<pre>
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 1, 6) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 2, 7) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 3, 8) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 4, 9) where taxonomy=product_cat;'
/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `echo update wp_terms join wp_term_taxonomy using term_id set slug=replace(slug, 5, 10) where taxonomy=product_cat;'
</pre>
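One way around this, sketched under the assumption that the SQL only has to be generated (not executed) here: build the text inside a function, where the quotes and parentheses are just data, and let Parallel pass the arguments to it. The function name <code>make_sql</code> is hypothetical; it has to be exported with <code>export -f</code> so that the shell that runs each job can find it.

```shell
# make_sql: hypothetical helper that prints the SQL statement for one job.
# $1 = old slug, $2 = new slug. Inside the printf strings, the quotes and
# parentheses are plain text, so no shell ever interprets them.
make_sql() {
	printf "update wp_terms join wp_term_taxonomy using (term_id) "
	printf "set slug=replace(slug, '%s', '%s') " "$1" "$2"
	printf "where taxonomy='product_cat';\n"
}
export -f make_sql

# Direct test of a single job
make_sql 1 6
```

With the function exported, <code>parallel make_sql ::: $(seq 5) :::+ $(seq 6 10)</code> should call it once per linked pair (1 6, 2 7, and so on), because Parallel appends the arguments when the command contains no replacement string.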
 
  
 

Revision as of 13 Sep 2023, 13:05


GNU Parallel (also called Parallel or parallel) is a command-line tool that distributes tasks over multiple CPU cores by running several jobs at the same time. It was written by Ole Tange in Perl.

Some impressions of where parallel can help:

The first line of chapter 1 of the manual reads:

If you write shell scripts to do the same processing for different input,
then GNU Parallel will make your life easier and make your scripts run
faster.

From the man page:

If you write loops in shell, you will find GNU parallel may be able to
replace most of the loops and make them run faster by running several
jobs in parallel.

As of Nov. 2022, this article is a work in progress. That's why it looks like a brainstorming session rather than a structured article.

Installation

$ sudo apt install parallel

...
The following NEW packages will be installed:
  parallel sysstat
...

Example intro chapter 1

The intro of chapter 1 of the manual contains this example:

seq 5 | parallel seq {} '>' example.{}

What it does:

  • seq 5 produces a series of numbers from 1 to 5
  • seq {} takes each of those numbers as the argument for a new sequence: a sequence with only the number 1, a sequence with the numbers 1 and 2, up to a sequence with the numbers 1 to 5
  • '>' example.{}: these 5 sequences are written to the files example.1 ... example.5. The > is quoted so that the calling shell does not perform the redirection itself; it is passed on to the shell that runs each job.
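For comparison, a sketch of the ordinary sequential loop that this one-liner replaces. It runs in a temporary directory, which is an assumption to keep the example tidy. Note that in the loop the redirection is done by the current shell itself, whereas with Parallel the quoted '>' is handed to the shell that runs each job.

```shell
# Work in a throw-away directory
dir=$(mktemp -d)
cd "$dir" || exit 1

# Sequential equivalent of: seq 5 | parallel seq {} '>' example.{}
for n in $(seq 5); do
	seq "$n" > "example.$n"
done

ls example.*   # example.1 ... example.5
cat example.3  # 1, 2 and 3, one per line
```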

An even simpler example, although not very useful:

seq 5 | parallel echo {}

Here, the five echo commands are executed in parallel, so the output lines are not guaranteed to appear in input order (option -k keeps that order).

One operator - Multiple arguments

Example of one operator with multiple arguments. In this case, the arguments are generated on the left-hand side of the command line and piped into parallel. There is on

seq 10 | parallel echo {}

Is this the same as

seq 10 | parallel echo   # Same as above?

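It often seems that xargs implicitly picks up where to insert the piped stuff. Same for parallel? It is: xargs appends the input to the end of the command, and parallel does the same when the command contains no replacement string such as {}. So both commands above are equivalent.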
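A minimal sketch of this appending behaviour, using xargs only: -n 1 makes xargs run the command once per input item (comparable to one job per argument in Parallel), and -I {} sets an explicit placeholder instead.

```shell
# xargs appends each input item to the end of the command
printf '%s\n' a b c | xargs -n 1 echo item:
# item: a
# item: b
# item: c

# With an explicit placeholder (-I), the item can go anywhere
printf '%s\n' a b c | xargs -I {} echo "[{}] done"
```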

An array isn't parallel - Unless it is?

Here, parallel doesn't treat the array entries as separate arguments, but the whole line as just one argument. That is because parallel reads one argument per input line, and echo ${j[@]} prints all entries on a single line:

j=(1 2 3 4 5)
echo ${j[@]}

seq 5 | parallel echo {}
echo ${j[@]} | parallel echo {}

seq 5 | parallel 'echo $(({}+{}))'
echo ${j[@]} | parallel 'echo $(({}+{}))'   # Error: Invalid arithmetic operator

But this works:

j=(1 2 3 4 5)
parallel echo ::: ${j[@]}
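The difference is in the number of lines. A minimal sketch, using printf to print one array entry per line instead of echo:

```shell
j=(1 2 3 4 5)

# echo prints all entries on one line: one single argument for parallel
echo "${j[@]}" | wc -l            # counts 1 line

# printf repeats its format for each entry: one line per entry
printf '%s\n' "${j[@]}" | wc -l   # counts 5 lines
```

Piping printf '%s\n' "${j[@]}" into parallel, instead of echo ${j[@]}, should therefore give one job per array entry.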

Operate on entries before parallel?

Can you first operate on an entry before it is processed by parallel?

Example using :::

# $i is expanded and word-split first; the literal text +1 only sticks to
# the last word, so no arithmetic takes place at all
#
$ i=$(seq 5)
$ parallel echo ::: $i+1

1
2
3
4
5+1
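If the goal is to add 1 to every entry before parallel gets to see them, the arithmetic has to be done per entry by the calling shell. A minimal sketch with a plain loop inside command substitution (the variable name plus_one is just an example):

```shell
i=$(seq 5)

# Add 1 to every entry; this loop runs in the calling shell, before parallel
plus_one=$(for n in $i; do echo $((n + 1)); done)

echo $plus_one    # 2 3 4 5 6
```

The result can then be passed on as usual, e.g. parallel echo ::: $plus_one.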

Here, parallel substitutes {} first, and only then is the sum evaluated, by the shell that runs each job, because this part of the statement is within single quotes:

$ seq 5 | parallel 'echo $(({}+{}))'

2
4
6
8
10

Rewritten in a possibly more common form, without a pipeline:

parallel 'echo $(({}+{}))' ::: $(seq 5)

2
4
6
8
10

Multiple operations on entries before parallel?

Concerning WP-CLI, it would be really cool if multiple commands could be run in parallel, each doing something with the output.

Example:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply an entry by 2
    • Add 1 to the result.

Order matters here: if 'add 1' were done in parallel with 'multiply by 2', the results might become unpredictable.

Let's try:

seq 5 | parallel 'echo $((2*{}))'
...
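A sketch of how the order within one job is kept: commands separated by ; inside a single job run one after another, while different jobs may run at the same time. Here plain Bash background jobs (& and wait) stand in for Parallel, and the function name job and the output directory are hypothetical. With Parallel, the same job could presumably be written as seq 5 | parallel 'x=$((2*{})); echo $((x+1))'.

```shell
# job: hypothetical function doing both steps for one entry, in order
job() {
	local x=$((2 * $1))   # step 1: multiply by 2
	echo $((x + 1))       # step 2: add 1, using the result of step 1
}

out=$(mktemp -d)

# Start the five jobs concurrently, each writing to its own file
for n in $(seq 5); do
	job "$n" > "$out/$n" &
done
wait   # until all background jobs have finished

cat "$out"/{1..5}   # 3 5 7 9 11, one per line
```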

Reuse argument

A case that I sometimes encounter when using WP-CLI:

  • Have a sequence 1...5
  • Do this in parallel for each of the numbers:
    • Multiply entry by 2
    • Multiply entry by 3
    • Add the outcome of these two multiplications

Let's start with the last three lines and first make sure I get that part right :)

i=5
echo $((2*$i + 3*$i))

Now together:

seq 5 | parallel 'echo $((2*{} + 3*{}))'

And why stop here?

seq 5 | parallel 'echo "{} - $((2*{}+3*{}))"'

Evaluate parallel argument first

This won't work:

$ seq 20 | parallel echo $(({}+{}))

bash: {}+{}: syntax error: operand expected (error token is "{}+{}")

The reason: the calling shell evaluates $(( )) first, before parallel gets a chance to replace {}, so it tries to calculate with the literal text {}+{}.

To change that, put the parallel argument between single quotes:

seq 20 | parallel 'echo $(({}+{}))'

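BTW, this doesn't work either: the single quotes are inside $(( )), so the calling shell still evaluates the expression, now including the quote characters: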

$ seq 20 | parallel echo $(('{}'+'{}'))

bash: '{}'+'{}': syntax error: operand expected (error token is "'{}'+'{}'")

Reusing an argument multiple times

You can use the parallel argument multiple times: Just use {} multiple times:

seq 20 | parallel 'echo $(({}+{}))'

sem

sem stands for semaphore: a counter that limits how many jobs may run at the same time. sem is an alias for parallel --semaphore. I bumped into it here, but I am not sure it works for me like this within a loop. The GNU Parallel manual doesn't seem to be exhaustive on this topic. I found https://www.gnu.org/software/parallel/sem.html a much better source.
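To get a feeling for what a counting semaphore does, a minimal plain-Bash sketch. It is not sem itself, the names max_jobs, running and task are hypothetical, and it needs Bash 4.3 or newer for wait -n. Never more than max_jobs tasks run at once; a new one only starts when a running one has finished, which is roughly what sem -j N does for the commands it is given.

```shell
max_jobs=3          # size of the semaphore: the number of job slots
running=0           # jobs started and not yet waited for
work_dir=$(mktemp -d)

# task: hypothetical piece of work that writes a result file
task() {
	sleep 0.1
	echo "task $1 done" > "$work_dir/$1"
}

for n in $(seq 8); do
	# All slots taken? Then wait until any one job finishes (Bash 4.3+)
	if [ "$running" -ge "$max_jobs" ]; then
		wait -n
		running=$((running - 1))
	fi
	task "$n" &
	running=$((running + 1))
done

wait   # comparable to sem --wait: wait for all remaining jobs
```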

Questions

  • Is it about threads, processors, cores, sockets or what?
  • Do I need to optimize the number of threads myself? Or just leave this up to GNU Parallel?
  • What are the effects in suboptimal cases?

Answers

  • Processors (or processing units) are to a computer what cylinders are to a car. See Processors, cores & threads on this computer (Bash) for details.
  • GNU Parallel seems to pick a good number of simultaneous jobs by itself. See the test code below for the case with sem -j +0: -j +N means "the number of CPUs that GNU Parallel detects, plus N", so +0 runs one job per detected CPU. The measurements confirm this: it is as fast as the best manual setting
  • When setting the number manually, rather choose a bit too high a number than too low: in the tests below, 16 and 32 jobs cost little or nothing compared to 8, while 2 jobs and 1 job were clearly slower. However, this very much depends on the use case, e.g. on whether CPU power or I/O is the bottleneck. For me, it's usually CPU power, though.

Test scripts

################################################################################
# Thread optimization
################################################################################
#
# My laptop can do 8 threads. Let's see what happens to performance when I
# force more or less threads:
#
#
# parallel_test_function()
########################################
#
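function parallel_test_function()
{
	# Dummy CPU-bound workload: count from 0 to 100000.
	# Note: i=$i+1 and i=$i-1 are string assignments (e.g. "5+1-1"); the
	# arithmetic only happens when i is evaluated inside (( )), which still
	# keeps the loop counter correct.
	printf "PTF - Start... "
	i=0
	for ((i; i<=100000; i++))
	do
		i=$i+1
		i=$i-1
	done
	printf "Done. "
}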
export -f parallel_test_function


# # Test - 8 threads
# ########################################
# #
# # * Execution time (s): 5, 5, 5, 5 ⇒ 5s
# #
# start=`date +%s`
# #
# # Queue 32 identical jobs, at most 8 running at the same time
# for n in $(seq 32); do sem -j 8 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - 16 threads
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
# start=`date +%s`
# #
# # Queue 32 identical jobs, at most 16 running at the same time
# for n in $(seq 32); do sem -j 16 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 32 threads
# ########################################
# #
# # * Execution time (s): 6, 6, 5, 5, 6 ⇒ 5.6s
# #
# start=`date +%s`
# #
# # Queue 32 identical jobs, at most 32 running at the same time
# for n in $(seq 32); do sem -j 32 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 4 threads
# ########################################
# #
# # * Execution time (s): 5, 6, 5, 6, 6, 5 ⇒ 5.5s
# #
# start=`date +%s`
# #
# # Queue 32 identical jobs, at most 4 running at the same time
# for n in $(seq 32); do sem -j 4 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 2 threads
# ########################################
# #
# # * Execution time (s): 7, 6, 6, 7, 7, 7 ⇒ 6.7s
# #
# start=`date +%s`
# #
# # Queue 32 identical jobs, at most 2 running at the same time
# for n in $(seq 32); do sem -j 2 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# # Test - 1 thread
# ########################################
# #
# # * Execution time (s): 12, 12, 12, 12 ⇒ 12s
# #
# start=`date +%s`
# #
# # Queue 32 identical jobs, one at a time
# for n in $(seq 32); do sem -j 1 parallel_test_function; done
# #
# sem --wait
# end=`date +%s`
# echo ""; echo Execution time was `expr $end - $start` seconds.


# Test - Auto-optimized
########################################
#
# * Execution time (s): 5, 5, 5, 5, 5 ⇒ 5s
#
start=`date +%s`
#
# Queue 32 identical jobs; sem starts the next one as soon as a job slot is free
for n in $(seq 32)
do
	sem -j +0 parallel_test_function
done
#
sem --wait
end=`date +%s`
echo ""; echo Execution time was `expr $end - $start` seconds.

Sources

Subshells & variables

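GNU Parallel runs each job in a new shell process. That shell only inherits what is in the environment, so functions and variables have to be exported (export -f and export) to be available there.

Note that this is not the same as scope within a single shell: a real subshell, such as ( ... ) or $( ... ), is a copy of the current shell and does see unexported variables and functions.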
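A minimal sketch of that difference, without Parallel: a real subshell ( ... ) sees everything, while a new shell started with bash -c (which is what the /bin/bash messages in this article suggest Parallel uses for its jobs) only gets exported variables and functions. The names var and fn are hypothetical.

```shell
var="hello"                # not exported (yet)
fn() { echo "fn runs"; }   # not exported (yet)

# Subshell: a copy of the current shell, so var and fn are available
( echo "subshell: var=$var"; fn )

# New shell process: var is empty and fn is unknown (error hidden)
bash -c 'echo "bash -c: var=$var"; fn' 2>/dev/null || true

# After exporting, the new shell sees both
export var
export -f fn
bash -c 'echo "bash -c: var=$var"; fn'
```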

Without exported function or variable

################################################################################
# Subshell & vars? - Without exporting function or var
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found
/bin/bash: subfunction: command not found

With exported function

Now the function is exported using export -f subfunction and GNU Parallel can find it. However, the variable j is not available within this function.

################################################################################
# Subshell & vars? - With exporting function
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 
Subfunction - Var j: 

With exported function and exported variable

Hooray! Sometimes, things are easy:

################################################################################
# Subshell & vars? - With exporting function and variable
################################################################################
#
# Function
########################################
#
subfunction()
{
   echo "Subfunction - Var j: $j"
}


# Main
########################################
#
j=12
echo "Main - var j: $j"
export -f subfunction
export j
#
parallel subfunction ::: $(seq 5)

Output:

Main - var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12
Subfunction - Var j: 12

But not for arrays

Bash cannot export regular or associative arrays to child processes (the Bash manual notes that array variables may not (yet) be exported):

################################################################################
# Subshell, var & arrays
################################################################################
#
# Function
########################################
#
subfunction()
{
	echo "function - Var i:               $i"
	echo "function - Associative array j: ${j[@]}"
	echo "function - Regular array k:     ${k[@]}"
}


# Main
########################################
#
i=12

declare -gA j

j[foo,1]="Foo-1"
j[bar,2]="Bar-2"

k[1]="K1"
k[2]="K2"


echo "Main - var i: $i"
export -f subfunction
export i
export j		# Doesn't work
export j[@]		# Doesn't work
export k		# Doesn't work
export k[@]		# Doesn't work
export {k[@]}	# Doesn't work
#
parallel subfunction ::: $(seq 5)

Output:

Main - var i: 12
./parallel.sh: line 181: export: `j[@]': not a valid identifier
./parallel.sh: line 183: export: `k[@]': not a valid identifier
./parallel.sh: line 184: export: `{k[@]}': not a valid identifier
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:     
function - Var i:               12
function - Associative array j: 
function - Regular array k:   

Pass arrays to GNU Parallel

As mentioned before, you cannot pass an array (regular or associative) to subshells and therefore, in some situations, to GNU Parallel. But there is hope [1]:

  • Use :::+?
  • Something with exporting functions?
  • ?
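One possible workaround, sketched with plain Bash and not taken from the source above: declare -p turns an array into a text definition, an ordinary string variable can be exported, and the new shell rebuilds the array from it with eval. The variable names k_def and j_def are hypothetical.

```shell
# A regular and an associative array, as in the example above
k=([1]="K1" [2]="K2")
declare -A j=([foo,1]="Foo-1" [bar,2]="Bar-2")

# Serialize both arrays into plain strings, which CAN be exported
export k_def="$(declare -p k)"
export j_def="$(declare -p j)"

# In a new shell (like the ones Parallel uses for its jobs), rebuild the arrays first
bash -c 'eval "$k_def"; eval "$j_def"; echo "k[2]=${k[2]} j[foo,1]=${j[foo,1]}"'
```

With Parallel, the eval lines would go at the start of each job, e.g. at the top of an exported function, where the rebuilt arrays are then local to that function.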

Using :::+

This works exactly as intended:

################################################################################
# GNU Parallel & associative array
################################################################################
#
# tmp2()
########################################
#
tmp2()
{
   echo ""; echo "tmp2: " 
   echo $1, $2, $3
}

# Build the associative array
########################################
#
declare -gA j

j[1,tag]="_tool_"
j[1,nl]="Boormachine"
j[1,en]="Drilling machine"

j[2,tag]="_tool_"
j[2,nl]="Zaag"
j[2,en]="Saw"

j[3,tag]="_dim_"
j[3,nl]="1,3"
j[3,en]="1.3"

j_rows=3

# echo ${j[@]}


# Convert to regular array
########################################
#
# * I need a structure with "Boormachine", "Zaag" and "1,3" that can be used
#   as an argument for Parallel. That's not possible with an associative
#   array, I guess. E.g.: echo ${j[1*]} doesn't work
# * The code below to construct temporary regular arrays, is quite
#   inefficient. Would be nice to do it in a more efficient way,
#   maybe using GNU Parallel instead of a loop?
#
#
unset j_tag
unset j_nl
unset j_en

for i in $(seq $j_rows)
do
   j_tag+=("${j[$i,tag]}")
   j_nl+=("${j[$i,nl]}")
   j_en+=("${j[$i,en]}")
done	

# echo "j_tag: ${j_tag[@]}"
# echo "j_nl: ${j_nl[@]}"
# echo "j_en: ${j_en[@]}"


# Invoke GNU Parallel
########################################
#
export -f tmp2
#
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}" ::: "${j_en[@]}"	# 27 combinations (3 × 3 × 3)
# parallel tmp2 ::: "${j_tag[@]}" ::: "${j_nl[@]}"	                # 9 combinations
# parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}"	                # 3 combinations: :::+ links by position
#
parallel tmp2 ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"	# 3 rows - As intended
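To see where the numbers of combinations in the comments above come from, a minimal sketch with plain loops. The short arrays a and b and the function names cartesian and linked are hypothetical stand-ins for j_tag and j_nl: ::: between two sources gives every combination, while :::+ links the entries by position.

```shell
a=(x y z)
b=(1 2 3)

# cartesian: like ::: a ::: b -> every combination, 3 × 3 = 9 jobs
cartesian() {
	local x y
	for x in "${a[@]}"; do
		for y in "${b[@]}"; do
			echo "$x $y"
		done
	done
}

# linked: like ::: a :::+ b -> entries paired by position, 3 jobs
linked() {
	local idx
	for idx in "${!a[@]}"; do
		echo "${a[idx]} ${b[idx]}"
	done
}

cartesian
linked
```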

BTW: the function used here could easily be replaced by a direct statement, leading to something like this:

# With a single inline operator
########################################
#
parallel echo {1} {2} {3} ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"

And for something even more exciting, now with multiple inline commands:

# With multiple inline commands
########################################
#
parallel 'echo {1}; echo {2} = {3}' ::: "${j_tag[@]}" :::+ "${j_nl[@]}" :::+ "${j_en[@]}"

Sources

Case: collect term_ids through WP-CLI

Seems like a good case for replacing a loop with parallel.

Original code:

# Collect all term_ids through a loop
#######################################
#
# * There are 1,433 terms to collect
# * Max. 100 items are returned at once
# * Hence this loop needs 15 iterations
#
i=1
echo "Loop - Collect all term_ids"
#
for ((i; i<=$number_of_iterations; i++))	
do
	#
	echo "	Iteration $i/$number_of_iterations"
	#
	# Store batch of term ids in tmp array j
	########################################
	#
	mapfile -t j < <( wp wc product_attribute_term list \
		$taxonomy_id \
		--user=4 \
		--field=id \
		--offset=$((($i-1)*100)) | grep . )
	#
	# echo "	j: ${j[@]}"
	#
	# Append to array term_id
	########################################
	#
	term_id+=("${j[@]}")
	echo "		Length term_id: ${#term_id[@]}"
	#
done
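The number of iterations in the comment follows from a ceiling division: 1,433 terms in batches of at most 100. A minimal sketch of that arithmetic, including the --offset value each iteration uses; the variable names number_of_terms and batch_size are hypothetical.

```shell
number_of_terms=1433   # from the comment above
batch_size=100         # max. items returned at once

# Ceiling division in integer arithmetic: (a + b - 1) / b
number_of_iterations=$(( (number_of_terms + batch_size - 1) / batch_size ))
echo "$number_of_iterations"   # 15

# The --offset used in iteration i is (i-1)*100: 0, 100, ..., 1400
for ((i = 1; i <= number_of_iterations; i++)); do
	echo $(( (i - 1) * batch_size ))
done
```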

New code: ...

WP-CLI & Parallel

Finally got it working - Thanks to ChatGPT! → https://wiki.devliegendebrigade.nl/Wp_wc_product_delete

See also

Sources