Just call a function multiple times (GNU Parallel)

From De Vliegende Brigade

This was one of my first trials with GNU Parallel, and I thought it would be quite simple: execute the same function 6 times.

Seemed like an easy start, but it took quite some time to get it working.

  • The 'CPU consuming' part is a loop with an integer addition and a subtraction. I didn't want to use something like sleep, as a sleeping process hardly takes up any CPU resources
  • Execution time is mentioned for the various implementations. I think this was on my laptop, but that's beside the point. The essence is being able to compare the results.

Baseline: Without parallel stuff

Execution time: 24s.

# parallel_test_function()
########################################
#
function parallel_test_function()
{
   printf "parallel_test_function - Start... "
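   # Busy loop to burn CPU time. Note: i=$i+1 stores the text "0+1", not the number 1.
   # The (( )) header evaluates that text as an expression, so the loop still counts correctly.
   i=0
   for ((i; i<=1000000; i++))
   do
      i=$i+1
      i=$i-1
   done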
   printf "Done. "
}

# Main
########################################
#
# Execution time (function: 1000000x. Here: 6x): 23, 24, 24, 24 ⇒ 24s
#
export -f parallel_test_function

start=$(date +%s)
for ((j=0; j<=5; j++))
do
   parallel_test_function
done

end=$(date +%s)
echo ""; echo "Execution time was $((end - start)) seconds."

Just call a function with 'parallel'?

This doesn't work:

# Main
########################################
#
export -f parallel_test_function
start=$(date +%s)
for ((j=0; j<=5; j++))
do
   parallel parallel_test_function
done
end=$(date +%s)
echo ""; echo "Execution time was $((end - start)) seconds."

It results in this warning, after which parallel sits waiting for input on stdin:

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

Working or not?

This seems to work, but it isn't any faster:

# Call test function with parallel (v1)
########################################
#
# * Execution time (function: 1000000x. Here: 6x): 21, 23, 24 ⇒ 24s
# * With fewer inner loops and more outer loops, this is even slower than
#   without using parallel
#
export -f parallel_test_function

start=$(date +%s)
for ((j=0; j<=5; j++))
do
   sem parallel_test_function
done
end=$(date +%s)
echo ""; echo "Execution time was $((end - start)) seconds."

Finally!

Execution time: 9s - This works:

# Call test function with parallel (v2)
########################################
#
# Execution time (function: 1000000x. Here: 6x): 10, 9, 9 ⇒ 9s
#
export -f parallel_test_function

start=$(date +%s)

seq 6 | parallel parallel_test_function

end=$(date +%s)
echo ""; echo "Execution time was $((end - start)) seconds."

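I suspect that the previous trial wasn't any faster because the loop kills parallelisation: probably only what is stated after the keyword parallel is actually parallelised, and inside the loop each call hands over just one job. That seems quite logical. For sem the reasoning is a bit different: sem does start each command in the background, but by default it allows only one job at a time (-j 1), so the six calls still run one after another unless a higher -j value is given.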

So the trick seems to be: when you have a loop and you want to parallelise it, make sure that you get rid of the loop and hand the iterations to parallel as input instead. A bit similar to changing a select query into an update query: always a bit of puzzling, but doable.

Resources