Parallelisation (Bash)

From De Vliegende Brigade

Current version as of 23 Oct 2022, 07:06

Accessing WordPress through WP-CLI is terribly slow. Fortunately, you can parallelize things in Bash:

Parallelize a single command using '&'

An example:

Without parallelisation

This command takes about 5 s:

wp wc shop_order list --user=4 --field=id | xargs -n1 wp wc shop_order delete --user=4 --force=1

The same command on a VPS with 4 instead of 2 CPUs takes just as long (even slightly longer). That was no surprise: this PHP command cannot really be parallelized, because it is one serial affair. Using top and ps I could see that separate processes are indeed involved. Something like:

  • The command as a whole
  • xargs
  • php
  • MySQL.

But there is still effectively no parallelization: although these processes are separate, they still run one after the other. It is the same problem gamers have: they benefit more from very fast processors than from many processors. Unfortunately, at TransIP I cannot choose faster processors, only more processors - read on!

With parallelisation

In Bash you can use & to indicate that the next command may start before the current command (the one the & belongs to) has finished. And that turns out to work just fine for parallelizing! This is the same code as above, applied about 100 times:

# Parallel (2x)
########################################
#
date
	wp wc shop_order list --user=4 --field=id --per_page=50             | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=50 --offset=50 | xargs -n1 wp wc shop_order delete --user=4 --force=1
date
# Parallel (4x)
########################################
#
date
	wp wc shop_order list --user=4 --field=id --per_page=25             | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=25 --offset=25 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=25 --offset=50 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=25 --offset=75 | xargs -n1 wp wc shop_order delete --user=4 --force=1	
date
# Parallel (8x)
########################################
#
date
	wp wc shop_order list --user=4 --field=id --per_page=12             | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=12 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=24 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=36 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &	
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=48 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=60 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=72 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=84 | xargs -n1 wp wc shop_order delete --user=4 --force=1	
date
# Parallel (16x)
########################################
#
date
	wp wc shop_order list --user=4 --field=id --per_page=12              | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=12  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=24  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=36  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=48  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=60  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=72  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=84  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=96  | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=108 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=120 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=132 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=144 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=156 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=168 | xargs -n1 wp wc shop_order delete --user=4 --force=1 &
	wp wc shop_order list --user=4 --field=id --per_page=12 --offset=180 | xargs -n1 wp wc shop_order delete --user=4 --force=1
date
  • The trick is that every line except the last one ends in &. That way, the specified number of commands run in parallel.
  • The last parallel command has no &, i.e. the script does not continue until this last line has finished. As a result, more or less the specified number of parallel commands is active at any time (this is not 100% efficient, but it goes a long way).
  • If every line has a &, there is no limit on the number of parallel commands. This produces two kinds of error messages: (1) IDs that turn out not to exist, because another process has already deleted them, and (2) MySQL running out of sockets (or whatever it is called): no more IPCs (if that is the correct term) can be set up.
  • It is not a disaster to run more commands in parallel than there are cores: a computer normally does many things in parallel anyway, so it can handle this as well.
  • It is crucial that the workload of the original command can be split up (in this case: 100x the same command, but with different arguments). Here that is done with --per_page and --offset.
  • This code can probably be made more efficient by collecting all arguments in a single command (one wp wc shop_order list command) and distributing the result over parallel wp wc shop_order delete commands.
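
A minimal sketch of that last idea, not tested against a real shop: list the IDs once and let xargs spread them over a fixed number of parallel delete processes. It assumes an xargs that supports -P (GNU and BSD xargs do; it is not part of POSIX). The helper name delete_in_parallel is made up for this illustration, and echo stands in for the real command so the example is a harmless dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read IDs from stdin (one per line) and run
# "<command> <id>" for each, with at most <jobs> processes at a time.
delete_in_parallel() {
    local jobs="$1"
    shift
    # -n1: one ID per invocation; -P: maximum number of parallel processes
    xargs -n1 -P "$jobs" "$@"
}

# Dry run: echo prints the command that would be executed for each ID.
printf '%s\n' 101 102 103 104 |
    delete_in_parallel 4 echo wp wc shop_order delete --user=4 --force=1

# Real usage (assumes wp and WooCommerce are available on this machine):
# wp wc shop_order list --user=4 --field=id --per_page=100 |
#     delete_in_parallel 4 wp wc shop_order delete --user=4 --force=1
```

Because every ID is listed exactly once, no two processes should try to delete the same order, and the -P value caps the number of simultaneous MySQL connections.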

Parallelize a chunk of code

Quite often, I would like to parallelize more than just a single command. How to do that? For starters: this does not seem to have anything to do with the concept of subshells.
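
One possible approach, as a sketch under assumptions: wrap the chunk in a shell function, start each call with &, and use the Bash builtin wait to block until all background jobs have finished. The names process_chunk, total and jobs are hypothetical placeholders, and sleep stands in for the real work. Bash does run each backgrounded call in a subshell environment, so variables changed inside the function do not propagate back to the calling script.

```shell
#!/usr/bin/env bash
# Sketch only: process_chunk is a hypothetical stand-in for a multi-command chunk.
process_chunk() {
    local per_page="$1" offset="$2"
    echo "start: per_page=$per_page offset=$offset"
    sleep 1     # placeholder for the real work (e.g. a series of wp commands)
    echo "done: per_page=$per_page offset=$offset"
}

total=100   # assumed number of items to process
jobs=4      # assumed number of parallel jobs
per_page=$(( (total + jobs - 1) / jobs ))   # ceiling division, so no item is skipped

for (( i = 0; i < jobs; i++ )); do
    # The whole function call runs as one background job.
    process_chunk "$per_page" "$(( i * per_page ))" &
done

wait    # returns once every background job has finished
echo "all chunks finished"
```

Unlike leaving the & off the last line, wait does not depend on which job happens to finish last.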

See also

  • Parallel (Bash)
  • pexec: https://en.wikipedia.org/wiki/Pexec