Associative arrays (Bash)

From De Vliegende Brigade

Revision as of 7 Nov 2022 17:30

An associative array is an array where the index can be symbolic, rather than only numerical. E.g.:

declare -A j

j[fruit]=apple
j[color]=blue
  • You can use associative arrays to mimic multidimensional arrays, with the emphasis on mimic
  • Associative arrays were introduced in Bash 4. To check which version of Bash you have: bash --version.
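A minimal sketch of a version guard for a script, assuming the script should simply stop on an older shell. BASH_VERSINFO is Bash's built-in version array; the error message is just an example:

```bash
# Stop early if this Bash predates associative arrays (introduced in Bash 4)
if (( BASH_VERSINFO[0] < 4 )); then
  echo "This script needs Bash 4 or newer; found $BASH_VERSION" >&2
  exit 1
fi

declare -A j
j[fruit]=apple
j[color]=blue
echo "Bash $BASH_VERSION: j[fruit]=${j[fruit]}, j[color]=${j[color]}"
```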

There are no multidimensional arrays

Bash doesn't have multidimensional arrays (as of 2022.09.29). Associative arrays aren't multidimensional arrays either, but they can emulate them. This emulation has some limitations, which can be tricky if you're not aware of them.

As an example:

unset j
declare -A j

j[0,0]="00"
j[0,1]="01"
j[0, 1]="0 1"

for i in "${!j[@]}"
do
   echo "Index: $i - Value: ${j[$i]}"
done
echo "Length of this array: ${#j[@]}"

Output:

Index: 0, 1 - Value: 0 1
Index: 0,1 - Value: 01
Index: 0,0 - Value: 00
Length of this array: 3

What this shows:

  • The entry with index [0, 1] is not the same as the entry with index [0,1]. Everything between [] is treated as one single index, not as something multidimensional
  • When retrieving the size of the array, you get only one number, because it's still just a vector.

But does this actually matter? Sometimes it probably doesn't: It took me a while after adopting associative arrays to realize their limitations. So far, these are the issues I've encountered:

  • Retrieve the number of rows? There is no meaningful way of retrieving the number of rows, as the matrix is actually stored as a vector
  • Iterate over rows? As you don't have real rows, you can't iterate over them either
  • Retrieve a specific entry: This is probably the hardest challenge: How do you retrieve an entry with a specific index efficiently, in terms of both CPU time and code overhead?
  • More?

These issues are discussed in the following chapters:

Retrieve number of rows or columns?

A problem with these emulated multidimensional arrays: You can't read out the number of rows or columns, because there are no real rows and columns. Illustration:

# The array below is 3x2
#
unset j
declare -A j

j[1,1]="11"
j[1,2]="12"
j[2,1]="21"
j[2,2]="22"
j[3,1]="31"
j[3,2]="32"

echo "Length: ${#j[@]}"
echo "Complete array: ${j[@]}"

Output:

Length: 6
Complete array: 21 22 31 32 12 11

This is a problem when you want to loop over the rows: You can't. But see below for a solution.
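A sketch of one workaround, assuming every key has the form row,column without spaces (as recommended under Conventions): scan the keys, collect the distinct parts before and after the comma, and count them. seen_rows and seen_cols are hypothetical helper arrays, used as sets:

```bash
declare -A j=(
  [1,1]="11" [1,2]="12"
  [2,1]="21" [2,2]="22"
  [3,1]="31" [3,2]="32"
)

declare -A seen_rows=() seen_cols=()   # keys only; the values don't matter
for key in "${!j[@]}"; do
  seen_rows[${key%%,*}]=1   # part before the comma: the row label
  seen_cols[${key#*,}]=1    # part after the comma: the column label
done

j_rows=${#seen_rows[@]}
j_columns=${#seen_cols[@]}
echo "Rows: $j_rows - Columns: $j_columns"
```

This counts distinct labels, so it also works for symbolic row names. It is a full pass over all keys, though, so for large arrays storing j_rows and j_columns up front may be cheaper.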

Iterate over rows?

If there isn't really such a thing as rows and columns, then how do you iterate over them?

That's probably not so difficult. Just remember that the 'multidimensional index' is just one index, and that you can iterate over it. E.g.:

for i in "${!j[@]}"
do
   echo "Index: $i"
   echo "Value: ${j[$i]}"
done
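Iterating over all keys visits every cell, but not row by row, and in no particular order. A sketch of row-wise iteration, assuming keys of the form row,column without spaces: collect and sort the distinct row and column labels, then visit the cells with a nested loop. rows, cols and line are hypothetical names:

```bash
declare -A j=(
  [1,1]="11" [1,2]="12"
  [2,1]="21" [2,2]="22"
  [3,1]="31" [3,2]="32"
)

# Distinct, sorted row and column labels ("${!j[@]}" comes in no particular order).
# sort -u compares as text; switch to sort -n -u once labels go beyond 9.
mapfile -t rows < <(for key in "${!j[@]}"; do echo "${key%%,*}"; done | sort -u)
mapfile -t cols < <(for key in "${!j[@]}"; do echo "${key#*,}"; done | sort -u)

for r in "${rows[@]}"; do
  line=""
  for c in "${cols[@]}"; do
    # Skip cells that were never assigned
    [[ -v j[$r,$c] ]] && line+="${j[$r,$c]} "
  done
  echo "Row $r: ${line% }"
done
```

This should print the rows in order: Row 1: 11 12, Row 2: 21 22, Row 3: 31 32.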

Retrieve a specific entry?

How do you retrieve a specific entry? I don't have the final answer yet, but it's coming:

  • Substring extraction (Bash)
  • Below: The chapter about having a numerical index.
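For a single entry whose row and column you already know, a plain lookup may be all you need: the complete row,column string is the key, as long as you build it exactly as it was stored. A sketch, with row, col, key and other as hypothetical variable names; -v tells an unset entry apart from an empty one:

```bash
declare -A j=( [1,1]="11" [1,2]="12" [2,1]="21" [2,2]="22" )

row=2
col=1
key="$row,$col"   # must match the stored key exactly: "2,1", not "2, 1"

if [[ -v j[$key] ]]; then
  value=${j[$key]}
  echo "j[$key] = $value"
else
  echo "j[$key] is not set"
fi

# A key with a space is a different key, so this lookup finds nothing
other="2, 1"
echo "j[$other] = ${j[$other]-<not set>}"
```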

Scope

When an associative array is defined through declare -gA, the array is global: It's available in all functions invoked within the same shell, including functions called from other functions. E.g.:

################################################################################
# Associative arrays & scope
################################################################################
#
function sub1()
{
	echo "Within sub1: ${j[@]}"
	sub2
}


function sub2()
{
	echo "Within sub2: ${j[@]}"
}


# Main
########################################
#
unset j
declare -gA j

j["foo"]=1
j["bar"]=2

echo "Within main: ${j[@]}"

sub1

Output:

Within main: 1 2
Within sub1: 1 2
Within sub2: 1 2

Results are the same if the two functions are defined in reverse order in this script.

This is a different situation from where stuff is
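The -g flag matters most when the array is declared inside a function: Without it, declare creates a local variable that disappears when the function returns. A sketch with two hypothetical functions, init_local and init_global:

```bash
init_local()  { declare -A k; k[foo]=1; }    # without -g: k is local to init_local
init_global() { declare -gA g; g[foo]=1; }   # with -g: g is created in the global scope

unset k g
init_local
init_global
echo "k has ${#k[@]} entries, g has ${#g[@]} entries"
```

This should print k has 0 entries, g has 1 entries.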

Export to subshells

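Associative arrays cannot be exported like variables or functions [1], so a child process, such as another script or a new bash started from your script, won't see them. Subshells created with ( ... ) or $( ... ) are different: They are copies of the current shell and do see the array. To make the content of an associative array available in a child process, you might have to use some tricks: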

  • Export only needed values as variables
  • Use files for storage & retrieval - Will probably have quite some overhead
  • Convert the associative array to several regular arrays for each index - This doesn't seem too hard. Note, though, that Bash can't export regular (indexed) arrays either.
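A commonly used variant of exporting variables: serialise the array with declare -p into an ordinary string, export that string, and let the child re-create the array with eval. A sketch, assuming the child is also Bash 4 or newer and that J_DEF (a hypothetical name) only ever holds a definition you produced yourself, since eval executes whatever it's given:

```bash
declare -A j=( [foo]=1 [bar]=2 )

# The array itself can't be exported, but its definition can, as a plain string
J_DEF=$(declare -p j)
export J_DEF

# A separate bash process re-creates the array from that string
bash -c 'eval "$J_DEF"; echo "Child sees: foo=${j[foo]} bar=${j[bar]}"'
```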

Conventions

To alleviate some of the issues discussed before:

  • When arrays have a numerical index, always start with the same number ⇒ I prefer base 1, just as in matrix algebra
  • For two-dimensional arrays, the first dimension is always rows (x) and the second is always columns (y) - Just as in matrix algebra
  • Don't use spaces around the comma that separates indices: You need a uniform syntax, and spaces also mess up syntax highlighting in Sublime Text
  • Store the dimensions in associated variables (when needed). E.g., in the example above: j_rows=3 and j_columns=2

Examples

As stated above, these are not really multidimensional arrays, just arrays with fancy indices. It doesn't matter if these indices are numerical or symbolic:

declare -A j
j[0,0,0]="000"
j[0,0,1]="001"
j[0,1,0]="010"
j[0,1,1]="011"
j[1,0,0]="100"
j[1,0,1]="101"
j[1,1,0]="110"
j[1,1,1]="111"

echo "${j[0,0,0]} ${j[0,0,1]} ${j[0,1,0]} ${j[0,1,1]}"
echo "${j[1,0,0]} ${j[1,0,1]} ${j[1,1,0]} ${j[1,1,1]}"
unset j

declare -A j

j[fruit,one]=Mango
j[fruit,two]=Apple
j[bird,1]=Cockatail
j[bird,2]=Spottingbird
j[flower,1]=Rose
j[flower,2]=Sunflower
j[animal]=Tiger

for i in "${j[@]}"
do
   echo "Entry: $i"
done

Output:

Entry: Cockatail
Entry: Spottingbird
Entry: Rose
Entry: Sunflower
Entry: Tiger
Entry: Apple
Entry: Mango

Note that the entries appear in an arbitrary order: Bash doesn't keep the entries of an associative array in insertion order

Loop through an array

Again: These are not multidimensional arrays. They only seem that way.

This works [2]:

declare -A j
j[fruit]=Mango
j[bird]=Cockatail
j[flower]=Rose
j[animal]=Tiger

for i in "${j[@]}"
do
   echo "Entry: $i"
done	

Output:

Entry: Mango
Entry: Rose
Entry: Tiger
Entry: Cockatail

This works:

declare -A j

j[fruit,1]=Mango
j[fruit,2]=Apple
j[bird,1]=Cockatail
j[bird,2]=Spottingbird
j[flower,1]=Rose
j[flower,2]=Sunflower
j[animal,1]=Tiger
j[animal,1]=Mouse

for i in "${j[@]}"
do
   echo "Entry: $i"
done

Output:

Entry: Cockatail
Entry: Spottingbird
Entry: Rose
Entry: Sunflower
Entry: Mouse
Entry: Mango
Entry: Apple

The only problem: The entries come out in a seemingly random order. That's also the case if I add unset j at the beginning of the script. Also note that j[animal,1] is assigned twice, so Mouse overwrites Tiger.
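If the order matters, one workaround is to sort the keys yourself and then look up each value. A sketch, assuming the keys contain no newlines; keys is a hypothetical array name, and LC_ALL=C makes the sort order independent of the locale:

```bash
declare -A j
j[fruit,1]=Mango
j[fruit,2]=Apple
j[bird,1]=Cockatail
j[flower,1]=Rose
j[animal,1]=Mouse

# Sort the keys first, then print the values in key order
mapfile -t keys < <(printf '%s\n' "${!j[@]}" | LC_ALL=C sort)

for key in "${keys[@]}"; do
  echo "Entry ($key): ${j[$key]}"
done
```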

Looping through the rows of an associative array: This one isn't as cool as the code before, because the index is given explicitly:

for ((i = 1; i <= number_of_sites; i++))
do
   echo "Row ${i}: ${site[$i,1]} &  ${site[$i,2]}"
done

Loop through the index of an array

Use the symbol ! to refer to an array's index, rather than its content. Remember that with associative arrays, you define the index yourself. There is no numerical index:

declare -A j

j[fruit,1]=Mango
j[fruit,2]=Apple
j[bird,1]=Cockatail
j[bird,2]=Spottingbird
j[flower,1]=Rose
j[flower,2]=Sunflower
j[animal,1]=Tiger
j[animal,1]=Mouse

for i in "${!j[@]}"
do
   echo "Index: $i"
done

With output:

Index: bird,1
Index: bird,2
Index: flower,1
Index: flower,2
Index: animal,1
Index: fruit,1
Index: fruit,2

Loop over index + value

Again, not very exciting, but maybe instructive at times:

unset j
declare -A j

j[fruit,one]=Mango
j[fruit,two]=Apple
j[bird,1]=Cockatail
j[bird,2]=Spottingbird
j[flower,1]=Rose
j[flower,2]=Sunflower
j[animal]=Tiger

for i in "${!j[@]}"
do
   echo "Index: $i - Value: ${j[$i]}"
done

Output:

Index: bird,1 - Value: Cockatail
Index: bird,2 - Value: Spottingbird
Index: flower,1 - Value: Rose
Index: flower,2 - Value: Sunflower
Index: animal - Value: Tiger
Index: fruit,two - Value: Apple
Index: fruit,one - Value: Mango

Length of an array

Use the symbol # to retrieve the length of an array. Since associative arrays are just vectors with fancy indices, there is only one dimension to retrieve: Its length:

unset j
declare -A j

j[0,0]="00"
j[0,1]="01"
j[1,0]="10"
j[1,1]="11"
j[2,0]="20"
j[2,1]="21"

echo "Length: ${#j[@]}"

Output:

Length: 6
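The # operator only gives the total. To get the length of one emulated row, you have to count the matching keys yourself. A sketch, assuming keys of the form row,column; row and row_length are hypothetical names:

```bash
declare -A j=( [0,0]="00" [0,1]="01" [1,0]="10" [1,1]="11" [2,0]="20" [2,1]="21" )

row=1
row_length=0
for key in "${!j[@]}"; do
  # Match keys that start with "$row," (so row 1 doesn't also match row 11)
  [[ $key == "$row",* ]] && ((++row_length))
done
echo "Entries in row $row: $row_length"
```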

Have a numerical index?

Example: Let's consider a matrix like:

example.com example_com  us_en
example.nl  example_nl   nl_en
example.de  example_de   de_en

Without numerical index

In Sep. 2022, I found it attractive to use an associative array like this:

site[example.com,example_com]
site[example.com,us_en]
site[example.nl,example_nl]
site[example.nl,nl_en]
site[example.de,example_de]
site[example.de,de_en]
site_rows=3
  • Advantage: No additional 'column' for primary keys - Small matrix
  • Disadvantage: It becomes a bit tricky to collect the data that would be part of one 'row': I have to use one of the other fields as a makeshift primary key.

With numerical index

Let's include a numerical index like this:

site[1,example.com example_com]
site[1,example.com us_en]
site[2,example.nl example_nl]
site[2,example.nl nl_en]
site[3,example.de example_de]
site[3,example.de de_en]
site_rows=3
  • Disadvantage: Additional index - But no additional rows
  • Advantage: It's easier to collect the data that would be part of a 'row' as there is now a genuine index.

Despite the additional index, in practice it's much easier to loop through this array and use its entries. E.g.:

for ((i = 1; i <= site_rows; i++))
do
	#
	# Extract row entities
	########################################
	#
	site_cat=${site[$i,cat]}
	site_url=${site[$i,url]}
	site_db=${site[$i,db]}


	# Check
	########################################
	#
	echo ""; echo "### Loop - row: $i - site_url: $site_url"
	#
	echo "	site_cat: $site_cat"
	echo "	site_url: $site_url"
	echo "	site_db:  $site_db"


	# Execute
	########################################
	#
	backup_database
	disable_woocommerce_attribute_lookup
	delete_transients
	wp_update_site
	#
done
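A self-contained sketch of the numerical-index layout, filled with the example matrix from above. The url and db fields come from the loop; lang is a hypothetical name for the third column (us_en etc.). It increments with ((++i)) rather than ((i++)), because ((i++)) returns a non-zero exit status when i was 0, which aborts a script running under set -e:

```bash
unset site
declare -A site
i=0

((++i))
site[$i,url]="example.com"; site[$i,db]="example_com"; site[$i,lang]="us_en"
((++i))
site[$i,url]="example.nl";  site[$i,db]="example_nl";  site[$i,lang]="nl_en"
((++i))
site[$i,url]="example.de";  site[$i,db]="example_de";  site[$i,lang]="de_en"

site_rows=$i   # the row counter doubles as the number of rows

for ((i = 1; i <= site_rows; i++)); do
  echo "Row $i: ${site[$i,url]} - ${site[$i,db]} - ${site[$i,lang]}"
done
```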

Additional index vs. separate array

I experimented with separate arrays for different sites (as part of a server update script), but it didn't work very well: For every array I had to duplicate the loop to do stuff. I also couldn't concatenate these arrays, because there wouldn't be a unique primary key (PK) anymore.

It seemed much easier to create one large table and include an index tag, with values like zwk_woo to indicate customer zwk and a WooCommerce site. In a loop, it's easy to take these into account.
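A sketch of such a tag filter inside the loop, with hypothetical data (the abc_wp and zwk_wp tags are made up for illustration); the pattern *_woo selects the WooCommerce sites:

```bash
declare -A site=(
  [1,url]="example.com" [1,tag]="zwk_woo"
  [2,url]="example.nl"  [2,tag]="abc_wp"
  [3,url]="example.de"  [3,tag]="zwk_wp"
)
site_rows=3

woo_sites=()
for ((i = 1; i <= site_rows; i++)); do
  # Skip rows whose tag doesn't end in _woo
  [[ ${site[$i,tag]} == *_woo ]] || continue
  woo_sites+=("${site[$i,url]}")
  echo "WooCommerce site: ${site[$i,url]}"
done
```

A pattern like zwk_* would select all sites of customer zwk in the same way.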

Detect a missing entry?

Problem

I use an associative array for translations. Consider these items:

((i++))
tr[$i,tag]="_empty_pt_strange"
tr[$i,nl]="Zwuk - Overig"
tr[$i,en]=""


((i++))
tr[$i,tag]="_px_"
tr[$i,nl]="Stofzuiger"
tr[$i,de]="Staubsauger"
  • In the first row, something is translated to an empty string - That's fine!
  • In the second row, there is a German translation, but no English translation.

The problem: How to distinguish between an 'empty translation' and a missing translation?

Ideas

  • In the first example, one of the tags is _empty_, so that could be used, but that's computationally intensive, plus, human as I am, I'm likely to forget to include this tag at times.
  • Can you distinguish between an empty entry and a non-existent entry? → Yes. See solution
  • Can you detect a missing index? → This would be ideal → Nope

Solution

Check that the entry is set:

unset j
declare -gA j

j[one]="Eén"
j[two]=""

[[ -v j[one] ]]   && echo "	true - j[one]"     # True: Entry exists (and may be empty)
[[ -v j[two] ]]   && echo "	true - j[two]"	   # True: Entry exists (and may be empty)
[[ -v j[three] ]] && echo "	true - j[three]"   # False: Entry doesn't exist - unset
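Applied to the translation example, a sketch that lists the rows without an English translation, while accepting an empty but set translation; tr_rows and missing are hypothetical names:

```bash
unset tr
declare -A tr
i=0

((++i))
tr[$i,tag]="_empty_pt_strange"
tr[$i,nl]="Zwuk - Overig"
tr[$i,en]=""

((++i))
tr[$i,tag]="_px_"
tr[$i,nl]="Stofzuiger"
tr[$i,de]="Staubsauger"

tr_rows=$i
missing=()
for ((i = 1; i <= tr_rows; i++)); do
  # -v is true for an empty-but-set entry, false for one that was never assigned
  if [[ ! -v tr[$i,en] ]]; then
    missing+=("$i")
    echo "Row $i (${tr[$i,tag]}): no English translation"
  fi
done
```

On older Bash versions, where -v may not accept array elements, [[ -n ${tr[$i,en]+set} ]] is a commonly used equivalent test.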

Alternatives?

It's still messy. Let's keep an open mind about alternatives:

Awk

And for something entirely different: I keep moving from spreadsheets to associative arrays and back. A while ago I saw Gary Explains: EVERYONE Needs to Learn a Little Bit of AWK! on YouTube. Maybe awk is what I've been looking for all my life?
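To give an impression of what awk offers here: It has real rows (records) and columns (fields). A sketch, assuming the site matrix from above is stored as a whitespace-separated text table in a hypothetical variable sites:

```bash
sites='example.com example_com  us_en
example.nl  example_nl   nl_en
example.de  example_de   de_en'

# Number of rows: awk's record counter NR after the last line
rows=$(awk 'END { print NR }' <<< "$sites")

# Column 2 of the row whose column 1 is example.nl
db=$(awk '$1 == "example.nl" { print $2 }' <<< "$sites")

echo "Rows: $rows - database for example.nl: $db"
```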

Database table

This actually sounds like a perfect job for a database table.

Python?

Maybe use Python for this, rather than Bash?

Spreadsheet?

Maybe retrieve data from a spreadsheet?

A basic reason for not using spreadsheets for this kind of data: Just as you'd use an editor rather than a word processor for programming, a spreadsheet isn't precise enough. It applies auto-corrections like capitalisation and changing dashes, and it can't store whitespace reliably.

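See also

  • Awk
  • Declare (Bash)
  • String comparison (Bash)
  • Subshells (Bash)
  • Substring extraction (Bash)
  • Unset (Bash)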

Sources

  • [1] https://stackoverflow.com/questions/12944674/how-to-export-an-associative-array-hash-in-bash