Associative arrays (Bash)
An associative array is an array where the index can be symbolic, rather than only numerical. E.g.:
    declare -A j
    j[fruit]=apple
    j[color]=blue
- You can use associative arrays to mimic multi-dimensional arrays, with the emphasis on "mimic".
- Associative arrays are new in Bash 4. To verify which version of Bash you have:
    bash --version
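The same check can be done from inside a script. This is a minimal sketch, assuming the script is run by Bash itself; `BASH_VERSINFO` and `BASH_VERSION` are Bash built-in variables, the messages are just illustrations:

```shell
# BASH_VERSINFO is a built-in array; element 0 holds the major version number.
if (( BASH_VERSINFO[0] >= 4 )); then
    declare -A j
    j[fruit]=apple
    echo "Associative arrays available: ${j[fruit]}"
else
    echo "Bash ${BASH_VERSION} is too old for associative arrays" >&2
fi
```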
There are no multidimensional arrays
Bash doesn't have multidimensional arrays (as of 2022.09.29). Associative arrays aren't multidimensional arrays either, but they can emulate them. The emulation has some limitations, and these can be tricky if you're not aware of them.
As an example:
    unset j
    declare -A j
    j[0,0]="00"
    j[0,1]="01"
    j[0, 1]="0 1"
    for i in "${!j[@]}"
    do
        echo "Index: $i - Value: ${j[$i]}"
    done
    echo "Length of this array: ${#j[@]}"
Output:
    Index: 0, 1 - Value: 0 1
    Index: 0,1 - Value: 01
    Index: 0,0 - Value: 00
    Length of this array: 3
What this shows:
- The entry with index [0, 1] is not the same as the entry with index [0,1]. This shows that everything between [] is regarded as just one index, and not as something multidimensional.
- When retrieving the dimension of the array, it returns only one number, because it's still just a vector.
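One way to avoid accidental duplicates like [0, 1] versus [0,1] is to build every key through a single helper that strips whitespace. A minimal sketch; the function name `cell_key` is hypothetical and not part of Bash:

```shell
declare -A j

# Hypothetical helper: build a "row,col" key with all whitespace removed.
cell_key() {
    local key="$1,$2"
    printf '%s' "${key//[[:space:]]/}"
}

key=$(cell_key 0 1)
j[$key]="01"

key=$(cell_key "0 " " 1")   # normalizes to the same key: 0,1
j[$key]="overwritten"

echo "Number of entries: ${#j[@]}"
echo "Value at 0,1: ${j[0,1]}"
```

Because both calls produce the key 0,1, the array ends up with one entry instead of two.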
But does this actually matter? Sometimes it probably doesn't: It took me a while between adopting associative arrays and realizing their limitations. So far, these are the issues I've encountered:
- Retrieve number of rows? There is no meaningful way of retrieving the number of rows, as the matrix is actually turned into a vector
- Iterate over rows? As you don't have real rows, you can't iterate over them either
- Retrieve a specific entry: This is probably the hardest challenge: How to retrieve an entry with a specific index? In an efficient way, concerning CPU time and code overhead?
- More?
These issues are discussed in the following sections:
Retrieve number of rows or columns?
A problem with these 'emulated' multi-dimensional arrays: You can't read out the number of rows or columns, for there are no real rows and columns. Illustration:
    # The array below is 3x2
    #
    unset j
    declare -A j
    j[1,1]="11"
    j[1,2]="12"
    j[2,1]="21"
    j[2,2]="22"
    j[3,1]="31"
    j[3,2]="32"
    echo "Length: ${#j[@]}"
    echo "Complete array: ${j[@]}"
Output:
    Length: 6
    Complete array: 21 22 31 32 12 11
This can be a problem when you want to loop over the rows: you can't do that directly. But see below for a solution.
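Although Bash doesn't track rows and columns, they can be derived from the keys themselves. The sketch below assumes every key has the form row,column without spaces; the names `seen_rows` and `seen_cols` are made up for this illustration:

```shell
declare -A j
j[1,1]="11"; j[1,2]="12"
j[2,1]="21"; j[2,2]="22"
j[3,1]="31"; j[3,2]="32"

# Collect the distinct row parts and column parts of all keys.
declare -A seen_rows seen_cols
for key in "${!j[@]}"; do
    row=${key%%,*}   # text before the first comma
    col=${key#*,}    # text after the first comma
    seen_rows[$row]=1
    seen_cols[$col]=1
done

echo "Rows: ${#seen_rows[@]}"       # Rows: 3
echo "Columns: ${#seen_cols[@]}"    # Columns: 2
```

This costs one pass over the whole array, so for large arrays it may be cheaper to store the dimensions separately.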
Iterate over rows?
If there is not really such a thing as rows and columns, then how to iterate over them?
That's probably not so difficult. Just remember that the 'multidimensional index' is just one index, and that you can iterate over it. E.g.:
    for i in "${!imwiz[@]}"
    do
        echo "Index: $i"
        echo "Value: ${imwiz[$i]}"
    done
Retrieve a specific entry?
How to retrieve a specific entry? I don't have the final answer yet, but it's coming: Substring extraction (Bash).
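In the meantime, a direct lookup with variables in the index may already cover many cases. This is only a sketch, not the final answer mentioned above: the helper name `has_cell` is hypothetical, and the `${j[$key]+set}` expansion is used to tell "missing" apart from "present but empty":

```shell
declare -A j
j[1,1]="11"; j[1,2]="12"
j[2,1]="21"; j[2,2]=""      # present, but empty

# Hypothetical helper: succeeds if cell (row, col) exists in array j.
has_cell() {
    local key="$1,$2"
    [[ -n ${j[$key]+set} ]]
}

row=2; col=1
echo "Cell $row,$col: ${j[$row,$col]}"

if has_cell 3 1; then
    echo "Cell 3,1 exists"
else
    echo "Cell 3,1 does not exist"
fi
```

The lookup itself is a single hash access, so no loop over the array is needed.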
Conventions
To alleviate some of the issues discussed before:
- When arrays have a numerical index, always start with the same number ⇒ I prefer base 1, just as in matrix algebra
- For two-dimensional arrays, the first dimension is always rows (x) and the second is always columns (y), just as in matrix algebra
- Don't use spaces around the , that separates indices: You need a uniform syntax, and using spaces actually messes up language highlighting in Sublime Text
- Store the dimensions in associated variables (when needed). E.g., in the example above: j_rows=3 and j_columns=2
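With these conventions in place, a row-by-row walk becomes a pair of nested loops. A minimal sketch using the 3x2 array from earlier, with `j_rows` and `j_columns` maintained by hand as suggested above:

```shell
declare -A j
j[1,1]="11"; j[1,2]="12"
j[2,1]="21"; j[2,2]="22"
j[3,1]="31"; j[3,2]="32"
j_rows=3
j_columns=2

for ((r = 1; r <= j_rows; r++)); do
    line=""
    for ((c = 1; c <= j_columns; c++)); do
        line+="${j[$r,$c]} "
    done
    echo "Row $r: ${line% }"   # ${line% } drops the trailing space
done
```

Unlike a loop over "${!j[@]}", this visits the cells in a predictable order.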
Examples
As stated above, these are not really multidimensional arrays, just arrays with fancy indices. It doesn't matter if these indices are numerical or symbolic:
    declare -A j
    j[0,0,0]="000"
    j[0,0,1]="001"
    j[0,1,0]="010"
    j[0,1,1]="011"
    j[1,0,0]="100"
    j[1,0,1]="101"
    j[1,1,0]="110"
    j[1,1,1]="111"
    echo "${j[0,0,0]} ${j[0,0,1]} ${j[0,1,0]} ${j[0,1,1]}"
    echo "${j[1,0,0]} ${j[1,0,1]} ${j[1,1,0]} ${j[1,1,1]}"

    unset j
    declare -A j
    j[fruit,one]=Mango
    j[fruit,two]=Apple
    j[bird,1]=Cockatail
    j[bird,2]=Spottingbird
    j[flower,1]=Rose
    j[flower,2]=Sunflower
    j[animal]=Tiger
    for i in "${j[@]}"
    do
        echo "Entry: $i"
    done
Output:
    Entry: Cockatail
    Entry: Spottingbird
    Entry: Rose
    Entry: Sunflower
    Entry: Tiger
    Entry: Apple
    Entry: Mango
Note that the entries appear in arbitrary order: Bash does not guarantee any particular order for the entries of an associative array.
Loop through an array
Again: These are not multidimensional arrays. They only seem that way.....
This works [1]:
    declare -A j
    j[fruit]=Mango
    j[bird]=Cockatail
    j[flower]=Rose
    j[animal]=Tiger
    for i in "${j[@]}"
    do
        echo "Entry: $i"
    done
Output:
    Entry: Mango
    Entry: Rose
    Entry: Tiger
    Entry: Cockatail
This works:
    declare -A j
    j[fruit,1]=Mango
    j[fruit,2]=Apple
    j[bird,1]=Cockatail
    j[bird,2]=Spottingbird
    j[flower,1]=Rose
    j[flower,2]=Sunflower
    j[animal,1]=Tiger
    j[animal,1]=Mouse    # same index as the line above, so Tiger is overwritten
    for i in "${j[@]}"
    do
        echo "Entry: $i"
    done
Output:
    Entry: Cockatail
    Entry: Spottingbird
    Entry: Rose
    Entry: Sunflower
    Entry: Mouse
    Entry: Mango
    Entry: Apple
The only problem: The entries seem to be quite random. This is also the case if I insert the statement unset j at the beginning of the script.
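If a predictable order matters, one option is to sort the indices first and then loop over the sorted list. A minimal sketch, assuming Bash 4 (for `mapfile`) and keys that contain no newlines; `sorted_keys` is a name chosen for this illustration:

```shell
declare -A j
j[fruit,1]=Mango
j[fruit,2]=Apple
j[bird,1]=Cockatail
j[bird,2]=Spottingbird

# Print one key per line, sort them, and read them back into an indexed array.
mapfile -t sorted_keys < <(printf '%s\n' "${!j[@]}" | LC_ALL=C sort)

for key in "${sorted_keys[@]}"; do
    echo "Index: $key - Value: ${j[$key]}"
done
```

With these keys, the loop visits bird,1, bird,2, fruit,1 and fruit,2 in that order on every run.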
Looping through the rows of an associative array: This one isn't as cool as the code before, because the index is given explicitly (number_of_sites and site are assumed to be defined already):
    for ((i=1; i<=number_of_sites; i++))
    do
        echo "Row ${i}: ${site[$i,1]} & ${site[$i,2]}"
    done
Loop through the index of an array
Use the symbol ! to refer to an array's index, rather than its content. Remember that with associative arrays, you define the index yourself. There is no numerical index:
    declare -A j
    j[fruit,1]=Mango
    j[fruit,2]=Apple
    j[bird,1]=Cockatail
    j[bird,2]=Spottingbird
    j[flower,1]=Rose
    j[flower,2]=Sunflower
    j[animal,1]=Tiger
    j[animal,1]=Mouse    # same index as the line above, so it appears only once
    for i in "${!j[@]}"
    do
        echo "Index: $i"
    done
With output:
    Index: bird,1
    Index: bird,2
    Index: flower,1
    Index: flower,2
    Index: animal,1
    Index: fruit,1
    Index: fruit,2
Loop over index + value
Again, not very exciting, but maybe instructive at times:
    unset j
    declare -A j
    j[fruit,one]=Mango
    j[fruit,two]=Apple
    j[bird,1]=Cockatail
    j[bird,2]=Spottingbird
    j[flower,1]=Rose
    j[flower,2]=Sunflower
    j[animal]=Tiger
    for i in "${!j[@]}"
    do
        echo "Index: $i - Value: ${j[$i]}"
    done
Output:
    Index: bird,1 - Value: Cockatail
    Index: bird,2 - Value: Spottingbird
    Index: flower,1 - Value: Rose
    Index: flower,2 - Value: Sunflower
    Index: animal - Value: Tiger
    Index: fruit,two - Value: Apple
    Index: fruit,one - Value: Mango
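Since the index is just a string, the same loop can be narrowed to one 'row' by matching on the first part of the index. A minimal sketch, assuming the row name comes before the first comma; `fruit_count` is a name chosen for this illustration:

```shell
declare -A j
j[fruit,one]=Mango
j[fruit,two]=Apple
j[bird,1]=Cockatail
j[animal]=Tiger

fruit_count=0
for key in "${!j[@]}"; do
    # Keep only the indices that start with "fruit,".
    if [[ $key == fruit,* ]]; then
        echo "Index: $key - Value: ${j[$key]}"
        fruit_count=$((fruit_count + 1))
    fi
done
echo "Entries in row 'fruit': $fruit_count"
```

The two fruit entries may be printed in either order, but the count is always 2 for this data.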
Length of an array
Use the symbol # to retrieve the length of an array. Since associative arrays are just vectors with fancy indices, there is only one dimension to retrieve: its length:
    unset j
    declare -A j
    j[0,0]="00"
    j[0,1]="01"
    j[1,0]="10"
    j[1,1]="11"
    j[2,0]="20"
    j[2,1]="21"
    echo "Length: ${#j[@]}"
Output:
    Length: 6
Have a numerical index?
Example: Let's consider a matrix like:
    example.com   example_com   us_en
    example.nl    example_nl    nl_en
    example.de    example_de    de_en
Without numerical index
In Sep. 2022, I found it attractive to use an associative array like this:
    site[example.com,example_com]
    site[example.com,us_en]
    site[example.nl,example_nl]
    site[example.nl,nl_en]
    site[example.de,example_de]
    site[example.de,de_en]
- Advantage: No additional 'column' for primary keys, so the matrix stays small
- Disadvantage: It becomes a bit tricky to collect the data that would be part of one 'row': I have to use one of the other fields as a makeshift primary key.
With numerical index
Let's include a numerical index like this:
    site[1,example.com example_com]
    site[1,example.com us_en]
    site[2,example.nl example_nl]
    site[2,example.nl nl_en]
    site[3,example.de example_de]
    site[3,example.de de_en]
- Disadvantage: Additional index, but no additional rows
- Advantage: It's easier to collect the data that would be part of a 'row' as there is now a genuine index.
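To illustrate how a numerical row index makes it easier to collect a 'row', here is one possible layout. It is a sketch and not the exact key format shown above: the column names `domain`, `name` and `locale` are hypothetical, and `number_of_sites` is maintained by hand:

```shell
declare -A site
columns=(domain name locale)   # hypothetical column names

site[1,domain]=example.com; site[1,name]=example_com; site[1,locale]=us_en
site[2,domain]=example.nl;  site[2,name]=example_nl;  site[2,locale]=nl_en
site[3,domain]=example.de;  site[3,name]=example_de;  site[3,locale]=de_en
number_of_sites=3

for ((i = 1; i <= number_of_sites; i++)); do
    row=""
    for col in "${columns[@]}"; do
        row+="${site[$i,$col]} "
    done
    echo "Row $i: ${row% }"
done
```

The row number acts as the primary key, and the ordinary indexed array `columns` fixes the column order.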
Alternatives?
It's still messy. Let's have an open mind concerning alternatives:
Awk
And now for something entirely different: I kinda move back and forth between spreadsheets and associative arrays. A while ago I saw on YouTube "Gary Explains: EVERYONE Needs to Learn a Little Bit of AWK!". Maybe awk is what I have been looking for my whole life?
Database table
This actually sounds like a perfect job for a database table.
Python?
Maybe use Python for this, rather than Bash?