Regular expressions (Bash)

From De Vliegende Brigade

Regular expressions (or regex, to make it sound more mysterious) are the black magic of programming. It's like doing mathematics with text. Or, more specifically: advanced pattern matching.

Places where you find regex:

  • In grep, for filtering the output of a command
  • In comparisons inside [[ ]], using the =~ operator
  • In sed, awk and probably lots of other places

Bash's =~ operator uses POSIX extended regular expressions (ERE), which are not quite the same dialect as the one MySQL uses, hence some more details in this article.

Match a substring

Probably the easiest case: no special characters needed. The comparison is true if the pattern occurs anywhere in the string:

i="blub"; [[ $i =~ blubber ]] && echo "i contains the substring 'blubber' "   # False
i="blub"; [[ $i =~ blub ]] && echo "i contains the substring 'blub' "   # True
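
One detail worth knowing at this point: in Bash 3.2 and later, any quoted part of the pattern to the right of =~ is matched literally instead of being treated as a regex. A minimal sketch, with a made-up sample value and made-up variable names:

```bash
#!/usr/bin/env bash
# Sample value, made up for illustration.
i="blxb"

# Unquoted: '.' is a regex metacharacter and matches any single character.
[[ $i =~ bl.b ]] && unquoted="match" || unquoted="no match"

# Quoted: the pattern is taken literally, so a real '.' would be required.
[[ $i =~ "bl.b" ]] && quoted="match" || quoted="no match"

# A pattern stored in a variable is treated as a regex when expanded unquoted.
re='bl.b'
[[ $i =~ $re ]] && from_var="match" || from_var="no match"

echo "unquoted: $unquoted, quoted: $quoted, from variable: $from_var"
```

Storing the pattern in a variable, as in the last case, tends to be the least surprising approach once patterns contain spaces or special characters.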

Match a single digit

[] denotes a bracket expression: it matches exactly one character out of the set listed between the brackets. The comparison is therefore true as soon as the string contains at least one of those characters.

i="a2"; [[ $i =~ [2] ]] && echo "i contains '2'"   # True
i="a3"; [[ $i =~ [12] ]] && echo "i contains '1' and/or '2'"   # False

Match ranges of numbers or letters

i="blub"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # False
i="blu1"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # True
i="1111"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # True

i="blub"; [[ $i =~ [A-Z] ]] && echo "i contains at least one uppercase letter"   # False
i="BLUB"; [[ $i =~ [A-Z] ]] && echo "i contains at least one uppercase letter"   # True
i="blub"; [[ $i =~ [a-z] ]] && echo "i contains at least one lowercase letter"   # True

i="blub"; [[ $i =~ [a-zA-Z] ]] && echo "i contains at least one letter"   # True

Using grep (quote the pattern, so the shell doesn't expand it as a filename glob):

echo 12345 | grep '[0-9]'
12345
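
Ranges such as [A-Z] can behave differently depending on the locale's collation order. POSIX character classes like [[:digit:]] and [[:upper:]] are an alternative that doesn't depend on it. A small sketch, using a made-up sample value and variable names:

```bash
#!/usr/bin/env bash
# Sample value, made up for illustration.
i="blu1"

# [[:digit:]], [[:upper:]] and [[:alpha:]] are POSIX character classes;
# they are written inside a bracket expression, hence the double brackets.
[[ $i =~ [[:digit:]] ]] && has_digit=yes || has_digit=no
[[ $i =~ [[:upper:]] ]] && has_upper=yes || has_upper=no
[[ $i =~ [[:alpha:]] ]] && has_alpha=yes || has_alpha=no

echo "digit: $has_digit, uppercase: $has_upper, letter: $has_alpha"
```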

Sequences

  • ^: Beginning of the string
  • $: End of the string
  • +: One or more of the preceding item

i="BLuB"; [[ $i =~ ^[A-Z]+$ ]] && echo "i contains only capital letters"   # False
i="BLUB"; [[ $i =~ ^[A-Z]+$ ]] && echo "i contains only capital letters"   # True
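
Anchoring both ends like this is the usual way to validate a whole value instead of searching for a substring. A minimal sketch, where the function name is_integer is only an illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the whole argument is an
# optionally negative integer. ^ and $ force the pattern to cover
# the entire string instead of just a part of it.
is_integer() {
	[[ $1 =~ ^-?[0-9]+$ ]]
}

for value in 42 -7 4x2 ""
do
	if is_integer "$value"
	then
		echo "'$value' is an integer"
	else
		echo "'$value' is not an integer"
	fi
done
```

Without the anchors, '4x2' would pass as well, because it contains a digit somewhere.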

Capture group

Really cool stuff. See Substring extraction (Bash).
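
As a quick taste: after a successful =~ match, Bash stores the whole match in BASH_REMATCH[0] and each parenthesised group in BASH_REMATCH[1], BASH_REMATCH[2] and so on. A sketch with a made-up filename:

```bash
#!/usr/bin/env bash
# Sample filename, made up for illustration.
file="report_2022-10.txt"

# Three capture groups: name, year and month.
re='^([a-z]+)_([0-9]{4})-([0-9]{2})\.txt$'

if [[ $file =~ $re ]]
then
	name="${BASH_REMATCH[1]}"
	year="${BASH_REMATCH[2]}"
	month="${BASH_REMATCH[3]}"
	echo "name=$name year=$year month=$month"
fi
```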

Logical OR
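
In extended regular expressions, | separates alternatives, and parentheses limit how far the alternation reaches. A minimal sketch, where the function name is_image_ext and the list of extensions are just assumptions for the example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the argument is exactly one of the
# listed extensions. The parentheses keep ^ and $ applied to every
# alternative, not only to the first and the last one.
is_image_ext() {
	local re='^(jpg|jpeg|png)$'
	[[ $1 =~ $re ]]
}

for ext in jpg png gif
do
	if is_image_ext "$ext"
	then
		echo "$ext: image"
	else
		echo "$ext: other"
	fi
done
```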

Filter using regex

How can I filter stuff? E.g.:

12.5(diameter) → 12.5

Some tentative impressions:
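
One possible approach, as a sketch rather than a definitive answer: capture the leading number with a group and read it back from BASH_REMATCH. If the unwanted part always starts at the first '(', plain parameter expansion does the job without any regex. The variable names are made up:

```bash
#!/usr/bin/env bash
input="12.5(diameter)"

# Option 1: capture a leading number, with an optional decimal part.
re='^([0-9]+(\.[0-9]+)?)'
number=""
if [[ $input =~ $re ]]
then
	number="${BASH_REMATCH[1]}"
fi

# Option 2: no regex at all, strip everything from the first '(' onwards.
stripped="${input%%\(*}"

echo "regex: $number, parameter expansion: $stripped"
```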

Case: Tags & associative arrays - 2022.10

Everything works:

################################################################################
# Compare tags
################################################################################
#
source load_site_array.sh


# Show all entries
########################################
#
# echo ${site_array[@]}	# Show all values - Kinda messy
# echo ${#site_array[@]}	# Show number of entries - Doesn't say much
#
# for i in "${site_array[@]}"
# do
# 	echo "	Entry: $i"
# done


# for i in "${!site_array[@]}"
# do
# 	echo "	Entry: $i"
# done

echo "site_array_rows: $site_array_rows"

for i in $(seq 0 "$site_array_rows")
do
	echo "Row $i:"
	echo "	Tag: ${site_array[$i,tag]}"
	echo "	URL: ${site_array[$i,url]}"


	# OK - Compare - letter 'a'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ [a] ]]
	then
		#
		echo "		Tag contains an 'a'"
	else
		echo "		Tag doesn't contain an 'a'"		
	fi		


	# OK - Compare - word 'bal'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ bal ]]
	then
		#
		echo "		Tag contains word 'bal'"
	else
		echo "		Tag doesn't contain word 'bal'"		
	fi		


	# OK - Compare - word '_bal_'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]]
	then
		#
		echo "		Tag contains word '_bal_'"
	else
		echo "		Tag doesn't contain word '_bal_'"		
	fi


	# OK - Compare - word '_bal_' & '_dvb8_'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && [[ "${site_array[$i,tag]}" =~ _dvb8_ ]]
	then
		#
		echo "		Tag contains words '_bal_' & '_dvb8_'"
	else
		echo "		Tag doesn't contain words '_bal_' & '_dvb8_'"		
	fi


	# OK - Compare - word '_bal_' & '_dvb8_' - Split lines
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _dvb8_ ]]
	then
		#
		echo "		Tag contains words '_bal_' & '_dvb8_'"
	else
		echo "		Tag doesn't contain words '_bal_' & '_dvb8_'"		
	fi


	# OK - Compare - 3 words + Split lines
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _dvb8_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _cb_ ]]
	then
		#
		echo "		Tag contains words '_bal_', '_dvb8_' & '_cb_'"
	else
		echo "		Tag doesn't contain words '_bal_', '_dvb8_' & '_cb_'"
	fi
	#
done
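
The chained comparisons in the script above could also be folded into a small helper that loops over the wanted tags. This is only a sketch: the function name has_all_tags and the sample tag string are made up, and the real tag format in site_array may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the first argument matches every
# pattern given in the remaining arguments.
has_all_tags() {
	local tag_string=$1
	shift
	local wanted
	for wanted in "$@"
	do
		# Unquoted, so each argument is treated as a regex.
		# Write "$wanted" instead to force a literal comparison.
		[[ $tag_string =~ $wanted ]] || return 1
	done
	return 0
}

# Sample tag string, made up for illustration.
sample_tag="_bal_dvb8_cb_"

if has_all_tags "$sample_tag" _bal_ _dvb8_ _cb_
then
	echo "Tag contains words '_bal_', '_dvb8_' & '_cb_'"
else
	echo "Tag doesn't contain words '_bal_', '_dvb8_' & '_cb_'"
fi
```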
