Regular expressions (Bash)

From De Vliegende Brigade

Current version as of 31 Oct 2022, 14:19

Regular expressions, or regex to make it sound more mysterious, are the black magic of programming. It's like doing mathematics with text. Or, slightly more specifically: advanced pattern matching.

Places where you find regex:

  • In grep, for filtering the output of a command
  • In comparisons, using the =~ operator inside [[ ]]
  • In many other tools, such as sed and awk

Regular expressions in Bash are not identical to those in MySQL: Bash's =~ operator uses POSIX extended regular expressions (ERE), whereas MySQL has its own regex implementation. Hence some more details in this article.

Match a substring

Probably the easiest case: no special characters needed. The test succeeds when the pattern occurs anywhere in the string:

i="blub"; [[ $i =~ blubber ]] && echo "i contains the substring 'blubber' "   # False
i="blub"; [[ $i =~ blub ]] && echo "i contains the substring 'blub' "   # True
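
One pitfall worth noting with =~: the right-hand side is only treated as a regex when it is unquoted. In Bash 3.2 and later, quoting the pattern forces a literal string comparison. A small sketch (the values and the variable name re are illustrative, not from the original examples):

```bash
#!/usr/bin/env bash
# Unquoted: '.' is a regex metacharacter that matches any single character.
i="blXb"; [[ $i =~ bl.b ]] && echo "regex match"              # True

# Quoted: '.' is matched literally, so "blXb" does not contain "bl.b".
i="blXb"; [[ $i =~ "bl.b" ]] || echo "no literal match"       # prints "no literal match"

# A pattern stored in a variable and expanded unquoted keeps its regex meaning.
re='^bl.b$'
i="blub"; [[ $i =~ $re ]] && echo "variable pattern match"    # True
```

Keeping the pattern in a variable is a common way to avoid quoting problems with more complex expressions.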

Match a single digit

[] is a bracket expression: it matches exactly one character out of the set listed between the brackets. Because the pattern is not anchored, the comparison is true as soon as the string contains at least one of those characters anywhere.

i="a2b"; [[ $i =~ [2] ]] && echo "i contains '2'"   # True
i="a2b"; [[ $i =~ [12] ]] && echo "i contains '1' and/or '2'"   # True

Match ranges of numbers or letters

i="blub"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # False
i="blu1"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # True
i="1111"; [[ $i =~ [0-9] ]] && echo "i contains a number"   # True

i="blub"; [[ $i =~ [A-Z] ]] && echo "i contains at least one uppercase letter"   # False
i="BLUB"; [[ $i =~ [A-Z] ]] && echo "i contains at least one uppercase letter"   # True
i="blub"; [[ $i =~ [a-z] ]] && echo "i contains at least one lowercase letter"   # True

i="blub"; [[ $i =~ [a-zA-Z] ]] && echo "i contains at least one letter"   # True
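
Ranges such as [A-Z] may behave differently depending on the locale's collation order. POSIX character classes are an alternative that names the intended set directly. A minimal sketch, assuming the same kind of test strings as above:

```bash
#!/usr/bin/env bash
i="BLuB"
# [[:upper:]], [[:lower:]] and [[:digit:]] are POSIX character classes;
# like ranges, they are written inside a bracket expression.
[[ $i =~ [[:upper:]] ]] && echo "i contains at least one uppercase letter"   # True
[[ $i =~ [[:lower:]] ]] && echo "i contains at least one lowercase letter"   # True
[[ $i =~ [[:digit:]] ]] || echo "i contains no digit"                        # prints "i contains no digit"
```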

Using grep:

echo 12345 | grep '[0-9]'   # Quote the pattern so the shell does not expand it as a glob
12345

Sequences

  • ^: Beginning of the string
  • $: End of the string
  • +: One or more repetitions of the preceding item
i="BLuB"; [[ $i =~ ^[A-Z]+$ ]] && echo "i contains only capital letters"   # False
i="BLUB"; [[ $i =~ ^[A-Z]+$ ]] && echo "i contains only capital letters"   # True

Capture group

Really cool stuff. See the article Substring extraction (Bash).
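
As a brief illustration of the idea: after a successful =~ match, Bash stores the whole match in BASH_REMATCH[0] and each parenthesised group in BASH_REMATCH[1], BASH_REMATCH[2], and so on. The input string and the variable names s and re below are made up for this sketch:

```bash
#!/usr/bin/env bash
# Hypothetical input: a name followed by a major.minor version number.
s="bash-5.2"
re='^([a-z]+)-([0-9]+)\.([0-9]+)$'

if [[ $s =~ $re ]]; then
    echo "Whole match: ${BASH_REMATCH[0]}"   # bash-5.2
    echo "Name:        ${BASH_REMATCH[1]}"   # bash
    echo "Major:       ${BASH_REMATCH[2]}"   # 5
    echo "Minor:       ${BASH_REMATCH[3]}"   # 2
fi
```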

Logical OR
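
In extended regular expressions, which both =~ and grep -E use, the | character expresses alternation: the pattern matches when either side matches. A minimal sketch with made-up values:

```bash
#!/usr/bin/env bash
# Parentheses group the alternatives so that ^ and $ apply to both of them.
i="dvb8"; [[ $i =~ ^(bal|dvb8)$ ]] && echo "i is 'bal' or 'dvb8'"   # True
i="cb";   [[ $i =~ ^(bal|dvb8)$ ]] && echo "i is 'bal' or 'dvb8'"   # False

# The same alternation with grep; -E selects extended regular expressions.
printf '%s\n' bal cb dvb8 | grep -E 'bal|dvb8'   # prints the lines "bal" and "dvb8"
```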

Filter using regex

How can I strip unwanted text from a string? E.g.:

12.5(diameter) → 12.5

Some tentative impressions:
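
One possible approach to the example above, offered as a sketch rather than a definitive answer: anchor a pattern for the number at the start of the string and read the matched text from BASH_REMATCH[0]. The function name leading_number is hypothetical, and the sketch assumes the number always comes first.

```bash
#!/usr/bin/env bash
# leading_number (hypothetical helper): print the decimal number at the
# start of the argument; return 1 if the argument does not start with one.
leading_number() {
    local re='^[0-9]+(\.[0-9]+)?'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[0]}"
    else
        return 1
    fi
}

leading_number "12.5(diameter)"   # prints 12.5

# Alternative with sed: delete everything from the first '(' onwards.
# In sed's default (basic) regex syntax an unescaped '(' is a literal character.
echo "12.5(diameter)" | sed 's/(.*$//'   # prints 12.5
```

The first variant keeps what matches, the second removes what does not belong; which one fits better depends on how regular the input is.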

Case: Tags & associative arrays - 2022.10

Everything works:

################################################################################
# Compare tags
################################################################################
#
source load_site_array.sh


# Show all entries
########################################
#
# echo "${site_array[@]}"	# Show all values - Kinda messy
# echo "${#site_array[@]}"	# Number of entries - Doesn't say much
#
# for i in "${site_array[@]}"
# do
# 	echo "	Entry: $i"
# done


# for i in "${!site_array[@]}"
# do
# 	echo "	Entry: $i"
# done

echo "site_array_rows: $site_array_rows"

for i in $(seq 0 "$site_array_rows")
do
	echo "Row $i:"
	echo "	Tag: ${site_array[$i,tag]}"
	echo "	URL: ${site_array[$i,url]}"


	# OK - Compare - letter 'a'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ [a] ]]
	then
		#
		echo "		Tag contains an 'a'"
	else
		echo "		Tag doesn't contain an 'a'"		
	fi		


	# OK - Compare - word 'bal'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ bal ]]
	then
		#
		echo "		Tag contains word 'bal'"
	else
		echo "		Tag doesn't contain word 'bal'"		
	fi		


	# OK - Compare - word '_bal_'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]]
	then
		#
		echo "		Tag contains word '_bal_'"
	else
		echo "		Tag doesn't contain word '_bal_'"		
	fi


	# OK - Compare - word '_bal_' & '_dvb8_'
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && [[ "${site_array[$i,tag]}" =~ _dvb8_ ]]
	then
		#
		echo "		Tag contains words '_bal_' & '_dvb8_'"
	else
		echo "		Tag doesn't contain words '_bal_' & '_dvb8_'"		
	fi


	# OK - Compare - word '_bal_' & '_dvb8_' - Split lines
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _dvb8_ ]]
	then
		#
		echo "		Tag contains words '_bal_' & '_dvb8_'"
	else
		echo "		Tag doesn't contain words '_bal_' & '_dvb8_'"		
	fi


	# OK - Compare - 3 words + Split lines
	########################################
	#
	if [[ "${site_array[$i,tag]}" =~ _bal_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _dvb8_ ]] && \
	   [[ "${site_array[$i,tag]}" =~ _cb_ ]]
	then
		#
		echo "		Tag contains words '_bal_', '_dvb8_' & '_cb_'"
	else
		echo "		Tag doesn't contain words '_bal_', '_dvb8_' & '_cb_'"
	fi
	#
done
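
The script above depends on load_site_array.sh, which is not shown here. The following is a hypothetical stand-in, not the actual file: it assumes site_array is an associative array with "row,column" keys (Bash has no real two-dimensional arrays) and that site_array_rows holds the index of the last row. The data values and the helper tag_has_all are invented for illustration; the helper shows one way the repeated && chains could be condensed.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for load_site_array.sh (all values are made up).
declare -A site_array
site_array[0,tag]="_bal_dvb8_cb_"
site_array[0,url]="https://example.com/a"
site_array[1,tag]="_bal_"
site_array[1,url]="https://example.com/b"
site_array_rows=1   # Assumption: index of the last row, not the row count

# tag_has_all (hypothetical helper): succeed only if the tag in row $1
# matches every regex passed in the remaining arguments.
tag_has_all() {
    local row=$1 re
    shift
    for re in "$@"; do
        [[ ${site_array[$row,tag]} =~ $re ]] || return 1
    done
}

for i in $(seq 0 "$site_array_rows"); do
    if tag_has_all "$i" _bal_ _dvb8_ _cb_; then
        echo "Row $i: tag contains all three words"
    else
        echo "Row $i: tag is missing at least one word"
    fi
done
```

With the sample data, row 0 contains all three words and row 1 does not.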

See also

Sources