Serialised data (WordPress)
Soms bevatten database-velden serialised data: Meerdere gegevens in één veld, die zijn samengevoegd middels een bepaalde syntaxis.
Je kunt zeggen dat serialised data nogal haaks staat op het concept van databases. Aan de andere kant: Het biedt flexibiliteit die anders soms niet mogelijk is. Daarnaast wordt dit juist gebruikt om objecten op te slaan
Voorbeeld
In tabel wp_options
wordt in het veld widget_woocommerce_layered_nav
alle Woocommerce-widgets gecodeerd. Dit is een voorbeeld van een webwinkel in okt. 2020.
Rauwe inhoud:
a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}
De eerste paar regels in meer leesbare vorm:
a:14: { i:4;a:4: { s:5:"title"; s:21:"Originele pripripripr"; s:9:"attribute"; s:21:"originele pripripripr"; s:12:"display_type"; s:4:"list"; s:10:"query_type"; s:3:"and"; } i:5;a:4: { s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:6;a:4: { s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:7;a:4: { s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:8;a:4: { s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:9;a:4: { s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } }
PHP: Serialize & unserialize
WordPress gebruikt de PHP-functie serialize()
om data te serializen voor opslag in de database en de functie unserialize
voor de tegenovergestelde bewerking.
Voorbeeld [1]:
$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}'; var_dump(unserialize($serialized)); Output: Array ( [0] => apple [1] => banana [2] => orange )
WP-CLI
De WordPress Command Line Interface kan verrassend goed overweg met serialised data in tabel wp_options
. Zie WP-CLI (commando's) » wp option
voor details.
MySQL
MySQL lijkt geen ingebouwde functies te hebben die vergelijkbaar zijn met de PHP-functies serialize en unserialize. Dit had overigens best gekund: MySQL heeft bv. wel functies om met XML om te gaan. Een impressie:
Algemeen
- Tot op heden (2021.03.02) heb ik geen libraries of sprocs of scripts gevonden die dit kunnen [2]. En ook dat had best anders kunnen zijn: Zo ingewikkeld lijkt dit niet te zijn
- Ik heb een primitieve parser voor find-en-replace. Zoiets als de ingebouwde functie
replace
, maar dan voor strings met serialised content met een beperkte vaste structuur - Zie verderop in dit artikel
Recursie
- MySQL kent recursie en limited variable scope - Concepten die vermoedelijk goed van pas komen als ik ooit een flexibelere parser wil bouwen
- https://stackoverflow.com/questions/20215744/how-to-create-a-mysql-hierarchical-recursive-query
Syntaxis
Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:
- Data-elementen zijn omhuld door
{
en}
a
: Arrayi
: Counter; indexs
: String + lengte. Bv.s:5"apple"
.
Stringlengte
Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.
Voorbeelden:
# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte # s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long # s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"') # # * Balais de charbon d → 19 characters & 19 bytes # * char(39) → 1 character & 1 byte # * origine → 7 characters & 7 bytes # # * Total: 27 bytes
Enveloping symbols
- Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
- However, we have a complication here: The actual string already contains both apostrophes & quotes
- The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
- BTW:
char(39)
counts as one byte
Voorbeeld:
concat('s:27:"Balais de charbon d', char(39), 'origine"') # # * The string (inside the apostrophes) becomes # * s:27:"Balais de charbon d'origine"
MySQL - Sproc replace_serial
Dit was een leuk puzzeltje om te doen, maar het practische nut is helaas beperkt.
CREATE DEFINER=`strompf`@`localhost` PROCEDURE `replace_serial`( in table_name_in tinytext, # Source table name in col_serial_data_src_in tinytext, # Column with source serialised data in col_serial_data_dst_in tinytext, # Column with destination serialised data in col_src_in tinytext, # Column with source text strings in col_dst_in tinytext # Column with desination text string ) BEGIN # # Find-and-replace within serialised content ################################################################################ # # * This sproc "replace_serial" needs as input pointers to a table # ("pointer table") with the following columns: # * Column with the source serialised data # * Column for storing the updated serialised data # * Column with find-string # * Column with replace-strings # * It can be any table: The arguments for replace_serial() contains the name # of this table and the names of those four columns # * Internally, data from this input-table is copied to temporary table # "serial_tmp" # * The designated serialised data is decomposed into several fields (again, # in table "serial_tmp") - Currently, this sproc only works for serialised # data with 1 a-element, followed by 6 s-elements. This sproc has not been # tested with other configurations # * Find-and-replace is only performed on element s4 (=the 4th s-element) # * Subsequently, the new length of s4 is calculated, and the serialised string # is written to the designated column in the "pointer table" # * Subsequently, it's up to the programmer to process the resulting data in # the pointer table # * This approach with a "pointer table" works really well: Easy to grasp, # flexible and easy to debug. It's especially handy for debugging or further # processing that the output is in a separate column from the input. # * Copyright Jeroen Strompf - De Vliegende Brigade - 2021.03.09 # # # History ######################################## # # * 2021.03.09: Created, tested and pushed to "bal_dwh_org" # # ################################################################################ # Define working table "serial_tmp" ################################################################################ # drop table if exists serial_tmp; create temporary table serial_tmp ( serial_data_src varchar(255) comment "Actual serialised data", serial_data_dst varchar(255) comment "Resulting serialised data. This is a separate field, to make debugging easier", find_string varchar(255) comment "String to be found within serial_data_src", replace_string varchar(255) comment "Replacement string for within serial_data_src, resulting in serial_data_dst", a_len smallint, a_string varchar(255) comment "The string within the accolade", s1 varchar(255) comment "Complete s-part, including length statement and string", s2 varchar(255), s3 varchar(255), s4 varchar(255) comment "Complete s4-string", s4_len_src smallint comment "Original length of s4", s4_string_src varchar(255) comment "Original s4 string", s4_len_dst smallint comment "New length", s4_string_dst varchar(255) comment "New string", s5 varchar(255), s6 varchar(255) ); ################################################################################ # Copy working data to working table ################################################################################ # set @ddl=concat ( "insert into serial_tmp ( serial_data_src, find_string, replace_string ) select ", col_serial_data_src_in, ", ", col_src_in, ", ", col_dst_in, " ", "from ", table_name_in, ";" ); prepare stmt from @ddl; execute stmt; deallocate prepare stmt; ################################################################################ # Decompose serialised data ################################################################################ # # Leading a-field ######################################## # update serial_tmp set a_len = substring_index(substring_index(serial_data_src, ":", 2), ":", -1), a_string = substring_index(substring_index(serial_data_src, "}",1), "{", -1); # Complete s-fields ######################################## # update serial_tmp set s1 = substring_index(a_string, ";", 1), s2 = substring_index(substring_index(a_string, ";", 2), ";", -1), s3 = substring_index(substring_index(a_string, ";", 3), ";", -1), s4 = substring_index(substring_index(a_string, ";", 4), ";", -1), s5 = substring_index(substring_index(a_string, ";", 5), ";", -1), s6 = substring_index(substring_index(a_string, ";", 6), ";", -1); # Further decompose s4 ######################################## # update serial_tmp set s4_len_src = substring_index(substring_index(s4, ":", 2), ":", -1), s4_string_src = substring_index(substring_index(s4, '"', 2), '"', -1); ################################################################################ # Find-and-replace ################################################################################ # # Finally! The heart of the matter # update serial_tmp set s4_string_dst = replace(s4_string_src, find_string, replace_string), s4_len_dst = length(s4_string_dst); ################################################################################ # Reassemble ################################################################################ # update serial_tmp set serial_data_dst = concat ( "a:",a_len, ":", "{", s1, ";", s2, ";", s3, ";", "s:",s4_len_dst,":", char(34), s4_string_dst, char(34), ";", s5, ";", s6, ";", "}" ); ################################################################################ # Copy results back to the original input table ################################################################################ # # * Fill in the result in column "col_serial_data_dst_in" # set @ddl2=concat ( "update ", table_name_in, " join serial_tmp on serial_tmp.serial_data_src = ", table_name_in, ".",col_serial_data_src_in," set ",table_name_in, ".",col_serial_data_dst_in, " = serial_tmp.serial_data_dst;" ); prepare stmt from @ddl2; execute stmt; deallocate prepare stmt; END
Bronnen
- https://stackoverflow.com/questions/13319638/working-with-serialized-data-in-wordpress
- https://stackoverflow.com/questions/46908387/how-to-use-a-serialized-or-json-encoded-attribute-in-a-mysql-stored-procedure
- https://stackoverflow.com/questions/8922668/updating-base64-encode-serialized-array-on-mysql-databases - unserialize, re-serialize
- https://stackoverflow.com/questions/5335970/unserialize-through-query-at-database-level-itself
- https://stackoverflow.com/questions/62980734/deserialize-and-serialize-data-in-mysql - Same use case as me. I've posted my sproc here
- https://wpengine.com/support/wordpress-serialized-data