Serialised data (WordPress)

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen

Soms bevatten database-velden serialised data: Meerdere gegevens in één veld, die zijn samengevoegd middels een bepaalde syntaxis.

Je kunt zeggen dat serialised data nogal haaks staat op het concept van databases. Aan de andere kant: Het biedt flexibiliteit die anders soms niet mogelijk is. Daarnaast wordt dit juist gebruikt om objecten op te slaan

Voorbeeld

In tabel wp_options wordt in het veld widget_woocommerce_layered_nav alle Woocommerce-widgets gecodeerd. Dit is een voorbeeld van een webwinkel in okt. 2020.

Rauwe inhoud:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

De eerste paar regels in meer leesbare vorm:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

PHP: Serialize & unserialize

WordPress gebruikt de PHP-functie serialize() om data te serializen voor opslag in de database en de functie unserialize voor de tegenovergestelde bewerking.

Voorbeeld [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

De WordPress Command Line Interface kan verrassend goed overweg met serialised data in tabel wp_options. Zie WP-CLI (commando's) » wp option voor details.

MySQL

MySQL lijkt geen ingebouwde functies te hebben die vergelijkbaar zijn met de PHP-functies serialize en unserialize. Dit had overigens best gekund: MySQL heeft bv. wel functies om met XML om te gaan. Een impressie:

Algemeen

  • Tot op heden (2021.03.02) heb ik geen libraries of sprocs of scripts gevonden die dit kunnen [2]. En ook dat had best anders kunnen zijn: Zo ingewikkeld lijkt dit niet te zijn
  • Ik heb een primitieve parser voor find-en-replace. Zoiets als de ingebouwde functie replace, maar dan voor strings met serialised content met een beperkte vaste structuur - Zie verderop in dit artikel

Recursie

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

MySQL - Sproc replace_serial

Dit was een leuk puzzeltje om te doen, maar het practische nut is helaas beperkt.

CREATE DEFINER=`strompf`@`localhost` PROCEDURE `replace_serial`(
	in table_name_in		tinytext,	# Source table name
        in col_serial_data_src_in	tinytext,	# Column with source serialised data
        in col_serial_data_dst_in	tinytext,	# Column with destination serialised data
        in col_src_in			tinytext,	# Column with source text strings
        in col_dst_in			tinytext	# Column with desination text string
    )
BEGIN
#
# Find-and-replace within serialised content
################################################################################
#
# * This sproc "replace_serial" needs as input pointers to a table 
#   ("pointer table") with the following columns:
#   * Column with the source serialised data
#   * Column for storing the updated serialised data
#   * Column with find-string
#   * Column with replace-strings
# * It can be any table: The arguments for replace_serial() contains the name
#   of this table and the names of those four columns
# * Internally, data from this input-table is copied to temporary table 
#   "serial_tmp"
# * The designated serialised data is decomposed into several fields (again,
#   in table "serial_tmp") - Currently, this sproc only works for serialised 
#   data with 1 a-element, followed by 6 s-elements. This sproc has not been 
#   tested with other configurations
# * Find-and-replace is only performed on element s4 (=the 4th s-element)
# * Subsequently, the new length of s4 is calculated, and the serialised string
#   is written to the designated column in the "pointer table"
# * Subsequently, it's up to the programmer to process the resulting data in
#   the pointer table
# * This approach with a "pointer table" works really well: Easy to grasp,
#   flexible and easy to debug. It's especially handy for debugging or further
#   processing that the output is in a separate column from the input.
# * Copyright Jeroen Strompf - De Vliegende Brigade - 2021.03.09
#
#
# History
########################################
#
# * 2021.03.09: Created, tested and pushed to "bal_dwh_org"
#
#
################################################################################
# Define working table "serial_tmp"
################################################################################
#
drop table if exists 
	serial_tmp;
create temporary table
	serial_tmp
    (
	serial_data_src	varchar(255)	comment "Actual serialised data",	
        serial_data_dst	varchar(255)	comment "Resulting serialised data. This is a separate field, to make debugging easier",
        find_string	varchar(255)	comment "String to be found within serial_data_src",
        replace_string	varchar(255)	comment "Replacement string for within serial_data_src, resulting in serial_data_dst",

        a_len		smallint,
        a_string	varchar(255)	comment "The string within the accolade",
        s1		varchar(255)	comment "Complete s-part, including length statement and string",
	s2		varchar(255),
	s3		varchar(255),
	s4		varchar(255)	comment "Complete s4-string",
        s4_len_src	smallint	comment "Original length of s4",
        s4_string_src	varchar(255)	comment "Original s4 string",
        s4_len_dst	smallint	comment "New length",
        s4_string_dst	varchar(255)	comment "New string",
	s5		varchar(255),
	s6		varchar(255)
    );


################################################################################
# Copy working data to working table
################################################################################
#
set @ddl=concat
(
	"insert into serial_tmp 
    (
	serial_data_src,
        find_string,
        replace_string
    )
    select ",
	col_serial_data_src_in, ", ",
        col_src_in, ", ",
        col_dst_in, " ",
	"from ",
	    table_name_in, ";"
);

prepare stmt from @ddl;
execute stmt;
deallocate prepare stmt;


################################################################################
# Decompose serialised data
################################################################################
#
# Leading a-field
########################################
#
update
	serial_tmp
set
	a_len		= substring_index(substring_index(serial_data_src, ":", 2), ":", -1),
	a_string	= substring_index(substring_index(serial_data_src, "}",1), "{", -1);


# Complete s-fields
########################################
#
update
    serial_tmp
set
    s1 = substring_index(a_string, ";", 1),
    s2 = substring_index(substring_index(a_string, ";", 2), ";", -1),
    s3 = substring_index(substring_index(a_string, ";", 3), ";", -1),
    s4 = substring_index(substring_index(a_string, ";", 4), ";", -1),
    s5 = substring_index(substring_index(a_string, ";", 5), ";", -1),
    s6 = substring_index(substring_index(a_string, ";", 6), ";", -1);


# Further decompose s4
########################################
#
update
    serial_tmp
set
    s4_len_src	  = substring_index(substring_index(s4, ":", 2), ":", -1),
    s4_string_src = substring_index(substring_index(s4, '"', 2), '"', -1);


################################################################################
# Find-and-replace
################################################################################
#
# Finally! The heart of the matter
#
update
    serial_tmp
set
    s4_string_dst = replace(s4_string_src, find_string, replace_string),
    s4_len_dst 	  = length(s4_string_dst);


################################################################################
# Reassemble
################################################################################
#
update
    serial_tmp
set
    serial_data_dst = concat
    (
        "a:",a_len, ":",
        "{",
	    s1, ";", s2, ";", s3, ";",
            "s:",s4_len_dst,":",
            char(34),
		s4_string_dst,
	    char(34),
            ";", s5, ";", s6, ";",
	"}"
    );


################################################################################
# Copy results back to the original input table
################################################################################
#
# * Fill in the result in column "col_serial_data_dst_in"
#
set @ddl2=concat
(
    "update ", table_name_in, " 
    join 
        serial_tmp
        on
        serial_tmp.serial_data_src = ", table_name_in, ".",col_serial_data_src_in," 
    set
       ",table_name_in, ".",col_serial_data_dst_in, " = serial_tmp.serial_data_dst;"
);

prepare stmt from @ddl2;
execute stmt;
deallocate prepare stmt;

END

Bronnen