Serialised data (WordPress): verschil tussen versies

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen
Regel 74: Regel 74:
 
== WP-CLI ==
 
== WP-CLI ==
  
De WordPress Command Line Interface kan verrassend goed overweg met serialised data in tabel <code>wp_options</code>. Zie [[WP-CLI (commando's)]] » <code>wp option</code> voor details.
+
[[Hoofdpagina#WP-CLI WP-CLI]] can handle serialised data very well. The ''positional parameters'' <code>pluck</code> and <code>patch</code> are for reading and writing parts of serialised data
  
 
== MySQL ==
 
== MySQL ==

Versie van 29 aug 2022 11:40

Soms bevatten database-velden serialised data: Meerdere gegevens in één veld, die zijn samengevoegd middels een bepaalde syntaxis.

Je kunt zeggen dat serialised data nogal haaks staat op het concept van databases. Aan de andere kant: Het biedt flexibiliteit die anders soms niet mogelijk is. Daarnaast wordt dit juist gebruikt om objecten op te slaan

Voorbeeld

In tabel wp_options wordt in het veld widget_woocommerce_layered_nav alle Woocommerce-widgets gecodeerd. Dit is een voorbeeld van een webwinkel in okt. 2020.

Rauwe inhoud:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

De eerste paar regels in meer leesbare vorm:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

PHP: Serialize & unserialize

WordPress gebruikt de PHP-functie serialize() om data te serializen voor opslag in de database en de functie unserialize voor de tegenovergestelde bewerking.

Voorbeeld [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

Hoofdpagina#WP-CLI WP-CLI can handle serialised data very well. The positional parameters pluck and patch are for reading and writing parts of serialised data

MySQL

MySQL lijkt geen ingebouwde functies te hebben die vergelijkbaar zijn met de PHP-functies serialize en unserialize. Dit had overigens best gekund: MySQL heeft bv. wel functies om met XML om te gaan. Een impressie:

Algemeen

  • Tot op heden (2021.03.02) heb ik geen libraries of sprocs of scripts gevonden die dit kunnen [2]. En ook dat had best anders kunnen zijn: Zo ingewikkeld lijkt dit niet te zijn
  • Ik heb een primitieve parser voor find-en-replace. Zoiets als de ingebouwde functie replace, maar dan voor strings met serialised content met een beperkte vaste structuur - Zie verderop in dit artikel

Recursie

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

MySQL - Sproc replace_serial

Dit was een leuk puzzeltje om te doen, maar het practische nut is helaas beperkt.

CREATE DEFINER=`strompf`@`localhost` PROCEDURE `replace_serial`(
	in table_name_in		tinytext,	# Source table name
        in col_serial_data_src_in	tinytext,	# Column with source serialised data
        in col_serial_data_dst_in	tinytext,	# Column with destination serialised data
        in col_src_in			tinytext,	# Column with source text strings
        in col_dst_in			tinytext	# Column with desination text string
    )
BEGIN
#
# Find-and-replace within serialised content
################################################################################
#
# * This sproc "replace_serial" needs as input pointers to a table 
#   ("pointer table") with the following columns:
#   * Column with the source serialised data
#   * Column for storing the updated serialised data
#   * Column with find-string
#   * Column with replace-strings
# * It can be any table: The arguments for replace_serial() contains the name
#   of this table and the names of those four columns
# * Internally, data from this input-table is copied to temporary table 
#   "serial_tmp"
# * The designated serialised data is decomposed into several fields (again,
#   in table "serial_tmp") - Currently, this sproc only works for serialised 
#   data with 1 a-element, followed by 6 s-elements. This sproc has not been 
#   tested with other configurations
# * Find-and-replace is only performed on element s4 (=the 4th s-element)
# * Subsequently, the new length of s4 is calculated, and the serialised string
#   is written to the designated column in the "pointer table"
# * Subsequently, it's up to the programmer to process the resulting data in
#   the pointer table
# * This approach with a "pointer table" works really well: Easy to grasp,
#   flexible and easy to debug. It's especially handy for debugging or further
#   processing that the output is in a separate column from the input.
# * Copyright Jeroen Strompf - De Vliegende Brigade - 2021.03.09
#
#
# History
########################################
#
# * 2021.03.09: Created, tested and pushed to "bal_dwh_org"
#
#
################################################################################
# Define working table "serial_tmp"
################################################################################
#
drop table if exists 
	serial_tmp;
create temporary table
	serial_tmp
    (
	serial_data_src	varchar(255)	comment "Actual serialised data",	
        serial_data_dst	varchar(255)	comment "Resulting serialised data. This is a separate field, to make debugging easier",
        find_string	varchar(255)	comment "String to be found within serial_data_src",
        replace_string	varchar(255)	comment "Replacement string for within serial_data_src, resulting in serial_data_dst",

        a_len		smallint,
        a_string	varchar(255)	comment "The string within the accolade",
        s1		varchar(255)	comment "Complete s-part, including length statement and string",
	s2		varchar(255),
	s3		varchar(255),
	s4		varchar(255)	comment "Complete s4-string",
        s4_len_src	smallint	comment "Original length of s4",
        s4_string_src	varchar(255)	comment "Original s4 string",
        s4_len_dst	smallint	comment "New length",
        s4_string_dst	varchar(255)	comment "New string",
	s5		varchar(255),
	s6		varchar(255)
    );


################################################################################
# Copy working data to working table
################################################################################
#
set @ddl=concat
(
	"insert into serial_tmp 
    (
	serial_data_src,
        find_string,
        replace_string
    )
    select ",
	col_serial_data_src_in, ", ",
        col_src_in, ", ",
        col_dst_in, " ",
	"from ",
	    table_name_in, ";"
);

prepare stmt from @ddl;
execute stmt;
deallocate prepare stmt;


################################################################################
# Decompose serialised data
################################################################################
#
# Leading a-field
########################################
#
update
	serial_tmp
set
	a_len		= substring_index(substring_index(serial_data_src, ":", 2), ":", -1),
	a_string	= substring_index(substring_index(serial_data_src, "}",1), "{", -1);


# Complete s-fields
########################################
#
update
    serial_tmp
set
    s1 = substring_index(a_string, ";", 1),
    s2 = substring_index(substring_index(a_string, ";", 2), ";", -1),
    s3 = substring_index(substring_index(a_string, ";", 3), ";", -1),
    s4 = substring_index(substring_index(a_string, ";", 4), ";", -1),
    s5 = substring_index(substring_index(a_string, ";", 5), ";", -1),
    s6 = substring_index(substring_index(a_string, ";", 6), ";", -1);


# Further decompose s4
########################################
#
update
    serial_tmp
set
    s4_len_src	  = substring_index(substring_index(s4, ":", 2), ":", -1),
    s4_string_src = substring_index(substring_index(s4, '"', 2), '"', -1);


################################################################################
# Find-and-replace
################################################################################
#
# Finally! The heart of the matter
#
update
    serial_tmp
set
    s4_string_dst = replace(s4_string_src, find_string, replace_string),
    s4_len_dst 	  = length(s4_string_dst);


################################################################################
# Reassemble
################################################################################
#
update
    serial_tmp
set
    serial_data_dst = concat
    (
        "a:",a_len, ":",
        "{",
	    s1, ";", s2, ";", s3, ";",
            "s:",s4_len_dst,":",
            char(34),
		s4_string_dst,
	    char(34),
            ";", s5, ";", s6, ";",
	"}"
    );


################################################################################
# Copy results back to the original input table
################################################################################
#
# * Fill in the result in column "col_serial_data_dst_in"
#
set @ddl2=concat
(
    "update ", table_name_in, " 
    join 
        serial_tmp
        on
        serial_tmp.serial_data_src = ", table_name_in, ".",col_serial_data_src_in," 
    set
       ",table_name_in, ".",col_serial_data_dst_in, " = serial_tmp.serial_data_dst;"
);

prepare stmt from @ddl2;
execute stmt;
deallocate prepare stmt;

END

Bronnen