Serialised data (WordPress): verschil tussen versies

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen
Regel 194: Regel 194:
 
Error: Cannot create key "target" on data type string
 
Error: Cannot create key "target" on data type string
 
</pre>
 
</pre>
 +
 +
=== Tools to fix broken serialized data? ===
 +
 +
There is hope:
 +
 +
* https://btb.works/how-to-fix-a-broken-serialization/
 +
* https://stackoverflow.com/questions/15138893/fix-serialized-data-broken-due-to-editing-mysql-database-in-a-text-editor
 +
* https://stackoverflow.com/questions/3148712/regex-code-to-fix-corrupt-serialized-php-data/55074706#55074706
 +
* https://github.com/Blogestudio/Fix-Serialization
 +
* https://github.com/wsizoo/wordpress-fix-serialization
  
 
== PHP: Serialize & unserialize ==
 
== PHP: Serialize & unserialize ==

Versie van 29 aug 2022 18:20

Sometimes database fields contain serialized data: Multiple data in one field, which are concatenated using a certain syntax.

Although the concept of serialized data seems quite at odds with some basic databases principles, it offers flexibility that otherwise would be hard to get by. In addition, this is actually used to store Object-oriented objects - I hadn't thought of that before!

Example

In table wp_options, field widget_woocommerce_layered_nav encodes all Woocommerce widgets. This is an example of an online store in Oct. 2020.

Raw content:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

The first few lines in more readable form:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

The problem with serialised data

Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.

  • This means that it becomes quite difficult to process serialised data other than through appropriate API-calls. So no messing around directly in MySQL - Where otherwise this would be the only obvious solution.
  • But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).

The problem - Example

On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with wp post meta list:

wp post meta list 51873

| post_id | meta_key                                    | meta_value                                                                                                                                       
+---------+---------------------------------------------+----------------------------------------------+
| 51873   | buttonbar_selection_0_image                 | 53521                                                                                                                                                
| 51873   | _buttonbar_selection_0_image                | field_5e450106734a3                                                                                                                                  
| 51873   | buttonbar_selection_0_link                  | a:3:{s:5:"title";s:0:"";s:3:"url";s:36:"https://example-canada.com/shop/blub/";s:6:"target";s:0:"";}                                          
| 51873   | _buttonbar_selection_0_link                 | field_5e450111734a4

Note that field buttonbar_selection_0_link contains serialised data. The problem: The string that is specified to be 36 characters/bytes/whatever, isn't that number. Hence you can't fetch the field using wp post meta get, eventhough the other fields around it, are still retrievable:

wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

Output:

Widget
53521

You can't directly update the compromised field

Looks promising, doesn't it?

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

wp post meta pluck 51873 buttonbar_selection_0_link title
wp post meta pluck 51873 buttonbar_selection_0_link url
wp post meta pluck 51873 buttonbar_selection_0_link blub

wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com"

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

But it's inforgiving. Even the non-compromised part title cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):

...
Error: No data exists for key "title"
Error: No data exists for key "url"
...

You can't insert into the compromised field

Pfff:

wp post meta list 51873 | grep buttonbar_selection_0_link


# Insert?
#
wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"

Output:

Error: Cannot create key "title" on data type boolean
Error: Cannot create key "url" on data type boolean

You can't delete the subfields

By now, this was to be expected:

wp post meta patch delete 51873 buttonbar_selection_0_link title
wp post meta patch delete 51873 buttonbar_selection_0_link url

with output:

Error: No data exists for key "title"
Error: No data exists for key "url"

You can delete the field completely

Yep:

wp post meta delete 51873 buttonbar_selection_0_link
wp post meta list 51873 | grep buttonbar_selection_0_

Note that the field name including underscore (_buttonbar_selection_0_link), still exists, but that the version without underscore (buttonbar_selection_0_link) is gone:

51873	buttonbar_selection_0_title	Widget
51873	_buttonbar_selection_0_title	field_5e450082734a2
51873	buttonbar_selection_0_image	53521
51873	_buttonbar_selection_0_image	field_5e450106734a3
51873	_buttonbar_selection_0_link	field_5e450111734a4

How to recreate?

Now that the field is completely gone, how to recreate it? This doesn't seem to work:

wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link target ""

wp post meta list 51873 | grep buttonbar_selection_0_

Return:

Error: Cannot create key "title" on data type string
Error: Cannot create key "url" on data type string
Error: Cannot create key "target" on data type string

Tools to fix broken serialized data?

There is hope:

PHP: Serialize & unserialize

WordPress uses the PHP function serialize() to serialize data for storage in the database and the unserialize function for the opposite operation.

Example [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

WP-CLI can handle serialised data very well. The positional parameters pluck and patch are for reading and writing parts of serialised data

MySQL

MySQL does not seem to have built-in functions similar to the PHP functions serialize and unserialize. This would have been possible: MySQL, for example, does have functions to deal with XML.

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

MySQL - Sproc replace_serial

Dit was een leuk puzzeltje om te doen, maar het practische nut is helaas beperkt.

CREATE DEFINER=`strompf`@`localhost` PROCEDURE `replace_serial`(
	in table_name_in		tinytext,	# Source table name
        in col_serial_data_src_in	tinytext,	# Column with source serialised data
        in col_serial_data_dst_in	tinytext,	# Column with destination serialised data
        in col_src_in			tinytext,	# Column with source text strings
        in col_dst_in			tinytext	# Column with desination text string
    )
BEGIN
#
# Find-and-replace within serialised content
################################################################################
#
# * This sproc "replace_serial" needs as input pointers to a table 
#   ("pointer table") with the following columns:
#   * Column with the source serialised data
#   * Column for storing the updated serialised data
#   * Column with find-string
#   * Column with replace-strings
# * It can be any table: The arguments for replace_serial() contains the name
#   of this table and the names of those four columns
# * Internally, data from this input-table is copied to temporary table 
#   "serial_tmp"
# * The designated serialised data is decomposed into several fields (again,
#   in table "serial_tmp") - Currently, this sproc only works for serialised 
#   data with 1 a-element, followed by 6 s-elements. This sproc has not been 
#   tested with other configurations
# * Find-and-replace is only performed on element s4 (=the 4th s-element)
# * Subsequently, the new length of s4 is calculated, and the serialised string
#   is written to the designated column in the "pointer table"
# * Subsequently, it's up to the programmer to process the resulting data in
#   the pointer table
# * This approach with a "pointer table" works really well: Easy to grasp,
#   flexible and easy to debug. It's especially handy for debugging or further
#   processing that the output is in a separate column from the input.
# * Copyright Jeroen Strompf - De Vliegende Brigade - 2021.03.09
#
#
# History
########################################
#
# * 2021.03.09: Created, tested and pushed to "bal_dwh_org"
#
#
################################################################################
# Define working table "serial_tmp"
################################################################################
#
drop table if exists 
	serial_tmp;
create temporary table
	serial_tmp
    (
	serial_data_src	varchar(255)	comment "Actual serialised data",	
        serial_data_dst	varchar(255)	comment "Resulting serialised data. This is a separate field, to make debugging easier",
        find_string	varchar(255)	comment "String to be found within serial_data_src",
        replace_string	varchar(255)	comment "Replacement string for within serial_data_src, resulting in serial_data_dst",

        a_len		smallint,
        a_string	varchar(255)	comment "The string within the accolade",
        s1		varchar(255)	comment "Complete s-part, including length statement and string",
	s2		varchar(255),
	s3		varchar(255),
	s4		varchar(255)	comment "Complete s4-string",
        s4_len_src	smallint	comment "Original length of s4",
        s4_string_src	varchar(255)	comment "Original s4 string",
        s4_len_dst	smallint	comment "New length",
        s4_string_dst	varchar(255)	comment "New string",
	s5		varchar(255),
	s6		varchar(255)
    );


################################################################################
# Copy working data to working table
################################################################################
#
set @ddl=concat
(
	"insert into serial_tmp 
    (
	serial_data_src,
        find_string,
        replace_string
    )
    select ",
	col_serial_data_src_in, ", ",
        col_src_in, ", ",
        col_dst_in, " ",
	"from ",
	    table_name_in, ";"
);

prepare stmt from @ddl;
execute stmt;
deallocate prepare stmt;


################################################################################
# Decompose serialised data
################################################################################
#
# Leading a-field
########################################
#
update
	serial_tmp
set
	a_len		= substring_index(substring_index(serial_data_src, ":", 2), ":", -1),
	a_string	= substring_index(substring_index(serial_data_src, "}",1), "{", -1);


# Complete s-fields
########################################
#
update
    serial_tmp
set
    s1 = substring_index(a_string, ";", 1),
    s2 = substring_index(substring_index(a_string, ";", 2), ";", -1),
    s3 = substring_index(substring_index(a_string, ";", 3), ";", -1),
    s4 = substring_index(substring_index(a_string, ";", 4), ";", -1),
    s5 = substring_index(substring_index(a_string, ";", 5), ";", -1),
    s6 = substring_index(substring_index(a_string, ";", 6), ";", -1);


# Further decompose s4
########################################
#
update
    serial_tmp
set
    s4_len_src	  = substring_index(substring_index(s4, ":", 2), ":", -1),
    s4_string_src = substring_index(substring_index(s4, '"', 2), '"', -1);


################################################################################
# Find-and-replace
################################################################################
#
# Finally! The heart of the matter
#
update
    serial_tmp
set
    s4_string_dst = replace(s4_string_src, find_string, replace_string),
    s4_len_dst 	  = length(s4_string_dst);


################################################################################
# Reassemble
################################################################################
#
update
    serial_tmp
set
    serial_data_dst = concat
    (
        "a:",a_len, ":",
        "{",
	    s1, ";", s2, ";", s3, ";",
            "s:",s4_len_dst,":",
            char(34),
		s4_string_dst,
	    char(34),
            ";", s5, ";", s6, ";",
	"}"
    );


################################################################################
# Copy results back to the original input table
################################################################################
#
# * Fill in the result in column "col_serial_data_dst_in"
#
set @ddl2=concat
(
    "update ", table_name_in, " 
    join 
        serial_tmp
        on
        serial_tmp.serial_data_src = ", table_name_in, ".",col_serial_data_src_in," 
    set
       ",table_name_in, ".",col_serial_data_dst_in, " = serial_tmp.serial_data_dst;"
);

prepare stmt from @ddl2;
execute stmt;
deallocate prepare stmt;

END

Bronnen