Serialised data (WordPress)
Sometimes database fields contain serialized data: Multiple data in one field, which are concatenated using a certain syntax.
Although the concept of serialized data seems quite at odds with some basic databases principles, it offers flexibility that otherwise would be hard to get by. In addition, this is actually used to store Object-oriented objects - I hadn't thought of that before!
Example
In table wp_options
, field widget_woocommerce_layered_nav
encodes all Woocommerce widgets. This is an example of an online store in Oct. 2020.
Raw content:
a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}
The first few lines in more readable form:
a:14: { i:4;a:4: { s:5:"title"; s:21:"Originele pripripripr"; s:9:"attribute"; s:21:"originele pripripripr"; s:12:"display_type"; s:4:"list"; s:10:"query_type"; s:3:"and"; } i:5;a:4: { s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:6;a:4: { s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:7;a:4: { s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:8;a:4: { s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } i:9;a:4: { s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and"; } }
The problem with serialised data
Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.
- This means that it becomes quite difficult to process serialised data other than through appropriate API-calls. So no messing around directly in MySQL - Where otherwise this would be the only obvious solution.
- But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).
The problem - Example
On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with wp post meta list
:
wp post meta list 51873 | post_id | meta_key | meta_value +---------+---------------------------------------------+----------------------------------------------+ | 51873 | buttonbar_selection_0_image | 53521 | 51873 | _buttonbar_selection_0_image | field_5e450106734a3 | 51873 | buttonbar_selection_0_link | a:3:{s:5:"title";s:0:"";s:3:"url";s:36:"https://example-canada.com/shop/blub/";s:6:"target";s:0:"";} | 51873 | _buttonbar_selection_0_link | field_5e450111734a4
Note that field buttonbar_selection_0_link
contains serialised data. The problem: The string that is specified to be 36 characters/bytes/whatever, isn't that number. Hence you can't fetch the field using wp post meta get
, eventhough the other fields around it, are still retrievable:
wp post meta get 51873 buttonbar_selection_0_title wp post meta get 51873 buttonbar_selection_0_image wp post meta get 51873 buttonbar_selection_0_link
Output:
Widget 53521
You can't directly update the compromised field
Looks promising, doesn't it?
wp post meta list 51873 wp post meta get 51873 buttonbar_selection_0_title wp post meta get 51873 buttonbar_selection_0_image wp post meta get 51873 buttonbar_selection_0_link wp post meta pluck 51873 buttonbar_selection_0_link title wp post meta pluck 51873 buttonbar_selection_0_link url wp post meta pluck 51873 buttonbar_selection_0_link blub wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com" wp post meta list 51873 wp post meta get 51873 buttonbar_selection_0_title wp post meta get 51873 buttonbar_selection_0_image wp post meta get 51873 buttonbar_selection_0_link
But it's inforgiving. Even the non-compromised part title cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):
... Error: No data exists for key "title" Error: No data exists for key "url" ...
You can't insert into the compromised field
Pfff:
wp post meta list 51873 | grep buttonbar_selection_0_link # Insert? # wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com" wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
Output:
Error: Cannot create key "title" on data type boolean Error: Cannot create key "url" on data type boolean
You can't delete the subfields
By now, this was to be expected:
wp post meta patch delete 51873 buttonbar_selection_0_link title wp post meta patch delete 51873 buttonbar_selection_0_link url
with output:
Error: No data exists for key "title" Error: No data exists for key "url"
You can delete the field completely
Yep:
wp post meta delete 51873 buttonbar_selection_0_link wp post meta list 51873 | grep buttonbar_selection_0_
Note that the field name including underscore (_buttonbar_selection_0_link
), still exists, but that the version without underscore (buttonbar_selection_0_link
) is gone:
51873 buttonbar_selection_0_title Widget 51873 _buttonbar_selection_0_title field_5e450082734a2 51873 buttonbar_selection_0_image 53521 51873 _buttonbar_selection_0_image field_5e450106734a3 51873 _buttonbar_selection_0_link field_5e450111734a4
How to recreate?
Now that the field is completely gone, how to recreate it? This doesn't seem to work:
wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub" wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com" wp post meta patch insert 51873 buttonbar_selection_0_link target "" wp post meta list 51873 | grep buttonbar_selection_0_
Return:
Error: Cannot create key "title" on data type string Error: Cannot create key "url" on data type string Error: Cannot create key "target" on data type string
Tools to fix broken serialized data - Inventory
There is hope:
- https://btb.works/how-to-fix-a-broken-serialization/ ← I used this one. See elsewhere
- https://stackoverflow.com/questions/15138893/fix-serialized-data-broken-due-to-editing-mysql-database-in-a-text-editor
- https://stackoverflow.com/questions/3148712/regex-code-to-fix-corrupt-serialized-php-data/55074706#55074706
- https://github.com/Blogestudio/Fix-Serialization
- https://github.com/wsizoo/wordpress-fix-serialization
fix-serialization.php
I used the fix-serializatin.php script from https://btb.works/how-to-fix-a-broken-serialization/. It works very well, except for a critical bug: $content
should be $data
. Otherwise, an empty file is written. My version of the script:
<?php # # Fix serial data string length errors ################################################################################ # # @author: Wojtek Szałkiewicz - wojtek at bracketspace.com # @license: GPL version 3 or later - http://www.gnu.org/licenses/gpl.txt # Based on Pau Iglesias http://blogestudio.com package # # # Usage ######################################## # # * php fix-serialization.php my-sql-file.sql # * php fix-serialization.php my-input-sql-file.sql my-output-sql-file.sql # # # Known errors ######################################## # # * Memory size exhausted: Allowed memory size of 67108864 bytes exhausted # (tried to allocate 35266489 bytes) How to fix: update php.ini # memory_limit to 512M or more, and restart cgi service or webserver # * Function preg_replace returns null or 0 length string If preg_last_error = # PREG_BACKTRACK_LIMIT_ERROR (value 2), increase pcre.backtrack_limit in # php.ini (by default 100k, change to 2M by example) Same way for others # preg_last_error codes: # http://www.php.net/manual/en/#unction.preg-last-error.php # # ################################################################################ # unescape_mysql() ################################################################################ # function unescape_mysql( $value ) { # # * Remove redundant MySQL database dump escape codes # * Used in main code below # return str_replace( [ "\\\\", "\\0", "\\n", "\\r", "\Z", "\'", '\"' ], [ "\\", "\0", "\n", "\r", "\x1a", "'", '"' ], $value ); } ################################################################################ # unescape_quotes() ################################################################################ # function unescape_quotes( $value ) { # * Remove escape character "\" before a double quote # * Used in main code below # return str_replace('\"', '"', $value); } ################################################################################ # Initialisation ################################################################################ # # * Read file into $data # * Reset counters # # # Check for presence of input file name ######################################## # if ( ! isset( $argv ) || ! isset( $argv[1] ) ) { # echo "Error: no input file specified\n\n"; exit; # } # Store filename in "$input ######################################## # $input = $argv[1]; # Check for existence of input file ######################################## # if ( ! file_exists( $input ) ) { # echo "Error: input file does not exists\n"; echo "{$input}\n\n"; exit; # } # Store file contents in "$data" ######################################## # $data = file_get_contents( $input ); # Check for succesful file reading ######################################## # if ( ! $data ) { # echo "Error: can`t read data from input file\n"; echo "{$input}\n\n"; exit; } # Check that file has content ######################################## # if ( ! isset( $data ) || ! ( strlen( $data ) > 0 ) ) { # echo "Warning: the file is empty or can't read contents\n"; echo "{$input}\n\n"; exit; # } # Set counters to 0 ######################################## # $processed = 0; $fixed = 0; ################################################################################ # Main code: Search & replace ################################################################################ # $data = preg_replace_callback('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!', function( $matches ) use ( &$processed, &$fixed ) { # # * preg_replace_callback — Perform a regular expression search and replace # using a callback # https://www.php.net/manual/en/function.preg-replace-callback.php # * This seems to be the main function. It seems to work recursively or # something. # Quite impressive to see the input argument # * The data is continuously stored in "$data" # # # Found a string statement? ######################################## # if ( isset( $matches[3] ) ) { # # Recalculate string length ######################################## # $len = strlen( unescape_mysql( $matches[3] ) ); $str = unescape_quotes( $matches[3] ); $result = "s:{$len}:\"$str\";"; # Do old and new string length differ? ######################################## # if ( $result !== $matches[0] ) { # Yes: Update fix counter ######################################## # $fixed++; } } # No string statement found (?) ######################################## # else { # # Just return the unchanged argument ######################################## # $result = $matches[0]; } # Increass string counter ######################################## # $processed++; return $result; }, $data); ################################################################################ # Check results from "preg_replace" ################################################################################ # if ( ! ( isset( $data ) && strlen( $data ) > 0 ) ) { // Error echo "Error: preg_replace returns nothing\n"; if ( function_exists( 'preg_last_error' ) ) { $last_error = preg_last_error(); echo "preg_last_error() = {$last_error}\n"; } echo "{$input}\n\n"; exit; } ################################################################################ # Write to old or new file? ################################################################################ # if ( isset( $argv[2] ) ) { $output = $argv[2]; # echo "Output file name has been provided: {$argv[2]}\n"; } else { $output = $input; # echo "No output file name has been provided - Overwrite input file\n"; } ################################################################################ # Write to file ################################################################################ # # * file_put_contents($filename, $data) is a standard PHP function # * Original first line: $result = file_put_contents( $output, $content ); # It seems that "$content" should have been "$data". Otherwise, an empty file # is written # # $result = file_put_contents( $output, $data ); // Write file data if ( false === $result ) { // Error echo "Error: can't write fixed content\n"; echo "{$output}\n\n"; exit; } ################################################################################ # Exit ################################################################################ # echo "Found strings: {$processed}\n"; echo "Fixed: {$fixed}\n";
Overall script
To fix multiple sites:
#!/bin/bash # ################################################################################ # fix_db() ################################################################################ # function fix_db { # # Verify variables ####################################### # if [ -z "$db_name" ]; then echo " Variable 'db_name' not provided. Exiting"; exit; fi if [ -z "$path" ]; then echo " Variable 'path' not provided. Exiting"; exit; fi if [ -z "$home" ]; then echo " Variable 'home' not provided. Exiting"; exit; fi # Create db dump ####################################### # echo "Create db dump (takes about 20s)..." # mysqldump $db_name > ${db_name}.sql # Fix db ####################################### # echo "Fix db..." php ~/scripts/fix-serialization.php ${db_name}.sql ${db_name}-fixed.sql # Import db ####################################### # echo "Import db again (takes about 70s)..." mysql $db_name < ${db_name}-fixed.sql # Delete transients ####################################### # cd $path wp transient delete --all cd $home } ################################################################################ # Main ################################################################################ # home=$PWD # en ####################################### # db_name="example_de" path="/var/www/example.de" fix_db # es ####################################### # db_name="example_es" path="/var/www/example.es" fix_db
PHP: Serialize & unserialize
WordPress uses the PHP function serialize()
to serialize data for storage in the database and the unserialize
function for the opposite operation.
Example [1]:
$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}'; var_dump(unserialize($serialized)); Output: Array ( [0] => apple [1] => banana [2] => orange )
WP-CLI
WP-CLI can handle serialised data very well. The positional parameters pluck
and patch
are for reading and writing parts of serialised data
MySQL
MySQL does not seem to have built-in functions similar to the PHP functions serialize and unserialize. This would have been possible: MySQL, for example, does have functions to deal with XML.
Syntaxis
Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:
- Data-elementen zijn omhuld door
{
en}
a
: Arrayi
: Counter; indexs
: String + lengte. Bv.s:5"apple"
.
Stringlengte
Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.
Voorbeelden:
# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte # s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long # s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"') # # * Balais de charbon d → 19 characters & 19 bytes # * char(39) → 1 character & 1 byte # * origine → 7 characters & 7 bytes # # * Total: 27 bytes
Enveloping symbols
- Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
- However, we have a complication here: The actual string already contains both apostrophes & quotes
- The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
- BTW:
char(39)
counts as one byte
Voorbeeld:
concat('s:27:"Balais de charbon d', char(39), 'origine"') # # * The string (inside the apostrophes) becomes # * s:27:"Balais de charbon d'origine"
Bronnen
- https://stackoverflow.com/questions/13319638/working-with-serialized-data-in-wordpress
- https://stackoverflow.com/questions/46908387/how-to-use-a-serialized-or-json-encoded-attribute-in-a-mysql-stored-procedure
- https://stackoverflow.com/questions/8922668/updating-base64-encode-serialized-array-on-mysql-databases - unserialize, re-serialize
- https://stackoverflow.com/questions/5335970/unserialize-through-query-at-database-level-itself
- https://stackoverflow.com/questions/62980734/deserialize-and-serialize-data-in-mysql - Same use case as me. I've posted my sproc here
- https://wpengine.com/support/wordpress-serialized-data