Serialised data (WordPress)

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen

Sometimes database fields contain serialized data: Multiple data in one field, concatenated using a certain syntax. Although the concept of serialized data seems quite at odds with some basic databases principles, there is an obvious application for: Storing objects.

However, compaired to e.g., YAML, this nameless protocol that WordPress uses, has one major issue: Before each string, the length of that string is specified. When you change the string but fail to update this string-count, the object is not readable anymore. See below for a solution.

Example

In table wp_options, field widget_woocommerce_layered_nav encodes all Woocommerce widgets. This is an example of an online store in Oct. 2020.

Raw content:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

The first few lines in more readable form:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

The problem with serialised data

Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.

  • This means that it becomes quite difficult to process serialised data other than through appropriate API-calls. So no messing around directly in MySQL - Where otherwise this would be the only obvious solution.
  • But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).

The problem - Example

On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with wp post meta list:

wp post meta list 51873

| post_id | meta_key                                    | meta_value                                                                                                                                       
+---------+---------------------------------------------+----------------------------------------------+
| 51873   | buttonbar_selection_0_image                 | 53521                                                                                                                                                
| 51873   | _buttonbar_selection_0_image                | field_5e450106734a3                                                                                                                                  
| 51873   | buttonbar_selection_0_link                  | a:3:{s:5:"title";s:0:"";s:3:"url";s:36:"https://example-canada.com/shop/blub/";s:6:"target";s:0:"";}                                          
| 51873   | _buttonbar_selection_0_link                 | field_5e450111734a4

Note that field buttonbar_selection_0_link contains serialised data. The problem: The string that is specified to be 36 characters/bytes/whatever, isn't that number. Hence you can't fetch the field using wp post meta get, eventhough the other fields around it, are still retrievable:

wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

Output:

Widget
53521

You can't directly update the compromised field

Looks promising, doesn't it?

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

wp post meta pluck 51873 buttonbar_selection_0_link title
wp post meta pluck 51873 buttonbar_selection_0_link url
wp post meta pluck 51873 buttonbar_selection_0_link blub

wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com"

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

But it's unforgiving. Even the non-compromised part title cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):

...
Error: No data exists for key "title"
Error: No data exists for key "url"
...

You can't insert into the compromised field

Pfff:

wp post meta list 51873 | grep buttonbar_selection_0_link


# Insert?
#
wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"

Output:

Error: Cannot create key "title" on data type boolean
Error: Cannot create key "url" on data type boolean

You can't delete the subfields

By now, this was to be expected:

wp post meta patch delete 51873 buttonbar_selection_0_link title
wp post meta patch delete 51873 buttonbar_selection_0_link url

with output:

Error: No data exists for key "title"
Error: No data exists for key "url"

You can delete the field completely

Yep:

wp post meta delete 51873 buttonbar_selection_0_link
wp post meta list 51873 | grep buttonbar_selection_0_

Note that the field name including underscore (_buttonbar_selection_0_link), still exists, but that the version without underscore (buttonbar_selection_0_link) is gone:

51873	buttonbar_selection_0_title	Widget
51873	_buttonbar_selection_0_title	field_5e450082734a2
51873	buttonbar_selection_0_image	53521
51873	_buttonbar_selection_0_image	field_5e450106734a3
51873	_buttonbar_selection_0_link	field_5e450111734a4

How to recreate?

Now that the field is completely gone, how to recreate it? This doesn't seem to work:

wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link target ""

wp post meta list 51873 | grep buttonbar_selection_0_

Return:

Error: Cannot create key "title" on data type string
Error: Cannot create key "url" on data type string
Error: Cannot create key "target" on data type string

Tools to fix broken serialized data - Inventory

There is hope:

fix-serialization.php

Around 2021 & 2022, I have used the fix-serializatin.php script from https://btb.works/how-to-fix-a-broken-serialization/. It works very well, except for a critical bug: $content should be $data. Otherwise, an empty file is written. My version of the script:

<?php
#
# Fix serial data string length errors
################################################################################
# 
# @author: Wojtek Szałkiewicz - wojtek  at  bracketspace.com
# @license: GPL version 3 or later - http://www.gnu.org/licenses/gpl.txt
# Based on Pau Iglesias http://blogestudio.com package
#
#
# Usage
########################################
#
# * php fix-serialization.php my-sql-file.sql
# * php fix-serialization.php my-input-sql-file.sql my-output-sql-file.sql
#
#
# Known errors
########################################
#
# * Memory size exhausted: Allowed memory size of 67108864 bytes exhausted
#   (tried to allocate 35266489 bytes) How to fix: update php.ini 
#   memory_limit to 512M or more, and restart cgi service or webserver
# * Function preg_replace returns null or 0 length string If preg_last_error =
#   PREG_BACKTRACK_LIMIT_ERROR (value 2), increase pcre.backtrack_limit in 
#   php.ini (by default 100k, change to 2M by example) Same way for others
#   preg_last_error codes: 
#   http://www.php.net/manual/en/#unction.preg-last-error.php
#
#
################################################################################
# unescape_mysql()
################################################################################
#
function unescape_mysql( $value )
{
	#
	# * Remove redundant MySQL database dump escape codes
	# * Used in main code below
	#
	return str_replace(
		[ "\\\\", "\\0", "\\n", "\\r", "\Z",  "\'", '\"' ],
		[ "\\",   "\0",  "\n",  "\r",  "\x1a", "'", '"' ],
		$value
	);
}


################################################################################
# unescape_quotes()
################################################################################
#
function unescape_quotes( $value )
{
	# * Remove escape character "\" before a double quote
	# * Used in main code below
	#
	return str_replace('\"', '"', $value);
}


################################################################################
# Initialisation
################################################################################
#
# * Read file into $data
# * Reset counters
#
#
# Check for presence of input file name
########################################
#
if ( ! isset( $argv ) || ! isset( $argv[1] ) )
{
	#
	echo "Error: no input file specified\n\n";
	exit;
	#
}


# Store filename in "$input
########################################
#
$input = $argv[1];


# Check for existence of input file
########################################
#
if ( ! file_exists( $input ) )
{
	#
	echo "Error: input file does not exists\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Store file contents in "$data"
########################################
#
$data = file_get_contents( $input );


# Check for succesful file reading
########################################
#
if ( ! $data ) 
{
	#
	echo "Error: can`t read data from input file\n";
	echo "{$input}\n\n";
	exit;
}


# Check that file has content
########################################
#
if ( ! isset( $data ) || ! ( strlen( $data ) > 0 ) )
{
	#	
	echo "Warning: the file is empty or can't read contents\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Set counters to 0
########################################
#
$processed = 0;
$fixed     = 0;


################################################################################
# Main code: Search & replace
################################################################################
#
$data = preg_replace_callback('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!', function( $matches ) use ( &$processed, &$fixed )
{
	#
	# * preg_replace_callback — Perform a regular expression search and replace 
	#   using a callback
	#   https://www.php.net/manual/en/function.preg-replace-callback.php
	# * This seems to be the main function. It seems to work recursively or
	#   something.
	#   Quite impressive to see the input argument
	# * The data is continuously stored in "$data"
	#
	#
	# Found a string statement?
	########################################
	#
	if ( isset( $matches[3] ) )
	{
		#
		# Recalculate string length
		########################################
		#
		$len = strlen( unescape_mysql( $matches[3] ) );
		$str = unescape_quotes( $matches[3] );
		$result = "s:{$len}:\"$str\";";


		# Do old and new string length differ?
		########################################
		#
		if ( $result !== $matches[0] )
		{
			# Yes: Update fix counter
			########################################
			#
			$fixed++;
		}
	}

	# No string statement found (?)
	########################################
	#
	else 
	{
		#
		# Just return the unchanged argument
		########################################
		#
		$result = $matches[0];
	}


	# Increass string counter
	########################################
	#
	$processed++;

	return $result;
	
}, $data);


################################################################################
# Check results from "preg_replace"
################################################################################
#
if ( ! ( isset( $data ) && strlen( $data ) > 0 ) )
{
	// Error
	echo "Error: preg_replace returns nothing\n";

	if ( function_exists( 'preg_last_error' ) ) {
		$last_error = preg_last_error();
		echo "preg_last_error() = {$last_error}\n";
	}

	echo "{$input}\n\n";
	exit;
}


################################################################################
# Write to old or new file?
################################################################################
#
if ( isset( $argv[2] ) )
{
  $output = $argv[2];
  # echo "Output file name has been provided: {$argv[2]}\n";
} else
{
  $output = $input;
  # echo "No output file name has been provided - Overwrite input file\n";
}


################################################################################
# Write to file
################################################################################
#
# * file_put_contents($filename, $data) is a standard PHP function
# * Original first line: $result = file_put_contents( $output, $content );
#   It seems that "$content" should have been "$data". Otherwise, an empty file
#   is written
#
#
$result = file_put_contents( $output, $data );
// Write file data
if ( false === $result ) {
	// Error
	echo "Error: can't write fixed content\n";
	echo "{$output}\n\n";
	exit;
}


################################################################################
# Exit
################################################################################
#
echo "Found strings: {$processed}\n";
echo "Fixed: {$fixed}\n";

fix-serialization.php - Example

I use this self-developed Bash function tourlib-general-settings.sh » fix_corrupt_serialised_data() to automatically fix all serialised data errors in a database:

################################################################################
# fix_corrupt_serialised_data()
################################################################################
#
fix_corrupt_serialised_data()
{
	#
	# * Create db dump, fix database, import again
	# * Moved from tourlib-de-buttonbar-pages.sh - 2022.11.26
	# * This function doesn't make a database backup prior to its operation.
	#   It is highly recommended to do so. Use the existing function(s) to do
	#   so, e.g., tourlib-instantiate-site.sh » backup_db_to_folder_dvb()
	#
	#
	# Initialisation
	########################################
	#
	echo ""; echo ""; echo "### fix_corrupt_serialised_data..."; echo ""
	if [ -z "$project_path" ]; then echo "	Variable 'project_path' not provided. Exiting"; exit; fi
	if [ -z "$db_name" ]; then 			echo "	Variable 'db_name' not provided. Exiting"; exit; fi
	if [ -z "$path" ]; then 			echo "	Variable 'path' not provided. Exiting"; exit; fi


	# cd to project location
	#######################################
	#
	cd $project_path


	# Create time stamp
	#######################################
	#
	time_stamp=$(date +%Y.%m.%d-%H.%M)


	# Create db dump
	#######################################
	#
	echo "Create db dump (takes about 20s)..."
	#
	mysqldump $db_name > ${db_name}-${time_stamp}-org.sql


	# Fix db
	#######################################
	#
	echo "Fix db..."
	php ~/scripts/fix-serialization.php ${db_name}-${time_stamp}-org.sql ${db_name}-${time_stamp}-fixed.sql


	# Import db
	#######################################
	#
	echo "Import db again (takes about 70s)..."
	mysql $db_name < ${db_name}-${time_stamp}-fixed.sql


	# Delete transients
	#######################################
	#
	cd $path
	wp transient delete --all
	#
}

PHP: Serialize & unserialize

WordPress uses the PHP function serialize() to serialize data for storage in the database and the unserialize function for the opposite operation.

Example [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

WP-CLI can handle serialised data very well, as long as it is about specific fields. E.g.: The positional parameters pluck and patch are for reading and writing parts of serialised data.

However, there are no generic WP-CLI commands for serializing and unserializing.

Maybe the aforementioned PHP commands can be invoked through wp eval, but so far (2023.11, after half an hour of trying) I didn't get it to work → Wp eval (WP-CLI).

MySQL

MySQL does not seem to have built-in functions similar to the PHP functions serialize and unserialize. This would have been possible: MySQL, for example, does have functions to deal with XML.

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

See also

Bronnen