Serialised data (WordPress)

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Sometimes database fields contain serialized data: Multiple data in one field, concatenated using a certain syntax. Although the concept of serialized data seems quite at odds with some basic databases principles, there is an obvious application for: Storing objects.

However, compaired to e.g., YAML, this nameless protocol that WordPress uses, has one major issue: Before each string, the length of that string is specified. When you change the string but fail to update this string-count, the object is not readable anymore. See below for a solution.

Example

In table wp_options, field widget_woocommerce_layered_nav encodes all Woocommerce widgets. This is an example of an online store in Oct. 2020.

Raw content:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

The first few lines in more readable form:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

The problem with serialised data

Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.

  • This means that it becomes quite difficult to process serialised data other than through appropriate API-calls. So no messing around directly in MySQL - Where otherwise this would be the only obvious solution.
  • But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).

The problem - Example

On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with wp post meta list:

wp post meta list 51873

| post_id | meta_key                                    | meta_value                                                                                                                                       
+---------+---------------------------------------------+----------------------------------------------+
| 51873   | buttonbar_selection_0_image                 | 53521                                                                                                                                                
| 51873   | _buttonbar_selection_0_image                | field_5e450106734a3                                                                                                                                  
| 51873   | buttonbar_selection_0_link                  | a:3:{s:5:"title";s:0:"";s:3:"url";s:36:"https://example-canada.com/shop/blub/";s:6:"target";s:0:"";}                                          
| 51873   | _buttonbar_selection_0_link                 | field_5e450111734a4

Note that field buttonbar_selection_0_link contains serialised data. The problem: The string that is specified to be 36 characters/bytes/whatever, isn't that number. Hence you can't fetch the field using wp post meta get, eventhough the other fields around it, are still retrievable:

wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

Output:

Widget
53521

You can't directly update the compromised field

Looks promising, doesn't it?

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

wp post meta pluck 51873 buttonbar_selection_0_link title
wp post meta pluck 51873 buttonbar_selection_0_link url
wp post meta pluck 51873 buttonbar_selection_0_link blub

wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com"

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

But it's unforgiving. Even the non-compromised part title cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):

...
Error: No data exists for key "title"
Error: No data exists for key "url"
...

You can't insert into the compromised field

Pfff:

wp post meta list 51873 | grep buttonbar_selection_0_link


# Insert?
#
wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"

Output:

Error: Cannot create key "title" on data type boolean
Error: Cannot create key "url" on data type boolean

You can't delete the subfields

By now, this was to be expected:

wp post meta patch delete 51873 buttonbar_selection_0_link title
wp post meta patch delete 51873 buttonbar_selection_0_link url

with output:

Error: No data exists for key "title"
Error: No data exists for key "url"

You can delete the field completely

Yep:

wp post meta delete 51873 buttonbar_selection_0_link
wp post meta list 51873 | grep buttonbar_selection_0_

Note that the field name including underscore (_buttonbar_selection_0_link), still exists, but that the version without underscore (buttonbar_selection_0_link) is gone:

51873	buttonbar_selection_0_title	Widget
51873	_buttonbar_selection_0_title	field_5e450082734a2
51873	buttonbar_selection_0_image	53521
51873	_buttonbar_selection_0_image	field_5e450106734a3
51873	_buttonbar_selection_0_link	field_5e450111734a4

How to recreate?

Now that the field is completely gone, how to recreate it? This doesn't seem to work:

wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link target ""

wp post meta list 51873 | grep buttonbar_selection_0_

Return:

Error: Cannot create key "title" on data type string
Error: Cannot create key "url" on data type string
Error: Cannot create key "target" on data type string

Tools to fix broken serialized data - Inventory

There is hope:

fix-serialization.php

Around 2021 & 2022, I have used the fix-serializatin.php script from https://btb.works/how-to-fix-a-broken-serialization/. It works very well, except for a critical bug: $content should be $data. Otherwise, an empty file is written. My version of the script:

<?php
#
# Fix serial data string length errors
################################################################################
# 
# @author: Wojtek Szałkiewicz - wojtek  at  bracketspace.com
# @license: GPL version 3 or later - http://www.gnu.org/licenses/gpl.txt
# Based on Pau Iglesias http://blogestudio.com package
#
#
# Usage
########################################
#
# * php fix-serialization.php my-sql-file.sql
# * php fix-serialization.php my-input-sql-file.sql my-output-sql-file.sql
#
#
# Known errors
########################################
#
# * Memory size exhausted: Allowed memory size of 67108864 bytes exhausted
#   (tried to allocate 35266489 bytes) How to fix: update php.ini 
#   memory_limit to 512M or more, and restart cgi service or webserver
# * Function preg_replace returns null or 0 length string If preg_last_error =
#   PREG_BACKTRACK_LIMIT_ERROR (value 2), increase pcre.backtrack_limit in 
#   php.ini (by default 100k, change to 2M by example) Same way for others
#   preg_last_error codes: 
#   http://www.php.net/manual/en/#unction.preg-last-error.php
#
#
################################################################################
# unescape_mysql()
################################################################################
#
function unescape_mysql( $value )
{
	#
	# * Remove redundant MySQL database dump escape codes
	# * Used in main code below
	#
	return str_replace(
		[ "\\\\", "\\0", "\\n", "\\r", "\Z",  "\'", '\"' ],
		[ "\\",   "\0",  "\n",  "\r",  "\x1a", "'", '"' ],
		$value
	);
}


################################################################################
# unescape_quotes()
################################################################################
#
function unescape_quotes( $value )
{
	# * Remove escape character "\" before a double quote
	# * Used in main code below
	#
	return str_replace('\"', '"', $value);
}


################################################################################
# Initialisation
################################################################################
#
# * Read file into $data
# * Reset counters
#
#
# Check for presence of input file name
########################################
#
if ( ! isset( $argv ) || ! isset( $argv[1] ) )
{
	#
	echo "Error: no input file specified\n\n";
	exit;
	#
}


# Store filename in "$input
########################################
#
$input = $argv[1];


# Check for existence of input file
########################################
#
if ( ! file_exists( $input ) )
{
	#
	echo "Error: input file does not exists\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Store file contents in "$data"
########################################
#
$data = file_get_contents( $input );


# Check for succesful file reading
########################################
#
if ( ! $data ) 
{
	#
	echo "Error: can`t read data from input file\n";
	echo "{$input}\n\n";
	exit;
}


# Check that file has content
########################################
#
if ( ! isset( $data ) || ! ( strlen( $data ) > 0 ) )
{
	#	
	echo "Warning: the file is empty or can't read contents\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Set counters to 0
########################################
#
$processed = 0;
$fixed     = 0;


################################################################################
# Main code: Search & replace
################################################################################
#
$data = preg_replace_callback('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!', function( $matches ) use ( &$processed, &$fixed )
{
	#
	# * preg_replace_callback — Perform a regular expression search and replace 
	#   using a callback
	#   https://www.php.net/manual/en/function.preg-replace-callback.php
	# * This seems to be the main function. It seems to work recursively or
	#   something.
	#   Quite impressive to see the input argument
	# * The data is continuously stored in "$data"
	#
	#
	# Found a string statement?
	########################################
	#
	if ( isset( $matches[3] ) )
	{
		#
		# Recalculate string length
		########################################
		#
		$len = strlen( unescape_mysql( $matches[3] ) );
		$str = unescape_quotes( $matches[3] );
		$result = "s:{$len}:\"$str\";";


		# Do old and new string length differ?
		########################################
		#
		if ( $result !== $matches[0] )
		{
			# Yes: Update fix counter
			########################################
			#
			$fixed++;
		}
	}

	# No string statement found (?)
	########################################
	#
	else 
	{
		#
		# Just return the unchanged argument
		########################################
		#
		$result = $matches[0];
	}


	# Increass string counter
	########################################
	#
	$processed++;

	return $result;
	
}, $data);


################################################################################
# Check results from "preg_replace"
################################################################################
#
if ( ! ( isset( $data ) && strlen( $data ) > 0 ) )
{
	// Error
	echo "Error: preg_replace returns nothing\n";

	if ( function_exists( 'preg_last_error' ) ) {
		$last_error = preg_last_error();
		echo "preg_last_error() = {$last_error}\n";
	}

	echo "{$input}\n\n";
	exit;
}


################################################################################
# Write to old or new file?
################################################################################
#
if ( isset( $argv[2] ) )
{
  $output = $argv[2];
  # echo "Output file name has been provided: {$argv[2]}\n";
} else
{
  $output = $input;
  # echo "No output file name has been provided - Overwrite input file\n";
}


################################################################################
# Write to file
################################################################################
#
# * file_put_contents($filename, $data) is a standard PHP function
# * Original first line: $result = file_put_contents( $output, $content );
#   It seems that "$content" should have been "$data". Otherwise, an empty file
#   is written
#
#
$result = file_put_contents( $output, $data );
// Write file data
if ( false === $result ) {
	// Error
	echo "Error: can't write fixed content\n";
	echo "{$output}\n\n";
	exit;
}


################################################################################
# Exit
################################################################################
#
echo "Found strings: {$processed}\n";
echo "Fixed: {$fixed}\n";

fix-serialization.php - Example

I use this self-developed Bash function tourlib-general-settings.sh » fix_corrupt_serialised_data() to automatically fix all serialised data errors in a database:

################################################################################
# fix_corrupt_serialised_data()
################################################################################
#
fix_corrupt_serialised_data()
{
	#
	# * Create db dump, fix database, import again
	# * Moved from tourlib-de-buttonbar-pages.sh - 2022.11.26
	# * This function doesn't make a database backup prior to its operation.
	#   It is highly recommended to do so. Use the existing function(s) to do
	#   so, e.g., tourlib-instantiate-site.sh » backup_db_to_folder_dvb()
	#
	#
	# Initialisation
	########################################
	#
	echo ""; echo ""; echo "### fix_corrupt_serialised_data..."; echo ""
	if [ -z "$project_path" ]; then echo "	Variable 'project_path' not provided. Exiting"; exit; fi
	if [ -z "$db_name" ]; then 			echo "	Variable 'db_name' not provided. Exiting"; exit; fi
	if [ -z "$path" ]; then 			echo "	Variable 'path' not provided. Exiting"; exit; fi


	# cd to project location
	#######################################
	#
	cd $project_path


	# Create time stamp
	#######################################
	#
	time_stamp=$(date +%Y.%m.%d-%H.%M)


	# Create db dump
	#######################################
	#
	echo "Create db dump (takes about 20s)..."
	#
	mysqldump $db_name > ${db_name}-${time_stamp}-org.sql


	# Fix db
	#######################################
	#
	echo "Fix db..."
	php ~/scripts/fix-serialization.php ${db_name}-${time_stamp}-org.sql ${db_name}-${time_stamp}-fixed.sql


	# Import db
	#######################################
	#
	echo "Import db again (takes about 70s)..."
	mysql $db_name < ${db_name}-${time_stamp}-fixed.sql


	# Delete transients
	#######################################
	#
	cd $path
	wp transient delete --all
	#
}

PHP: Serialize & unserialize

WordPress uses the PHP function serialize() to serialize data for storage in the database and the unserialize function for the opposite operation.

Example [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

WP-CLI can handle serialised data very well, as long as it is about specific fields. E.g.: The positional parameters pluck and patch are for reading and writing parts of serialised data.

However, there are no generic WP-CLI commands for serializing and unserializing.

Maybe the aforementioned PHP commands can be invoked through wp eval, but so far (2023.11, after half an hour of trying) I didn't get it to work → Wp eval (WP-CLI).

MySQL

MySQL does not seem to have built-in functions similar to the PHP functions serialize and unserialize. This would have been possible: MySQL, for example, does have functions to deal with XML.

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

See also

Bronnen