Serialised data (WordPress): verschil tussen versies

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen
 
(23 tussenliggende versies door dezelfde gebruiker niet weergegeven)
Regel 1: Regel 1:
Sometimes database fields contain ''serialized data'': Multiple data in one field, which are concatenated using a certain syntax.
+
Sometimes database fields contain ''serialized data'': Multiple data in one field, concatenated using a certain syntax. Although the concept of ''serialized data'' seems quite at odds with some basic databases principles, there is an obvious application for: ''Storing objects''.
  
Although the concept of ''serialized data'' seems quite at odds with some basic databases principles, it offers flexibility that otherwise would be hard to get by. In addition, this is actually used to store ''Object-oriented objects'' - I hadn't thought of that before!
+
However, compaired to e.g., YAML, this nameless protocol that WordPress uses, has one major issue: Before each string, the length of that string is specified. When you change the string but fail to update this string-count, the object is not readable anymore. See below for a solution.
  
 
== Example ==
 
== Example ==
Regel 51: Regel 51:
 
</pre>
 
</pre>
  
== The problem with serialised data & WordPress ==
+
== The problem with serialised data ==
  
 
Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.
 
Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.
Regel 58: Regel 58:
 
* But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).
 
* But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).
  
=== Example ===
+
=== The problem - Example ===
  
 
On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with <code>wp post meta list</code>:
 
On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with <code>wp post meta list</code>:
Regel 88: Regel 88:
 
</pre>
 
</pre>
  
== PHP: Serialize & unserialize ==
+
=== You can't directly update the compromised field ===
  
WordPress uses the PHP function <code>serialize()</code> to ''serialize'' data for storage in the database and the <code>unserialize</code> function for the opposite operation.
+
Looks promising, doesn't it?
  
Example [https://stackoverflow.com/questions/13319638/working-with-serialized-data-in-wordpress]:
 
 
<pre>
 
<pre>
$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
+
wp post meta list 51873
var_dump(unserialize($serialized));
+
wp post meta get 51873 buttonbar_selection_0_title
 +
wp post meta get 51873 buttonbar_selection_0_image
 +
wp post meta get 51873 buttonbar_selection_0_link
 +
 
 +
wp post meta pluck 51873 buttonbar_selection_0_link title
 +
wp post meta pluck 51873 buttonbar_selection_0_link url
 +
wp post meta pluck 51873 buttonbar_selection_0_link blub
 +
 
 +
wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com"
 +
 
 +
wp post meta list 51873
 +
wp post meta get 51873 buttonbar_selection_0_title
 +
wp post meta get 51873 buttonbar_selection_0_image
 +
wp post meta get 51873 buttonbar_selection_0_link
 +
</pre>
  
Output:
+
But it's unforgiving. Even the non-compromised part ''title'' cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):
  
Array
+
<pre>
(
+
...
    [0] => apple
+
Error: No data exists for key "title"
    [1] => banana
+
Error: No data exists for key "url"
    [2] => orange
+
...
)
 
 
</pre>
 
</pre>
  
== WP-CLI ==
+
=== You can't insert into the compromised field ===
 +
 
 +
Pfff:
  
[[Hoofdpagina#WP-CLI | WP-CLI]] can handle serialised data very well. The ''positional parameters'' <code>pluck</code> and <code>patch</code> are for reading and writing parts of serialised data
+
<pre>
 +
wp post meta list 51873 | grep buttonbar_selection_0_link
  
== MySQL ==
 
  
MySQL does not seem to have built-in functions similar to the PHP functions ''serialize'' and ''unserialize''. This would have been possible: MySQL, for example, does have functions to deal with XML.
+
# Insert?
 +
#
 +
wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com"
 +
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
 +
</pre>
  
== Syntaxis ==
+
Output:
  
Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:
+
<pre>
 +
Error: Cannot create key "title" on data type boolean
 +
Error: Cannot create key "url" on data type boolean
 +
</pre>
  
* Data-elementen zijn omhuld door <code>{</code> en <code>}</code>
+
=== You can't delete the subfields ===
* <code>a</code>: Array
 
* <code>i</code>: Counter; index
 
* <code>s</code>: String + lengte. Bv. <code>s:5"apple"</code>.
 
  
=== Stringlengte ===
+
By now, this was to be expected:
  
''Strings'' worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.
+
<pre>
 +
wp post meta patch delete 51873 buttonbar_selection_0_link title
 +
wp post meta patch delete 51873 buttonbar_selection_0_link url
 +
</pre>
  
Voorbeelden:
+
with output:
  
 
<pre>
 
<pre>
# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
+
Error: No data exists for key "title"
#
+
Error: No data exists for key "url"
s:10:"0123456789"
 
 
</pre>
 
</pre>
 +
 +
=== You can delete the field completely ===
 +
 +
Yep:
  
 
<pre>
 
<pre>
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
+
wp post meta delete 51873 buttonbar_selection_0_link
#
+
wp post meta list 51873 | grep buttonbar_selection_0_
s:7:"Bèèè"
 
 
</pre>
 
</pre>
 +
 +
Note that the field name including underscore (<code>_buttonbar_selection_0_link</code>), still exists, but that the version without underscore (<code>buttonbar_selection_0_link</code>) is gone:
  
 
<pre>
 
<pre>
concat('s:27:"Balais de charbon d', char(39), 'origine"')
+
51873 buttonbar_selection_0_title Widget
#
+
51873 _buttonbar_selection_0_title field_5e450082734a2
# * Balais de charbon d → 19 characters & 19 bytes
+
51873 buttonbar_selection_0_image 53521
# * char(39) → 1 character & 1 byte
+
51873 _buttonbar_selection_0_image field_5e450106734a3
# * origine → 7 characters & 7 bytes
+
51873 _buttonbar_selection_0_link field_5e450111734a4
#
 
# * Total: 27 bytes
 
 
</pre>
 
</pre>
  
=== Enveloping symbols ===
+
=== How to recreate? ===
 +
 
 +
Now that the field is completely gone, how to recreate it? This doesn't seem to work:
 +
 
 +
<pre>
 +
wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub"
 +
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
 +
wp post meta patch insert 51873 buttonbar_selection_0_link target ""
  
* Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
+
wp post meta list 51873 | grep buttonbar_selection_0_
* However, we have a complication here: The actual string already contains both apostrophes & quotes
+
</pre>
* The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
 
* BTW: <code>char(39)</code> counts as one byte
 
  
Voorbeeld:
+
Return:
  
 
<pre>
 
<pre>
concat('s:27:"Balais de charbon d', char(39), 'origine"')
+
Error: Cannot create key "title" on data type string
#
+
Error: Cannot create key "url" on data type string
# * The string (inside the apostrophes) becomes
+
Error: Cannot create key "target" on data type string
# * s:27:"Balais de charbon d'origine"
 
 
</pre>
 
</pre>
  
== MySQL - Sproc replace_serial ==
+
== Tools to fix broken serialized data - Inventory ==
 +
 
 +
There is hope:
  
Dit was een leuk puzzeltje om te doen, maar het practische nut is helaas beperkt.
+
* https://btb.works/how-to-fix-a-broken-serialization/ ← I used this one. See elsewhere
 +
* https://stackoverflow.com/questions/15138893/fix-serialized-data-broken-due-to-editing-mysql-database-in-a-text-editor
 +
* https://stackoverflow.com/questions/3148712/regex-code-to-fix-corrupt-serialized-php-data/55074706#55074706
 +
* https://github.com/Blogestudio/Fix-Serialization
 +
* https://github.com/wsizoo/wordpress-fix-serialization
 +
 
 +
== fix-serialization.php ==
 +
 
 +
Around 2021 & 2022, I have used the ''fix-serializatin.php'' script from https://btb.works/how-to-fix-a-broken-serialization/. It works very well, except for a critical bug: <code>$content</code> should be <code>$data</code>. Otherwise, an empty file is written. My version of the script:
  
 
<pre>
 
<pre>
CREATE DEFINER=`strompf`@`localhost` PROCEDURE `replace_serial`(
+
<?php
in table_name_in tinytext, # Source table name
 
        in col_serial_data_src_in tinytext, # Column with source serialised data
 
        in col_serial_data_dst_in tinytext, # Column with destination serialised data
 
        in col_src_in tinytext, # Column with source text strings
 
        in col_dst_in tinytext # Column with desination text string
 
    )
 
BEGIN
 
 
#
 
#
# Find-and-replace within serialised content
+
# Fix serial data string length errors
 
################################################################################
 
################################################################################
 +
#
 +
# @author: Wojtek Szałkiewicz - wojtek  at  bracketspace.com
 +
# @license: GPL version 3 or later - http://www.gnu.org/licenses/gpl.txt
 +
# Based on Pau Iglesias http://blogestudio.com package
 +
#
 +
#
 +
# Usage
 +
########################################
 
#
 
#
# * This sproc "replace_serial" needs as input pointers to a table
+
# * php fix-serialization.php my-sql-file.sql
#  ("pointer table") with the following columns:
+
# * php fix-serialization.php my-input-sql-file.sql my-output-sql-file.sql
#  * Column with the source serialised data
 
#  * Column for storing the updated serialised data
 
#  * Column with find-string
 
#  * Column with replace-strings
 
# * It can be any table: The arguments for replace_serial() contains the name
 
#  of this table and the names of those four columns
 
# * Internally, data from this input-table is copied to temporary table
 
#  "serial_tmp"
 
# * The designated serialised data is decomposed into several fields (again,
 
#  in table "serial_tmp") - Currently, this sproc only works for serialised
 
#  data with 1 a-element, followed by 6 s-elements. This sproc has not been
 
#  tested with other configurations
 
# * Find-and-replace is only performed on element s4 (=the 4th s-element)
 
# * Subsequently, the new length of s4 is calculated, and the serialised string
 
#  is written to the designated column in the "pointer table"
 
# * Subsequently, it's up to the programmer to process the resulting data in
 
#  the pointer table
 
# * This approach with a "pointer table" works really well: Easy to grasp,
 
#  flexible and easy to debug. It's especially handy for debugging or further
 
#  processing that the output is in a separate column from the input.
 
# * Copyright Jeroen Strompf - De Vliegende Brigade - 2021.03.09
 
 
#
 
#
 
#
 
#
# History
+
# Known errors
 
########################################
 
########################################
 
#
 
#
# * 2021.03.09: Created, tested and pushed to "bal_dwh_org"
+
# * Memory size exhausted: Allowed memory size of 67108864 bytes exhausted
 +
#  (tried to allocate 35266489 bytes) How to fix: update php.ini
 +
#  memory_limit to 512M or more, and restart cgi service or webserver
 +
# * Function preg_replace returns null or 0 length string If preg_last_error =
 +
#  PREG_BACKTRACK_LIMIT_ERROR (value 2), increase pcre.backtrack_limit in
 +
#  php.ini (by default 100k, change to 2M by example) Same way for others
 +
#  preg_last_error codes:
 +
#  http://www.php.net/manual/en/#unction.preg-last-error.php
 
#
 
#
 
#
 
#
 
################################################################################
 
################################################################################
# Define working table "serial_tmp"
+
# unescape_mysql()
 
################################################################################
 
################################################################################
 
#
 
#
drop table if exists
+
function unescape_mysql( $value )
serial_tmp;
+
{
create temporary table
+
#
serial_tmp
+
# * Remove redundant MySQL database dump escape codes
    (
+
# * Used in main code below
serial_data_src varchar(255) comment "Actual serialised data",
+
#
        serial_data_dst varchar(255) comment "Resulting serialised data. This is a separate field, to make debugging easier",
+
return str_replace(
        find_string varchar(255) comment "String to be found within serial_data_src",
+
[ "\\\\", "\\0", "\\n", "\\r", "\Z",  "\'", '\"' ],
        replace_string varchar(255) comment "Replacement string for within serial_data_src, resulting in serial_data_dst",
+
[ "\\",  "\0",  "\n", "\r", "\x1a", "'", '"' ],
 +
$value
 +
);
 +
}
  
        a_len smallint,
+
 
        a_string varchar(255) comment "The string within the accolade",
+
################################################################################
        s1 varchar(255) comment "Complete s-part, including length statement and string",
+
# unescape_quotes()
s2 varchar(255),
+
################################################################################
s3 varchar(255),
+
#
s4 varchar(255) comment "Complete s4-string",
+
function unescape_quotes( $value )
        s4_len_src smallint comment "Original length of s4",
+
{
        s4_string_src varchar(255) comment "Original s4 string",
+
# * Remove escape character "\" before a double quote
        s4_len_dst smallint comment "New length",
+
# * Used in main code below
        s4_string_dst varchar(255) comment "New string",
+
#
s5 varchar(255),
+
return str_replace('\"', '"', $value);
s6 varchar(255)
+
}
    );
 
  
  
 
################################################################################
 
################################################################################
# Copy working data to working table
+
# Initialisation
 
################################################################################
 
################################################################################
 
#
 
#
set @ddl=concat
+
# * Read file into $data
(
+
# * Reset counters
"insert into serial_tmp
+
#
    (
+
#
serial_data_src,
+
# Check for presence of input file name
        find_string,
+
########################################
        replace_string
+
#
    )
+
if ( ! isset( $argv ) || ! isset( $argv[1] ) )
    select ",
+
{
col_serial_data_src_in, ", ",
+
#
        col_src_in, ", ",
+
echo "Error: no input file specified\n\n";
        col_dst_in, " ",
+
exit;
"from ",
+
#
    table_name_in, ";"
+
}
);
+
 
  
prepare stmt from @ddl;
+
# Store filename in "$input
execute stmt;
+
########################################
deallocate prepare stmt;
+
#
 +
$input = $argv[1];
  
  
################################################################################
+
# Check for existence of input file
# Decompose serialised data
+
########################################
################################################################################
 
 
#
 
#
# Leading a-field
+
if ( ! file_exists( $input ) )
 +
{
 +
#
 +
echo "Error: input file does not exists\n";
 +
echo "{$input}\n\n";
 +
exit;
 +
#
 +
}
 +
 
 +
 
 +
# Store file contents in "$data"
 
########################################
 
########################################
 
#
 
#
update
+
$data = file_get_contents( $input );
serial_tmp
 
set
 
a_len = substring_index(substring_index(serial_data_src, ":", 2), ":", -1),
 
a_string = substring_index(substring_index(serial_data_src, "}",1), "{", -1);
 
  
  
# Complete s-fields
+
# Check for succesful file reading
 
########################################
 
########################################
 
#
 
#
update
+
if ( ! $data )  
    serial_tmp
+
{
set
+
#
    s1 = substring_index(a_string, ";", 1),
+
echo "Error: can`t read data from input file\n";
    s2 = substring_index(substring_index(a_string, ";", 2), ";", -1),
+
echo "{$input}\n\n";
    s3 = substring_index(substring_index(a_string, ";", 3), ";", -1),
+
exit;
    s4 = substring_index(substring_index(a_string, ";", 4), ";", -1),
+
}
    s5 = substring_index(substring_index(a_string, ";", 5), ";", -1),
 
    s6 = substring_index(substring_index(a_string, ";", 6), ";", -1);
 
  
  
# Further decompose s4
+
# Check that file has content
 
########################################
 
########################################
 
#
 
#
update
+
if ( ! isset( $data ) || ! ( strlen( $data ) > 0 ) )
     serial_tmp
+
{
set
+
#
    s4_len_src   = substring_index(substring_index(s4, ":", 2), ":", -1),
+
echo "Warning: the file is empty or can't read contents\n";
    s4_string_src = substring_index(substring_index(s4, '"', 2), '"', -1);
+
echo "{$input}\n\n";
 +
exit;
 +
#
 +
}
 +
 
 +
 
 +
# Set counters to 0
 +
########################################
 +
#
 +
$processed = 0;
 +
$fixed     = 0;
 +
 
 +
 
 +
################################################################################
 +
# Main code: Search & replace
 +
################################################################################
 +
#
 +
$data = preg_replace_callback('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!', function( $matches ) use ( &$processed, &$fixed )
 +
{
 +
#
 +
# * preg_replace_callback — Perform a regular expression search and replace
 +
#  using a callback
 +
#  https://www.php.net/manual/en/function.preg-replace-callback.php
 +
# * This seems to be the main function. It seems to work recursively or
 +
#  something.
 +
#  Quite impressive to see the input argument
 +
# * The data is continuously stored in "$data"
 +
#
 +
#
 +
# Found a string statement?
 +
########################################
 +
#
 +
if ( isset( $matches[3] ) )
 +
{
 +
#
 +
# Recalculate string length
 +
########################################
 +
#
 +
$len = strlen( unescape_mysql( $matches[3] ) );
 +
$str = unescape_quotes( $matches[3] );
 +
$result = "s:{$len}:\"$str\";";
 +
 
 +
 
 +
# Do old and new string length differ?
 +
########################################
 +
#
 +
if ( $result !== $matches[0] )
 +
{
 +
# Yes: Update fix counter
 +
########################################
 +
#
 +
$fixed++;
 +
}
 +
}
 +
 
 +
# No string statement found (?)
 +
########################################
 +
#
 +
else
 +
{
 +
#
 +
# Just return the unchanged argument
 +
########################################
 +
#
 +
$result = $matches[0];
 +
}
 +
 
 +
 
 +
# Increass string counter
 +
########################################
 +
#
 +
$processed++;
 +
 
 +
return $result;
 +
 +
}, $data);
  
  
 
################################################################################
 
################################################################################
# Find-and-replace
+
# Check results from "preg_replace"
 
################################################################################
 
################################################################################
 
#
 
#
# Finally! The heart of the matter
+
if ( ! ( isset( $data ) && strlen( $data ) > 0 ) )
 +
{
 +
// Error
 +
echo "Error: preg_replace returns nothing\n";
 +
 
 +
if ( function_exists( 'preg_last_error' ) ) {
 +
$last_error = preg_last_error();
 +
echo "preg_last_error() = {$last_error}\n";
 +
}
 +
 
 +
echo "{$input}\n\n";
 +
exit;
 +
}
 +
 
 +
 
 +
################################################################################
 +
# Write to old or new file?
 +
################################################################################
 
#
 
#
update
+
if ( isset( $argv[2] ) )
    serial_tmp
+
{
set
+
  $output = $argv[2];
    s4_string_dst = replace(s4_string_src, find_string, replace_string),
+
  # echo "Output file name has been provided: {$argv[2]}\n";
    s4_len_dst   = length(s4_string_dst);
+
} else
 +
{
 +
  $output = $input;
 +
  # echo "No output file name has been provided - Overwrite input file\n";
 +
}
  
  
 
################################################################################
 
################################################################################
# Reassemble
+
# Write to file
 
################################################################################
 
################################################################################
 
#
 
#
update
+
# * file_put_contents($filename, $data) is a standard PHP function
    serial_tmp
+
# * Original first line: $result = file_put_contents( $output, $content );
set
+
#  It seems that "$content" should have been "$data". Otherwise, an empty file
    serial_data_dst = concat
+
#  is written
    (
+
#
        "a:",a_len, ":",
+
#
        "{",
+
$result = file_put_contents( $output, $data );
    s1, ";", s2, ";", s3, ";",
+
// Write file data
            "s:",s4_len_dst,":",
+
if ( false === $result ) {
            char(34),
+
// Error
s4_string_dst,
+
echo "Error: can't write fixed content\n";
    char(34),
+
echo "{$output}\n\n";
            ";", s5, ";", s6, ";",
+
exit;
"}"
+
}
    );
 
  
  
 
################################################################################
 
################################################################################
# Copy results back to the original input table
+
# Exit
 
################################################################################
 
################################################################################
 
#
 
#
# * Fill in the result in column "col_serial_data_dst_in"
+
echo "Found strings: {$processed}\n";
 +
echo "Fixed: {$fixed}\n";
 +
 
 +
</pre>
 +
 
 +
== fix-serialization.php - Example ==
 +
 
 +
I use this self-developed Bash function <code>tourlib-general-settings.sh » fix_corrupt_serialised_data()</code> to automatically fix all serialised data errors in a database:
 +
 
 +
<pre>
 +
################################################################################
 +
# fix_corrupt_serialised_data()
 +
################################################################################
 
#
 
#
set @ddl2=concat
+
fix_corrupt_serialised_data()
 +
{
 +
#
 +
# * Create db dump, fix database, import again
 +
# * Moved from tourlib-de-buttonbar-pages.sh - 2022.11.26
 +
# * This function doesn't make a database backup prior to its operation.
 +
#  It is highly recommended to do so. Use the existing function(s) to do
 +
#  so, e.g., tourlib-instantiate-site.sh » backup_db_to_folder_dvb()
 +
#
 +
#
 +
# Initialisation
 +
########################################
 +
#
 +
echo ""; echo ""; echo "### fix_corrupt_serialised_data..."; echo ""
 +
if [ -z "$project_path" ]; then echo " Variable 'project_path' not provided. Exiting"; exit; fi
 +
if [ -z "$db_name" ]; then echo " Variable 'db_name' not provided. Exiting"; exit; fi
 +
if [ -z "$path" ]; then echo " Variable 'path' not provided. Exiting"; exit; fi
 +
 
 +
 
 +
# cd to project location
 +
#######################################
 +
#
 +
cd $project_path
 +
 
 +
 
 +
# Create time stamp
 +
#######################################
 +
#
 +
time_stamp=$(date +%Y.%m.%d-%H.%M)
 +
 
 +
 
 +
# Create db dump
 +
#######################################
 +
#
 +
echo "Create db dump (takes about 20s)..."
 +
#
 +
mysqldump $db_name > ${db_name}-${time_stamp}-org.sql
 +
 
 +
 
 +
# Fix db
 +
#######################################
 +
#
 +
echo "Fix db..."
 +
php ~/scripts/fix-serialization.php ${db_name}-${time_stamp}-org.sql ${db_name}-${time_stamp}-fixed.sql
 +
 
 +
 
 +
# Import db
 +
#######################################
 +
#
 +
echo "Import db again (takes about 70s)..."
 +
mysql $db_name < ${db_name}-${time_stamp}-fixed.sql
 +
 
 +
 
 +
# Delete transients
 +
#######################################
 +
#
 +
cd $path
 +
wp transient delete --all
 +
#
 +
}
 +
</pre>
 +
 
 +
== PHP: Serialize & unserialize ==
 +
 
 +
WordPress uses the PHP function <code>serialize()</code> to ''serialize'' data for storage in the database and the <code>unserialize</code> function for the opposite operation.
 +
 
 +
Example [https://stackoverflow.com/questions/13319638/working-with-serialized-data-in-wordpress]:
 +
<pre>
 +
$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
 +
var_dump(unserialize($serialized));
 +
 
 +
Output:
 +
 
 +
Array
 
(
 
(
     "update ", table_name_in, "
+
     [0] => apple
     join
+
    [1] => banana
        serial_tmp
+
     [2] => orange
        on
+
)
        serial_tmp.serial_data_src = ", table_name_in, ".",col_serial_data_src_in,"
+
</pre>
    set
+
 
      ",table_name_in, ".",col_serial_data_dst_in, " = serial_tmp.serial_data_dst;"
+
== WP-CLI ==
);
+
 
 +
[[Hoofdpagina#WP-CLI | WP-CLI]] can handle serialised data very well, as long as it is about ''specific fields''. E.g.: The ''positional parameters'' <code>pluck</code> and <code>patch</code> are for reading and writing parts of serialised data.
 +
 
 +
However, there are no generic WP-CLI commands for serializing and unserializing.
 +
 
 +
Maybe the aforementioned PHP commands can be invoked through <code>wp eval</code>, but so far (2023.11, after half an hour of trying) I didn't get it to work → [[Wp eval (WP-CLI)]].
 +
 
 +
== MySQL ==
 +
 
 +
MySQL does not seem to have built-in functions similar to the PHP functions ''serialize'' and ''unserialize''. This would have been possible: MySQL, for example, does have functions to deal with XML.
 +
 
 +
== Syntaxis ==
 +
 
 +
Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:
 +
 
 +
* Data-elementen zijn omhuld door <code>{</code> en <code>}</code>
 +
* <code>a</code>: Array
 +
* <code>i</code>: Counter; index
 +
* <code>s</code>: String + lengte. Bv. <code>s:5"apple"</code>.
 +
 
 +
=== Stringlengte ===
 +
 
 +
''Strings'' worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.
 +
 
 +
Voorbeelden:
 +
 
 +
<pre>
 +
# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
 +
#
 +
s:10:"0123456789"
 +
</pre>
  
prepare stmt from @ddl2;
+
<pre>
execute stmt;
+
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
deallocate prepare stmt;
+
#
 +
s:7:"Bèèè"
 +
</pre>
  
END
+
<pre>
 +
concat('s:27:"Balais de charbon d', char(39), 'origine"')
 +
#
 +
# * Balais de charbon d → 19 characters & 19 bytes
 +
# * char(39) → 1 character & 1 byte
 +
# * origine → 7 characters & 7 bytes
 +
#
 +
# * Total: 27 bytes
 
</pre>
 
</pre>
 +
 +
=== Enveloping symbols ===
 +
 +
* Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
 +
* However, we have a complication here: The actual string already contains both apostrophes & quotes
 +
* The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
 +
* BTW: <code>char(39)</code> counts as one byte
 +
 +
Voorbeeld:
 +
 +
<pre>
 +
concat('s:27:"Balais de charbon d', char(39), 'origine"')
 +
#
 +
# * The string (inside the apostrophes) becomes
 +
# * s:27:"Balais de charbon d'origine"
 +
</pre>
 +
 +
== See also ==
 +
 +
* [[Wp eval (WP-CLI)]]
  
 
== Bronnen ==
 
== Bronnen ==

Huidige versie van 27 nov 2023 om 18:07

Sometimes database fields contain serialized data: Multiple data in one field, concatenated using a certain syntax. Although the concept of serialized data seems quite at odds with some basic databases principles, there is an obvious application for: Storing objects.

However, compaired to e.g., YAML, this nameless protocol that WordPress uses, has one major issue: Before each string, the length of that string is specified. When you change the string but fail to update this string-count, the object is not readable anymore. See below for a solution.

Example

In table wp_options, field widget_woocommerce_layered_nav encodes all Woocommerce widgets. This is an example of an online store in Oct. 2020.

Raw content:

a:14:{i:4;a:4:{s:5:"title";s:21:"Originele priri";s:9:"attribute";s:21:"originele priri";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:5;a:4:{s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:6;a:4:{s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:7;a:4:{s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:8;a:4:{s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:9;a:4:{s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:10;a:4:{s:5:"title";s:10:"Connector?";s:9:"attribute";s:9:"connector";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:11;a:4:{s:5:"title";s:5:"Veer?";s:9:"attribute";s:4:"veer";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:12;a:4:{s:5:"title";s:7:"Diagram";s:9:"attribute";s:7:"diagram";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:13;a:4:{s:5:"title";s:19:"Ztoepassing - Model";s:9:"attribute";s:16:"toepassing_model";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:14;a:4:{s:5:"title";s:21:"Toepassing - Submodel";s:9:"attribute";s:19:"toepassing_submodel";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:15;a:4:{s:5:"title";s:17:"Azijn-productcode";s:9:"attribute";s:10:"azijn_code";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}i:16;a:4:{s:5:"title";s:11:"Productlijn";s:9:"attribute";s:11:"productlijn";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";}s:12:"_multiwidget";i:1;}

The first few lines in more readable form:

a:14:
{
	i:4;a:4:
	{
		s:5:"title";
		s:21:"Originele pripripripr";
		s:9:"attribute";
		s:21:"originele pripripripr";
		s:12:"display_type";
		s:4:"list";
		s:10:"query_type";
		s:3:"and";
	}
	i:5;a:4:
	{
		s:5:"title";s:12:"Breedte (mm)";s:9:"attribute";s:7:"breedte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:6;a:4:
	{
		s:5:"title";s:11:"Diepte (mm)";s:9:"attribute";s:6:"diepte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:7;a:4:
	{
		s:5:"title";s:11:"Lengte (mm)";s:9:"attribute";s:6:"lengte";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:8;a:4:
	{
		s:5:"title";s:18:"Automatische stop?";s:9:"attribute";s:17:"automatische_stop";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
	i:9;a:4:
	{
		s:5:"title";s:12:"Stroomdraad?";s:9:"attribute";s:11:"stroomdraad";s:12:"display_type";s:4:"list";s:10:"query_type";s:3:"and";
	}
}

The problem with serialised data

Working with serialised data in WordPress poses a specific problem: Prior to each string, its length in bytes in UTF8 (or something like that), is specified. If this number is incorrect, WordPress will refuse to process the entire field.

  • This means that it becomes quite difficult to process serialised data other than through appropriate API-calls. So no messing around directly in MySQL - Where otherwise this would be the only obvious solution.
  • But it gets worse: you cannot fix a messed-up serial field through WP-CLI functions that otherwise know how to handle serialised data. In order to fix messed-up serial fields. you probably have to recreate the whole object on a higher abstraction level (e.g.: Import a page afresh).

The problem - Example

On a site, page 51873 contains buttonbars. I can still retrieve all options for this page (including all buttonbars, implemented as ACF fields) with wp post meta list:

wp post meta list 51873

| post_id | meta_key                                    | meta_value                                                                                                                                       
+---------+---------------------------------------------+----------------------------------------------+
| 51873   | buttonbar_selection_0_image                 | 53521                                                                                                                                                
| 51873   | _buttonbar_selection_0_image                | field_5e450106734a3                                                                                                                                  
| 51873   | buttonbar_selection_0_link                  | a:3:{s:5:"title";s:0:"";s:3:"url";s:36:"https://example-canada.com/shop/blub/";s:6:"target";s:0:"";}                                          
| 51873   | _buttonbar_selection_0_link                 | field_5e450111734a4

Note that field buttonbar_selection_0_link contains serialised data. The problem: The string that is specified to be 36 characters/bytes/whatever, isn't that number. Hence you can't fetch the field using wp post meta get, eventhough the other fields around it, are still retrievable:

wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

Output:

Widget
53521

You can't directly update the compromised field

Looks promising, doesn't it?

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

wp post meta pluck 51873 buttonbar_selection_0_link title
wp post meta pluck 51873 buttonbar_selection_0_link url
wp post meta pluck 51873 buttonbar_selection_0_link blub

wp post meta patch update 51873 buttonbar_selection_0_link url "https://example.com"

wp post meta list 51873
wp post meta get 51873 buttonbar_selection_0_title
wp post meta get 51873 buttonbar_selection_0_image
wp post meta get 51873 buttonbar_selection_0_link

But it's unforgiving. Even the non-compromised part title cannot be updated. The reading-part does work for a similar site without this problem (I didn't check writing there):

...
Error: No data exists for key "title"
Error: No data exists for key "url"
...

You can't insert into the compromised field

Pfff:

wp post meta list 51873 | grep buttonbar_selection_0_link


# Insert?
#
wp post meta patch insert 51873 buttonbar_selection_0_link title "Example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"

Output:

Error: Cannot create key "title" on data type boolean
Error: Cannot create key "url" on data type boolean

You can't delete the subfields

By now, this was to be expected:

wp post meta patch delete 51873 buttonbar_selection_0_link title
wp post meta patch delete 51873 buttonbar_selection_0_link url

with output:

Error: No data exists for key "title"
Error: No data exists for key "url"

You can delete the field completely

Yep:

wp post meta delete 51873 buttonbar_selection_0_link
wp post meta list 51873 | grep buttonbar_selection_0_

Note that the field name including underscore (_buttonbar_selection_0_link), still exists, but that the version without underscore (buttonbar_selection_0_link) is gone:

51873	buttonbar_selection_0_title	Widget
51873	_buttonbar_selection_0_title	field_5e450082734a2
51873	buttonbar_selection_0_image	53521
51873	_buttonbar_selection_0_image	field_5e450106734a3
51873	_buttonbar_selection_0_link	field_5e450111734a4

How to recreate?

Now that the field is completely gone, how to recreate it? This doesn't seem to work:

wp post meta patch insert 51873 buttonbar_selection_0_link title "Blub"
wp post meta patch insert 51873 buttonbar_selection_0_link url "https://example.com"
wp post meta patch insert 51873 buttonbar_selection_0_link target ""

wp post meta list 51873 | grep buttonbar_selection_0_

Return:

Error: Cannot create key "title" on data type string
Error: Cannot create key "url" on data type string
Error: Cannot create key "target" on data type string

Tools to fix broken serialized data - Inventory

There is hope:

fix-serialization.php

Around 2021 & 2022, I have used the fix-serializatin.php script from https://btb.works/how-to-fix-a-broken-serialization/. It works very well, except for a critical bug: $content should be $data. Otherwise, an empty file is written. My version of the script:

<?php
#
# Fix serial data string length errors
################################################################################
# 
# @author: Wojtek Szałkiewicz - wojtek  at  bracketspace.com
# @license: GPL version 3 or later - http://www.gnu.org/licenses/gpl.txt
# Based on Pau Iglesias http://blogestudio.com package
#
#
# Usage
########################################
#
# * php fix-serialization.php my-sql-file.sql
# * php fix-serialization.php my-input-sql-file.sql my-output-sql-file.sql
#
#
# Known errors
########################################
#
# * Memory size exhausted: Allowed memory size of 67108864 bytes exhausted
#   (tried to allocate 35266489 bytes) How to fix: update php.ini 
#   memory_limit to 512M or more, and restart cgi service or webserver
# * Function preg_replace returns null or 0 length string If preg_last_error =
#   PREG_BACKTRACK_LIMIT_ERROR (value 2), increase pcre.backtrack_limit in 
#   php.ini (by default 100k, change to 2M by example) Same way for others
#   preg_last_error codes: 
#   http://www.php.net/manual/en/#unction.preg-last-error.php
#
#
################################################################################
# unescape_mysql()
################################################################################
#
function unescape_mysql( $value )
{
	#
	# * Remove redundant MySQL database dump escape codes
	# * Used in main code below
	#
	return str_replace(
		[ "\\\\", "\\0", "\\n", "\\r", "\Z",  "\'", '\"' ],
		[ "\\",   "\0",  "\n",  "\r",  "\x1a", "'", '"' ],
		$value
	);
}


################################################################################
# unescape_quotes()
################################################################################
#
function unescape_quotes( $value )
{
	# * Remove escape character "\" before a double quote
	# * Used in main code below
	#
	return str_replace('\"', '"', $value);
}


################################################################################
# Initialisation
################################################################################
#
# * Read file into $data
# * Reset counters
#
#
# Check for presence of input file name
########################################
#
if ( ! isset( $argv ) || ! isset( $argv[1] ) )
{
	#
	echo "Error: no input file specified\n\n";
	exit;
	#
}


# Store filename in "$input
########################################
#
$input = $argv[1];


# Check for existence of input file
########################################
#
if ( ! file_exists( $input ) )
{
	#
	echo "Error: input file does not exists\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Store file contents in "$data"
########################################
#
$data = file_get_contents( $input );


# Check for succesful file reading
########################################
#
if ( ! $data ) 
{
	#
	echo "Error: can`t read data from input file\n";
	echo "{$input}\n\n";
	exit;
}


# Check that file has content
########################################
#
if ( ! isset( $data ) || ! ( strlen( $data ) > 0 ) )
{
	#	
	echo "Warning: the file is empty or can't read contents\n";
	echo "{$input}\n\n";
	exit;
	#
}


# Set counters to 0
########################################
#
$processed = 0;
$fixed     = 0;


################################################################################
# Main code: Search & replace
################################################################################
#
$data = preg_replace_callback('!s:(\d+):([\\\\]?"[\\\\]?"|[\\\\]?"((.*?)[^\\\\])[\\\\]?");!', function( $matches ) use ( &$processed, &$fixed )
{
	#
	# * preg_replace_callback — Perform a regular expression search and replace 
	#   using a callback
	#   https://www.php.net/manual/en/function.preg-replace-callback.php
	# * This seems to be the main function. It seems to work recursively or
	#   something.
	#   Quite impressive to see the input argument
	# * The data is continuously stored in "$data"
	#
	#
	# Found a string statement?
	########################################
	#
	if ( isset( $matches[3] ) )
	{
		#
		# Recalculate string length
		########################################
		#
		$len = strlen( unescape_mysql( $matches[3] ) );
		$str = unescape_quotes( $matches[3] );
		$result = "s:{$len}:\"$str\";";


		# Do old and new string length differ?
		########################################
		#
		if ( $result !== $matches[0] )
		{
			# Yes: Update fix counter
			########################################
			#
			$fixed++;
		}
	}

	# No string statement found (?)
	########################################
	#
	else 
	{
		#
		# Just return the unchanged argument
		########################################
		#
		$result = $matches[0];
	}


	# Increass string counter
	########################################
	#
	$processed++;

	return $result;
	
}, $data);


################################################################################
# Check results from "preg_replace"
################################################################################
#
if ( ! ( isset( $data ) && strlen( $data ) > 0 ) )
{
	// Error
	echo "Error: preg_replace returns nothing\n";

	if ( function_exists( 'preg_last_error' ) ) {
		$last_error = preg_last_error();
		echo "preg_last_error() = {$last_error}\n";
	}

	echo "{$input}\n\n";
	exit;
}


################################################################################
# Write to old or new file?
################################################################################
#
if ( isset( $argv[2] ) )
{
  $output = $argv[2];
  # echo "Output file name has been provided: {$argv[2]}\n";
} else
{
  $output = $input;
  # echo "No output file name has been provided - Overwrite input file\n";
}


################################################################################
# Write to file
################################################################################
#
# * file_put_contents($filename, $data) is a standard PHP function
# * Original first line: $result = file_put_contents( $output, $content );
#   It seems that "$content" should have been "$data". Otherwise, an empty file
#   is written
#
#
$result = file_put_contents( $output, $data );
// Write file data
if ( false === $result ) {
	// Error
	echo "Error: can't write fixed content\n";
	echo "{$output}\n\n";
	exit;
}


################################################################################
# Exit
################################################################################
#
echo "Found strings: {$processed}\n";
echo "Fixed: {$fixed}\n";

fix-serialization.php - Example

I use this self-developed Bash function tourlib-general-settings.sh » fix_corrupt_serialised_data() to automatically fix all serialised data errors in a database:

################################################################################
# fix_corrupt_serialised_data()
################################################################################
#
fix_corrupt_serialised_data()
{
	#
	# * Create db dump, fix database, import again
	# * Moved from tourlib-de-buttonbar-pages.sh - 2022.11.26
	# * This function doesn't make a database backup prior to its operation.
	#   It is highly recommended to do so. Use the existing function(s) to do
	#   so, e.g., tourlib-instantiate-site.sh » backup_db_to_folder_dvb()
	#
	#
	# Initialisation
	########################################
	#
	echo ""; echo ""; echo "### fix_corrupt_serialised_data..."; echo ""
	if [ -z "$project_path" ]; then echo "	Variable 'project_path' not provided. Exiting"; exit; fi
	if [ -z "$db_name" ]; then 			echo "	Variable 'db_name' not provided. Exiting"; exit; fi
	if [ -z "$path" ]; then 			echo "	Variable 'path' not provided. Exiting"; exit; fi


	# cd to project location
	#######################################
	#
	cd $project_path


	# Create time stamp
	#######################################
	#
	time_stamp=$(date +%Y.%m.%d-%H.%M)


	# Create db dump
	#######################################
	#
	echo "Create db dump (takes about 20s)..."
	#
	mysqldump $db_name > ${db_name}-${time_stamp}-org.sql


	# Fix db
	#######################################
	#
	echo "Fix db..."
	php ~/scripts/fix-serialization.php ${db_name}-${time_stamp}-org.sql ${db_name}-${time_stamp}-fixed.sql


	# Import db
	#######################################
	#
	echo "Import db again (takes about 70s)..."
	mysql $db_name < ${db_name}-${time_stamp}-fixed.sql


	# Delete transients
	#######################################
	#
	cd $path
	wp transient delete --all
	#
}

PHP: Serialize & unserialize

WordPress uses the PHP function serialize() to serialize data for storage in the database and the unserialize function for the opposite operation.

Example [1]:

$serialized = 'a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}';
var_dump(unserialize($serialized));

Output:

Array
(
    [0] => apple
    [1] => banana
    [2] => orange
)

WP-CLI

WP-CLI can handle serialised data very well, as long as it is about specific fields. E.g.: The positional parameters pluck and patch are for reading and writing parts of serialised data.

However, there are no generic WP-CLI commands for serializing and unserializing.

Maybe the aforementioned PHP commands can be invoked through wp eval, but so far (2023.11, after half an hour of trying) I didn't get it to work → Wp eval (WP-CLI).

MySQL

MySQL does not seem to have built-in functions similar to the PHP functions serialize and unserialize. This would have been possible: MySQL, for example, does have functions to deal with XML.

Syntaxis

Dankzij het voorbeeld hierboven, kun je al een deel van de syntaxis achterhalen:

  • Data-elementen zijn omhuld door { en }
  • a: Array
  • i: Counter; index
  • s: String + lengte. Bv. s:5"apple".

Stringlengte

Strings worden voorafgegaan door de lengte van de betreffende string, gemeten in UTF8-bytes. Als dit getal niet klopt, wordt de betreffende serialised data, niet verwerkt.

Voorbeelden:

# The string "0...9" is exactly 10 bytes long: Every character is exactly one byte
#
s:10:"0123456789"
# The letter "è" takes two bytes, hence the whole string is 7 bytes long, although only 4 characters long
#
s:7:"Bèèè"
concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * Balais de charbon d	→ 19 characters & 19 bytes
# * char(39) 		→ 1 character & 1 byte
# * origine		→ 7 characters & 7 bytes
#
# * Total: 27 bytes

Enveloping symbols

  • Usually, you can choose to envelop a string within 'apostrophes' or within "quotes"
  • However, we have a complication here: The actual string already contains both apostrophes & quotes
  • The way to solve this: Use apostrophes for enveloping the string. When an apostrophe is needed inside the string, and use char(39) to represent the apostrophe inside the string
  • BTW: char(39) counts as one byte

Voorbeeld:

concat('s:27:"Balais de charbon d', char(39), 'origine"')
#
# * The string (inside the apostrophes) becomes
# * s:27:"Balais de charbon d'origine"

See also

Bronnen