Transients Layered Nav Counts problem (WooCommerce): verschil tussen versies

Uit De Vliegende Brigade
Naar navigatie springen Naar zoeken springen
Regel 79: Regel 79:
 
* At the same time, <code>/var/log/php/error.log</code> was also growing with about 2GB/minute (or something like that). I haven't analysed its content: By that time, we already had a script in place to delete this file every  minute - This is the only part of 'what happened' that I don't really understand - No worries, though!
 
* At the same time, <code>/var/log/php/error.log</code> was also growing with about 2GB/minute (or something like that). I haven't analysed its content: By that time, we already had a script in place to delete this file every  minute - This is the only part of 'what happened' that I don't really understand - No worries, though!
 
* Already for some years, we have a script that deletes the transients once a week, because otherwise layout of some pages get messed up. Accidentially, we discovered this week that this messed-up layout is because an error is outputted.
 
* Already for some years, we have a script that deletes the transients once a week, because otherwise layout of some pages get messed up. Accidentially, we discovered this week that this messed-up layout is because an error is outputted.
 +
 +
In the following chapters, let's move from the problem to a solution:
  
 
== Layered navigation ==
 
== Layered navigation ==

Versie van 7 dec 2023 11:22

In the autumn of 2023, we encountered a problem on WooCommerce webshops. It turned out that this was related to some other problems that we have been confronted with for probably one or two years already.

Immediate problems

This is what we experienced as the immediate problem - the cause for this article and our troubshooting around this.

For context:

  • This all plays on our new Hetzner server, which is big and fast
  • There are about 40 webshops on this server.

These 'immediate problems' nicely illustrate how amorphous computer problems can be: It seems like twice the same problem, isn't it? Or like one problem and one underlying cause - How to keep these kinds of things structured? → Try to keep it intuitive, rather than 'logically rigid'.

Immediate problems
Problem Notes
Random database server crashes? Every couple of days, the MariaDB server crashed, seemingly due to unlikely issues, like memory shortage or something that seemed like compromized data
Database server crashes due to storage issues Somehow, /var/logs/php/php.error was growing with about 2GB per minute, crashing the database server when the server run out of storage space

The problem

We have the problem described here at GitHub, but with a minor difference:

  • The situation described in this article, is the 'old' situation when there was only one transient for all filters, called _transient_wc_layered_nav_countsr
  • To alluviate this problem, Woo decided to split this one transient into separate transients for each filter [1]
  • Instead of this one transient, we now have several transients with names like _transient_wc_layered_nav_counts_pa_<name of the attribute filter>.

Despite this difference, we have the exact symptomps described in the link above.

Symptoms

What we experience, and that I found back in the article mentioned above:

  • There are a couple of transients with names starting with transient_wc_layered_nav_counts_pa_ that make up about 80% of the size of table wp_options
  • Storage space for table wp_options grows with about 2GB per minute, probably because of logging, as these transients are updated constantly
  • Db server crashes when running out of storage space
  • Site performance degredation, even with enough storage space available
  • Every couple of days, we have to delete transients - We have been doing this for at least a year already and only now can I connect it with the other issues
  • File /var/log/php/error.log grows as well with about 2GB/minute. Unknown if this is related

Our symptoms are nicely illustrated by this comment:

I confirm that since WC 3.2.1, I have to purge my MariaDB
bin log files every 5 days. With the current traffic I get
on my site (which is yet little trafic), approximately every
5 days, this transient string is so huge that it generates up
to 8 gigas of log when it's saved to the table.

Another illustration of our symptoms [2]:

We are proactively contacting you because of hyper filling
up the storage up to 500GB due to massive transaction logs
by the site.

The engineering team detected that it’s generating ~150MB/sec
of mysql bin logs (transaction logs). From a quick glance,
it looks like the wp_options table and a key named
_transient_wc_layered_nav_countsr may be the culprit. The
key is being overwritten/appended to at a huge rate.

What seemed to be happening

Each webshop has about 9 to 14 attribute taxonomies + associated filters at the /shop page. The desciption below is not complete nor consistent: Like always, everything is changing all the time. Especially: While I was measuring stuff, someone else was fixing the problems:

  • For each filter, a transient is created (in table wp_options). Such transients easily become several MB per field (didn't test this exhaustingly)
  • WP-CLI command wp transient list all usually doesn't work
  • These filters are being updated all the time, increasing the size of the database logs. That's why files like /var/lib/mysql/example_com/wp_options.ibd were growing so fast, that the overall storage was decreasing with about 2GB per minute
  • At the same time, table wp_options still had only about 1,400 lines. Individual fields were becomming megabytes in size. The total size of this table was measured through Bash commands like out4=$(mysqldump ${db} wp_options | wc -c)
  • At the same time, /var/log/php/error.log was also growing with about 2GB/minute (or something like that). I haven't analysed its content: By that time, we already had a script in place to delete this file every minute - This is the only part of 'what happened' that I don't really understand - No worries, though!
  • Already for some years, we have a script that deletes the transients once a week, because otherwise layout of some pages get messed up. Accidentially, we discovered this week that this messed-up layout is because an error is outputted.

In the following chapters, let's move from the problem to a solution:

Layered navigation

In WooCommerce, layered navigation refers to a filtering system that helps customers refine their search results while browsing a store. It allows shoppers to narrow down product selections based on various attributes or criteria, such as price range, size, color, brand, or any custom product attributes set by the store owner.

Layered navigation typically appears as a sidebar or series of filters on category pages or search results. When customers select specific criteria or attributes, the displayed products dynamically update to show only items that match the chosen filters. This feature enhances the user experience by making it easier for customers to find the products that meet their specific preferences or requirements.

Layered navigation counts

WC Layered navigation counts refers to counting the number of items that are within a filter. Amongst other, this used for the WC Layered Nav Widget to decide whether or not to hide an item when count is 0.

Maybe this came out of this issue about how slow the query was that does this job [3]. Hence cachching[4]

Transients

Example:

option_id                    option_name                     First characters       Size
---------   -----------------------------------------   -------------------------   -----
2168239     _transient_wc_layered_nav_counts_pa_asxxx   a:16:{i:0;b:0;s:32:"c5662   0.28M
2168225     _transient_wc_layered_nav_counts_pa_auxxx   a:16:{i:0;b:0;s:32:"4c798   0.60M
2168229     _transient_wc_layered_nav_counts_pa_coxxx   a:16:{i:0;b:0;s:32:"c3e8d   0.35M
2168221     _transient_wc_layered_nav_counts_pa_depxx   a:16:{i:0;b:0;s:32:"56496   0.36M
2168213     _transient_wc_layered_nav_counts_pa_devxx   a:16:{i:0;b:0;s:32:"4c23e   0.34M
2168235     _transient_wc_layered_nav_counts_pa_dixxx   a:16:{i:0;b:0;s:32:"258a3   0.39M
2168223     _transient_wc_layered_nav_counts_pa_lexxx   a:16:{i:0;b:0;s:32:"099f4   0.37M
2168215     _transient_wc_layered_nav_counts_pa_moxxx   a:16:{i:0;b:0;s:32:"cc751   0.38M
2168233     _transient_wc_layered_nav_counts_pa_orxxx   a:18:{i:0;b:0;s:32:"65b34  10.28M
2168237     _transient_wc_layered_nav_counts_pa_prxxx   a:16:{i:0;b:0;s:32:"4ae79   1.86M
2168231     _transient_wc_layered_nav_counts_pa_sprxx   a:16:{i:0;b:0;s:32:"c5e23   0.36M
2168217     _transient_wc_layered_nav_counts_pa_subxx   a:16:{i:0;b:0;s:32:"52a9d   0.34M
2168219     _transient_wc_layered_nav_counts_pa_widxx   a:16:{i:0;b:0;s:32:"7735d  14.90M
2168227     _transient_wc_layered_nav_counts_pa_wirxx   a:16:{i:0;b:0;s:32:"a04c7   0.34M

Potential solutions

Improvement in performance by applying a patch to remove the transient
  1. Using memcache [5]
  2. Flushing via cron when it gets too big [6]. By default, they hang around for about 1 day, so it size is an issue, they should probably be flushed more often, e.g., once per hour
  3. Removing the transient - See image
  4. Using an boject cache like Memcache or Reddis to store this in-memory, rather than in a database [7]
  5. Remove the count in the layered navigation widgets - What is this???

See also

Sources