Indexes (MySQL)

Wauw: Na toevoeging van een index via alter table dimensions_tmp add index sku(sku); is de executietijd van 3,3 minuten teruggebracht naar 2 seconde! En da's niet de eerste keer dat ik zo'n dramatische verbetering meemaak!

Er is een primaire index, genaamd PRIMARY. Daarnaast zonet handmatig een tweede, gewone' index blub aangemaakt

Ik heb dramatische verbeteringen meegemaakt na het toevoegen van indexes aan query's. Trage query? Indexes is een van de eerste dingen om aan te denken!

Meestal zoek ik zoiets:

alter table tool_tmp
    add index(tool_id),
    add index(tool_kind);

Inleiding

[1]:

Basically an index on a table works like an index in a book (that's where the name came from):

Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan). On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).

Of course, how useful the index will be, depends on many things - a few examples, using the simile above:

If you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.

[2]:

There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.

Soorten

Voor InnoDB-databases, is de meest gebruikelijke soort van index de B+Tree based index. Ik vermoed dat dit de enige soort index is, die ik in de praktijk tegen ben gekomen. Daarnaast bestaat er een hash-index, voor in-memory tabellen, zoals tijdelijke tabellen.

Tenzij anders aangegeven, gaat dit artikel alleen over deze eerste soort indexes.

Verder...

Voeg altijd indexes toe aan tabellen op velden die in joins gebruikt worden
TXT- & BLOB-velden kunnen niet zondermeer geïndexeerd worden. Voor text-velden moet je de sleutellengte defineren. In zulke situaties vraag ik me af, of een tekst-veld wel de juiste keuze is
PK-velden zijn altijd geïndexeerd.

Syntax & voorbeelden

De primaire onderdelen van een index:

Naam
Type
Geïndexeerde kolommen.

Verder zijn er nog een paar toeters en bellen:

Storage type
Key block size
Parser
Commentaar.

De gebruikelijke manier om een index aan te maken, lijkt te zijn met zoiets als

alter table [table_name]
add index [index_naam] ([veld_nam(en)]);

Je kunt de naam echter weglaten. Dan wordt de naam hetzelfde als de naam van het veld. Bv.:

alter table [table_name]
add index ([veld_nam(en)]);

Je kunt in één statement meerdere indexen toevoegen:

alter table tool_tmp
    add index(tool_id),
    add index(tool_kind);

Voorbeeld van een samengestelde index die via de interface van MySQL Workbench is aangemaakt:

ALTER TABLE `bal_dwh_process`.`tool_combined_tmp` 
ADD INDEX `index3` (`sku` ASC, `sku_oem` ASC);

MySQL Workbench genereert zelf een naam: "index" + volgnummer van de index
Merk op dat is gespecificeerd dat de velden oplopend zijn geïndexeerd.

Je kunt indexes ook specificeren tijdens de definitie van een tabel. Bv.:

create table tmp
(
    blub	varchar(12),
    blub2	varchar(13),

    index (blub),
    index (blub2)
);

Samengestelde sleutels & volgorde

The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.

So, given the following table:

CREATE TABLE person 
(
    last_name VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    INDEX (last_name, first_name)
);

This query would take advantage of the index:

SELECT 
   last_name, 
   first_name 
FROM person
WHERE 
   last_name = "John" 
   AND 
   first_name LIKE "J%"

But the following one would not

SELECT 
   last_name, 
   first_name 
FROM person 
WHERE first_name = "Constantine"

Because you're querying the first_name column first and it's not the leftmost column in the index.

This last example is even worse:

SELECT 
   last_name, 
   first_name 
FROM person 
WHERE first_name LIKE "%Constantine"

Because now, you're comparing the rightmost part of the rightmost field in the index.

INFORMATION_SCHEMA

Informatie over indexes, kun je vinden in information_schema.statistics. Voorbeeld van de tabel uit een eerder voorbeeld:

select
   index_name,
   seq_in_index,
   column_name
from information_schema.statistics
where 
   table_schema = database()
and
   table_name = "tool_combined_tmp";

Uitkomst:

             
PRIMARY   1   sku       |
PRIMARY   2   tool_id   | Samengestelde index (sku, tool_id)
tool_id   1   tool_id
index3    1   sku       |
index3    2   sku_oem   | Samengestelde index (sku, sku_oem)

Add index if not exists (2019)

Helaas bestaat er geen commando add index if not exists, maar er zijn oplossingen:

Index gewoon aanmaken - Dan krijg je een extra index
Zelf een sproc voor schrijven.

Gewoon aanmaken

Meestal kun je gewoon zo'n additionele index aanmaken, tenzij dat uit de hand loopt: Je krijgt een foutmelding as er meer dan 64 indexes zijn gedefineerd op een tabel

Eigen sproc schrijven

Eigen sproc, geïnspireerd op [3]:

CREATE DEFINER=`root`@`localhost` PROCEDURE `add_index_if_not_exists`
(
    table_name_in	varchar(64),
    column_name_in	varchar(64)
)
BEGIN
#
# Add an index if not exists - ONLY WORKS FOR SINGLE INDEXES - NOT FOR COMPOUND INDEXES
############################################################################################################
# Declare variable index_exists
############################################################################################################
#
declare index_exists integer;


############################################################################################################
# Figure out of index exists - Store result in index_exists
############################################################################################################
#
select count(1) into index_exists
from information_schema.statistics
where 
	table_schema = database()
and
	table_name = table_name_in
and
	seq_in_index=1
and
	column_name = column_name_in;


############################################################################################################
# Create index if not exists
############################################################################################################
#
if index_exists = 0 then

    # Form ddl
    #######################################################
    #
    set @ddl = concat
    (
	"create index ", column_name_in, " ",
        "on ", database(), ".", table_name_in, " ",
        "(", column_name_in, ")"
    );

    # Execute stmt
    #######################################################
    #
    prepare stmt from @ddl;
    execute stmt;
    deallocate prepare stmt;
       
end if;
    
END

Casus: Update-query (april 2017)

Het probleem

Deze query is zo traag, dat ik 'm na 10 minuten heb onderbroken:

update ean_code
inner join ean_code_tmp on ean_code.ean_id = ean_code_tmp.ean_id
set 

    ean_code.sku = ean_code_tmp.sku,
    ean_code.note = ean_code_tmp.note;

Tabel ean_code heeft 100.000 regels
Tabel ean_code_tmp heeft 13.997 regels

Aanvullende gegevens

Ik heb ooit ergens opgepikt dat sommige soorten query's onthutsend traag zijn, omdat de engine daarbij per record van de ene tabel, alle records van de andere tabel moet langslopen (full table scan). Wellicht speelt dat hier een rol
Geen indexen gebruikt, terwijl dit waarschijnlijk precies een situatie is, waarin dat helpt om full table scans te voorkomen

De oplossing

-- Indexes aanmaken
-- ================
--
alter table ean_code     add index ean_id (ean_id);
alter table ean_code_tmp add index ean_id (ean_id);

update ean_code
inner join ean_code_tmp on ean_code.ean_id = ean_code_tmp.ean_id
set 

    ean_code.sku = ean_code_tmp.sku,
    ean_code.note = ean_code_tmp.note;

... Executietijd was nu nihil!

Casus: Join met een view (april 2020)

Een union select query met daarin tabellen en views, deed er te lang over. Je kunt geen indexes defineren op views, maar één van de subqueries die zo traag was, had geen index op het veld dat in de join zat. Daar een index op defineren, maakte een groot verschil:

Oude situatie: Duration/Fetch 2,6s/178s

Nieuw situatie: Duration/Fetch 2,0s/0,9s - Da's 198x zo snel!

Index voor BLOB/TEXT column

Voor kolommen van het type BLOB of TEXT moet je de lengte van de sleutel defineren. Voorbeeld (2021.05.05):

alter table knl_s1.wp_postmeta
add index(meta_value(40));

Zie ook

Bronnen

http://stackoverflow.com/questions/3567981/how-do-mysql-indexes-work
http://stackoverflow.com/questions/3002605/how-do-i-add-indexes-to-mysql-tables
https://dev.mysql.com/doc/refman/5.6/en/mysql-indexes.html - 8.3.1 How MySQL Uses Indexes - Handig!
https://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
https://dev.mysql.com/doc/refman/8.0/en/create-index.html
https://dev.mysql.com/doc/refman/5.7/en/explain.html - Explain maakt inzichtelijk of er indexes worden gebruikt in een query