PostgreSQL B-Tree Index Explained - PART 2

An index is an additional database structure whose purpose is to improve read performance at the cost of extra storage. For more details about the structure of an index and how it influences simple queries, one can read PostgreSQL B-tree Index Explained PART 1. This article focuses on more advanced queries, such as joins and GROUP BYs, and on how an index influences their execution.

To better explain the mechanics behind the index structure, I will use a PostgreSQL 10 database with two tables: employees and addresses. These two tables will serve as examples for all the code that follows.

CREATE TABLE "public"."addresses" (
    "id" integer NOT NULL,
    "city" character varying(40) NOT NULL,
    "country" character varying(40) NOT NULL,
    "street" character varying(40) NOT NULL
);

CREATE TABLE "public"."employees" (
    "id" integer NOT NULL,
    "company_id" integer NOT NULL,
    "dep" integer NOT NULL,
    "first_name" character varying(20),
    "last_name" character varying(20),
    "salary" integer,
    "address_id" integer
);

In order to create and populate the tables, one can use the script from postgres-btree-dataset and run it inside the PostgreSQL CLI:

\i database.sql

1. Joins

As one might know, the SQL join operation binds two or more SQL tables together, based on one or more criteria/conditions. The database uses this new relation to retrieve the requested data. As with simple queries, indexes contribute to the performance of a join, but their impact is different for each join type.

There are three types of join[1]:

1. Nested Loops Join

2. Hash Join

3. Merge Join

The conditions on which the join is made are called join predicates, similar to the predicates discussed in the aforementioned article. The conditions following a join operation in a WHERE clause which refer to only one table and are not part of the join predicates are called independent predicates[2].

Nested Loops Join

The Nested Loops Join is similar to using nested selects (a select inside a select). Because of this, indexes that benefit a nested select implementation will also benefit the Nested Loops Join. Therefore, indexes defined on the join predicates will improve query performance[2, 3].
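As a rough illustration of the mechanics, here is a minimal Python sketch over toy in-memory tables (an assumption for illustration only, not PostgreSQL's actual implementation), where a dict plays the role of the index on the join predicate:

```python
from collections import defaultdict

# Toy stand-ins for the addresses and employees tables.
addresses = [{"id": 1, "city": "Cluj-Napoca"}, {"id": 2, "city": "Bucharest"}]
employees = [
    {"first_name": "Ana", "address_id": 2},
    {"first_name": "Dan", "address_id": 1},
    {"first_name": "Ioana", "address_id": 2},
]

# Hypothetical "index" on employees.address_id: key -> matching rows.
# A real B-tree also keeps its keys ordered; a dict is enough for lookups.
employees_by_address = defaultdict(list)
for e in employees:
    employees_by_address[e["address_id"]].append(e)

# Nested loops join: for every outer row, fetch the inner rows through the
# lookup structure instead of rescanning the whole employees table.
result = []
for a in addresses:
    for e in employees_by_address.get(a["id"], []):
        result.append((e["first_name"], a["city"]))
```

Without the lookup structure, the inner loop would have to scan all of employees for every address, which is why dropping the index pushes the planner toward other strategies.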

Query Example 1

CREATE INDEX ON employees (address_id);
VACUUM ANALYZE employees;

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id LIMIT 100;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..41.22 rows=100 width=63) (actual time=0.034..0.868 rows=100 loops=1)
   ->  Nested Loop  (cost=0.29..40929.00 rows=100000 width=63) (actual time=0.032..0.820 rows=100 loops=1)
         ->  Seq Scan on addresses a  (cost=0.00..1731.00 rows=100000 width=27) (actual time=0.016..0.050 rows=102 loops=1)
         ->  Index Scan using employees_address_id_idx on employees e  (cost=0.29..0.37 rows=2 width=36) (actual time=0.004..0.006 rows=1 loops=102)
               Index Cond: (address_id = a.id)
 Planning time: 0.431 ms
 Execution time: 0.945 ms
(7 rows)

The query above limits the number of results in order to force the planner to choose a simpler algorithm such as nested loops. One can see that, because there is an index on address_id, it can be used to execute an index scan on employees. If one drops the index, only sequential scans will be executed.

Query Example 2

DROP INDEX employees_address_id_idx;
VACUUM ANALYZE employees;

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id LIMIT 100;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3665.00..3670.46 rows=100 width=63) (actual time=73.465..73.549 rows=100 loops=1)
   ->  Hash Join  (cost=3665.00..9124.00 rows=100000 width=63) (actual time=73.464..73.539 rows=100 loops=1)
         Hash Cond: (e.address_id = a.id)
         ->  Seq Scan on employees e  (cost=0.00..1836.00 rows=100000 width=36) (actual time=0.075..0.092 rows=193 loops=1)
         ->  Hash  (cost=1731.00..1731.00 rows=100000 width=27) (actual time=72.471..72.471 rows=100000 loops=1)
               Buckets: 65536  Batches: 2  Memory Usage: 3448kB
               ->  Seq Scan on addresses a  (cost=0.00..1731.00 rows=100000 width=27) (actual time=0.021..31.573 rows=100000 loops=1)
 Planning time: 0.458 ms
 Execution time: 74.150 ms
(9 rows)

Hash Join

The Hash Join, as opposed to the Nested Loops Join, does not execute a tree traversal[2] (index lookup). Instead, it loads one side of the join into a hash table, which is then probed for each row of the other side. Indexing the join predicates does not improve query performance in this case, but indexing the independent predicates from the WHERE clause will. This type of join performs very well when managing the hash table is not costly[3].
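The build/probe mechanics can be sketched in a few lines of Python (toy data and assumed names, not PostgreSQL's implementation):

```python
# Toy stand-ins for the two tables.
addresses = [{"id": 1, "city": "Cluj-Napoca"}, {"id": 2, "city": "Bucharest"}]
employees = [
    {"first_name": "Ana", "address_id": 2},
    {"first_name": "Dan", "address_id": 1},
    {"first_name": "Ioana", "address_id": 2},
]

# Build phase: load one side (ideally the smaller one) into a hash table
# keyed by the join predicate.
hash_table = {a["id"]: a for a in addresses}

# Probe phase: scan the other side once, probing the hash table per row.
# No index is consulted, which is why indexing the join predicate does
# not help this algorithm.
result = []
for e in employees:
    a = hash_table.get(e["address_id"])
    if a is not None:
        result.append((e["first_name"], a["city"]))
```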

Query Example 3

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3665.00..9124.00 rows=100000 width=63) (actual time=64.237..122.478 rows=100000 loops=1)
   Hash Cond: (e.address_id = a.id)
   ->  Seq Scan on employees e  (cost=0.00..1836.00 rows=100000 width=36) (actual time=0.015..5.663 rows=100000 loops=1)
   ->  Hash  (cost=1731.00..1731.00 rows=100000 width=27) (actual time=63.379..63.379 rows=100000 loops=1)
         Buckets: 65536  Batches: 2  Memory Usage: 3448kB
         ->  Seq Scan on addresses a  (cost=0.00..1731.00 rows=100000 width=27) (actual time=0.015..17.690 rows=100000 loops=1)
 Planning time: 0.237 ms
 Execution time: 125.573 ms
(8 rows)

The query above is the same as the one used for nested loops, except the limit is removed. In this case, the database determines that it is more efficient to build a hash table in order to execute the join. If one adds the index back on address_id, it will not influence the hash algorithm in any way.

Query Example 4

CREATE INDEX ON employees (address_id);
VACUUM ANALYZE employees;

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3665.00..9124.00 rows=100000 width=63) (actual time=61.079..120.207 rows=100000 loops=1)
   Hash Cond: (e.address_id = a.id)
   ->  Seq Scan on employees e  (cost=0.00..1836.00 rows=100000 width=36) (actual time=0.014..5.728 rows=100000 loops=1)
   ->  Hash  (cost=1731.00..1731.00 rows=100000 width=27) (actual time=60.192..60.192 rows=100000 loops=1)
         Buckets: 65536  Batches: 2  Memory Usage: 3448kB
         ->  Seq Scan on addresses a  (cost=0.00..1731.00 rows=100000 width=27) (actual time=0.015..16.652 rows=100000 loops=1)
 Planning time: 0.539 ms
 Execution time: 123.305 ms
(8 rows)

Merge Join

The algorithm behind a merge join combines two sorted relations into one[2]. That means both sides of the join must be ordered according to the join predicates. Because of this extra sort operation, a merge join can often be less efficient than a hash join. An index on the join predicates can eliminate the extra sort by reusing the order established by the index, thus improving execution time. Without indexes on the join predicates, a merge join behaves similarly to a hash join: it will try to materialize (cache) one of the join sides in order to boost performance[3].
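The merge phase itself can be sketched as two cursors advancing over the sorted inputs (a simplified Python illustration over toy rows, assuming the addresses side has unique keys, as its primary key guarantees; not PostgreSQL code):

```python
# Both inputs are already sorted on the join keys; with suitable indexes
# this order comes for free, otherwise explicit sorts run first.
addresses = [{"id": 1, "city": "Cluj-Napoca"}, {"id": 2, "city": "Bucharest"}]
employees = [
    {"first_name": "Dan", "address_id": 1},
    {"first_name": "Ana", "address_id": 2},
    {"first_name": "Ioana", "address_id": 2},
]

# Merge phase: advance a cursor on each side, emitting matches.
i = j = 0
result = []
while i < len(employees) and j < len(addresses):
    key_e = employees[i]["address_id"]
    key_a = addresses[j]["id"]
    if key_e < key_a:
        i += 1          # employee has no match yet, move on
    elif key_e > key_a:
        j += 1          # address exhausted, move on
    else:
        result.append((employees[i]["first_name"], addresses[j]["city"]))
        i += 1          # addresses.id is unique, so only advance employees
```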

The next query forces a merge join over a hash join by specifying an ORDER BY. Since the database eventually has to sort the result anyway, a merge join becomes more appropriate.

Query Example 5

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id ORDER BY a.id;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=12430.16..20378.00 rows=100000 width=67) (actual time=93.726..184.670 rows=100000 loops=1)
   Merge Cond: (e.address_id = a.id)
   ->  Index Scan using employees_address_id_idx on employees e  (cost=0.29..5948.25 rows=100000 width=36) (actual time=0.016..27.180 rows=100000 loops=1)
   ->  Materialize  (cost=12429.82..12929.82 rows=100000 width=27) (actual time=93.705..119.142 rows=136728 loops=1)
         ->  Sort  (cost=12429.82..12679.82 rows=100000 width=27) (actual time=93.702..106.165 rows=100000 loops=1)
               Sort Key: a.id
               Sort Method: external sort  Disk: 3720kB
               ->  Seq Scan on addresses a  (cost=0.00..1731.00 rows=100000 width=27) (actual time=0.014..17.224 rows=100000 loops=1)
 Planning time: 0.384 ms
 Execution time: 189.356 ms
(10 rows)

From this query plan, one can see that the existing index on one side is used to avoid the sort, while for the other side an explicit sort is performed. If one adds a primary key on the id column of the addresses table, both sort operations are eliminated and the join is performed much more efficiently.

Query Example 6

ALTER TABLE addresses ADD PRIMARY KEY (id);
VACUUM ANALYZE addresses;

EXPLAIN ANALYZE SELECT * FROM employees AS e JOIN addresses AS a ON e.address_id = a.id ORDER BY a.id;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.58..10786.37 rows=100000 width=67) (actual time=0.033..107.612 rows=100000 loops=1)
   Merge Cond: (e.address_id = a.id)
   ->  Index Scan using employees_address_id_idx on employees e  (cost=0.29..5948.20 rows=100000 width=36) (actual time=0.015..42.339 rows=100000 loops=1)
   ->  Index Scan using addresses_pkey on addresses a  (cost=0.29..3338.29 rows=100000 width=27) (actual time=0.012..15.166 rows=100000 loops=1)
 Planning time: 0.473 ms
 Execution time: 113.863 ms
(6 rows)

2. Sorting and Grouping

As mentioned before, indexes enforce a logical order, so they benefit operations that rely on ordered results, such as sorting and grouping (or a merge join, as seen above).

Sorting

The ORDER BY clause sorts the result of a query on the specified columns. If these columns are part of an index definition and the index order corresponds to the ORDER BY clause, the sort step can be entirely omitted thanks to the predefined order of the index[4].

In order to take advantage of the index when sorting, the ORDER BY clause must use at least a leftmost prefix of the index columns.

Using ORDER BY with an index can also let the query execute in a pipelined manner, meaning the database can return results as they come in, without having to process all records first[2].

The index definition can include an explicit ordering argument for each index column, and the ORDER BY clause can likewise specify an explicit ordering for each column. In the case of a multi-column index, if the index ordering does not fit the ORDER BY clause's ordering, the index will not be used and an explicit sort step will take place. This is not a problem for single-column indexes: because such indexes can be read in both directions (ascending and descending)[5], the scan direction will always match the ORDER BY clause[4]. The following queries illustrate this concept.
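Plain Python lists standing in for index leaf entries (an illustration only, not PostgreSQL internals) make the asymmetry easy to see:

```python
# Leaf entries of a single-column index: one sorted sequence, readable in
# either direction, so it serves both ASC and DESC without sorting.
single = [1, 3, 5, 7]
assert list(reversed(single)) == sorted(single, reverse=True)

# Leaf entries of a multi-column index on (company_id ASC, dep ASC).
multi = [(1, 1), (1, 2), (2, 1), (2, 2)]

# ORDER BY company_id DESC, dep DESC: a plain backward scan still works.
assert list(reversed(multi)) == sorted(multi, key=lambda t: (-t[0], -t[1]))

# ORDER BY company_id DESC, dep ASC: neither a forward nor a backward scan
# produces this order, so an explicit sort step is unavoidable.
mixed = sorted(multi, key=lambda t: (-t[0], t[1]))
assert mixed != multi and mixed != list(reversed(multi))
```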

While there is an index on the address_id column from the employees table, an ORDER BY can be executed without an actual sort:

Query Example 7

EXPLAIN (ANALYZE) SELECT first_name FROM employees ORDER BY address_id;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using employees_address_id_idx on employees  (cost=0.29..5948.20 rows=100000 width=12) (actual time=0.041..39.535 rows=100000 loops=1)
 Planning time: 0.159 ms
 Execution time: 43.628 ms
(3 rows)

The index can just as easily serve a descending ORDER BY; the index scan simply executes backwards, as one can see below:

Query Example 8

EXPLAIN (ANALYZE) SELECT first_name FROM employees ORDER BY address_id DESC;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan Backward using employees_address_id_idx on employees  (cost=0.29..5948.20 rows=100000 width=12) (actual time=0.024..38.660 rows=100000 loops=1)
 Planning time: 0.112 ms
 Execution time: 42.697 ms
(3 rows)

If one drops the index from the address_id column, an explicit sort will be executed in order to accommodate the ORDER BY clause:

Query Example 9

DROP INDEX employees_address_id_idx;
VACUUM ANALYZE employees;

EXPLAIN (ANALYZE) SELECT first_name FROM employees ORDER BY address_id DESC;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=10140.82..10390.82 rows=100000 width=12) (actual time=58.812..70.593 rows=100000 loops=1)
   Sort Key: address_id DESC
   Sort Method: external merge  Disk: 2336kB
   ->  Seq Scan on employees  (cost=0.00..1836.00 rows=100000 width=12) (actual time=0.018..13.346 rows=100000 loops=1)
 Planning time: 0.088 ms
 Execution time: 75.379 ms
(6 rows)

For a multi-column index, things are different, as mentioned previously. When the ORDER BY clause matches the index order, the index is used to retrieve all the data:

Query Example 10

CREATE INDEX ON employees (company_id, dep, last_name);
VACUUM ANALYZE employees;

EXPLAIN (ANALYZE) SELECT first_name FROM employees WHERE company_id > 10 ORDER BY company_id ASC, dep ASC;
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using employees_company_id_dep_last_name_idx on employees  (cost=0.42..6507.22 rows=90000 width=16) (actual time=0.048..34.780 rows=89900 loops=1)
   Index Cond: (company_id > 10)
 Planning time: 0.172 ms
 Execution time: 38.372 ms
(4 rows)

When the ORDER BY does not match the index order, the database falls back to a sequential scan and performs an explicit sort:

Query Example 11

EXPLAIN (ANALYZE) SELECT first_name FROM employees WHERE company_id > 10 ORDER BY company_id DESC, dep ASC;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9491.94..9716.94 rows=90000 width=16) (actual time=78.372..90.275 rows=89900 loops=1)
   Sort Key: company_id DESC, dep
   Sort Method: external merge  Disk: 2456kB
   ->  Seq Scan on employees  (cost=0.00..2086.00 rows=90000 width=16) (actual time=0.019..16.598 rows=89900 loops=1)
         Filter: (company_id > 10)
         Rows Removed by Filter: 10100
 Planning time: 0.155 ms
 Execution time: 94.903 ms
(8 rows)

The detailed explanation is as follows[5]:

  • The predicate used determines a leaf node range which is ordered ASC by company_id and also ASC by dep (the default order).
  • When the database tries to return the records in the order specified by the ORDER BY clause (DESC, ASC), it cannot do so by simply following the index order in the leaf nodes. The traversal would have to begin with the record having the biggest company_id and the smallest dep value, and that record is neither at the beginning nor at the end of the range, but somewhere inside it. The index lookup would therefore have to jump its way through the leaf node chain in order to respect the ORDER BY clause.
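The jumping can be pictured with a short Python sketch over hypothetical leaf entries (toy data, not PostgreSQL internals): producing (company_id DESC, dep ASC) from entries stored in (ASC, ASC) order means visiting the company_id groups back to front while reading each group front to back:

```python
from itertools import groupby

# Leaf entries stored in (company_id ASC, dep ASC) order.
leaves = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2)]

# Split the leaf chain into per-company_id groups...
groups = [list(g) for _, g in groupby(leaves, key=lambda t: t[0])]

# ...then visit the groups in reverse while reading each group forward.
wanted = [row for group in reversed(groups) for row in group]

# The output order is correct, but the cursor had to jump back to the
# start of every group instead of walking the chain in one direction.
assert wanted == sorted(leaves, key=lambda t: (-t[0], t[1]))
```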

Grouping

By using the GROUP BY clause, one can perform analysis over data. PostgreSQL uses two grouping algorithms: a hash algorithm, which aggregates the records in a temporary hash table, and a sort/group algorithm, which first sorts the input by the grouping key and then aggregates it.

As with the hash join, the first algorithm does not benefit from an index, but the latter benefits in a similar way to ORDER BY. For a sort/group GROUP BY to be executed in a pipelined manner, the following conditions are required:

  • The GROUP BY clause has to use a prefix of the index definition
  • The index has to have the same ordering for all its columns
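Both strategies can be sketched in plain Python (toy rows, an illustration only, not PostgreSQL's implementation):

```python
from itertools import groupby

# Toy (address_id, first_name) rows, in arbitrary heap order.
rows = [(2, "Ana"), (1, "Dan"), (2, "Ioana"), (1, "Vlad"), (2, "Maria")]

# Hash aggregation: one pass over unordered input, counting in a hash
# table; nothing can be emitted until the whole input is consumed.
counts = {}
for key, _ in rows:
    counts[key] = counts.get(key, 0) + 1

# Sort/group aggregation: if the input arrives already sorted (e.g. from
# an index scan), each group is complete as soon as the key changes, so
# results can be emitted in a pipelined fashion.
sorted_rows = sorted(rows)  # an index scan would make this step free
grouped = [(key, sum(1 for _ in g))
           for key, g in groupby(sorted_rows, key=lambda r: r[0])]
```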

The next query illustrates the hash algorithm, grouping on a column without an index (a limit is used so that the database is not influenced by the size of the data, but only by the need for a sort operation):

Query Example 12

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY address_id LIMIT 100;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2336.00..2337.00 rows=100 width=12) (actual time=30.622..30.647 rows=100 loops=1)
   ->  HashAggregate  (cost=2336.00..2877.31 rows=54131 width=12) (actual time=30.621..30.638 rows=100 loops=1)
         Group Key: address_id
         ->  Seq Scan on employees  (cost=0.00..1836.00 rows=100000 width=4) (actual time=0.018..5.817 rows=100000 loops=1)
 Planning time: 0.138 ms
 Execution time: 33.519 ms
(6 rows)

After adding an index on the address_id column, the database chooses the sort/group algorithm for the GROUP BY, taking advantage of the predefined index order to skip an extra sort:

Query Example 13

CREATE INDEX ON employees (address_id);
VACUUM ANALYZE employees;

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY address_id LIMIT 100;
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..7.01 rows=100 width=12) (actual time=0.064..0.314 rows=100 loops=1)
   ->  GroupAggregate  (cost=0.29..3647.36 rows=54307 width=12) (actual time=0.062..0.272 rows=100 loops=1)
         Group Key: address_id
         ->  Index Only Scan using employees_address_id_idx on employees  (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.050..0.105 rows=151 loops=1)
               Heap Fetches: 0
 Planning time: 0.178 ms
 Execution time: 0.397 ms
(7 rows)

The queries above demonstrate how an index can influence a GROUP BY: when an index is created on the grouped column, the algorithm switches from hash to sort/group. When grouping by columns of a multi-column index, the index order can influence which algorithm gets picked. As with ORDER BY, the GROUP BY clause must use a prefix of the multi-column index in order to use the sort/group algorithm. The following queries demonstrate that not using a prefix can influence the performance of a GROUP BY.

Query Example 14

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY company_id LIMIT 5;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..188.67 rows=5 width=12) (actual time=0.746..3.660 rows=5 loops=1)
   ->  GroupAggregate  (cost=0.42..3765.42 rows=100 width=12) (actual time=0.745..3.655 rows=5 loops=1)
         Group Key: company_id
         ->  Index Only Scan using employees_company_id_dep_last_name_idx on employees  (cost=0.42..3264.42 rows=100000 width=4) (actual time=0.039..2.031 rows=5010 loops=1)
               Heap Fetches: 0
 Planning time: 0.164 ms
 Execution time: 3.716 ms
(7 rows)

The query above uses only the leading column of a multi-column index. The database can take advantage of this index and use its natural order to skip an extra sort step.

Query Example 15

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY company_id, dep LIMIT 5;
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..10.50 rows=5 width=16) (actual time=0.086..0.236 rows=5 loops=1)
   ->  GroupAggregate  (cost=0.42..4034.42 rows=2000 width=16) (actual time=0.084..0.231 rows=5 loops=1)
         Group Key: company_id, dep
         ->  Index Only Scan using employees_company_id_dep_last_name_idx on employees  (cost=0.42..3264.42 rows=100000 width=8) (actual time=0.034..0.130 rows=240 loops=1)
               Heap Fetches: 0
 Planning time: 0.233 ms
 Execution time: 0.293 ms
(7 rows)

For the query above, the first two leading columns are used, so the index's natural order can still be exploited.

Query Example 16

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY dep LIMIT 5;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2336.00..2336.05 rows=5 width=12) (actual time=23.551..23.552 rows=5 loops=1)
   ->  HashAggregate  (cost=2336.00..2336.20 rows=20 width=12) (actual time=23.549..23.550 rows=5 loops=1)
         Group Key: dep
         ->  Seq Scan on employees  (cost=0.00..1836.00 rows=100000 width=4) (actual time=0.016..6.543 rows=100000 loops=1)
 Planning time: 0.163 ms
 Execution time: 23.608 ms
(6 rows)

The query above uses only the second index column, so the database cannot benefit from the natural order and falls back to a hash GROUP BY.

Until now, all grouping queries involving multi-column indexes used an index with the same ordering for all its columns. In order to test a mixture of orderings, the previous index is dropped and a new index with mixed orderings is created:

Query Example 17

DROP INDEX employees_company_id_dep_last_name_idx;
CREATE INDEX ON employees (company_id, dep DESC, last_name);
VACUUM ANALYZE employees;

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY company_id, dep LIMIT 5;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2586.00..2586.05 rows=5 width=16) (actual time=24.843..24.845 rows=5 loops=1)
   ->  HashAggregate  (cost=2586.00..2606.00 rows=2000 width=16) (actual time=24.842..24.842 rows=5 loops=1)
         Group Key: company_id, dep
         ->  Seq Scan on employees  (cost=0.00..1836.00 rows=100000 width=8) (actual time=0.015..6.214 rows=100000 loops=1)
 Planning time: 0.215 ms
 Execution time: 24.931 ms
(6 rows)

Examining the result of the query above, one can see that instead of the sort/group algorithm the database uses the hash algorithm. If company_id is fixed to a single value, the index can once again be used to avoid an extra sort:

Query Example 18

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees WHERE company_id = 2 GROUP BY company_id, dep LIMIT 5;
                                                                                    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.75 rows=5 width=16) (actual time=0.087..0.260 rows=5 loops=1)
   ->  GroupAggregate  (cost=0.42..56.11 rows=844 width=16) (actual time=0.086..0.255 rows=5 loops=1)
         Group Key: company_id, dep
         ->  Index Only Scan Backward using employees_company_id_dep_last_name_idx on employees  (cost=0.42..39.49 rows=1090 width=8) (actual time=0.039..0.148 rows=253 loops=1)
               Index Cond: (company_id = 2)
               Heap Fetches: 0
 Planning time: 0.240 ms
 Execution time: 0.322 ms
(8 rows)

The explanation lies in the fact that the database can follow only a single scan direction across all index columns during the GROUP BY. To test this assumption, one can create a multi-column index with a descending ordering for every column:

Query Example 19

DROP INDEX employees_company_id_dep_last_name_idx;
CREATE INDEX ON employees (company_id DESC, dep DESC, last_name DESC);

EXPLAIN (ANALYZE) SELECT COUNT(*) FROM employees GROUP BY company_id, dep LIMIT 5;
                                                                                      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..10.50 rows=5 width=16) (actual time=0.132..0.286 rows=5 loops=1)
   ->  GroupAggregate  (cost=0.42..4034.42 rows=2000 width=16) (actual time=0.130..0.283 rows=5 loops=1)
         Group Key: company_id, dep
         ->  Index Only Scan Backward using employees_company_id_dep_last_name_idx on employees  (cost=0.42..3264.42 rows=100000 width=8) (actual time=0.077..0.177 rows=247 loops=1)
               Heap Fetches: 0
 Planning time: 0.447 ms
 Execution time: 0.356 ms
(7 rows)

The plan shows the scan executing backwards over the descending index order, confirming the assumption.

Therefore, the explanation for this behaviour is similar to the one for ORDER BY: with an index that mixes orderings across columns, the GROUP BY behaves exactly as ORDER BY does and cannot use the index order, because it cannot follow the leaf node chain uninterrupted. After adding company_id = 2, the index can be used again to follow the leaf node chain backwards without jumps, because inside the leaf node range defined by the predicate company_id = 2, all records are ordered by dep.

3. Summary

The article starts with the join strategies and goes into detail about the impact an index can have on the different join algorithms: nested loops join, hash join and merge join.

The second part explains how sorts and aggregates can be improved by using indexes, diving into different ordering directions and grouping algorithms.

Both parts of the article leverage query plans to illustrate what happens under the hood when one adds an index or changes an ordering direction.

[1] https://www.postgresql.org/docs/10/planner-optimizer.html

[2] SQL Performance explained - Markus Winand

[3] PostgreSQL 10 High Performance, Chapter 10 - Query Optimization - Ibrar Ahmed, Gregory Smith, Enrico Pirozzi

[4] https://www.postgresql.org/docs/10/indexes-ordering.html

[5] PostgreSQL B-tree Index Explained PART 1

Sebastian Brestin

Sebastian founded Qwertee in 2017. He holds a BS in computer science from Babes-Bolyai University (Cluj-Napoca, Romania). His expertise ranges from backend development to data engineering, and he has a keen interest in network and security related topics. His experience includes working in multinational corporations such as HP, but also in a fast-paced startup environment. Sebastian has a wide variety of interests, such as learning about video game design or meeting up with the local startup community.