Which is best: Partial Index and Expression Index: PostgreSQL

Here, we came across in the context of a real-world use case. Let see, Which is best: Partial Index and Expression Index in PostgreSQL. Both are good and the usages depend on the context of user queries.

Read: MySQL Index: Understanding MySQL indexing

Here, We show the power of advanced RDBMS features such as partial indexes and expression indexes.

Let us assume we have a table in PostgreSQL named users, where each row in the table represents a user. The table is defined as follows.

CREATE TABLE users (
  id    SERIAL PRIMARY KEY,
  email VARCHAR DEFAULT NULL,
  name  VARCHAR
);

Now, let us assume we create the following indexes on the table above.

Index #1Index #2
CREATE UNIQUE INDEX users_idx1 ON users (email) WHERE email IS NOT NULL;CREATE UNIQUE INDEX users_idx2 ON users (( email IS NOT NULL ));
partial index and expression index

What is the difference between the two indexes shown above? The first index #1 is a partial index while index #2 is an expression index. And it should be no surprise that because We reuse the PostgreSQL native query layer, it already supports both partial and expression indexes!

Let’s dive right in and take a look at what these indexes do.

Partial Indexes

As the PostgreSQL documentation on partial indexes states, a partial index is built over a subset of rows in a table defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. A major reason for using a partial index is to avoid indexing commonly occurring values. Since a query searching for a commonly occurring value (one that accounts for more than a few percent of all the table rows) will run through most of the table anyway, the benefit from using an index is marginal. A better strategy is to create a partial index where such rows are excluded altogether. Partial indexing reduces the size of the index and hence speeds up those queries that do use the index. It will also speed up many write operations because the index does not need to be updated in all cases. And in the context of a distributed SQL database, every such speed up helps in lowering the effective latency observed by client applications.

Consider the first index in our example.

CREATE UNIQUE INDEX users_idx1 
  ON users (email) 
  WHERE email IS NOT NULL;

This is a partial unique index that:

  • Only indexes users whose email is not NULL.
  • Ensures that no two users have the same, non-null email address.

Hence, with this index, the table can have any number of users without an email address. But if an email address for a user is specified, then another user cannot already have the same email address. This can be seen from the following examples.

/* OK: Insert a user with a name and an email. */
test1=# INSERT INTO users (name, email) 
           VALUES ('Satya', 'satya@eduguru.in');

/* OK: Insert the same user name without an email (email is NULL). */
test2=# INSERT INTO users (name) VALUES ('Satya');

/* FAIL: Insert a different user with same email */
test3=# INSERT INTO users (name, email) 
           VALUES ('1001', 'satya@eduguru.in');
ERROR: duplicate key value violates unique constraint "users_idx1"

Further, the index users_idx1 is used when a lookup is performed by a non-empty email, while a full-scan is performed if the query looks up users that have a NULL value for their email.

test4=# EXPLAIN SELECT * FROM users 
           WHERE email='jbond@yugabyte.com';
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using users_idx1 on users  (cost=0.00..4.11 rows=1 width=68)
   Index Cond: ((email)::text = 'satya@eduguru.in'::text)
(2 rows)


test5=# EXPLAIN SELECT * FROM users WHERE email IS NULL;
                          QUERY PLAN
---------------------------------------------------------------
 Foreign Scan on users  (cost=0.00..100.00 rows=1000 width=68)
   Filter: (email IS NULL)
(2 rows)

Expression Indexes

The PostgreSQL documentation on expression indexes notes:

An index column need not be just a column of the underlying table but can be a function or scalar expression computed from one or more columns of the table. This feature is useful to obtain fast access to tables based on the results of computations.

The second example is an expression index, as shown below.

CREATE UNIQUE INDEX users_idx2
  ON users 
  (( email IS NOT NULL ));

The index table in the above case would contain the result of evaluating the expression, ( email IS NOT NULL ). The result would evaluate to either true or false, and would allow exactly one row of each kind since it is an UNIQUE index. This means that the above index allows only two rows in the table, one row with email being NULL and the other row with a non-NULL email.

While this particular index is not very useful in practice, it certainly illustrates how expression indexes would work. Below are some example rows being inserted into the table.

/* OK: Insert a row with non-NULL email */
test6=# INSERT INTO users (name, email)
           VALUES ('Satya', 'satya@eduguru.in');


/* OK: Insert a row with non-NULL email */
test7=# INSERT INTO users (name) VALUES ('Satya');


/* FAIL: Row with NULL email already exists */
test8=# INSERT INTO users (name) VALUES ('Munna');
ERROR:  duplicate key value violates unique constraint "users_idx2"


/* FAIL: Row with non-NULL email already exists */
test9=# INSERT INTO users (name, email)                                                                   VALUES ('Munna', 'munna@eduguru.in');
ERROR:  duplicate key value violates unique constraint "users_idx2"

A more useful example of an expression index in our scenario is to create a case-insensitive index as follows.

test10=# CREATE UNIQUE INDEX users_idx3
           ON users (lower(email));

This would prevent inserting duplicate email addresses using a different case, as shown in the example below.

/* OK: Insert a row a new email */
test11=# INSERT INTO users (name, email)
           VALUES ('Satya', 'satya@eduguru.in');


/* FAIL: Lowercase version of email address already exists! */
test12=# INSERT INTO users (name, email)                                                                   VALUES ('Satya', 'SATYA@eduguru.in');
ERROR:  duplicate key value violates unique constraint "users_idx3"

Leave a Reply