RANGE Partitioning in MySQL

RANGE Partitioning in MySQL

What is Partitioning?

Partitioning is a way in which a database (MySQL in this case) splits its actual data down into separate tables, but still get treated as a single table by the SQL layer.

When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE, DELETE should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather, and queries ALL partitions in a UNION that is not concurrent.

Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.

Understanding RANGE Partitioning in MySQL

A table that is partitioned by range is partitioned in such a way that each partition contains rows for which the partitioning expression value lies within a given range. Ranges should be contiguous but not overlapping, and are defined using the VALUES LESS THAN operator. For the next few examples, suppose that you are creating a table such as the following to hold personnel records for a chain of 20 video stores, numbered 1 through 20:

 

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT ‘1970-01-01’, separated DATE NOT NULL DEFAULT ‘9999-12-31’, job_code INT NOT NULL, store_id INT NOT NULL );

The employees table used here has no primary or unique keys. While the examples work as shown for purposes of the present discussion, you should keep in mind that tables are extremely likely in practice to have primary keys, unique keys, or both, and that allowable choices for partitioning columns depend on the columns used for these keys, if any are present.

 

How table can be partitioned by range

  1. Partition Based on lowest to highest value

This table can be partitioned by range in a number of ways, depending on your needs. One way would be to use the store_id column. For instance, you might decide to partition the table 4 ways by adding a PARTITION BY RANGE clause as shown here:

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT ‘1970-01-01’, separated DATE NOT NULL DEFAULT ‘9999-12-31’, job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (store_id) ( PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 VALUES LESS THAN (21) );

 

In this partitioning scheme, all rows corresponding to employees working at stores 1 through 5 are stored in partition p0, to those employed at stores 6 through 10 are stored in partition p1, and so on.

Note that each partition is defined in order, from lowest to highest.

This is a requirement of the PARTITION BY RANGE syntax; you can think of it as being analogous to a series of if … elseif … statements in C or Java in this regard.

It is easy to determine that a new row containing the data (72, ‘Mitchell’, ‘Wilson’, ‘1998-06-25’, NULL, 13) is inserted into partition p2, but what happens when your chain adds a 21st store?

Under this scheme, there is no rule that covers a row whose store_id is greater than 20, so an error results because the server does not know where to place it. You can keep this from occurring by using a “catchall” VALUES LESS THAN clause in the CREATE TABLE statement that provides for all values greater than the highest value explicitly named:

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT ‘1970-01-01’, separated DATE NOT NULL DEFAULT ‘9999-12-31’, job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (store_id) ( PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 VALUES LESS THAN MAXVALUE );

2. Partition based on MAXVALUE

MAXVALUE represents an integer value that is always greater than the largest possible integer value (in mathematical language, it serves as a least upper bound). Now, any rows whose store_id column value is greater than or equal to 16 (the highest value defined) are stored in partition p3. At some point in the future—when the number of stores has increased to 25, 30, or more—you can use an ALTER TABLE statement to add new partitions for stores 21-25, 26-30, and so on

In much the same fashion, you could partition the table based on employee job codes—that is, based on ranges of job_code column values. For example—assuming that two-digit job codes are used for regular (in-store) workers, three-digit codes are used for office and support personnel, and four-digit codes are used for management positions—you could create the partitioned table using the following statement:

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT ‘1970-01-01’, separated DATE NOT NULL DEFAULT ‘9999-12-31’, job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (job_code) ( PARTITION p0 VALUES LESS THAN (100), PARTITION p1 VALUES LESS THAN (1000), PARTITION p2 VALUES LESS THAN (10000) );

 

In this instance, all rows relating to in-store workers would be stored in partition p0, those relating to office and support staff in p1, and those relating to managers in partition p2. It is also possible to use an expression in VALUES LESS THAN clauses. However, MySQL must be able to evaluate the expression’s return value as part of a LESS THAN (<) comparison. Rather than splitting up the table data according to store number, you can use an expression based on one of the two DATE columns instead. For example, let us suppose that you wish to partition based on the year that each employee left the company; that is, the value of YEAR(separated). An example of a CREATE TABLE statement that implements such a partitioning scheme is shown here:

CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT ‘1970-01-01’, separated DATE NOT NULL DEFAULT ‘9999-12-31’, job_code INT, store_id INT ) PARTITION BY RANGE ( YEAR(separated) ) ( PARTITION p0 VALUES LESS THAN (1991), PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (2001), PARTITION p3 VALUES LESS THAN MAXVALUE );

 

In this scheme, for all employees who left before 1991, the rows are stored in partition p0; for those who left in the years 1991 through 1995, in p1; for those who left in the years 1996 through 2000, in p2; and for any workers who left after the year 2000, in p3.

3. Partition based on Timestamp 

It is also possible to partition a table by RANGE, based on the value of a TIMESTAMP column, using the UNIX_TIMESTAMP() function, as shown in this example:

 

CREATE TABLE quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) ( PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-01-01 00:00:00’) ), PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-04-01 00:00:00’) ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-07-01 00:00:00’) ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-10-01 00:00:00’) ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-01-01 00:00:00’) ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-04-01 00:00:00’) ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-07-01 00:00:00’) ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-10-01 00:00:00’) ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2010-01-01 00:00:00’) ), PARTITION p9 VALUES LESS THAN (MAXVALUE) );

 

Partitioning schemes based on time intervals. If you wish to implement a partitioning scheme based on ranges or intervals of time in MySQL 5.7, you have two options:

1. Partition the table by RANGE, and for the partitioning expression, employ a function operating on a DATE, TIME, or DATETIME column and returning an integer value, as shown here: CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, lastname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE );

In MySQL 5.7, it is also possible to partition a table by RANGE based on the value of a TIMESTAMP column, using the UNIX_TIMESTAMP() function, as shown in this example:

CREATE TABLE quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) ( PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-01-01 00:00:00’) ), PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-04-01 00:00:00’) ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-07-01 00:00:00’) ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2008-10-01 00:00:00’) ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-01-01 00:00:00’) ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-04-01 00:00:00’) ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-07-01 00:00:00’) ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2009-10-01 00:00:00’) ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP(‘2010-01-01 00:00:00’) ), PARTITION p9 VALUES LESS THAN (MAXVALUE) );

4. Partition Based on Date or Datetime column

Partition the table by RANGE COLUMNS, using a DATE or DATETIME column as the partitioning column. For example, the members table could be defined using the joined column directly, as shown here: CREATE TABLE members ( firstname VARCHAR(25) NOT NULL, username VARCHAR(16) NOT NULL, email VARCHAR(35), joined DATE NOT NULL ) PARTITION BY RANGE COLUMNS(joined) ( PARTITION p0 VALUES LESS THAN (‘1960-01-01’), PARTITION p1 VALUES LESS THAN (‘1970-01-01’), PARTITION p2 VALUES LESS THAN (‘1980-01-01’), PARTITION p3 VALUES LESS THAN (‘1990-01-01’), PARTITION p4 VALUES LESS THAN MAXVALUE );

 

 

 

 

Leave a Reply