Heuristic in Query optimization

Heuristic optimization often includes making transformations to the query tree by moving operators up and down the tree so that the transformed tree is equivalent to the tree before the transformations. Before we discuss these heuristics, it is necessary to discuss the following rules governing the manipulation of relational algebraic expressions:

1. Joins and Products are commutative. e.g.
R X S = S X R
R |X| S = S |X| R

where |X| may be a join or a natural join. The order of attributes in the two products or joins may not be quite the same but the ordering of attributes is not considered significant in the relational model since the attributes are referred to by their name not by their position in the list of attributes.

2. Restriction is commutative. e.g.

Screenshot from 2020-08-12 21-33-51

3. Joins and Products are associative. e.g.

( R X S) X T = R X ( S X T)
( R |X| S) |X| T = R |X| ( S |X| T)

The associativity of the above operations guarantees that we will obtain the same results whatever be the ordering of computations of the operations product and join. Union and intersection are also associative.

4. Cascade of Projections. e.g.

Screenshot from 2020-08-12 21-34-46

where the attributes A is a subset of the attributes B. The above expression formalises the obvious that there is no need to take the projection with attributes B if there is going to be another projection which is a subset of B that follows it.

5. Cascade of restrictions. e.g.

Screenshot from 2020-08-12 21-35-28

The above expression also formalises the obvious that if there are two restrictions, one after the other, then there is no need to carry out the restrictions one at a time (since each will require processing a relation) and instead both restrictions could be combined.

6. Commuting restrictions and Projections. e.g.

Screenshot from 2020-08-12 21-37-23

or

Screenshot from 2020-08-12 21-37-49

There is no difficulty in computing restriction with a projection since we are then doing the restriction before the projection. However if we wish to commute the projection and the restriction, that is possible only if the restriction used no attributes other than those that are in the projection.

7. Commuting restrictions with Cartesian Product. In some cases, it is possible to apply commutative law to restrictions and a product. For example,

Screenshot from 2020-08-12 21-38-55

or

Screenshot from 2020-08-12 21-39-30

In the above expressions we have assumed that the predicate p has only attributes from R and the predicate q has attributes from S only.

8. Commuting restriction with a Union.
9. Commuting restriction with a Set Difference.
10. Commuting Projection with a Cartesian Product or a Join — we assume that the projection includes the join predicate attributes.
11. Commuting Projection with a UNION.

We now use the above rules to transform the query tree to minimize the query cost. Since the cost is assumed to be closely related to the size of the relations on which the operation is being carried out, one of the primary aims of the transformations that we discuss is to reduce the size of intermediate relations.

The basic transformations include the following:

(a) Moving restrictions down the tree as far as possible. The idea is to carry out restrictions as early as possible. If the query involves joins as well as restrictions, moving the restrictions down is likely to lead to substantial savings since the relations that are joined after restrictions are likely to be smaller (in some cases much smaller) than before restrictions. This is clearly shown by the example that we used earlier in this topic to show that some query plans can be much more expensive than others. The query plans that cost the least were those in which the restriction was carried out first. There are of course situations where a restriction does not reduce the relation significantly, for example, a restriction selecting only women from a large relation of customers or clients.

(b) Projections are executed as early as possible. In real-life databases, a relation may have one hundred or more attributes and therefore the size of each tuple is relatively large. Some relations can even have attributes that are images making each tuple in such relations very large. In such situations, if a projection is executed early and it leads to elimination of many attributes so that the resulting relation has tuples of much smaller size, the amount of data that needs to be read in from the disk for the operations that follow could be reduced substantially leading to cost savings. It should be noted that only attributes that we need to retain from the relations are those that are either needed for the result or those that are to be used in one of the operations that is to be carried out on the relation.

(c) Optimal Ordering of the Joins. We have noted earlier that the join operator is associative and therefore when a query involves more than one join, it is necessary to find an efficient ordering for carrying out the joins. An ordering is likely to be efficient if we carry out those joins first that are likely to lead to small results rather than carrying out those joins that are likely to lead to large results.

(d) Cascading restrictions and Projections. Sometimes it is convenient to carry out more than one operations together. When restrictions and projections have the same operand, the operations may be carried out together thereby saving the cost of scanning the relations more than once.

(e) Projections of projections are merged into one projection. Clearly, if more than one projection is to be carried out on the same operand relation, the projections should be merged and this could lead to substantial savings since no intermediate results need to be written on the disk and read from the disk.

(f) Combining certain restrictions and Cartesian Product to form a Join. A query may involve a cartesian product followed by a restriction rather than specifying a join. The optimizer should recognise this and execute a join which is usually much cheaper to perform.

(g) Sorting is deferred as much as possible. Sorting is normally a nlogn operation and by deferring sorting, we may need to sort a relation that is much smaller than it would have been if the sorting was carried out earlier.