Strategy functions are user-defined functions that can be used in queries to select data. Registering them as strategy functions with the CREATE OPCLASS statement lets the optimizer know that an associated R-tree index can be used to execute a query that contains one of those functions.
For example, assume there is an R-tree index on a column called boxes, and Overlap is defined as a strategy function. If a query contains the qualification WHERE Overlap (boxes, region), the query optimizer considers using the R-tree index to evaluate the query.
You can include up to 64 strategy functions when you create a new operator class for the R-tree access method. You must, however, include the following four strategy functions:
You must list these functions first, in the order shown, when you execute the CREATE OPCLASS statement to register the operator class with the database server. This SQL statement is described in Syntax for Creating a New Operator Class.
The four required strategy functions are defined in detail in later sections of this chapter, with an example of creating the Contains strategy function.
The main purpose of the strategy functions is to tell the query optimizer when it should consider using an R-tree index, as described in the preceding section. However, the R-tree access method also uses the strategy functions internally to search in the R-tree index, to delete entries from the index, and to optimize the performance of updates to the index.
The R-tree access method uses the four required strategy functions in a variety of combinations when searching in an R-tree index, as the following table shows.
Slot Number | Strategy Function | Commutator Function | Function Called on an Index Key in a Nonleaf Page |
---|---|---|---|
1 | Overlap | Overlap | Overlap |
2 | Equal | Equal | Contains |
3 | Contains | Within | Contains |
4 | Within | Contains | Overlap |
5 | Available for use | Same function | Same function |
... | ... | ... | ... |
64 | Available for use | Same function | Same function |
You can use the RtreeInfo function to redefine these switching semantics. |
The first column of the table refers to the position in the CREATE OPCLASS statement of the strategy function. The four required strategy functions must be listed first, in the order shown in the second column.
The third column specifies the function that the R-tree access method uses as the commutator of a particular strategy function. The Within and Contains functions are commutators of each other. Other functions, including those numbered 5 and up, are assumed to be their own commutators. This means that the R-tree access method assumes that when it calls the function, the access method can reverse the order of the arguments without changing the results of the function. Strategy functions should be implemented with these commutator substitutions in mind.
In certain cases, the query optimizer uses the commutator functions as substitute functions. For example, suppose a query has the predicate Within(A, B) in its WHERE clause, where A is a constant search object and B is a table column with an R-tree index defined on it. Predicate functions in WHERE clauses are written to work with an index on the first argument, so the Within function cannot be used in this case, because the R-tree index is on the second argument. The commutator information allows the optimizer to substitute Contains(B, A), which allows the R-tree index on B to be used in the execution of the query.
The strategy functions in slots 5 through 64 can have commutator functions specified by the COMMUTATOR = FUNCTION modifier of the CREATE FUNCTION statements used to register the functions in SQL. If you do not specify a commutator function, the query optimizer does not attempt to change the order of the arguments in order to get an indexed column as the first argument. The following example registers the Contains strategy function and specifies that the Within function is its commutator:
CREATE FUNCTION Contains (MyShape, MyShape) RETURNS BOOLEAN WITH ( COMMUTATOR = Within, NOT VARIANT ) EXTERNAL NAME "$INFORMIXDIR/extend/shapes.3.0/shapes.bld (MyShapeContains)" LANGUAGE C;
The strategy functions in slots 5 through 64 can also have negator functions specified by the NEGATOR = FUNCTION modifier of the CREATE FUNCTION statements used to register the functions in SQL. The R-tree access method cannot process queries with a negated strategy function, such as NOT Separated(A,B). However, if the Separated strategy function declares the Overlap function as its negator, the query optimizer is able to substitute the predicate Overlap(A,B) for the NOT Separated(A,B), which allows the use of an R-tree index on column A.
The fourth column specifies the function that the R-tree access method uses when searching for an index key in a nonleaf page. The following paragraph explains why the entry for Within is Overlap, and the entry for Equal is Contains.
Suppose a query has the predicate Within(A, B) in its WHERE clause, where B is a constant search object and A is a table column with an R-tree index defined on it. When a leaf page of the index is searched, the index entries are true candidates to match the query, so the Within function is used directly for each index entry. The search of a branch page tests to see if there exists an entry in the subtree below the branch page that is within the search object B. In this case, the search does not test whether the bounding box of the subtree is within B, but whether the bounding box of the subtree overlaps B. This is because a small entry within the subtree, in the overlapping portion of the bounding box, could be completely within B. Therefore, an index search that uses the Within function must substitute the Overlap function for nonleaf (branch) index pages.
Similarly, an index search that uses the Contains function must substitute the Equal function for nonleaf index pages because a qualifying index entry could be in any subtree whose bounding box contains the search object.
The access method uses the Contains function for index scans that search for leaf objects that must be deleted from the R-tree index after their associated row in the table is deleted.
The access method uses the Equal function to optimize the performance of updates to the R-tree index. When a row in a table is updated, any R-tree index on the table might also need to be updated. Updates usually mean deleting the old entry and inserting the new entry. First, however, the access method uses the Equal strategy function to check whether the new entry is different from the old entry. If they are both equal, the access method does not perform the update.
The Overlap function returns a Boolean value that indicates whether two objects overlap or have at least one point in common.
Figure 6 shows a circle that overlaps a triangle. The circle, however, does not overlap the box, because the circle does not have any points in common with the box.
The signature of the Overlap function must be:
Overlap (UDT, UDT) RETURNS BOOLEAN
UDT refers to user-defined type, or the data type you want to index with the R-tree access method.
The Overlap function returns TRUE if the object in the first parameter overlaps or intersects the object in the second parameter and FALSE otherwise.
When you design the Overlaps function, you might want to first test if the bounding boxes of the two data objects overlap; and if they do, then test if the data objects overlap. The first test is a relatively quick and easy calculation and might eliminate many candidates before the second, more complicated test.
For example, Figure 7 shows that the first bounding box test eliminates the box-circle overlap immediately, but the second data object test is required to find out if the triangle and circle overlap. In this case, they do not.
Appendix A. Shapes3 Sample DataBlade Module contains sample C code to create an Overlap function that takes the MyShape data type as its two parameters.
The Equal function returns a Boolean value that indicates whether two objects are equal. For example, in two-dimensional space, two points that have the same coordinates might be equal, as are two circles that have the same center and radius.
The signature of the Equal function must be:
Equal (UDT, UDT) RETURNS BOOLEAN
UDT refers to user-defined type, or the data type you want to index with the R-tree access method.
The Equal function returns TRUE if the two objects contained in the two parameters are equal and FALSE otherwise. It is up to the application or data type designer to define what equal means for the user-defined data type.
Appendix A. Shapes3 Sample DataBlade Module contains sample C code to create an Equal function that takes the MyShape data type as its two parameters.
The Contains function returns a Boolean value that indicates whether an object entirely contains another object.
Figure 8 shows a circle that contains a box. The circle, however, does not contain the triangle, because part of the triangle lies outside the circle.
The signature of the Contains function must be:
Contains (UDT, UDT) RETURNS BOOLEAN
UDT refers to user-defined type, or the data type you want to index with the R-tree access method.
The Contains function returns TRUE if the object in the first parameter completely contains the object in the second parameter and FALSE otherwise.
When you design the Contains function, you might want to first test if the bounding box of the first object contains the bounding object of the second object; and if it does, then test if the first data object contains the second data object. The first test is a relatively quick and easy calculation and might eliminate many candidates before the second, more complicated test.
For example, Figure 9 shows that the first bounding box test eliminates the box-circle containment immediately, but the second data object test is required to find out if the circle contains the triangle. In this case, it does not.
If you allow loose, or inexact, bounding boxes, be careful when you calculate the containment of bounding boxes. For example, Figure 10 shows that although the exact bounding box of the rectangle does not contain the loose bounding box of the circle, the rectangle still contains the circle.
In this case, a preliminary test for bounding box containment returns inaccurate results unless you used a compensating factor to account for the circle's loose bounding box. For more information on loose bounding boxes, refer to Loose Bounding Box Calculations.
Appendix A. Shapes3 Sample DataBlade Module contains sample C code to create a Contains function that takes the MyShape data type as its two parameters.
The Within function returns a Boolean value that indicates whether an object is contained by another object. It is similar to the Contains function, but the order of the two parameters is switched.
Figure 11 shows a box that is within, or contained by, a circle. The triangle, however, is not within either the circle or the box, because all or part of the triangle lies outside both the circle and the box.
The signature of the Within function must be:
Within (UDT, UDT) RETURNS BOOLEAN
UDT refers to user-defined type, or the data type you want to index with the R-tree access method.
The Within function returns TRUE if the object in the first parameter is within, or completely contained in, the object in the second parameter and FALSE otherwise.
When you design the Within function, you might want to first test if the bounding box of the first object is contained in the bounding object of the second object; and if it is, then test if the first data object is contained in the second data object. The first test is a relatively quick and easy calculation and might eliminate many candidates before the second, more complicated test.
For example, Figure 12 shows that the first bounding box test eliminates the box-circle containment immediately, but the second data object test is required to find out if the triangle is within the circle. In this case, it is not.
If you allow loose, or inexact, bounding boxes, be careful when you calculate the containment of bounding boxes. For example, Figure 13 shows that although the loose bounding box of the circle is not within the exact bounding box of the rectangle, the circle is still within the rectangle.
For more information on loose bounding boxes, refer to Loose Bounding Box Calculations.
Appendix A. Shapes3 Sample DataBlade Module contains sample C code to create a Within function that takes the MyShape data type as its two parameters.
You can create up to 60 nonrequired strategy functions for an operator class. This means that together with the four required functions, you can have a total of 64 strategy functions defined for a particular operator class.
For example, you might want to create a function that calculates whether one object is outside a second object. You create the Outside function in the same way you create the other required functions, except that the C code to implement the function is quite different. When you create the operator class with the CREATE OPCLASS statement, you list the Outside function as the fifth strategy function, right after the four required strategy functions.
Other types of strategy functions you might want to create include specialized Overlap and Within functions. For example, these functions could implement whether two objects overlap a lot, overlap a little, or interlock but do not touch.
The CREATE OPCLASS statement is described in Syntax for Creating a New Operator Class.
This example describes the SQL statement that registers the Contains strategy function with the database server. The sample C code to create the function is provided in Appendix A. Shapes3 Sample DataBlade Module; the example is based on the objects of the sample DataBlade module, described in that appendix.
The SQL statements to register the Overlap, Equal, and Within strategy functions with the database server are similar to the SQL statement to register the Contains function.
The following SQL statement shows how to register the Contains strategy function with the database server:
CREATE FUNCTION Contains (MyShape, MyShape) RETURNS BOOLEAN WITH ( COMMUTATOR = Within, NOT VARIANT ) EXTERNAL NAME "$INFORMIXDIR/extend/shapes.3.0/shapes.bld (MyShapeContains)" LANGUAGE C;
The two parameters of the function are both of data type MyShape. The C function MyShapeContains, found in the shared object file $INFORMIXDIR/extend/Shapes.3.6/Shapes.bld, contains the actual C code that calculates whether the first object contains the second object. The statement specifies that the commutator of the Contains function is the Within function.
For the sample C code of the MyShapeContains function, see Contains Strategy Function. C code uses the DataBlade API to interact with the database server. Sample C code to implement the Overlap, Equal, and Within functions is also provided in that appendix.
For more information on the DataBlade API, refer to the IBM Informix: DataBlade API Programmer's Guide.
For more information and examples on how to create user-defined functions, refer to IBM Informix: User-Defined Routines and Data Types Developer's Guide.
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]