Clustering TEXT and BYTE Data

Home | Previous Page | Next Page Using the Optical Subsystem > Reading TEXT and BYTE Data Columns from an Optical Platter >

Clustering TEXT and BYTE Data

You can use the CREATE OPTICAL CLUSTER statement to store logically related TEXT and BYTE data on the same volume of an optical platter. Storing related TEXT and BYTE data together minimizes platter exchanges when the TEXT and BYTE data is retrieved. The CREATE OPTICAL CLUSTER statement requires you to perform the following steps:

Assign a name to the optical cluster.
Specify the TEXT or BYTE data columns in a table that are to be stored together in the optical cluster.
Specify the cluster key.
The cluster key consists of one or more columns in the table that define the logical grouping of the TEXT or BYTE data columns. The columns that make up the cluster key must not be TEXT or BYTE data columns.
Specify the size of the cluster in kilobytes.

The following CREATE OPTICAL CLUSTER statement allocates a cluster of 18,000 kilobytes on an optical platter for the cat_picture BYTE column in the catalog table. It assigns the name catalog_catpics to the cluster.

CREATE OPTICAL CLUSTER catalog_catpics
   FOR catalog (cat_picture)
   ON (manu_code)
   CLUSTERSIZE 18000

The cluster key for catalog_catpics is manu_code. Therefore the cat_picture TEXT and BYTE data that have the same value in the manu_code column are stored together on the same volume or volumes, if more than one is required.

Choosing the Cluster Key

In the preceding example, the manu_code column was chosen for the cluster key because, of all the columns in the catalog table, it best meets the following criteria for choosing a cluster key:

The manu_code column provides a logical order for retrieving the TEXT and BYTE data.
The manu_code column contains duplicate values.

The first criterion for a cluster key, which can consist of more than one column, is that it provides a logical order for retrieving the TEXT and BYTE data. The purpose of the cluster key is to optimize retrieval of the TEXT and BYTE data by grouping related TEXT and BYTE data on the same volume. Grouping TEXT and BYTE data by a cluster key is only advantageous, however, if the TEXT and BYTE data is normally retrieved in the order that it is grouped. If the cat_picture TEXT and BYTE data is grouped by stock_num and then retrieved by manu_code, as the following example shows, the optical-storage subsystem might still need to perform multiple platter exchanges to retrieve the TEXT and BYTE data:

SELECT cat_picture FROM catalog WHERE manu_code = 'HRO'

The second criterion for a cluster key is that it must contain duplicate values. Of all the columns in the catalog table, only the first three, catalog_num, stock_num, and manu_code, provide logical sequences for accessing the table. Of these columns, catalog_num is not useful because it has a SERIAL data type, which produces a unique value for each row in the table. Clustering on this column produces a cluster for every row in the table where cat_picture is not null.

Assume, for example, that 26 rows in the catalog table have pictures in the cat_picture column, and that the manu_code column has seven unique values with occurrences as the following example shows:

manu_code      occurrences
ANZ                 7
HRO                 6
KAR                 1
NRG                 1
PRC                 4
SHM                 5
SMT                 2

For these characteristics, manu_code produces seven unique clusters.

Choosing the Cluster Size

The cluster size is the size of the cluster expressed in kilobytes. It is based on the number of TEXT and BYTE data objects that you expect to store in each cluster and the average size of the TEXT and BYTE data. You can calculate it as follows:

cluster size = estimated number of TEXT and BYTE data objects *
   average TEXT and BYTE data size

If the average size of the cat_picture images is 60 kilobytes, and you want to store up to 300 TEXT and BYTE data objects per cluster, the cluster size is 18,000 kilobytes.

Altering the Cluster Size

You can use the ALTER OPTICAL CLUSTER statement to change the size of an optical cluster for all clusters that are created after the statement is executed. In the following example, the size of the catalog_catpics cluster is reduced to 6,000 kilobytes, a cluster size that stores up to one hundred 60-kilobyte TEXT or BYTE data objects per cluster:

ALTER OPTICAL CLUSTER catalog_catpics CLUSTERSIZE 6000

The ALTER OPTICAL CLUSTER statement changes the cluster size only for TEXT or BYTE data objects created after the statement is executed. Clusters that were created previously remain their original size.

For more information, see ALTER OPTICAL CLUSTER (+, DB-Access, ESQL/C).

Dropping an Optical Cluster

You can use the DROP OPTICAL CLUSTER statement to terminate optical clustering. The following example terminates clustering for catalog_catpics:

DROP OPTICAL CLUSTER catalog_catpics

The DROP OPTICAL CLUSTER statement does not affect the TEXT and BYTE data objects that were already stored in a cluster. It terminates clustering only for the affected columns in the future.

You can also use the DROP OPTICAL CLUSTER statement with the CREATE OPTICAL CLUSTER statement to change either the TEXT or BYTE data columns or the cluster-key columns for an optical cluster. For example, you can change the cluster key for catalog_catpics from manu_code to stock_num with the following statements:

DROP OPTICAL CLUSTER catalog_catpics;
CREATE OPTICAL CLUSTER catalog_catpics
   FOR catalog (cat_picture)
   ON (stock_num)
   CLUSTERSIZE 3000

These statements do not affect TEXT and BYTE data objects that were previously clustered on manu_code because you cannot alter data written to a WORM optical platter. These statements change the clustering only for future inserts or updates. If you want to recluster the TEXT and BYTE data objects that were clustered by manu_code, you must update them. The TEXT and BYTE data objects are then rewritten to the new cluster. The TEXT or BYTE data descriptors are updated with the new location of the TEXT and BYTE data objects. The space that these TEXT and BYTE data objects originally occupied is lost because you cannot reuse space on WORM optical media.

For more information, see DROP OPTICAL CLUSTER (+, DB-Access, ESQL/C).