BTScanner

The following information relates to the BTSCANNER configuration parameter

BTSCANNER num=<number>:prio=<high|low>:rangescan=<number>:threshold=<number>

 
What should the threshold be set to?
    Question If this is set low, the btscanner will continually scan the leaf pages, which causes excessive I/O. If it is set high, large numbers of 'dirty' index pages would accumulate, causing normal threads to do extra work on a regular basis.

    Answer The threshold ( onmode -C threshold {size} ) sets the minimum number of deleted items an index must encounter before it is placed onto the hot list to be cleaned. The right size depends on usage, so it is a matter of tuning.
 
Btscanner Priority
    Question What is the resource use of the btscanner like? For example, if the btscanner is given a low priority on a busy system, will it still be able to keep up with the cleaning it needs to do, or will it be overwhelmed because it rarely gets the chance to run? Would this be characterised by lots of yielding scanner threads?

    Answer The priority of the btscanner thread(s) can only be set to either low or high, low being a priority lower than normal user threads, and high being a priority equal to that of normal user threads. If the system is busy and you want index cleaning to occur, use onmode -C high to increase the priority as above.
 
Range Scanning
    Question Does this work with version 9.40x? Currently we have it disabled and any onmode commands relating to range scanning just return a usage message. How is it enabled?

    Answer The Light Range Scan is incorporated in 9.40x. It is only applicable when the number of indexes on a partition is exactly one (i.e. detached indexes or an attached index with only one index on the table). For a full description of the implementation see the rest of this article. When using the onmode -range {value} command, Light Range scanning will automatically occur when only one index exists on that partition.

Multiple Btscanner Threads

    Question Is it possible to permanently configure the number of btscanner threads? At the moment additional threads can be added with onmode but these are lost when the instance is restarted. If there were 20 btscanner threads but only one table in the hot list, would only one thread clean it? Sometimes some threads will do a lot of reads but no writes - if one scanner was assigned to one table then surely all would do some writes?

    Answer By default the engine starts one and only one thread. The thread runs at low priority; you can run onstat -C to see this. In the later engines the number of threads can be configured at startup. In earlier engines there is no way to configure it to start multiple threads each time. If you want to add more, you must do that with onmode after each restart.
 
Overview of the BTScanner

The new Btscanner replaces the btree cleaner implementation of earlier Informix Dynamic Server versions. The design covers three major areas that have changed:

  1. workload generation,

    The workload for cleaning indexes is determined by keeping track of how many times items in an index caused the server to do extra work. The index which causes the server to do the most extra work will be the next index cleaned by the btree scanner thread(s).

  2. how the btree is cleaned of dirty items,

    An index will have its entire leaf level examined for deleted items. Upon finding a deleted index item, the cleaner will test lock the item, then undertake a foreground remove of the item, and then determine whether the index page warrants compression.

  3. the use of multiple threads

    The implementation now allows the dynamic allocation of threads for configurable workloads.

 
Design Details

Submission of work to be cleaned will now be accomplished by profiling the number of times a reader of the index encounters deleted items that require the reader to do extra work. The extra work will be profiled for each index, and will be the basis for developing a Hot List which will drive the workload of the btree scanner.

The hot list is created under the following conditions:

  • when the hot list is empty of work;
  • when a user request is placed (by onmode); or
  • when the list has changed enough to be out of date.
Any btree scanner seeing a sort task pending will acquire the task, setting the sort task in progress and starting a scan of the partitions, creating a list of part numbers, key numbers and dirty hits.

This list is then sorted by hits and replaces the previous hot list.
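
The hot list construction described above can be sketched as follows. This is only an illustration, not the server's implementation: the IndexProfile class, the build_hot_list function and the sample partition numbers are hypothetical names and values, and treating the threshold as "hits greater than or equal to" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexProfile:
    """Hypothetical per-index profile entry (not a real server structure)."""
    partnum: int     # partition number of the index
    key: int         # index key number within the partition
    dirty_hits: int  # times a reader had to do extra work on deleted items


def build_hot_list(profiles, threshold):
    """Return the indexes that reach the threshold, most dirty hits first."""
    # Assumption: an index qualifies when its hits are >= the threshold.
    candidates = [p for p in profiles if p.dirty_hits >= threshold]
    return sorted(candidates, key=lambda p: p.dirty_hits, reverse=True)


# Made-up sample data for illustration only.
profiles = [
    IndexProfile(partnum=0x100001, key=1, dirty_hits=120),
    IndexProfile(partnum=0x100002, key=1, dirty_hits=9000),
    IndexProfile(partnum=0x100003, key=2, dirty_hits=5400),
]
hot_list = build_hot_list(profiles, threshold=5000)
for entry in hot_list:
    print(hex(entry.partnum), entry.key, entry.dirty_hits)
```

Sorting by hits in descending order means the index causing the most extra work is the first one handed to a scanner thread.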

The btree scanner thread now replaces the btree cleaner thread. Correspondingly, the name of the thread(s) has been changed to btscanner #. The btree scanner has three main task groups, which are processed in order:
  • first, administrative tasks,
  • second, sort tasks, and
  • third, index cleaning tasks.
Therefore its typical task cycle consists of
  1. checking whether any new administrative tasks exist,
  2. getting an index off the hot list, and
  3. cleaning the index.

Administrative Tasks   Exit a thread
                       Start a new thread
                       Kill
                       Enable
                       Disable
                       Set Priority High
                       Set Priority Low
                       Yield N
                       Yield 0
                       Set Threshold
Sort Task              Sort in Progress
                       Sort Pending
Cleaning Task          Scan Index
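
The ordering of the three task groups can be pictured with a small sketch. It is a simplification under stated assumptions: run_one_cycle, the queue of admin task names and the (partnum, key, hits) tuples are hypothetical, and the sort step here only re-sorts an existing list, whereas the real sort task rescans the partitions.

```python
from collections import deque


def run_one_cycle(admin_queue, sort_pending, hot_list, log):
    """Run one pass of the task cycle and return the new sort_pending flag."""
    # 1. Administrative tasks are always handled first.
    while admin_queue:
        log.append(("admin", admin_queue.popleft()))

    # 2. A pending sort task is handled next (simplified to a re-sort).
    if sort_pending:
        hot_list.sort(key=lambda item: item[2], reverse=True)
        log.append(("sort", len(hot_list)))
        sort_pending = False

    # 3. Finally, take the hottest index off the list and clean it.
    if hot_list:
        partnum, key, _hits = hot_list.pop(0)
        log.append(("clean", partnum, key))

    return sort_pending


admin = deque(["Set Priority High"])
hot = [(1, 1, 10), (2, 1, 50)]  # made-up (partnum, key, hits) entries
log = []
pending = run_one_cycle(admin, True, hot, log)
print(log)
```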
 
Scan

There are now two types of scan that may be undertaken to clean an index. The first is a Basic Index Scan and the second is a Light Range Scan. N.B. The Light Range Scan can only be used when the number of indexes on a partition is exactly one (i.e. detached indexes or an attached index with only one index on the table). The Basic Index Scan, however, can be used at any time.
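
As a rough sketch of that rule, combined with the Range Scan Threshold described under Onstat -C hot (an index with more than X pages uses range scanning, and -1 disables it), the choice of scan might look like this. The function name, its parameters and the returned strings are hypothetical.

```python
def choose_scan(indexes_on_partition, index_pages, range_threshold):
    """Pick a scan type for one index; returns "light_range" or "basic"."""
    # A threshold of -1 means range scan cleaning is disabled.
    if range_threshold == -1:
        return "basic"
    # The Light Range Scan requires exactly one index on the partition.
    if indexes_on_partition != 1:
        return "basic"
    # Only indexes larger than the threshold use range scanning.
    if index_pages > range_threshold:
        return "light_range"
    return "basic"


print(choose_scan(indexes_on_partition=1, index_pages=5000, range_threshold=1000))
print(choose_scan(indexes_on_partition=2, index_pages=5000, range_threshold=1000))
```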

 
Basic Index (Leaf) Scan methodology

The Basic Index (Leaf) Scan consists of starting at the root node of the index and then walking to the farthest left leaf. Once at the leaf level, each node is checked for dirty items and cleaned if required. The leaf node's next pointer is used to move to the next node to be processed, until the last leaf node is visited.

  • Advantages:
    Examines all index nodes from left to right.
    Looks at the buffer pool.
    Able to operate when more than one index key exists in a partition.

  • Disadvantages:
    Reads every index page into the buffer pool.
    Slow due to the amount of I/O operations involved.
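
The leaf walk can be illustrated with a toy in-memory tree. This is a minimal sketch, assuming a simplified Node class with (key, deleted) pairs; it ignores locking, page compression and the buffer pool, all of which the real scan has to deal with.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical index node: leaves hold items, branches hold children."""
    items: List[tuple] = field(default_factory=list)      # (key, deleted) pairs
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    next: Optional["Node"] = None                         # right sibling leaf


def leftmost_leaf(root):
    """Descend from the root, always taking the first child."""
    node = root
    while node.children:
        node = node.children[0]
    return node


def basic_leaf_scan(root):
    """Visit every leaf left to right, removing items flagged as deleted."""
    removed = 0
    leaf = leftmost_leaf(root)
    while leaf is not None:
        before = len(leaf.items)
        leaf.items = [(k, d) for (k, d) in leaf.items if not d]
        removed += before - len(leaf.items)
        leaf = leaf.next  # follow the leaf's next pointer
    return removed


leaf_a = Node(items=[(1, False), (2, True)])
leaf_b = Node(items=[(3, True), (4, True)])
leaf_c = Node(items=[(5, False)])
leaf_a.next, leaf_b.next = leaf_b, leaf_c
root = Node(children=[leaf_a, leaf_b, leaf_c])
removed = basic_leaf_scan(root)
print(removed)
```

Every leaf is visited regardless of whether it holds deleted items, which is why this scan is thorough but I/O heavy.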
 
Light Range Scan methodology

The Light Range Scan was added to improve the performance of index cleaning. It combines several other online performance features, such as light scanning and range scanning into the index cleaning scan. When a user submits a request for cleaning, the minimum and maximum logical page numbers are tracked off the memory partition. The scan then starts by reading a block of pages from the disk starting with the lowest logical page that a request has been submitted for. This block is then examined for any leaf pages having deleted items. Any pages having deleted items are then read into the buffer pool and cleaned.

During the cleaning of dirty pages, an asynchronous I/O is submitted for the next block to process until the highest logical page number is encountered.

  • Advantages:
    Uses light I/O scans.
    Only scans between the high and low boundaries.

  • Disadvantages:
    Does not clean index pages that have not been flushed to disk.
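
A simplified sketch of the block-by-block range scan follows. It is an assumption-laden illustration: light_range_scan, the dictionary standing in for on-disk pages and the clean_page callback are all hypothetical, and the blocks are read sequentially here rather than with asynchronous read-ahead.

```python
def light_range_scan(disk_pages, low, high, block_size, clean_page):
    """Scan logical pages low..high in blocks and clean dirty leaf pages.

    disk_pages maps a logical page number to a dict with the keys
    "is_leaf" and "deleted_items" (a stand-in for the on-disk image).
    """
    cleaned = []
    start = low
    while start <= high:
        end = min(start + block_size - 1, high)
        # "Light" read of one block, simulated by dictionary lookups.
        block = [(n, disk_pages[n]) for n in range(start, end + 1) if n in disk_pages]
        for pageno, page in block:
            # Only leaf pages with deleted items go through the buffer pool.
            if page["is_leaf"] and page["deleted_items"] > 0:
                clean_page(pageno)
                cleaned.append(pageno)
        start = end + 1
    return cleaned


disk = {
    10: {"is_leaf": True, "deleted_items": 0},
    11: {"is_leaf": True, "deleted_items": 4},
    12: {"is_leaf": False, "deleted_items": 0},
    13: {"is_leaf": True, "deleted_items": 1},
    14: {"is_leaf": True, "deleted_items": 7},
}
cleaned = light_range_scan(disk, low=11, high=13, block_size=2, clean_page=lambda n: None)
print(cleaned)
```

Page 14 has deleted items but lies outside the low/high boundaries, so it is left for a later request.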
 
Implementation Impact

This new implementation involves changes to the server engine. Therefore it should not directly affect how applications are run. Any applications that previously did large batch updates or deletions to a single table will no longer bottleneck on the btree cleaner latch. End users will not have to change their code in order to take advantage of the new features of the btree scanner. The changes in the code are to engine algorithms, which will only affect the DBA's tuning of resources.

 
User Interface Changes

Both the onmode and the onstat commands have a new option, -C.

Onmode

onmode -C start {count}
    There can be a maximum of 32 btree scanner threads running at one time. If a count is not specified a default count of 1 is assumed.
onmode -C stop {count}
    This command is used to stop or kill btree scanner threads. This command will not execute immediately, but will take place on the assignment of the next unit of work. If a count is not specified a default count of 1 is assumed.
onmode -C threshold {size}
    Sets the minimum number of deleted items an index must encounter before an index will be placed onto the hot list. Once all indexes above the threshold have been cleaned then indexes below this threshold will be added to the hot list.
onmode -C high
    Sets the priority of all running btree scanner threads equal to that of normal users.
onmode -C low
    Sets the priority of all running btree scanner threads lower than normal users. This will allow the btree scanner threads to consume only spare resources and ensure that they will not use CPU cycles of normal users.
onmode -C enable
    Enables the btree scanner thread(s) after the disable command has been issued. (Normally only used during testing)
onmode -C disable
    Disables the btree scanner thread(s) from generating a sort list or scanning any indexes for deleted items. (Normally only used during testing)
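
For scripting, the commands above could be assembled by a small helper such as the one below. This is a sketch only: onmode_c_args is a hypothetical function that builds an argument list and runs nothing, and applying the 32-thread limit to a single start or stop count is a coarse assumption (the stated limit is on the total number of running threads).

```python
MAX_BTSCANNER_THREADS = 32  # maximum number of running threads stated above


def onmode_c_args(action, value=None):
    """Build the argument list for one `onmode -C` command; runs nothing."""
    if action in ("start", "stop"):
        # A missing count defaults to 1, as described above.
        count = 1 if value is None else int(value)
        if not 1 <= count <= MAX_BTSCANNER_THREADS:
            raise ValueError("count must be between 1 and 32")
        return ["onmode", "-C", action, str(count)]
    if action == "threshold":
        if value is None:
            raise ValueError("threshold needs a size")
        return ["onmode", "-C", "threshold", str(int(value))]
    if action in ("high", "low", "enable", "disable"):
        return ["onmode", "-C", action]
    raise ValueError("unknown action: %r" % (action,))


print(" ".join(onmode_c_args("start", 2)))
print(" ".join(onmode_c_args("threshold", 5000)))
print(" ".join(onmode_c_args("high")))
```

The resulting list could be passed to subprocess.run on a host where onmode is available; the threshold value 5000 is only an example.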


Onstat

onstat -C
    Prints the profile information about the btree scanner subsystem and about each active btree scanner thread.
onstat -C prof
    Prints the profile information for the system and each thread.
onstat -C hot
    Prints the hot list index keys in the order they are to be cleaned.
onstat -C part
    Prints all partitions with index statistics.
onstat -C clean
    Shows information about all partitions which have been cleaned or are in need of being cleaned.
onstat -C all
    Prints all onstat -C options.

 
Performance Tuning

Number of threads versus Priority:

By default the btree scanner threads run at a lower priority than user threads, so when the system becomes busy, index cleaning slows down. If the system is busy and you want index cleaning to occur, set the threads to high priority (a priority equal to that of normal users).

 
Onstat -C prof
Active Threads - The number of currently running B-tree scanners
Global Commands - The commands that have been requested to run
Number of partition scans - The number of times the B-tree scanner has examined all partitions looking for index partitions to clean
Main Block - The pointer to the B-tree scanner's main block
BTC Admin - The pointer to the currently assigned admin thread
BTS info - The pointer to the current B-tree scanner information
Id - The B-tree scanner id and array position
Prio - The current priority assigned to the B-tree scanner
    HIGH - The B-tree scanner runs with the same priority as a normal user
    LOW - The B-tree scanner runs behind all normal users
Partnum - The partition number of the index the B-tree scanner is cleaning. If set to 0 then it is not currently cleaning
Key - The index key number that the B-tree scanner is cleaning
Cmd - The current command being processed by this B-tree scanner
Number of leaves pages Scanned - The number of pages the B-tree scanner has read and processed
Number of leaves with deleted items - The number of leaf pages in which the B-tree scanner has found deleted items
Time spent cleaning (sec) - A very rough estimate of the number of whole seconds the B-tree scanner has spent cleaning indexes
 
Onstat -C hot
Current Item - The current item which is being cleaned. If this number is greater than the size of the list, then the entire list has been assigned or has already been cleaned.
List Size - The number of items on the hot list
Hit Threshold - The number of dirty items that must be encountered on a specific index before the index will be placed on the hot list. If the B-tree scanner has been idle for over 5 minutes it might decide to internally lower this number.
List Created - The time the current list was created
List expires in - The number of seconds left before this list will expire
Range Scan Threshold - An index containing more than this number of pages will use Range Scanning to clean the index. If the value is -1 then range scan cleaning is disabled
Partnum - The partition of the index which needs to be cleaned
Key - The key number of the index which needs to be cleaned
Hits - The number of hits encountered
* - This index has been assigned to be cleaned or has already been cleaned
 
Onstat -C clean
Partnum - The partition of the index which needs to be cleaned

    C - the index is in the process of being cleaned.
    N - the index may NOT be cleaned.
Dirty Hits - The current number of hits (index items which have not been removed from the index which a user has encountered) on this index
Clean Time - The time in seconds spent cleaning this index
Pg Examined - The number of pages which have been examined by a B-tree scanner for this index
Items Del - The number of dirty items removed from this index
Pages/Sec - The average number of pages cleaned by the B-tree scanner per second on this index key. This should only be used for gross performance calculations because a timer with 1 second granularity is used, so the precision is low.
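
To see why the one-second timer makes Pages/Sec imprecise, consider the small calculation below. It assumes, without confirmation from the server source, that the rate is simply pages examined divided by whole seconds of clean time; pages_per_sec and the sample figures are hypothetical.

```python
def pages_per_sec(pages_examined, clean_time_sec):
    """Average pages per second; None when no whole second has elapsed."""
    if clean_time_sec <= 0:
        return None  # a rate cannot be derived from a zero-second reading
    return pages_examined / clean_time_sec


# A clean that really took 1.9 seconds is recorded as 1 whole second.
reported = pages_per_sec(1000, 1)
actual = 1000 / 1.9
print(reported, round(actual))
```

In this made-up case the reported rate is almost double the true one, which is why the figure is only suitable for gross comparisons.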
 
Onstat -C range
Partnum - The partition of the index which needs to be cleaned

    C - the index is in the process of being cleaned.
    N - the index may NOT be cleaned.
Low - The lowest logical page which needs to be scanned
High - The highest logical page which needs to be scanned
Size - The current number of pages which exist in this partition
Saving - The percentage of pages saved by not having to scan the entire index
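
The Saving column can be approximated from the other three values. The formula below is an assumption (it treats Low and High as an inclusive page range and compares it with Size); range_saving_percent is a hypothetical helper, not something the server exposes.

```python
def range_saving_percent(low, high, size):
    """Percentage of pages skipped when only pages low..high are scanned."""
    if size <= 0:
        return 0.0
    # Assumption: the range is inclusive and never exceeds the partition size.
    scanned = min(max(0, high - low + 1), size)
    return 100.0 * (size - scanned) / size


# Scanning pages 100..199 of a 1000-page partition touches 100 pages.
print(range_saving_percent(low=100, high=199, size=1000))
```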



Post to CDI by Mark Ashworth

The sentences "Therefore, each modification is done in a series. The operations make one attempt at a modification. If the index is locked, the operation fails." are not worded very well. Thank you for bringing it to my attention and I will work with the team on improving it.

Let me expand a little on how bts behaves. It does depend a little on the version of IDS:

In 11.50.xC5 and earlier, the BTS index operations execute in a single bts VP and each operation executes sequentially. Multiple readers and writers can interleave their operations.

There is an internal lock to ensure no concurrent (threaded) operations execute on the VP, and multiple VPs were not allowed. There is a slim possibility that the lock would time out and the SQL statement would fail, but I have not had a report of this happening. That was essentially the behaviour for bts.1.00, bts.1.10 and bts.2.00 up to and including 11.50.xC5.

Starting with 11.50.xC6, we enabled bts to work with multiple bts VPs. There is still a restriction that each VP only executes one operation at a time; however, multiple bts VPs may be created with the VPCLASS onconfig variable, or more bts VPs can be added with the onmode -p command. (Also, the noyield flag is no longer required when defining a bts VPCLASS.)

There may be multiple readers and writers of any bts index across several bts VPs. There is a Critical Section in the transaction commit which puts an exclusive (write) lock on the BTS index during this phase. The lock will wait a significantly long time before it times out.

The compact operation still takes an exclusive lock. The purpose of compact is to free up space back to the file system for an index built in an extent space or, if the index is built in an sbspace (a bts.2.00 feature), to free up pages back to the sbspace.

But unlike the first release of BTS, BTS in 11.50 will eventually reuse pages in the index when new rows are inserted after rows have been deleted. This greatly reduces the need to compact.