## Understanding Statistics and its affect on Query Performance

Query Processing and its performance is the heart of any RDBMS. SAP ASE uses **cost based optimizer** for query processing and optimization means it creates the different access path (query plan) for a query and it picks up best plan which has least cost for a given query. This query plan cost is calculated from the data distribution and statistics information of a table and its columns.

There are two types of statistics –

**Object Level Stats :** The object level statistics describe a table and its indexes and include values such as number of rows and pages in the table and/or index (es), the number of empty pages, and the cluster ratios among others. Some of the object level statistics are updated automatically by ASE, others when ‘update statistics’ is run.

**Column Level Stats :** The column level statistics describe the distribution of values in the column; they consist of the column’s histogram and density values and are updated when an index is created or an ‘update statistics’ command is run.

**System Tables which store these stats:**

The systabstats and sysstatistics tables store statistics for all tables, indexes, and any un-indexed columns for which you have explicitly created statistics.

**systabstats** stores information about the table or index as an object, and is updated by query processing, data definition language, and update statistics commands. Simply it stores object level of statistics.These statistics include:

- Number of data pages for a table, or the number of leaf level pages for an index.
- Number of rows in the table
- Height of the index
- Average length of data rows and leaf rows
- Number of forwarded and deleted rows
- Number of empty pages
- Statistics to increase the accuracy of I/O cost estimates, including cluster ratios, the number of pages that share an extent with an allocation page, and the number of OAM and allocation pages used for the object
- Stopping points for the reorg command so that it can resume processing
- systabstats contains one row for each clustered index, one row for each nonclustered index, one row for each table without a clustered index, and one row for each partition.
- The storage for clustered index information depends on the locking scheme for the table:
- For data-only-locked tables, systabstats stores an additional row for a clustered index.
- For allpages-locked tables, the data pages are treated as the leaf level of the index, so the systabstats entry for a clustered index is stored in the same row as the table data.The indid column for clustered indexes on allpages-locked tables is always 1.

**sysstatistics** table stores one or more rows for each indexed column on a user table; it also stores statistics for unindexed columns.

- The first row for each column stores basic statistics about the column, such as the density for joins and search arguments, the selectivity for some operators, and the number of steps stored in the histogram for the column.
- If the index has multiple columns, or if you specify multiple columns when you generate statistics for unindexed columns, there is a row for each prefix subset of columns.
- Additional rows store histogram data for the leading column. Histograms do not exist if indexes were created before any data was inserted into a table. To generate a histogram, run update statistics after inserting data.

Stay Tuned ! We will talk about more in coming posting!