Indexes and Statistics

This is a high-level overview of some important aspects of database server performance and provides some examples of the value I can provide by automating the optimization of your database server environment.

Proper indexing and proper index maintenance are essential to good database server performance. Database indexes are a mysterious subject to most people, and for good reasons: the subject is extensive and quite technical. To simplify the concept of indexes, the best analogy for indexes is to consider how to find information in a large book. The book represents a database. The index in the back of the book is the key to finding information quickly and efficiently without having to search the entire book.

Database indexes are of two basic types: clustered and non-clustered. All of the data in a database is contained in clustered indexes. They are the book. Non-clustered indexes contain small subsets of the data. They are the index in the back of a book. They allow for efficient searches for specific data.

Index and Data Histogram Deterioration

Non-clustered indexes exist to support efficient execution of SQL queries. A SQL query cannot execute efficiently, especially in a large database, without the existence of correctly implemented, supporting, non-clustered indexes. But there’s more to it than that. For indexes to be efficient, they must be maintained. There are several aspects of index efficiency that are not well known:

  1. Indexes’ efficiency deteriorates with database usage and time. Specifically, data modifications cause index fragmentation. Physical and logical fragmentations introduce inefficiencies that increase data access times and aggravate database server performance problems. If a non-clustered index is too fragmented, SQL Server will ignore the index when trying to locate data, and instead, perform a full scan of the clustered index (the entire book). This occurrence significantly increases database access times and can even cause SQL query timeouts.
  2. Microsoft SQL Server maintains a 200 data point histogram of data distribution statistics for each index. The histograms exist as a subset representation of the actual distribution of data within the indexes to facilitate the swift creation of efficient SQL statement query plans by the SQL Server Query Optimizer. The Query Optimizer generated query plans are used by the Database Engine to execute queries. The query plans are only as efficient as the histogram is accurate. Data modification accesses, specifically data insertions, deletions, and updates, slowly degrade the accuracy of the histograms which may cause the Query Optimizer to generate inefficient query plans producing poor query performance.

Maintaining Optimum Index and Data Histogram Efficiency

Proper maintenance of indexes and histograms to maintain database server performance at optimum levels has several aspects:

  1. Some indexes need to be rebuilt on a regular basis. Some don’t. The usual DBA solution is to periodically rebuild all indexes in all databases. Rebuilding all indexes in a database involves reading the entire database and is highly resource intensive; it can severely impact server operations. I have developed an intelligent, proprietary algorithm that rebuilds only those indexes that require it. Selective index rebuilding greatly reduces stress on a database server's environment.
  2. Indexes have an important setting named “fill factor”. An index’s fill factor can be anywhere from 1% to 100%. SQL Server’s default is 100% (meaning the records in the index are as tightly packed as possible) but this is rarely efficient, primarily because the acts of data insertions and updates can cause database page splits, which cause fragmentation, which increases disk accesses (disks are the slowest resource component on a server), which negatively impacts a server's performance. Index fill factors need to be tailored to the server’s operating environment to allow for data insertions and updates with a minimal number of page splits. Choosing a fill factor that is optimum for each index (there can be thousands of indexes in a single database) is a time consuming, one index at a time, trial and error process for DBA’s hoping to achieve an optimum fill factor. I have developed an intelligent, proprietary algorithm that dynamically determines and implements optimum fill factors for each index and keeps them optimized on a regular basis. Even if the work load on a server increases or decreases, my algorithm automatically compensates for these changing conditions and adjusts index fill factors optimally the next time the indexes are rebuilt.
  3. Index histograms deteriorate with time, causing the SQL Server Optimizer to produce less than optimum query plans, producing poor query performance, and sometimes negatively affecting an entire server environment. There is a way to completely update all index histograms but this process, the one most DBA's use, involves high resource consumption that can negatively affect a server’s performance. I have implemented an alternative method of maintaining and updating index histograms that selectively updates only those index histograms that require it. This method is highly efficient and greatly reduces stress on a database server.

If you need help with performance tuning your SQL Server,  contact me to schedule a free consultation.