A database index is a data structure that improves the speed of data retrieval operations from a database table and it works based on a key value pair technique – the core binary search and linear search algorithms are built on the indexing technique. In other words, we can say – an index is a copy of selected columns of data from an original table that can be searched very efficiently, which establishes a direct link to the complete row of data from where it was copied. Again, there are various types of indexing inthe database like cluster and non cluster, etc. I have not been going into that since the purpose of this article is different. In a short note – indexing is a technique using which we can get the fastest query result from the database table. Indexing costs us additional writes and storage space in the file system.
To make it more realistic, we can give the example of the indexing in the Book. Here, the main content inside the book is the main data table and indexed pages which we generally see in the beginning of the each book where it mentions which content or chapter of the book is located in which page, this is just equivalent to hash tables in the database. Using the indexed page number – we can very quickly navigate to a specific chapter or page. Exactly indexing in database table also works in a similar way.
Let’s write the below SQL query – this displays all purchase order details from "Purchase Order" table where "OrderName" is 'PO001_Laptop'. If there are 10,000 items (rows) in the SQL table, this query will traverse thru the 10,000 rows one by one to check the below condition, even though the “PO001_Laptop” is not available in many of the rows but still this query will scan through all the rows.
Then, finally it displays the result which is a time consuming process and hits the query performance. In this technique we cannot skip even one row, the query has to scan through right from the first row to last row. Imagine if we have millions of records in the database table how long would it take to complete this query?
- SELECT * FROM PurchaseOrder
- WHERE OrderName = 'PO001_Laptop'
For the same query – if we create index on “OrderName” column – as per the above diagram this will create an entry in indexing table (hash table) and pointer will map to the original data row, whenever any query comes to SQL, first this will check the indexing table (hash table) and directly will fetch the matching items from the original data source table and will skip the checking for all unwanted rows where items are not matched. This is how indexing improves the performance of SQL queries.
How does indexing column work in SharePoint?
The list items are stored in the "AllUserData" table in the SQL. For every defined indexed column, SharePoint stores the index value for every list item in a separate table, i.e. "NameValuePair" table which we have seen in the above. Let’s say we have 20,0000 items in the list, which means that we have 20,000 rows in “AllUserData” and 20,000 additional rows in the "NameValuePair" table (used for indexing).
So the SharePoint list items view requests are served based on the key value in the "NameValuePair" table and pick the appropriate mapping items from the main "AllUserData" table which is the actual data source. This is how the SharePoint indexing column speeds up the query performance.
Is column Indexing recommended in SQL tables for the sake of better performance?
So far we have learned what is indexing and how indexed column improves the performance of the query. Now we need to see the the other side of the coin as well – I mean is column indexing recommended?
There is no straight answer for this – it all depends on the needs and the way you operate your database table. As we have learned, the column indexing costs us additional writes and storage space, so if our application requires more insert/update operation, we might want to use tables without indexes, but if it requires more data retrieval operations, we should go for indexing table.
Why is SharePoint limited to have 20 indexed columns in a list?
There might be other reasons, but this is my analysis. As we have seen the drawback of indexed column is that the indexed column indices require additional space on the disk as the indices are stored together in a table using the MyISAM engine, this file may rapidly reach to the threshold limits of the given file system, which means it will exhaust all disk space when many columns from the same list are indexed. So we should try to minimize the indexed column number as much as possible.
What are the SharePoint columns types that are supported or unsupported for indexing?
SUPPORTED COLUMN TYPES
- Title (but not in a document library)
- Single line of text
- Choice field (but not multi choice)
- Number
- Currency
- Date/ Time
- Lookup (but not multi value)
- Person or Group (but not multi value)
UNSUPPORTED COLUMN TYPES
- Multiple lines of text
- Hyperlink/Picture
- Custom Field Types
- Calculated Field
- Boolean (yes/no)
- UI version
- Checked out to
- Content type ID
Summary
Therefore, in this article we have learned what exactly the column indexing is in the database and how it improves the performance of SharePoint list view queries and we have covered the below topics,
- How does the SharePoint store the list items?
- What is indexing in database table?
- Usually without indexing how does the data get fetched from a SQL table?
- How does indexing column work in SharePoint?
- Column Indexing is recommended in SQL table for the sake of better performance?
- Why SharePoint is limited to have 20 indexed column in a list?
- What are the SharePoint columns types are supported or unsupported for indexing?
References