Introduction
In real-world enterprise applications, we often perform long-running batch operations for a huge number of records. Typically, we load the data upfront from the database one time and load it into memory via data table or in some other form (main table and other supported tables) and then loop through individual records in Parallel For Each for better performance. However, in some cases, if the data load is huge, we may end up having memory issues loading all the records up front so we load each individual record inside the loop in SQL which could potentially affect the performance of the job.
In order to handle this scenario, we have to address avoiding memory exceptions and also not reading the data for each record in SQL. The SQL paging concept comes in handy to address this issue by fetching rows in a slice with some limits (eg: 20K rows at a time) and performing the parallel operations in the loop.
In SQL Server 2012, Microsoft introduced OFFSET and FETCH keyword to apply the paging in SQL query results. We will loop for every 20k records and perform the parallel operations instead of individual records.
Example
- SELECT First Name + ' ' + Last Name FROM Employees
- ORDER BY First Name OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY;
This will skip the first 10 rows and return the next 5 rows.
Limitations in Using OFFSET-FETCH
- ORDER BY is mandatory to use OFFSET and FETCH clause.
- OFFSET clause is mandatory with FETCH. You can never use, ORDER BY … FETCH.
- TOP cannot be combined with OFFSET and FETCH in the same query expression.
- The OFFSET/FETCH rowcount expression can be any arithmetic, constant, or parameter expression that will return an integer value. The rowcount expression does not support scalar sub-queries.
In a real-world application, I will be using it like below.
- int startIndex = 0
- int offset = 20000
- while (true)
- {
-
-
-
- if(no rows) break;
- System.Threading.Tasks.ParallelOptions po = new System.Threading.Tasks.ParallelOptions();
- po.MaxDegreeOfParallelism = MAX_THREAD_LIMIT
- System.Threading.Tasks.Parallel.ForEach(queryData, po, row =>
- {
-
- }
- startIndex = startIndex + offset + 1;
- }