Comparison Of Different SQL Query Responses Of SQL Server On Windows And Ubuntu Linux

Mukesh Kumar
1y
28.4k
0
3

Article

Introduction

This article is inspired by “A Study for Performance Comparison of Different In-Memory Databases” where the author compares the query response time of four in-memory databases TimesTen, Altibase, solidDB, and SQLite on Linux and Windows environments. This article uses SQL Server 2017 on Windows Server 2012 R2 and Ubuntu Linux environment. Here, I have covered the queries with different complexity levels and access data amounts to be executed. In this article, the steps are also mentioned with details on how to set up an environment for testing. I assume that the reader already knows a bit about SQL Server and Management Studio on the Windows platform, which is why more focus is on the Ubuntu Linux environment.

Test System Detail

The purpose of this study is to compare the performance of SQL Server in terms of CPU time, elapsed time, and total execution time. To conduct this study, two similar machines in terms of hardware were set up. All tests were performed using the same set of queries on the same machines. The queries were using the same row or id, tables, and columns on both machines. The two machines are identical except for the operating system. One has Windows Server 2012 R2 64-bit and the other has Ubuntu Linux 16.04- Server 64-bit. Both machines have two CPUs with Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz processor. The RAM on both machines is 8GB. The hard disk used on both machines is 128GB SSD. The products used in testing are SQL Server 2017 Express Edition and SQL Operation Studio (preview) 0.24.1. SQL Operation Studio is a free lightweight tool that can be used for development or operations on Windows, Linux, or macOS environments.

The database used in tests is ContosoBIdemo which has a large sample data with up to 12,627,608 rows.

The size of rows in each table of the database is like below.

Database

Tests, results, and observations

We already have SQL Server and SQL Operation Studio installed on machines, let's start SQL Operation Studio and run our test queries.

In Windows, we can start SQL Operation Studio from the application list and click it. The setting for the connection can be like below.

SQL Operation

In Ubuntu Linux, the settings for the connection can look like below.

sqlops

Ubuntu Linux

Query 01. Restore the database using TSQL. The size of the backup .bak file is 645MB and the restored database is 1.66GB.

Windows

SET STATISTICS TIME On;

ALTER DATABASE ContosoRetailDW
SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE ContosoRetailDW
FROM DISK = 'C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS\MSSQL\Backup\ContosoRetailDW.bak'
WITH REPLACE,
MOVE 'ContosoRetailDW2.0' TO 'C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS\MSSQL\DATA\ContosoRetailDW.mdf',
MOVE 'ContosoRetailDW2.0_log' TO 'C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS\MSSQL\DATA\ContosoRetailDW.ldf';

ALTER DATABASE ContosoRetailDW
SET MULTI_USER;

SET STATISTICS TIME OFF;

Linux

SET STATISTICS TIME On;

ALTER DATABASE ContosoRetailDW
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE ContosoRetailDW
FROM DISK = '/var/opt/mssql/ContosoRetailDW.bak'
WITH REPLACE,
MOVE 'ContosoRetailDW2.0' TO '/var/opt/mssql/data/ContosoRetailDW.mdf',
MOVE 'ContosoRetailDW2.0_log' TO '/var/opt/mssql/data/ContosoRetailDW.ldf';

ALTER DATABASE ContosoRetailDW
SET MULTI_USER;

SET STATISTICS TIME OFF;

Query 02. Retrieve a record based on the primary key.

SET STATISTICS TIME On;

SELECT TOP 1000 *
FROM FactStrategyPlan
WHERE StrategyPlanKey = 2181378
ORDER BY StrategyPlanKey DESC;

SET STATISTICS TIME Off;

Primary key

Query 03. Retrieve a record based on a foreign key.

SET STATISTICS TIME On;

SELECT COUNT(*)
FROM FactStrategyPlan
WHERE ProductCategoryKey = 5;

SET STATISTICS TIME Off;

Foreign key

Query 04. Retrieve a record when there is no key for any column.

SET STATISTICS TIME On;

SELECT COUNT(*)
FROM FactStrategyPlan
WHERE Amount > 3000 AND Amount < 4500;

SET STATISTICS TIME Off;

No key

Query 05. Retrieve the data when joining tables. Three tables are joined, FactSales has 3,406,089 rows, DimDate has 2556 rows and DimChannel has 4 rows.

SET STATISTICS TIME On;

SELECT dd.CalendarMonth, dc.ChannelName, SUM(fs.SalesAmount) AS TotalSaleAmount
FROM FactSales fs
JOIN DimDate dd ON dd.DateKey = fs.DateKey
JOIN DimChannel dc ON dc.ChannelKey = fs.ChannelKey
WHERE dd.CalendarYear = 2007
GROUP BY dd.CalendarYear, dd.CalendarMonth, dc.ChannelName
ORDER BY dc.ChannelName, dd.CalendarYear, dd.CalendarMonth;

SET STATISTICS TIME Off;

DimChannel

Query 06. Retrieve data when joining a big table. The three tables are; FactOnlineSales with 12,627,608 rows, DimDate has 2556 rows and DimStore has 306 rows.

SET STATISTICS TIME On;

SELECT
    dd.CalendarMonth,
    ds.StoreName,
    SUM(fos.SalesAmount) AS TotalSaleAmount
FROM
    FactOnlineSales fos
    JOIN DimDate dd ON dd.DateKey = fos.DateKey
    JOIN DimStore ds ON ds.StoreKey = fos.StoreKey
WHERE
    dd.CalendarYear = 2007
GROUP BY
    dd.CalendarYear,
    dd.CalendarMonth,
    ds.StoreName
ORDER BY
    ds.StoreName,
    dd.CalendarYear,
    dd.CalendarMonth;

SET STATISTICS TIME Off;

DimStore

Query 07. Update a table "FactOnlineSales" with rows 12,627,608. We are altering tables with a new column with the default value of "newsequentialid()". It is a GUID but they are created in sequence so reading/fetching a record is faster.

SET STATISTICS TIME On;

ALTER TABLE FactOnlineSales
    ADD SeqGuid UNIQUEIDENTIFIER CONSTRAINT FOS_FactOnlineSales_SeqGuid DEFAULT (newsequentialid()) with values;

SET STATISTICS TIME Off;

FactOnlineSales

Query 08. Update a table "FactOnlineSales" with rows 12,627,608. We are altering tables with a new column with the default value of "newid()". It is a GUID value.

SET STATISTICS TIME On;

ALTER TABLE FactOnlineSales
    ADD NewGuid UNIQUEIDENTIFIER CONSTRAINT FOS_FactOnlineSales_NewGuid DEFAULT (newid()) with values;

SET STATISTICS TIME Off;

GUID value

Query 09. We are trying to retrieve one row from 12,627,608 rows using sequential GUID.

-- Record id = 27063240
-- Windows
SET STATISTICS TIME On;

SELECT * FROM FactOnlineSales
WHERE seqguid = 'a53bb53e-f36b-1410-bba5-800000000000';

SET STATISTICS TIME Off;

-- Linux

SET STATISTICS TIME On;

SELECT * FROM FactOnlineSales
WHERE seqguid = '7541a8cb-17ec-e711-80c3-000d3a16ac77';

SET STATISTICS TIME Off;

Sequential GUID

Query 10. We are trying to retrieve one row from 12,627,608 rows using non-sequential GUID.

-- Record id = 27063240

-- Windows
SET STATISTICS TIME On;

SELECT * FROM FactOnlineSales
WHERE newguid = 'f22ff329-d2f5-4637-a5dd-847d8b77786b';

SET STATISTICS TIME Off;

-- Linux
SET STATISTICS TIME On;

SELECT * FROM FactOnlineSales
WHERE newguid = '2fea845d-3467-438b-8fae-49b8143c810e';

SET STATISTICS TIME Off;

Non-sequential GUID

Conclusion

From the result figures, it looks like SQL Server on Linux works as well as on Windows when reading data using Select while the Windows environment has better performance for SQL Server when it comes to writing; for example, restoring the database or updating the table.

To understand the result, it is important to understand some basics of how SQL Server actually works. The SQL Server is an in-memory database that is designed to store data in the memory to provide significant improvement in performance by fast query response time. The older approach was disk I/O operations which is very slow if compared to in-memory.

In our results, we have 3 parameters that were taken under consideration for testing. One is CPU time; it is the actual time that the CPU on the machine takes to finish the job, and it is our major point of interest. Elapsed time is CPU time plus wait time; the wait time includes time for reading from disk, locks, waiting for locks or latches, and the wait time to fetch the row. The total execution time is the time that is required for parsing, compiling, and executing each statement.

SQL Server 2017 comes with technologies like DirectIO and NUMA (Non-Uniform Memory Access). DirectIO uses Direct_IO parameter. For Linux environments, Direct IO is used while for Windows environments, SQL Server uses Informix I/O subsystem.

Direct I/O provides direct access to read and write for the disk by bypassing the file system buffers. The direct I/O can be configured using Direct I/O API and configuring the DIRECT_IO parameter. When Direct I/O is used, the performance for cooked files can match the performance of raw devices used for dbspace chunks. On Windows, the Informix I/O subsystem is implemented, which is used as overlapped or asynchronous operations without OS buffering. So, DIRECT I/O is not required for the Windows environment.

NUMA (non-uniform memory access) helps memory at different points of a processor to have different performance. Microsoft Windows has developed a subsystem of NUMA that enables the system to place memory structure efficiently. Memory-access speed is required in enterprise-class applications rather than high computing systems, so less effort is needed on NUMA memory handling in Windows when compared with Linux.

From the above facts from reliable resources, it looks like NUMA works a bit better on Windows or provides the same level of performance when reading/retrieving operations are executed, while Informix I/O works better than DIRECT I/O.

These tests can be extended with more complicated TSQL to explore more performance-related facts.