Problem Statement
Is it possible to instantaneously sync data from the SQL server to Synapse in case of any data changes (Insert / Update /Delete) within a table?
Prerequisites
- SQL Server ( On-Prem SQL Server / IaaS SQL Server / SQL MI)
- Any SQL server wherein we can create a Linked server (So Azure SQL Servers are excluded)
- Synapse / Azure SQL Datawarehouse (Dedicated Pool)
- Azure Blob Storage
Solution
Create a Linked server in the SQL server database instance with Synapse / SQL Dedicated pool.
USE [master]
GO
EXEC master.dbo.sp_addlinkedserver @server = N'EventLS',
@srvproduct = N'',
@provider = N'SQLNCLI',
@datasrc = N'<>.database.windows.net',
@catalog = N'<>'
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname = N'EventLS',
@useself = N'False',
@locallogin = NULL,
@rmtuser = N'<>',
@rmtpassword = '<>'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'collation compatible',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'data access',
@optvalue = N'true'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'dist',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'pub',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'rpc',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'rpc out',
@optvalue = N'true'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'sub',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'connect timeout',
@optvalue = N'0'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'collation name',
@optvalue = null
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'lazy schema validation',
@optvalue = N'false'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'query timeout',
@optvalue = N'0'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'use remote collation',
@optvalue = N'true'
GO
EXEC master.dbo.sp_serveroption @server = N'EventLS',
@optname = N'remote proc transaction promotion',
@optvalue = N'false'
GO
Establish an External data source within Synapse to Azure Blob Storage and create an External file format.
The SQL for the above commands is available at this GitHub location.
The purpose behind #2 is for us to leverage this in the CETAS creation, which in turn would generate a file in the Azure blob storage, which we can leverage for Blob trigger purposes. So the next action is to create a Stored procedure in Synapse which can be triggered from the SQL server via the Linked server created in #1.
Create a Copy activity Pipeline with SQL server as Source and Dedicated SQL Pool / Synapse as Sink with Blob Event type trigger.
Note. The Blob path begins is equivalent to the Location specified in the CETAS script in #3.
The next aspect would be to set up Trigger in the SQL server for the table that needs Event sync set up and execute the Stored procedure present in Synapse created in #3.
So the setup is established, which would flow in the below order whenever any Data changes (Insert / Update / Delete) happen within the SQL server table.
- SQL Trigger activated
- Synapse Stored Procedure Executed
- External Table Creation (CETAS)
- File Creation in Blob
- ADF/Synapse Event Trigger activated
- ADF/Synapse Copy activity Pipeline triggered
A couple of Linked Server configurations aspects.
Within a trigger, you are inside a transaction defined by the statement that fired the trigger. By default, SQL Server attempts to promote local transactions to be a distributed transaction. As we can tell from the error message, this is not possible in this case (for Synapse), so we need to turn this Property off.
Execution Output
Scenario 1. Insert Data into SQL Server table
3 records were inserted into the table, and 1 file was generated in blob via CETAS.
Blob
ADF Log
Scenario 2. Update Data within the SQL Server table.
Scenario 3. Delete Data from the SQL Server table