Introduction
An increasing number of applications require access to multiple databases located at different sites, perhaps widely separated geographically.
A common business scenario is a retailer with a network of warehouses spread across the country in order to speed delivery of its products. A distributed database system allows applications to access database local and remote transparently.
When data might be distributed? Why not gather all data of the enterprise in one central site? There are answers to these questions:
- Data might be distributed in order to balance the workload.
- Data might be on the site where it was created for maintenance and security purposes.
- A client sends query to the closest database in order to minimized communication costs and increase throughput.
- In order to increase data availability in the event of system crashes.
It is remarkable to say that there are two main approaches to distributed data: replication and distribution of queries. In this article, I will cover on distribution of queries where each system manages a single copy of data, and uses distributed transaction mechanisms to access local and remote data. In this case, the distributed database looks like a collection of individual databases with its own schema. If the database systems are from different vendors, the system is referred to as heterogeneous, otherwise homogeneous.
Implementing distributed databases using SQL Server
Linked servers provide SQL Server the ability to access data from remote data sources. Using these mechanisms, you can issue queries, perform data modifications and execute remote procedures. You can use the T-SQL function OPENROWSET to query a remote data source without a linked server.
Let us now create a linked server to another SQL Server instance using the T-SQL procedure sp_addlinkedserver. The syntax of sp_addlinkedserver is shown in Listing 1:
- sp_addlinkedserver
- [ @server= ] 'server'
- [ ,[ @srvproduct= ] 'product_name'
- [ , [ @provider= ] 'provider_name'
- [ , [ @datasrc= ] 'data_source'
- [ , [ @location= ] 'location' ]
- [ , [ @provstr= ] 'provider_string'
- [ , [ @catalog= ] 'catalog' ]
Listing 1
And each parameter is described in table 1.
Parameter |
Description |
server |
Local name used for the linked server. |
product_name |
Product name of the OLE DB data source. For SQL Server instances, the product_name is 'SQL Server'. |
provider_name |
This is the unique programmatic identifier for the OLE DB provider. When not specified, the provider name is the SQL Server data source. The explicit provider_name for SQL Server is SQLNCLI (for Microsoft SQL Native Client OLE DB Provider). MSDAORA is used for Oracle, OraOLEDB.Oracle for Oracle versions 8 and higher, Microsoft.Jet.OLEDB.4.0 for MS Access and MS Excel, DB2OLEDB for IBM DB2, and MSDASQL for an ODBC data source. |
data_source |
This is the data source as interpreted by the OLE DB provider. |
location |
The location interpreted by the OLE DB provider. |
provider_string |
The connection string specific to the OLE DB provider. |
catalog |
This varies from the database systems. |
Table 1.
You can configure the linked server properties such as collection compatible, collation name, connect timeout and so on (for more information refer to the SQL Server online help) using the stored procedure sp_serveroption.
When executing a distributed query against a linked server, SQL Server maps your local login and credentials to the linked server. Based on the security on the remote data source, your credentials are accepted or rejected. By default, when a linked server is created, your local credentials are used to access data on the remote site. If you do not have the proper permission on the remote site, then your connection attempt is rejected. You can set up the mapping from your local login to any existing remote login using the stored procedure sp_addlinkedsrvlogin.
The syntax of this procedure is shown in Listing 2.
- sp_addlinkedsrvlogin [ @rmtsrvname = ] 'rmtsrvname'
- [ , [ @useself = ] 'useself' ]
- [ , [ @locallogin = ] 'locallogin' ]
- [ , [ @rmtuser = ] 'rmtuser' ]
- [ , [ @rmtpassword = ] 'rmtpassword' ]
Listing 2.
The parameters of this procedure are described in Table 2.
Parameter |
Description |
Rmtsrvname |
The linked server name. |
Useself |
When the value true is used, the local SQL or Windows login is used to connect to the remote server name. If false, the locallogin, rmtuser, and rmtpassword parameters of the sp_addlinkedsrvlogin stored procedure will apply to the new mapping. |
locallogin |
The name of login to be mapped. If this parameter is null, then the mapping applies to all the logins. |
rmtuser |
The name of the remote user. |
rmtpassword |
The password of the remote user. |
Table 2.
Let us create and configure a linked server from one instance of SQL Server 2005 to one instance of SQL Server 2000 in order to execute queries on the sample database Northwind. The local and remote login mapping is done using Integrated Security associated with the current logged user. The code in TSQL is illustrated in Listing 2.
- USE [master]
- GO
- EXEC master.dbo.sp_addlinkedserver @server = N'REMOTE_NORTHWIND', @provider='SQLNCLI', @srvproduct=N'', @datasrc='remote_site_dns_or_ip', @provstr='Integrated Security=SSPI;'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'collation compatible', @optvalue=N'false'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'data access', @optvalue=N'true'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'rpc', @optvalue=N'false'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'rpc out', @optvalue=N'false'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'connect timeout', @optvalue=N'0'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'collation name', @optvalue=null
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'query timeout', @optvalue=N'0'
- GO
- EXEC master.dbo.sp_serveroption @server=N'REMOTE_NORTHWIND', @optname=N'use remote collation', @optvalue=N'true'
- GO
- USE [master]
- GO
- EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname = N'REMOTE_NORTHWIND', @locallogin = NULL , @useself = N'true'
- GO
Listing 3.
Now let us execute some queries on the remote instance, specifically on the database Northwind. You can issue distributed queries referring to linked server name by using the following syntax as shown in Listing 3.
[linked_server_name].[catalog].[schema].[object_name]
Listing 3.
Let us query the remote table Categories in the Northwind database using the connection provided by the linked server as shown in Listing 4.
- select * from REMOTE_NORTHWIND.Northwind.dbo.Categories
Listing 4.
As you can see you have long name to refer to remote database object. SQL Server 2005 provides a mechanism named synonym to refer to long name. Let us create a synonym to the remote table as shown in Listing 5.
- create synonym dbo.remote_northwind_categories for REMOTE_NORTHWIND.Northwind.dbo.Categories;
Listing 5.
Now the query in the Listing 4 is transformed into the query shown in Listing 6.
- select * from dbo.remote_northwind_categories
Listing 6.
Conclusion
In this article, I covered the main approach of distributed database and how it is realized using SQL Server systems.