Modern analytics platforms are built on clean, well-structured data. ETL from OLTP databases into a data warehouse is the key process that enables reporting, dashboards, and machine learning. In this book, we show how to build an effective, maintainable ETL pipeline using C# 14 and .NET 9.
Why Use C# for ETL?
C# has a lot to offer:
- Strong typing for safety and readability
- High-performance capabilities for large data transformations
- First-class support for SQL Server, PostgreSQL, MySQL, and Oracle
- Native async and parallel programming support
- Simple integration with APIs, cloud providers, and new data formats (Parquet, Avro, JSON)
With the language features available in C# 14, such as primary constructors, collection expressions, and expression-bodied properties, ETL code is easier to read and write. Combined with the performance improvements in .NET 9, C# is production-ready for enterprise data engineering.
Scenario: Extract Orders from SQL Server and Load into Azure Synapse
Step 1. Primary Constructors for Data Models
Primary constructors keep your models concise and readable, which is perfect for mapping flat shapes from database rows.
public record Order(
    string Id,
    DateTime OrderDate,
    decimal Total,
    string CustomerId
);
This makes it easy to define data models without boilerplate.
Step 2. Extract Data from OLTP (SQL Server)
Dapper gives you high-performance, flexible data access:
var sql = @"
SELECT Id, OrderDate, Total, CustomerId
FROM Orders
WHERE OrderDate >= @From";
using var connection = new SqlConnection(connectionString);
var orders = await connection.QueryAsync<Order>(
sql,
new { From = DateTime.UtcNow.AddDays(-1) }
);
Parameterize the SQL query to load data incrementally (last 24 hours, since the last load time, etc.), and batch by key ranges or date partitions for large loads.
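For incremental loads, the @From value can come from a metadata table instead of a fixed window. A minimal sketch, assuming the etl_metadata table and last_load column used later in Step 5; the extraction then becomes:
// Read the high-water mark written by the previous run (etl_metadata / last_load are
// assumptions here, matching the metadata table shown in Step 5).
var lastLoad = await connection.ExecuteScalarAsync<DateTime?>(
    "SELECT last_load FROM etl_metadata")
    ?? DateTime.UtcNow.AddDays(-1);   // fall back to the last 24 hours on the first run

var orders = await connection.QueryAsync<Order>(sql, new { From = lastLoad });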
Step 3. Transform for Data Warehouse Shape
Apply whatever logic your DW schema requires (denormalization, rounding, reformatting):
var orderFacts = orders.Select(o => new
{
    o.Id,
    OrderDate = o.OrderDate.ToString("yyyy-MM-dd"),
    TotalAmount = Math.Round(o.Total, 2),
    CustomerKey = o.CustomerId
});
Use collection expressions where static lookup values are needed:
string[] currencies = ["USD", "EUR", "GBP"];
Enrich the data as needed, either by joining customer or product dimensions in memory or via API calls.
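For example, an in-memory join against a customer dimension might look like this sketch. The dw.DimCustomer table, the CustomerKey column, and the open synapseConnection are assumptions for illustration:
// Load the customer dimension once and index it by the OLTP customer id (assumed schema).
var customerKeys = (await synapseConnection.QueryAsync<(string CustomerId, int CustomerKey)>(
        "SELECT CustomerId, CustomerKey FROM dw.DimCustomer"))
    .ToDictionary(c => c.CustomerId, c => c.CustomerKey);

var orderFactsWithKeys = orders.Select(o => new
{
    o.Id,
    OrderDate = o.OrderDate.ToString("yyyy-MM-dd"),
    TotalAmount = Math.Round(o.Total, 2),
    // Unknown customers map to a surrogate "unknown member" key (-1 here by convention).
    CustomerKey = customerKeys.TryGetValue(o.CustomerId, out var key) ? key : -1
});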
Step 4. Bulk Load into the Warehouse (Azure Synapse or SQL DW)
Use SqlBulkCopy or the Synapse COPY statement for high-speed ingest:
using (var bulk = new SqlBulkCopy(synapseConnection))   // synapseConnection must already be open
{
    bulk.DestinationTableName = "[dw].[OrderFacts]";
    bulk.BatchSize = 5000;
    bulk.BulkCopyTimeout = 300;
    bulk.WriteToServer(orderFacts.ToDataTable());
}
ToDataTable() can be implemented with reflection, or with the FastMember library for performance-optimized property mapping.
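A minimal reflection-based sketch of such a helper (this extension method is an assumption, not a library API; FastMember's ObjectReader can replace it for large volumes):
public static class DataTableExtensions
{
    // Builds a DataTable whose columns mirror the public properties of T (reflection-based sketch).
    public static DataTable ToDataTable<T>(this IEnumerable<T> items)
    {
        var props = typeof(T).GetProperties();
        var table = new DataTable();

        foreach (var p in props)
            table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);

        foreach (var item in items)
            table.Rows.Add(props.Select(p => p.GetValue(item) ?? (object)DBNull.Value).ToArray());

        return table;
    }
}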
For cloud warehouses like Snowflake or BigQuery, stage and load through their respective .NET SDKs or REST APIs.
Step 5. Automate and Schedule
Stability and reproducibility are what make an ETL job production-grade.
- Use a .NET Worker Service to host the job and run it continuously in the background (see the sketch after this list).
- Use Quartz.NET or Windows Task Scheduler for scheduled runs.
- Log execution metadata (start/end date, rows processed, errors) using Serilog or NLog.
- Write progress status (e.g., last extracted date) to a metadata table.
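A minimal Worker Service sketch, assuming the OrderEtlProcessor from the full class example below is registered with dependency injection and the job should run once a day:
// Requires the Microsoft.Extensions.Hosting worker template; runs the ETL on a daily timer.
public class EtlWorker(OrderEtlProcessor processor, ILogger<EtlWorker> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromHours(24));
        do
        {
            try
            {
                // Extract everything since the previous day; swap in metadata-driven logic as needed.
                await processor.RunAsync(DateTime.UtcNow.AddDays(-1));
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Scheduled ETL run failed.");
            }
        }
        while (await timer.WaitForNextTickAsync(stoppingToken));
    }
}
Register it in Program.cs with builder.Services.AddHostedService<EtlWorker>(); Quartz.NET is the better fit when you need cron-style schedules.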
Example metadata tracking:
await connection.ExecuteAsync(
    "UPDATE etl_metadata SET last_load = @Time",
    new { Time = DateTime.UtcNow }
);
You can also add alerting via email or webhook if the job fails.
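For example, a webhook notification from the failure path might look like this sketch (the URL and payload shape are assumptions; adapt them to your alerting tool):
// 'ex' is the exception caught by the surrounding catch block.
// PostAsJsonAsync comes from System.Net.Http.Json.
using var http = new HttpClient();
await http.PostAsJsonAsync(
    "https://example.com/etl-alerts",                     // hypothetical webhook URL
    new { text = $"Order ETL failed: {ex.Message}" });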
Optional Optimizations
Load as Parquet to a Data Lake
Use a Parquet library such as Parquet.Net or Apache Arrow; the exact write API varies by library:
// illustrative call; substitute your chosen library's writer API
ParquetWriter.Write("output/orders.parquet", orderFacts);
Then upload the file to the data lake:
var client = new DataLakeServiceClient("connection-string");
var fsClient = client.GetFileSystemClient("mydata");

var fileClient = fsClient
    .GetDirectoryClient("raw/orders")
    .GetFileClient("orders.parquet");

await fileClient.UploadAsync("output/orders.parquet", overwrite: true);
Use EF Core and Dapper Together
Use EF Core for dimension lookups and Dapper for bulk data reads:
var customers = await dbContext.Customers
    .ToDictionaryAsync(c => c.Id);
Best Practices for Enterprise Pipelines
- Transactional staging: Load into a staging table first, then move the data to final tables only after validation.
- Validation checks: Row count equality, null checks, and schema validation.
- Retry logic: Retry transient failures from the source or DW with Polly (see the sketch after this list).
- Data lineage logging: Log source, transform, and load stages.
- Parallelism: Parallel extract and parallel transform with Parallel.ForEachAsync().
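For the retry item, a minimal Polly sketch, assuming transient source failures surface as SqlException or TimeoutException (ExtractOrdersAsync is the extract method from the full class example below):
// Requires the Polly NuGet package.
// Retry up to three times with exponential back-off before letting the failure bubble up.
var retryPolicy = Policy
    .Handle<SqlException>()
    .Or<TimeoutException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

var orders = await retryPolicy.ExecuteAsync(() => ExtractOrdersAsync(since));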
Conclusion
With C# 14's new language features, strong libraries like Dapper, and first-class cloud support in .NET, you can build scalable, maintainable, and resilient ETL pipelines. Whatever your destination, whether SQL Server, Azure Synapse, or a data lake, C# provides the speed, safety, and accountability that serious data processing demands.
Full Class Example
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;
using Dapper;
using Microsoft.Data.SqlClient;   // supported SQL Server client package for modern .NET
using Microsoft.Extensions.Logging;
public class OrderEtlProcessor
{
    private readonly string _sourceConnectionString;
    private readonly string _targetConnectionString;
    private readonly ILogger<OrderEtlProcessor> _logger;

    public OrderEtlProcessor(string sourceConn, string targetConn, ILogger<OrderEtlProcessor> logger)
    {
        _sourceConnectionString = sourceConn;
        _targetConnectionString = targetConn;
        _logger = logger;
    }

    public async Task RunAsync(DateTime since)
    {
        try
        {
            _logger.LogInformation("Starting ETL for orders since {Time}", since);

            // Extract -> Transform -> Load
            var orders = await ExtractOrdersAsync(since);
            var transformed = TransformOrders(orders);
            await LoadToWarehouseAsync(transformed);

            _logger.LogInformation("ETL complete. Loaded {Count} orders.", transformed.Count);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "ETL process failed.");
            throw;
        }
    }

    // Extract: pull new orders from the OLTP source with Dapper.
    private async Task<List<Order>> ExtractOrdersAsync(DateTime since)
    {
        const string query = "SELECT Id, OrderDate, Total, CustomerId FROM Orders WHERE OrderDate >= @Since";

        using var conn = new SqlConnection(_sourceConnectionString);
        var orders = await conn.QueryAsync<Order>(query, new { Since = since });
        return orders.ToList();
    }

    // Transform: reshape rows into the warehouse fact schema.
    private List<OrderFact> TransformOrders(IEnumerable<Order> orders)
    {
        return orders.Select(o => new OrderFact
        {
            OrderId = o.Id,
            OrderDate = o.OrderDate.ToString("yyyy-MM-dd"),
            TotalAmount = Math.Round(o.Total, 2),
            CustomerKey = o.CustomerId
        }).ToList();
    }

    // Load: bulk copy the facts into the warehouse table.
    private async Task LoadToWarehouseAsync(List<OrderFact> facts)
    {
        using var conn = new SqlConnection(_targetConnectionString);
        using var bulk = new SqlBulkCopy(conn)
        {
            DestinationTableName = "[dw].[OrderFacts]",
            BatchSize = 5000,
            BulkCopyTimeout = 300
        };

        var table = new DataTable();
        table.Columns.Add("OrderId", typeof(string));
        table.Columns.Add("OrderDate", typeof(string));
        table.Columns.Add("TotalAmount", typeof(decimal));
        table.Columns.Add("CustomerKey", typeof(string));

        foreach (var fact in facts)
        {
            table.Rows.Add(fact.OrderId, fact.OrderDate, fact.TotalAmount, fact.CustomerKey);
        }

        await conn.OpenAsync();
        await bulk.WriteToServerAsync(table);
    }
}
public record Order(string Id, DateTime OrderDate, decimal Total, string CustomerId);

public class OrderFact
{
    public string OrderId { get; set; }
    public string OrderDate { get; set; }
    public decimal TotalAmount { get; set; }
    public string CustomerKey { get; set; }
}