Introduction
A simple rule of thumb says that for optimal performance we need to make as few database round trips as possible. This is especially relevant for insert and update scenarios, where we sometimes need to work with thousands of objects. Sending those objects to the database one by one is usually significantly slower than sending them in a single batch. With tools like Entity LINQ we can easily write efficient data modification queries.
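To see why this matters, here is a minimal sketch in plain Entity Framework Core contrasting the two approaches (MyDbContext and its Promotions set are assumed purely for illustration):

// Illustrative sketch only: one round trip per entity vs. a batched SaveChanges.
void InsertOneByOne(MyDbContext db, IEnumerable<Promotions> promos)
{
    foreach (var promo in promos)
    {
        db.Promotions.Add(promo);
        db.SaveChanges();           // a separate INSERT (and round trip) per entity
    }
}

void InsertBatched(MyDbContext db, IEnumerable<Promotions> promos)
{
    db.Promotions.AddRange(promos); // stage everything in the change tracker first
    db.SaveChanges();               // EF Core sends the INSERTs in far fewer batches
}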
Let's analyze the possible cases:
- INSERT
- UPDATE (and DELETE)
- Upsert (UPDATE or INSERT)
INSERT case optimizations
Insert multiple rows in a single statement. Assuming we have a collection of promotions, the following snippet inserts them in a single query:
public void InsertBatch(IEnumerable<Promotions> promos)
{
    var query = DbContext.Promotions.Query((Promotions promo) =>
    {
        // Columns we are going to insert into
        var set = promo.@using((promo.PromotionName, promo.Discount, promo.StartDate, promo.ExpiredDate));

        INSERT().INTO(set);
        // Return the inserted rows (including generated values) to the caller
        var r = OUTPUT(INSERTED<Promotions>());
        // One VALUES row per element of the in-memory collection
        VALUES(set.RowsFrom(promos));

        return r;
    });

    foreach (var promo in query)
        Console.WriteLine((promo.PromotionId, promo.PromotionName, promo.Discount, promo.StartDate, promo.ExpiredDate));
}
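Note that the OUTPUT(INSERTED<Promotions>()) call mirrors SQL Server's OUTPUT ... INSERTED clause: the inserted rows, including generated values such as PromotionId, come back with the same query, so the foreach above can print them without a second round trip.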
INSERT INTO ... SELECT ...
This is a so-called bulk insert. When the data is already in the database, it is much cheaper to avoid pulling it to the application at all; the whole operation can be performed inside the database. The following snippet copies addresses from one table to another according to some criteria, without a single byte leaving the database.
var cities = new[] { "Santa Cruz", "Baldwin" };

DbContext.Database.Query((Stores stores, Addresses address) =>
{
    // Target columns of the Addresses table
    var set = address.@using((address.Street, address.City, address.State, address.ZipCode));

    INSERT().INTO(set);
    // Source rows come straight from the Stores table - nothing is pulled into the application
    SELECT((stores.Street, stores.City, stores.State, stores.ZipCode));
    FROM(stores);
    WHERE(cities.Contains(stores.City));
});
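If the underlying context is EF Core, a rough raw-SQL equivalent of the snippet above would look like this (table and column names are assumed to match the mapped entities; shown only to illustrate the SQL shape of the bulk insert):

// Illustrative only: the same copy expressed as parameterized raw SQL.
var cities = new[] { "Santa Cruz", "Baldwin" };

DbContext.Database.ExecuteSqlRaw(
    @"INSERT INTO Addresses (Street, City, State, ZipCode)
      SELECT Street, City, State, ZipCode
      FROM Stores
      WHERE City IN ({0}, {1})",
    cities[0], cities[1]);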
UPDATE case optimizations
A bulk update applies when multiple rows in the same table need to be updated. SQL has a construct for exactly this case - UPDATE ... WHERE - which performs the whole update in a single query. Some databases, like SQL Server, also support a more powerful UPDATE ... JOIN construct. Below we update all the matching tax configurations without pulling them to the application server:
var one = 0.01M;
var two = 0.02M;

DbContext.Database.Query((Taxes taxes) =>
{
    UPDATE(taxes).SET(() =>
    {
        taxes.MaxLocalTaxRate += two;
        taxes.AvgLocalTaxRate += one;
    });
    WHERE(taxes.MaxLocalTaxRate == one);
});
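Note that the += assignments translate into relative updates (e.g. MaxLocalTaxRate = MaxLocalTaxRate + 0.02), so the new values are computed from the current row contents inside the database; nothing needs to be read into the application first.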
Bulk delete applies the same idea to the delete case:
private static void PrepareProductHistory(Products products)
{
    var productHistory = ToTable<Products>(PRODUCT_HISTORY);

    // Copy the Products rows into the PRODUCT_HISTORY table, entirely inside the database
    SELECT(products).INTO(productHistory);
    FROM(products);

    Semicolon();
}

...

var year = 2017;
DbContext.Database.Query((Products products) =>
{
    PrepareProductHistory(products);
    var productHistory = ToTable<Products>(PRODUCT_HISTORY);

    // Remove the rows for the given model year in a single statement
    DELETE().FROM(productHistory);
    WHERE(productHistory.ModelYear == year);
});
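Everything here happens inside the database: SELECT ... INTO materializes the PRODUCT_HISTORY table from Products, and the DELETE then removes the rows for the requested model year from it in a single statement, without any product data reaching the application server.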
UPSERT optimization
Upsert means UPDATE or INSERT in a single statement. When a table has one or more unique constraints (in addition to the PK), a plain INSERT can fail with a duplicate-key error. In those cases we usually want to ignore the new row, replace the existing one, or combine some existing fields with the new values. Most vendors support this capability, but with different syntax and different feature sets.
MERGE
Supported by SQL Server and Oracle; in fact, this is the official SQL standard construct. In its simplest form, it lets us specify what to do WHEN MATCHED, i.e. when there is a unique key collision, and what to do WHEN NOT MATCHED, i.e. when we can INSERT safely.
DbContext.Category.Query((Category category) =>
{
    var staging = ToTable<Category>(CATEGORY_STAGING);

    MERGE().INTO(category).USING(staging).ON(category == staging);

    // Key collision: overwrite the existing row with the staged values
    WHEN_MATCHED().THEN(MERGE_UPDATE().SET(() =>
    {
        category.CategoryName = staging.CategoryName;
        category.Amount = staging.Amount;
    }));

    // No collision: insert the staged row as a new one
    var set = category.@using((category.CategoryId, category.CategoryName, category.Amount));
    WHEN_NOT_MATCHED().THEN(MERGE_INSERT(set.ColumnNames(), VALUES(set.RowFrom(staging))));

    Semicolon();

    return SelectAll(category);
});
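For readers less familiar with MERGE, the snippet above (ignoring the trailing SELECT) corresponds roughly to the following T-SQL; the join key and column names are assumed for illustration:

// Illustrative only: the approximate T-SQL shape of the MERGE above.
DbContext.Database.ExecuteSqlRaw(
    @"MERGE INTO Category AS target
      USING CATEGORY_STAGING AS source
          ON target.CategoryId = source.CategoryId
      WHEN MATCHED THEN
          UPDATE SET target.CategoryName = source.CategoryName,
                     target.Amount = source.Amount
      WHEN NOT MATCHED THEN
          INSERT (CategoryId, CategoryName, Amount)
          VALUES (source.CategoryId, source.CategoryName, source.Amount);");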
INSERT ... ON DUPLICATE KEY UPDATE - MySQL (Postgres offers the analogous INSERT ... ON CONFLICT). The syntax is much simpler and covers only the most common case (compared to the feature-packed MERGE):
// newOrExisting is a Store entity prepared earlier (not shown here)
newOrExisting.LastUpdate = DateTime.Now;

DbContext.Database.Query((Store store) =>
{
    var view = store.@using((store.StoreId, store.AddressId, store.ManagerStaffId, store.LastUpdate));
    INSERT().INTO(view);
    VALUES(view.RowFrom(newOrExisting));
    // On a duplicate key, refresh LastUpdate from the values we attempted to insert
    ON_DUPLICATE_KEY_UPDATE(() => store.LastUpdate = INSERTED_VALUES(store.LastUpdate));
});
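On MySQL this should correspond to INSERT ... ON DUPLICATE KEY UPDATE LastUpdate = VALUES(LastUpdate), where VALUES(col) refers to the value that would have been inserted; PostgreSQL expresses the same upsert as INSERT ... ON CONFLICT (...) DO UPDATE SET LastUpdate = EXCLUDED.LastUpdate.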
Summary
Batch/bulk updates are usually an order of magnitude faster than working with entities one by one. Optimizing these scenarios is usually an easy win.