Be Careful When Combining Distinct() And OrderBy() in EF Core Queries

When working with Entity Framework Core (EF Core), developers often rely on LINQ queries to retrieve and manipulate data from the database. However, certain behaviors in EF Core can catch even experienced developers off guard, particularly when combining OrderBy or OrderByDescending with Distinct.

This article explores how EF Core handles these operations, highlights the problems you might encounter, and provides solutions to ensure you achieve the desired results.

Understanding the Problem

When you use both OrderBy and Distinct in a query, the placement of these operations in your LINQ expression matters significantly. If you apply OrderBy before DistinctEF Core's query translation will ignore the OrderBy clause, resulting in unsorted results. Conversely, if you place OrderBy after Distinct, the sorting will work correctly.

Why Does This Happen?

The root cause of this behavior lies in how EF Core translates LINQ expressions into SQL queries. The Distinct operator instructs EF Core to select unique records. However, depending on its placement in the LINQ query, it may cause EF Core to disregard the OrderBy clause because Distinct forces the query to focus on eliminating duplicates before applying ordering.

This behavior can feel like a bug, but it is better understood as a limitation of EF Core’s query translation logic.

 

Examples

Example 1: OrderBy Before Distinct (Ignored OrderBy)

var result = await context.Employees
    .OrderBy(e => e.LastName)
    .Distinct()
    .ToListAsync();

Generated SQL:

SELECT DISTINCT [e].[Id], [e].[FirstName], [e].[LastName]
FROM Employees AS [e]

In this case, the OrderBy clause is ignored in the generated SQL, and the results are not sorted by LastName.

Example 2: Distinct Before OrderBy (Correct Behavior)

var result = await context.Employees
    .Distinct()
    .OrderBy(e => e.LastName)
    .ToListAsync();

Generated SQL:

SELECT [e].[Id], [e].[FirstName], [e].[LastName]
FROM (
    SELECT DISTINCT [e].[Id], [e].[FirstName], [e].[LastName]
    FROM Employees AS [e]
) AS DistinctEmployees
ORDER BY DistinctEmployees.LastName

Here, the OrderBy clause works as expected because it is applied after duplicates are removed.

Why Does Order Matter?

In LINQ, operations are applied in sequence. When you use OrderBy first, EF Core assumes ordering is not required for the subsequent Distinct operation because the SQL standard does not guarantee that the order of rows will be preserved after applying DISTINCT.

Solutions and Best Practices

To ensure consistent results when using OrderBy and Distinct, follow these guidelines:

1. Use Distinct Before OrderBy

Place the Distinct operation before OrderBy to preserve sorting in your LINQ query.

2. Explicitly Use a Subquery

If your query is complex, you can break it into subqueries to control how EF Core translates it:

var distinctEmployees = context.Employees
    .Distinct();

var result = await distinctEmployees
    .OrderBy(e => e.LastName)
    .ToListAsync();

3. Verify Generated SQL

Always check the generated SQL query (e.g., using logging in EF Core) to ensure it matches your expectations.

4. Use Application-Side Ordering

If EF Core fails to generate the correct SQL query, you can fetch the data into memory and apply ordering on the client side:

var result = await context.Employees
    .Distinct()
    .ToListAsync();

var sortedResult = result.OrderBy(e => e.LastName).ToList();

While this works, it is less efficient because all records are loaded into memory before ordering.

Conclusion

When using EF Core, always be mindful of the order in which you place OrderBy and Distinct in your LINQ queries. Misplacement can lead to unexpected results due to EF Core's query translation logic. By understanding how these operators interact and following best practices, you can avoid pitfalls and ensure your queries return the desired results.

Did you encounter similar issues in your EF Core projects? Share your experiences in the comments!


Similar Articles