c# - right - linq to sql left join




Does “where” position in LINQ query matter when joining in-memory? (2)

For in memory lists (IEnumerables), no optimization is applied and query execution is made in chained order for in-memory lists.

I also tried result by first casting it to IQueryable then apply filtering but apparently casting time is pretty high for this big table.

I made a quick test for this case.

Console.WriteLine($"List Row Count = {list.Count()}"); 
Console.WriteLine($"JoinList Row Count = {joinList.Count()}"); 

var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2}); 
result.Dump();
watch.Stop();

Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");

watch.Restart();
var result2 = list
   .Where(t => t.Prop3 == "Prop13")
   .Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Select(t => new { t.inner.Prop4, t.lst.Prop2});

result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}"); 

watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i=> i.Prop3, (lst, inner) => new {lst, inner})
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2}); 
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}"); 

Findings:

List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591

List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586

List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277

List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219

Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where clause. This where only filters on properties included in the original set (the from part of the query).

Question: Does the linq query interpreter optimize this query in that it first executes the where before it performs the join, regardless of whether I write the where before or after the join? – so it does not have to perform a join on elements that are not included later anyways.

Example: For example, I have a categories list I want to join with a products list. However, I am just interested in the category with ID 1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:

from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };

or

from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };

Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.

Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)

Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.

Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".


The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.

The first LINQ query will be compiled to the following method chain:

categories
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { category, prod })
    .Where(t => t.category.ID == 1)
    .Select(t => new { Category = t.category.Name, Product = t.prod.Name });

The second one:

categories
    .Where(category => category.ID == 1)
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { Category = category.Name, Product = prod.Name });

As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).

Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.

There will be no additional query optimization in case of LINQ-to-objects queries.

So the second version is preferable.





where