How to join unknown number of lists in LINQ (C#)





I have three lists of different types :

List<Customer> customerList = new List<Customer>();
List<Product> productList = new List<Product>();
List<Vehicle> vehicleList = new List<Vehicle>();

I also have this list

List<string> stringList = new List<string> {"AND","OR"};

Since the first element of stringList is AND, I want to inner join customerList with productList. Then, since the second element is OR, I want to outer join vehicleList with the result, such as:

from cust in customerList 
join prod in productList on cust.ProductId equals prod.Id
join veh in vehicleList on prod.VehicleId equals veh.Id into v
from veh in v.DefaultIfEmpty()
select new {customerName = cust.Name, customerVehicle = veh?.VehicleName}

I want to do this in an automated way. Let's say I have N lists and N-1 ANDs and ORs: how can I join them? Besides, there can be many lists of the same type. Is such a thing even possible? If not, what can I do to get closer to what I need? Thanks in advance.

EDIT : I'm holding the lists and their types in a Dictionary like this :

var listDict = new Dictionary<Type, object>();

So I can iterate inside this dictionary if necessary.


If you always want the same set of output columns, then write your query ahead of time:

select * 
from

  customerList c
  inner join 
  productList p on c.ProductId = p.Id

  inner join
  vehicleList v on p.VehicleId = v.Id

Then append a dynamic WHERE clause. At its simplest, just replace 'CustomerCity:' with 'c.City' and so on, so that what the user wrote becomes valid SQL. Danger: if your users cannot be trusted, you must make this proof against SQL injection. At the very least, scan the input for DML or limit the keywords users can provide. Better would be to parse it into fields, parameterise it properly, and add the values they provide as parameters.

Simple (ugh) we let the SQL parser do some work:

string whereClause = userInput;
whereClause = whereClause.Replace("CustomerCity:", "c.City = '");
whereClause = whereClause.Replace("VehicleNumber:", "v.Number = ");
//and so on
whereClause = whereClause.Replace(" AND", "' AND");
//some logic here to go through the string and close up those apostrophes

Ugly, and fragile. And hackable (if you care).

Parsing would be better:

sqlCommand.CommandText = "SELECT ... WHERE ";

string[] whereBits = userInput.Split(' ');
var parameters = new Dictionary<string, string>();
parameters["customercity"] = "c.City";
parameters["vehiclenumber"] = "v.Number";

foreach(var token in whereBits){
    var frags = token.Split(':');
    string friendlyName = frags[0].ToLower();

    //handle here the AND and OR -> append to sql command text and continue the loop        

    //only tokens of the form Name:Value with a known name become parameters
    if(frags.Length == 2 && parameters.ContainsKey(friendlyName)){
      sqlCommand.CommandText += parameters[friendlyName] + " = @" + friendlyName;
      sqlCommand.Parameters.AddWithValue("@" + friendlyName, frags[1]);
    }
}

//now you should have an sql that looks like
//SELECT ... WHERE c.City = @customercity ...
// and a params collection that looks like:
//sql.Params[0] => ("@customercity", "Seattle", varchar)...

One thing to consider: will your users be able to construct that query and get the results they want? What, in a user's mind, does CustomerCity:Seattle OR ProductType:Computer AND VehicleNumber:8 AND CustomerName:Jason mean anyway? Everyone in Seattle, plus every Jason whose Computer is in vehicle 8? Or everyone who is in Seattle or has a computer, but who must also have vehicle 8 and be called Jason?

Without defined precedence, queries could just turn out garbage in the user's hands.
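
To make the precedence question concrete, here is a minimal sketch that groups a filter string the way SQL itself would read it, where AND binds tighter than OR. The PrecedenceSketch class and its Parenthesize method are hypothetical names introduced for illustration, and the sketch assumes that terms and operators are separated by single spaces.

```csharp
using System;
using System.Linq;

public static class PrecedenceSketch
{
    // Hypothetical helper: makes the implicit grouping explicit.
    // Follows the SQL convention that AND binds tighter than OR.
    public static string Parenthesize(string input)
    {
        var orGroups = input
            .Split(new[] { " OR " }, StringSplitOptions.None)
            .Select(group => group.Split(new[] { " AND " }, StringSplitOptions.None))
            .Select(terms => terms.Length > 1
                ? "(" + string.Join(" AND ", terms) + ")"   // several ANDed terms form one group
                : terms[0]);                                // a single term needs no parentheses
        return string.Join(" OR ", orGroups);
    }

    public static void Main()
    {
        Console.WriteLine(Parenthesize(
            "CustomerCity:Seattle OR ProductType:Computer AND VehicleNumber:8 AND CustomerName:Jason"));
        // CustomerCity:Seattle OR (ProductType:Computer AND VehicleNumber:8 AND CustomerName:Jason)
    }
}
```

Showing the user the parenthesized form before running the query is one way to let them confirm that it means what they intended.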


I think it would have been better if you just describe what the requirement is, instead of asking how to implement this strange design.

Performance isn't a problem... now. But that is how it always starts...

Anyways, I do not think performance has to be an issue. But that depends on the relations between tables. In your example there are lists with only one foreign key. Each customer has one product and each product has one vehicle. Resulting in one record.

But what happens if one vehicle has multiple products, from multiple customers? If you allow tables to be combined in all kinds of ways, you're bound to create a Cartesian product somewhere, resulting in thousands of rows or more.

And how are you going to implement multiple relations between objects? Suppose there are users, and customer has the fields UpdatedByUser and CreatedByUser. How do you know which user maps to which field?

And what about numeric fields? It seems that you are treating all fields as strings.

If you want to allow users to build queries, according to the relations in the database and existing fields, the best thing to do may be to write (generic) code to build your own expression trees. Using reflection you can show properties, etc. That may also result in the best queries.
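
As a rough illustration of the expression-tree idea, the sketch below builds a predicate of the form item => item.Property == value from a property name and a value. The PredicateBuilderSketch class, the PropertyEquals method, and the Customer sample type are assumptions made up for this example; it only handles equality on non-nullable properties and has no error handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Sample type for illustration only.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class PredicateBuilderSketch
{
    // Builds item => item.<propertyName> == value for any type T.
    public static Expression<Func<T, bool>> PropertyEquals<T>(string propertyName, object value)
    {
        var item = Expression.Parameter(typeof(T), "item");
        // Throws ArgumentException if T has no such property.
        var property = Expression.Property(item, propertyName);
        // Convert the supplied value to the property's type (e.g. "8" -> 8 for an int property).
        var constant = Expression.Constant(Convert.ChangeType(value, property.Type), property.Type);
        return Expression.Lambda<Func<T, bool>>(Expression.Equal(property, constant), item);
    }

    public static void Main()
    {
        var customers = new List<Customer> {
            new Customer { Name = "Jason", City = "Seattle" },
            new Customer { Name = "Anna", City = "Boston" }
        };
        var predicate = PropertyEquals<Customer>("City", "Seattle");
        var matches = customers.AsQueryable().Where(predicate).ToList();
        Console.WriteLine(matches.Count); // 1
    }
}
```

Because the result is an Expression rather than a compiled delegate, the same predicate could in principle be passed to an IQueryable backed by a database provider instead of an in-memory list.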

But you may also consider to use MongoDB instead of Sql Server. If relations are not that important, then a relational database may not be the right place to store data. You may also consider to use the Full-text search feature in Sql Server.

If you want to use Sql Server then you should take advantage of the navigation properties that are present in Entity Framework 6 (code first). You think that is not what you need, but I think it can be very easy.

First you'll need to create a model and entities. Note that you should not use the [Required] attribute on foreign keys, because if you do, the join will be translated to an inner join.

Next take the table you want to query:

var ctx = new Model();
//ctx.Configuration.ProxyCreationEnabled = false;
var q = ctx.Customers.AsQueryable();
// parse the 'parameters' to build the query
q = q.Include("Product");
// You'll have to build the include string
q = q.Include("Product.Vehicle");
var res = q.FirstOrDefault();

This will get all the data you'll need, all using left joins. In order to 'convert' a left join to an inner join you filter the foreign key to be not null:

var res = q.FirstOrDefault(cust => cust.ProductId != null);

So all you need is the table where you want to start. Then build the query any way you like. You can even parse a string such as Customer AND Product OR Vehicle instead of using separate lists.

The variable res contains the customer which links to Product. But res should be the result of a select:

var res = q.Select(r => new { CustName = r.Name, ProductName = r.Product.Name }).FirstOrDefault();

In the question there is no mention of filters, but in the comments there is. In case you want to add filters you can also think of building your query like this:

q = q.Where(cust => cust.Name.StartsWith("a"));
if (someCondition)
    q = q.Where(cust => cust.Product.Name.StartsWith("a"));
var res = q.ToList();

This is just to give you an idea how you can take advantage of EF6 (code-first). You don't have to think about the joins, since these are already defined and automatically picked up.


UPDATE 5-15-17:

To recap, what I am proposing is an example in which we want to:

  1. Pass in a list of N Table objects.
  2. Pass in a list of N-1 join clauses describing how to join them. E.g., with 2 tables you need a single join, with 3 you need 2, and so on.
  3. Be able to pass in a predicate to go up or down the chain to narrow the scope.

What I would propose is to do all of this in SQL and pass in an XML object that SQL can parse. However, to keep things simpler and avoid dealing with XML serialization, let's stick with strings that hold one or many values. Say we have a structure, following on from the above, like this:

/*
CREATE TABLE Customer ( Id INT IDENTITY, CustomerName VARCHAR(64), ProductId INT)
INSERT INTO Customer VALUES ('Acme', 1),('Widgets', 2)
CREATE TABLE Product (Id INT IDENTITY, ProductName VARCHAR(64), VehicleId INT)
Insert Into Product Values ('Shirt', 1),('Pants', 2)
CREATE TABLE VEHICLE (Id INT IDENTITY, VehicleName VARCHAR(64))
INSERT INTO dbo.VEHICLE VALUES ('Car'),('Truck')

CREATE TABLE Joins (Id INT IDENTITY, OriginTable VARCHAR(32), DestinationTable VARCHAR(32), JoinClause VARCHAR(32))
INSERT INTO Joins VALUES ('Customer', 'Product', 'ProductId = Id'),('Product', 'Vehicle', 'VehicleId = Id')

--Data as is if I joined all three tables
CustomerId  CustomerName    ProductId   ProductName VehicleId   VehicleName
1   Acme    1   Shirt   1   Car
2   Widgets 2   Pants   2   Truck
*/

This structure is pretty simplistic: every relationship is a one-to-one key relationship, whereas a real schema could have other identifiers. The key to making this work is to maintain a table that describes HOW these tables relate. I called this table Joins. Now I can create a dynamic proc like so:

CREATE PROC pDynamicFind
  (
    @Tables varchar(256)
  , @Joins VARCHAR(256)
  , @Predicate VARCHAR(256)
  )
AS
BEGIN
  SET NOCOUNT ON;

    DECLARE @SQL NVARCHAR(MAX) = 
'With x as 
    (
    SELECT
    a.Id
  , {nameColumns}
  From {joins}
  Where {predicate}
  )
SELECT *
From x
  UNPIVOT (Value FOR TableName In ({nameColumns})) AS unpt
'
    DECLARE @Tbls TABLE (id INT IDENTITY, tableName VARCHAR(256), joinType VARCHAR(16))
    DECLARE @Start INT = 2
    DECLARE @alphas VARCHAR(26) = 'abcdefghijklmnopqrstuvwxyz'

    --Comma separated into temp table (realistically most people create a function to do this so you don't have to do it over and over again)
    WHILE LEN(@Tables) > 0
    BEGIN
        IF PATINDEX('%,%', @Tables) > 0
        BEGIN
            INSERT INTO @Tbls (tableName) VALUES (RTRIM(LTRIM(SUBSTRING(@Tables, 0, PATINDEX('%,%', @Tables)))))
            SET @Tables = SUBSTRING(@Tables, LEN(SUBSTRING(@Tables, 0, PATINDEX('%,%', @Tables)) + ',') + 1, LEN(@Tables))
        END
        ELSE
        BEGIN
            INSERT INTO @Tbls (tableName) VALUES (RTRIM(LTRIM(@Tables)))
            SET @Tables = NULL
        END
    END

    --Have to iterate over this one separately
    WHILE LEN(@Joins) > 0
    BEGIN
        IF PATINDEX('%,%', @Joins) > 0
        BEGIN
            Update @Tbls SET joinType = (RTRIM(LTRIM(SUBSTRING(@Joins, 0, PATINDEX('%,%', @Joins))))) WHERE id = @Start
            SET @Joins = SUBSTRING(@Joins, LEN(SUBSTRING(@Joins, 0, PATINDEX('%,%', @Joins)) + ',') + 1, LEN(@Joins))
            SET @Start = @Start + 1
        END
        ELSE
        BEGIN
            Update @Tbls SET joinType = (RTRIM(LTRIM(@Joins))) WHERE id = @Start
            SET @Joins = NULL
            SET @Start = @Start + 1
        END
    END

    DECLARE @Join VARCHAR(256) = ''
    DECLARE @Cols VARCHAR(256) = ''

    --Determine dynamic columns and joins
    Select 
      @Join += CASE WHEN joinType IS NULL THEN t.tableName + ' ' + SUBSTRING(@alphas, t.id, 1) 
      ELSE ' ' + joinType + ' JOIN ' + t.tableName + ' ' + SUBSTRING(@alphas, t.id, 1) + ' ON ' + SUBSTRING(@alphas, t.id-1, 1) + '.' + REPLACE(j.JoinClause, '= ', '= ' + SUBSTRING(@alphas, t.id, 1) + '.' )
      END
    , @Cols += CASE WHEN joinType IS NULL THEN t.tableName + 'Name' ELSE ' , ' + t.tableName + 'Name' END
    From @Tbls t
      LEFT JOIN Joins j ON t.tableName = j.DestinationTable

    SET @SQL = REPLACE(@SQL, '{joins}', @Join)
    SET @SQL = REPLACE(@SQL, '{nameColumns}', @Cols)
    SET @SQL = REPLACE(@SQL, '{predicate}', @Predicate)

    --PRINT @SQL
    EXEC sp_executesql @SQL
END
GO

I now have a stubbed query, so to speak, in which I can swap out the source of the FROM clause, the column I filter on, and the value I filter by. I would get results from it like this:

EXEC pDynamicFind 'Customer, Product', 'Inner', 'CustomerName = ''Acme'''
EXEC pDynamicFind 'Customer, Product, Vehicle', 'Inner, Inner', 'VehicleName = ''Car'''

Now what about setting that up in EF and using it in code? You can add stored procedures to EF and get data from them through the context. The problem this addresses is that I am now returning a fixed object shape no matter how many columns I add. If my pattern is always '(table)Name' across N tables, I can normalize the result by unpivoting it and getting N rows, one per table. Performance may get worse as result sets grow, but you can make as many joins as you want as long as the tables follow a similar structure.

The point I am making is that SQL is ultimately what gets your data, and expressing crazy joins in LINQ is at times more work than it is worth. But if you have a small result set and a small database, you are probably fine. This is just an example of how you could get completely different objects out of SQL using dynamic SQL, and how quickly it can be done once the proc is written. It is only one way to skin a cat; I am sure there are many. Whatever road you go down with dynamic joins, you will need some kind of normalization standard, factory pattern, or similar that says: N inputs always yield the same object X. I do this through a vertical result set, but if you want a column other than 'Name' you will have to write more code for that too. However, the way I built this, if you want the description but want to filter on, say, a date field, that works fine.


The following code solves your problem.

First we need data, so I build some sample lists of three different types. My solution can handle multiple tables of the same data type.

Then I build the list of join specifications, specifying the tables, join fields and join type:

Warning: The specifications must be in dependency order (a topological sort). The first join joins two tables. Each subsequent join must join one new table to one of the tables already joined.

var joinSpecs = new IJoinSpecification[] {
    JoinSpecification.Create(list1, list2, v1 => v1.Id, v2 => v2.ForeignKeyTo1, JoinType.Inner),
    JoinSpecification.Create(list2, list3, v2 => v2.Id, v3 => v3.ForeignKeyTo2, JoinType.LeftOuter)
};

Then you just execute the joins:

//Creating LINQ query
IEnumerable<Dictionary<object, object>> result = null;
foreach (var joinSpec in joinSpecs) {
    result = joinSpec.PerformJoin(result);
}
//Executing the LINQ query
var finalResult = result.ToList();

The result is a list of dictionaries containing the joined items, so the access looks like this: rowDict[table1].Column2. You can even have multiple tables of same type - this system handles that easily.

Here is how you do the final projection of your joined data:

var resultWithColumns = (
    from row in finalResult
    let item1 = row.GetItemFor(list1)
    let item2 = row.GetItemFor(list2)
    let item3 = row.GetItemFor(list3)
    select new {
        Id1 = item1?.Id,
        Id2 = item2?.Id,
        Id3 = item3?.Id,
        Value1 = item1?.Value,
        Value2 = item2?.Value,
        Value3 = item3?.Value
    }).ToList();

The full code:

using System;
using System.Collections.Generic;
using System.Linq;

public class Type1 {
    public int Id { get; set; }
    public int Value { get; set; }
}

public class Type2 {
    public int Id { get; set; }
    public string Value { get; set; }
    public int ForeignKeyTo1 { get; set; }
}

public class Type3 {
    public int Id { get; set; }
    public string Value { get; set; }
    public int ForeignKeyTo2 { get; set; }
}

public class Program {
    public static void Main() {
        //Data
        var list1 = new List<Type1>() {
            new Type1 { Id = 1, Value = 1 },
            new Type1 { Id = 2, Value = 2 },
            new Type1 { Id = 3, Value = 3 }
            //4 is missing
        };
        var list2 = new List<Type2>() {
            new Type2 { Id = 1, Value = "1", ForeignKeyTo1 = 1 },
            new Type2 { Id = 2, Value = "2", ForeignKeyTo1 = 2 },
            //3 is missing
            new Type2 { Id = 4, Value = "4", ForeignKeyTo1 = 4 }
        };
        var list3 = new List<Type3>() {
            new Type3 { Id = 1, Value = "1", ForeignKeyTo2 = 1 },
            //2 is missing
            new Type3 { Id = 3, Value = "2", ForeignKeyTo2 = 2 },
            new Type3 { Id = 4, Value = "4", ForeignKeyTo2 = 4 }
        };

        var joinSpecs = new IJoinSpecification[] {
            JoinSpecification.Create(list1, list2, v1 => v1.Id, v2 => v2.ForeignKeyTo1, JoinType.Inner),
            JoinSpecification.Create(list2, list3, v2 => v2.Id, v3 => v3.ForeignKeyTo2, JoinType.LeftOuter)
        };

        //Creating LINQ query
        IEnumerable<Dictionary<object, object>> result = null;
        foreach (var joinSpec in joinSpecs) {
            result = joinSpec.PerformJoin(result);
        }

        //Executing the LINQ query
        var finalResult = result.ToList();

        //This is just to illustrate how to get the final projection columns
        var resultWithColumns = (
            from row in finalResult
            let item1 = row.GetItemFor(list1)
            let item2 = row.GetItemFor(list2)
            let item3 = row.GetItemFor(list3)
            select new {
                Id1 = item1?.Id,
                Id2 = item2?.Id,
                Id3 = item3?.Id,
                Value1 = item1?.Value,
                Value2 = item2?.Value,
                Value3 = item3?.Value
            }).ToList();

        foreach (var row in resultWithColumns) {
            Console.WriteLine(row.ToString());
        }
        //Outputs:
        //{ Id1 = 1, Id2 = 1, Id3 = 1, Value1 = 1, Value2 = 1, Value3 = 1 }
        //{ Id1 = 2, Id2 = 2, Id3 = 3, Value1 = 2, Value2 = 2, Value3 = 2 }
    }
}

public static class RowDictionaryHelpers {
    public static IEnumerable<Dictionary<object, object>> CreateFrom<T>(IEnumerable<T> source) where T : class {
        return source.Select(item => new Dictionary<object, object> { { source, item } });
    }

    public static T GetItemFor<T>(this Dictionary<object, object> dict, IEnumerable<T> key) where T : class {
        return dict[key] as T;
    }

    public static Dictionary<object, object> WithAddedItem<T>(this Dictionary<object, object> dict, IEnumerable<T> key, T item) where T : class {
        var result = new Dictionary<object, object>(dict);
        result.Add(key, item);
        return result;
    }
}

public interface IJoinSpecification {
    IEnumerable<Dictionary<object, object>> PerformJoin(IEnumerable<Dictionary<object, object>> sourceData);
}

public enum JoinType {
    Inner = 1,
    LeftOuter = 2
}

public static class JoinSpecification {
    public static JoinSpecification<TLeft, TRight, TKeyType> Create<TLeft, TRight, TKeyType>(IEnumerable<TLeft> LeftTable, IEnumerable<TRight> RightTable, Func<TLeft, TKeyType> LeftKeySelector, Func<TRight, TKeyType> RightKeySelector, JoinType JoinType) where TLeft : class where TRight : class {
        return new JoinSpecification<TLeft, TRight, TKeyType> {
            LeftTable = LeftTable,
            RightTable = RightTable,
            LeftKeySelector = LeftKeySelector,
            RightKeySelector = RightKeySelector,
            JoinType = JoinType,
        };
    }
}

public class JoinSpecification<TLeft, TRight, TKeyType> : IJoinSpecification where TLeft : class where TRight : class {
    public IEnumerable<TLeft> LeftTable { get; set; } //Must already exist
    public IEnumerable<TRight> RightTable { get; set; } //Newly joined table
    public Func<TLeft, TKeyType> LeftKeySelector { get; set; }
    public Func<TRight, TKeyType> RightKeySelector { get; set; }
    public JoinType JoinType { get; set; }

    public IEnumerable<Dictionary<object, object>> PerformJoin(IEnumerable<Dictionary<object, object>> sourceData) {
        if (sourceData == null) {
            sourceData = RowDictionaryHelpers.CreateFrom(LeftTable);
        }
        return
            from joinedRowsObj in sourceData
            join rightRow in RightTable
                on joinedRowsObj.GetItemFor(LeftTable).ApplyIfNotNull(LeftKeySelector) equals rightRow.ApplyIfNotNull(RightKeySelector)
                into rightItemsForLeftItem
            from rightItem in rightItemsForLeftItem.DefaultIfEmpty()
            where JoinType == JoinType.LeftOuter || rightItem != null
            select joinedRowsObj.WithAddedItem(RightTable, rightItem)
        ;
    }
}

public static class FuncExtensions {
    public static TResult ApplyIfNotNull<T, TResult>(this T item, Func<T, TResult> func) where T : class {
        return item != null ? func(item) : default(TResult);
    }
}

The code outputs:

{ Id1 = 1, Id2 = 1, Id3 = 1, Value1 = 1, Value2 = 1, Value3 = 1 }

{ Id1 = 2, Id2 = 2, Id3 = 3, Value1 = 2, Value2 = 2, Value3 = 2 }

P.S. The code deliberately omits all error checking to keep it compact and easier to read.


Decompose your LINQ query expression into method syntax (see "How to Convert LINQ Comprehension Query Syntax to Method Syntax using Lambda").

You will get:

   customerList.Join(productList, cust => cust.ProductId, prod => prod.Id, (cust, prod) => new { cust = cust, prod = prod })
                .GroupJoin(vehicleList, cp => cp.prod.VehicleId, veh => veh.Id, (cp, v) => new { cp = cp, v = v })
                .SelectMany(cv => cv.v.DefaultIfEmpty(), (cv, veh) => new { customerName = cv.cp.cust.Name, customerVehicle = veh?.VehicleName });
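
For reference, here is a self-contained sketch of the same Join / GroupJoin / SelectMany chain that can be run as is. The sample data, the trimmed-down Customer, Product and Vehicle classes, and the MethodSyntaxSketch name are made up for illustration; only the shape of the chain comes from the query above. It uses the null-conditional operator because DefaultIfEmpty() yields null when a product has no matching vehicle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal sample types; the real classes presumably have more members.
public class Customer { public string Name { get; set; } public int ProductId { get; set; } }
public class Product { public int Id { get; set; } public int VehicleId { get; set; } }
public class Vehicle { public int Id { get; set; } public string VehicleName { get; set; } }

public static class MethodSyntaxSketch
{
    public static List<string> Run()
    {
        var customerList = new List<Customer> {
            new Customer { Name = "Acme", ProductId = 1 },
            new Customer { Name = "Widgets", ProductId = 2 }
        };
        var productList = new List<Product> {
            new Product { Id = 1, VehicleId = 1 },
            new Product { Id = 2, VehicleId = 99 } // no matching vehicle
        };
        var vehicleList = new List<Vehicle> { new Vehicle { Id = 1, VehicleName = "Car" } };

        return customerList
            // inner join: customers without a matching product are dropped
            .Join(productList, cust => cust.ProductId, prod => prod.Id, (cust, prod) => new { cust, prod })
            // left outer join: GroupJoin + DefaultIfEmpty keeps rows without a vehicle
            .GroupJoin(vehicleList, cp => cp.prod.VehicleId, veh => veh.Id, (cp, v) => new { cp, v })
            .SelectMany(cv => cv.v.DefaultIfEmpty(),
                        (cv, veh) => cv.cp.cust.Name + ": " + (veh?.VehicleName ?? "(none)"))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run()) Console.WriteLine(line);
        // Acme: Car
        // Widgets: (none)
    }
}
```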

Besides listDict, you will need the following keyArr as well:

keyArr[0] = { OuterKey = cust => cust.ProductId; InnerKey = prod => prod.Id; };
keyArr[1] = ...

Then loop over listDict using the following code:

// pseudocode: the type of result changes with every join, so this will not compile as is
var result = customerList;
int i = 0;
foreach(var ld in listDict)
{
    //use this
    result = result.Join(ld, keyArr[i].OuterKey, keyArr[i].InnerKey, (cust, prod) => new { cust = cust, prod = prod });

    //or this or both depends on the query
    result = result.GroupJoin(ld, cp => cp.prod.VehicleId, veh => veh.Id, (cp, v) => new { cp = cp, v = v });
    i++;
}
// need to define concrete class for each table
// and grouping result after each join

//and finally
result.SelectMany(cv => cv.v.DefaultIfEmpty(), (cv, veh) => new { customerName = cv.cp.cust.Name, customerVehicle = veh?.VehicleName });





