c# - update - entity framework extensions bulk insert




Fastest Way of Inserting in Entity Framework (17)

I'm looking for the fastest way of inserting into Entity Framework.

I'm asking this because of the scenario where you have an active TransactionScope and the insertion is huge (4000+). It can potentially last more than 10 minutes (default timeout of transactions), and this will lead to an incomplete transaction.


I'm looking for the fastest way of inserting into Entity Framework

There are some third-party libraries supporting Bulk Insert available:

  • Z.EntityFramework.Extensions (Recommended)
  • EFUtilities
  • EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful, when choosing a bulk insert library. Only Entity Framework Extensions supports all kind of associations and inheritances and it's the only one still supported.


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

Dispose() context create problems if the entities you Add() rely on other preloaded entities (e.g. navigation properties) in the context

I use similar concept to keep my context small to achieve the same performance

But instead of Dispose() the context and recreate, I simply detach the entities that already SaveChanges()

public void AddAndSave<TEntity>(List<TEntity> entities) where TEntity : class {

const int CommitCount = 1000; //set your own best performance number here
int currentCount = 0;

while (currentCount < entities.Count())
{
    //make sure it don't commit more than the entities you have
    int commitCount = CommitCount;
    if ((entities.Count - currentCount) < commitCount)
        commitCount = entities.Count - currentCount;

    //e.g. Add entities [ i = 0 to 999, 1000 to 1999, ... , n to n+999... ] to conext
    for (int i = currentCount; i < (currentCount + commitCount); i++)        
        _context.Entry(entities[i]).State = System.Data.EntityState.Added;
        //same as calling _context.Set<TEntity>().Add(entities[i]);       

    //commit entities[n to n+999] to database
    _context.SaveChanges();

    //detach all entities in the context that committed to database
    //so it won't overload the context
    for (int i = currentCount; i < (currentCount + commitCount); i++)
        _context.Entry(entities[i]).State = System.Data.EntityState.Detached;

    currentCount += commitCount;
} }

wrap it with try catch and TrasactionScope() if you need, not showing them here for keeping the code clean


Another option is to use SqlBulkTools available from Nuget. It's very easy to use and has some powerful features.

Example:

var bulk = new BulkOperations();
var books = GetBooks();

using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(ConfigurationManager
    .ConnectionStrings["SqlBulkToolsTest"].ConnectionString))
    {
        bulk.Setup<Book>()
            .ForCollection(books)
            .WithTable("Books") 
            .AddAllColumns()
            .BulkInsert()
            .Commit(conn);
    }

    trans.Complete();
}

See the documentation for more examples and advanced usage. Disclaimer: I am the author of this library and any views are of my own opinion.


As other people have said SqlBulkCopy is the way to do it if you want really good insert performance.

It's a bit cumbersome to implement but there are libraries that can help you with it. There are a few out there but I will shamelesslyplug my own library this time: https://github.com/MikaelEliasson/EntityFramework.Utilities#batch-insert-entities

The only code you would need is:

 using (var db = new YourDbContext())
 {
     EFBatchOperation.For(db, db.BlogPosts).InsertAll(list);
 }

So how much faster is it? Very hard to say because it depends on so many factors, computer performance, network, object size etc etc. The performance tests I've made suggests 25k entities can be inserted at around 10s the standard way on localhost IF you optimize your EF configuration like mentioned in the other answers. With EFUtilities that takes about 300ms. Even more interesting is that I have saved around 3 millions entities in under 15 seconds using this method, averaging around 200k entities per second.

The one problem is ofcourse if you need to insert releated data. This can be done efficently into sql server using the method above but it requires you to have an Id generation strategy that let you generate id's in the app-code for the parent so you can set the foreign keys. This can be done using GUIDs or something like HiLo id generation.


But, for more than (+4000) inserts i recommend to use stored procedure. attached the time elapsed. I did inserted it 11.788 rows in 20"

thats it code

 public void InsertDataBase(MyEntity entity)
    {
        repository.Database.ExecuteSqlCommand("sp_mystored " +
                "@param1, @param2"
                 new SqlParameter("@param1", entity.property1),
                 new SqlParameter("@param2", entity.property2));
    }

Have you ever tried to insert through a background worker or task?

In my case, im inserting 7760 registers, distributed in 182 different tables with foreign key relationships ( by NavigationProperties).

Without the task, it took 2 minutes and a half. Within a Task ( Task.Factory.StartNew(...) ), it took 15 seconds.

Im only doing the SaveChanges() after adding all the entities to the context. (to ensure data integrity)


I agree with Adam Rackis. SqlBulkCopy is the fastest way of transferring bulk records from one data source to another. I used this to copy 20K records and it took less than 3 seconds. Have a look at the example below.

public static void InsertIntoMembers(DataTable dataTable)
{           
    using (var connection = new SqlConnection(@"data source=;persist security info=True;user id=;password=;initial catalog=;MultipleActiveResultSets=True;App=EntityFramework"))
    {
        SqlTransaction transaction = null;
        connection.Open();
        try
        {
            transaction = connection.BeginTransaction();
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, transaction))
            {
                sqlBulkCopy.DestinationTableName = "Members";
                sqlBulkCopy.ColumnMappings.Add("Firstname", "Firstname");
                sqlBulkCopy.ColumnMappings.Add("Lastname", "Lastname");
                sqlBulkCopy.ColumnMappings.Add("DOB", "DOB");
                sqlBulkCopy.ColumnMappings.Add("Gender", "Gender");
                sqlBulkCopy.ColumnMappings.Add("Email", "Email");

                sqlBulkCopy.ColumnMappings.Add("Address1", "Address1");
                sqlBulkCopy.ColumnMappings.Add("Address2", "Address2");
                sqlBulkCopy.ColumnMappings.Add("Address3", "Address3");
                sqlBulkCopy.ColumnMappings.Add("Address4", "Address4");
                sqlBulkCopy.ColumnMappings.Add("Postcode", "Postcode");

                sqlBulkCopy.ColumnMappings.Add("MobileNumber", "MobileNumber");
                sqlBulkCopy.ColumnMappings.Add("TelephoneNumber", "TelephoneNumber");

                sqlBulkCopy.ColumnMappings.Add("Deleted", "Deleted");

                sqlBulkCopy.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch (Exception)
        {
            transaction.Rollback();
        }

    }
}

I have made an generic extension of @Slauma s example above;

public static class DataExtensions
{
    public static DbContext AddToContext<T>(this DbContext context, object entity, int count, int commitCount, bool recreateContext, Func<DbContext> contextCreator)
    {
        context.Set(typeof(T)).Add((T)entity);

        if (count % commitCount == 0)
        {
            context.SaveChanges();
            if (recreateContext)
            {
                context.Dispose();
                context = contextCreator.Invoke();
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }
        return context;
    }
}

Usage:

public void AddEntities(List<YourEntity> entities)
{
    using (var transactionScope = new TransactionScope())
    {
        DbContext context = new YourContext();
        int count = 0;
        foreach (var entity in entities)
        {
            ++count;
            context = context.AddToContext<TenancyNote>(entity, count, 100, true,
                () => new YourContext());
        }
        context.SaveChanges();
        transactionScope.Complete();
    }
}

I would recommend this article on how to do bulk inserts using EF.

Entity Framework and slow bulk INSERTs

He explores these areas and compares perfomance:

  1. Default EF (57 minutes to complete adding 30,000 records)
  2. Replacing with ADO.NET Code (25 seconds for those same 30,000)
  3. Context Bloat- Keep the active Context Graph small by using a new context for each Unit of Work (same 30,000 inserts take 33 seconds)
  4. Large Lists - Turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)
  5. Batching (down to 16 seconds)
  6. DbTable.AddRange() - (performance is in the 12 range)

I've investigated Slauma's answer (which is awesome, thanks for the idea man), and I've reduced batch size until I've hit optimal speed. Looking at the Slauma's results:

  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

It is visible that there is speed increase when moving from 1 to 10, and from 10 to 100, but from 100 to 1000 inserting speed is falling down again.

So I've focused on what's happening when you reduce batch size to value somewhere in between 10 and 100, and here are my results (I'm using different row contents, so my times are of different value):

Quantity    | Batch size    | Interval
1000    1   3
10000   1   34
100000  1   368

1000    5   1
10000   5   12
100000  5   133

1000    10  1
10000   10  11
100000  10  101

1000    20  1
10000   20  9
100000  20  92

1000    27  0
10000   27  9
100000  27  92

1000    30  0
10000   30  9
100000  30  92

1000    35  1
10000   35  9
100000  35  94

1000    50  1
10000   50  10
100000  50  106

1000    100 1
10000   100 14
100000  100 141

Based on my results, actual optimum is around value of 30 for batch size. It's less than both 10 and 100. Problem is, I have no idea why is 30 optimal, nor could have I found any logical explanation for it.


The fastest way would be using bulk insert extension, which I developed.

It uses SqlBulkCopy and custom datareader to get max performance. As a result it is over 20 times faster than using regular insert or AddRange

usage is extremely simple

context.BulkInsert(hugeAmountOfEntities);

The secret is to insert into an identical blank staging table. Inserts are lightening quick. Then run a single insert from that into your main large table. Then truncate the staging table ready for the next batch.

ie.

insert into some_staging_table using Entity Framework.

-- Single insert into main table (this could be a tiny stored proc call)
insert into some_main_already_large_table (columns...)
   select (columns...) from some_staging_table
truncate table some_staging_table

To your remark in the comments to your question:

"...SavingChanges (for each record)..."

That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts extremely down. I would do a few simple tests which will very likely improve the performance:

  • Call SaveChanges() once after ALL records.
  • Call SaveChanges() after for example 100 records.
  • Call SaveChanges() after for example 100 records and dispose the context and create a new one.
  • Disable change detection

For bulk inserts I am working and experimenting with a pattern like this:

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;            
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}

I have a test program which inserts 560.000 entities (9 scalar properties, no navigation properties) into the DB. With this code it works in less than 3 minutes.

For the performance it is important to call SaveChanges() after "many" records ("many" around 100 or 1000). It also improves the performance to dispose the context after SaveChanges and create a new one. This clears the context from all entites, SaveChanges doesn't do that, the entities are still attached to the context in state Unchanged. It is the growing size of attached entities in the context what slows down the insertion step by step. So, it is helpful to clear it after some time.

Here are a few measurements for my 560.000 entities:

  • commitCount = 1, recreateContext = false: many hours (That's your current procedure)
  • commitCount = 100, recreateContext = false: more than 20 minutes
  • commitCount = 1000, recreateContext = false: 242 sec
  • commitCount = 10000, recreateContext = false: 202 sec
  • commitCount = 100000, recreateContext = false: 199 sec
  • commitCount = 1000000, recreateContext = false: out of memory exception
  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

The behaviour in the first test above is that the performance is very non-linear and decreases extremely over time. ("Many hours" is an estimation, I never finished this test, I stopped at 50.000 entities after 20 minutes.) This non-linear behaviour is not so significant in all other tests.


Try to use a Stored Procedure that will get an XML of the data that you want to insert.


Use stored procedure that takes input data in form of xml to insert data.

From your c# code pass insert data as xml.

e.g in c#, syntax would be like this:

object id_application = db.ExecuteScalar("procSaveApplication", xml)


[NEW SOLUTION FOR POSTGRESQL] Hey, I know it's quite an old post, but I have recently run into similar problem, but we were using Postgresql. I wanted to use effective bulkinsert, what turned out to be pretty difficult. I haven't found any proper free library to do so on this DB. I have only found this helper: https://bytefish.de/blog/postgresql_bulk_insert/ which is also on Nuget. I have written a small mapper, which auto mapped properties the way Entity Framework:

public static PostgreSQLCopyHelper<T> CreateHelper<T>(string schemaName, string tableName)
        {
            var helper = new PostgreSQLCopyHelper<T>("dbo", "\"" + tableName + "\"");
            var properties = typeof(T).GetProperties();
            foreach(var prop in properties)
            {
                var type = prop.PropertyType;
                if (Attribute.IsDefined(prop, typeof(KeyAttribute)) || Attribute.IsDefined(prop, typeof(ForeignKeyAttribute)))
                    continue;
                switch (type)
                {
                    case Type intType when intType == typeof(int) || intType == typeof(int?):
                        {
                            helper = helper.MapInteger("\"" + prop.Name + "\"",  x => (int?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type stringType when stringType == typeof(string):
                        {
                            helper = helper.MapText("\"" + prop.Name + "\"", x => (string)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type dateType when dateType == typeof(DateTime) || dateType == typeof(DateTime?):
                        {
                            helper = helper.MapTimeStamp("\"" + prop.Name + "\"", x => (DateTime?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type decimalType when decimalType == typeof(decimal) || decimalType == typeof(decimal?):
                        {
                            helper = helper.MapMoney("\"" + prop.Name + "\"", x => (decimal?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type doubleType when doubleType == typeof(double) || doubleType == typeof(double?):
                        {
                            helper = helper.MapDouble("\"" + prop.Name + "\"", x => (double?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type floatType when floatType == typeof(float) || floatType == typeof(float?):
                        {
                            helper = helper.MapReal("\"" + prop.Name + "\"", x => (float?)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                    case Type guidType when guidType == typeof(Guid):
                        {
                            helper = helper.MapUUID("\"" + prop.Name + "\"", x => (Guid)typeof(T).GetProperty(prop.Name).GetValue(x, null));
                            break;
                        }
                }
            }
            return helper;
        }

I use it the following way (I had entity named Undertaking):

var undertakingHelper = BulkMapper.CreateHelper<Model.Undertaking>("dbo", nameof(Model.Undertaking));
undertakingHelper.SaveAll(transaction.UnderlyingTransaction.Connection as Npgsql.NpgsqlConnection, undertakingsToAdd));

I showed an example with transaction, but it can also be done with normal connection retrieved from context. undertakingsToAdd is enumerable of normal entity records, which I want to bulkInsert into DB.

This solution, to which I've got after few hours of research and trying, is as you could expect much faster and finally easy to use and free! I really advice you to use this solution, not only for the reasons mentioned above, but also because it's the only one with which I had no problems with Postgresql itself, many other solutions work flawlessly for example with SqlServer.





entity-framework