[.net] Which method performs better: .Any() vs .Count() > 0?


Answers

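Note: I wrote this answer when Entity Framework 4 was current. The point of this answer was not a trivial .Any() vs .Count() performance test. The point was to show that EF is far from perfect. Newer versions are better... but if you have a slow piece of code that uses EF, test it with direct TSQL and compare performance rather than relying on assumptions (i.e. that .Any() is ALWAYS faster than .Count() > 0).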

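While I agree with most answers and comments, especially the point that Any signals developer intent better than Count() > 0, I've had a situation in which Count is faster by an order of magnitude on SQL Server (EntityFramework 4).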

Here is the query with Any that throws a timeout exception (on ~200,000 records):

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && !a.NewsletterLogs.Any(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr)
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

The Count version executes in a matter of milliseconds:

con = db.Contacts.
    Where(a => a.CompanyId == companyId && a.ContactStatusId <= (int) Const.ContactStatusEnum.Reactivated
        && a.NewsletterLogs.Count(b => b.NewsletterLogTypeId == (int) Const.NewsletterLogTypeEnum.Unsubscr) == 0
    ).OrderBy(a => a.ContactId).
    Skip(position - 1).
    Take(1).FirstOrDefault();

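I need to find a way to see the exact SQL that LINQ produces, but it's evident that in some cases there is a huge performance difference between Count and Any, and unfortunately it seems you can't just stick with Any in all cases.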

EDIT: Here is the generated SQL. Beauties, as you can see ;)

ANY

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[ContactId] AS [ContactId], 
        [Extent1].[CompanyId] AS [CompanyId], 
        [Extent1].[ContactName] AS [ContactName], 
        [Extent1].[FullName] AS [FullName], 
        [Extent1].[ContactStatusId] AS [ContactStatusId], 
        [Extent1].[Created] AS [Created]
        FROM [dbo].[Contact] AS [Extent1]
        WHERE ([Extent1].[CompanyId] = @p__linq__0) AND ([Extent1].[ContactStatusId] <= 3) AND ( NOT EXISTS (SELECT 
            1 AS [C1]
            FROM [dbo].[NewsletterLog] AS [Extent2]
            WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])
        ))
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

COUNT

exec sp_executesql N'SELECT TOP (1) 
[Project2].[ContactId] AS [ContactId], 
[Project2].[CompanyId] AS [CompanyId], 
[Project2].[ContactName] AS [ContactName], 
[Project2].[FullName] AS [FullName], 
[Project2].[ContactStatusId] AS [ContactStatusId], 
[Project2].[Created] AS [Created]
FROM ( SELECT [Project2].[ContactId] AS [ContactId], [Project2].[CompanyId] AS [CompanyId], [Project2].[ContactName] AS [ContactName], [Project2].[FullName] AS [FullName], [Project2].[ContactStatusId] AS [ContactStatusId], [Project2].[Created] AS [Created], row_number() OVER (ORDER BY [Project2].[ContactId] ASC) AS [row_number]
    FROM ( SELECT 
        [Project1].[ContactId] AS [ContactId], 
        [Project1].[CompanyId] AS [CompanyId], 
        [Project1].[ContactName] AS [ContactName], 
        [Project1].[FullName] AS [FullName], 
        [Project1].[ContactStatusId] AS [ContactStatusId], 
        [Project1].[Created] AS [Created]
        FROM ( SELECT 
            [Extent1].[ContactId] AS [ContactId], 
            [Extent1].[CompanyId] AS [CompanyId], 
            [Extent1].[ContactName] AS [ContactName], 
            [Extent1].[FullName] AS [FullName], 
            [Extent1].[ContactStatusId] AS [ContactStatusId], 
            [Extent1].[Created] AS [Created], 
            (SELECT 
                COUNT(1) AS [A1]
                FROM [dbo].[NewsletterLog] AS [Extent2]
                WHERE ([Extent1].[ContactId] = [Extent2].[ContactId]) AND (6 = [Extent2].[NewsletterLogTypeId])) AS [C1]
            FROM [dbo].[Contact] AS [Extent1]
        )  AS [Project1]
        WHERE ([Project1].[CompanyId] = @p__linq__0) AND ([Project1].[ContactStatusId] <= 3) AND (0 = [Project1].[C1])
    )  AS [Project2]
)  AS [Project2]
WHERE [Project2].[row_number] > 99
ORDER BY [Project2].[ContactId] ASC',N'@p__linq__0 int',@p__linq__0=4

It seems that a plain Where with (NOT) EXISTS works much worse than calculating the Count first and then doing a Where on Count == 0.

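Let me know if you guys see any error in my findings. What can be taken from all this, regardless of the Any vs Count discussion, is that any more complex LINQ is way better off rewritten as a stored procedure ;).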

Question

System.Linq命名空间中,我们现在可以扩展IEnumerableAny()Count() 扩展方法

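I was told recently that if I want to check that a collection contains one or more items, I should use the .Any() extension method instead of .Count() > 0, because the .Count() extension method has to iterate through all the items.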
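The "Count() iterates everything, Any() stops early" claim can be checked directly in LINQ to Objects. The sketch below assumes a lazily produced in-memory sequence (not an EF query, which is translated to SQL instead); EnumerationProbe, Tracked and PullsFor are hypothetical helpers made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only: reports how many elements a LINQ call pulls.
public static class EnumerationProbe
{
    // Lazily yields 0..n-1 and calls onPull just before producing each element.
    public static IEnumerable<int> Tracked(int n, Action onPull)
    {
        for (int i = 0; i < n; i++)
        {
            onPull();
            yield return i;
        }
    }

    // Runs a check against a tracked sequence of n items and returns how many items it pulled.
    public static int PullsFor(Func<IEnumerable<int>, bool> check, int n)
    {
        int pulls = 0;
        check(Tracked(n, () => pulls++));
        return pulls;
    }
}
```

For example, EnumerationProbe.PullsFor(s => s.Any(), 1000) returns 1, while EnumerationProbe.PullsFor(s => s.Count() > 0, 1000) returns 1000, because a compiler-generated iterator is not an ICollection and Count() has to walk it to the end.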

Secondly, some collections have a property (not an extension method) called Count or Length. Would it be better to use those instead of .Any() or .Count()?

Yea/nay?
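A minimal sketch of the second question, assuming in-memory collections only: List<T> and arrays already store their size, so reading Count or Length is a constant-time field read, while a lazily produced sequence has no stored size and Any() is the cheap check there. SizeChecks and HasItems are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: picks the cheapest "is it non-empty?" check per collection type.
public static class SizeChecks
{
    // List<T> keeps its size in a field, so the Count property is O(1).
    public static bool HasItems<T>(List<T> list) => list.Count > 0;

    // Arrays expose their fixed size through Length, also O(1).
    public static bool HasItems<T>(T[] array) => array.Length > 0;

    // A lazy sequence has no stored size; Any() stops after the first element.
    public static bool HasItems<T>(IEnumerable<T> source) => source.Any();
}
```

Overload resolution picks the List<T> or T[] version when the static type is known; anything typed as IEnumerable<T> falls back to Any().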




EDIT: This was fixed in EF version 6.1.1, so this answer is no longer relevant.

For SQL Server and EF4-6, Count() performs about two times faster than Any().

When you run Table.Any(), it will generate something like this (warning: don't hurt your brain trying to understand it):

SELECT 
CASE WHEN ( EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent1]
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 
    1 AS [C1]
    FROM [Table] AS [Extent2]
)) THEN cast(0 as bit) END AS [C1]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]

This requires 2 scans of the rows matching your condition.

I don't like writing Count() > 0 because it hides my intention. I prefer to use a custom predicate for this:

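using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
    // Reads like Any(predicate) at the call site, but uses Queryable.Count(predicate) > 0,
    // which EF4-6 translates to a COUNT query instead of EXISTS
    public static bool Exists<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
    {
        return source.Count(predicate) > 0;
    }
}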



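About the Count() method: if the IEnumerable is an ICollection, we don't need to iterate across all items, because we can read the ICollection's Count property. If the IEnumerable is not an ICollection, we must iterate across all items using a while loop with MoveNext. Take a look at the .NET Framework code: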

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) 
        throw Error.ArgumentNull("source");

    ICollection<TSource> collectionoft = source as ICollection<TSource>;
    if (collectionoft != null) 
        return collectionoft.Count;

    ICollection collection = source as ICollection;
    if (collection != null) 
        return collection.Count;

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext()) count++;
        }
    }
    return count;
}

Reference: Reference Source, Enumerable
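For comparison, here is a simplified sketch of how an Any() check can short-circuit, written in the same style as the Count() code above. It is not the framework's actual source (newer .NET versions also add fast paths, e.g. for ICollection<T>); AnySketch is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SimplifiedLinq
{
    // Simplified sketch, not the real Enumerable.Any():
    // it asks the enumerator for at most one element.
    public static bool AnySketch<TSource>(this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            // true as soon as one element exists; the rest is never touched
            return e.MoveNext();
        }
    }
}
```

Because it calls MoveNext at most once, the cost does not depend on the length of the sequence, which is the reason Any() is usually preferred for a non-ICollection IEnumerable.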




It depends on how big the data set is and what your performance requirements are.

If it's nothing gigantic, use the most readable form, which for me is Any, because it is shorter and more readable than an equation.



