Find duplicate values in MySQL


Answers

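-- `table` is a placeholder name; it is a reserved word in MySQL,
-- so keep the backticks or substitute your own table name.
SELECT varchar_col
FROM `table`
GROUP BY varchar_col
HAVING COUNT(*) > 1;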
Question

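I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?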




CREATE TABLE tbl_master
    (`id` int, `email` varchar(15));

INSERT INTO tbl_master
    (`id`, `email`) VALUES
    (1, 'test1@gmail.com'),
    (2, 'test2@gmail.com'),
    (3, 'test1@gmail.com'),
    (4, 'test2@gmail.com'),
    (5, 'test5@gmail.com');

-- Query: every row whose email appears more than once
SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1);



SELECT * 
FROM `dps` 
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)



Taking @maxyfc's answer further, I needed to find all of the rows that were returned with the duplicate values, so that I could edit them in MySQL Workbench:

SELECT * FROM `table`
   WHERE field IN (
     SELECT field FROM `table` GROUP BY field HAVING count(*) > 1
   ) ORDER BY field



SELECT 
    t.*,
    (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
FROM `city` AS t 
WHERE 
    (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC



Building on levik's answer: to get the IDs of the duplicate rows, you can use GROUP_CONCAT if your server supports it (this returns a comma-separated list of ids).

SELECT GROUP_CONCAT(id), name, COUNT(*) c FROM documents GROUP BY name HAVING c > 1;



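A very late contribution, in case it helps anyone down the line. I had a task to find matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the 'from' and the 'to' for each inter-account transfer, so we ended up with this: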

SELECT 
    LEAST(primaryid, secondaryid) AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
    SELECT table1.transactionid AS primaryid, 
        table2.transactionid AS secondaryid
    FROM financial_transactions table1
    INNER JOIN financial_transactions table2 
    ON table1.accountid = table2.accountid
    AND table1.transactionid <> table2.transactionid 
    AND table1.transactiondate = table2.transactiondate
    AND table1.sourceref = table2.destinationref
    AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;

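The result is that DuplicateResultsTable provides rows containing matching (i.e. duplicate) transactions, but it also provides the same transaction ids in reverse the second time it matches the same pair. The outer SELECT is there to group by the first transaction id: LEAST and GREATEST make sure the two transaction ids always appear in the same order in the results, which makes it safe to GROUP BY the first one and so eliminates all the duplicate matches. It ran through nearly a million records and identified more than 12,000 matches in just under 2 seconds. Of course, transactionid is the primary index, which really helped.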
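The role of LEAST and GREATEST described above can be seen on a tiny example. This is only a sketch: the `matched_pairs` table and its values are made up for illustration and stand in for the inner join's output, where every match shows up twice, once in each direction.

```sql
-- Hypothetical stand-in for the inner join's result: each matching pair
-- appears twice, once as (a, b) and once as (b, a).
CREATE TABLE matched_pairs (primaryid INT, secondaryid INT);

INSERT INTO matched_pairs (primaryid, secondaryid) VALUES
    (101, 205), (205, 101),
    (340, 112), (112, 340);

-- LEAST/GREATEST put the smaller id first in both copies, so the two
-- mirrored rows become identical and DISTINCT collapses them into one.
SELECT DISTINCT
    LEAST(primaryid, secondaryid)    AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM matched_pairs
ORDER BY transactionid1;
```

With this data the query returns two rows, (101, 205) and (112, 340). SELECT DISTINCT is used here rather than GROUP BY transactionid1 so that the sketch does not depend on the ONLY_FULL_GROUP_BY setting; that is a choice made for the illustration, not part of the original answer.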







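I don't see any JOIN approaches here, and they have many uses when it comes to duplicates.

This approach gives you the actual duplicated rows.

SELECT t1.* FROM `table` AS t1 LEFT JOIN `table` AS t2 ON t1.name=t2.name AND t1.id!=t2.id WHERE t2.id IS NOT NULL ORDER BY t1.name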
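One caveat with the join approach: when a value occurs three or more times, each row matches several partners and is returned once per partner. A possible variant, sketched here against a hypothetical `people` table with made-up rows, adds DISTINCT so that each duplicated row is listed once.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE people (id INT, name VARCHAR(20));

INSERT INTO people (id, name) VALUES
    (1, 'ann'), (2, 'ann'), (3, 'ann'), (4, 'bob');

-- An INNER JOIN is equivalent to LEFT JOIN ... WHERE t2.id IS NOT NULL.
-- DISTINCT keeps each duplicated row once, however many partners it has.
SELECT DISTINCT t1.*
FROM people AS t1
INNER JOIN people AS t2
    ON t1.name = t2.name AND t1.id <> t2.id
ORDER BY t1.name, t1.id;
```

With this data the query returns the three 'ann' rows (ids 1, 2 and 3); without DISTINCT each of them would appear twice.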



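The queries above work fine if you need to check for duplicates in a single column, for example email.

But if you need to check several columns and want to find duplicated combinations of their values, this query will do it: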

-- Lists each name/email combination that occurs more than once,
-- together with its count.
SELECT COUNT(CONCAT(name,email)) AS tot,
       name,
       email
FROM users
GROUP BY CONCAT(name,email)
HAVING tot>1;
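One thing to watch with CONCAT: different combinations can produce the same concatenated string, and CONCAT returns NULL if any argument is NULL. A possible alternative, sketched below, groups on the columns themselves. The schema and rows of `users` are assumptions made up for the illustration.

```sql
-- Hypothetical schema and data for illustration only.
CREATE TABLE users (id INT, name VARCHAR(20), email VARCHAR(30));

INSERT INTO users (id, name, email) VALUES
    (1, 'ab', 'c@example.com'),
    (2, 'ab', 'c@example.com'),
    (3, 'a',  'bc@example.com'),  -- same CONCAT(name, email) as rows 1 and 2
    (4, 'dd', 'e@example.com');

-- Grouping on both columns treats row 3 as a different combination,
-- even though its concatenated string equals that of rows 1 and 2.
SELECT name, email, COUNT(*) AS tot
FROM users
GROUP BY name, email
HAVING tot > 1;
```

With this data the query returns a single row: ('ab', 'c@example.com', 2).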


