mysql - one - sql server select row with max value




SQL select only rows with max value on a column (20)

I have this table for documents (simplified version here):

+------+-------+--------------------------------------+
| id   | rev   | content                              |
+------+-------+--------------------------------------+
| 1    | 1     | ...                                  |
| 2    | 1     | ...                                  |
| 1    | 2     | ...                                  |
| 1    | 3     | ...                                  |
+------+-------+--------------------------------------+

How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ..]. I'm using MySQL.

Currently I use checks in the while loop to detect and over-write old revs from the resultset. But is this the only method to achieve the result? Isn't there a SQL solution?

Update
As the answers suggest, there is a SQL solution, and here a sqlfiddle demo.

Update 2
I noticed after adding the above sqlfiddle, the rate at which the question is upvoted has surpassed the upvote rate of the answers. That has not been the intention! The fiddle is based on the answers, especially the accepted answer.


At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one the questions I answered in my current job's technical interview.

It is, actually, so common that community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple group-identifier, max-value-in-group Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality, of course, goes in the group-identifier. Then, 2 smart moves:

  1. The second join condition is having left side value less than right value
  2. When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

Conclusion

Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.

Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.


NOT mySQL, but for other people finding this question and using SQL, another way to resolve the greatest-n-per-group problem is using Cross Apply in MS SQL

WITH DocIds AS (SELECT DISTINCT id FROM docs)

SELECT d2.id, d2.rev, d2.content
FROM DocIds d1
CROSS APPLY (
  SELECT Top 1 * FROM docs d
  WHERE d.id = d1.id
  ORDER BY rev DESC
) d2

Here's an example in SqlFiddle


Another manner to do the job is using MAX() analytic function in OVER PARTITION clause

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,contents
          ,MAX(rev) OVER (PARTITION BY id) as max_rev
      FROM YourTable
    ) t
  WHERE t.rev = t.max_rev 

The other OVER PARTITION solution already documented in this post is

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,contents
          ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rank
      FROM YourTable
    ) t
  WHERE t.rank = 1 

This 2 SELECT work well on Oracle 10g.


Here is a nice way of doing that

Use following code :

with temp as  ( 
select count(field1) as summ , field1
from table_name
group by field1 )
select * from temp where summ = (select max(summ) from temp)

How about this:

select all_fields.*  
from  (select id, MAX(rev) from yourtable group by id) as max_recs  
left outer join yourtable as all_fields  
on max_recs.id = all_fields.id

I am flabbergasted that no answer offered SQL window function solution:

SELECT a.id, a.rev, a.contents
  FROM (SELECT id, rev, contents,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rank
          FROM YourTable) a
 WHERE a.rank = 1 

Added in SQL standard ANSI/ISO Standard SQL:2003 and later extended with ANSI/ISO Standard SQL:2008, window (or windowing) functions are available with all major vendors now. There are more types of rank functions available to deal with a tie issue: RANK, DENSE_RANK, PERSENT_RANK.


I like to do this by ranking the records by some column. In this case, rank rev values grouped by id. Those with higher rev will have lower rankings. So highest rev will have ranking of 1.

select id, rev, content
from
 (select
    @rowNum := if(@prevValue = id, @rowNum+1, 1) as row_num,
    id, rev, content,
    @prevValue := id
  from
   (select id, rev, content from YOURTABLE order by id asc, rev desc) TEMP,
   (select @rowNum := 1 from DUAL) X,
   (select @prevValue := -1 from DUAL) Y) TEMP
where row_num = 1;

Not sure if introducing variables makes the whole thing slower. But at least I'm not querying YOURTABLE twice.


I like to use a NOT EXIST-based solution for this problem:

SELECT id, rev
FROM YourTable t
WHERE NOT EXISTS (
   SELECT * FROM YourTable t WHERE t.id = id AND rev > t.rev
)

I used the below to solve a problem of my own. I first created a temp table and inserted the max rev value per unique id.

CREATE TABLE #temp1
(
    id varchar(20)
    , rev int
)
INSERT INTO #temp1
SELECT a.id, MAX(a.rev) as rev
FROM 
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as a 
GROUP BY a.id
ORDER BY a.id

I then joined these max values (#temp1) to all of the possible id/content combinations. By doing this, I naturally filter out the non-maximum id/content combinations, and am left with the only max rev values for each.

SELECT a.id, a.rev, content
FROM #temp1 as a
LEFT JOIN
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as b on a.id = b.id and a.rev = b.rev
GROUP BY a.id, a.rev, b.content
ORDER BY a.id

I would use this:

select t.*
from test as t
join
   (select max(rev) as rev
    from test
    group by id) as o
on o.rev = t.rev

Subquery SELECT is not too eficient maybe, but in JOIN clause seems to be usable. I'm not an expert in optimizing queries, but I've tried at MySQL, PostgreSQL, FireBird and it does work very good.

You can use this schema in multiple joins and with WHERE clause. It is my working example (solving identical to yours problem with table "firmy"):

select *
from platnosci as p
join firmy as f
on p.id_rel_firmy = f.id_rel
join (select max(id_obj) as id_obj
      from firmy
      group by id_rel) as o
on o.id_obj = f.id_obj and p.od > '2014-03-01'

It is asked on tables having teens thusands of records, and it takes less then 0,01 second on really not too strong machine.

I wouldn't use IN clause (as it is mentioned somewhere above). IN is given to use with short lists of constans, and not as to be the query filter built on subquery. It is because subquery in IN is performed for every scanned record which can made query taking very loooong time.


If you have many fields in select statement and you want latest value for all of those fields through optimized code:

select * from
(select * from table_name
order by id,rev desc) temp
group by id 

My preference is to use as little code as possible...

You can do it using IN try this:

SELECT * 
FROM t1 WHERE (id,rev) IN 
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)

to my mind it is less complicated... easier to read and maintain.


SELECT * FROM Employee where Employee.Salary in (select max(salary) from Employee group by Employe_id) ORDER BY Employee.Salary


Since this is most popular question with regard to this problem, I'll re-post another answer to it here as well:

It looks like there is simpler way to do this (but only in MySQL):

select *
from (select * from mytable order by id, rev desc ) x
group by id

Please credit answer of user Bohemian in this question for providing such a concise and elegant answer to this problem.

EDIT: though this solution works for many people it may not be stable in the long run, since MySQL doesn't guarantee that GROUP BY statement will return meaningful values for columns not in GROUP BY list. So use this solution at your own risk


Sorted the rev field in reverse order and then grouped by id which gave the first row of each grouping which is the one with the highest rev value.

SELECT * FROM (SELECT * FROM table1 ORDER BY id, rev DESC) X GROUP BY X.id;

Tested in http://sqlfiddle.com/ with the following data

CREATE TABLE table1
    (`id` int, `rev` int, `content` varchar(11));

INSERT INTO table1
    (`id`, `rev`, `content`)
VALUES
    (1, 1, 'One-One'),
    (1, 2, 'One-Two'),
    (2, 1, 'Two-One'),
    (2, 2, 'Two-Two'),
    (3, 2, 'Three-Two'),
    (3, 1, 'Three-One'),
    (3, 3, 'Three-Three')
;

This gave the following result in MySql 5.5 and 5.6

id  rev content
1   2   One-Two
2   2   Two-Two
3   3   Three-Two

This solution makes only one selection from YourTable, therefore it's faster. It works only for MySQL and SQLite(for SQLite remove DESC) according to test on sqlfiddle.com. Maybe it can be tweaked to work on other languages which I am not familiar with.

SELECT *
FROM ( SELECT *
       FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
              UNION
              SELECT 2, 1, 'content2'
              UNION
              SELECT 1, 2, 'content3'
              UNION
              SELECT 1, 3, 'content4'
            ) as YourTable
       ORDER BY id, rev DESC
   ) as YourTable
GROUP BY id

Yet another solution is to use a correlated subquery:

select yt.id, yt.rev, yt.contents
    from YourTable yt
    where rev = 
        (select max(rev) from YourTable st where yt.id=st.id)

Having an index on (id,rev) renders the subquery almost as a simple lookup...

Following are comparisons to the solutions in @AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements with InnoDB table of ~1million records, group size being: 1-3.

While for full table scans subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batch (id in (1,2,3)), subquery is much slower then the others (Due to rerunning the subquery). However I couldnt differentiate between leftjoin and correlated solutions in speed.

One final note, as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...


You can make the select without a join when you combine the rev and id into one maxRevId value for MAX() and then split it back to original values:

SELECT maxRevId & ((1 << 32) - 1) as id, maxRevId >> 32 AS rev
FROM (SELECT MAX(((rev << 32) | id)) AS maxRevId
      FROM YourTable
      GROUP BY id) x;

This is especially fast when there is a complex join instead of a single table. With the traditional approaches the complex join would be done twice.

The above combination is simple with bit functions when rev and id are INT UNSIGNED (32 bit) and combined value fits to BIGINT UNSIGNED (64 bit). When the id & rev are larger than 32-bit values or made of multiple columns, you need combine the value into e.g. a binary value with suitable padding for MAX().


SELECT * FROM t1 ORDER BY rev DESC LIMIT 1;

select * from yourtable
group by id
having rev=max(rev);




greatest-n-per-group