sql - Get the last dates from multiple columns




sql-server datetime (4)

This feels like it should be an easy one. How do I get the latest dates that are in different columns

DROP TABLE #indebtedness
CREATE TABLE #indebtedness (call_case CHAR(10), date1 DATETIME, date2 DATETIME, date3 DATETIME)
INSERT #indebtedness VALUES ('Key1', '2019-10-30', '2019-11-30', '2019-10-25')
INSERT #indebtedness VALUES ('Key2', '2019-10-20', '2019-10-30', '2019-10-15')
INSERT #indebtedness VALUES ('Key3', '2019-11-11', '2019-10-29', '2019-10-30')
INSERT #indebtedness VALUES ('Key4',     null    , '2019-10-29', '2019-10-13')

select call_case, ?? AS 'Latest Date' from #indebtedness 

I would like the result to be:

call_case   Latest Date
Key1        2019-11-30 
Key2        2019-10-30 
Key3        2019-11-11 
Key4        2019-10-29 

In my view, Pivot is the best and efficient option for this query. Copy and Paste in the MS SQL SERVER. Please check the code written below:

CREATE TABLE #indebtedness (call_case CHAR(10), date1 DATETIME, date2 DATETIME, date3 DATETIME)
INSERT #indebtedness VALUES ('Key1', '2019-10-30', '2019-11-30', '2019-10-31')
INSERT #indebtedness VALUES ('Key2', '2019-10-20', '2019-10-30', '2019-11-21')
INSERT #indebtedness VALUES ('Key3', '2019-11-11', '2019-10-29', '2019-10-30')
INSERT #indebtedness VALUES ('Key4', Null, '2019-10-29', '2019-10-13')

--Solution-1:
SELECT        
    call_case,
    MAX(RecnetDate) as MaxDateColumn         
FROM #indebtedness
UNPIVOT
(RecnetDate FOR COL IN ([date1], [date2], [date3])) as TRANSPOSE
GROUP BY call_case 

--Solution-2:
select 
    call_case, case 
    when date1>date2 and date1 > date3 then date1
    when date2>date3                   then date2
    when date3>date1                   then date1 
   else date3 end as date
from #indebtedness as a 


Drop table #indebtedness

The currently accepted answer is the best answer, but I don't think it does a good enough job of explaining why. The other answers certainly look much cleaner at a glance (who wants to write that ugly case statement), but are likely to be much worse when you start operating at scale.

SELECT @@VERSION

Microsoft SQL Server 2016 (SP2) (KB4052908) - 13.0.5026.0 (X64) 
Mar 18 2018 09:11:49 
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 17763: )

Here is how I set everything up

DECLARE @Offset bigint = 0;
DECLARE @Max bigint = 10000000;

DROP TABLE IF EXISTS #Indebtedness;
CREATE TABLE #Indebtedness
(
  call_case char(10) COLLATE DATABASE_DEFAULT NOT NULL,
  date1     datetime NULL,
  date2     datetime NULL,
  date3     datetime NULL
);

WHILE @Offset < @Max
BEGIN

  INSERT INTO #Indebtedness
  ( call_case, date1, date2, date3 )
    SELECT @Offset + ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP ),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP ),
           DATEADD( DAY,
                    CASE WHEN RAND() > 0 THEN 1
                         ELSE -1 END * ROUND( RAND(), 0 ),
                    CURRENT_TIMESTAMP )
      FROM master.dbo.spt_values a
        CROSS APPLY master.dbo.spt_values b;


  SET @Offset = @Offset + ROWCOUNT_BIG();
END;

On my system this gets me 12,872,738 rows in the table. If I try each of the above queries (tweaked to SELECT INTO so I don't need to wait for it to finish printing the results in SSMS), I get the following results:

Method                                | CPU time (ms) | Elapsed time (ms) | Relative Cost
-----------------------------------------------------------------------------------------
Tim Biegeleisen (CASE)                | 13485         | 2167              | 2%
Red Devil (Subquery over MAX columns) | 55187         | 9891              | 14%
Vignesh Kumar (Subquery over columns) | 33750         | 5139              | 5%
Serkan Arslan (UNPIVOT)               | 86205         | 15023             | 12%
Metal (STRING_SPLIT)                  | 459668        | 186742            | 68%

If you look at the query plans, it becomes pretty obvious why - adding any kind of unpivot or aggregate (or heaven forbid STRING_SPLIT ) you'll end up with all sorts of additional operators that you don't need (and it forces the plan to go parallel, taking away resources other queries might want). By contract, the CASE based solution does not go parallel, runs very quickly, and is incredibly simple.

In this case, unless you have unlimited resources (you don't), you should pick the simplest and fastest approach.

There was a question of what to do if you need to keep adding new columns and expanding the case statement. Yes, this gets unwieldy, but so does every other solution. If this is actually a plausible workflow, then you should redesign your table. What you want probably looks something like this:

CREATE TABLE #Indebtedness2
(
  call_case     char(10) COLLATE DATABASE_DEFAULT NOT NULL,
  activity_type bigint   NOT NULL,  -- This indicates which date# column it was, if you care
  timestamp     datetime NOT NULL
);

SELECT Indebtedness.call_case,
       Indebtedness.activity_type,
       Indebtedness.timestamp
  FROM ( SELECT call_case,
                activity_type,
                timestamp,
                ROW_NUMBER() OVER ( PARTITION BY call_case
                                    ORDER BY timestamp DESC ) RowNumber
           FROM #Indebtedness2 ) Indebtedness
  WHERE Indebtedness.RowNumber = 1;

This is certainly not free of potential performance issues, and will require careful index tuning, but is the best way to handle an arbitrary number of potential timestamps

In case any answers get deleted, here are the versions I was comparing (in order)

SELECT
    call_case,
    CASE WHEN date1 > date2 AND date1 > date3
         THEN date1
         WHEN date2 > date3
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;

SELECT call_case,
  (SELECT Max(v) 
   FROM (VALUES (date1), (date2), (date3),...) AS value(v)) as [MostRecentDate]
FROM #indebtedness

SELECT call_case,
  (SELECT
     MAX(call_case) 
   FROM ( VALUES 
            (MAX(date1)), 
            (MAX(date2))
            ,(max(date3)) 
        ) MyAlias(call_case)
  ) 
FROM #indebtedness
group by call_case

select call_case, MAX(date)  [Latest Date] from #indebtedness 
UNPIVOT(date FOR col IN ([date1], [date2], [date3])) UNPVT
GROUP BY call_case

select call_case , max(cast(x.Item as date)) as 'Latest Date' from #indebtedness  t
cross apply dbo.SplitString(concat(date1, ',', date2, ',', date3), ',') x
group by call_case

Use a CASE expression:

SELECT
    call_case,
    CASE WHEN date1 > date2 AND date1 > date3
         THEN date1
         WHEN date2 > date3
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;

Note that some databases, such as MySQL, SQL Server, and SQLite, support a scalar greatest function. SQL Server does not, so we can use a CASE expression as a workaround.

Edit:

It appears that in your actual table, one or more of the three date columns could have NULL values. We can adapt the above query as follows:

SELECT
    call_case,
    CASE WHEN (date1 > date2 OR date2 IS NULL) AND (date1 > date3 OR date3 IS NULL)
         THEN date1
         WHEN date2 > date3 OR date3 IS NULL
         THEN date2
         ELSE date3 END AS [Latest Date]
FROM #indebtedness;


SQL FIDDLE

Use MAX()

SELECT call_case,
  (SELECT Max(v) 
   FROM (VALUES (date1), (date2), (date3),...) AS value(v)) as [MostRecentDate]
FROM #indebtedness

Use CASE

 SELECT
        CASE
            WHEN Date1 >= Date2 AND Date1 >= Date3 THEN Date1
            WHEN Date2 >= Date1 AND Date2 >= Date3 THEN Date2
            WHEN Date3 >= Date1 AND Date3 >= Date2 THEN Date3
            ELSE                                        Date1
        END AS MostRecentDate
 FROM  #indebtedness




datetime