



Get values from first and last row per group

I'm new to Postgres, coming from MySQL, and I'm hoping one of you can help me out.

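I have a table with three columns: name, week, and value. It records names, the week in which each height was recorded, and the height value. Something like this: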

Name  |  Week  | Value
------+--------+-------
John  |  1     | 9
Cassie|  2     | 5
Luke  |  6     | 3
John  |  8     | 14
Cassie|  5     | 7
Luke  |  9     | 5
John  |  2     | 10
Cassie|  4     | 4
Luke  |  7     | 4

What I want is a list of each person's value at their lowest and at their highest week. Something like this:

Name  |minWeek | Value |maxWeek | value
------+--------+-------+--------+-------
John  |  1     | 9     | 8      | 14
Cassie|  2     | 5     | 5      | 7
Luke  |  6     | 3     | 9      | 5
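To pin down the expected result, here is a small Python reference sketch that computes the table above from the sample rows. It is not part of the original question. It assumes week is unique per name, and the helper name first_last_per_name is hypothetical. Python tuples compare element by element, so min() and max() over (week, value) pairs pick the rows with the lowest and highest week.

```python
from collections import defaultdict

# Sample rows from the question: (name, week, value)
ROWS = [
    ("John", 1, 9), ("Cassie", 2, 5), ("Luke", 6, 3),
    ("John", 8, 14), ("Cassie", 5, 7), ("Luke", 9, 5),
    ("John", 2, 10), ("Cassie", 4, 4), ("Luke", 7, 4),
]


def first_last_per_name(rows):
    """Return {name: (min_week, value_at_min, max_week, value_at_max)}."""
    groups = defaultdict(list)
    for name, week, value in rows:
        groups[name].append((week, value))
    result = {}
    for name, pairs in groups.items():
        # Tuples compare by week first, so min/max select the rows with the
        # lowest and highest week (assumes week is unique per name).
        first = min(pairs)
        last = max(pairs)
        result[name] = (first[0], first[1], last[0], last[1])
    return result


if __name__ == "__main__":
    for name, row in first_last_per_name(ROWS).items():
        print(name, *row)
```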

In Postgres, I use this query:

select name, week, value
from table t
inner join(
select name, min(week) as minweek
from table
group by name)
ss on t.name = ss.name and t.week = ss.minweek
group by t.name
;

However, I get an error:

column "w.week" must appear in the GROUP BY clause or be used in an aggregate function
Position: 20

This works fine in MySQL, so I'm wondering what I'm doing wrong here.


This is a bit of a pain, because Postgres has the nice window functions first_value() and last_value(), but these are not aggregate functions. So, here is one method:

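select t.name, min(t.week) as minWeek, max(firstvalue) as firstValue,
       max(t.week) as maxWeek, max(lastvalue) as lastValue
from (select t.*,
             first_value(value) over (partition by name order by week) as firstvalue,
             -- the frame must reach the end of the partition: with the default
             -- frame (ending at the current row) last_value() just returns
             -- the current row's value
             last_value(value) over (partition by name order by week
                                     rows between unbounded preceding and unbounded following) as lastvalue
      from tbl t
     ) t
group by t.name;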
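A runnable sketch of the same "window functions, then GROUP BY" approach. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and assumes SQLite 3.25 or later for window function support. The table name tbl and the helpers make_db and first_last are illustrative. The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame is what makes LAST_VALUE() see the whole partition.

```python
import sqlite3

# Sample rows from the question: (name, week, value)
ROWS = [
    ("John", 1, 9), ("Cassie", 2, 5), ("Luke", 6, 3),
    ("John", 8, 14), ("Cassie", 5, 7), ("Luke", 9, 5),
    ("John", 2, 10), ("Cassie", 4, 4), ("Luke", 7, 4),
]

QUERY = """
SELECT name,
       MIN(week)      AS min_week,
       MAX(first_val) AS min_week_value,  -- same on every row of a group
       MAX(week)      AS max_week,
       MAX(last_val)  AS max_week_value   -- same on every row of a group
FROM (
    SELECT name, week,
           FIRST_VALUE(value) OVER (PARTITION BY name ORDER BY week) AS first_val,
           -- explicit frame: the default frame ends at the current row
           LAST_VALUE(value) OVER (
               PARTITION BY name ORDER BY week
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_val
    FROM tbl
) AS sub
GROUP BY name
ORDER BY name
"""


def make_db(rows):
    """Build an in-memory SQLite table mirroring the question's layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (name TEXT, week INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    return conn


def first_last(conn):
    """Return (name, min_week, value, max_week, value) tuples, sorted by name."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in first_last(make_db(ROWS)):
        print(row)
```

With the question's sample data, max(value) happens to equal the value at the highest week for every name, so a missing frame clause would not show up there. It does show up as soon as a person's value drops in their last week.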

There are various simpler and faster ways.

2x DISTINCT ON

SELECT *
FROM  (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
   FROM   tbl
   ORDER  BY name, week
   ) f
JOIN (
   SELECT DISTINCT ON (name)
          name, week AS last_week, value AS last_val
   FROM   tbl
   ORDER  BY name, week DESC
   ) l USING (name);

Or shorter:

SELECT *
FROM  (SELECT DISTINCT ON (1) name, week AS first_week, value AS first_val
       FROM   tbl ORDER BY 1,2) f
JOIN  (SELECT DISTINCT ON (1) name, week AS last_week, value AS last_val
       FROM   tbl ORDER BY 1,2 DESC) l USING (name);

Simple and easy to understand. Also the fastest in my tests. Detailed explanation for DISTINCT ON:
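DISTINCT ON is Postgres-specific. In databases without it, a common portable substitute is ROW_NUMBER() filtered to 1. The sketch below swaps in that technique and mirrors the f / l join of the "2x DISTINCT ON" query. It runs on SQLite through Python's sqlite3 module, assuming SQLite 3.25+ for window functions, and the names are illustrative. As with DISTINCT ON without an extra tiebreaker, if two rows share the same week for a name, an arbitrary one is picked.

```python
import sqlite3

# Sample rows from the question: (name, week, value)
ROWS = [
    ("John", 1, 9), ("Cassie", 2, 5), ("Luke", 6, 3),
    ("John", 8, 14), ("Cassie", 5, 7), ("Luke", 9, 5),
    ("John", 2, 10), ("Cassie", 4, 4), ("Luke", 7, 4),
]

QUERY = """
WITH ranked AS (
    SELECT name, week, value,
           -- 1 marks the lowest week per name (like ORDER BY name, week)
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY week)      AS rn_first,
           -- 1 marks the highest week per name (like ORDER BY name, week DESC)
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY week DESC) AS rn_last
    FROM tbl
)
SELECT f.name,
       f.week AS first_week, f.value AS first_val,
       l.week AS last_week,  l.value AS last_val
FROM ranked AS f
JOIN ranked AS l ON l.name = f.name AND l.rn_last = 1
WHERE f.rn_first = 1
ORDER BY f.name
"""


def make_db(rows):
    """Build an in-memory SQLite table mirroring the question's layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (name TEXT, week INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    return conn


def first_last(conn):
    """Return (name, first_week, first_val, last_week, last_val) tuples."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in first_last(make_db(ROWS)):
        print(row)
```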

first_value() of composite type

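The aggregate functions min() and max() do not accept composite types as input. You would have to create custom aggregate functions (which is not that hard).
But the window functions first_value() and last_value() do. Building on that, we can devise a very simple solution: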

Simple query

SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_value
     ,(first_value((week, value)) OVER (PARTITION BY name
                                        ORDER BY week DESC))::text AS l
FROM   tbl t
ORDER  BY name, week;

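The output contains all data, but the values for the last week are stuffed into an anonymous record. You may want decomposed values.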
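The custom-aggregate route mentioned above can be sketched without Postgres. This uses SQLite's Python-level aggregates registered with sqlite3's Connection.create_aggregate(). It is not Postgres CREATE AGGREGATE, just the same idea on a different engine. The aggregate names value_at_min_week and value_at_max_week and their classes are hypothetical. Each one keeps the (week, value) pair with the best week seen so far and returns its value, so no window function is needed.

```python
import sqlite3

# Sample rows from the question: (name, week, value)
ROWS = [
    ("John", 1, 9), ("Cassie", 2, 5), ("Luke", 6, 3),
    ("John", 8, 14), ("Cassie", 5, 7), ("Luke", 9, 5),
    ("John", 2, 10), ("Cassie", 4, 4), ("Luke", 7, 4),
]


class _ValueAtExtremeWeek:
    """Keeps the (week, value) pair with the 'best' week seen in a group."""

    def __init__(self):
        self.best = None  # (week, value), or None until a non-NULL week arrives

    def _better(self, week, best_week):
        raise NotImplementedError

    def step(self, week, value):
        # Called once per row of the group; rows with a NULL week are skipped.
        if week is None:
            return
        if self.best is None or self._better(week, self.best[0]):
            self.best = (week, value)

    def finalize(self):
        # The aggregate's result for the group: the value at the chosen week.
        return None if self.best is None else self.best[1]


class ValueAtMinWeek(_ValueAtExtremeWeek):
    def _better(self, week, best_week):
        return week < best_week


class ValueAtMaxWeek(_ValueAtExtremeWeek):
    def _better(self, week, best_week):
        return week > best_week


QUERY = """
SELECT name,
       MIN(week)                      AS min_week,
       value_at_min_week(week, value) AS min_week_value,
       MAX(week)                      AS max_week,
       value_at_max_week(week, value) AS max_week_value
FROM tbl
GROUP BY name
ORDER BY name
"""


def make_db(rows):
    """In-memory table plus the two hypothetical two-argument aggregates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (name TEXT, week INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    conn.create_aggregate("value_at_min_week", 2, ValueAtMinWeek)
    conn.create_aggregate("value_at_max_week", 2, ValueAtMaxWeek)
    return conn


def first_last(conn):
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in first_last(make_db(ROWS)):
        print(row)
```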

Decomposed result with opportunistic use of table type

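For that, we need a well-known type that registers the types of the contained elements with the system. An adapted table definition allows opportunistic use of the table type itself:

CREATE TABLE tbl (week int, value int, name text);  -- note optimized column order

week and value come first.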

SELECT (l).name, first_week, first_val
     , (l).week AS last_week, (l).value AS last_val
FROM (
   SELECT DISTINCT ON (name)
          week AS first_week, value AS first_val
         ,first_value(t) OVER (PARTITION BY name ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

Decomposed result from user-defined row type

However, that's probably not possible in most cases. Just register a user-defined type with CREATE TYPE (permanent) or with CREATE TEMP TABLE (for ad-hoc use):

CREATE TEMP TABLE nv(last_week int, last_val int);  -- register composite type

SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
         ,first_value((week, value)::nv) OVER (PARTITION BY name
                                               ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

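In a local test on Postgres 9.3 with a similar table of 50k rows, each of these queries was substantially faster than the currently accepted answer. Test with EXPLAIN ANALYZE.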

SQL Fiddle showing all.




