# 基准

### 建立

``````CREATE TABLE purchases (
id          serial
, customer_id int  -- REFERENCES customer
, total       int  -- could be amount of money in Cent
, some_column text -- to make the row bigger, more realistic
);
``````

``````INSERT INTO purchases (customer_id, total, some_column)    -- insert 200k rows
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,200000) g;

ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);

DELETE FROM purchases WHERE random() > 0.9; -- some dead rows

INSERT INTO purchases (customer_id, total, some_column)
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,20000) g;  -- add 20k to make it ~ 200k

CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);

VACUUM ANALYZE purchases;
``````

`customer`表 - 用于优越的查询

``````CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM   purchases
GROUP  BY 1
ORDER  BY 1;

ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);

VACUUM ANALYZE customer;
``````

### `purchases`表格的对象大小

``````               what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
core_relation_size                | 20496384 | 20 MB        |           102
visibility_map                    |        0 | 0 bytes      |             0
free_space_map                    |    24576 | 24 kB        |             0
table_size_incl_toast             | 20529152 | 20 MB        |           102
indexes_size                      | 10977280 | 10 MB        |            54
total_size_incl_toast_and_indexes | 31506432 | 30 MB        |           157
live_rows_in_text_representation  | 13729802 | 13 MB        |            68
------------------------------    |          |              |
row_count                         |   200045 |              |
live_tuples                       |   200045 |              |
``````

# 查询

### 1.在CTE中的`row_number()` ，（ 请参阅其他答案 ）

``````WITH cte AS (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM   purchases
)
SELECT id, customer_id, total
FROM   cte
WHERE  rn = 1;
``````

### 2.子查询中的`row_number()` （我的优化）

``````SELECT id, customer_id, total
FROM   (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM   purchases
) sub
WHERE  rn = 1;
``````

### 3. `DISTINCT ON` （ 查看其他答案 ）

``````SELECT DISTINCT ON (customer_id)
id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC, id;
``````

### 4.带有`LATERAL`子查询的rCTE（ 请参阅此处 ）

``````WITH RECURSIVE cte AS (
(  -- parentheses required
SELECT id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC
LIMIT  1
)
UNION ALL
SELECT u.*
FROM   cte c
,      LATERAL (
SELECT id, customer_id, total
FROM   purchases
WHERE  customer_id > c.customer_id  -- lateral reference
ORDER  BY customer_id, total DESC
LIMIT  1
) u
)
SELECT id, customer_id, total
FROM   cte
ORDER  BY customer_id;
``````

### 5. `customer`表与`LATERAL` （ 见这里 ）

``````SELECT l.*
FROM   customer c
,      LATERAL (
SELECT id, customer_id, total
FROM   purchases
WHERE  customer_id = c.customer_id  -- lateral reference
ORDER  BY total DESC
LIMIT  1
) l;
``````

### 6.使用`ORDER BY``array_agg()` （ 请参阅其他答案 ）

``````SELECT (array_agg(id ORDER BY total DESC))[1] AS id
, customer_id
, max(total) AS total
FROM   purchases
GROUP  BY customer_id;
``````

# 结果

### A. Postgres 9.4具有200k行，每个`customer_id`约为20

``````1. 273.274 ms
2. 194.572 ms
3. 111.067 ms
4.  92.922 ms
5.  37.679 ms  -- winner
6. 189.495 ms
``````

### B.与Postgres 9.5相同

``````1. 288.006 ms
2. 223.032 ms
3. 107.074 ms
4.  78.032 ms
5.  33.944 ms  -- winner
6. 211.540 ms
``````

### C.与B.相同，但每个`customer_id`约2.3行

``````1. 381.573 ms
2. 311.976 ms
3. 124.074 ms  -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms
``````

### 2011年原始（过时）基准

1. 选择整个表格，在这种情况下会产生5958行。

``````A: 567.218 ms
B: 386.673 ms
``````
2. 使用条件`WHERE customer BETWEEN x AND y`导致1000行。

``````A: 249.136 ms
B:  55.111 ms
``````
3. 选择`WHERE customer = x`的单个客户。

``````A:   0.143 ms
B:   0.072 ms
``````

``````CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);
``````

``````1A: 277.953 ms
1B: 193.547 ms

2A: 249.796 ms -- special index not used
2B:  28.679 ms

3A:   0.120 ms
3B:   0.048 ms
``````

``````SELECT * FROM purchases;
``````

```id | customer | total
---+----------+------
1 | Joe      | 5
2 | Sally    | 3
3 | Joe      | 2
4 | Sally    | 1
```

``````SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer
ORDER BY total DESC;
``````

```FIRST(id) | customer | FIRST(total)
----------+----------+-------------
1 | Joe      | 5
2 | Sally    | 3
```

## 在Oracle 9.2+（不像原来的8i +），SQL Server 2005+，PostgreSQL 8.4+，DB2，Firebird 3.0+，Teradata，Sybase，Vertica：

``````WITH summary AS (
SELECT p.id,
p.customer,
p.total,
ROW_NUMBER() OVER(PARTITION BY p.customer
ORDER BY p.total DESC) AS rk
FROM PURCHASES p)
SELECT s.*
FROM summary s
WHERE s.rk = 1
``````

## 任何数据库支持：

``````  SELECT MIN(x.id),  -- change to MAX if you want the highest
x.customer,
x.total
FROM PURCHASES x
JOIN (SELECT p.customer,
MAX(total) AS max_total
FROM PURCHASES p
GROUP BY p.customer) y ON y.customer = x.customer
AND y.max_total = x.total
GROUP BY x.customer, x.total
``````

``````SELECT  customer,
(array_agg(id ORDER BY total DESC))[1],
max(total)
FROM purchases
GROUP BY customer
``````

• `array_agg`是一个聚合函数，所以它可以与`GROUP BY`
• `array_agg`可以让你指定一个`array_agg`于自身的排序，所以它不会限制整个查询的结构。 如果您需要执行与默认值不同的操作，那么还有如何对NULL进行排序的语法。
• 一旦我们构建了数组，我们就拿第一个元素。 （Postgres数组是1索引的，而不是0索引的）。
• 你可以用类似的方法为你的第三个输出列使用`array_agg` ，但`max(total)`更简单。
• `DISTINCT ON`不同，使用`array_agg`可让您保留`GROUP BY` ，以防其他原因需要使用。

``````SELECT a.*
FROM
purchases a
JOIN (
SELECT customer, min( id ) as id
FROM purchases
GROUP BY customer
) b USING ( id );
``````

``````create index purchases_id on purchases (id);
``````

``````select * from purchase
join (
select min(id) as id from purchase
join (
select customer, max(total) as total from purchase
group by customer
) t1 using (customer, total)
group by customer
) t2 using (id)
order by customer
``````

1. t1，t2是可以根据数据库删除的子查询别名。

2. 注意 ：截至2017年1月，在此编辑中， `using (...)`子句目前不支持MS-SQL和Oracle数据库。您必须`on t2.id = purchase.id`将其扩展到例如`on t2.id = purchase.id``on t2.id = purchase.id`语法适用于SQLite，MySQL和PostgreSQL。

``````SELECT purchases.*
FROM purchases
LEFT JOIN purchases as p
ON
p.customer = purchases.customer
AND
purchases.total < p.total
WHERE p.total IS NULL
``````

• ∀xT（客户，id）> T（客户，x）（该总数高于该客户的所有其他总数）

• ¬∃xT（customer，id）<T（customer，x）（该客户不存在更高的总数）

``````      LEFT JOIN purchases as p
ON
p.customer = purchases.customer
AND
purchases.total < p.total
``````

``````purchases.id, purchases.customer, purchases.total, p.id, p.customer, p.total
1           , Tom           , 200             , 2   , Tom   , 300
2           , Tom           , 300
3           , Bob           , 400             , 4   , Bob   , 500
4           , Bob           , 500
5           , Alice         , 600             , 6   , Alice   , 700
6           , Alice         , 700
``````

``````WHERE p.total IS NULL

purchases.id, purchases.name, purchases.total, p.id, p.name, p.total
2           , Tom           , 300
4           , Bob           , 500
6           , Alice         , 700
``````