mysql - tutorial - sql commands
Why is there no “first greater/less than[or equal to]” comparison operator in SQL? (2)
I am confused as to whether this is an appropriate forum for this question. But, the reason these operators do not exist is that they are not particularly useful and other ANSI SQL functionality takes their place.
First, the comparison operators in the
on clause are, in every database I know, available in the
case clauses. It is unclear how these operators would be used in these contexts.
Second, the operators don't specify what to do in the case of ties. Return all rows? But that would return multiple rows when the user of such an operator would expect only one row.
Third, ANSI standard functionality, such as
row_number() can generate equivalent results. Although it might not be as optimal for this particular problem, it is more general. And standard.
By the way, Postgres has a nice capability with
distinct on(), which is often more efficient than the analytic function equivalent.
I have secretly wanted a new
lookup join, that would fail if more than one record matched. However, I'm not so sure that the entire language should be changed for this purpose.
I am considering the proposal of 4 new comparison operators in SQL. These are similar to the
<= operators, but are true only if the value of each operand is, among all values satisfying the inequality, the one closest to the value of the other operand. Since one value is almost at the other value, I have come to the conclusion (after realizing that there is no
first keyword, and after discarding the
unique keyword) that a good choice would be to define these 4 new operators:
a @> b: true if
a > band no
a' < asatisfies
a' > band no
b' > bsatisfies
a > b'
a @< b: true if
b @> a
a @>= b: true if
a ≥ band no
a' < asatisfies
a' ≥ band no
b' > bsatisfies
a ≥ b'
a @<= b: true if
b @>= a
The question is: is there some good reason why operators like these don't exist already?
(2014-03-20) I reformulate the question, because the above formulation apparently isn't clear enough:
Is there a reason why operators like these should not exist?
The following examples are intended as a starting point for finding out where the problems with the
@... operators could lie. I will use 3 MySQL tables:
create table ta (id int auto_increment, ca char, primary key(id), unique index(ca)); create table tb (id int auto_increment, cb char, primary key(id), index(cb)); create table tc (id int auto_increment, cc char, primary key(id)); insert into ta (ca) values ('A'),('E'),('I'),('O'),('U'); insert into tb (cb) values ('C'),('D'),('E'),('F'),('F'),('M'),('N'),('O'),('Z'); insert into tc (cc) values ('C'),('D'),('E'),('F'),('F'),('M'),('N'),('O'),('Z');
When columns have unique values, the effect of the
@... operators can be obtained by limiting the output of queries or subqueries to 1 row, although with a somewhat more clumsy syntax:
?> select * from ta where ca @> 'B'; -- currently not valid, equivalent to: !> select * from ta where ca > 'B' order by ca limit 1; +----+------+ | id | ca | +----+------+ | 2 | E | +----+------+
limit 1 is specific to MySQL, MariaDB, PostgreSQL, etc., other RDBMSs have
select top 1,
where rownum = 1, etc.)
ta we have a unique index on column
ca. This index can be exploited to get to the selected value at the same speed as for
ca = 'E'. The optimizer may realize this, but if it doesn't, data structures may be set up for an unneeded scan starting at the selected value (MySQL's
explain says that this is a
range type query).
When columns have non-unique values, limiting output rows is useless, and the syntax becomes even more clumsy:
?> select * from tb where cb @> 'E'; -- currently not valid, equivalent to: !> select * from tb where cb = (select min(cb) from tb where cb > 'E'); +----+------+ | id | cb | +----+------+ | 4 | F | | 5 | F | +----+------+
Luckily, if I correctly read the output of
explain, MySQL is smart enough to optimize the subquery away, but if it weren't, the index would be used twice instead of once.
In the case of the table
tc, which has no index on column
cc, MySQL makes two table scans. This is understandable, since a single table scan would mean using an unknown amount of storage for the temporary result.
Suppose you need all pairs consisting of a value and its successor value:
?> select t1.ca as c1, t2.ca as c2 from ta t1 join ta t2 on t1.ca @< t2.ca; -- currently not valid, equivalent to: !> select t1.ca as c1, t2.ca as c2 from ta t1 join ta t2 on t2.ca = (select min(ca) from ta where ca > t1.ca); +------+------+ | c1 | c2 | +------+------+ | A | E | | E | I | | I | O | | O | U | +------+------+
If I read the output of
explain correctly, the MySQL optimizer is not able to do without the correlated subquery, while we humans would know better. Maybe with the help of special handling of the
@... operators wired in, the optimizer would do a single scan?
This is similar, but across two tables, one of which has a non-unique index:
?> select * from ta join tb on ca @< cb; -- currently not valid, equivalent to: !> select * from ta join tb on cb = (select min(cb) from tb where cb > ca); +----+------+----+------+ | id | ca | id | cb | +----+------+----+------+ | 1 | A | 1 | C | | 2 | E | 4 | F | | 2 | E | 5 | F | | 3 | I | 6 | M | | 4 | O | 9 | Z | | 5 | U | 9 | Z | +----+------+----+------+
Here too, the MySQL optimizer does not optimize away the subquery, although (maybe with a hint by the
@<) it could.
(Added on 2014-03-20.) The
@... operators seem to make sense wherever their
@-less counterparts do. Here is a contrived example of an expression in a
?> select * from ta join tb where round((ascii(ca)+ascii(cb))/2) @> ascii('E'); -- currently not valid, equivalent to: !> select * from ta join tb where round((ascii(ca)+ascii(cb))/2) = ( select min(round((ascii(ca)+ascii(cb))/2)) from ta, tb where round((ascii(ca)+ascii(cb))/2) > ascii('E') ); +----+------+----+------+ | id | ca | id | cb | +----+------+----+------+ | 3 | I | 1 | C | | 2 | E | 4 | F | | 2 | E | 5 | F | +----+------+----+------+
... and this is another example, this time of a
?> select *, cb @< ca from tb, ta; -- currently not valid, equivalent to: !> select *, ifnull(cb = (select max(cb) from tb where cb < ca), 0) as 'cb @< ca' from tb, ta; +----+------+----+------+----------+ | id | cb | id | ca | cb @< ca | +----+------+----+------+----------+ | 1 | C | 1 | A | 0 | | 1 | C | 2 | E | 0 | | 1 | C | 3 | I | 0 | | 1 | C | 4 | O | 0 | | 1 | C | 5 | U | 0 | | 2 | D | 1 | A | 0 | | 2 | D | 2 | E | 1 | | -- (omitting rows with cb @< ca equal to 0 from here on) | 4 | F | 3 | I | 1 | | 5 | F | 3 | I | 1 | | 7 | N | 4 | O | 1 | | 8 | O | 5 | U | 1 |
I am aware of the following caveats:
@... operators are "non-local", because they need the knowledge of all possible values of their operands. This seems not to be a problem in all conditions of the kinds shown in the above examples, but could be a problem in other places (although I haven't found an example yet that could not be solved by an additional subquery).
@... operators, unlike their
@-less counterparts, are not transitive. They share this property with the
<> operator, though.
Fully exploiting the
@... operators could mean introducing new index and table access types (as discussed in the examples).
Please note that this question is not meant as a starting point for a discussion. I am looking for the reasons why something like the
@... operators is not in the standard nor in any SQL dialect known to me - I expect these reasons to be related to some problem with the definition and/or implementation of these operators that I have overlooked.
I know that one reason is "Occam's razor" (pluralitas non est ponenda sine necessitate), but, as I have tried to show above, pluralitas here also brings some advantages (conciseness and ease of optimization). I am looking for stronger reasons.
@<= could become
|<= or similar (read: first greater/less [equal]), in order not to collide with established usages of the
@ identifier prefix.
Perhaps this is not truly a question for somewhere like this. It does sound like a nice function, I must admit, but there are other ways of doing it. (Though I guess a lot of other functions have been added that had other ways f doing it too!)
I suppose the true answers to something like this could be something like:
A. Demand - Is it worth the developers time to add functions like this to SQL if they feel that very few people would use them?
B. Functionality - Does it actually allow the developer to DO MORE, or is it just another way of getting to the same goal?
C. Laziness - Can they actually be bothered to develop something like this?
The best way to get an answer to this would be to submit it to their development team and see what their response is, if enough people could get involved in asking for something like this to be implemented, then it could... power of the majority!