Version: v1

完整 SELECT 語法

所有操作都為 ANSI SQL compliant 的語法。ANSI SQL 相關介紹可以參考 W3School SQL Tutorial。

note

Canner Enterprise SQL 引擎基於 PrestoSQL (Trino) 底層做優化與改良，所有 Trino 支援的項目(連結) Canner Enterprise 亦支援。許多美國大廠企業都在使用像是 Amazon, Facebook, Uber, Airbnb 等每日處理 PB 級的數據透過 Presto 相當穩定。

caution

為了企業數據安全，我們在 Canner Enterprise 資料調用解決方案只能讓使用者 READ-only 也就是目前僅支援 SELECT statement，不支援 Update, Delete 等操作。

以下為 Canner Enterprise 所支援的 SELECT statement 相關用法。

Synopsis

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT { count | ALL } | FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]

where from_item is one of

table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]

from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]

and join_type is one of

[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
FULL [ OUTER ] JOIN
CROSS JOIN

and grouping_element is one of

()
expression
GROUPING SETS ( ( column [, ...] ) [, ...] )
CUBE ( column [, ...] )
ROLLUP ( column [, ...] )

Description

Retrieve rows from zero or more tables.

WITH Clause

The WITH clause defines named relations for use within a query. It allows flattening nested queries or simplifying subqueries. For example, the following queries are equivalent:

SELECT a, b
FROM (
  SELECT a, MAX(b) AS b FROM t GROUP BY a
) AS x;

WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
SELECT a, b FROM x;

This also works with multiple subqueries:

WITH
  t1 AS (SELECT a, MAX(b) AS b FROM x GROUP BY a),
  t2 AS (SELECT a, AVG(d) AS d FROM y GROUP BY a)
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.a = t2.a;

Additionally, the relations within a WITH clause can chain:

WITH
  x AS (SELECT a FROM t),
  y AS (SELECT a AS b FROM x),
  z AS (SELECT b AS c FROM y)
SELECT c FROM z;

caution

Currently, the SQL for the WITH clause will be inlined anywhere the named relation is used. This means that if the relation is used more than once and the query is non-deterministic, the results may be different each time.

SELECT Clause

The SELECT clause specifies the output of the query. Each select_expression defines a column or columns to be included in the result.

SELECT [ ALL | DISTINCT ] select_expression [, ...]

The ALL and DISTINCT quantifiers determine whether duplicate rows are included in the result set. If the argument ALL is specified, all rows are included. If the argument DISTINCT is specified, only unique rows are included in the result set. In this case, each output column must be of a type that allows comparison. If neither argument is specified, the behavior defaults to ALL.

Select expressions

Each select_expression must be in one of the following forms:

expression [ [ AS ] column_alias ]

row_expression.* [ AS ( column_alias [, ...] ) ]

relation.*

In the case of expression [ [ AS ] column_alias ], a single output column is defined.

In the case of row_expression.* [ AS ( column_alias [, ...] ) ], the row_expression is an arbitrary expression of type ROW. All fields of the row define output columns to be included in the result set.

In the case of relation.*, all columns of relation are included in the result set. In this case column aliases are not allowed.

In the case of *, all columns of the relation defined by the query are included in the result set.

In the result set, the order of columns is the same as the order of their specification by the select expressions. If a select expression returns multiple columns, they are ordered the same way they were ordered in the source relation or row type expression.

If column aliases are specified, they override any preexisting column or row field names:

SELECT (CAST(ROW(1, true) AS ROW(field1 bigint, field2 boolean))).* AS (alias1, alias2);

  alias1 | alias2
--------+--------
      1 | true
(1 row)

Otherwise, the existing names are used:

SELECT (CAST(ROW(1, true) AS ROW(field1 bigint, field2 boolean))).*;

  field1 | field2
--------+--------
      1 | true
(1 row)

and in their absence, anonymous columns are produced:

SELECT (ROW(1, true)).*;

  _col0 | _col1
-------+-------
      1 | true
(1 row)

GROUP BY Clause

The GROUP BY clause divides the output of a SELECT statement into groups of rows containing matching values. A simple GROUP BY clause may contain any expression composed of input columns or it may be an ordinal number selecting an output column by position (starting at one).

The following queries are equivalent. They both group the output by the nationkey input column with the first query using the ordinal position of the output column and the second query using the input column name:

SELECT count(*), nationkey FROM customer GROUP BY 2;

SELECT count(*), nationkey FROM customer GROUP BY nationkey;

GROUP BY clauses can group output by input column names not appearing in the output of a select statement. For example, the following query generates row counts for the customer table using the input column mktsegment:

SELECT count(*) FROM customer GROUP BY mktsegment;

  _col0
-------
  29968
  30142
  30189
  29949
  29752
(5 rows)

When a GROUP BY clause is used in a SELECT statement all output expressions must be either aggregate functions or columns present in the GROUP BY clause.

Complex Grouping Operations

Presto also supports complex aggregations using the GROUPING SETS, CUBE and ROLLUP syntax. This syntax allows users to perform analysis that requires aggregation on multiple sets of columns in a single query. Complex grouping operations do not support grouping on expressions composed of input columns. Only column names or ordinals are allowed.

Complex grouping operations are often equivalent to a UNION ALL of simple GROUP BY expressions, as shown in the following examples. This equivalence does not apply, however, when the source of data for the aggregation is non-deterministic.

GROUPING SETS

Grouping sets allow users to specify multiple lists of columns to group on. The columns not part of a given sublist of grouping columns are set to NULL. :

SELECT * FROM shipping;

  origin_state | origin_zip | destination_state | destination_zip | package_weight
--------------+------------+-------------------+-----------------+----------------
  California   |      94131 | New Jersey        |            8648 |             13
  California   |      94131 | New Jersey        |            8540 |             42
  New Jersey   |       7081 | Connecticut       |            6708 |            225
  California   |      90210 | Connecticut       |            6927 |           1337
  California   |      94131 | Colorado          |           80302 |              5
  New York     |      10002 | New Jersey        |            8540 |              3
(6 rows)

GROUPING SETS semantics are demonstrated by this example query:

SELECT origin_state, origin_zip, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state),
    (origin_state, origin_zip),
    (destination_state));

  origin_state | origin_zip | destination_state | _col0
--------------+------------+-------------------+-------
  New Jersey   | NULL       | NULL              |   225
  California   | NULL       | NULL              |  1397
  New York     | NULL       | NULL              |     3
  California   |      90210 | NULL              |  1337
  California   |      94131 | NULL              |    60
  New Jersey   |       7081 | NULL              |   225
  New York     |      10002 | NULL              |     3
  NULL         | NULL       | Colorado          |     5
  NULL         | NULL       | New Jersey        |    58
  NULL         | NULL       | Connecticut       |  1562
(10 rows)

The preceding query may be considered logically equivalent to a UNION ALL of multiple GROUP BY queries:

SELECT origin_state, NULL, NULL, sum(package_weight)
FROM shipping GROUP BY origin_state

UNION ALL

SELECT origin_state, origin_zip, NULL, sum(package_weight)
FROM shipping GROUP BY origin_state, origin_zip

UNION ALL

SELECT NULL, NULL, destination_state, sum(package_weight)
FROM shipping GROUP BY destination_state;

However, the query with the complex grouping syntax (GROUPING SETS, CUBE or ROLLUP) will only read from the underlying data source once, while the query with the UNION ALL reads the underlying data three times. This is why queries with a UNION ALL may produce inconsistent results when the data source is not deterministic.

CUBE

The CUBE operator generates all possible grouping sets (i.e. a power set) for a given set of columns. For example, the query:

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY CUBE (origin_state, destination_state);

is equivalent to:

SELECT origin_state, destination_state, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state, destination_state),
    (origin_state),
    (destination_state),
    ());

  origin_state | destination_state | _col0
--------------+-------------------+-------
  California   | New Jersey        |    55
  California   | Colorado          |     5
  New York     | New Jersey        |     3
  New Jersey   | Connecticut       |   225
  California   | Connecticut       |  1337
  California   | NULL              |  1397
  New York     | NULL              |     3
  New Jersey   | NULL              |   225
  NULL         | New Jersey        |    58
  NULL         | Connecticut       |  1562
  NULL         | Colorado          |     5
  NULL         | NULL              |  1625
(12 rows)

ROLLUP

The ROLLUP operator generates all possible subtotals for a given set of columns. For example, the query:

SELECT origin_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY ROLLUP (origin_state, origin_zip);

  origin_state | origin_zip | _col2
--------------+------------+-------
  California   |      94131 |    60
  California   |      90210 |  1337
  New Jersey   |       7081 |   225
  New York     |      10002 |     3
  California   | NULL       |  1397
  New York     | NULL       |     3
  New Jersey   | NULL       |   225
  NULL         | NULL       |  1625
(8 rows)

is equivalent to:

SELECT origin_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS ((origin_state, origin_zip), (origin_state), ());

Combining multiple grouping expressions

Multiple grouping expressions in the same query are interpreted as having cross-product semantics. For example, the following query:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY
    GROUPING SETS ((origin_state, destination_state)),
    ROLLUP (origin_zip);

which can be rewritten as:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY
    GROUPING SETS ((origin_state, destination_state)),
    GROUPING SETS ((origin_zip), ());

is logically equivalent to:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state, destination_state, origin_zip),
    (origin_state, destination_state));

  origin_state | destination_state | origin_zip | _col3
--------------+-------------------+------------+-------
  New York     | New Jersey        |      10002 |     3
  California   | New Jersey        |      94131 |    55
  New Jersey   | Connecticut       |       7081 |   225
  California   | Connecticut       |      90210 |  1337
  California   | Colorado          |      94131 |     5
  New York     | New Jersey        | NULL       |     3
  New Jersey   | Connecticut       | NULL       |   225
  California   | Colorado          | NULL       |     5
  California   | Connecticut       | NULL       |  1337
  California   | New Jersey        | NULL       |    55
(10 rows)

The ALL and DISTINCT quantifiers determine whether duplicate grouping sets each produce distinct output rows. This is particularly useful when multiple complex grouping sets are combined in the same query. For example, the following query:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY ALL
    CUBE (origin_state, destination_state),
    ROLLUP (origin_state, origin_zip);

is equivalent to:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state, destination_state, origin_zip),
    (origin_state, origin_zip),
    (origin_state, destination_state, origin_zip),
    (origin_state, origin_zip),
    (origin_state, destination_state),
    (origin_state),
    (origin_state, destination_state),
    (origin_state),
    (origin_state, destination_state),
    (origin_state),
    (destination_state),
    ());

However, if the query uses the DISTINCT quantifier for the GROUP BY:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY DISTINCT
    CUBE (origin_state, destination_state),
    ROLLUP (origin_state, origin_zip);

only unique grouping sets are generated:

SELECT origin_state, destination_state, origin_zip, sum(package_weight)
FROM shipping
GROUP BY GROUPING SETS (
    (origin_state, destination_state, origin_zip),
    (origin_state, origin_zip),
    (origin_state, destination_state),
    (origin_state),
    (destination_state),
    ());

The default set quantifier is ALL.

GROUPING Operation

grouping(col1, ..., colN) -> bigint

The grouping operation returns a bit set converted to decimal, indicating which columns are present in a grouping. It must be used in conjunction with GROUPING SETS, ROLLUP, CUBE or GROUP BY and its arguments must match exactly the columns referenced in the corresponding GROUPING SETS, ROLLUP, CUBE or GROUP BY clause.

To compute the resulting bit set for a particular row, bits are assigned to the argument columns with the rightmost column being the least significant bit. For a given grouping, a bit is set to 0 if the corresponding column is included in the grouping and to 1 otherwise. For example, consider the query below:

SELECT origin_state, origin_zip, destination_state, sum(package_weight),
       grouping(origin_state, origin_zip, destination_state)
FROM shipping
GROUP BY GROUPING SETS (
        (origin_state),
        (origin_state, origin_zip),
        (destination_state));

origin_state | origin_zip | destination_state | _col3 | _col4
--------------+------------+-------------------+-------+-------
California   | NULL       | NULL              |  1397 |     3
New Jersey   | NULL       | NULL              |   225 |     3
New York     | NULL       | NULL              |     3 |     3
California   |      94131 | NULL              |    60 |     1
New Jersey   |       7081 | NULL              |   225 |     1
California   |      90210 | NULL              |  1337 |     1
New York     |      10002 | NULL              |     3 |     1
NULL         | NULL       | New Jersey        |    58 |     6
NULL         | NULL       | Connecticut       |  1562 |     6
NULL         | NULL       | Colorado          |     5 |     6
(10 rows)

The first grouping in the above result only includes the origin_state column and excludes the origin_zip and destination_state columns. The bit set constructed for that grouping is 011 where the most significant bit represents origin_state.

HAVING Clause

The HAVING clause is used in conjunction with aggregate functions and the GROUP BY clause to control which groups are selected. A HAVING clause eliminates groups that do not satisfy the given conditions. HAVING filters groups after groups and aggregates are computed.

The following example queries the customer table and selects groups with an account balance greater than the specified value:

SELECT count(*), mktsegment, nationkey,
       CAST(sum(acctbal) AS bigint) AS totalbal
FROM customer
GROUP BY mktsegment, nationkey
HAVING sum(acctbal) > 5700000
ORDER BY totalbal DESC;

  _col0 | mktsegment | nationkey | totalbal
-------+------------+-----------+----------
| AUTOMOBILE |        19 |  5856939
| FURNITURE  |        14 |  5794887
| FURNITURE  |         9 |  5784628
| FURNITURE  |        12 |  5757371
| HOUSEHOLD  |         3 |  5753216
| MACHINERY  |         2 |  5719140
| FURNITURE  |         8 |  5701952
(7 rows)

UNION | INTERSECT | EXCEPT Clause

UNION INTERSECT and EXCEPT are all set operations. These clauses are used to combine the results of more than one select statement into a single result set:

query UNION [ALL | DISTINCT] query

query INTERSECT [DISTINCT] query

query EXCEPT [DISTINCT] query

The argument ALL or DISTINCT controls which rows are included in the final result set. If the argument ALL is specified all rows are included even if the rows are identical. If the argument DISTINCT is specified only unique rows are included in the combined result set. If neither is specified, the behavior defaults to DISTINCT. The ALL argument is not supported for INTERSECT or EXCEPT.

Multiple set operations are processed left to right, unless the order is explicitly specified via parentheses. Additionally, INTERSECT binds more tightly than EXCEPT and UNION. That means A UNION B INTERSECT C EXCEPT D is the same as A UNION (B INTERSECT C) EXCEPT D.

UNION

UNION combines all the rows that are in the result set from the first query with those that are in the result set for the second query. The following is an example of one of the simplest possible UNION clauses. It selects the value 13 and combines this result set with a second query that selects the value 42:

SELECT 13
UNION
SELECT 42;

  _col0
-------
    13
    42
(2 rows)

The following query demonstrates the difference between UNION and UNION ALL. It selects the value 13 and combines this result set with a second query that selects the values 42 and 13:

SELECT 13
UNION
SELECT * FROM (VALUES 42, 13);

  _col0
-------
    13
    42
(2 rows)

SELECT 13
UNION ALL
SELECT * FROM (VALUES 42, 13);

  _col0
-------
    13
    42
    13
(2 rows)

INTERSECT

INTERSECT returns only the rows that are in the result sets of both the first and the second queries. The following is an example of one of the simplest possible INTERSECT clauses. It selects the values 13 and 42 and combines this result set with a second query that selects the value 13. Since 42 is only in the result set of the first query, it is not included in the final results.:

SELECT * FROM (VALUES 13, 42)
INTERSECT
SELECT 13;

  _col0
-------
    13
(2 rows)

EXCEPT

EXCEPT returns the rows that are in the result set of the first query, but not the second. The following is an example of one of the simplest possible EXCEPT clauses. It selects the values 13 and 42 and combines this result set with a second query that selects the value 13. Since 13 is also in the result set of the second query, it is not included in the final result.:

SELECT * FROM (VALUES 13, 42)
EXCEPT
SELECT 13;

  _col0
-------
    42
(2 rows)

ORDER BY Clause

The ORDER BY clause is used to sort a result set by one or more output expressions:

ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...]

Each expression may be composed of output columns, or it may be an ordinal number selecting an output column by position, starting at one. The ORDER BY clause is evaluated after any GROUP BY or HAVING clause, and before any OFFSET, LIMIT or FETCH FIRST clause. The default null ordering is NULLS LAST, regardless of the ordering direction.

Note that, following the SQL specification, an ORDER BY clause only affects the order of rows for queries that immediately contain the clause. Presto follows that specification, and drops redundant usage of the clause to avoid negative performance impacts.

In the following example, the clause only applies to the select statement.

INSERT INTO some_table
SELECT * FROM another_table
ORDER BY field

Since tables in SQL are inherently unordered, and the ORDER BY clause in this case does not result in any difference, but negatively impacts performance of running the overall insert statement, Presto skips the sort operation.

Another example where the ORDER BY clause is redundant, and does not affect the outcome of the overall statement, is a nested query:

SELECT *
FROM some_table
    JOIN (SELECT * FROM another_table ORDER BY field) u
    ON some_table.key = u.key

OFFSET Clause

The OFFSET clause is used to discard a number of leading rows from the result set:

OFFSET count [ ROW | ROWS ]

If the ORDER BY clause is present, the OFFSET clause is evaluated over a sorted result set, and the set remains sorted after the leading rows are discarded:

SELECT name FROM nation ORDER BY name OFFSET 22;

      name
----------------
  UNITED KINGDOM
  UNITED STATES
  VIETNAM
(3 rows)

Otherwise, it is arbitrary which rows are discarded. If the count specified in the OFFSET clause equals or exceeds the size of the result set, the final result is empty.

LIMIT or FETCH FIRST Clauses

The LIMIT or FETCH FIRST clause restricts the number of rows in the result set.

LIMIT { count | ALL }

FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }

The following example queries a large table, but the LIMIT clause restricts the output to only have five rows (because the query lacks an ORDER BY, exactly which rows are returned is arbitrary):

SELECT orderdate FROM orders LIMIT 5;

  orderdate
------------
  1994-07-25
  1993-11-12
  1992-10-06
  1994-01-04
  1997-12-28
(5 rows)

LIMIT ALL is the same as omitting the LIMIT clause.

The FETCH FIRST clause supports either the FIRST or NEXT keywords and the ROW or ROWS keywords. These keywords are equivalent and the choice of keyword has no effect on query execution.

If the count is not specified in the FETCH FIRST clause, it defaults to 1:

SELECT orderdate FROM orders FETCH FIRST ROW ONLY;

  orderdate
------------
  1994-02-12
(1 row)

If the OFFSET clause is present, the LIMIT or FETCH FIRST clause is evaluated after the OFFSET clause:

SELECT * FROM (VALUES 5, 2, 4, 1, 3) t(x) ORDER BY x OFFSET 2 LIMIT 2;

  x
---
  3
  4
(2 rows)

For the FETCH FIRST clause, the argument ONLY or WITH TIES controls which rows are included in the result set.

If the argument ONLY is specified, the result set is limited to the exact number of leading rows determined by the count.

If the argument WITH TIES is specified, it is required that the ORDER BY clause be present. The result set consists of the same set of leading rows and all of the rows in the same peer group as the last of them ('ties') as established by the ordering in the ORDER BY clause. The result set is sorted:

SELECT name, regionkey FROM nation ORDER BY regionkey FETCH FIRST ROW WITH TIES;

    name    | regionkey
------------+-----------
  ETHIOPIA   |         0
  MOROCCO    |         0
  KENYA      |         0
  ALGERIA    |         0
  MOZAMBIQUE |         0
(5 rows)

TABLESAMPLE

There are multiple sample methods:

BERNOULLI Each row is selected to be in the table sample with a probability of the sample percentage. When a table is sampled using the Bernoulli method, all physical blocks of the table are scanned and certain rows are skipped (based on a comparison between the sample percentage and a random value calculated at runtime).

The probability of a row being included in the result is independent from any other row. This does not reduce the time required to read the sampled table from disk. It may have an impact on the total query time if the sampled output is processed further.

SYSTEM This sampling method divides the table into logical segments of data and samples the table at this granularity. This sampling method either selects all the rows from a particular segment of data or skips it (based on a comparison between the sample percentage and a random value calculated at runtime).

The rows selected in a system sampling will be dependent on which connector is used. For example, when used with Hive, it is dependent on how the data is laid out on HDFS. This method does not guarantee independent sampling probabilities.

note

Neither of the two methods allow deterministic bounds on the number of rows returned.

Examples:

SELECT *
FROM users TABLESAMPLE BERNOULLI (50);

SELECT *
FROM users TABLESAMPLE SYSTEM (75);

Using sampling with joins:

SELECT o.*, i.*
FROM orders o TABLESAMPLE SYSTEM (10)
JOIN lineitem i TABLESAMPLE BERNOULLI (40)
  ON o.orderkey = i.orderkey;

UNNEST

UNNEST can be used to expand an :ref:array_type or :ref:map_type into a relation. Arrays are expanded into a single column, and maps are expanded into two columns (key, value). UNNEST can also be used with multiple arguments, in which case they are expanded into multiple columns, with as many rows as the highest cardinality argument (the other columns are padded with nulls). UNNEST can optionally have a WITH ORDINALITY clause, in which case an additional ordinality column is added to the end. UNNEST is normally used with a JOIN and can reference columns from relations on the left side of the join.

Using a single column:

SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);

Using multiple columns:

SELECT numbers, animals, n, a
FROM (
  VALUES
    (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
    (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])
) AS x (numbers, animals)
CROSS JOIN UNNEST(numbers, animals) AS t (n, a);

  numbers  |     animals      |  n   |  a
-----------+------------------+------+------
  [2, 5]    | [dog, cat, bird] |    2 | dog
  [2, 5]    | [dog, cat, bird] |    5 | cat
  [2, 5]    | [dog, cat, bird] | NULL | bird
  [7, 8, 9] | [cow, pig]       |    7 | cow
  [7, 8, 9] | [cow, pig]       |    8 | pig
  [7, 8, 9] | [cow, pig]       |    9 | NULL
(6 rows)

WITH ORDINALITY clause:

SELECT numbers, n, a
FROM (
  VALUES
    (ARRAY[2, 5]),
    (ARRAY[7, 8, 9])
) AS x (numbers)
CROSS JOIN UNNEST(numbers) WITH ORDINALITY AS t (n, a);

  numbers  | n | a
-----------+---+---
  [2, 5]    | 2 | 1
  [2, 5]    | 5 | 2
  [7, 8, 9] | 7 | 1
  [7, 8, 9] | 8 | 2
  [7, 8, 9] | 9 | 3
(5 rows)

Joins

Joins allow you to combine data from multiple relations.

CROSS JOIN

A cross join returns the Cartesian product (all combinations) of two relations. Cross joins can either be specified using the explit CROSS JOIN syntax or by specifying multiple relations in the FROM clause.

Both of the following queries are equivalent:

SELECT *
FROM nation
CROSS JOIN region;

SELECT *
FROM nation, region;

The nation table contains 25 rows and the region table contains 5 rows, so a cross join between the two tables produces 125 rows:

SELECT n.name AS nation, r.name AS region
FROM nation AS n
CROSS JOIN region AS r
ORDER BY 1, 2;

      nation     |   region
----------------+-------------
  ALGERIA        | AFRICA
  ALGERIA        | AMERICA
  ALGERIA        | ASIA
  ALGERIA        | EUROPE
  ALGERIA        | MIDDLE EAST
  ARGENTINA      | AFRICA
  ARGENTINA      | AMERICA
...
(125 rows)

LATERAL

Subqueries appearing in the FROM clause can be preceded by the keyword LATERAL. This allows them to reference columns provided by preceding FROM items.

A LATERAL join can appear at the top level in the FROM list, or anywhere within a parenthesized join tree. In the latter case, it can also refer to any items that are on the left-hand side of a JOIN for which it is on the right-hand side.

When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROM item providing the cross-referenced columns, the LATERAL item is evaluated using that row set's values of the columns. The resulting rows are joined as usual with the rows they were computed from. This is repeated for set of rows from the column source tables.

LATERAL is primarily useful when the cross-referenced column is necessary for computing the rows to be joined:

SELECT name, x, y
FROM nation
CROSS JOIN LATERAL (SELECT name || ' :-' AS x)
CROSS JOIN LATERAL (SELECT x || ')' AS y)

Qualifying Column Names

When two relations in a join have columns with the same name, the column references must be qualified using the relation alias (if the relation has an alias), or with the relation name:

SELECT nation.name, region.name
FROM nation
CROSS JOIN region;

SELECT n.name, r.name
FROM nation AS n
CROSS JOIN region AS r;

SELECT n.name, r.name
FROM nation n
CROSS JOIN region r;

The following query will fail with the error Column 'name' is ambiguous:

SELECT name
FROM nation
CROSS JOIN region;

Subqueries

A subquery is an expression which is composed of a query. The subquery is correlated when it refers to columns outside of the subquery. Logically, the subquery will be evaluated for each row in the surrounding query. The referenced columns will thus be constant during any single evaluation of the subquery.

note

Support for correlated subqueries is limited. Not every standard form is supported.

EXISTS

The EXISTS predicate determines if a subquery returns any rows:

SELECT name
FROM nation
WHERE EXISTS (SELECT * FROM region WHERE region.regionkey = nation.regionkey)

IN

The IN predicate determines if any values produced by the subquery are equal to the provided expression. The result of IN follows the standard rules for nulls. The subquery must produce exactly one column:

SELECT name
FROM nation
WHERE regionkey IN (SELECT regionkey FROM region)

Scalar Subquery

A scalar subquery is a non-correlated subquery that returns zero or one row. It is an error for the subquery to produce more than one row. The returned value is NULL if the subquery produces no rows:

SELECT name
FROM nation
WHERE regionkey = (SELECT max(regionkey) FROM region)

note

Currently only single column can be returned from the scalar subquery.

Synopsis​

Description​

WITH Clause​

SELECT Clause​

GROUP BY Clause​

HAVING Clause​

UNION | INTERSECT | EXCEPT Clause​

ORDER BY Clause​

OFFSET Clause​

LIMIT or FETCH FIRST Clauses​

TABLESAMPLE​

UNNEST​

Joins​

CROSS JOIN​

LATERAL​

Qualifying Column Names​

Subqueries​

EXISTS​

IN​

Scalar Subquery​

Synopsis

Description

WITH Clause

SELECT Clause

GROUP BY Clause

HAVING Clause

UNION | INTERSECT | EXCEPT Clause

ORDER BY Clause

OFFSET Clause

LIMIT or FETCH FIRST Clauses

TABLESAMPLE

UNNEST

Joins

CROSS JOIN

LATERAL

Qualifying Column Names

Subqueries

EXISTS

IN

Scalar Subquery