>>-select-clause--from-clause--+--------------+----------------->
'-where-clause-'
>----+-----------------+---+---------------+-------------------><
'-group-by-clause-' '-having-clause-'
The subselect is a component of the fullselect, the CREATE VIEW statement, and the INSERT statement. It is also a component of certain predicates, which in turn, are components of a subselect. A subselect that is a component of a predicate is called a subquery.
A subselect specifies a result table derived from the tables or views identified in the FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next. (This is only a way of describing the subselect. The method used to perform the derivation may be quite different from this description.)
The sequence of the (hypothetical) operations is:
.-ALL------.
>>-SELECT----+----------+--------------------------------------->
'-DISTINCT-'
>-----+-*----------------------------------------------+-------><
| .-,-----------------------------------------. |
| V | |
'----+-expression--+----------------------+-+--+-'
| | .-AS-. | |
| '-+----+--column-name--' |
+-table-name.*-------------------------+
+-view-name.*--------------------------+
'-correlation-name.*-------------------'
|
The SELECT clause specifies the columns of the final result table. The column values are produced by the application of the select list to R. The select list is the names or expressions specified in the SELECT clause, and R is the result of the previous operation of the subselect. For example, if the only clauses specified are SELECT, FROM, and WHERE, R is the result of that WHERE clause.
Two rows are duplicates of one another only if each value in the first row is equal to the corresponding value in the second row. (For determining duplicate rows, two null values are considered equal.) Sort sequence is also used for determining distinct values.
DISTINCT is not allowed if the select list contains a LOB or DATALINK column.
The list of names is established when the statement containing the SELECT clause is prepared. Hence, * does not identify any columns that have been added to a table after the statement has been prepared.
The list of names is established when the statement containing the SELECT clause is prepared. Hence, * does not identify any columns that have been added to a table after the statement has been prepared.
Normally, when SQL statements are implicitly rebound, the list of names is not re-established. Therefore, the number of columns returned by the statement does not change. However, there are four cases where the list of names is established again and the number of columns can change:
The number of columns in the result of SELECT is the same as the number of expressions in the operational form of the select list (that is, the list established at prepare time), and cannot exceed 8000. The result of a subquery must be a single expression, unless the subquery is used in the EXISTS predicate.
Some of the results of applying the select list to R depend on whether or not GROUP BY or HAVING is used. Those results are described separately.
In either case the nth column of the result contains the values specified by applying the nth expression in the operational form of the select list.
Result columns allow null values if they are derived from:
Each column of the result of SELECT acquires a data type from the
expression from which it is derived.
| When the expression is: | The data type of the result column is: |
|---|---|
| The name of any numeric column | The same as the data type of the column, with the same precision and scale for decimal columns. |
| An integer constant | INTEGER or BIGINT (if the value of the constant is outside the range of INTEGER, but within the range of BIGINT). |
| A decimal or floating-point constant | The same as the data type of the constant, with the same precision and scale for decimal constants. |
| The name of any numeric host variable | The same as the data type of the variable, with the same precision and scale for decimal variables. If the data type of the variable is not identical to an SQL data type (for example, DISPLAY SIGN LEADING SEPARATE in COBOL), the result column is decimal. |
| An arithmetic expression | The same as the data type of the result, with the same precision and scale for decimal results as described under Expressions. |
| Any function | See "Built-In Functions" to determine the data type of the result. |
| The name of any string column | The same as the data type of the column, with the same length attribute. |
| The name of any string host variable | The same as the data type of the variable, with a length attribute equal to the length of the variable. If the data type of the variable is not identical to an SQL data type (for example, a NUL-terminated string in C), the result column is a varying-length string. |
| A character-string constant of length n | VARCHAR(n) |
| A graphic-string constant of length n | VARGRAPHIC(n) |
| The name of a datetime column, or an ILE RPG for AS/400 or ILE COBOL for AS/400 datetime host variable | The same as the data type of the column or host variable. |
.-,------------------.
V |
>>-FROM-------table-reference---+------------------------------><
|
The FROM clause specifies an intermediate result table.
If only one table-reference is specified, the intermediate result table is simply the result of that table-reference. If more than one table-reference is specified in the FROM clause, the intermediate result table consists of all possible combinations of the rows of the specified table-references (the Cartesian product). Each row of the result is a row from the first table-reference concatenated with a row from the second table-reference, concatenated in turn with a row from the third, and so on. The number of rows in the result is the product of the number of rows in all the individual table-references. For a description of table-reference, see table-reference.
>>-+-single-table------------+---------------------------------><
+-nested-table-expression-+
'-joined-table------------'
single-table
|----+-table-name-+---+---------------------+-------------------|
'-view-name--' '-correlation-clause--'
nested-table-expression
|---(--subselect--)--correlation-clause-------------------------|
correlation-clause
.-AS-.
|---+----+--correlation-name--+----------------------------+----|
| .-,--------------. |
| V | |
'-(-----column-name---+---)--'
|
A table-reference specifies an intermediate result table.
The list of names in the FROM clause must conform to these rules:
Each correlation-name is defined as a designator of the intermediate result table specified by the immediately preceding table-reference. A correlation name must be specified for a nested table expression.
The exposed names of all table references should be unique. An exposed name is:
Any qualified reference to a column for a table, view, or nested table expression must use the exposed name. If the same table name or view name is specified twice, at least one specification should be followed by a correlation-name. The correlation-name is used to qualify references to the columns of the table or view. When a correlation-name is specified, column-names can also be specified to give names to the columns of the table-name, view-name, or nested-table-expression. If a column list is specified, there must be a name in the column list for each column in the table or view and for each result column in the nested-table-expression. For more information, see Correlation Names.
In general, nested-table-expressions can be specified in any from-clause. Columns from the nested table expressions can be referenced in the select list and in the rest of the subselect using the correlation name which must be specified. The scope of this correlation name is the same as correlation names for other table or view names in the FROM clause. A nested table expression can be used:
Correlated references can be used in nested table expressions. The basic rule that applies is that the correlated reference must be from a table-reference at a higher level in the hierarchy of subqueries. This hierarchy includes the table-references that have already been resolved in the left-to-right processing of the FROM clause. So the following examples are valid syntax:
The following example is valid:
SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT FROM DEPARTMENT D, (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT FROM EMPLOYEE E WHERE E.WORKDEPT = (SELECT X.DEPTNO FROM DEPARTMENT X WHERE X.DEPTNO = E.WORKDEPT ) ) AS EMPINFO
The following example is not valid because the reference to D.DEPTNO in the WHERE clause of the nested-table-expression attempts to reference a table that is outside the hierarchy of subqueries:
SELECT D.DEPTNO, D.DEPTNAME, EMPINFO.AVGSAL, EMPINFO.EMPCOUNT FROM DEPARTMENT D, (SELECT AVG(E.SALARY) AS AVGSAL,COUNT (*) AS EMPCOUNT FROM EMPLOYEE E WHERE E.WORKDEPT = D.DEPTNO ) AS EMPINFO
.-INNER-----------.
|--+-table-reference--+-+-----------------+---JOIN--table-reference--ON--join-condition--+-+->
| | | .-OUTER-. | | |
| | +-LEFT-+-------+--+ | |
| | '-EXCEPTION-------' | |
| '-CROSS JOIN--table-reference--------------------------------------' |
'-(--joined-table--)--------------------------------------------------------------------'
>---------------------------------------------------------------|
join-condition
.-AND---------------------------------------.
V |
|-------+-expression--+- = --+---expression--+--+---------------|
| +- ¬= -+ |
| +- <> -+ |
| +- > --+ |
| +- >= -+ |
| +- < --+ |
| +- <= -+ |
| +- ¬< -+ |
| '- ¬> -' |
'-(--join-condition--)---------------'
| Note: | In the join-condition, you can use the exclamation point (!) in place of the not (¬) symbol. |
A joined-table specifies an intermediate result table that is the result of either an inner join, outer, cross or exception join. The table is derived by applying one of the join operators: INNER, LEFT OUTER, EXCEPTION, or CROSS to its operands.
If a join-operator is not specified, INNER is implicit. The order in which multiple joins are performed can affect the result. Joins can be nested within other joins. The order of processing for joins is generally from left to right, but based on the position of the required join-condition. Parentheses are recommended to make the order of nested joins more readable. For example:
TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1
LEFT JOIN TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1
ON TB1.C1=TB3.C1
is the same as
(TB1 LEFT JOIN TB2 ON TB1.C1=TB2.C1)
LEFT JOIN (TB3 LEFT JOIN TB4 ON TB3.C1=TB4.C1)
ON TB1.C1=TB3.C1
A joined table can be used in any context in which any form of the SELECT statement is used. A view or a cursor is read-only if its SELECT statement includes a joined table.
Using the INNER JOIN syntax with a join-condition will produce the same result as specifying the join by listing two tables in the FROM clause separated by commas and using the where-clause to provide the condition.
>>-WHERE--search-condition------------------------------------->< |
The WHERE clause specifies an intermediate result table that consists of those rows of R for which the search-condition is true. R is the result of the FROM clause of the statement.
The search-condition must conform to the following rules:
If a sort sequence other than *HEX is in effect when the statement that contains the WHERE clause is executed and if the search-condition contains predicates that have SBCS, mixed, or UCS-2 data, then the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands of the predicate.
Any subquery in the search-condition is effectively executed for each row of R and the results are used in the application of the search-condition to the given row of R. A subquery is actually executed for each row of R only if it includes a correlated reference to a column of R. In fact, a subquery with no correlated reference is executed just once, whereas a subquery with a correlated reference may have to be executed once for each row.
.-,----------------------.
V |
>>-GROUP BY-------grouping-expression---+----------------------><
|
The GROUP BY clause specifies an intermediate result table that consists of a grouping of the rows of R. R is the result of the previous clause of the subselect.
In its simplest form, a GROUP BY clause contains a grouping-expression. A grouping-expression is an expression used in defining the grouping of R. Each column-name included in a grouping-expression must unambiguously identify a column of R. LOB and DataLink columns cannot be used in a grouping-expression. The length attribute of each grouping-expression must not be more than 2000, or 1999 if the column is nullable. A grouping-expression cannot include a function that is variant, or the RRN, PARTITION, NODENAME, or NODENUMBER functions.
The result of the GROUP BY clause is a set of groups of rows. In each group of more than one row, all values of each grouping-expression are equal, and all rows with the same set of values of the grouping-expressions are in the same group. For grouping, all null values for a grouping-expression are considered equal.
If a sort sequence other than *HEX is in effect when the statement that contains the GROUP BY clause is executed, the rows are placed into groups using the weighted values. The weighted values are derived by applying the sort sequence to the SBCS data grouping-expressions, to the SBCS data of mixed data grouping-expressions, and to UCS-2 data grouping-expressions.
Grouping-expressions can be used in a search condition in a HAVING clause, in the SELECT clause, or in a sort-key-expression of an ORDER BY clause (see order-by-clause for details). In each case, the reference specifies only one value for each group. The grouping-expression specified in these clauses must exactly match the grouping-expression in the GROUP BY clause, except that blanks are not significant. For example, a grouping-expression of
SALARY*.10
will match the expression in a having-clause of
HAVING SALARY*.10
but will not match
HAVING .10 *SALARY or HAVING (SALARY*.10)+100
If the grouping-expression contains varying-length strings with trailing blanks, the values in the group can differ in the number of trailing blanks and may not all have the same length. In that case, a reference to the grouping-expression still specifies only one value for each group, but the value for a group is chosen arbitrarily from the available set of values. Thus, the actual length of the result value is unpredictable.
The GROUP BY clause can contain up to 120 grouping-expressions or 2000 - n bytes, where n is the number of grouping-expressions specified that allow nulls.
>>-HAVING--search-condition------------------------------------>< |
The HAVING clause specifies an intermediate result table that consists of those groups of R for which the search-condition is true. R is the result of the previous clause of the subselect. If this clause is not GROUP BY, R is considered a single group with no grouping expressions.
Each expression that contains a column-name in the search condition must do one of the following:
The RRN, PARTITION, NODENAME, and NODENUMBER functions cannot be specified in the HAVING clause unless it is within a column function. See "Functions" in Chapter 3 for restrictions that apply to the use of column functions.
If a sort sequence other than *HEX is in effect when the statement that contains the HAVING clause is executed and if the search-condition contains predicates that have SBCS, mixed, or UCS-2 data, the comparison for those predicates is done using weighted values. The weighted values are derived by applying the sort sequence to the operands in the predicate.
A group of R to which the search condition is applied supplies the argument for each column function in the search condition, except for any function whose argument is a correlated reference.
If the search condition contains a subquery, the subquery can be thought of as being executed each time the search condition is applied to a group of R, and the results used in applying the search condition. In actuality, the subquery is executed for each group only if it contains a correlated reference. For an illustration of the difference, see examples 6 and 7 under Examples of a subselect.
A correlated reference to a group of R must either identify a grouping column or be contained within a column function.
When HAVING is used without GROUP BY, any column name in the select list must appear within a column function.
Select all columns and rows from the EMPLOYEE table.
SELECT * FROM EMPLOYEE
Join the EMP_ACT and EMPLOYEE tables, select all the columns from the EMP_ACT table and add the employee's surname (LASTNAME) from the EMPLOYEE table to each row of the result.
SELECT EMP_ACT.*, LASTNAME
FROM EMP_ACT, EMPLOYEE
WHERE EMP_ACT.EMPNO = EMPLOYEE.EMPNO
Join the EMPLOYEE and DEPARTMENT tables, select the employee number (EMPNO), employee surname (LASTNAME), department number (WORKDEPT in the EMPLOYEE table and DEPTNO in the DEPARTMENT table) and department name (DEPTNAME) of all employees who were born (BIRTHDATE) earlier than 1930.
SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME
FROM EMPLOYEE, DEPARTMENT
WHERE WORKDEPT = DEPTNO
AND YEAR(BIRTHDATE) < 1930
This subselect could also be written as follows:
SELECT EMPNO, LASTNAME, WORKDEPT, DEPTNAME
FROM EMPLOYEE INNER JOIN DEPARTMENT
ON WORKDEPT = DEPTNO
WHERE YEAR(BIRTHDATE) < 1930
Select the job (JOB) and the minimum and maximum salaries (SALARY) for each group of rows with the same job code in the EMPLOYEE table, but only for groups with more than one row and with a maximum salary greater than or equal to 27000.
SELECT JOB, MIN(SALARY), MAX(SALARY)
FROM EMPLOYEE
GROUP BY JOB
HAVING COUNT(*) > 1 AND MAX(SALARY) >= 27000
Select all the rows of EMP_ACT table for employees (EMPNO) in department (WORKDEPT) 'E11'. (Employee department numbers are shown in the EMPLOYEE table.)
SELECT * FROM EMP_ACT
WHERE EMPNO IN (SELECT EMPNO
FROM EMPLOYEE
WHERE WORKDEPT = 'E11')
From the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary for all employees.
SELECT WORKDEPT, MAX(SALARY)
FROM EMPLOYEE
GROUP BY WORKDEPT
HAVING MAX(SALARY) < (SELECT AVG(SALARY)
FROM EMPLOYEE)
The subquery in the HAVING clause would only be executed once in this example.
Using the EMPLOYEE table, select the department number (WORKDEPT) and maximum departmental salary (SALARY) for all departments whose maximum salary is less than the average salary in all other departments.
SELECT WORKDEPT, MAX(SALARY)
FROM EMPLOYEE EMP_COR
GROUP BY WORKDEPT
HAVING MAX(SALARY) < (SELECT AVG(SALARY)
FROM EMPLOYEE
WHERE NOT WORKDEPT = EMP_COR.WORKDEPT)
In contrast to example 6, the subquery in the HAVING clause would need to be executed for each group.
Join the EMPLOYEE and EMP_ACT tables, select all of the employees and their project numbers. Return even those employees that do not have a project number currently assigned.
SELECT EMPLOYEE.EMPNO, PROJNO
FROM EMPLOYEE LEFT OUTER JOIN EMP_ACT
ON EMPLOYEE.EMPNO = EMP_ACT.EMPNO
Any employee in the EMPLOYEE table that does not have a project number in the EMP_ACT table will return one row in the result table containing the EMPNO value and the null value in the PROJNO column.