SQL DML
Goals
- Learn how to change the ordering and limit the number of SQL results.
- See how to update data using SQL.
- Explore SQL query and update expressions.
- Understand SQL result aggregation and grouping.
Concepts
- aggregate function
- collation
- Data Manipulation Language (DML)
- limit
- offset
- paging
- query
- searched case expression
- simple case expression
Language
SQL
||
FETCH FIRST
GROUP BY
HAVING
LIMIT
OFFSET
ORDER BY
UPDATE
Library
java.lang.Math.abs(double a)
java.lang.Math.ceil(double a)
java.lang.Math.exp(double a)
java.lang.Math.floor(double a)
java.lang.Math.log(double a)
java.lang.Math.pow(double a, double b)
java.lang.Math.sqrt(double a)
Lesson
The relational algebra sets out the basic operations one can perform on a relational database. By using some conceptual sequence of operators such as select, rename, and project, one can query the database to find specific information that was stored earlier. As a Data Manipulation Language (DML) SQL provides most of the functionality of the relational algebra, along with its own share of variations, additional features, and quirks for querying and updating data.
Ordering and Paging
The tables SQL queries produce have a fixed column order, even though the relational model recognizes no attribute order in a relation. By default SQL does follow the relational model regarding the returned rows: if no further action is taken, the rows returned by an SQL query can be in any order. Nevertheless it is often necessary for the application to need data in a certain order, such as to list items in increasing order of stock.
You could sort the results after retrieving them, but often it is more efficient to have the database perform sorting. If you do not need all the results, it would moreover be extremely time consuming to transfer thousands of results from a remote database if you intend to discard all but five of them. Modern SQL provides special clauses for sorting the result table rows and limiting those returned to a subset of available results.
ORDER BY
SELECT name, stock
FROM Item
ORDER BY stock;
name | stock |
---|---|
kite | 5 |
doll | 10 |
game table | 12 |
chair | 15 |
table | 20 |
balloon | 99 |
To indicate a specific row order, add a clause ORDER BY expression [ASC | DESC] [NULLS FIRST | NULLS LAST]
to your query. The expression
is normally the name of a column to sort. By default sorting is performed in ascending order, even if ASC
is not indicated; to specify descending order, indicate DESC
. Similarly by default NULL
values are sorted last; to change this, indicate NULLS FIRST
. Thus to list all items by ascending order of stock, use the query SELECT name, stock FROM Item ORDER BY stock
.
For circumstances in which several rows may have the same sorting value, you can provide multiple sorting columns. Imagine that you wish to list items primarily by stock in descending order. With thousands of items in the database, it is likely that several items would have the same stock. Rather than listing items with identical stock values in arbitrary order, you might wish that items with the same stock be sorted secondarily by item name, in ascending order, as shown in the figure.
FETCH FIRST
After getting the results into the order you want, you may not all of them. If you wish to award medals to the first, second, and third place winners, you don't not need to retrieve all the runners who entered the marathon. SQL now provides a standard way to limit the number of rows returned. Although the full syntax has several variations, to simply limit the number of rows you need to use FETCH FIRST count ROWS ONLY
. To query the top three lowest times of marathon runners, you might use SELECT name, duration FROM Runner ORDER BY duration ASC FETCH FIRST 3 ROWS ONLY
.
OFFSET
Whether or not you limit the results, you may not want to start with the first row returned. You can specify the zero-based offset of the returned rows using OFFSET offset ROWS
. Without this clause, a query acts as if it were performed with an offset of 0. If both FETCH FIRST
and OFFSET
clauses are present, the OFFSET
clause must come first. Thus to query all the marathon runners, ignoring the top-three medal winners, you might use SELECT name, duration FROM Runner ORDER BY duration ASC OFFSET 3 ROWS
.
Paging
SELECT name, stock
FROM Item
ORDER BY stock
OFFSET 4 ROWS
FETCH FIRST 2 ROWS ONLY;
name | stock |
---|---|
table | 20 |
balloon | 99 |
Limit and offset commonly work together to perform paging. When you perform an Internet search, only a limited number of matches are returned. You have the option of navigating to the next or previous “page” of results. An independent SQL can retrieve just the results for a particular page. The number of results on each page is the limit, specified by the FETCH FIRST
clause. The offset page is be calculated by multiplying a zero-based page index by the number of results on each page.
Imagine that the user wants to page through the items, ordered by stock, showing two items on each page. If the user navigates to the third page of information, they expect to see items #5 and #6. The limit in this case will be 2. The offset will be the pageIndex × limit = 2 × 2 = 4. The figure on the side shows the query necessary to retrieve only this page of items.
Modifying Data
You already learned how to insert data into a database using SQL. If inserts and queries provide the “create” and “read” functionality of CRUD, there remains the “update” and ”delete” functionality. SQL in fact provides both an UPDATE
and a DELETE
statement just for these purposes. Both of them allow an optional predicate to indicate which rows are to be updated or deleted.
UPDATE
The syntax for updating data using SQL is UPDATE table SET column = value [WHERE predicate]
. Often you only want to update the value for one row, which you can identify in the predicate. Although the predicate is optional, leaving it out will cause the column value to be set to for all rows! This may be useful as well, if you want to to change some common value, or update the rows differently based on some formula.
Imagine that supplier Acme Furniture Company has changed its address from 123 Some Street to 789 Last Lane, and you want to reflect this is in the database. You will need to use a predicate that selects only Acme Furniture Company. You can do this with a WHERE name = 'Acme Furniture Company'
predicate as shown in the example, if you are certain that no other suppliers have the same name. Ideally your program would already know the primary key of this supplier, if the user selected it from a list for instance, allowing you to select it using WHERE id = 1
.
DELETE
The statement for removing rows has the form DELETE FROM table [WHERE predicate]
. As with UPDATE, leaving off the predicate will cause the statement to apply to all rows—which means that all data will be deleted from the table!
If Top Toys Corporation went out of business you could delete that company altogether from the Supplier
table. Ideally your program would already know the primary key of this supplier, allowing you to specify it using WHERE id = 3
rather than by name.
Data Manipulation
Expressions
The SELECT
statement in SQL is versatile. You've already seen how it can perform both the project()
and rename()
operations of the relational algebra. In addition you can place literals, expressions, and/or functions in place of column names to essentially create data from scratch as it were.
Expressions can also be used as values when performing updates. Let's say that you sold a single chair, so you need to reduce that item's stock. This can easily be done with an UPDATE
statement, but you don't know the stock ahead of time.
The correct approach is to indicate the new stock value as an expression, based upon the old stock value. In terms of Java, this is no different than saying stock = stock - 1
. In fact this is exactly the syntax you would use in the SQL UPDATE
statement to reduce the stock of an item.
String Expressions
Strings can be concatenated by using the ||
operator. A typical example would be to display a full name using SELECT firstName || ' ' || lastName FROM Customer
.
Case Expressions
A CASE
expression in SQL allows you to have a sort of conditional logic that returns some value, not necessarily based on some formula, but based on choosing among several input expressions. You could create a CASE
expression to return opposites: if some value is “cold”, return “hot”; otherwise if the value is “short”, return “tall”; etc. Obviously if the list of decisions becomes too long, it would be better to have some sort of lookup table.
There are two forms of the CASE expression. The first one, the aptly named simple case expression, most resembles the form of switch(…) {case …}
in Java because its output is based on a single input expression (similar to what goes inside switch(…)
. The form is CASE input WHEN optionValue1 THEN result1 WHEN optionValue2 THEN result2 … [ELSE defaultResult] END
. When input
is equal to optionValue1
, the result will be result1
; when input
is equal to optionValue2
, the result will be result2
; etc. If none of the options match the input
expression, the defaultResult
is used. If the ELSE
clause is not provided, then NULL
will be used as the default result.
Suppose you wanted to improve the Category
lookup table to provide a description of each category. After adding a new column you would want to fill that column with the appropriate description: “Fun and games” for the toy
category, for example. You would use a normal UPDATE
statement; the value to set, instead of being a literal, would be the result of a CASE
statement based on the category name, as shown in the figure.
The second, more complex form is the searched case expression. In this form no input
is given; rather, each WHEN
clause constitutes an independent condition expression: CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 … [ELSE defaultResult] END
. As such the searched case expression more resembles a series of if(…) … else if(…) [else …]
statements in Java, the only difference being CASE
returns some value at the end rather than performing some action.
Using a searched case expression you could increase the stock of the items by different amounts. You could add 100 more balloons, double the stock of all items with stock under 20, and leave the stocks of all other other items as they are. As the figure shows, to leave the other stocks as they are you must provide an ELSE
clause that gives the current stock value; otherwise the stock would be set to NULL
if none of the conditions were met, which would produce an error because stock has a NOT NULL
constraint.
Functions
SQL provides many functions for transforming the data returned from queries. As with expression in general, they can be used for updating values as well based upon existing data. The information provided here is an overview of the ANSI/ISO standard. Be sure and check the functions reference for PostgreSQL, Oracle, and H2 as well.
String Functions
Function | Description | Example |
---|---|---|
CHAR[ACTER]_LENGTH(string) | Returns the number of characters in a string. Oracle uses the LENGTH(string) function instead. | SELECT CHAR_LENGTH(name) from Item; |
LOWER(string) | Converts a string to lowercase. See UPPER() . | SELECT LOWER(name) from Category; |
POSITION(substring in string) | Returns the one-based position of the substring in the string, or 0 if the substring doesn't appear. Oracle uses the INSTR(string, substring) function instead. | SELECT * FROM Supplier where POSITION('Street', address) > 0; |
SUBSTRING(string FROM start [FOR length]) | Extracts a portion of a string, starting at the one-based start position, for the given length or the remaining characters if the length is not specified. The first position is 1. Oracle uses the SUBSTR(string, start, [length]) function instead. | SELECT * FROM Supplier where SUBSTRING(name, 1, 3) = 'Top'; |
TRIM([[LEADING | TRAILING | BOTH] [character] FROM] string) | Removes leading and/or trailing occurrences of some character from a string. By default this function trims a string of spaces on each side. | SELECT TRIM(' user input '); |
UPPER(string) | Converts a string to lowercase. See LOWER() . | SELECT UPPER(name) from Category; |
Numeric Functions
Function | Description | Example |
---|---|---|
ABS(number) | Returns the absolute value of a number. Equivalent to Math.abs(double a) . | SELECT ABS(12 - 17); |
CEIL[ING](number) | Rounds up to the next integer. Oracle only supports the CEIL() form. Equivalent to Math.ceil(double a) . See FLOOR() . | SELECT CEIL(5.3); |
EXP(number) | Calculates e to the power of the given number. Euler's number e is approximately 2.71828. Equivalent to Math.exp(double a) . | SELECT EXP(3); |
FLOOR(number) | Rounds down to the next integer. Equivalent to Math.floor(double a) . See CEIL() . | SELECT FLOOR(4.7); |
LN(number) | Calculates the natural logarithm, the power needed to raise e to the given number. Equivalent to Math.log(double a) . | SELECT CEIL(LN(20)); |
MOD(dividend, divider) | Returns the remainder of dividing the dividend by the divider. Equivalent to the Java % operator. | SELECT MOD(17, 5); |
POWER(base, exponent) | Raises the base to a given exponent power. Equivalent to Math.pow(double a, double b) . | SELECT POWER(2, 3); |
SQRT(number) | Returns the square root of a number. Equivalent to Math.sqrt(double a) . | SELECT SQRT(49); |
Aggregate Functions
SELECT name, stock FROM Item;
name | stock |
---|---|
table | 20 |
game table | 12 |
chair | 15 |
balloon | 99 |
kite | 5 |
doll | 10 |
Item
stock.SELECT SUM(stock) FROM Item;
sum |
---|
161 |
While the other functions discussed so far use single values as inputs, such as the value of a column in a single row, an aggregate function performs some calculation using the values of all the rows in a column as input. An example makes this clearer .The statement SELECT stock FROM Item
would produce a table with a row for each item, each indicating the stock of the item. The statement SELECT SUM(stock) FROM Item
, which uses the aggregate function SUM()
, would calculate the sum of all the items.
Because an aggregate function performs a calculation across many rows, producing a single value, it has the effect of collapsing all rows. Thus the statement SELECT SUM(stock), AVG(stock) FROM Item
would not produce a row for each item, but instead produce an output a single row with two columns containing the total stock and average stock of all the rows that would have been produced without the aggregate function.
Function | Description | Example |
---|---|---|
AVG(expression) | Calculates the average of the values in the column given by the expression. | SELECT AVG(stock) FROM Item; |
COUNT(expression) | Calculates the number of rows given by the expression. COUNT(*) may be to count all rows without regard to columns. | SELECT COUNT(*) FROM Item; |
MIN(expression) | Determines the minimum value in the column given by the expression. See also MAX() . | SELECT MIN(stock) FROM Item; |
MAX(expression) | Determines the maximum value in the column given by the expression. See also MIN() . | SELECT MAX(stock) FROM Item; |
SUM(expression) | Calculates the sum of the values in the column given by the expression. | SELECT SUM(stock) FROM Item; |
Grouping
By default an aggregate function performs its calculations over all the rows that would have been generated in the original query, as a single group, collapsing the result into a single row. You may instead want the original output be divided into several groups, and have the aggregate function(s) generate a separate row for each of the groups. For example the statement above, SELECT SUM(stock) FROM Item
, will produce a single row containing the total stock of all the items. You may instead wish to see the total stock for the items from each supplier.
GROUP BY
SELECT supplierId, SUM(stock)
FROM Item
GROUP BY supplierId;
supplierId | sum |
---|---|
2 | 114 |
1 | 47 |
SQL provides a clause GROUP BY columns
to indicate that the result should be divided into groups for purposes of aggregation. The comma-separated column names indicate that each distinct value in that column will be grouped together and collapsed into a single row with the result of the aggregate function.
You could therefore use SELECT SUM(stock) FROM Item GROUP BY supplierId
to find the total stock from each supplier. You would not at this point, however, know which supplier each group was for, so you would probably want to include supplierId
in the output as well: SELECT supplierId, SUM(stock) FROM Item GROUP BY supplierId
.
While the supplierId
might be useful programmatically, it is somewhat opaque to users. To find the name of each supplier based upon the supplier ID, you would have to join the Item
table with the Supplier
table. You might be tempted to do the following:
The problem is that SQL does not know that Supplier.name
will be the same within each group. You may know that suppliers are identified by ID, and because each supplier only has one name, grouping by supplierId
will require each the groups to contain only one supplier name. But SQL is not that clever, and has instead adopted a simpler rule: every column in the SELECT
clause must also appear in the GROUP BY
clause or be an aggregate function.
SELECT Supplier.name, SUM(stock)
FROM Item JOIN Supplier ON supplierId = Supplier.id
GROUP BY Supplier.name;
name | sum |
---|---|
Bozo Toy Company | 114 |
Acme Furniture Company | 47 |
If you are absolutely certain (e.g. you have added a unique constraint to Supplier.name
), then you could instead group by the supplier name, allowing you to include it in the result, as shown in the figure to the side.
Because the Supplier.id
is the supplier's primary key, it would seem more semantically correct to group on this identifier than on the supplier name. Remember that the result of an SQL query is a table. A modification of the previous query, SELECT supplierId, SUM(stock) AS stockSum FROM Item GROUP BY supplierId
, produces a table with the columns supplierId
and stockSum
.
This result table can itself be used as a query within another query to join with another table: specifically the Supplier
table, which contains the names of the suppliers. You would need to alias the grouping table using some name, such as AS Grouping
, so that you could refer to the columns in the final output. How this query-in-a-query works is explained in the Subqueries section below.
HAVING
HAVING
.SELECT Supplier.name, SUM(stock)
FROM Item JOIN Supplier ON supplierId = Supplier.id
GROUP BY Supplier.name
HAVING SUM(stock) > 50.
name | sum |
---|---|
Bozo Toy Company | 114 |
After grouping the items by supplier, calculating the total stock for each supplier, you may want to filter the results to only include suppliers that have items with a total stock over 50. Normally to filter results you would add a WHERE
clause. However the WHERE
predicate is evaluated before the rows are grouped.
The HAVING
clause provides a restriction predicate that is evaluated after grouping occurs. Otherwise HAVING
works just like WHERE
. The example shows the results being filtered on an aggregate function column, but the results could have been filtered on a grouping column instead. It is semantically more correct to group by supplierId
, as shown above, deriving the supplier name for the results using a subquery as detailed in the following Subqueries section.
Review
Gotchas
- For
GROUP BY
queries, SQL does not permit columns in theSELECT
clause that are not aggregate functions and that do not appear in theGROUP BY
clause. - Sorting strings without investigating the collation rules of your database and schema can produce unexpected ordering when used with some languages.
In the Real World
- While SQL uses the term “fetch only” to indicate a maximum number of results, APIs more often simply use the term “limit”.
Think About It
- Is a
HAVING
clause absolutely necessary? How might you effect the same results if SQL did not provide a clause for filteringGROUP BY
results?
Self Evaluation
- If an
ORDER BY
clause doesn't indicate otherwise, will results be sorted in ascending or descending order? WillNULL
values appear at the beginning or at the end? - Why might ordering by a string column produce unexpected results without performing special configurations specific to your database product?
- What is paging of results? What is the formula for calculating the offset for a particular page of information to retrieve?
- What SQL statements correspond to each of the CRUD operations?
- How do you concatenate two strings in SQL?
- What is the central difference between the
CASE
in SQL andswitch(…) {case …}
in Java? - What will a
CASE
expression produce if noWHEN
clause matches and noELSE
clause is provided? - What are two string-related functions for which Oracle uses different function names than the ANSI/ISO standard?
- How does an aggregate function affect the number of rows returned in a query?
- Why is it preferable to use an aggregate function rather than retrieving the results and performing aggregate calculation manually?
- When grouping rows, why must every result column in the
SELECT
clause also appear in theGROUP BY
clause or be an aggregate function?
Task
Continue to improve the Booker repository interface and database implementation.
- Add one or more methods to the
PublicationRepository
interface to allow paged retrieval of publications.- You must provide a JDBC-specific implementation using the appropriate SQL constructs.
- Provide a default implementation in the interface. You may allow the other repository implementations to use the method default implementation if you wish.
- Implement database methods for deleting publications via
PublicationRepository
. - Implement all
PublicationRepository
methods related to stock retrieval and update for the database implementation. - Add a method to
PublicationRepository
for determining the total stock of books for each available publication year.- A map would be an appropriate type to return.
- You must provide a JDBC-specific implementation using the appropriate SQL constructs.
- Provide a default implementation in the interface.
Add two new optional lookup command-line parameters, --offset
and --limit
, to allowing paging from the CLI when listing publications. These parameters can only be used when listing publications.
Example usage: booker list --offset 15 --limit 5
booker list [--debug] [--locale <locale>] [--isbn <ISBN> | --issn <ISSN>] [--name <name>] [--type (book|periodical)] --lookup [--offset <offset>] [--limit <limit>]
booker load-snapshot [--debug] [--locale <locale>]
booker purchase --isbn <ISBN> [--debug] [--locale <locale>]
booker subscribe --issn <ISSN> [--debug] [--locale <locale>]
booker -h | --help
Option | Alias | Description |
---|---|---|
list | Lists all available publications. | |
load-snapshot | Loads the snapshot list of publications into the current repository. | |
purchase | Removes a single copy of the book identified by ISBN from stock. | |
subscribe | Subscribes to a year's worth of issues of the periodical identified by ISSN. | |
--debug | -d | Includes debug information in the logs. |
--help | -h | Prints out a help summary of available switches. |
--isbn | Identifies a book, for example for the purchase command. | |
--issn | Identifies a periodical, for example for the subscribe command. | |
--limit | The maximum number of publications to include when listing. | |
--locale | -l | Indicates the locale to use in the program, overriding the system default. The value is in language tag format. |
--lookup | Retrieves from the Internet information on a book identified by its ISBN. | |
--name | -n | Indicates a filter by name for the list command. |
--offset | The zero-based paging offset to use when listing; i.e. the number of publications to skip. | |
--type | -t | Indicates the type of publication to list, either book or periodical . If not present, all publications will be listed. |
See Also
- Learning SQL, Second Edition (Alan Beaulieu - O'Reilly, 2009)
- Using OFFSET and FETCH with the ORDER BY clause (Essential SQL)
- jOOQ User Manual: The LIMIT .. OFFSET clause
- SQL 101, Part 5: An Order of Sorts (Melanie Caffrey - SQL 101, Oracle Magazine)
- SQL 101, Part 6: A Function of Character (Melanie Caffrey - SQL 101, Oracle Magazine)
- SQL 101, Part 7: From Floor to Ceiling and Other Functional Cases (Melanie Caffrey - SQL 101, Oracle Magazine)
- SQL 101, Part 9: Having Sums, Averages, and Other Grouped Data (Melanie Caffrey - SQL 101, Oracle Magazine)
References
- ISO 9075-1:2011: SQL/Framework (free)
- ISO 9075-2:2011: SQL/Foundation
- PostresQL Functions and Operators
- Oracle Functions
- H2 Functions
Acknowledgments
- Joe Celko's SQL for Smarties, Fifth Edition (Joe Celko - Morgan Kaufmann, 2014)
- SQL:1999 (Jim Melton, Alan Simon - Morgan Kaufmann, 2001)
- SQL Antipatterns (Bill Karwin, The Pragmatic Bookshelf, 2010)
- SQL in a Nutshell, Third Edition (Kevin Kline - O'Reilly, 2008)
- Some symbols are from Font Awesome by Dave Gandy.