SQL DML

Goals

Concepts

Language

SQL

Library

Lesson

The relational algebra sets out the basic operations one can perform on a relational database. By using some conceptual sequence of operators such as select, rename, and project, one can query the database to find specific information that was stored earlier. As a Data Manipulation Language (DML) SQL provides most of the functionality of the relational algebra, along with its own share of variations, additional features, and quirks for querying and updating data.

Ordering and Paging

The tables SQL queries produce have a fixed column order, even though the relational model recognizes no attribute order in a relation. By default SQL does follow the relational model regarding the returned rows: if no further action is taken, the rows returned by an SQL query can be in any order. Nevertheless it is often necessary for the application to need data in a certain order, such as to list items in increasing order of stock.

You could sort the results after retrieving them, but often it is more efficient to have the database perform sorting. If you do not need all the results, it would moreover be extremely time consuming to transfer thousands of results from a remote database if you intend to discard all but five of them. Modern SQL provides special clauses for sorting the result table rows and limiting those returned to a subset of available results.

ORDER BY

Listing items by lowest stock first.
SELECT name, stock
  FROM Item
  ORDER BY stock;
name stock
kite 5
doll 10
game table 12
chair 15
table 20
balloon 99

To indicate a specific row order, add a clause ORDER BY expression [ASC | DESC] [NULLS FIRST | NULLS LAST] to your query. The expression is normally the name of a column to sort. By default sorting is performed in ascending order, even if ASC is not indicated; to specify descending order, indicate DESC. Similarly by default NULL values are sorted last; to change this, indicate NULLS FIRST. Thus to list all items by ascending order of stock, use the query SELECT name, stock FROM Item ORDER BY stock.

For circumstances in which several rows may have the same sorting value, you can provide multiple sorting columns. Imagine that you wish to list items primarily by stock in descending order. With thousands of items in the database, it is likely that several items would have the same stock. Rather than listing items with identical stock values in arbitrary order, you might wish that items with the same stock be sorted secondarily by item name, in ascending order, as shown in the figure.

Sorting items primarily by stock and secondarily by name.
SELECT name, stock FROM Item
  ORDER BY stock DESC, name ASC;

FETCH FIRST

After getting the results into the order you want, you may not all of them. If you wish to award medals to the first, second, and third place winners, you don't not need to retrieve all the runners who entered the marathon. SQL now provides a standard way to limit the number of rows returned. Although the full syntax has several variations, to simply limit the number of rows you need to use FETCH FIRST count ROWS ONLY. To query the top three lowest times of marathon runners, you might use SELECT name, duration FROM Runner ORDER BY duration ASC FETCH FIRST 3 ROWS ONLY.

OFFSET

Whether or not you limit the results, you may not want to start with the first row returned. You can specify the zero-based offset of the returned rows using OFFSET offset ROWS. Without this clause, a query acts as if it were performed with an offset of 0. If both FETCH FIRST and OFFSET clauses are present, the OFFSET clause must come first. Thus to query all the marathon runners, ignoring the top-three medal winners, you might use SELECT name, duration FROM Runner ORDER BY duration ASC OFFSET 3 ROWS.

Paging

Listing third page of two items ordered by stock.
SELECT name, stock
  FROM Item
  ORDER BY stock
  OFFSET 4 ROWS
  FETCH FIRST 2 ROWS ONLY;
name stock
table 20
balloon 99

Limit and offset commonly work together to perform paging. When you perform an Internet search, only a limited number of matches are returned. You have the option of navigating to the next or previous “page” of results. An independent SQL can retrieve just the results for a particular page. The number of results on each page is the limit, specified by the FETCH FIRST clause. The offset page is be calculated by multiplying a zero-based page index by the number of results on each page.

Imagine that the user wants to page through the items, ordered by stock, showing two items on each page. If the user navigates to the third page of information, they expect to see items #5 and #6. The limit in this case will be 2. The offset will be the pageIndex × limit = 2 × 2 = 4. The figure on the side shows the query necessary to retrieve only this page of items.

Modifying Data

You already learned how to insert data into a database using SQL. If inserts and queries provide the “create” and “read” functionality of CRUD, there remains the “update” and ”delete” functionality. SQL in fact provides both an UPDATE and a DELETE statement just for these purposes. Both of them allow an optional predicate to indicate which rows are to be updated or deleted.

UPDATE

The syntax for updating data using SQL is UPDATE table SET column = value [WHERE predicate]. Often you only want to update the value for one row, which you can identify in the predicate. Although the predicate is optional, leaving it out will cause the column value to be set to for all rows! This may be useful as well, if you want to to change some common value, or update the rows differently based on some formula.

Changing the address of a supplier.
UPDATE Supplier
  SET address = '789 Last Lane'
  WHERE name = 'Acme Furniture Company';

Imagine that supplier Acme Furniture Company has changed its address from 123 Some Street to 789 Last Lane, and you want to reflect this is in the database. You will need to use a predicate that selects only Acme Furniture Company. You can do this with a WHERE name = 'Acme Furniture Company' predicate as shown in the example, if you are certain that no other suppliers have the same name. Ideally your program would already know the primary key of this supplier, if the user selected it from a list for instance, allowing you to select it using WHERE id = 1.

DELETE

The statement for removing rows has the form DELETE FROM table [WHERE predicate]. As with UPDATE, leaving off the predicate will cause the statement to apply to all rows—which means that all data will be deleted from the table!

If Top Toys Corporation went out of business you could delete that company altogether from the Supplier table. Ideally your program would already know the primary key of this supplier, allowing you to specify it using WHERE id = 3 rather than by name.

Removing a supplier.
DELETE FROM Supplier
  WHERE name = 'Top Toys Corporation';

Data Manipulation

Expressions

The SELECT statement in SQL is versatile. You've already seen how it can perform both the project() and rename() operations of the relational algebra. In addition you can place literals, expressions, and/or functions in place of column names to essentially create data from scratch as it were.

Including literal expressions in a SELECT statement.
SELECT 'Hello, World!';
SELECT 1 + 2;
SELECT SQRT(25);
SELECT CURRENT_TIMESTAMP;

Expressions can also be used as values when performing updates. Let's say that you sold a single chair, so you need to reduce that item's stock. This can easily be done with an UPDATE statement, but you don't know the stock ahead of time.

Reducing the stock of an item.
UPDATE Item
  SET stock = stock - 1
  WHERE name = 'chair';

The correct approach is to indicate the new stock value as an expression, based upon the old stock value. In terms of Java, this is no different than saying stock = stock - 1. In fact this is exactly the syntax you would use in the SQL UPDATE statement to reduce the stock of an item.

String Expressions

Strings can be concatenated by using the || operator. A typical example would be to display a full name using SELECT firstName || ' ' || lastName FROM Customer.

Case Expressions

A CASE expression in SQL allows you to have a sort of conditional logic that returns some value, not necessarily based on some formula, but based on choosing among several input expressions. You could create a CASE expression to return opposites: if some value is “cold”, return “hot”; otherwise if the value is “short”, return “tall”; etc. Obviously if the list of decisions becomes too long, it would be better to have some sort of lookup table.

There are two forms of the CASE expression. The first one, the aptly named simple case expression, most resembles the form of switch(…) {case …} in Java because its output is based on a single input expression (similar to what goes inside switch(…). The form is CASE input WHEN optionValue1 THEN result1 WHEN optionValue2 THEN result2 … [ELSE defaultResult] END. When input is equal to optionValue1, the result will be result1; when input is equal to optionValue2, the result will be result2; etc. If none of the options match the input expression, the defaultResult is used. If the ELSE clause is not provided, then NULL will be used as the default result.

Adding category descriptions using a simple case expression.
ALTER TABLE Category
  ADD description VARCHAR(9999);
UPDATE Category
  SET description =
    CASE name
      WHEN 'furniture' THEN 'Stuff for sitting and setting.'
      WHEN 'toy' THEN 'Fun and games.'
    END;

Suppose you wanted to improve the Category lookup table to provide a description of each category. After adding a new column you would want to fill that column with the appropriate description: “Fun and games” for the toy category, for example. You would use a normal UPDATE statement; the value to set, instead of being a literal, would be the result of a CASE statement based on the category name, as shown in the figure.

The second, more complex form is the searched case expression. In this form no input is given; rather, each WHEN clause constitutes an independent condition expression: CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 … [ELSE defaultResult] END. As such the searched case expression more resembles a series of if(…) … else if(…) [else …] statements in Java, the only difference being CASE returns some value at the end rather than performing some action.

Updating stocks using a searched case expression.
UPDATE Item
  SET stock =
    CASE
      WHEN name = 'balloon' THEN 100
      WHEN stock < 20 THEN stock * 2
      ELSE stock
    END;

Using a searched case expression you could increase the stock of the items by different amounts. You could add 100 more balloons, double the stock of all items with stock under 20, and leave the stocks of all other other items as they are. As the figure shows, to leave the other stocks as they are you must provide an ELSE clause that gives the current stock value; otherwise the stock would be set to NULL if none of the conditions were met, which would produce an error because stock has a NOT NULL constraint.

Functions

SQL provides many functions for transforming the data returned from queries. As with expression in general, they can be used for updating values as well based upon existing data. The information provided here is an overview of the ANSI/ISO standard. Be sure and check the functions reference for PostgreSQL, Oracle, and H2 as well.

String Functions
Function Description Example
CHAR[ACTER]_LENGTH(string) Returns the number of characters in a string. Oracle uses the LENGTH(string) function instead. SELECT CHAR_LENGTH(name) from Item;
LOWER(string) Converts a string to lowercase. See UPPER(). SELECT LOWER(name) from Category;
POSITION(substring in string) Returns the one-based position of the substring in the string, or 0 if the substring doesn't appear. Oracle uses the INSTR(string, substring) function instead. SELECT * FROM Supplier where POSITION('Street', address) > 0;
SUBSTRING(string FROM start [FOR length]) Extracts a portion of a string, starting at the one-based start position, for the given length or the remaining characters if the length is not specified. The first position is 1. Oracle uses the SUBSTR(string, start, [length]) function instead. SELECT * FROM Supplier where SUBSTRING(name, 1, 3) = 'Top';
TRIM([[LEADING | TRAILING | BOTH] [character] FROM] string) Removes leading and/or trailing occurrences of some character from a string. By default this function trims a string of spaces on each side. SELECT TRIM('   user input  ');
UPPER(string) Converts a string to lowercase. See LOWER(). SELECT UPPER(name) from Category;
Numeric Functions
Function Description Example
ABS(number) Returns the absolute value of a number. Equivalent to Math.abs(double a). SELECT ABS(12 - 17);
CEIL[ING](number) Rounds up to the next integer. Oracle only supports the CEIL() form. Equivalent to Math.ceil(double a). See FLOOR(). SELECT CEIL(5.3);
EXP(number) Calculates e to the power of the given number. Euler's number e is approximately 2.71828. Equivalent to Math.exp(double a). SELECT EXP(3);
FLOOR(number) Rounds down to the next integer. Equivalent to Math.floor(double a). See CEIL(). SELECT FLOOR(4.7);
LN(number) Calculates the natural logarithm, the power needed to raise e to the given number. Equivalent to Math.log(double a). SELECT CEIL(LN(20));
MOD(dividend, divider) Returns the remainder of dividing the dividend by the divider. Equivalent to the Java % operator. SELECT MOD(17, 5);
POWER(base, exponent) Raises the base to a given exponent power. Equivalent to Math.pow(double a, double b). SELECT POWER(2, 3);
SQRT(number) Returns the square root of a number. Equivalent to Math.sqrt(double a). SELECT SQRT(49);
Aggregate Functions
Listing all the stock of all items.
SELECT name, stock FROM Item;
name stock
table 20
game table 12
chair 15
balloon 99
kite 5
doll 10
Calculating the sum of Item stock.
SELECT SUM(stock) FROM Item;
sum
161

While the other functions discussed so far use single values as inputs, such as the value of a column in a single row, an aggregate function performs some calculation using the values of all the rows in a column as input. An example makes this clearer .The statement SELECT stock FROM Item would produce a table with a row for each item, each indicating the stock of the item. The statement SELECT SUM(stock) FROM Item, which uses the aggregate function SUM(), would calculate the sum of all the items.

Because an aggregate function performs a calculation across many rows, producing a single value, it has the effect of collapsing all rows. Thus the statement SELECT SUM(stock), AVG(stock) FROM Item would not produce a row for each item, but instead produce an output a single row with two columns containing the total stock and average stock of all the rows that would have been produced without the aggregate function.

Function Description Example
AVG(expression) Calculates the average of the values in the column given by the expression. SELECT AVG(stock) FROM Item;
COUNT(expression) Calculates the number of rows given by the expression. COUNT(*) may be to count all rows without regard to columns. SELECT COUNT(*) FROM Item;
MIN(expression) Determines the minimum value in the column given by the expression. See also MAX(). SELECT MIN(stock) FROM Item;
MAX(expression) Determines the maximum value in the column given by the expression. See also MIN(). SELECT MAX(stock) FROM Item;
SUM(expression) Calculates the sum of the values in the column given by the expression. SELECT SUM(stock) FROM Item;

Grouping

By default an aggregate function performs its calculations over all the rows that would have been generated in the original query, as a single group, collapsing the result into a single row. You may instead want the original output be divided into several groups, and have the aggregate function(s) generate a separate row for each of the groups. For example the statement above, SELECT SUM(stock) FROM Item, will produce a single row containing the total stock of all the items. You may instead wish to see the total stock for the items from each supplier.

GROUP BY

Calculating total and average stock by category.
SELECT supplierId, SUM(stock)
  FROM Item
  GROUP BY supplierId;
supplierId sum
2 114
1 47

SQL provides a clause GROUP BY columns to indicate that the result should be divided into groups for purposes of aggregation. The comma-separated column names indicate that each distinct value in that column will be grouped together and collapsed into a single row with the result of the aggregate function.

You could therefore use SELECT SUM(stock) FROM Item GROUP BY supplierId to find the total stock from each supplier. You would not at this point, however, know which supplier each group was for, so you would probably want to include supplierId in the output as well: SELECT supplierId, SUM(stock) FROM Item GROUP BY supplierId.

While the supplierId might be useful programmatically, it is somewhat opaque to users. To find the name of each supplier based upon the supplier ID, you would have to join the Item table with the Supplier table. You might be tempted to do the following:

Incorrectly including columns when grouping.
-- ERROR:  column "supplier.name" must appear in the GROUP BY clause or be used in an aggregate function
SELECT Supplier.name, SUM(stock) FROM Item JOIN Supplier ON supplierId = Supplier.id GROUP BY supplierId;

The problem is that SQL does not know that Supplier.name will be the same within each group. You may know that suppliers are identified by ID, and because each supplier only has one name, grouping by supplierId will require each the groups to contain only one supplier name. But SQL is not that clever, and has instead adopted a simpler rule: every column in the SELECT clause must also appear in the GROUP BY clause or be an aggregate function.

Improved grouping with supplier name in result.
SELECT Supplier.name, SUM(stock)
  FROM Item JOIN Supplier ON supplierId = Supplier.id
  GROUP BY Supplier.name;
name sum
Bozo Toy Company 114
Acme Furniture Company 47

If you are absolutely certain (e.g. you have added a unique constraint to Supplier.name), then you could instead group by the supplier name, allowing you to include it in the result, as shown in the figure to the side.

Because the Supplier.id is the supplier's primary key, it would seem more semantically correct to group on this identifier than on the supplier name. Remember that the result of an SQL query is a table. A modification of the previous query, SELECT supplierId, SUM(stock) AS stockSum FROM Item GROUP BY supplierId, produces a table with the columns supplierId and stockSum.

Using a subquery to provide names for the supplier groups.
SELECT Supplier.name, SupplierGroup.stockSum
  FROM (
    SELECT supplierId, SUM(stock) AS stockSum
      FROM Item GROUP BY supplierId
  ) AS SupplierGroup
  JOIN Supplier ON SupplierGroup.supplierId = Supplier.id;

This result table can itself be used as a query within another query to join with another table: specifically the Supplier table, which contains the names of the suppliers. You would need to alias the grouping table using some name, such as AS Grouping, so that you could refer to the columns in the final output. How this query-in-a-query works is explained in the Subqueries section below.

HAVING

Filtering after grouping using HAVING.
SELECT Supplier.name, SUM(stock)
  FROM Item JOIN Supplier ON supplierId = Supplier.id
  GROUP BY Supplier.name
  HAVING SUM(stock) > 50.
name sum
Bozo Toy Company 114

After grouping the items by supplier, calculating the total stock for each supplier, you may want to filter the results to only include suppliers that have items with a total stock over 50. Normally to filter results you would add a WHERE clause. However the WHERE predicate is evaluated before the rows are grouped.

The HAVING clause provides a restriction predicate that is evaluated after grouping occurs. Otherwise HAVING works just like WHERE. The example shows the results being filtered on an aggregate function column, but the results could have been filtered on a grouping column instead. It is semantically more correct to group by supplierId, as shown above, deriving the supplier name for the results using a subquery as detailed in the following Subqueries section.

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

Continue to improve the Booker repository interface and database implementation.

Add two new optional lookup command-line parameters, --offset and --limit, to allowing paging from the CLI when listing publications. These parameters can only be used when listing publications.

Example usage: booker list --offset 15 --limit 5

Option Alias Description
list Lists all available publications.
load-snapshot Loads the snapshot list of publications into the current repository.
purchase Removes a single copy of the book identified by ISBN from stock.
subscribe Subscribes to a year's worth of issues of the periodical identified by ISSN.
--debug -d Includes debug information in the logs.
--help -h Prints out a help summary of available switches.
--isbn Identifies a book, for example for the purchase command.
--issn Identifies a periodical, for example for the subscribe command.
--limit The maximum number of publications to include when listing.
--locale -l Indicates the locale to use in the program, overriding the system default. The value is in language tag format.
--lookup Retrieves from the Internet information on a book identified by its ISBN.
--name -n Indicates a filter by name for the list command.
--offset The zero-based paging offset to use when listing; i.e. the number of publications to skip.
--type -t Indicates the type of publication to list, either book or periodical. If not present, all publications will be listed.

See Also

References

Acknowledgments