SQL
Goals
- Understand how SQL is used to program relational databases.
- Be introduced to the standard SQL types and how they are implemented on some popular RDBMS products.
- Learn how to create tables and add constraints using SQL DDL statements.
- Learn how to populate tables and view their contents using SQL DML statements.
- Experience setting up a remote PostgreSQL database on Heroku.
Concepts
- American National Standards Institute (ANSI)
- approximate numeric
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- day-time interval
- declarative
- delimited identifier
- dialect
- exact numeric
- Heroku
- identity column
- International Organization for Standardization (ISO)
- pad
- platform as a service (PaaS)
- precision
- regular identifier
- relational algebra
- scale
- sequence
- Structured Query Language (SQL)
- year-month interval
Language
SQL
ALTER TABLE
CREATE TABLE
INSERT INTO
SELECT
Library
Java
java.time.Duration
java.time.Instant
java.time.LocalDate
java.time.LocalTime
java.time.OffsetDateTime
java.time.OffsetTime
java.time.Period
Preparation
- Download and install PostgreSQL.
- Use the installer from EnterpriseDB if given a choice for your platform.
- Install PostgreSQL version 10 or above.
- You will only need the PostgreSQL command line tools for this lesson.
- Add the PostgreSQL
bin/
directory to your systemPATH
if you wish to access the CLI from the OS command line. - You may decide to also install the GUI front-end pgAdmin to use once you have mastered the command-line interface.
- You may install the full PostgreSQL server for local testing if you desire.
- Sign up for a free Herokuaccount.
Lesson
The relational database model, invented by E. F Codd while working at IBM, has been the workhorse of data storage for decades. To access data in its relational database prototypes, the company created a language named SEQUEL for “Structured English Query Language”. Eventually this language became the Structured Query Language (SQL), which is now the most popular form of accessing relational databases. SQL is proving more popular than the relational model itself: even some newer “NoSQL” databases (which you will learn about in future lessons) are providing front-end languages based on SQL.
As relational database products proliferated, vendors adopted different forms of SQL. In an attempt to bring some unification to the various implementations, the American National Standards Institute (ANSI) in the 1980s developed a standard for SQL. Over the years ANSI and the International Organization for Standardization (ISO) have continued expanding and refining the language, releasing a series of standards from SQL-87 to the 15-part SQL:2016 (ISO/IEC 9075:2016). The core language features were already well-defined in the SQL:1999 and SQL:2003 versions, and these lessons will concentrate on those releases. This level of SQL compliance is sometimes referred to as “SQL3”, which was the unofficial name of SQL:1999 while it was being developed.
Declarative Processing
You will remember that relational database theory is is based on sets, and mathematical set theory comes with its own functions for combining sets and and extracting information from them. SQL is therefore a declarative language; rather than indicate processing steps to perform, you provide a definitional statement of which data you want to retrieve. The database system itself will determine the steps it needs to take to perform the request. In many ways SQL programming is like functional programming; you indicate what needs to be done, not how to do it.
In SQL you will describe which sets of information you desire, in which ways they will be combined, and what filters will be used in returning the final result. You will not write code to iterate through the tuples. In SQL, and in the relational model in general, you manipulate data, not on a per-variable or per-tuple basis, but in the unit of the entire set of data in a relation.
Relational Algebra
In the relational model, each relation is a set of tuples—a read-only value, just like an immutable set in Java. When you perform operations on a database, conceptually you use what is called the relational algebra to perform operations on the relations. The relational algebra is based upon set theory, and has similar operations as you might perform with sets, such as “union”.
An important aspect of the relational algebra is that all of its operations produce new relations! Similar to a Java stream processing, the relations produced by the relational algebra can be used in operations themselves. Although SQL does not completely implement a relational model with a pure relational algebra, you will understand the better what is happening and write better code if you think in term of the relational model and relational algebra operations.
Statement Classes
Beyond the day to day storage and retrieval of data, there are other tasks to perform on a database, such as defining the database schema. Legacy database specifications have used the term Data Definition Language (DDL) to refer to those commands related to creating and updating the schema, and Data Manipulation Language (DML) for CRUD operations such as storing data. Another category, Data Control Language (DCL), refers to commands that set the permissions and user access levels. DCL commands are usually highly specific to the database product being used. These “language” designations have long been used as a way to divide SQL commands or statements into categories.
Starting with SQL:1999, the standard classifies SQL statements by function. Here are some of these SQL statement categories.
- SQL schema statements
- Statements to define the data schema. Roughly corresponds to the DDL category.
- SQL data statements
- Statements for storing and retrieving data. Roughly corresponds to the DML category.
- SQL transaction statements
- Statements that control the atomicity of a group of statements.
- SQL connection statements
- Statements for connecting to a database server.
- SQL session statements
- Statements for controlling the parameters for interaction with a database server.
Identifiers
Unlike Java and most other modern programming languages,SQL is case-insensitive. You can enter SQL statements in uppercase or lowercase, or even mixed case. The same goes for the names of database objects such as tables; these regular identifiers are case-insensitive as well. Your table can be named FooBar
or foobar
or FOOBAR
, and SQL won't notice the difference.
SQL also allows delimited identifiers, sometimes referred to as “quoted identifiers”, which are case sensitive. If you surround an identifier with quotation mark "
characters, SQL will maintain its case and moreover will consider case significant when referring to the identifier. If you create a table named "FooBar"
, for example, SQL would not find the table if you referred to "foobar"
or "FOOBAR"
. Note that these case sensitivity rules apply to identifiers; the comparison of actual string values are case sensitive unless you request otherwise.
NULL
SQL provides a special value called NULL
that is best considered not a value at all. The NULL
in SQL is different than the null
in Java, the latter of which you can compare as if you would any other value. In SQL NULL
is neither equal to nor not equal to any value! This means that NULL
does not even equal NULL
in SQL! Dealing with NULL in SQL requires special handling; avoid it as much as possible. Dealing with SQL's peculiar NULL
logic will be addressed in a future lesson.
Types
SQL indicates several predefined types as general categories. Unfortunately the compliance of the different dialects is very diverse, each providing additional variations. It is therefore difficult to choose types that have the same names and work identically across products.
Numeric
The ISO SQL standard divides number types into exact numeric, and approximate numeric. By “approximate” SQL means that the implementation will use a format, probably IEEE 754, which cannot exactly represent all decimal values. In other words, “approximate numeric” in practice indicates the use of floating point numbers.
All the numeric types have a precision, the total number of digits the number can have. Exact numbers also have a scale, the maximum number of digits that can appear after the decimal point.
Character String
Strings in SQL usually allow you to indicate some maximum size. SQL provides two types of string in general: fixed-length strings and variable-length strings. For fixed-length strings, the database will pad the string by adding trailing spaces to ensure that the string is as long as the size specified.
In SQL a string literal is placed inside apostrophe (single quote) characters, such as 'foo bar'
, unlike Java which uses double quotes. If you need to include an actual apostrophe in a string, such as “Foo's Bar”, there is no escape characters; you simply repeat the apostrophe character in the middle of the string literal, producing for example 'Foo''s Bar'
.
Boolean
A Boolean type represents TRUE
or false
. Boolean operations in SQL, are more confusing because the presence of NULL
can produce an UNKNOWN
outcome, which will be discussed in a future lesson. SQL actually allows the UNKNOWN value unless there is a not-null constraint, although few products implement this part of the specification.
Datetime
The ISO SQL specification provides supports for several temporal types, but not as many as Java. The DATE
and TIME
types correspond to java.time.LocalDate
and java.time.LocalTime
, respectively. The TIMESTAMP type corresponds to a Java java.time.Instant
. But the names of the TIME WITH TIME ZONE
and TIMESTAMP WITH TIME ZONE
types are misleading, as they refer to time zone offsets; they thus correspond to java.time.OffsetTime
and java.time.OffsetDateTime
, respectively.
Interval
To express differences in time, SQL has two groups of interval types. A year-month interval refers to an interval expressed in years and/or months—a date-based interval. This is similar in concept to java.time.Period
. A day-time interval refers to some number of days, hour, minutes, and/or seconds—a time-based interval. This is similar in concept to java.time.Duration
. In addition to storing intervals, subtracting two dates or times produces an interval of the appropriate type.
The SQL specification allows intervals to be specified in various ways, but the simplest is to use INTERVAL
followed by a qualifier indicating YEAR
, MONTH
, DAY
, HOUR
, MINUTE
or SECOND
. The qualifier allows a precision to be indicated in parentheses; if not present the precision defaults to 2
except for SECOND
, which defaults to a precision of 6
. Lastly the qualifier can be followed by TO
and another qualifier to specify granularity. For example INTERVAL YEAR(3) TO MONTH
indicates an interval of up to 999 years and some months.
Category | Type | Description | PostgreSQL | Oracle | H2 |
---|---|---|---|---|---|
Exact Numeric | INTEGER / INT | Represents an integer value. | INTEGER / INT |
| INTEGER / INT |
SMALLINT | Integer with smaller precision and scale of zero. | SMALLINT | SMALLINT | ||
BIGINT | Integer with larger precision and scale of zero. | BIGINT | BIGINT | ||
NUMERIC[(precision[, scale])] | Exact decimal representation, with defined precision and scale. |
|
| ||
DECIMAL[(precision[, scale])] / DEC[(precision[, scale])] | May have higher precision than requested. In many database products, DECIMAL is merely a synonym for NUMERIC . | ||||
Approximate Numeric | REAL | Single-precision floating-point; usually IEEE 754. Corresponds to Java float . | REAL | BINARY_FLOAT | REAL |
DOUBLE PRECISION | Double-precision floating-point; usually IEEE 754. Corresponds to Java double . | DOUBLE PRECISION | BINARY_DOUBLE | DOUBLE PRECISION | |
FLOAT[(precision)] | Arbitrary-precision floating-point; usually IEEE 754. Seldom used in real life. |
|
| FLOAT | |
Character String | CHARACTER[(size)] ; CHAR[(size)] | Fixed-size string, padded with trailing spaces. |
|
|
|
CHARACTER VARYING(size) / CHAR VARYING(size) / VARCHAR(size) | Variable-length string with maximum size specified. |
1GB if unspecified.
|
4000 , or 32767 if MAX_STRING_SIZE = EXTENDED in v12.1.
|
2147483647 .
| |
Boolean | BOOLEAN | Represents either TRUE or FALSE . | BOOLEAN / BOOL | N/A | BOOLEAN / BOOL |
Datetime | DATE | A local date as marked on a calendar. Equivalent to java.time.LocalDate . | DATE | DATE | DATE |
TIME | A local time as appears on a clock. Equivalent to java.time.LocalTime . | TIME | N/A | TIME | |
TIME WITH TIME ZONE | A time with a zone offset. Equivalent to java.time.OffsetTime . | TIME WITH TIME ZONE | N/A | N/A | |
TIMESTAMP | An instant in time in terms of UTC. Equivalent to java.time.Instant . | TIMESTAMP | TIMESTAMP | TIMESTAMP | |
TIMESTAMP WITH TIME ZONE | The date and time in some local zone, designated by an offset from UTC. Equivalent to java.time.OffsetDateTime . | TIMESTAMP WITH TIME ZONE | TIMESTAMP WITH TIME ZONE | TIMESTAMP WITH TIME ZONE | |
Year-Month Interval | INTERVAL YEAR[(precision)] TO MONTH | A date-based interval. Similar to java.time.Period . |
|
| N/A |
Day-Time Interval | INTERVAL DAY[(precision)] TO SECOND | A time-based interval. Similar to java.time.Period . |
|
| N/A |
SQL Schema Statements
Before getting to the point where you can actually store and retrieve information, you must first provide the database with a definition of your tables and other schema items using SQL DDL statements.
CREATE TABLE
Primary in the schema statement category is the CREATE TABLE
statement. It has the general form CREATE TABLE table (elements)
as shown in the example below.
IDENTITY
SQL:2003 introduced a standard way to create a column for a surrogate key that automatically increments when rows are added to a table. An identity column can be useful because it the database takes responsibility for atomically generating unique surrogate key values. Previous solutions used proprietary types such as “serial”. Another solution was to use a sequence, a named counter that exists separate from a table. Identity columns are available in PostgreSQL 10, Oracle 12.1, and above.
ALTER TABLE
If you need to modify a table later, such as redefining columns or even adding new ones, you can use the ALTER TABLE
statement. Unfortunately the syntax for altering a table varies among database products.
ALTER TABLE table ADD CONSTRAINT
One of the most common uses for the ALTER TABLE
statement is to add a constraint after a table has already been created. Some developers prefer to add all constraints separately; this is for the most part a matter of style.
SQL Data Statements
After defining your tables, you can populate and then query them using SQL DML statements.
INSERT INTO
The SQL statement for inserting data is straightforward, most commonly taking the form INSERT INTO table [(columns)] VALUES (values)
. The columns
is a comma-separated list of common names; the values
are the values to insert, each corresponding to a named column.
SELECT
The main statement for searching and retrieving data from a database is SELECT
. The simplest form is SELECT columns FROM table
, which lists all rows in the indicated table. The comma-separated columns
indicate which columns should be included. For columns
you may use the asterisk *
character as a wildcard to include all columns.
PostgreSQL
As you can see from the above table, the open-source database PosgreSQL is PostgreSQL is highly compliant with the ANSI/ISO SQL standard. The current product started as POSTGRES in the 1980s, but its roots go back to even earlier projects. PostreSQL, often still referred to as “Postgres”, has evolved with the standards and offers many plugins and advanced features not available on other databases.
These lessons will use PostgreSQL in a client/server configuration, as might typically be found in an enterprise setting. Because of this configuration, installing a local database is optional. At the moment you will only need the command-line tool, but you may wish to install other tools such as the GUI front-end pgAdmin for future use.
- Download and install PostgreSQL.
- Use the installer from EnterpriseDB if given a choice for your platform.
- You will only need the PostgreSQL command line tools for this lesson.
- Add the PostgreSQL
bin/
directory to your systemPATH
if you wish to access the CLI from the OS command line.
PostgreSQL provides several organizing structures at a higher level than tables.
- schema
- As with standard SQL, PostgreSQL uses the word “schema” to refer to a collection of tables and related objects.
- database
- PosgreSQL refers to a collection of schemas as a “database”. This corresponds to what ANSI/ISO SQL calls a “catalog”. PostgreSQL uses the term “catalog” simply to refer to a system schema.
PostgreSQL provides two system schemas holding metadata.
- INFORMATION_SCHEMA
- Metadata about the database as prescribed by SQL specification.
- PG_CATALOG
- Provides more extensive database metadata and PostgreSQL system information.
Heroku
The service Heroku is an entire online platform for deploying your Internet applications, which Heroku calls “apps”. As a cloud platform as a service (PaaS), Heroku allows each app add and remove virtual server container processes, each called a “dyno” because it is dynamically provisioned. The applications code itself is deployed using Git.
Each Heroku app allows “add-ons”, which are extra features or tools. Most important for the purposes of this lesson, one add-on is Heroku Postgres, a live deployment of PostgreSQL you can access securely over the Internet. Moreover Heroku's Hobby Dev plan is completely free, supporting a limited number of connections.
- If you don't already have an account, go to Heroku and sign up.
- Go to the Heroku dashboard to see your apps.
- Select Create New App.
- Choose a region for the dyno and select Create app. If you do not specify an app name, Heroku will assign one for you using a series of words and numbers, for example
infinite-wildwood-58236
. - Go back to your apps and select the app you just created.
- On the dashboard of the new app, select Open app.
- On the Overview tab, select Configure Add-ons.
- Under Add-ons, type “Postgress” in the search field and select Heroku Postgres from the options that appear.
- Select the appropriate plan. The Hobby Dev — Free plan should be sufficient for this course.
- Select Provision and the
heroku-postgresql
add-on will be added to your app. - Select the Heroku Postgres :: Database that appears in the list of installed add-ons to bring up the
heroku-postgresql
dashboard. Verify that the PostgreSQL version is10
or above. - On the Settings tab, select View Credentials…. The properties that appear are the parameters you will need to connect to your new database.
psql
psql
is the official command-line interface for PostgreSQL and is included in the distribution. Start the program by entering psql
on the command line. Make sure you have added the PosgreSQL installation directory on the system PATH
environment variable, as indicated above.
There are several important psql
command-line options you should be familiar with:
--help | -?
- Lists the available command-line options.
--version | -V
- Shows the version of the
psql
utility. --host | -h
- Indicates the host on which the database server is running. Defaults to the local system.
--port | -p
- Specifies the port to use when connecting to the database. Defaults to port 5432.
--dbname | -d
- Specifies which database to connect to. Defaults to the current operating system user for the database name.
--username | -U
- Indicates the username to use when connecting to the database. Defaults to the current operating system user.
--command | -c
- Runs a a single SQL statement or internal command. Don't forget to use quotes around the command if it contains spaces.
--file | -f
- Executes the SQL and/or commands in a specified file.
Use psql
to connect to your Heroku Postgres instance, using the information provided using View Credentials… as explained above. Heroku Postgres is likely using the default PostgreSQL port, so you don't need to indicate it when connecting.
Once you are connected, psql
provides several useful commands you can use from within its interactive shell. These psql
commands, unlike SQL statements, do not need to end with a semicolon ;
character.
\?
- Provides a list of all commands.
\h command
- Provides help on a single command.
\q
- Quits psql and exits to the operating system command line.
\conninfo
- Displays information about the current connection.
\d[S[+]]
- Provides a description of the tables, views sequences, and indexes. If
S
is included, includes system objects. Adding+
indicates that extra details should be included \connect database | \c database
- Connects to a different database.
\g file
- Redirects query output to some external file. Use
\o
to stop redirecting output. \i file
- Executes the SQL in some external script file. This is the interactive equivalent of the command-line
--file | f
option. \! [command]
- Executes a system command or program in the operating system. By itself
\i
will open a new command prompt for you to type as many commands as you like, usingexit
to return topsql
. \copy
…- Exports data to a delimited text file, or imports a delimited text file into the database. Use
\h copy
to see more details on the command syntax and options.
View your connection information using \conninfo
to ensure that you are connected to the database.View all the tables using \d
or \d+
. Because your database is empty, nothing should be listed. Use \dS
or \dS+
to include the system objects, and you should see all the tables in the pg_catalog
schema. Feel free to list the contents of some of these pg_catalog
schema tables to get an idea of the sort of metadata PostgreSQL keeps related to your database.
Create the necessary tables based upon the model expressed by your ER diagram using SQL DDL statements.
Now when you show all database objects using \d
, you should see the tables you just created.
Now that the tables exist, populate them with data using SQL DML statements.
Now that you've populated the tables, you will be able to show the contents of any table using the SELECT
statement presented above.
You now know how to create a database schema, populate the tables with rows, and then list the rows of each table. In coming lessons you'll learn how to perform more advanced queries on your data, taking into account the relationships among the tables.
Review
Gotchas
- SQL is not a purely relational language. It cuts corners and “bends the rules”, not always to the benefit of your data model. Always keep in mind the distinction between the relational model and SQL.
- Do not approach SQL programming thinking in procedural terms, as if to modify “fields” in “records” of data in a loop, or you will produce SQL queries that are woefully inefficient and inflexible.
- As with Java floating point numbers, do not use SQL approximate numeric types to store money values.
In the Real World
- Most people still use the terms “DDL”, “DML”, and “DCL”, even though SQL now has a separate set of classifications for SQL statements.
- The
FLOAT
type is seldom used. Instead most people use the IEEE 754REAL
andDOUBLE PRECISION
, corresponding to the Javafloat
anddouble
types. - There is a slight distinction in the definition between
NUMERIC
andDECIMAL
, but is is mostly inconsequential. Nevertheless you should probably preferDECIMAL
because in addition to possibly bringing better precision, its name make it clearer that its values can accommodate things such as money.
Think About It
- What SQL compliance level are most database products at today? Will database products ever be fully SQL compliant? Why or why not?
Self Evaluation
- Who creates the official SQL specification? What compliance level are most database products at today?
- Name two differences between SQL tables and true mathematical relations.
- How should programming a relational database in SQL be different from writing a program in a procedural language such as Java?
- Give an example of a relational algebra operator?
- What do the terms “DDL”, “DML”, and “DCL” mean?
- What is a delimited SQL identifier and why might you want to use one?
- How is SQL
NULL
different from Javanull
? - Why are some SQL number types referred to as “exact”, and others as “approximate”?
- What is the difference between fixed and varying-size character strings? How do they behave differently in comparisons?
- How would you include an apostrophe in an SQL string literal?
- How many values can an SQL
BOOLEAN
type have? - How does Oracle consider the empty string in a
VARCHAR
column? - Where are the two places you can place constraints in
CREATE TABLE
statement? - What is the latest, most standards-compliant and cross-product supported way to create an automatically generated surrogate key column?
Task
- Create a set of SQL DDL statements for creating the Booker database schema from the ER diagram. Save these statements in a file
data/Booker.ddl.sql
.- You may of course experiment as much as needed using the
psql
command-line utility. - When finished, you should be able to create your schema from scratch using the
psql
--file
option, specifying the file you created.
- You may of course experiment as much as needed using the
- Create a set of SQL DML statements for populating the Booker tables with all the publications in your snapshot repository. Save these statements in a file
data/Booker.dml.sql
.- You may of course experiment as much as needed using the
psql
command-line utility. - When finished, you should be able to populate your database from the command line using the
psql
--file
option, specifying the file you created.
- You may of course experiment as much as needed using the
- Export each of your tables to a different CSV file encoded in UTF-8, with a header naming the columns, and store them in the
data/
directory as well. Each CSV file should be named after the official name of the table, with acsv
extension. - Submit a pull request as normal with the files you added to the
data/
directory of your project.
See Also
- Structured Query Language (WikiBooks)
- Learning SQL, Second Edition (Alan Beaulieu - O'Reilly, 2009)
- SQL and Relational Theory, Third Edition (C. J. Date - O'Reilly, 2015)
- SQL:1999 (Jim Melton, Alan Simon - Morgan Kaufmann, 2001)
- Joe Celko's SQL for Smarties, Fifth Edition (Joe Celko - Morgan Kaufmann, 2014)
- How Heroku Works
- PostgreSQL: Up and Running, Third Edition (Regina Obe, Leo Hsu - O'Reilly, 2017)
References
- ISO 9075-1:2011: SQL/Framework (free)
- ISO 9075-2:2011: SQL/Foundation
- ISO 9075:1992 (draft; free)
- SQL Dialects Reference (WikiBooks)
- PostgreSQL Data Types
- Oracle 12c Data Types
- H2 Data Types
- psql
Resources
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.