SQL

Goals

Understand how SQL is used to program relational databases.
Be introduced to the standard SQL types and how they are implemented on some popular RDBMS products.
Learn how to create tables and add constraints using SQL DDL statements.
Learn how to populate tables and view their contents using SQL DML statements.
Experience setting up a remote PostgreSQL database on Heroku.

Concepts

American National Standards Institute (ANSI)
approximate numeric
Data Definition Language (DDL)
Data Manipulation Language (DML)
day-time interval
declarative
delimited identifier
dialect
exact numeric
Heroku
identity column
International Organization for Standardization (ISO)
pad
platform as a service (PaaS)
precision
regular identifier
relational algebra
scale
sequence
Structured Query Language (SQL)
year-month interval

Language

SQL

ALTER TABLE
CREATE TABLE
INSERT INTO
SELECT

Library

Java

Preparation

Download and install PostgreSQL.
- Use the installer from EnterpriseDB if given a choice for your platform.
- Install PostgreSQL version 10 or above.
- You will only need the PostgreSQL command line tools for this lesson.
- Add the PostgreSQL bin/ directory to your system PATH if you wish to access the CLI from the OS command line.
- You may decide to also install the GUI front-end pgAdmin to use once you have mastered the command-line interface.
- You may install the full PostgreSQL server for local testing if you desire.
Sign up for a free Herokuaccount.

Lesson

The relational database model, invented by E. F Codd while working at IBM, has been the workhorse of data storage for decades. To access data in its relational database prototypes, the company created a language named SEQUEL for “Structured English Query Language”. Eventually this language became the Structured Query Language (SQL), which is now the most popular form of accessing relational databases. SQL is proving more popular than the relational model itself: even some newer “NoSQL” databases (which you will learn about in future lessons) are providing front-end languages based on SQL.

As relational database products proliferated, vendors adopted different forms of SQL. In an attempt to bring some unification to the various implementations, the American National Standards Institute (ANSI) in the 1980s developed a standard for SQL. Over the years ANSI and the International Organization for Standardization (ISO) have continued expanding and refining the language, releasing a series of standards from SQL-87 to the 15-part SQL:2016 (ISO/IEC 9075:2016). The core language features were already well-defined in the SQL:1999 and SQL:2003 versions, and these lessons will concentrate on those releases. This level of SQL compliance is sometimes referred to as “SQL3”, which was the unofficial name of SQL:1999 while it was being developed.

Declarative Processing

You will remember that relational database theory is is based on sets, and mathematical set theory comes with its own functions for combining sets and and extracting information from them. SQL is therefore a declarative language; rather than indicate processing steps to perform, you provide a definitional statement of which data you want to retrieve. The database system itself will determine the steps it needs to take to perform the request. In many ways SQL programming is like functional programming; you indicate what needs to be done, not how to do it.

In SQL you will describe which sets of information you desire, in which ways they will be combined, and what filters will be used in returning the final result. You will not write code to iterate through the tuples. In SQL, and in the relational model in general, you manipulate data, not on a per-variable or per-tuple basis, but in the unit of the entire set of data in a relation.

Relational Algebra

In the relational model, each relation is a set of tuples—a read-only value, just like an immutable set in Java. When you perform operations on a database, conceptually you use what is called the relational algebra to perform operations on the relations. The relational algebra is based upon set theory, and has similar operations as you might perform with sets, such as “union”.

An important aspect of the relational algebra is that all of its operations produce new relations! Similar to a Java stream processing, the relations produced by the relational algebra can be used in operations themselves. Although SQL does not completely implement a relational model with a pure relational algebra, you will understand the better what is happening and write better code if you think in term of the relational model and relational algebra operations.

Statement Classes

Beyond the day to day storage and retrieval of data, there are other tasks to perform on a database, such as defining the database schema. Legacy database specifications have used the term Data Definition Language (DDL) to refer to those commands related to creating and updating the schema, and Data Manipulation Language (DML) for CRUD operations such as storing data. Another category, Data Control Language (DCL), refers to commands that set the permissions and user access levels. DCL commands are usually highly specific to the database product being used. These “language” designations have long been used as a way to divide SQL commands or statements into categories.

Starting with SQL:1999, the standard classifies SQL statements by function. Here are some of these SQL statement categories.

SQL schema statements: Statements to define the data schema. Roughly corresponds to the DDL category.
SQL data statements: Statements for storing and retrieving data. Roughly corresponds to the DML category.
SQL transaction statements: Statements that control the atomicity of a group of statements.
SQL connection statements: Statements for connecting to a database server.
SQL session statements: Statements for controlling the parameters for interaction with a database server.

Identifiers

Although regular identifiers are case-insensitive, a database supporting SQL is supposed to normalize them to uppercase, converting FooBar to FOOBAR, for example. PostgreSQL however normalizes regular identifiers to lowercase, which is why you would see FooBar converted to foobar.

Unlike Java and most other modern programming languages,SQL is case-insensitive. You can enter SQL statements in uppercase or lowercase, or even mixed case. The same goes for the names of database objects such as tables; these regular identifiers are case-insensitive as well. Your table can be named FooBar or foobar or FOOBAR, and SQL won't notice the difference.

SQL also allows delimited identifiers, sometimes referred to as “quoted identifiers”, which are case sensitive. If you surround an identifier with quotation mark " characters, SQL will maintain its case and moreover will consider case significant when referring to the identifier. If you create a table named "FooBar", for example, SQL would not find the table if you referred to "foobar" or "FOOBAR". Note that these case sensitivity rules apply to identifiers; the comparison of actual string values are case sensitive unless you request otherwise.

`NULL`

SQL provides a special value called NULL that is best considered not a value at all. The NULL in SQL is different than the null in Java, the latter of which you can compare as if you would any other value. In SQL NULL is neither equal to nor not equal to any value! This means that NULL does not even equal NULL in SQL! Dealing with NULL in SQL requires special handling; avoid it as much as possible. Dealing with SQL's peculiar NULL logic will be addressed in a future lesson.

Types

SQL indicates several predefined types as general categories. Unfortunately the compliance of the different dialects is very diverse, each providing additional variations. It is therefore difficult to choose types that have the same names and work identically across products.

Numeric

The ISO SQL standard divides number types into exact numeric, and approximate numeric. By “approximate” SQL means that the implementation will use a format, probably IEEE 754, which cannot exactly represent all decimal values. In other words, “approximate numeric” in practice indicates the use of floating point numbers.

All the numeric types have a precision, the total number of digits the number can have. Exact numbers also have a scale, the maximum number of digits that can appear after the decimal point.

Character String

Strings in SQL usually allow you to indicate some maximum size. SQL provides two types of string in general: fixed-length strings and variable-length strings. For fixed-length strings, the database will pad the string by adding trailing spaces to ensure that the string is as long as the size specified.

In SQL a string literal is placed inside apostrophe (single quote) characters, such as 'foo bar', unlike Java which uses double quotes. If you need to include an actual apostrophe in a string, such as “Foo's Bar”, there is no escape characters; you simply repeat the apostrophe character in the middle of the string literal, producing for example 'Foo''s Bar'.

Boolean

A Boolean type represents TRUE or false. Boolean operations in SQL, are more confusing because the presence of NULL can produce an UNKNOWN outcome, which will be discussed in a future lesson. SQL actually allows the UNKNOWN value unless there is a not-null constraint, although few products implement this part of the specification.

Datetime

The ISO SQL specification provides supports for several temporal types, but not as many as Java. The DATE and TIME types correspond to java.time.LocalDate and java.time.LocalTime, respectively. The TIMESTAMP type corresponds to a Java java.time.Instant. But the names of the TIME WITH TIME ZONE and TIMESTAMP WITH TIME ZONE types are misleading, as they refer to time zone offsets; they thus correspond to java.time.OffsetTime and java.time.OffsetDateTime, respectively.

Interval

To express differences in time, SQL has two groups of interval types. A year-month interval refers to an interval expressed in years and/or months—a date-based interval. This is similar in concept to java.time.Period. A day-time interval refers to some number of days, hour, minutes, and/or seconds—a time-based interval. This is similar in concept to java.time.Duration. In addition to storing intervals, subtracting two dates or times produces an interval of the appropriate type.

The SQL specification allows intervals to be specified in various ways, but the simplest is to use INTERVAL followed by a qualifier indicating YEAR, MONTH, DAY, HOUR, MINUTE or SECOND. The qualifier allows a precision to be indicated in parentheses; if not present the precision defaults to 2 except for SECOND, which defaults to a precision of 6. Lastly the qualifier can be followed by TO and another qualifier to specify granularity. For example INTERVAL YEAR(3) TO MONTH indicates an interval of up to 999 years and some months.

Category	Type	Description	PostgreSQL	Oracle	H2
Exact Numeric	`INTEGER` / `INT`	Represents an integer value.	`INTEGER` / `INT`	`NUMBER(precision, scale)` `NUMBER(precision)` `NUMBER`	`INTEGER` / `INT`
	`SMALLINT`	Integer with smaller precision and scale of zero.	`SMALLINT`		`SMALLINT`
	`BIGINT`	Integer with larger precision and scale of zero.	`BIGINT`		`BIGINT`
	`NUMERIC[(precision[, scale])]`	Exact decimal representation, with defined precision and scale.	`NUMERIC(precision, scale)` / `DECIMAL(precision, scale)` `NUMERIC(precision)` / `DECIMAL(scale)` `NUMERIC` / `DECIMAL`		`NUMERIC(precision, scale)` / `DECIMAL(precision, scale)` / `DEC(precision, scale)` `NUMERIC(precision)` / `DECIMAL(precision)` / `DEC(precision)` `NUMERIC` / `DECIMAL` / `DEC`
	`DECIMAL[(precision[, scale])]` / `DEC[(precision[, scale])]`	May have higher precision than requested. In many database products, `DECIMAL` is merely a synonym for `NUMERIC`.
Approximate Numeric	`REAL`	Single-precision floating-point; usually IEEE 754. Corresponds to Java `float`.	`REAL`	`BINARY_FLOAT`	`REAL`
	`DOUBLE PRECISION`	Double-precision floating-point; usually IEEE 754. Corresponds to Java `double`.	`DOUBLE PRECISION`	`BINARY_DOUBLE`	`DOUBLE PRECISION`
	`FLOAT[(precision)]`	Arbitrary-precision floating-point; usually IEEE 754. Seldom used in real life.	`FLOAT` `FLOAT(precision)`	`FLOAT` `FLOAT(precision)`	`FLOAT`
Character String	`CHARACTER[(size)]`; `CHAR[(size)]`	Fixed-size string, padded with trailing spaces.	`CHARACTER` / `CHAR` `CHARACTER(size)` / `CHAR(size)`	`CHAR` `CHAR(size)`	`CHARACTER` / `CHAR` `CHARACTER(size)` / `CHAR(size)`
Character String	`CHARACTER VARYING(size)` / `CHAR VARYING(size)` / `VARCHAR(size)`	Variable-length string with maximum size specified.	`CHARACTER VARYING` / `VARCHAR` `CHARACTER VARYING(size)` / `VARCHAR(size)` `TEXT` Maximum `size` is ~`1GB` if unspecified.	`VARCHAR(size)` `VARCHAR2(size)` Maximum `size` is `4000`, or `32767` if `MAX_STRING_SIZE = EXTENDED` in v12.1.	`VARCHAR` `VARCHAR(size)` `VARCHAR2(size)` Maximum `size is` `2147483647`.
Boolean	`BOOLEAN`	Represents either `TRUE` or `FALSE`.	`BOOLEAN` / `BOOL`	N/A	`BOOLEAN` / `BOOL`
Datetime	`DATE`	A local date as marked on a calendar. Equivalent to `java.time.LocalDate`.	`DATE`	`DATE`	`DATE`
	`TIME`	A local time as appears on a clock. Equivalent to `java.time.LocalTime`.	`TIME`	N/A	`TIME`
	`TIME WITH TIME ZONE`	A time with a zone offset. Equivalent to `java.time.OffsetTime`.	`TIME WITH TIME ZONE`	N/A	N/A
	`TIMESTAMP`	An instant in time in terms of UTC. Equivalent to `java.time.Instant`.	`TIMESTAMP`	`TIMESTAMP`	`TIMESTAMP`
	`TIMESTAMP WITH TIME ZONE`	The date and time in some local zone, designated by an offset from UTC. Equivalent to `java.time.OffsetDateTime`.	`TIMESTAMP WITH TIME ZONE`	`TIMESTAMP WITH TIME ZONE`	`TIMESTAMP WITH TIME ZONE`
Year-Month Interval	`INTERVAL YEAR[(precision)] TO MONTH`	A date-based interval. Similar to `java.time.Period`.	`INTERVAL YEAR TO MONTH` `INTERVAL YEAR(precision) TO MONTH`	`INTERVAL YEAR TO MONTH` `INTERVAL YEAR(precision) TO MONTH`	N/A
Day-Time Interval	`INTERVAL DAY[(precision)] TO SECOND`	A time-based interval. Similar to `java.time.Period`.	`INTERVAL DAY TO SECOND` `INTERVAL DAY(precision) TO SECOND`	`INTERVAL DAY TO SECOND` `INTERVAL DAY(precision) TO SECOND`	N/A

In real life FLOAT is seldom used. The REAL and DOUBLE PRECISION are commonly used for IEEE 754 floating point values.

In the most popular database products, regardless of the size you indicate for the VARCHAR type, internally the database will only allocate space based upon the actual size of each value stored.

Oracle originally limited the size of VARCHAR / VARCHAR2 to 4000 bytes. In Oracle 12c (12.1) it is possible to strings up to 32767 bytes long, but the database has to be specially configured using MAX_STRING_SIZE = EXTENDED. See Oracle 12.1 Database SQL Language Reference: Extended Data Types and a useful guide to Extended Data Types with VARCHAR2(32767).

Internally PostgreSQL stores CHAR, VARCHAR, and TEXT types in the same way. The storage format, is nicknamed varlena, meaning “variable-length array”. Although there are semantic differences in space-padding and length-checking, there is no significant performance difference among these types. Although some PostgreSQL experts advocate using TEXT exclusively, the ISO-compliance and cross-platform standardization of VARCHAR should also be considered. See VARCHAR(n) Considered Harmful and In Defense of varchar(x).

On Oracle the VARCHAR and VARCHAR2 types are synonymous as of version 12.2. However, Oracle recommends not to use the VARCHAR type, because that its string comparison semantics will change in the future. This warning has been present for several versions, and this warning may be balanced against the benefit of using a standards-compliant type for strings.

Oracle does not support the standard BOOLEAN type; moreover Oracle has not provided clear guidance on what type use to represent Boolean value. The best suggestion seems to be to use CHAR(1), as CHAR(1) uses fewer bytes than NUMBER(1), with a check constraint restricting the value to 0 or 1. See the answers to Is there a boolean type in oracle databases?, an in-depth question-and-answer thread about the Boolean Datatype, and a discussion of a recommended approach at Oracle Tip: Choosing an efficient design for Boolean column values.

SQL Schema Statements

Before getting to the point where you can actually store and retrieve information, you must first provide the database with a definition of your tables and other schema items using SQL DDL statements.

`CREATE TABLE`

Primary in the schema statement category is the CREATE TABLE statement. It has the general form CREATE TABLE table (elements) as shown in the example below.

Creating an Item table.

CREATE TABLE Item (
  code VARCHAR(99) NOT NULL,
  name VARCHAR(9999) NOT NULL,
  stock INTEGER DEFAULT 0 NOT NULL,
  supplierId INTEGER NOT NULL,
  CONSTRAINT Item_pk PRIMARY KEY (code),
  CONSTRAINT Item_name_un UNIQUE (name),
  CONSTRAINT Item_supplierId_fk FOREIGN KEY (supplierId) REFERENCES Supplier(id)
);

In most SQL environments that allow multiple commands, each SQL statement must end with a semicolon ; character.

Lines that begin with two hypen-minus -- characters are considered comments and are ignored.

If a constraint applies to multiple columns, you can list them separated by commas. For example a composite primary key for the columns foo and bar would be indicated as … PRIMARY KEY (foo, bar).

The constraints in the examples above were declared at the table level. A shorter form allows constraints to be declared at the column level.

CREATE TABLE Item (
  code VARCHAR(99) NOT NULL PRIMARY KEY,
  name VARCHAR(9999) NOT NULL UNIQUE,
  stock INTEGER DEFAULT 0 NOT NULL,
  supplierId INTEGER NOT NULL REFERENCES Supplier(id)
);

Some column-level constraints may be given names as well by adding the CONSTRAINT keyword. This allows the constraints themselves to be directly changed or removed later.

…
  supplierId INTEGER NOT NULL CONSTRAINT Item_supplierId_fk REFERENCES Supplier(id)
…

If a constraint references multiple columns, it must be defined at the table level.

What is it exactly that you are creating? From the SQL syntax, it might appear that you are creating a tabular structure in memory, ready for items to be added to it. But in relational terms you are creating a relation type! The tricky thing is that SQL acts as if it creates a variable with the same name as the type; something like this:

public class ItemRelation extends Relation<ItemTuple> {
  …
}

ItemRelation itemRelation = new ItemRelation();

From now on, when you refer to Item, you'll be referring to a variable that holds the current, immutable instance of ItemRelation.

`IDENTITY`

SQL:2003 introduced a standard way to create a column for a surrogate key that automatically increments when rows are added to a table. An identity column can be useful because it the database takes responsibility for atomically generating unique surrogate key values. Previous solutions used proprietary types such as “serial”. Another solution was to use a sequence, a named counter that exists separate from a table. Identity columns are available in PostgreSQL 10, Oracle 12.1, and above.

Creating a Supplier table.

CREATE TABLE Supplier (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL UNIQUE,
  address VARCHAR(9999) NOT NULL
);

The ALWAYS specifier ensures that the column will always contain an auto-generated by preventing custom values from being inserted. If you replace ALWAYS with BY DEFAULT, values will still be auto-generated unless an explicit value is provided when a row is inserted or updated.

`ALTER TABLE`

If you need to modify a table later, such as redefining columns or even adding new ones, you can use the ALTER TABLE statement. Unfortunately the syntax for altering a table varies among database products.

`ALTER TABLE table ADD CONSTRAINT`

One of the most common uses for the ALTER TABLE statement is to add a constraint after a table has already been created. Some developers prefer to add all constraints separately; this is for the most part a matter of style.

Adding a primary key constraint to the Supplier table.

ALTER TABLE Supplier
  ADD CONSTRAINT Supplier_pk PRIMARY KEY (id)
);

The ISO SQL specification includes three types of constraints that can be used with the ADD CONSTRAINT statement; database products may add others.

PRIMARY KEY
FOREIGN KEY
UNIQUE
CHECK

Check constraints allow custom conditions to be placed on columns. The SQL standard refers to both PRIMARY KEY and UNIQUE constraints as “unique constraints”.

SQL Data Statements

After defining your tables, you can populate and then query them using SQL DML statements.

`INSERT INTO`

The SQL statement for inserting data is straightforward, most commonly taking the form INSERT INTO table [(columns)] VALUES (values). The columns is a comma-separated list of common names; the values are the values to insert, each corresponding to a named column.

Inserting rows into the Supplier table.

INSERT INTO Supplier (name, address) VALUES ('Acme Furniture Company', '123 Some Street');
INSERT INTO Supplier (name, address) VALUES ('Bozo Toy Company', '321 Other Street');
INSERT INTO Supplier (name, address) VALUES ('Top Toys Corporation', '456 Far Avenue');

The Supplier id column is automatically generated, so it does not need to be specified when data is inserted.

Although the columns specification is optional, you should not rely on the order of the columns in the table. If you leave out the names of the columns, the database schema may change in the future and no longer match your statement. Moreover remember that the attributes of a relation are unordered! Don't rely on this SQL-specific behavior; program your relational database adhering to the relational model.

Are you really “inserting” values “into” the table? The SQL statement naming makes it seem that way, and it may be that your database project implements it that way. But remember that conceptually relations are immutable values. When you “insert” a tuple into a relation, you are really simply changing your Supplier variable to reference another relation that was produced by adding the new tuple. And in fact the “new tuple” you are “inserting” is really another relation, of the same SupplierRelation type, that contains the single tuple. Thus INSERT INTO Supplier (name, address) VALUES ('Acme Furniture Company', '123 Some Street') might look something like this in Java:

supplerRelation = supplierRelation.union(
    SupplierRelation.of(SupplierTuple.of("Acme Furniture Company", "123 Some Street")));

`SELECT`

The main statement for searching and retrieving data from a database is SELECT. The simplest form is SELECT columns FROM table, which lists all rows in the indicated table. The comma-separated columns indicate which columns should be included. For columns you may use the asterisk * character as a wildcard to include all columns.

Listing all the rows in the Supplier table.

SELECT * FROM Supplier;

The * wildcard is useful interactively querying a database, but it can cause your code to be unclear or even inefficient when querying the database from withing your program, and could even cause mismatches when when querying multiple tables. Inside your program always list the exact columns you are interested in when using the SELECT statement.

PostgreSQL

As you can see from the above table, the open-source database PosgreSQL is PostgreSQL is highly compliant with the ANSI/ISO SQL standard. The current product started as POSTGRES in the 1980s, but its roots go back to even earlier projects. PostreSQL, often still referred to as “Postgres”, has evolved with the standards and offers many plugins and advanced features not available on other databases.

These lessons will use PostgreSQL in a client/server configuration, as might typically be found in an enterprise setting. Because of this configuration, installing a local database is optional. At the moment you will only need the command-line tool, but you may wish to install other tools such as the GUI front-end pgAdmin for future use.

Download and install PostgreSQL.
- Use the installer from EnterpriseDB if given a choice for your platform.
- You will only need the PostgreSQL command line tools for this lesson.
- Add the PostgreSQL bin/ directory to your system PATH if you wish to access the CLI from the OS command line.

PostgreSQL provides several organizing structures at a higher level than tables.

schema: As with standard SQL, PostgreSQL uses the word “schema” to refer to a collection of tables and related objects.
database: PosgreSQL refers to a collection of schemas as a “database”. This corresponds to what ANSI/ISO SQL calls a “catalog”. PostgreSQL uses the term “catalog” simply to refer to a system schema.

PostgreSQL provides two system schemas holding metadata.

INFORMATION_SCHEMA: Metadata about the database as prescribed by SQL specification.
PG_CATALOG: Provides more extensive database metadata and PostgreSQL system information.

Heroku

The service Heroku is an entire online platform for deploying your Internet applications, which Heroku calls “apps”. As a cloud platform as a service (PaaS), Heroku allows each app add and remove virtual server container processes, each called a “dyno” because it is dynamically provisioned. The applications code itself is deployed using Git.

Each Heroku app allows “add-ons”, which are extra features or tools. Most important for the purposes of this lesson, one add-on is Heroku Postgres, a live deployment of PostgreSQL you can access securely over the Internet. Moreover Heroku's Hobby Dev plan is completely free, supporting a limited number of connections.

If you don't already have an account, go to Heroku and sign up.
Go to the Heroku dashboard to see your apps.
Select Create New App.
Choose a region for the dyno and select Create app. If you do not specify an app name, Heroku will assign one for you using a series of words and numbers, for example infinite-wildwood-58236.
Go back to your apps and select the app you just created.
On the dashboard of the new app, select Open app.
On the Overview tab, select Configure Add-ons.
Under Add-ons, type “Postgress” in the search field and select Heroku Postgres from the options that appear.
Select the appropriate plan. The Hobby Dev — Free plan should be sufficient for this course.
Select Provision and the heroku-postgresql add-on will be added to your app.
Select the Heroku Postgres :: Database that appears in the list of installed add-ons to bring up the heroku-postgresql dashboard. Verify that the PostgreSQL version is 10 or above.
On the Settings tab, select View Credentials…. The properties that appear are the parameters you will need to connect to your new database.

Note that Heroku not only created the PostgreSQL instance, it created a separate “database” in which to place your schemas—an SQL “catalog” as explained above. This is why you must indicate the database name when logging in, explained below under psql. Heroku also created the default PostgreSQL public schema, which is used automatically.

`psql`

psql is the official command-line interface for PostgreSQL and is included in the distribution. Start the program by entering psql on the command line. Make sure you have added the PosgreSQL installation directory on the system PATH environment variable, as indicated above.

There are several important psql command-line options you should be familiar with:

--help | -?: Lists the available command-line options.
--version | -V: Shows the version of the psql utility.
--host | -h: Indicates the host on which the database server is running. Defaults to the local system.
--port | -p: Specifies the port to use when connecting to the database. Defaults to port 5432.
--dbname | -d: Specifies which database to connect to. Defaults to the current operating system user for the database name.
--username | -U: Indicates the username to use when connecting to the database. Defaults to the current operating system user.
--command | -c: Runs a a single SQL statement or internal command. Don't forget to use quotes around the command if it contains spaces.
--file | -f: Executes the SQL and/or commands in a specified file.

Use psql to connect to your Heroku Postgres instance, using the information provided using View Credentials… as explained above. Heroku Postgres is likely using the default PostgreSQL port, so you don't need to indicate it when connecting.

Connect to Heroku Postgres using psql.

psql --host example.amazonaws.com --port 5432 --dbname exampledatabase --username exampleusername

Once you are connected, psql provides several useful commands you can use from within its interactive shell. These psql commands, unlike SQL statements, do not need to end with a semicolon ; character.

\?: Provides a list of all commands.
\h command: Provides help on a single command.
\q: Quits psql and exits to the operating system command line.
\conninfo: Displays information about the current connection.
\d[S[+]]: Provides a description of the tables, views sequences, and indexes. If S is included, includes system objects. Adding + indicates that extra details should be included
\connect database | \c database: Connects to a different database.
\g file: Redirects query output to some external file. Use \o to stop redirecting output.
\i file: Executes the SQL in some external script file. This is the interactive equivalent of the command-line --file | f option.
\! [command]: Executes a system command or program in the operating system. By itself \i will open a new command prompt for you to type as many commands as you like, using exit to return to psql.
\copy …: Exports data to a delimited text file, or imports a delimited text file into the database. Use \h copy to see more details on the command syntax and options.

The \copy command provides a convenient way to export a table to a CSV file for viewing in a text editor or a spreadsheet program. Here is an example of exporting the Supplier table to a CSV file using UTF-8, along with a separate header row indicating the names of the table columns.

\copy (SELECT * FROM Supplier) TO 'Supplier.csv' CSV HEADER ENCODING 'UTF-8'

View your connection information using \conninfo to ensure that you are connected to the database.View all the tables using \d or \d+. Because your database is empty, nothing should be listed. Use \dS or \dS+ to include the system objects, and you should see all the tables in the pg_catalog schema. Feel free to list the contents of some of these pg_catalog schema tables to get an idea of the sort of metadata PostgreSQL keeps related to your database.

Checking the psql connection and viewing a description of all database objects.

\conninfo
…
\d
…
\dS

By default PostgreSQL uses the public schema. If you wish you may create a different schema, perhaps one named stock just for storing information about items, suppliers, and stock for the store.

CREATE SCHEMA stock;

Don't forget that you would then need to qualify all the table names with the schema name, such as stock.Item and stock.Supplier. For more options to select schemas, see How to select a schema in postgres when using psql?.

Create the necessary tables based upon the model expressed by your ER diagram using SQL DDL statements.

Creating the tables for the item stock schema.

CREATE TABLE Supplier (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL UNIQUE,
  address VARCHAR(9999) NOT NULL
);

CREATE TABLE Item (
  code VARCHAR(99) NOT NULL,
  name VARCHAR(9999) NOT NULL,
  stock INTEGER DEFAULT 0 NOT NULL,
  supplierId INTEGER NOT NULL,
  CONSTRAINT Item_pk PRIMARY KEY (code),
  CONSTRAINT Item_name_un UNIQUE (name),
  CONSTRAINT Item_supplierId_fk FOREIGN KEY (supplierId) REFERENCES Supplier(id)
);

CREATE TABLE Category (
  id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR(9999) NOT NULL,
  CONSTRAINT Category_name_un UNIQUE (name)
);

CREATE TABLE ItemCategory (
  itemCode VARCHAR(99) NOT NULL,
  categoryId INTEGER NOT NULL,
  CONSTRAINT ItemCategory_pk PRIMARY KEY (itemCode, categoryId),
  CONSTRAINT ItemCategory_itemCode_fk FOREIGN KEY (itemCode) REFERENCES Item(code),
  CONSTRAINT ItemCategory_categoryId_fk FOREIGN KEY (categoryId) REFERENCES Category(id)
);

Now when you show all database objects using \d, you should see the tables you just created.

If you look at all the database objects now using \d, you will see that PostgreSQL has created the four tables you requested as category, item, itemcategory, and supplier. You'll also see that, to implement the generated columns with two sequence objects, category_id_seq and supplier_id_seq. If you wish to list only the tables, you may use \dt.

Now that the tables exist, populate them with data using SQL DML statements.

Adding data to the tables for the item stock schema.

-- Supplier
INSERT INTO Supplier (name, address) VALUES ('Acme Furniture Company', '123 Some Street');
INSERT INTO Supplier (name, address) VALUES ('Bozo Toy Company', '321 Other Street');
INSERT INTO Supplier (name, address) VALUES ('Top Toys Corporation', '456 Far Avenue');

-- Item
-- The item codes here are arbitrary, and might come from your store catalog.
-- You should first verify the values that were generated in the Supplier.id identity column.
INSERT INTO Item (code, name, stock, supplierId) VALUES ('A1', 'table', 20, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('B2', 'game table', 12, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('C3', 'chair', 15, 1);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('D4', 'balloon', 99, 2);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('E5', 'kite', 5, 2);
INSERT INTO Item (code, name, stock, supplierId) VALUES ('F6', 'doll', 10, 2);

-- Category
INSERT INTO Category (name) VALUES ('furniture');
INSERT INTO Category (name) VALUES ('toy');

-- ItemCategory
-- You should first verify the values that were generated in the Category.id identity column.
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('A1', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('B2', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('B2', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('C3', 1);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('D4', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('E5', 2);
INSERT INTO ItemCategory (itemCode, categoryId) VALUES ('F6', 2);

PostgreSQL automatically generated values for the Supplier.id and Category.id values each time you entered rows in those tables. You will want to verify which ID was assigned to which supplier and which category before using those values in the Item.supplierId and ItemCategory.categoryId foreign key columns.

Now that you've populated the tables, you will be able to show the contents of any table using the SELECT statement presented above.

Listing all the rows in the Item table.

SELECT * FROM Item;

You'll probably notice that the order of the rows in the Item table, for example, are not listed in the order you inserted them. In fact the Item table has no specific order, as relations are an unordered set of tuples. SQL however does provide a way to sort the returned information, which you'll learn about in a future lesson.

You now know how to create a database schema, populate the tables with rows, and then list the rows of each table. In coming lessons you'll learn how to perform more advanced queries on your data, taking into account the relationships among the tables.

Review

Gotchas

SQL is not a purely relational language. It cuts corners and “bends the rules”, not always to the benefit of your data model. Always keep in mind the distinction between the relational model and SQL.
Do not approach SQL programming thinking in procedural terms, as if to modify “fields” in “records” of data in a loop, or you will produce SQL queries that are woefully inefficient and inflexible.
As with Java floating point numbers, do not use SQL approximate numeric types to store money values.

In the Real World

Most people still use the terms “DDL”, “DML”, and “DCL”, even though SQL now has a separate set of classifications for SQL statements.
The FLOAT type is seldom used. Instead most people use the IEEE 754 REAL and DOUBLE PRECISION, corresponding to the Java float and double types.
There is a slight distinction in the definition between NUMERIC and DECIMAL, but is is mostly inconsequential. Nevertheless you should probably prefer DECIMAL because in addition to possibly bringing better precision, its name make it clearer that its values can accommodate things such as money.

Think About It

What SQL compliance level are most database products at today? Will database products ever be fully SQL compliant? Why or why not?

Self Evaluation

Who creates the official SQL specification? What compliance level are most database products at today?
Name two differences between SQL tables and true mathematical relations.
How should programming a relational database in SQL be different from writing a program in a procedural language such as Java?
Give an example of a relational algebra operator?
What do the terms “DDL”, “DML”, and “DCL” mean?
What is a delimited SQL identifier and why might you want to use one?
How is SQL NULL different from Java null?
Why are some SQL number types referred to as “exact”, and others as “approximate”?
What is the difference between fixed and varying-size character strings? How do they behave differently in comparisons?
How would you include an apostrophe in an SQL string literal?
How many values can an SQL BOOLEAN type have?
How does Oracle consider the empty string in a VARCHAR column?
Where are the two places you can place constraints in CREATE TABLE statement?
What is the latest, most standards-compliant and cross-product supported way to create an automatically generated surrogate key column?

Task

Create a set of SQL DDL statements for creating the Booker database schema from the ER diagram. Save these statements in a file data/Booker.ddl.sql.
- You may of course experiment as much as needed using the psql command-line utility.
- When finished, you should be able to create your schema from scratch using the psql --file option, specifying the file you created.
Create a set of SQL DML statements for populating the Booker tables with all the publications in your snapshot repository. Save these statements in a file data/Booker.dml.sql.
- You may of course experiment as much as needed using the psql command-line utility.
- When finished, you should be able to populate your database from the command line using the psql --file option, specifying the file you created.
Export each of your tables to a different CSV file encoded in UTF-8, with a header naming the columns, and store them in the data/ directory as well. Each CSV file should be named after the official name of the table, with a csv extension.
Submit a pull request as normal with the files you added to the data/ directory of your project.

References

Resources

Acknowledgments

Some symbols are from Font Awesome by Dave Gandy.