Relational Databases
Goals
- Understand fundamental relational model theory.
- Learn notation for diagramming relational database schemas.
- Discover how to map an object-oriented domain model to the relational model for database persistence.
- Understand the purpose of normalization, and become proficient with the most important normal forms.
- Be familiar with the most popular relational database management system products.
Concepts
- anomaly
- atomic
- attribute
- Boyce-Codd Normal Form (BCNF)
- candidate key
- cardinality
- catalog
- class table inheritance
- cluster
- composite key
- concrete table inheritance
- constraint
- “crow's foot” notation
- database
- database system
- deletion anomaly
- domain
- domain integrity
- domain model
- entities
- Entity-Relationship (ER) diagram
- First Normal Form (1NF)
- flat files
- foreign key
- foreign key constraint
- functional dependence (FD)
- heading
- insertion anomaly
- integrity
- key
- many-to-many
- many-to-one
- modality
- natural key
- normal forms
- normalization
- not-null constraint
- object-relational impedance mismatch
- one-to-many
- one-to-one
- opaque
- primary key
- primary key constraint
- query
- record
- referential integrity
- relation
- relational database management system (RDBMS)
- schema
- Second Normal Form (2NF)
- single table inheritance
- surrogate key
- Third Normal Form (3NF)
- transitive functional dependency
- tuple
- unique constraint
- update anomaly
Preparation
Lesson
Persisting data is essential in almost every application. Some applications create simple data that must be remembered, such as settings changes and log information. Other data programs need to query or look up data that is already saved somewhere based upon some selection criteria.
For small data sets, the object serialization techniques you've learned in previous lessons may be sufficient. But as soon as the amount of data grows to a moderate level, serializing individual objects becomes difficult, especial when it comes to managing the relationships among the objects. A database typically stores general data, encodes relationships among the data, and allows queries to find data later.
Relational Databases
Some of the simplest databases are stored in so-called flat files, with rows of text data separated by commas or tabs. In flat-file databases, the consumer application usually must “connect the dots”, as it were, and not only determine what the data means but also how pieces of the data relates. If a library is available for taking care of some of this work of reading and connecting the information so that the application doesn't have to, it is called a database system.
A database system represents entities, which are analogous to objects with unique identifiers. A Vehicle
is one example of an entity; each vehicle (entity instance) might be identified by a vehicle identification number (VIN). An entity may have various attributes, such as year of production. Any one Vehicle
might be related to its Manufacturer
, the particular entity that produced it.
Early database systems conceptually arranged entities into trees of hierarchies. Later systems improved these data models by allowing general graph or “network” data models to indicate more complex relationships among entities. But these databases were still inflexible and were hard to query.
Databases were revolutionized in 1970 when E. F. Codd, a computer scientist at IBM, proposed A relational model of data for large shared data banks. Based upon a rigorous mathematical foundation, Codd's proposal resulted in the development of the relational database management system (RDBMS), which remains the most popular database system in use today. An RDBMS allows entities to be represented in general ways, and then queried in an almost unlimited fashion.
Relations
The fundamental modeling unit of an RDBMS is a relation, a mathematical concept that arises from arranging data into rows and columns. Each relation represents an entity. Each row represents an instance of that entity, with each column representing an attribute of that the entity.
Item | Stock | Supplier |
---|---|---|
table | 20 | Acme Furniture Company, 123 Some Street |
chair | 15 | Acme Furniture Company, 123 Some Street |
balloon | 99 | Bozo Toy Company, 321 Other Street |
kite | 5 | Bozo Toy Company, 321 Other Street |
doll | 10 | Bozo Toy Company, 321 Other Street |
Think back to the very first lesson of this course, which explained the concept of indirection. The store owner in that lesson needed to keep track of the stock of toys and furniture, along with their suppliers. The store owner, perhaps using only a pad of paper and a pencil, may have referred to this as a “list”. As you completed the task for that lesson, you may have thought of the information as being arranged in a “table”. In the relational model, the correct terminology for this arrangement of data is a relation.
But a relation is not just any tabular representation of information. A relation has certain formal pieces that must adhere to certain rules for the relational model to work. You may hear people refer to “tables”, “rows”, and “columns” when discussing relational database management systems, but relations are not general tables. The rows and columns of a relation have special rules; ignoring them will cause problems in your model and provide erroneous results in your queries. In casual conversation you may use the words “tables”, “rows”, and “columns” to simplify communication, but don't forget that there is formal distinction between these items in common use and the formal definition of a relation.
Attributes
Each “column” in a relation represents an attribute of the relation. Each attribute is identified in a special “row” called the relation heading. Unlike general table columns, attributes follow certain rules:
- Each attribute has a unique name.
- Each attribute has a unique type or domain.
- The attributes may appear in any order in the relation. Although all the values under
Stock
must represent the stock count, it makes no difference whether theItem
attribute comes before or after theStock
attribute. Attribute order is not significant.
Thus the Item_Stock
relation above has three attributes: Item
, Stock
, and Supplier
. Each will have a type, and they may appear in any order.
Tuples
Each “row” in a relation is referred to as a tuple, and represents an instance of the entity. Unlike general table rows, tuples follow certain rules:
- Each tuple has unique values. A relation cannot contain duplicate tuples.
- Tuples can appear in any order in the relation. In the above example the tuple containing the item named
chair
might appear first, last, or anywhere in between. Tuple order is not significant.
The Item_Stock
relation above contains five tuples, which may appear in any order.
Sets
The relational database model is based on set theory. This point is essential to understanding programming relational databases. Storing, querying, and updating data in a relational database is based on operations or functions performed on sets of data.
Thus the definition of some relation such as Item_Stock
is a set of attributes. The actual relation (the data that Item_Stock
contains at any particular time) is a set of tuples. Programming a relational database is essentially indicating what operations to perform on these sets of data.
Semantics
The relational model is more than just a way to organize data. The “tables”, “rows”, and “columns” mean something. As you might have guessed, each relation represents a type of entity such as a vehicle. The heading indicating what attributes a Vehicle
is allowed to have.
Each tuple represents a instance of an entity, as if you created a vehicle using new Vehicle()
in Java. The value in each “table cell” specifies a value of an entity's attribute. Seen in this way, each tuple is a set of logical propositions about an entity instance, asserting for example “such and such Vehicle has the color blue” and “such and such vehicle has four wheels”.
Keys
“Such and such” a vague way to identify a vehicle. A common aspect of historical database systems has been the storage of data about an entity in a record, essentially an array of variables potentially of different types. But different systems traditionally used different ways to link the records together. Part of the “secret sauce” of relational databases is that the relational model itself provides a way to identify a tuple, which is analogous to a record in previous database models. The term “record” is still used in casual speech when referring to the information in a tuple.
Any attribute or collection of attributes that uniquely identifies a tuple within a relation is called a key. In a relational database need to select one of these keys (sometimes called candidate keys) to serve as the primary key, or official unique identifier for the entity.
Constraints
A primary key is an example of a constraint, which is a rule restricting the values placed in the database. A primary key constraint, for example, merely tells the database that a particular attribute serves as the official tuple identifier. But if there are other attributes that must have different values for each entity, you can add a unique constraint that prevents duplicates.
The benefit of a constraint is that it is maintained by the database system, without any extra work on the part of the application. If your program attempts to add two tuples with attribute values that violate a unique constraint, for example, the attempt will fail with an error. Constraints thus ensure integrity of the database; any data stored is guaranteed to be bound by all constraints.
Databases
While relations and their subcomponents form the heart of a relational database, an RDBMS organizes relations into larger structures still. Although the terminology varies among database products, the following represents the general arrangement of logical structures:
- schema
- A schema groups a set of related tables. The word “schema” can also refer to the definition of the relations; see Schema Design below.
- catalog
- All the available schemas are then grouped into a catalog. In many RDBMS products, a catalog is simply called the “database”. Some products use the word “catalog” to refer to special system schemas holding available types and other metadata.
- cluster
- Multiple catalogs or databases can in turn be grouped into even larger arrangements called “clusters”. In some products a “cluster” may be synonymous with “database server”, the installation of the product on a particular machine. Historically the word cluster referred to the physical grouping of some database elements into a single file for faster access.
Schema Design
In colloquial use the word “schema” can also mean a set of rules defining relational structures such as relations, attributes, and keys. Designing the schema is similar to designing the classes and methods of an object-oriented program. As with classes, the design of tables is derived from the domain model that defines the conceptual view of the program. (Here the “domain” of the program's domain model, which refers to the world-view the program works with, is different from the “domain” of a relation attribute, which refers to the attribute's value type.)
In creating your program's domain model you used UML to create a “sketch” of of the central objects in the domain, and used the same modeling language to produce a “blueprint” of how those objects could be turned into Java classes. In a similar way you will take the original domain model and create a rough view of of the tables, attributes, and relationships in your database schema. But the initial plan for a database schema is almost never sufficient; you will need to refine your schema so that it represents the data in the most efficient way for the relational paradigm.
Entity Relationship Diagram
While a UML class diagram is the diagram of choice for illustrating the domain model as well as the blueprint of actual classes, the database world mostly uses the Entity-Relationship (ER) diagram. Charles Bachman was one of the first to use diagrams to represent data in the 1960s. Building on Bachman's notation, Peter Chen in the 1970s developed the ER approach, in which rectangle represent entities and lines connecting the rectangles indicate relationships.
Chen's original notation for describing relationships included a diamond to describe the relationship. Because the diamond takes up extra space in a diagram, some authors simplify the notation to a single line, with the name of the relationship above the line. Nowadays the most popular ER diagram notation is the information engineering (IE) or “crow's foot” notation, named for the diagram's approach for representing relationship multiplicity, as explained in Cardinality below.
Cardinality
UML class diagrams come with a notation for indicating the “multiplicity” of a relationship, that is the minimum and maximum of each entity type that can appear in the relationship. For example in UML 1..*
indicates “one or more”. The multiplicity consists of two parts: the minimum number (here 1
meaning “one”), and the maximum number (here *
meaning unlimited or “many”).
The maximum number of entities that can appear on one side of a relationship is called the cardinality. Referring to an entity on the left of the relationship line, Crow's foot notation indicates a “one” cardinality with a bar |
, and a “many” cardinality with a greater-than symbol. Together with the relationship line, the “many” symbol >
appears somewhat like a bird's foot, which is how the notation got it's nickname, “crow's foot”.
There are thus three fundamental combinations of modalities in ER diagrams, just as in UML diagrams. A fourth type, a many-to-one (M:1) relationship, is simply the inverse of a one-to-many (1:N) relationship.
- one-to-one (1:1)
▭| |▭
- one-to-many (1:N)
▭| <▭
- many-to-many (M:N)
▭> <▭
There relationship representing “manufactures” between a Company
and a Vehicle
, for example, would have a cardinality of one-to-many or Company| manufactures <Vehicle
. Each company may manufacture many vehicles, but each vehicle is only manufactured by only one company.
Modality
While cardinality refers to the maximum number of a type of entity participating in a relationship, modality refers to the minimum number. In simplest terms this means that the entity is “optional”, denoted by a circle ○
, or “required”, again donated by a bar |
. The modality, however, is shown on the inside of the relationship line, between the cardinality symbols (that is, not next to the entity it is associated with).
The cardinality of the “manufactures” relationship would be represented as Company|| manufactures ○<Vehicle
. A vehicle cannot be manufactured without a company, which is “required”, but a company may exists that has not manufactured any vehicles at all, which would indicate a modality of “optional”.
Entities
As discussed above, a relation represents a class of entities, with each tuple representing an instance of the entity. For much of your domain model, creating your schema will be as simple as creating an entity in the diagram, representing a relation, for each of your domain model classes in your class diagram.
Keys
A more difficult task is finding an attribute to serve as the primary key of an entity. A primary key must be unique and must not change. Some entities may already have an attribute, a natural key, that uniquely identifies instances. For example the automotive industry generates a vehicle identification number (VIN) that uniquely identifies each vehicle produced. A VIN could be used as the primary key of a Vehicle
relation in a database for a used-car sales company.
If no suitable natural key is available, you will need to create a surrogate key, an artificial identifier that brings uniqueness to the tuple. Many databases provide a facility to atomically generate an arbitrary counter when a new record is added. Another option is to use a universally unique identifier (UUID), which practically guarantees uniqueness even though they are independently generated.
Relationships
Foreign Keys
Item | Stock | Supplier |
---|---|---|
table | 20 | ACME |
chair | 15 | ACME |
balloon | 99 | BOZO |
kite | 5 | BOZO |
doll | 10 | BOZO |
ID | Name | Address |
---|---|---|
ACME | Acme Furniture Company | 123 Some Street |
BOZO | Bozo Toy Company | 321 Other Street |
TOP | Top Toys Corporation | 456 Far Avenue |
In a relational database, a relationship is most always modeled using a foreign key. A foreign key is merely an attribute in one relation that references the primary key of another table. Defining a foreign key constraint allows the database to ensure that for every foreign key in one relation, there exists a matching primary key int the referenced relation; this maintains the referential integrity of your database.
Using a single code to reference rows in another table is exactly the approach you intuitively used in the initial lesson to separate “Item Stock” and “Suppliers” into separate lists, reproduced in the figure. In the “Item Stock” table, the “Supplier” column served as an informal foreign key to the supplier in the “Suppliers” table. There the “ID” column services as a primary key to identify each supplier.
Association Entities
Many-to-many relationships pose an interesting problem in the relational model. Imagine that the store owner has placed each item in one or more categories. That is, each category, such as “furniture”, can contain many items such as “table” and “chair”. Additionally, some items may fall into multiple categories, such as a game table that is both “furniture” and a “toy”. There is therefore a many-to-many relationship between the Item
and Category
entities.
Simply adding a foreign key to the Item
and/or Category
entities is insufficient to represent a many-to-many relationship. The solution is to add yet another layer of indirection by creating an entity that represents the relationship of a particular item being in a particular category. This type of entity is called an association entity. It goes by many names including “junction table”.
Generalization
Mapping an inheritance hierarchy of super classes and subclasses (what UML refers to as “generalization” in class diagrams) to relations requires additional thought. In your object-oriented domain model, generalization indicates that some some class contains common properties that apply to several child classes. The child subclasses likely contain their own specialized properties in addition to those of the super class. Although some database products have added object-oriented features, the relational model does not include the concept of entities that are specializations of other entities, so you will have to use special techniques to represent this in your schema.
To use the examples from the lesson on inheritance, an AbstractVehicle
base class might have a color
property which would apply to all vehicles. You might have created AbstractWheeledVehicle
to serve as a base class to all vehicles with wheels; it might add a wheelCount
property. The concrete class Truck
might indicate how much of a load it can carry, in some unit such as kilograms. A separate Boat
class would inherit the color
property from AbstractVehicle
, while adding an addition property hullCount
to indicate the number of main body portions the boat has.
There are three common approaches for representing inheritance in a relational database schema, each with benefits and trade-offs. You will need to evaluate the approach that best fits your design and use cases. This introduction uses the terminology from Martin Fowler's Patterns of Enterprise Application Architecture; more information and a reference site can be found at the end of this lesson.
- single table inheritance
- You could create a single relation to represent the entire hierarchy of vehicles. It would need some attribute representing which concrete class was instantiated. The relation heading would include all the possible attributes in the entire hierarchy: in this case
color
,wheelCount
,maxLoad
, andhullCount
. Each tuple would only supply values for those attributes relating to its indicated type. This technique can result in a large, confusing table with many NULL for many attributes. - concrete table inheritance
- Instead of creating a single relation for all classes, you could create a different relation for each concrete class, with each relation including all the attributes related to that class. In this example, you would create a
Truck
relation withcolor
,wheelCount
, andmaxLoad
attributes; and aBoat
relation withcolor
andhullCount
attributes. This approach results in duplicate attributes across relations. The database can't enforce relationships in this pattern; you might wind up with two vehicles with conflicting primary keys because they were in different relations. - class table inheritance
- Finally you could create a separate relation to represent each entity in the hierarchy—even the abstract classes. The “root” relation would ensure the vehicle identifier is unique across all vehicle types. Each superclass relation would contain attributes relating to itself and all its subclasses. Finally each relation representing a subclass would contain a foreign key indicating the tuple in the super class relation containing its super class properties. You might choose to use the same primary key values across all tables, with these same keys serving both as primary keys and as foreign keys for the subclass relations. Class table inheritance mostly closely reflects the original object-oriented domain model from a relational perspective. This arrangement makes the schema and consequently the application logic more complicated, requiring more complex queries to retrieve the data for a single object instance.
Normalization
The relational model is flexible and permits data to be stuffed into relations in many forms. But not every schema follows relational principles. The relational model has a rigorous mathematical foundation, and it in order to efficiently represent and query information, the schema design should follow certain rules. Breaking these rules may result in duplicated and inconsistent data, as well as make it hard to retrieve data once the data is stored.
Normalization is the process of changing a schema to follow relational principles, preventing data duplication and allowing for flexible querying. Normalization could be considered a type of refactoring of a schema, except that it follows a more formal set of rules. The primary goal of normalization is simple: each tuple in a relation should represent a single instance of a “thing”, identifiable by its primary key. More formally the concept is known as functional dependence (FD), and ensures that the primary key functions like a key in a map, “determining” the values of the other attributes of a tuple.
5 | Fifth Normal Form | ||
---|---|---|---|
4 | Fourth Normal Form | ||
3 |
| ||
2 | Second Normal Form | ||
1 | First Normal Form |
There are several ways that a schema could have broken this principle, which is why normalization sets out a series of normal forms that a schema can be in. The normal forms are arranged in a sequence; modifying a schema so that it is one normal form (or verifying that it is already in that normal form) guarantees that it adheres to that form and the normal forms before it. There are at least seven normal forms, and academics are coming up with more all the time.
First Normal Form
Item Code | Item | Stock | Supplier | Categories |
---|---|---|---|---|
100 | table | 20 | Acme Furniture Company, 123 Some Street | furniture |
200 | game table | 12 | Acme Furniture Company, 123 Some Street | furniture, toy |
300 | chair | 15 | Acme Furniture Company, 123 Some Street | furniture |
400 | balloon | 99 | Bozo Toy Company, 321 Other Street | toy |
500 | kite | 5 | Bozo Toy Company, 321 Other Street | toy |
600 | doll | 10 | Bozo Toy Company, 321 Other Street | toy |
The most basic form of normalization, First Normal Form (1NF), requires that each attribute value be atomic, that is, it can't be broken down into smaller pieces. Take another look at the “Item Stock” table the store owner was using in the lesson on indirection. The store owner has decided to note the categories of each item. There are two ways in which this table is not in 1NF:
- The
Supplier
column contains information that really should be separate: the name of a supplier and its address. As mentioned in the lesson on indirection, there would be no way to update a supplier's address without updating all the rows in which the supplier appears. - The
Categories
column has “repeating values”, a classic example of violation of 1NF. For items in more than one category, such as “game table”, the multiple categories are merely repeated as a single value in the column. Repeating values are hard to search for and manipulate.
You already saw how Category
really should be a separate entity. Having a many-to-many relationship with the items, an additional ItemCategory
association entity should be created as well.
Second Normal Form
Item Code | Category Code | Category Name |
---|---|---|
100 | AAA | furniture |
200 | AAA | furniture |
200 | BBB | toy |
300 | AAA | furniture |
400 | BBB | toy |
500 | BBB | toy |
600 | BBB | toy |
The Second Normal Form (2NF) requires that, in addition to being in 1NF, if there is a composite primary key, the other attributes must be functionally dependent on the entire key. Suppose that the store owner created only an Item Category
table that included the item code, the category code, and the category name. This indeed allows a many-to-many relationship; note that item with code 200
(the game table) has both AAA (furniture) and BBB
(toy) categories. Similarly there can be many items for one category.
The problem is that the Category Name
is functionally dependent on the Category Code
, which is only part of the composite key! This results in a lot of duplicated data, and can produce anomalies as explained below. The solution is to recognize that Category
should be a separate entity; the Category Name
should be moved to a separate Category
table, and the Item Category should consist of a composite key to join the Item Stock
and Category
tables, as illustrated above in the section on Association Entities.
Third Normal Form and Boyce-Codd Normal Form
Suppose further than in addition to splitting out the categories, the owner created Supplier ID
, Supplier Name
, and Supplier Address
columns in the original Item Stock
table. As required by 2NF, the Supplier Name
and Supplier Address
are functionally dependent on the entire primary key, the Item Name
. That is, any item code will completely determine the supplier ID, the supplier name, and the supplier address.
Item Code | Item | Stock | Supplier ID | Supplier Name | Supplier Address |
---|---|---|---|---|---|
100 | table | 20 | ACME | Acme Furniture Company | 123 Some Street |
200 | game table | 12 | ACME | Acme Furniture Company | 123 Some Street |
300 | chair | 15 | ACME | Acme Furniture Company | 123 Some Street |
400 | balloon | 99 | BOZO | Bozo Toy Company | 321 Other Street |
500 | kite | 5 | BOZO | Bozo Toy Company | 321 Other Street |
600 | doll | 10 | BOZO | Bozo Toy Company | 321 Other Street |
But the presence of so much duplicated data should be a hint that there is a problem. Duplicated data not only takes up extra space, it may also lead to an anomaly, in which changes to the data produce other, unintended results. This table exhibits three common anomalies of a relational database that has not been normalized. These anomalies correspond to “create”, “update”, and “delete” of the CRUD operations you are familiar with.
- insertion anomaly
- It is impossible to add a supplier without there being some item associated with the supplier.
- update anomaly
- Because the supplier address is duplicated, updating the supplier address for one item, such as “chair”, results in a discrepancy with the same supplier's address as listed for the other items, such “table”.
- deletion anomaly
- Removing an item that is no longer sold would also remove the supplier, if that supplier didn't supply any other items.
The problem with the new Item Stock
table is that the Supplier Name
and Supplier Address
attributes are functionally dependent on the full Item Code
key, they exhibit a transitive functional dependency. Rather than being directly dependent on Item Code
, they are each actually functionally dependent on Supplier ID
, which in turn is dependent on Item Code
.
The Third Normal Form (3NF) requries that, in addition to being in 2NF, there must be no transitive functional dependencies. The solution as before is to recognize that Supplier
is a separate entity with Supplier ID
, Supplier Name
, and Supplier Address
attributes. This in fact is exactly what the store owner did in the lesson on indirection, explained in the above section, Foreign Keys.
Products
The relational model has provided popular and useful for decades, so there are many RDBMS products available. Here are some of the significant products you should be aware of, along with any modeling software tailored specifically to that product. The popularity of a particular product does not always reflect how closely it follows the relational model or related industry standards.
- Oracle Database
- One of the oldest relational database products, Oracle is very popular for corporate enterprise deployments. Oracle provides the free SQL Developer Data Modeler software, written in Java, for modeling relational database for the Oracle platform. If you plan on actually connecting to an Oracle database, you should probably opt instead for the full Oracle SQL Developer software, also free, which provides live data query and manipulation capabilities in additional to all SQL Developer Data Modeler functionality.
- SQL Server
- SQL Server is Oracle's main enterprise competition from Microsoft Corporation. For database design Microsoft provides SQL Server Data Tools for Visual Studio as a component of its larger development environment.
- PostgreSQL
- With roots going back over three decades, PosgreSQL is a robust, enterprise-class, open-source database system. The open-source pgModeler software is an attractive package for database design tuned specifically for PostgreSQL. This course will primarily use PostgreSQL. You can compile the pgModeler source for free, or download a precompiled demo version supporting a limited number of tables. A full, unlimited version of the software is available for free to official students of this GlobalMentor course; contact your instructor for details.
- MySQL
- MySQL is another open-source database. Oracle Corporation, which now owns MySQL, makes the product available in a free community addition along with various commercial distributions. The MySQL Workbench software is a colorful, user-friendly tool for designing databases to run on MySQL. It is available in a free community edition, either bundled with MySQL or as a separate installation. You are required to create and sign in with a free Oracle account before you can download any MySQL software.
- SQLite
- SQLite is a public-domain database with a small footprint. It is made to be embedded in the application, and it claims to have the most installations of any database system. Although not a full modeling tool, the open-source DB Browser for SQLite provides a visual interface for designing and editing SQLite database files. The SQLite database system does not ensure that attribute values are of the same type as the the attribute domain; it should only be chosen if its ability to be embedded outweighs its lack of domain integrity.
- H2
- The H2 Database Engine is a small, fast, open-source, pure-Java database that is highly compliant with the most recent database industry standards. It can be used in both embedded and client-server mode, and even has an option for storing the entire database in memory. There is no data modeling software specifically targeted for H2, but many modeling tools allow connections to H2 databases.
Review
Gotchas
- Don't depend on attributes or “columns” in a relation appearing in any particular order.
- Don't depend on tuples or “rows” in a relation appearing in any particular order.
- Don't encode meaning into a surrogate key.
In the Real World
- The presence of
NULL
in a database tuple causes many problems; avoid it if you can.
Think About It
- Will I ever need to query components of this attribute value? If so, maybe the value should be a separate entity, stored in a separate relation with its own attributes.
Self Evaluation
- What mathematical construct is the relational model based on?
- What a relation different from a general table?
- How can a relational database be considered a set of logical propositions about the world?
- Give an example of a natural key.
- What does it mean for a key to be “opaque”, and is this a good or a bad thing?
- Name and explain at least three types of database constraints.
- What is domain integrity, and why is it important? Which popular database product lacks domain integrity?
- How are two ways that the word “schema” can be used? Provide examples.
- What is “object-relational impedence mismatch”? Give some examples. Do you believe it is more a difference of model semantics or design philosphy?
- Explain three techqniques for modeling an object inheritance hierarchy.
- What is the primary purpose of the lower normal forms?
- What is the minimal normal form you should aim for in your schema?
Task
Enhance your Booker domain model to keep track of the name, address, and web site of each publisher.
Use pgModeler to design a relational database schema for your Booker domain model. Represent the schema with an ER diagram using crow's foot notation.
- You will need to use pgModeler version 0.9.1-alpha1 or later for crow's foot notation support, which can be configured in the settings.
- Upon startup on Windows, pgModeler may complain that it is unable to write to the plugins directory. This should not prevent any functionality needed to complete this task.
- Export your ER diagram to SVG format. Store both the pgModeler file and the SVG file in the
doc/
directory you used for storing your UML class diagrams, which now should be located in the root of your repository.
See Also
- A Simple Guide to Five Normal Forms in Relational Database Theory (William Kent)
- Usage-Driven Database Design (George Tillmann - Apress, 2017)
- Database Design & Relational Theory (C. J. Date - O'Reilly, 2012)
- Relational Database Design and Implementation, Fourth Edition (Jan Harrington - Morgan Kaufmann, 2016)
- Patterns of Enterprise Application Architecture (Martin Fowler - Addison Wesley, 2003)
References
Resources
- Oracle Database
- Oracle SQL Developer
- Oracle SQL Developer Data Modeler
- SQL Server
- SQL Server Data Tools for Visual Studio
- PostgreSQL
- pgModeler
- MySQL
- MySQL Workbench
- SQLite
- DB Browser for SQLite
- H2
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.