0. Resources

F2023 #01 - Relational Model & Algebra (CMU Intro to Database Systems)

1. Structure of Relational Database

Important

The Relational Model defines a database abstraction based on relations to avoid maintenance overhead.

Key tenets:

Store DB in simple data structures (relations).
Physical storage left up to the DBMS implementation. This means that it’s no longer necessary to define the physical definition of the DB (tree or has table structures). You need just to say: “Here’s my relations and attributes” and the DBMS can try to make the best decision of how it actually wants to store data.
Access data through high-level language, DBMS figures out best execution strategy. Instead of using procedural code (Cobol, Fortran, C, etc.) to makes direct calls to the DB API, you can use an high level language to tell the DBMS that you want to do and the DBMS figure out the best way to do that.

A Relational DB (= a DB based on Relational Model) consists of a collection of tables, each of which is assigned a unique name:

A table in Relational DB is called Relation (because it’s representing a relation between attributes).
An Attribute refers to a column of a table.
A Tuple is a set of attribute values in the relation (basically it refers to a row of a table). For each attribute there is a set of permitted values, called the Domain of that attribute, so tuples values must be contained in that domain. The special NULL value is a member of every domain (if allowed).
A table with n columns is also called n-ary Relation.

2. Database Schema

Database schema. The description of the database is called the database schema or the database intension. This is specified at the creation of the database. It is not expected to change very often.
Database instance. The raw data that populates a database at a particular point in time is called a database instance or the extension of the database.

Here’s an example of these 2 aspects: b91e1a38256bf2762a2c5f9e677c708c

3. Keys

https://www.youtube.com/watch?v=xHlr6aQUZeU&t=84s&ab_channel=10MinutesLecturesinComputerScience

We must have a way to specify how tuples within a given relation are distinguished, and this is expressed in terms of their attributes. This is because an important property of a relation (table) in Relational DBMS is:

Important

each tuple is distinct, meaning that duplicates are not accepted.

Actually commercial DBMS relax this requirement and instead allow duplicate tuples.

To safeguard this property, we need to have one or more attributes of a relation that can uniquely identify every tuple in a relation.

Important

A Key is a set of one or more attributes of a relation that can uniquely identify every tuple in that relation.

This means that the following query:

SELECT * FROM Emp WHERE Emp_ID = 'E101'

must return either 0 or 1 records at maximum. If this happens, Emp is a key.

Relational DBMS uses three different Key concepts and all have the same property, that is they uniquely determine a record in a table:

Super Key: an attribute, or set of attributes, that uniquely identifies a tuple in a relation. The idea is that it can be any attribute, or any combination of attributes. If you consider the following table:

super key can be: (EmpID), (EmpID, Ename), (EmpID, Dept), (Ename,Dept), (EmpID, Ename, Dept), etc.
Candidate Key: a super key which no proper subset consists of a super key. Let’s take the five super key considered in the previous bullet point:
- (EmpID): the proper subset is Ø. This subset does not consist in any super key, so (EmpID) is a candidate key;
- (EmpID, Ename): the proper subset consists of {{EmpID}, {Ename}, {Ø}}. Out of these three, EmpID is already in the list of the super keys, hence (EmpID, Ename) is not a candidate key;
- (EmpID, Dept): the proper subset consists of {{EmpID}, {Dept}, {Ø}}. For the same reason as above it cannot be a candidate key.
- (Ename,Dept): the proper subset consists of {{Ename}, {Dept}, {Ø}}. None of them are part of the super key list, hence (Ename,Dept) is a candidate key.
Primary Key: one of the available candidate keys can be chosen. In most of cases we choose as primary key a candidate key with a single attribute because it’s easy to deal with. If the candidate key I choose consists of more than one attribute, we can refer to it as Composite Key.

Some DBMS automatically create an internal primary key if a table does not define one.

A Foreign Key specifies that an attribute from one relation maps to a tuple in another relation.

4. Relational Query Languages

A Query Language is a language in which a user requests information from the DB. Query Languages can be:

Imperative Query Languages: the user instructs the system to perform a specific sequence of operations on the database to compute the desired result.
Functional Query Languages: the computation is expressed as the evaluation of functions that may operate on data in the database or on the results of other functions.
Declarative Query Languages: the user describes the desired information without giving.a specific sequence of steps or function calls for obtaining that information and the desired information is typically described using some form of mathematical logic. It is the job of the DB system to figure out how to obtain the desired information.

The Relational Algebra is a fundamental query language and it forms the theoretical basis of the SQL Query Language.

Query Languages used in practice, such as SQL, include elements of the imperative, functional, and declarative approaches.

5. The Relational Algebra

The Relational Algebra consists of a set of operations that take one or two relations as input and produce a new relation as their result. We can chain operators together to create more complex operations.

Unary Operations: operations involving one relation.
Binary Operations: operations involving a pair of relations.

Remember that since a relation is a set of tuples, relations cannot contain duplicate tuples. In practice, however, tables in DBMS are permitted to contain duplicates unless a specific constraint prohibits it. In discussing the formal Relational Algebra, we require that duplicates be eliminated, as it required by the mathematical definition of set.

The fundamental operations of Relational Algebra are:

0f1fbd662836a1109c57fc3bfecc8269

5.1. Select Operation

The Select operation selects a subset of the tuples from a relation that satisfies a selection predicate.

Predicates acts as a filter to retain only tuples that fulfils its qualifying requirement.
Can combine multiple predicates using conjunctions/disjunctions.

Syntax: $σ_{p re d i c a t e} (R)$

Examples: $σ_{d e pt_nam e =^{'} P h ys i c s^{'}} (in s t r u c t or)$ or $σ_{s a l a ry > 9000} (in s t r u c t or)$

Select operation is equivalent to WHERE clause in SQL:

SELECT -- example 1
	*
FROM instructor 
WHERE dept_name='Physics';
 
SELECT -- example 2
	*
FROM instructor 
WHERE salary>9000

5.2. Projection Operation

The Projection operation generate a relation with tuples that contains only the specified attributes. It aims at:

Rearrange attributes’ ordering.
Remove unwanted attributes.
Manipulate values to create derived attributes.

Syntax: $Π_{A 1, A 2, ..., A n} (R)$

Examples: $Π_{I D, nam e, s a l a ry} (in s t r u c t or)$ or $Π_{I D, nam e, s a l a ry /12} (in s t r u c t or)$

Projection operation is equivalent to SELECT clause in SQL:

SELECT -- example 1
	ID
	, name
	, salary
FROM instructor;
 
SELECT -- example 2
	ID
	, name
	, salary/12
FROM instructor

5.3. Cartesian-Product Operation

The Cartesian-Product operation generates a relation that contains all possible combinations of tuples from the input relations.

Syntax: $R \times S$

Example:

8ad1697786c729f7ebc8c7774baf361a

Projection operation is equivalent to CROSS JOIN clause in SQL:

SELECT * FROM R CROSS JOIN S;

5.4. Join Operation

The Join operation generates a relation that contains all tuples that are a combination of two tuples (one from each input relation) with a common value(s) for one or more attributes.

Syntax: $(R ⋈ S)$

Example:

f15ac23224f16e590fa15a300dc309ab

ecfc39ce619e5a66b6255cd5d51bfb18

Projection operation is equivalent to JOIN clause in SQL:

SELECT * FROM R NATURAL JOIN S; -- option 1
 
SELECT * FROM R JOIN S USING (a_id, b_id); -- option 2
 
SELECT * FROM R JOIN S ON R.a_id=S.a_id AND R.b_id=S.b_id; -- option3

5.5. Union Operation

The Union operation generates a relation that contains all tuples that appear in either only one or both input relations.

Syntax: $(R \cup S)$

Example:

2b2d5f888f03d3d50129419386051c0b

4c9e0d631b95d6f6622df47da0328694

Union operation is equivalent to UNION clause in SQL:

(SELECT * FROM R)
UNION
(SELECT * FROM S);

5.6. Intersection Operation

The Intersection operation generate a relation that contains only the tuples that appear in both of the input relations.

Syntax: $(R \cap S)$

Example:

ae26394e0d3c466fd423886aabb19c8c

Intersection operation is equivalent to INTERSECT clause in SQL:

(SELECT * FROM R)
INTERSECT
(SELECT * FROM S);

5.7. Difference Operation

The Difference operation generates a relation that contains only the tuples that appear in the first and not the second of the input relations.

Syntax: $(R - S)$

Example:

a95dd80663b40caa51d2f4c64df9d108

Intersection operation is equivalent to INTERSECT clause in SQL:

(SELECT * FROM R)
EXCEPT
(SELECT * FROM S);

5.8. Extra Operators

Rename: $(ρ)$
Assignment: $(R \leftarrow S)$
Duplicate Elimination: $(δ)$
Aggregation: $(γ)$
Sorting: $(τ)$
Division: $(R \div S)$

5.9. Equivalent Queries

Often there is more than one way to write a query in relational algebra. If you want to find information about courses taught by instructors in the Physics department you can use:

$σ_{d e pt_nam e =^{'} P h ys i c s^{'}} (in s t r u c t or ⋈_{in s t r u c t or . I D = t e a c h es . I D} t e a c h es)$

alternatively:

$(σ_{d e pt_nam e =^{'} P h ys i c s^{'}} (in s t r u c t or)) ⋈_{in s t r u c t or . I D = t e a c h es . I D} t e a c h es$

In the first query the Select (or WHERE in SQL) operator restricts dept_name to Physics after the Join, whereas in the second query the Select operator restricts dept_name to Physics to the instructor relation and then the Join operation is applied.

These two queries are equivalent because they return the same result on any DB.

Query Optimizers in DB typically look at that result an expression computes and find the most efficient way of computing that result, rather than following the exact sequence of steps specified in the query.

Quartz 4

Explorer

01. Relational Model & Algebra

0. Resources

1. Structure of Relational Database

2. Database Schema

3. Keys

4. Relational Query Languages

5. The Relational Algebra

5.1. Select Operation

5.2. Projection Operation

5.3. Cartesian-Product Operation

5.4. Join Operation

5.5. Union Operation

5.6. Intersection Operation

5.7. Difference Operation

5.8. Extra Operators

5.9. Equivalent Queries

Graph View

Table of Contents