0. Resources
- F2023 #02 - Modern SQL (CMU Intro to Database Systems)
- Database System Concept - Chapter 3-4-5
1. SQL History
When Ted Codd wrote his famous paper, he didn’t provide a programming language for operating on relational database.
Some people at IBM tried to put in practice those mathematical ideas and so IBM creates its first relational query language, called SQUARE. It was difficult to use, though.
In 1972 then IBM created SEQUEL (Structured English QUEry Language), as part of the System R project. This name was later changed to SQL because “Sequel” was a trademark of a UK company.
After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM released SQL-based DBMSs based on their System R prototype, including System/38 (1979), SQL/DS (1981), and DB2 (1983).
By 1986, ANSI and ISO standard groups officially adopted the standard “Database Language SQL” language definition. New versions of the standard were published in 1989, 1992, 1996, 1999, 2003, 2006, 2008, 2011, 2016 and most recently, 2023.
Actually nobody follows SQL Standard. Every single DBMS has its own proprietary extension that deviates from SQL Standard. They sit on it, but they add new and specific functions.
2. SQL Overview
The SQL Language has several parts:
- Data-Definition Language (DDL): it provides commands for defining relation schemas, deleting relations, and modifying relation schemas. It also allows to specify integrity constraints that the data stored in the DB must satisfy (Integrity) and to define views (View Definition).
- Data-Manipulation Language (DML): it provides the ability to query information from the DB and to insert tuples into, delete tuples from, and modify tuples in the DB. As we already said remember that the portion of a DML that involves information retrieval is called a Data Query Language (DQL). Although technically incorrect, it is common practice to use the terms query language (DQL) and DML synonymously. (Actually Wikipedia treats DQL as a separate SQL sub-language, not a portion of DML, but this doesn’t change the meaning of what we are talking about).
Important: SQL is based on bags (there could be duplicates in relations) not sets (no duplicates) as in relational algebra.
3. SQL Data-Definition Language (SQL DDL)
The set of relations and information about each relation in a DB are specified using a DDL.
The SQL standard supports a variety of built-in types, including:
char(n): a fixed-length character string with user-specified lengthn;varchar(n): a variable-length character string with user-specified maximum lengthn;int: an integer;smallint: a small integer;numeric(p,d): a fixed-point number with user-specified precision where the number consists ofpdigits (plus a sign), anddof thepdigits are to the right of the decimal point.real,double precision: floating-point and double-precision floating-point numbers with machine-dependent precision;float(n): a floating-point number with precision of at leastndigits.
Each type may include a special value called the null value. A null value indicates an absent value that may exist but be unknown or that may not exist at all. In certain cases, we may wish to prohibit null values from being entered.
It is suggested to always use varchar instead of char because if you compare A and B values where A is a char(10) value containing the value “Avi” and B is a varchar(10) containing the same value “Avi”, in some DBMS extra spaces are automatically attached to the shorter value to make them the same size before performing the comparison A=B. Some DBMS do that and some don’t, so don’t use varchar.
Common SQL DDL commands are: CREATE TABLE, INSERT, UPDATE, DELETE, DROP TABLE, ALTER TABLE
3.1. Basic Schema Definition
crate table: it is used to define a SQL relation by specifying the name of attributes and the associated data types. For example:
create table department
(dept name
building
budget
primary key (dept name));SQL supports a number of different integrity constraints:
- primary key(A_j1, A_j2, …, A_jm): this specification says that attributes A_j1, A_j2, …, A_jm form the primary key for the relation. The primary-key attributed are required to be nonnull and unique.
- foreign key(A_k1, A_k2, …, A_kn) references s: this specification says that the values of attributes A_k1, A_k2, …, A_kn for any tuple in the relation must correspond to values of the primary key attributes of some tuple in relation s.
- not null: this constraint specifies that the null value is not allowed for that attribute.
A newly created relation is empty initially. Inserting tuples into a relation, updating them, and deleting them are done by DML statements insert, update, and delete.
3.2. Other DDL Commands
To remove a relation with its schema from a SQL DB we’ll use:
drop table r;If you want, instead, to retains relation r, but deleters all tuples in r, we’ll use:
delete from r;Note that delete is a DML statement.
To add attributes to an existing relation we’ll use:
alter table r add attribute1;Note that all tuples in the relation are assigned null as the value for the new attribute.
We can also drop attributes from a relation by the command:
alter table r drop attribute1;4. SQL Data Query Language (SQL DQL)
The basic structure of an SQL query consists of three clauses: select, from, and where. A query takes as its input the relations listed in the from clause, operates on them as specified in the where and select clauses, and then produces a relation as the result.
For example:
select name from instructor;The result is a relation consisting of a single attribute with the heading name. If you want to remove duplicates:
select distinct name from instructor;The select clause may also contain arithmetic expressions involving the operators +, −, ∗, and / operating on constants or attributes of tuples:
select ID, name, dept_name, salary * 1.1 from instructor;The where clause allows us to select only those rows in the result relation of the from clause that satisfy a specified predicate:
select name from instructor
where dept_name = 'Comp. Sci.' and salary > 70000;SQL allows the use of the logical connectives and, or, and not in the where clause. Furthermore you can use comparison operators like <, <=, >, >=, =, and <>.
4.0. To do. follow the book
4.1. Aggregate Functions
Aggregate functions are functions that return a single value from a bag of tuples. SQL offers five standard built-in aggregate functions:
AVG(col): returns the average col value;MIN(col): return minimum col value;MAX(col): return maximum col value;SUM(col): return sum of values in col;COUNT(col): return # of values for col.
Aggregate functions can (almost) only be used in the SELECT output list.
Query: “Get # of students with a ‘@cs’ login”.
SELECT COUNT(login) AS cnt
FROM student WHERE login LIKE '%@cs'
-- alternatively
SELECT COUNT(*) AS cnt
FROM student WHERE login LIKE '%@cs'
-- alternatively
SELECT COUNT(1) AS cnt
FROM student WHERE login LIKE '%@cs'
-- alternatively
SELECT COUNT(1+1+1) AS cnt
FROM student WHERE login LIKE '%@cs'Note that DB system doesn’t care what’s the expression inside COUNT() function.
Query: “Get # of students and their average GPA that have a ‘@cs’ login”.
SELECT AVG(gpa), COUNT(sid) AS cnt
FROM student WHERE login LIKE '%@cs'to finish!!!!!!