Hands on sql introduction/ tutorial

Orginally created by Darrell Aucoin in conjuection with the Stats Club, find the original here.

Installation of SQLite Browser

SQLite Browser webpage

Windows

Download and install the file below

Windows download

Mac

Download and install the package below

Mac download

Ubuntu and Linux Mint

In terminal execute the following command

sudo apt-get install sqlitebrowser

Get the material

  1. Go to https://github.com/NormallySane/IntroSQL
  2. Download zip file alt text
  3. Unzip downloaded file in your preferred directory

Using SQLite Browser

  1. Open SQLite browser
  2. Open stats_club.db database file in the downloaded directory
  3. Click on ‘Execute SQL’ tab
    1. Open SQL file IntroSQL_Presentation1.sql file in the downloaded directory, IntroSQL_Presentation2.sql for second presentation.
    2. Follow along with file, executing statement as topics dictate
  4. Content of the talk is on https://github.com/NormallySane/IntroSQL/wiki (open in your favorite browser)

Features of SQLite Browser

alt text

alt text

alt text

alt text

alt text

Motivation

Q: Why learn a database lanugage?

Q: Why learn SQL?

Q: What is SQL?

Q: Why use SQLite?

Different Implementations of SQL

MySQL: Highly popular open source SQL implementation.

PostgreSQL: Open source SQL designed around letting users create User Defined Functions (UDF).

SQLite: Open sources light weight SQL usually used as an embedded database for applications (web browsers, mobile applications, etc.), or light to medium traffic websites. Database is saved as a single file and only allows one writer at a time. Various OS’s have SQLite preinstalled (type sqlite3 in terminal for mac)

Oracle: SQL implementation produced and marketed by Oracle Corporation.

Microsoft SQL Server: SQL implementation developed by Microsoft.

DB2: SQL developed by IBM. The database used by University of Waterloo.

Relational Databases

Relational Database: A relational database is a system organized in tables containing records (rows) related to other records in other tables. Each entity having attributes (columns also called fields) which give additional information on an entity.

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/RelationalDatabase.png)

Field, Attribute (Column): An individual piece of data stored in a table.

Record (Row): A tuple of columns describing an entity or action of an entity.

Table: A collection of rows. Usually in reference a persistent table saved permanently to memory

Result set: A non-persistent table, usually the result of a query.

(Virtual) View: A named query saved into memory and performed whenever it is named. Some SQL servers have materialized views that permanently save the data for faster access.

Subquery: A query that returns a table to another query.

Primary key: A tuple of columns that uniquely define each row in a table.

Foreign key: A tuple of columns identifying a relationship to another table.

The E-R Model

The E-R (entity-relationship) Model is an analysis tool for relational databases. E refers to entity: An object R refers to relationship: As in how objects relate to each other.

Each entity has properties associated with every instance of an object.

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/ER_Model.png)

Objects are related to each other through relationships:

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/ER_Model2.png)

E-R Relationships

  1. One to one relationships: For every instance of object A, there are exactly one instance of object B.
    • Every student has exactly one quest account
  2. One to many relationships: For every instance of object A, there are an indeterminate number of instances of object B.
    • A single course section has many students
  3. Many to one relationships: For many instances of object A, there are a single instance of object B.
    • Many students are enrolled in a course section
  4. Many to many relationships: For many instances of object A, there are a many possible instances of object B.
    • Many students take many courses

Normalization

Database Normalization is a process in relational databases to minimize redundacy. Data can be constructed so that additions, deletions, and modifications can be made in just one table.

Data goes through three forms until it is fully utilizied by a relational database:

  1. First Normalized Form: Each attribute contains only atomic values and only a single value from that domain. i.e. The data is in the form of a table with no merged columns or rows.
  2. Second Normalized Form: Each attribute in the table is dependent on the entity it is describing (dependent on the primary key).
  3. Third Normalized Form: All non-primary key attributes are not determined by any other non-primary key attributes.

Example: For each event for Stats Club we have: 1. a list of attendies along with some basic information (email, ect.), 2. the event’s name, type (social or education), the roles for Stats Club execs for the event, time and location of the event, and a budget, 3. as well as a list of expenses for the event

A report of the event would look something like:

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report.png)

We first need to tabulate the data:

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table.png)

However this is not enough, the data is not in a form that can be easily recognized by computers. - How do you add new events? - What about members that attend multiple events? - that attend no events?

1st Normalization

First Normalized Form: Each attribute contains only atomic values and only a single value from that domain

Example: To bring the data into 1st normalized form we need to break the table into two tables: Event, and Expenses:

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table_1st.png)
![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table_1st.png)

2nd Normalization

Second Normalized Form: Each attribute in the table is dependent on the entity it is describing (dependent on the primary key).

Example: To bring the data into 2nd normalized form, we need to break the Event table again. Let’s break the table so we get important description of the events (name, type, presenter, organizer, etc.) and a list of members that attended each event.

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table_2nd.png)
![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table_2nd.png)

3rd Normalization

Third Normalized Form: All non-primary key attributes are not determined by any other non-primary key attributes.

Example: The information on each member (name, email, etc.) is not determined by the event. We need to break the attendance table to bring into 3rd normalized form: attendance and members.

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Event_report_table_3rd.png)

Primary Keys and Foriegn Keys

We can reconstruct the orignal table by joining tables, foreign keys with what they reference (primary keys). - We can only construct an instance of a foreign keys if their is already an instance of their reference - i.e. the set of foreign keys for one table is a subset of the primary keys they reference

Primary key: A tuple of columns that uniquely define each row in a table. (Red items below) Foreign key: A tuple of columns identifying a relationship to another table. (Blue items below)

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/primary_keys.png)

Quiz: Normalization

Normalize the following table:

Student Name Quest ID Course Description Section
         

Solutions are here

Relational Algebra Operations

Projection: Returns a subset of columns.

Selection: Returns only entities where some condition is true.

Rename: Rename an attribute.

Natural Join: Tuples from one table is joined to tuples from another table based on common attributes (at least one column with the same name and possible values is common between them)

Θ__-Join and Equijoin:__ Join tuples from two different tables where some binary condition (Θ = {≥, ≤, >, <, =}) between two tables attributes is true. When Θ is =, the join is called an equijoin.

Set Operations: Set theory’s unions, set difference, and cartesian product of tuples performed on tuples of different tables.

Quiz: Relational Algebra

Q1: What kind of operation is performed on the following tables?

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/quiz2a.png)

Q2: What kinds of joins can peform on tables?

Solutions are here

Constraints

Constraints limit what can be entered into fields in a table and help ensure encapsulation:

PRIMARY KEY constraints Uniquely identifies each record in a table (quickly referencing it)

FOREIGN KEY constraints Points to a PRIMARY KEY of another table, enabling them to easily join them

CHECK constraints Limits the range of values that a field can take.

UNIQUE constraints Enforces uniqueness on an field (column).

NOT NULL constraints Enforces a field to always contain a value.

SQL Language

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/StatClubTables.png)

Data Types

In SQLite3, there are 5 basic data types:

  1. INTEGER The value is a signed integer.

  2. REAL The value is a floating point value, stored as an 8-byte IEEE floating point number.

  3. TEXT The value is a text string.

  4. BLOB The value is a blob of data, stored exactly as it was input.

  5. NUMERIC May contain integer or real values. The data is stored as text but converted as necessary.

SQL General Data Types

Data type Description
CHAR(n) Character string. Fixed length n
VARCHAR(n) Character string. Variable length <= n
BINARY(n) Binary string. Fixed length n
BOOLEAN TRUE or FALSE values
VARBINARY(n) Binary string. Variable length <= n
INTEGER(p) Integer numerical (no decimal). Precision p
SMALLINT Integer numerical (no decimal). Precision 5
INTEGER Integer numerical (no decimal). Precision 10
BIGINT Integer numerical (no decimal). Precision 19
DECIMAL(p,s) Exact numerical. precision p, scale s
NUMERIC(p,s) Exact numerical. precision p, scale s
FLOAT(p) Floating point number. mantissa precision p
REAL Approximate numerical. Mantissa percision 7
FLOAT Approximate numerical. Mantissa percision 16
DATE Stores year, month, and day values
TIME Stores hour, minute, and second values
TIMESTAMP Stores year, month, day, hour, minute, and second values

NULL Values

NULL A NULL can be thought of as an unknown value. - Any datatype can have NULL values - if either x or y is NULL:

x+y => NULL x>y => NULL

SQL uses a three-value logic system: TRUE, FALSE, NULL:

AND TRUE FALSE NULL
TRUE T F NULL
FALSE F F F
NULL NULL F NULL
OR TRUE FALSE NULL
TRUE T T T
FALSE T F NULL
NULL T NULL NULL
NOT TRUE FALSE NULL
  F T NULL

Projection (SELECT Clause)

SELECT Statement: The SELECT statement returns a table of values (sometimes empty) as specified in the statement.

SELECT Clause: The SELECT clause of a SELECT statement specifies the columns for the result set.

SELECT col1, col2, ...
FROM table_name;
SELECT *
FROM table_name;

In SELECT clauses, we can specify more than just columns from a table: Literials: Strings, numbers that are repeated for every row Expressions: Expressions of columns/literals Functions: Built in functions in SQL (ROUND(), etc.) User Defined Functions: Functions that a user can create within SQL to run

Example: Let’s try something simple: some literials and a calculation of 1 / 2

SELECT 1, 2, 'this is a string', 1/2;
1 2 'this is a string' 1/2
0 1 2 this is a string 0

Q: What happened here?

A: Since 1 and 2 are both integers, the expression 1/2 returns an integer. To get 0.5, we need to use a real number (a FLOAT).

SELECT 1, 2, 'this is a string', 1/2, 4/2, 5/2, 1/2., 1./2, 1/CAST(2 AS FLOAT);
1 2 'this is a string' 1/2 4/2 5/2 1/2. 1./2 1/CAST(2 AS FLOAT)
0 1 2 this is a string 0 2 2 0.5 0.5 0.5

SQL statements ignore white spaces and new lines, the statement is only processed when after it sees ‘;’

Example: The following two statements produce the same table.

SELECT 1, 2, 'this is a string', 1/2;
1 2 'this is a string' 1/2
0 1 2 this is a string 0
SELECT 1,
2,
             'this is a string',   1/2;
1 2 'this is a string' 1/2
0 1 2 this is a string 0

SQL Functions

There are many functions in SQLite and each implementation of SQL have different functions. Here are some random functions in SQLite:

Function Description
ABS(col) Absolute value of numeric column
LENGTH(col) Return length of string column
LOWER(col) Return the string column in lower case
UPPER(col) Return the string column in upper case
RANDOM() A pseudo-random integer between -9223372036854775808 and +9223372036854775807
SELECT ABS(-8), LENGTH('This is a String'), LOWER('ThIS Is A StRiNg'), RANDOM();
ABS(-8) LENGTH('This is a String') LOWER('ThIS Is A StRiNg') RANDOM()
0 8 16 this is a string 1085249690759383083

There are many more core functions within SQLite.

Quiz: SELECT Clause

Calculate the average of 2 random numbers.

Q: What is the upper and lower case for the string ‘UPPER or lower’?

Solutions are here

FROM Clause

FROM Clause: Specifies the table: either a persistant table, or a result set: a join of two or more tables or a subquery or some combination of the two.

SELECT col1, col2, ...
FROM table_name;

Example: What are the names, type, times and locations for Stats Club?

SELECT name, type, start_time, end_time, location
FROM event;
name type start_time end_time location
0 BOT social 2015-01-28 19:00:00 2015-01-28 22:00:00 C & D
1 EOT social None None None
2 Intro to Hadoop educational None None None
3 Intro to SQL educational 2015-02-05 18:00:00 2015-02-05 19:30:00 MC-3003
4 Prof Talk educational None None None
5 Intro to SQL 2 educational None None None
6 Prof Talk 2 educational None None None
7 Prof Talk 3 educational None None None

Quiz: FROM Clause

Q: Who are the execs for Stats Club, and what are their positions, and emails? - Projection of name, position, and email from the table exec

Solutions are here

Aggregate Functions

Aggregate Functions: Takes in the columns of a table and aggregates over the entries.

Function Return value
AVG(column) Average of non-null values
COUNT(column) Count of non-null values
MAX(column) Maximum of values
MIN(column) Minimum of values
SUM(column) Sum of values
GROUP_CONCAT(column) Concatenation of column strings

More detailed descriptions of the aggregate functions within SQLite can be found here.

SELECT COUNT( 12 ), COUNT('ssdf'), COUNT(NULL), SUM(23), SUM(0), SUM(NULL),
  AVG(0), AVG(NULL);
COUNT( 12 ) COUNT('ssdf') COUNT(NULL) SUM(23) SUM(0) SUM(NULL) AVG(0) AVG(NULL)
0 1 1 0 23 0 None 0 None

Lets work with some aggregate functions with the table example below:

number floating string
0 1 23.23 this
1 3232 -21.23 is
2 11 -2.00 a
3 -23 54.00 string
4 2 NaN concatenated
SELECT COUNT(*), COUNT(string), COUNT(floating), AVG(number), SUM(number),
GROUP_CONCAT(string, ' ')
FROM example;
COUNT(*) COUNT(string) COUNT(floating) AVG(number) SUM(number) GROUP_CONCAT(string, ' ')
0 5 5 4 644.6 3223 this is a string concatenated

DISTINCT Prefix

In the SELECT clause we can specify to return only distinct tuples of columns

SELECT DISTINCT col1, col2, ...
FROM table_name;
SELECT aggregate_function(DISTINCT column_name)
FROM table_name;

Example: What events have members attended? - What are the distinct events where at least one member attended?

SELECT DISTINCT event
FROM attendance;
event
0 Intro to SQL
1 BOT
2 EOT
3 Intro to Hadoop
4 Intro to SQL 2
5 Prof Talk
6 Prof Talk 2

Quiz: DISTINCT

Q: What are the distinct majors of Stats Club members?

Q: How many distinct majors of Stats Club members are there?

Solutions are here

Alias

Column Alias

To increase the readability of SQL as well as the result set, we can give columns new names:

SELECT col AS new_name
FROM table_name;

Table Alias

We can also give tables new names as well:

SELECT table_alias_1.col_name, table_alias_2.col_name, ...
FROM table_1 AS table_alias_1, table_2 AS table_alias_2;

Note: We can reference what table a column is coming from by a ‘.’

table_name.column_name

Example: If we give column aliases for the previous table, we make the result more interpretiable:

SELECT COUNT(*) AS num_rows, COUNT(string) AS num_strings,
  COUNT(floating) AS num_float, AVG(number) AS avg_integer,
  SUM(number) AS sum_int, GROUP_CONCAT(string, ' ') AS cat_string
FROM example;
num_rows num_strings num_float avg_integer sum_int cat_string
0 5 5 4 644.6 3223 this is a string concatenated

Quiz: Aliases

Perform a query using a table alias, and use this table alias when referencing the column i.e. table_alias.column_name - This will be really important for bigger, more complicated queries

Solutions are here

Selection (WHERE Clause)

WHERE clauses filters the result set, removing rows where the condition returns either FALSE or NULL.

SELECT col1, col2, ...
FROM table_name
WHERE condition;

Example: What Stats Club events are social?

SELECT name, type, start_time, end_time, location
FROM event
WHERE type = 'social';
name type start_time end_time location
0 BOT social 2015-01-28 19:00:00 2015-01-28 22:00:00 C & D
1 EOT social None None None

Prediate Operators

Operator Description Example
= Equal to WHERE gender = ‘M’
<>, != Not equal to WHERE gender <> ‘M’
> Greater than WHERE num > 5
< Less than WHERE num < 5
>= Greater than or equal to WHERE num >= 5
<= Less than or equal to WHERE num <= 5
IS NULL Value is NULL WHERE num IS NULL
IS NOT NULL Value is not NULL WHERE num IS NOT NULL
BETWEEN Between a range WHERE num BETWEEN 3 AND 5
IN In a list of values WHERE num IN (3, 5, 8)
LIKE Pattern Search WHERE str LIKE ‘F%’
EXISTS Subquery have any rows? WHERE EXISTS (subquery)

LIKE Predicate

LIKE predicates

‘_’ means a character of any type ‘%’ means between 0 or more characters of any type

Example: What Stats Club members has a name begining with F?

SELECT *
FROM member
WHERE name LIKE 'F%';
quest_id name email faculty major paid card
0 fred Fred E. Finch fred@uwaterloo.ca Math Pure Math Y Y
1 frances Frances A. Miller frances@uwaterloo.ca Math Stats Y Y

Operator Modifiers

ALL, ANY or SOME Operator Modifiers

The operators =, <>, !=, >, <, >=, <= can be used with a list of values and the operators ALL or ANY / SOME.

ANY, SOME Operator returns true, if operator is true for any value in the set.

ALL Operator returns true, if operator is true for all values in the set.

SELECT *
FROM table_name
WHERE column_name < ALL (subquery_returning_one_column);

Example: What are the expenses for non-social events? (ALL, ANY, and SOME are not implmented in SQLite)

SELECT * 
FROM expenses 
WHERE event != ALL (SELECT name 
                    FROM event
                    WHERE type = 'social');
event expense price
0 Intro to SQL pizza 87.43
1 Intro to SQL pop 15.34
2 Intro to Hadoop coffee 23.12
3 Intro to Hadoop water 10.23
4 Intro to Hadoop donuts 53.23
5 Intro to Hadoop cookies 10.23
6 Intro to SQL: Basic Queries cookies 10.23
7 Intro to SQL: Basic Queries donuts 20.34
8 Intro to SQL: Basic Queries pop 21.54
9 Intro to SQL: Basic Queries water 10.52
10 Prof Talk pop 20.31
11 Prof Talk pizza 62.56
12 Prof Talk 2 pizza 61.56
13 Prof Talk 2 pop 15.65
14 Prof Talk 3 pizza 62.45
15 Prof Talk 3 pop 13.23
16 Intro to SQL: Advanced Queries cookies 10.23
17 Intro to SQL: Advanced Queries donuts 20.34
18 Intro to SQL: Advanced Queries pop 21.54
19 Intro to SQL: Advanced Queries water 10.52

AND, OR Operators

A group of filter conditions can be linked together with AND or OR operators.

SELECT col1, col2, ...
FROM table_name
WHERE (condition1 AND condition2 ) OR condition3;

Example: What Stats Club members has a name with it’s second letter as A or ends in B?

SELECT name
FROM member
WHERE name LIKE '_a%' OR  name LIKE '%b';
name
0 Darrell Aucoin
1 James M. Eddings
2 James A. Foxt
3 Daniel J. Moore
4 Nancy P. Jackson
5 Ralph L. Waldrop
6 Tameika M. McMaster
7 Janelle T. Smith
8 Ruben Lamb
9 Patrick Robertson

Quiz: Filtering (WHERE Clause)

Q: What events for Stats Club are introductory talks? - Introductory talk names start with ‘Intro’

Q: What Stats Club members have their first name starting with a letter BETWEEN A and G?

Solutions are here

GROUP BY Clause

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/Group By.png)
SELECT col1, col2, ..., aggregate_function
FROM table_name
GROUP BY col1, col2, ...;

Aggregate Functions

Recall: Aggregate Functions: Takes in the columns of a table and aggregates over the entries. - If we use a GROUP BY clause, the aggregation will be over those groups

Function Return value
AVG(column) Average of non-null values
COUNT(column) Count of non-null values
MAX(column) Maximum of values
MIN(column) Minimum of values
SUM(column) Sum of values
GROUP_CONCAT(column) Concatenation of column strings

Example: What are the number of each type of event for Stats Club?

SELECT type, COUNT(*) AS num_events
FROM event
GROUP BY type;
type num_events
0 educational 6
1 social 2

HAVING Clause

HAVING clauses are very similar to WHERE clauses but can have aggregate function in their conditions.

You can have a WHERE and HAVING clause in the same statement.

Example: How many Stats Club members are in each major where the major has at least 2 members?

SELECT faculty, major, COUNT(*)
FROM member
GROUP BY faculty, major
HAVING COUNT(*) >= 2;
faculty major COUNT(*)
0 Math Act Sci 10
1 Math Applied Math 2
2 Math C & O 2
3 Math CS 3
4 Math Pure Math 2
5 Math Stats 14

GROUP BY with ROLLUP / CUBE

The ROLLUP operator produces a result set where the aggregate function is applied to each level of the GROUP BY hierachy. - The ROLLUP operator is NOT implemented in SQLite - Useful for making reports with totals and subtotals

SELECT col1, col2, ..., aggregate_function
FROM table_name
GROUP BY col1, col2, ... WITH ROLLUP;

Example: What are the number of Stats Club members in each faculty and major, including subtotals?

SELECT faculty, major, COUNT(*) AS num_members
FROM member
GROUP BY faculty, major WITH ROLLUP;
faculty major num_members
0 Art Econ 1
1 Art None 1
2 Math Act Sci 10
3 Math Applied Math 2
4 Math C & O 2
5 Math CS 3
6 Math Pure Math 2
7 Math Stats 14
8 Math None 33
9 None None 34

Note that for Art and Math in faculty there is a row which has a NULL value. This is a total for those groups.

There is also one row with NULL values for faculty and major, this is the grand total of all members.

Quiz: Aggregation (GROUP BY Clause)

Q: What is the attendance for each Stats Club event?

Solutions are here

Joins

At times, we need information from multiple tables, to do this we need to join tables together. We can do this several ways:

  1. CROSS JOIN: The cartesian product of rows from each table.
  2. INNER JOIN: Join two tables on a join-predicate, losing rows when evaluated false/null.
  3. OUTER JOIN: Retains each record for the table(s) even when it has no matching rows from the other table. The returning table has null values for missing records.

    1. LEFT OUTER JOIN: Keep each record for first table but not the table it’s joining with.

    2. RIGHT OUTER JOIN: Keep each record for second table but not the table it’s joining with.

    3. FULL OUTER JOIN: Keep all record for all tables.

  4. NATURAL JOIN: Tables with the exact same column name and datatype are joined along that column.

CROSS JOIN

CROSS JOIN is the cartesian product of two tables

SELECT col1, col2, ...
FROM table1 CROSS JOIN table2;

We can also get a CROSS JOIN by listing the tables seperated by a ‘,’

SELECT col1, col2, ...
FROM table1, table2;

Example: Suppose we are creating a games tourtament between Stats Club members where every member play every other member once. How can we create such a table?

SELECT m1.name AS Player_1, m2.name  AS Player_2
FROM member AS m1 CROSS JOIN member AS m2;
Player_1 Player_2
0 Darrell Aucoin Darrell Aucoin
1 Darrell Aucoin Fred E. Finch
2 Darrell Aucoin Ryan T. Luby
3 Darrell Aucoin Billy L. Hunter
4 Darrell Aucoin John J. Oquendo
5 Darrell Aucoin Stephanie R. Matthews
6 Darrell Aucoin Robert B. Williams
7 Darrell Aucoin Austin K. Gilliard
8 Darrell Aucoin James M. Eddings
9 Darrell Aucoin Elaine S. Ott
10 Darrell Aucoin James A. Foxt
11 Darrell Aucoin Daniel J. Moore
12 Darrell Aucoin Kelly S. Ferguson
13 Darrell Aucoin Joseph L. Wood
14 Darrell Aucoin Vivian R. Donley
15 Darrell Aucoin Frances A. Miller
16 Darrell Aucoin Mina W. Lawrence
17 Darrell Aucoin Phillip C. Mascarenas
18 Darrell Aucoin Jeff M. Wright
19 Darrell Aucoin Deborah D. Helfrich
20 Darrell Aucoin Nancy P. Jackson
21 Darrell Aucoin Bobbie D. Mathews
22 Darrell Aucoin Arnold J. Fuller
23 Darrell Aucoin Melvin O. Martin
24 Darrell Aucoin Ralph L. Waldrop
25 Darrell Aucoin Mildred F. Hottinger
26 Darrell Aucoin Tameika M. McMaster
27 Darrell Aucoin Melissa R. Anderson
28 Darrell Aucoin Janelle T. Smith
29 Darrell Aucoin Ann W. McLaughlin
... ... ...
1126 Patrick Robertson John J. Oquendo
1127 Patrick Robertson Stephanie R. Matthews
1128 Patrick Robertson Robert B. Williams
1129 Patrick Robertson Austin K. Gilliard
1130 Patrick Robertson James M. Eddings
1131 Patrick Robertson Elaine S. Ott
1132 Patrick Robertson James A. Foxt
1133 Patrick Robertson Daniel J. Moore
1134 Patrick Robertson Kelly S. Ferguson
1135 Patrick Robertson Joseph L. Wood
1136 Patrick Robertson Vivian R. Donley
1137 Patrick Robertson Frances A. Miller
1138 Patrick Robertson Mina W. Lawrence
1139 Patrick Robertson Phillip C. Mascarenas
1140 Patrick Robertson Jeff M. Wright
1141 Patrick Robertson Deborah D. Helfrich
1142 Patrick Robertson Nancy P. Jackson
1143 Patrick Robertson Bobbie D. Mathews
1144 Patrick Robertson Arnold J. Fuller
1145 Patrick Robertson Melvin O. Martin
1146 Patrick Robertson Ralph L. Waldrop
1147 Patrick Robertson Mildred F. Hottinger
1148 Patrick Robertson Tameika M. McMaster
1149 Patrick Robertson Melissa R. Anderson
1150 Patrick Robertson Janelle T. Smith
1151 Patrick Robertson Ann W. McLaughlin
1152 Patrick Robertson Judith B. Gibbons
1153 Patrick Robertson Ruben Lamb
1154 Patrick Robertson Dominick Byrd
1155 Patrick Robertson Patrick Robertson

1156 rows × 2 columns

However we have have players playing themselves, and rounds with the same players in opposite roles. We just need to filter these out.

SELECT m1.name AS Player_1, m2.name  AS Player_2
FROM member AS m1 CROSS JOIN member AS m2
WHERE m1.name > m2.name;
Player_1 Player_2
0 Darrell Aucoin Billy L. Hunter
1 Darrell Aucoin Austin K. Gilliard
2 Darrell Aucoin Daniel J. Moore
3 Darrell Aucoin Bobbie D. Mathews
4 Darrell Aucoin Arnold J. Fuller
5 Darrell Aucoin Ann W. McLaughlin
6 Fred E. Finch Darrell Aucoin
7 Fred E. Finch Billy L. Hunter
8 Fred E. Finch Austin K. Gilliard
9 Fred E. Finch Elaine S. Ott
10 Fred E. Finch Daniel J. Moore
11 Fred E. Finch Frances A. Miller
12 Fred E. Finch Deborah D. Helfrich
13 Fred E. Finch Bobbie D. Mathews
14 Fred E. Finch Arnold J. Fuller
15 Fred E. Finch Ann W. McLaughlin
16 Fred E. Finch Dominick Byrd
17 Ryan T. Luby Darrell Aucoin
18 Ryan T. Luby Fred E. Finch
19 Ryan T. Luby Billy L. Hunter
20 Ryan T. Luby John J. Oquendo
21 Ryan T. Luby Robert B. Williams
22 Ryan T. Luby Austin K. Gilliard
23 Ryan T. Luby James M. Eddings
24 Ryan T. Luby Elaine S. Ott
25 Ryan T. Luby James A. Foxt
26 Ryan T. Luby Daniel J. Moore
27 Ryan T. Luby Kelly S. Ferguson
28 Ryan T. Luby Joseph L. Wood
29 Ryan T. Luby Frances A. Miller
... ... ...
531 Dominick Byrd Daniel J. Moore
532 Dominick Byrd Deborah D. Helfrich
533 Dominick Byrd Bobbie D. Mathews
534 Dominick Byrd Arnold J. Fuller
535 Dominick Byrd Ann W. McLaughlin
536 Patrick Robertson Darrell Aucoin
537 Patrick Robertson Fred E. Finch
538 Patrick Robertson Billy L. Hunter
539 Patrick Robertson John J. Oquendo
540 Patrick Robertson Austin K. Gilliard
541 Patrick Robertson James M. Eddings
542 Patrick Robertson Elaine S. Ott
543 Patrick Robertson James A. Foxt
544 Patrick Robertson Daniel J. Moore
545 Patrick Robertson Kelly S. Ferguson
546 Patrick Robertson Joseph L. Wood
547 Patrick Robertson Frances A. Miller
548 Patrick Robertson Mina W. Lawrence
549 Patrick Robertson Jeff M. Wright
550 Patrick Robertson Deborah D. Helfrich
551 Patrick Robertson Nancy P. Jackson
552 Patrick Robertson Bobbie D. Mathews
553 Patrick Robertson Arnold J. Fuller
554 Patrick Robertson Melvin O. Martin
555 Patrick Robertson Mildred F. Hottinger
556 Patrick Robertson Melissa R. Anderson
557 Patrick Robertson Janelle T. Smith
558 Patrick Robertson Ann W. McLaughlin
559 Patrick Robertson Judith B. Gibbons
560 Patrick Robertson Dominick Byrd

561 rows × 2 columns

INNER JOIN

INNER JOIN Joins two tables where the join condition returns true. Discarded when returning false or NULL.

SELECT col1, col2, ...
FROM table1 INNER JOIN table2 ON condition;

ON Clause

The ON clause specifies the join condition:

Example: How many events does each member attend?

SELECT m.name, COUNT(a.event) AS events_attended
FROM member AS m INNER JOIN attendance AS a ON m.quest_id = a.member
GROUP BY m.name
ORDER BY events_attended;
name events_attended
0 John J. Oquendo 1
1 James M. Eddings 2
2 Melissa R. Anderson 2
3 Melvin O. Martin 2
4 Mina W. Lawrence 2
5 Ann W. McLaughlin 3
6 Bobbie D. Mathews 3
7 Janelle T. Smith 3
8 Mildred F. Hottinger 3
9 Phillip C. Mascarenas 3
10 Ryan T. Luby 3
11 Vivian R. Donley 3
12 Arnold J. Fuller 4
13 Austin K. Gilliard 4
14 Jeff M. Wright 4
15 Kelly S. Ferguson 4
16 Nancy P. Jackson 4
17 Ralph L. Waldrop 4
18 Ruben Lamb 4
19 Billy L. Hunter 5
20 Deborah D. Helfrich 5
21 Dominick Byrd 5
22 Frances A. Miller 5
23 Fred E. Finch 5
24 Joseph L. Wood 5
25 Judith B. Gibbons 5
26 Robert B. Williams 5
27 Stephanie R. Matthews 5
28 Tameika M. McMaster 5
29 Elaine S. Ott 6
30 Daniel J. Moore 7
31 Patrick Robertson 7

OUTER JOIN

OUTER JOIN A join that returns all rows for 1 or 2 tables, even when there is no corresponding value. In these cases, NULL values are entered for these corresponding rows.

There are 3 types of OUTER JOINs:

  1. LEFT OUTER JOIN: An OUTER JOIN returning all rows of the table first mentioned.

  2. RIGHT OUTER JOIN: An OUTER JOIN returning all rows of the table second mentioned.

  3. FULL OUTER JOIN: An OUTER JOIN returning all rows of both tables.

Example: What are the names of Stat Club members and how many events they attended?

SELECT m.name, COUNT(a.event) AS events_attended
FROM member AS m LEFT OUTER JOIN attendance AS a ON m.quest_id = a.member
GROUP BY m.name
ORDER BY events_attended;
name events_attended
0 Darrell Aucoin 0
1 James A. Foxt 0
2 John J. Oquendo 1
3 James M. Eddings 2
4 Melissa R. Anderson 2
5 Melvin O. Martin 2
6 Mina W. Lawrence 2
7 Ann W. McLaughlin 3
8 Bobbie D. Mathews 3
9 Janelle T. Smith 3
10 Mildred F. Hottinger 3
11 Phillip C. Mascarenas 3
12 Ryan T. Luby 3
13 Vivian R. Donley 3
14 Arnold J. Fuller 4
15 Austin K. Gilliard 4
16 Jeff M. Wright 4
17 Kelly S. Ferguson 4
18 Nancy P. Jackson 4
19 Ralph L. Waldrop 4
20 Ruben Lamb 4
21 Billy L. Hunter 5
22 Deborah D. Helfrich 5
23 Dominick Byrd 5
24 Frances A. Miller 5
25 Fred E. Finch 5
26 Joseph L. Wood 5
27 Judith B. Gibbons 5
28 Robert B. Williams 5
29 Stephanie R. Matthews 5
30 Tameika M. McMaster 5
31 Elaine S. Ott 6
32 Daniel J. Moore 7
33 Patrick Robertson 7

Natural Join

NATURAL JOIN A join condition that lets the server decide on the join conditions based on the same column names and types across columns for the tables.

Example: What are the position and duties of each Stats Club exec?

SELECT  e.name, e.position, ep.duties
FROM exec AS e NATURAL JOIN exec_position AS ep;
name position duties
0 Darrell Aucoin President To be aware of MathSoc's Policies and Bylaws i...
1 Darrell Aucoin President To call and preside over general meetings.
2 Darrell Aucoin President To manage the executive team and the strategic...
3 Darrell Aucoin President To post announcements of all club meetings, an...
4 Judith B. Gibbons Events To assist the president and other vice-preside...
5 Judith B. Gibbons Events To chair the organization and promotion of lea...
6 Lamar Roy Finance To ensure membership fees are collected and ma...
7 Lamar Roy Finance To keep an up-to-date record of financial tran...
8 Lamar Roy Finance To prepare a summary of the financial records ...
9 Lamar Roy Finance To prepare the budget at the beginning of term.
10 Lamar Roy Finance To volunteer as president in the absence of th...
11 Gilberto Cross Events To assist the president and other vice-preside...
12 Gilberto Cross Events To chair the organization and promotion of lea...
13 Melba Lane President To be aware of MathSoc's Policies and Bylaws i...
14 Melba Lane President To call and preside over general meetings.
15 Melba Lane President To manage the executive team and the strategic...
16 Melba Lane President To post announcements of all club meetings, an...
17 Ruben Lamb Technology Maintain and update the club website.
18 Ruben Lamb Technology Maintain any hardware, software, or technology...
19 Ruben Lamb Technology Perform the duties of a Vice President - Event...
20 Patrick Robertson Events To assist the president and other vice-preside...
21 Patrick Robertson Events To chair the organization and promotion of lea...
22 Dominick Byrd Events To assist the president and other vice-preside...
23 Dominick Byrd Events To chair the organization and promotion of lea...

Quiz: Joining Tables

Q: What are the email addresses and phone numbers of stats club execs who are in change or organizing at least one event?

![alt text](https://raw.githubusercontent.com/NormallySane/IntroSQL/master/Images/StatClubTables2.png)

Solutions are here

Subqueries

Subqueries are queries contained in queries. These subqueries are contained in ‘(‘, ‘)’

There are two types of subqueries:

  1. Non-Correlated Subqueries: Can be run independently of the larger query.

  2. Correlated Subqueries: Must be run concurrently with the outer query. They are dependent on the outer query.

Non-Correlated Subqueries

Non-Correlated Subquery: Any valid query within query that if executed by itself will produce a result (including empty set). These are enclosed in ‘(‘, ‘)’ in FROM, WHERE, or HAVING clauses.

Example: What Stats Club execs are in charge of making posters?

SELECT e.name, e.position
FROM exec AS e
WHERE e.questid IN (SELECT poster FROM event);
name position
0 Dominick Byrd Events
1 Gilberto Cross Events
2 Judith B. Gibbons Events
3 Patrick Robertson Events

Correlated Subqueries

Correlated Subquery makes references to it’s containing query, executing it for every candidate row referenced. - Correlated subqueries can appear in the SELECT, WHERE, or HAVING clauses.

Example: What majors are the current Stats Club execs enrolled in? - The member table has the information on majors but exec has the information on execs

SELECT name, position,
(SELECT faculty FROM member AS m WHERE m.quest_id = e.questid) AS faulty,
(SELECT major FROM member AS m WHERE m.quest_id = e.questid) AS major
FROM exec AS e;
name position faulty major
0 Darrell Aucoin President Math Stats
1 Judith B. Gibbons Events Math Act Sci
2 Lamar Roy Finance None None
3 Gilberto Cross Events None None
4 Melba Lane President None None
5 Ruben Lamb Technology Math Act Sci
6 Hannah Mason SeniorAdvisor None None
7 Patrick Robertson Events Math Stats
8 Dominick Byrd Events Math Pure Math

Correlated vs Non-Correlated

  1. Correlated subquery is dependent on outer query, non-correlated is independent.

  2. Correlated subquery is executed concurrently with outer query, non- correlated is executed before.

  3. In general, for speed of execution:

Correlated subquery < Non-Correlated subquery < Joins

Quiz: Subqueries

Q: Where can a Non-Correlated subquery can be placed?

Q: Where can a Correlated subquery can be placed?

Q: Using a non-correlated subquery, what are the names, locations, and descriptions of events that served pizza? 1. Break the problem into smaller pieces: What are the events that served pizza? 2. Only retrieve values from the table event that event.name matches those values

Solutions are here

Set Operations

Set operations create a combination of rows from 2 tables into one result set. - The tables (or the projection of those tables) have to have the same number of columns and datatypes - if one column appears in a table but not in another, supply the value NULL for the missing column

  1. UNION: For tables A and B, combines the rows for A and B into one result set.
  2. INTERSECT: For tables A and B, returns the rows for A and B that they have in common.
  3. Difference (EXCEPT): For tables A and B, return only the rows in A that are not in common with B.

UNION

UNION operator: Addition of one result set to another result set with the same number of attributes and types.

SELECT ... FROM ...
UNION [ALL]
SELECT ... FROM ...

Example: Suppose are you need to contact everyone involved in Stats Club: members and execs. What are the names, email addresses, and phone numbers (if you have them) of all Stats Club members and execs?

SELECT name, email, NULL AS phone
FROM member
UNION
SELECT name, email, phone
FROM exec;
name email phone
0 Ann W. McLaughlin ann@uwaterloo.ca None
1 Arnold J. Fuller arnold@uwaterloo.ca None
2 Austin K. Gilliard austin@uwaterloo.ca None
3 Billy L. Hunter billy@uwaterloo.ca None
4 Bobbie D. Mathews bobbie@uwaterloo.ca None
5 Daniel J. Moore daniel@uwaterloo.ca None
6 Darrell Aucoin darrell.aucoin@gmail.com None
7 Darrell Aucoin darrell.aucoin@gmail.com 519-555-1424
8 Deborah D. Helfrich deborah@uwaterloo.ca None
9 Dominick Byrd dominick@uwaterloo.ca None
10 Dominick Byrd dominick@uwaterloo.ca 519-555-2325
11 Elaine S. Ott elaine@uwaterloo.ca None
12 Frances A. Miller frances@uwaterloo.ca None
13 Fred E. Finch fred@uwaterloo.ca None
14 Gilberto Cross gilberto@uwaterloo.ca 519-555-3453
15 Hannah Mason hannah@uwaterloo.ca 519-555-2342
16 James A. Foxt james.fox@uwaterloo.ca None
17 James M. Eddings james@uwaterloo.ca None
18 Janelle T. Smith janelle@uwaterloo.ca None
19 Jeff M. Wright jeff@uwaterloo.ca None
20 John J. Oquendo john@uwaterloo.ca None
21 Joseph L. Wood joseph@uwaterloo.ca None
22 Judith B. Gibbons judith@uwaterloo.ca None
23 Judith B. Gibbons judith@uwaterloo.ca 519-555-2343
24 Kelly S. Ferguson kelly@uwaterloo.ca None
25 Lamar Roy lamar@uwaterloo.ca 519-555-3432
26 Melba Lane melba@uwaterloo.ca 519-555-2322
27 Melissa R. Anderson melissa@uwaterloo.ca None
28 Melvin O. Martin melvin@uwaterloo.ca None
29 Mildred F. Hottinger mildred@uwaterloo.ca None
30 Mina W. Lawrence mina@uwaterloo.ca None
31 Nancy P. Jackson nancy@uwaterloo.ca None
32 Patrick Robertson patrick@uwaterloo.ca None
33 Patrick Robertson patrick@uwaterloo.ca 519-555-2312
34 Phillip C. Mascarenas phillip@uwaterloo.ca None
35 Ralph L. Waldrop ralph@uwaterloo.ca None
36 Robert B. Williams robert@uwaterloo.ca None
37 Ruben Lamb ruben@uwaterloo.ca None
38 Ruben Lamb ruben@uwaterloo.ca 519-555-5232
39 Ryan T. Luby ryan@uwaterloo.ca None
40 Stephanie R. Matthews step@uwaterloo.ca None
41 Tameika M. McMaster tameika@uwaterloo.ca None
42 Vivian R. Donley vivian@uwaterloo.ca None

Intersection (INTERSECT Operator)

INTERSECT operator: Returns only tuples that are in common between two result sets. Result sets must be equal in number and type of attributes.

SELECT ... FROM ...
INTERSECT
SELECT ... FROM ...;

Example: What Stats Club execs have also signed up as Stats Club members?

SELECT name, quest_id
FROM member
INTERSECT
SELECT name, questid
FROM exec;
name quest_id
0 Darrell Aucoin darrell
1 Dominick Byrd dominick
2 Judith B. Gibbons judith
3 Patrick Robertson patrick
4 Ruben Lamb ruben

Difference (EXCEPT Operator)

EXCEPT operator: Returns the first result set minus anything it has in common with the second result set.

SELECT ... FROM ...
EXCEPT [ALL]
SELECT ... FROM ...

Example: What Stats Club members are not execs?

SELECT name, quest_id
FROM member
EXCEPT
SELECT name, questid
FROM exec;
name quest_id
0 Ann W. McLaughlin ann
1 Arnold J. Fuller arnold
2 Austin K. Gilliard austin
3 Billy L. Hunter billy
4 Bobbie D. Mathews bobbie
5 Daniel J. Moore daniel
6 Deborah D. Helfrich deborah
7 Elaine S. Ott elaine
8 Frances A. Miller frances
9 Fred E. Finch fred
10 James A. Foxt james.fox
11 James M. Eddings james
12 Janelle T. Smith janelle
13 Jeff M. Wright jeff
14 John J. Oquendo john
15 Joseph L. Wood joseph
16 Kelly S. Ferguson kelly
17 Melissa R. Anderson melissa
18 Melvin O. Martin melvin
19 Mildred F. Hottinger mildred
20 Mina W. Lawrence mina
21 Nancy P. Jackson nancy
22 Phillip C. Mascarenas phillip
23 Ralph L. Waldrop ralph
24 Robert B. Williams robert
25 Ryan T. Luby ryan
26 Stephanie R. Matthews step
27 Tameika M. McMaster tameika
28 Vivian R. Donley vivian

WITH Clause

WITH clause: Makes a non-correlated subquery look like a table in the executed statement:

WITH subquery_name [(colname1, ...)] AS
        (SELECT ...),
        subquery_name2 [(colname1, ...)] AS
        (SELECT ...)
SELECT ...

Example: Suppose you are doing a report for MathSoc detailing Stats Club events, budget, and expenses.

WITH cost (event, expenses) AS
    (SELECT event, SUM(price)
    FROM expenses
    GROUP BY event)
SELECT e.name, e.type, e.budget, cost.expenses
FROM event AS e INNER JOIN cost ON e.name = cost.event;
name type budget expenses
0 BOT social 90.0 58.57
1 EOT social 160.0 160.65
2 Intro to Hadoop educational 90.0 96.81
3 Intro to SQL educational 90.0 102.77
4 Intro to SQL 2 educational 200 62.63
5 Prof Talk educational 90.0 82.87
6 Prof Talk 2 educational 90.0 77.21
7 Prof Talk 3 educational 90.0 75.68

Quiz: WITH Clause (HARD)

Q: MathSoc only provides a maximum cap on social events based on the formula below. What is the max cap for social expenses and is Stats Club over this limit? - Membership fee for Stats Club is 2 dollars

Social expenses = max{250, (MathSoc_Members) * (5 + (Member_fee))} Social expenses = max{250, (MathSoc_Members) * 7}

Break the problem into smaller problems: - What are the total expenses for social events? - What is the max budget for social events?

Smaller Problems: - What are the total expenses for social events? - What events are social events?

#### Think about how to solve this for 5 mins

Solutions are here

CASE Expressions

CASE expressions: Similar to a series of if else statements executed for every entry in a table. A new value is returned for every row in the table.

CASE [column]
        WHEN condition1 THEN result1
        WHEN condition2 THEN result2
        ...
        WHEN condition_n THEN result_n
        [ELSE result]
END

Example: Suppose like before we are preparing a report for MathSoc, but now we want to give a warning if the event is over budget or not.

WITH cost (event, expenses) AS
    (SELECT event, SUM(price)
    FROM expenses
    GROUP BY event)
SELECT e.name, e.type, e.budget, cost.expenses,
    CASE
    WHEN e.budget - cost.expenses < 0 THEN 'Over budget'
    ELSE NULL
END AS warning
FROM event AS e INNER JOIN cost ON e.name = cost.event;
name type budget expenses warning
0 BOT social 90 58.57 None
1 EOT social 160 160.65 Over budget
2 Intro to Hadoop educational 90 96.81 Over budget
3 Intro to SQL educational 90 102.77 Over budget
4 Intro to SQL 2 educational 200 62.63 None
5 Prof Talk educational 90 82.87 None
6 Prof Talk 2 educational 90 77.21 None
7 Prof Talk 3 educational 90 75.68 None

Quiz: CASE Expressions

Q: Suppose we are interested in the healthiness of our food options at Stats Club events. A score of various foods was given below. What is the average ‘healthiness’ of Stats Club events?

Food Score
donuts -2
pop -2
fries -2
pizza -1
cookies -1
coffee 0
water 2
meals 2
veggie platter 3

Solutions are here

ORDER BY Clause

ORDER BY Clause: Rearranges the rows of a result set according to a tuple of columns.

SELECT column_list
FROM table_name
ORDER BY column1, column2, .. columnN [ASC | DESC];

ASC is ascending (default)

DESC is descending

Example: What is the membership list ordered by name?

SELECT name
FROM member
ORDER BY name;
name
0 Ann W. McLaughlin
1 Arnold J. Fuller
2 Austin K. Gilliard
3 Billy L. Hunter
4 Bobbie D. Mathews
5 Daniel J. Moore
6 Darrell Aucoin
7 Deborah D. Helfrich
8 Dominick Byrd
9 Elaine S. Ott
10 Frances A. Miller
11 Fred E. Finch
12 James A. Foxt
13 James M. Eddings
14 Janelle T. Smith
15 Jeff M. Wright
16 John J. Oquendo
17 Joseph L. Wood
18 Judith B. Gibbons
19 Kelly S. Ferguson
20 Melissa R. Anderson
21 Melvin O. Martin
22 Mildred F. Hottinger
23 Mina W. Lawrence
24 Nancy P. Jackson
25 Patrick Robertson
26 Phillip C. Mascarenas
27 Ralph L. Waldrop
28 Robert B. Williams
29 Ruben Lamb
30 Ryan T. Luby
31 Stephanie R. Matthews
32 Tameika M. McMaster
33 Vivian R. Donley

LIMIT Clause

LIMIT Clause: Restrict the result set to the first n of rows.

SELECT column1, column2, columnN
FROM table_name
LIMIT no_of_rows;
SELECT column1, column2, columnN
FROM table_name
LIMIT no_of_rows OFFSET row_num;

Example: What is the first 10 members ordered by name?

SELECT name
FROM member
ORDER BY name
LIMIT 10;
name
0 Ann W. McLaughlin
1 Arnold J. Fuller
2 Austin K. Gilliard
3 Billy L. Hunter
4 Bobbie D. Mathews
5 Daniel J. Moore
6 Darrell Aucoin
7 Deborah D. Helfrich
8 Dominick Byrd
9 Elaine S. Ott

Quiz: ORDER BY and LIMIT Clause

Q: What are the top 10 highest priced items in expenses?

Solutions are here

Further Practice Questions?

Quiz: Practice Questions

Q: What events have dates specified?

Q: What events don’t have dates specified?

Q: What Stats Club members are in Stats, Act Sci, or CS? - Recall the IN predicate operator

Q: What are the Stats Club exec positions? - Avoid duplication of positions

Q: How many different Stats Club exec positions are there? - Note that we can use DISTINCT within aggregate functions

Topics Not Covered

Hands on SQL introduction/ tutorial - February 19, 2015 - Andrew Andrade