DATASCI 350 - Data Science Computing

Lecture 19 - Tables in SQL

Danilo Freire

danilo.freire@emory.edu

Department of Data and Decision Sciences
Emory University

Hello again! 😊

Brief recap 📚

Recap of last class and today’s plan

Last time we learned how to:

Connect SQL with Python with sqlite3 and pandas
Use many SQL commands, such as CASE WHEN, window functions, and string functions
Fill missing data with COALESCE and CASE WHEN
Use pandas to write and run SQL queries
Pivot tables (long to wide) in SQLite

Today we will learn how to:

See different types of joins in SQL
Use special joins, such as CROSS JOIN and SELF JOIN
Merge tables by row with UNION, INTERSECT, and EXCEPT
Learn how to create UPSERT statements in SQLite
Create VIEWS in SQLite
Solve exercises to practice what we learned
Let’s get started! 🚀

Basic joins 📊

Primary and foreign keys

As with many languages, you can merge two tables in SQL either by columns or by rows
The most common method is the JOIN clause
JOIN is used to combine rows and columns from two or more tables based on a related column between them
As you know, there are two types of keys, primary and foreign keys
The primary key is a column that uniquely identifies each row in a table
A foreign key is a column that identifies a column in another table
- One table can have multiple foreign keys, and they can be NULL
- SQLite supports foreign keys, and the Python sqlite3 module handles them correctly by default

Load the libraries and connect to the database

Let’s load the libraries and connect to the SQLite database. We’ll use a file named lecture19.db.

import pandas as pd; import sqlite3

# Connect to the SQLite database (this will create the file if it doesn't exist)
connection = sqlite3.connect('lecture19.db'); cursor = connection.cursor()

cursor.execute('DROP TABLE IF EXISTS players;')
cursor.execute('''
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY AUTOINCREMENT, player_name TEXT NOT NULL UNIQUE,
    goals INT NOT NULL, victories INT NOT NULL
);
''')

cursor.execute('DROP TABLE IF EXISTS teams;')
cursor.execute('''
CREATE TABLE teams (
    team_id INTEGER PRIMARY KEY AUTOINCREMENT, team_name TEXT NOT NULL
);
''')
connection.commit() # Commit changes

Create the tables

Now let’s insert some data into the tables!

# Insert data into the tables
cursor.execute('''
INSERT INTO players (player_name, goals, victories) VALUES
('Messi', 10, 5),
('Vini Jr', 8, 4),
('Neymar', 6, 3),
('Mbappé', 5, 2),
('Lewandowski', 4, 1),
('Haaland', 5, 3);
''')

cursor.execute('''
INSERT INTO teams (team_name) VALUES
('Inter Miami'),
('Real Madrid'),
('Santos'),
('Real Madrid'),
('Barcelona');
''')
connection.commit() # Commit changes

Visualise the tables

Let’s see our tables using pandas 🐼
read_sql works fine with the sqlite3 connection object!

pd.read_sql('SELECT * FROM players', connection)

	player_id	player_name	goals	victories
0	1	Messi	10	5
1	2	Vini Jr	8	4
2	3	Neymar	6	3
3	4	Mbappé	5	2
4	5	Lewandowski	4	1
5	6	Haaland	5	3

pd.read_sql('SELECT * FROM teams', connection)

	team_id	team_name
0	1	Inter Miami
1	2	Real Madrid
2	3	Santos
3	4	Real Madrid
4	5	Barcelona

Types of joins

Inner join

The INNER JOIN returns only the records where there is a match between both tables (intersection) based on the join condition
If there’s no match for a record in either table, that record will be excluded from the results
The matching condition is specified in the ON clause (e.g., ON table1.id = table2.id)
Note that Haaland (player_id 6) is not in the teams table (which only has 5 rows), so he will not appear in the result set
Note: here we join on player_id = team_id because both are auto-incremented IDs that happen to line up. In practice, you would have a team_id foreign key column in the players table to link them properly

pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals, players.victories
FROM players
INNER JOIN teams
ON players.player_id = teams.team_id;
''', connection)

	player_name	team_name	goals	victories
0	Messi	Inter Miami	10	5
1	Vini Jr	Real Madrid	8	4
2	Neymar	Santos	6	3
3	Mbappé	Real Madrid	5	2
4	Lewandowski	Barcelona	4	1

Left join

The LEFT JOIN returns all records from the left table (first table) and the matched records from the right table (second table)
The result is NULL (None or NaN in Python) for columns from the right side if there is no match
This is perhaps the most common type of join, as it keeps all the data from the table we are usually primarily interested in (the “left” table)
Note that Haaland is included here because he is in the players table (the left table)

pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM players
LEFT JOIN teams
ON players.player_id = teams.team_id;
''', connection)

	player_name	team_name	goals
0	Messi	Inter Miami	10
1	Vini Jr	Real Madrid	8
2	Neymar	Santos	6
3	Mbappé	Real Madrid	5
4	Lewandowski	Barcelona	4
5	Haaland	None	5

Right join (emulated in SQLite)

The RIGHT JOIN command returns all records from the right table (second table) and the matched records from the left table (first table)
SQLite does not implement RIGHT JOIN, so we emulate it by swapping tables in a LEFT JOIN
- We select from the teams table first, then LEFT JOIN the players table
In our case, since teams has fewer rows than players and all team_ids match a player_id, the emulated result looks the same as the INNER JOIN.

pd.read_sql('''
SELECT players.player_name, teams.team_name, players.goals
FROM teams
LEFT JOIN players
ON players.player_id = teams.team_id;
''', connection)

	player_name	team_name	goals
0	Messi	Inter Miami	10
1	Vini Jr	Real Madrid	8
2	Neymar	Santos	6
3	Mbappé	Real Madrid	5
4	Lewandowski	Barcelona	4

Full outer join (emulated in SQLite)

FULL OUTER JOIN returns all records from both of them, regardless of whether there is a match in the other table
Rows from either table that do not have a match in the other table will have NULL for the columns of the table without a match
SQLite also does not implement FULL OUTER JOIN. We can emulate it by taking a LEFT JOIN in both directions and combining with UNION (which removes duplicates)

pd.read_sql('''
SELECT players.player_id, players.player_name, teams.team_name, players.goals
FROM players
LEFT JOIN teams
ON players.player_id = teams.team_id

UNION

SELECT players.player_id, players.player_name, teams.team_name, players.goals
FROM teams
LEFT JOIN players
ON players.player_id = teams.team_id
ORDER BY players.player_id;
''', connection)

	player_id	player_name	team_name	goals
0	1	Messi	Inter Miami	10
1	2	Vini Jr	Real Madrid	8
2	3	Neymar	Santos	6
3	4	Mbappé	Real Madrid	5
4	5	Lewandowski	Barcelona	4
5	6	Haaland	None	5

Try it yourself! 🧠

Let’s create two new tables (products, reviews) and insert some data into them. We use REAL for the price in SQLite

# Create the tables and insert data
cursor.execute('DROP TABLE IF EXISTS reviews;') 
cursor.execute('DROP TABLE IF EXISTS products;')
cursor.execute('''
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_name TEXT NOT NULL UNIQUE,
    price REAL 
);
''')

# Insert products
cursor.execute('''
INSERT INTO products (product_name, price) VALUES
    ('Coffee Maker', 99.99),
    ('Toaster', 29.99),
    ('Blender', 79.99),
    ('Microwave', 149.99),
    ('Air Fryer', 89.99);
''')

cursor.execute('''
CREATE TABLE reviews (
    review_id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INT,
    rating INT CHECK (rating BETWEEN 1 AND 5),
    comment TEXT,
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
''')

# Insert reviews
cursor.execute('''
INSERT INTO reviews (product_id, rating, comment) VALUES
    (1, 5, 'Great coffee maker!'),
    (1, 4, 'Good but expensive'),
    (2, 3, 'Average toaster'),
    (3, 5, 'Best blender ever');
''')
connection.commit()
print("Tables 'products' and 'reviews' created and populated.")

Tables 'products' and 'reviews' created and populated.

Try it yourself! 🧠

Now try to merge the products and reviews tables using INNER JOIN and LEFT JOIN
Explain the differences between the two results based on which products appear
Appendix 01

Special joins 🌟

Cross join

CROSS JOIN is available in SQLite
A cross join does not use any comparisons (like ON) to match rows
Instead, the result is constructed by pairing every row from the first table with every row from the second table (Cartesian product)
Useful for generating all possible combinations (e.g., pairing all clothing sizes with all colours)

# Displaying cross join between players and teams
pd.read_sql('''
SELECT players.player_name, teams.team_name
FROM players
CROSS JOIN teams
ORDER BY players.player_id, teams.team_id;
''', connection)

	player_name	team_name
0	Messi	Inter Miami
1	Messi	Real Madrid
2	Messi	Santos
3	Messi	Real Madrid
4	Messi	Barcelona
5	Vini Jr	Inter Miami
6	Vini Jr	Real Madrid
7	Vini Jr	Santos
8	Vini Jr	Real Madrid
9	Vini Jr	Barcelona
10	Neymar	Inter Miami
11	Neymar	Real Madrid
12	Neymar	Santos
13	Neymar	Real Madrid
14	Neymar	Barcelona
15	Mbappé	Inter Miami
16	Mbappé	Real Madrid
17	Mbappé	Santos
18	Mbappé	Real Madrid
19	Mbappé	Barcelona
20	Lewandowski	Inter Miami
21	Lewandowski	Real Madrid
22	Lewandowski	Santos
23	Lewandowski	Real Madrid
24	Lewandowski	Barcelona
25	Haaland	Inter Miami
26	Haaland	Real Madrid
27	Haaland	Santos
28	Haaland	Real Madrid
29	Haaland	Barcelona

Cross join

Another example of CROSS JOIN
SQLite uses || for string concatenation

# Drop and recreate tables
cursor.execute('DROP TABLE IF EXISTS colours;')
cursor.execute('DROP TABLE IF EXISTS sizes;')
cursor.execute('CREATE TABLE colours (colour_name TEXT);')
cursor.execute('CREATE TABLE sizes (size_code TEXT);')
cursor.execute("INSERT INTO colours VALUES ('Black'), ('Red');")
cursor.execute("INSERT INTO sizes VALUES ('S'), ('M');")
connection.commit()

# Perform cross join and concatenate strings using ||
pd.read_sql('''
SELECT colours.colour_name, sizes.size_code, colours.colour_name || ' - ' || sizes.size_code as t_shirt
FROM colours
CROSS JOIN sizes
ORDER BY colours.colour_name, sizes.size_code DESC;
''', connection)

	colour_name	size_code	t_shirt
0	Black	S	Black - S
1	Black	M	Black - M
2	Red	S	Red - S
3	Red	M	Red - M

Self join

A self join is when you join a table with itself to find relationships between rows in the same table
But why would you do that? 🤔
- When data contains hierarchical relationships (manager → employee, mother → child, etc.)
You need two different “views” of the same table, so you use aliases to tell them apart
Imagine an employees table, and you want to show each employee with their manager’s name:

employee_id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Charlie	1

The SQL query would look like this:

SELECT e.name AS employee, m.name AS manager
FROM employees AS e          -- First "copy" for employees
LEFT JOIN employees AS m     -- Second "copy" for managers
ON e.manager_id = m.employee_id  -- Join condition

Self join

Let’s see an example with a family table where mother_id refers back to person_id.

cursor.execute('DROP TABLE IF EXISTS family;')
cursor.execute('''
CREATE TABLE family (
    person_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    mother_id INT 
);
''')

cursor.execute('''
INSERT INTO family (name, mother_id) VALUES
    ('Emma', NULL), -- grandmother (id 1)
    ('Sarah', 1),   -- Emma's daughter (id 2)
    ('Lisa', 1),    -- Emma's daughter (id 3)
    ('Tom', 2),     -- Sarah's son (id 4)
    ('Alice', 2);   -- Sarah's daughter (id 5)
''')
connection.commit()

pd.read_sql('SELECT * FROM family;', connection)

	person_id	name	mother_id
0	1	Emma	NaN
1	2	Sarah	1.0
2	3	Lisa	1.0
3	4	Tom	2.0
4	5	Alice	2.0

Now we want to find each child with their mother’s name using a self join

# Self join to find child-mother pairs
pd.read_sql('''
SELECT children.name as child, mothers.name as mother
FROM family AS children
JOIN family AS mothers ON children.mother_id = mothers.person_id
ORDER BY mothers.name;
''', connection)

	child	mother
0	Sarah	Emma
1	Lisa	Emma
2	Tom	Sarah
3	Alice	Sarah

It matches each child to their mother by connecting the child’s mother_id to the mother’s person_id
Sarah (children) and Lisa (children): mother_id = 1 → Emma (mothers): person_id = 1
Tom (children) and Alice (children): mother_id = 2 → Sarah (mothers): person_id = 2
Emma (children): mother_id = NULL → No match in mothers table

Self join

Let’s see another example using the players table
Here we want to compare the goals of every player against every other player
We use p1.player_id < p2.player_id to avoid duplicate pairs (e.g., Messi vs Vini Jr. and Vini Jr. vs Messi) and comparing a player to themselves

pd.read_sql('''
SELECT
    p1.player_name,
    p1.goals,
    p2.player_name as compared_to,
    p2.goals as their_goals,
    p1.goals - p2.goals as difference
FROM players AS p1
JOIN players AS p2
ON p1.player_id < p2.player_id -- Avoid duplicates and self-comparison
ORDER BY difference DESC;
''', connection)

	player_name	goals	compared_to	their_goals	difference
0	Messi	10	Lewandowski	4	6
1	Messi	10	Mbappé	5	5
2	Messi	10	Haaland	5	5
3	Messi	10	Neymar	6	4
4	Vini Jr	8	Lewandowski	4	4
5	Vini Jr	8	Mbappé	5	3
6	Vini Jr	8	Haaland	5	3
7	Messi	10	Vini Jr	8	2
8	Vini Jr	8	Neymar	6	2
9	Neymar	6	Lewandowski	4	2
10	Neymar	6	Mbappé	5	1
11	Neymar	6	Haaland	5	1
12	Mbappé	5	Lewandowski	4	1
13	Mbappé	5	Haaland	5	0
14	Lewandowski	4	Haaland	5	-1

Merge tables by row 🧩

Union

The UNION operator combines the result sets of two or more SELECT statements vertically (stacking rows)
It automatically removes duplicate rows from the combined results
- If you want to keep duplicates, use UNION ALL
UNION is useful when:
- Combining different tables with different structures
- Each query needs different logic or different columns
- You want to add categories or labels per query
Here is an example where we categorise players based on different criteria using UNION (note the added category column):

# UNION example: Different categories for different criteria
pd.read_sql('''
SELECT player_name, goals, victories, 'Elite Scorer' as category
FROM players
WHERE goals > 9

UNION

SELECT player_name, goals, victories, 'Team Leader' as category
FROM players
WHERE victories > 2 AND goals < 10

ORDER BY category, player_name;
''', connection)

	player_name	goals	victories	category
0	Messi	10	5	Elite Scorer
1	Haaland	5	3	Team Leader
2	Neymar	6	3	Team Leader
3	Vini Jr	8	4	Team Leader

Union all and intersect

Similar to UNION, UNION ALL also merges tables by rows (stacks results vertically)
Unlike UNION, UNION ALL retains all duplicate rows
It simply appends the results. It’s generally faster than UNION as it doesn’t need to check for duplicates
Let’s see an example combining players who are high scorers (more than 7 goals) and players with many victories (more than 3), keeping duplicates if a player meets both criteria:

# UNION ALL keeps all rows, including duplicates if a player meets both criteria.
print(pd.read_sql('''
SELECT player_name, goals, victories, 'High Scorer (>7)' AS category
FROM players
WHERE goals > 7

UNION ALL

SELECT player_name, goals, victories, 'Many Victories (>3)' AS category
FROM players
WHERE victories > 3

ORDER BY player_name, category;
''', connection))

  player_name  goals  victories             category
0       Messi     10          5     High Scorer (>7)
1       Messi     10          5  Many Victories (>3)
2     Vini Jr      8          4     High Scorer (>7)
3     Vini Jr      8          4  Many Victories (>3)

Intersect

The INTERSECT operator returns only the rows that are common to the result sets of two (or more) SELECT statements
It’s useful when you need to find the overlap between two distinct lists that are difficult to combine with a simple AND
For instance, we want to find “Platinum” customers for a loyalty programme based on two separate tables
- Table 1: TopSpenders (customers who spent over $500)
- Table 2: FrequentShoppers (customers who made 10+ purchases)
- We want to find customers who are on both tables

Table: TopSpenders

customer_name
Alice
Bob
David

Table: FrequentShoppers

customer_name
Bob
Charlie
David

# Create the two separate tables for our example
cursor.execute('DROP TABLE IF EXISTS TopSpenders;')
cursor.execute('CREATE TABLE TopSpenders (customer_name TEXT);')
cursor.execute("INSERT INTO TopSpenders VALUES ('Alice'), ('Bob'), ('David');")

cursor.execute('DROP TABLE IF EXISTS FrequentShoppers;')
cursor.execute('CREATE TABLE FrequentShoppers (customer_name TEXT);')
cursor.execute("INSERT INTO FrequentShoppers VALUES ('Bob'), ('Charlie'), ('David');")
connection.commit()

# Now, find the customers who are in BOTH lists
print(pd.read_sql('''
    SELECT customer_name FROM TopSpenders
    
    INTERSECT
    
    SELECT customer_name FROM FrequentShoppers;
''', connection))

  customer_name
0           Bob
1         David

Except

The EXCEPT operator returns all rows from the first SELECT statement that are not present in the second SELECT statement’s results
It finds the difference between two tables (e.g., “Give me everything in A, except for the things that are also in B”)
For example, now we want to run a “re-engagement” campaign using the same tables as before
- Table 1: TopSpenders (customers who spent over $500)
- Table 2: FrequentShoppers (customers who made 10+ purchases)
- Find customers who are TopSpenders but are not FrequentShoppers (they spent a lot once, but haven’t been back)

Table: TopSpenders

customer_name
Alice
Bob
David

Table: FrequentShoppers

customer_name
Bob
Charlie
David

print(pd.read_sql('''
    SELECT customer_name FROM TopSpenders
    
    EXCEPT
    
    SELECT customer_name FROM FrequentShoppers;
''', connection))

  customer_name
0         Alice

Upsert (`INSERT ... ON CONFLICT`)

SQLite provides UPSERT (Update or Insert) operations using the INSERT ... ON CONFLICT clause
This allows you to attempt an INSERT, and if it violates a constraint (like UNIQUE or PRIMARY KEY), you should perform a DO UPDATE SET or a DO NOTHING instead
For instance, if we try to insert a player with an existing player_name, we update their goals instead
You will notice that excluded is a special table that refers to the row that would have been inserted if there was no conflict. SQLite always uses this name

# Let's see Messi's current stats before the UPSERT
print("Before UPSERT:")
print(pd.read_sql("SELECT * FROM players WHERE player_name = 'Messi'", connection))

# We will insert him again with 2 more goals and 1 more victory
# Since 'Messi' already exists, the ON CONFLICT clause updates his stats instead
cursor.execute("""
INSERT INTO players (player_name, goals, victories)
VALUES ('Messi', 2, 1)
ON CONFLICT(player_name) DO UPDATE SET
    goals = goals + excluded.goals,
    victories = victories + excluded.victories;
""")
connection.commit()

print("\nAfter UPSERT:")
print(pd.read_sql('SELECT * FROM players', connection))

Before UPSERT:
   player_id player_name  goals  victories
0          1       Messi     10          5

After UPSERT:
   player_id  player_name  goals  victories
0          1        Messi     12          6
1          2      Vini Jr      8          4
2          3       Neymar      6          3
3          4       Mbappé      5          2
4          5  Lewandowski      4          1
5          6      Haaland      5          3

Upsert with `DO NOTHING`

In the previous example, we used DO UPDATE SET to change the existing row when a conflict occurs
Sometimes you just want to skip the insert silently if the row already exists, without updating anything
For that, you can use DO NOTHING instead

# Try to insert 'Neymar' again with different stats
# Since 'Neymar' already exists, nothing happens
cursor.execute("""
INSERT INTO players (player_name, goals, victories)
VALUES ('Neymar', 100, 100)
ON CONFLICT(player_name) DO NOTHING;
""")
connection.commit()

# Neymar's stats are unchanged!
print(pd.read_sql("SELECT * FROM players WHERE player_name = 'Neymar'", connection))

   player_id player_name  goals  victories
0          3      Neymar      6          3

This is useful when you are loading data from multiple sources and want to keep the first version you inserted without raising an error
Think of DO NOTHING as “insert if new, ignore if duplicate”

Try it yourself! 🧠

Using the products table from earlier (Coffee Maker, Toaster, Blender, Microwave, Air Fryer), write an UPSERT statement that:
1. Tries to insert a new product called 'Toaster' with a price of 39.99
2. If 'Toaster' already exists, update its price to the new value (39.99)
Hint: remember the excluded keyword and the ON CONFLICT syntax from the examples above!
Print the products table before and after to check your result
Appendix 03

Views 🔎

Views

A VIEW is a virtual table based on the result of a SELECT query.
It does not store data itself, but it is useful for several reasons: saving you from rewriting complex queries, restricting which columns or rows users can see (security), and providing a stable interface even if the underlying table structure changes
You can create a view using the CREATE VIEW statement, followed by the view name and the SELECT query
In SQLite, you can also drop a view using DROP VIEW

# Drop the view if it exists
cursor.execute('DROP VIEW IF EXISTS player_stats;')

# Create the view
cursor.execute('''
CREATE VIEW player_stats AS
SELECT player_name, SUM(goals) AS total_goals, SUM(victories) AS total_victories
FROM players
GROUP BY player_name;
''')
connection.commit()

pd.read_sql('SELECT * FROM player_stats LIMIT 4', connection)

	player_name	total_goals	total_victories
0	Haaland	5	3
1	Lewandowski	4	1
2	Mbappé	5	2
3	Messi	12	6

Views

Views really shine when you have a complex query that you need to run often
For example, imagine you regularly need a summary of each product with its average rating and number of reviews
Without a view, you would have to write the full LEFT JOIN + GROUP BY query every time
With a view, you write it once and then query it like a regular table!

cursor.execute('DROP VIEW IF EXISTS product_summary;')

# Create a view that joins products with reviews and calculates stats
cursor.execute('''
CREATE VIEW product_summary AS
SELECT
    p.product_name,
    COUNT(r.review_id) AS num_reviews,
    ROUND(AVG(r.rating), 1) AS avg_rating
FROM products p
LEFT JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.product_id;
''')
connection.commit()

# Now we can query the view with a simple SELECT
pd.read_sql('SELECT * FROM product_summary;', connection)

	product_name	num_reviews	avg_rating
0	Coffee Maker	2	4.5
1	Toaster	1	3.0
2	Blender	1	5.0
3	Microwave	0	NaN
4	Air Fryer	0	NaN

Notice how products with no reviews (Microwave, Air Fryer) still appear thanks to the LEFT JOIN, with 0 reviews and no rating
Anyone can now run SELECT * FROM product_summary without knowing how the join works!

Conclusion 📖

Conclusion

Today we learned about different types of joins in SQL (INNER, LEFT, RIGHT, FULL OUTER), noting that SQLite does not support RIGHT and FULL OUTER joins natively
We also learned about special joins: CROSS JOIN for combinations and SELF JOIN for relating rows within the same table
We saw how to merge tables vertically (by row) with UNION, UNION ALL, INTERSECT, and EXCEPT
We also saw how SQLite implements UPSERT functionality using INSERT ... ON CONFLICT and how to update existing records (DO UPDATE SET) or DO NOTHING on conflicts
We saw how to create and use VIEWS in SQLite, which are virtual tables based on SELECT queries
Check the appendices for exercise solutions! 📝

And that’s all for today! 🎉

Appendix 01

Here is the solution to the exercise comparing INNER JOIN and LEFT JOIN for products and reviews.

print("INNER JOIN Results (Only products with reviews):")
# INNER JOIN only includes products that have at least one review.
# Products like 'Microwave' and 'Air Fryer' are excluded.
print(pd.read_sql('''
    SELECT p.product_name, r.rating, r.comment
    FROM products p
    INNER JOIN reviews r ON p.product_id = r.product_id
    ORDER BY p.product_id, r.review_id; -- Added review_id for consistent ordering
''', connection))

INNER JOIN Results (Only products with reviews):
   product_name  rating              comment
0  Coffee Maker       5  Great coffee maker!
1  Coffee Maker       4   Good but expensive
2       Toaster       3      Average toaster
3       Blender       5    Best blender ever

print("\nLEFT JOIN Results (All products, reviews where available):")
print(pd.read_sql('''
    SELECT p.product_name, r.rating, r.comment
    FROM products p
    LEFT JOIN reviews r ON p.product_id = r.product_id
    ORDER BY p.product_id, r.review_id;
''', connection))


LEFT JOIN Results (All products, reviews where available):
   product_name  rating              comment
0  Coffee Maker     5.0  Great coffee maker!
1  Coffee Maker     4.0   Good but expensive
2       Toaster     3.0      Average toaster
3       Blender     5.0    Best blender ever
4     Microwave     NaN                 None
5     Air Fryer     NaN                 None

Back to exercise

Appendix 03

Here is the solution to the UPSERT exercise updating the Toaster price.

print("Before UPSERT:")
print(pd.read_sql("SELECT * FROM products WHERE product_name = 'Toaster'", connection))

# UPSERT: insert 'Toaster' at 39.99, or update price if it already exists
cursor.execute("""
INSERT INTO products (product_name, price)
VALUES ('Toaster', 39.99)
ON CONFLICT(product_name) DO UPDATE SET
    price = excluded.price;
""")
connection.commit()

print("\nAfter UPSERT:")
print(pd.read_sql("SELECT * FROM products WHERE product_name = 'Toaster'", connection))

Before UPSERT:
   product_id product_name  price
0           2      Toaster  29.99

After UPSERT:
   product_id product_name  price
0           2      Toaster  39.99

Back to exercise

Appendix 04

Cleaning the SQLite database

Unlike server-based databases like PostgreSQL, SQLite databases are typically single files.
“Cleaning” the database often means simply deleting the database file and starting fresh.
If you encounter issues, the easiest way to reset is to close any active connections to the database file and then remove the file using your operating system or Python’s os module.
Here’s how you can close the connection and delete the database file (lecture19.db) we used in this lecture using Python:

# Ensure the connection is closed first
try:
    connection.close()
    print("SQLite connection closed.")
except Exception as e:
    print(f"Error closing connection (might be already closed): {e}")

SQLite connection closed.

Back to the lecture

DATASCI 350 - Data Science Computing

Hello again! 😊

Brief recap 📚

Recap of last class and today’s plan

Last time we learned how to:

Today we will learn how to:

Basic joins 📊

Primary and foreign keys

Load the libraries and connect to the database

Create the tables

Visualise the tables

Types of joins

Inner join

Left join

Right join (emulated in SQLite)

Full outer join (emulated in SQLite)

Try it yourself! 🧠

Try it yourself! 🧠

Special joins 🌟

Cross join

Cross join

Self join

Self join

Self join

Merge tables by row 🧩

Union

Union all and intersect

Intersect

Except

Upsert (INSERT ... ON CONFLICT)

Upsert with DO NOTHING

Try it yourself! 🧠

Views 🔎

Views

Views

Conclusion 📖

Conclusion

And that’s all for today! 🎉

Appendix 01

Appendix 03

Appendix 04

Cleaning the SQLite database

Upsert (`INSERT ... ON CONFLICT`)

Upsert with `DO NOTHING`