Lecture 17 - Introduction to SQL
sqlite3
.quit
to exit the SQLite prompt!sqlite3
DLL and the command-line shell (sqlite-dll...
and sqlite-tools...
)C:\>sqlite
and unzip the files in this folder, which will give you sqlite3.def
, sqlite3.dll
and sqlite3.exe
filessqlite3
file to open the software, and you will see the SQLite promptEdit the system environment variables
Environment Variables...
C:\sqlite
to Path
in both or either User variables and/or System variables, then you can use SQLite on terminal (command prompt)SQLite3 Editor
in the extensions tab in VSCodeInstall
to install the extensiongit diff
Download it here or search for it in the extensions tab in VSCode
.sqlite
or .db
and open it with the SQLite3 Editorexample.db
and open it with the extensionOpen to the Side
– same thing)driver_id
, driver_name
, nationality
, and victories
:
driver_id
is an integer
, not null
, and primary key
(select these options in the editor)driver_name
is a text
variablenationality
is a text
variable toovictories
is an integer
commit
to save the changesTEXT
: Stores string values of any length using UTF-8 encodingINTEGER
: Holds whole numbers (positive or negative)NUMERIC
: Stores decimal numbers as floating-point valuesBLOB
: Contains binary data preserved as inputNULL
: Represents missing or undefined dataREAL
, BOOLEAN
, DATE
, TIME
, and DATETIME
PRIMARY KEY
constraint is used to uniquely identify each row in a tableNULL
values and are usually integersAUTOINCREMENT
keyword
driver_id INTEGER PRIMARY KEY AUTOINCREMENT
FOREIGN KEY
constraint is used to link two tables together
PRIMARY KEY
in another tableROWID
to uniquely identify each row in a table, but it is not recommended
customers
, products
, and orders
customers
table will have a primary key customer_id
products
table will have a primary key product_id
orders
table will have a primary key order_id
orders
table will also have two foreign keys: customer_id
and product_id
customer_id (PK) | first_name | last_name | |
---|---|---|---|
1 | John | Doe | john.doe@email.com |
2 | Jane | Smith | jane.smith@email.com |
3 | Bob | Johnson | bob.j@email.com |
product_id (PK) | product_name | price |
---|---|---|
101 | Laptop | 999.99 |
102 | Smartphone | 599.99 |
103 | Wireless Mouse | 29.99 |
order_id (PK) | customer_id (FK) | product_id (FK) | quantity | total_price |
---|---|---|---|---|
1 | 1 | 101 | 1 | 999.99 |
2 | 2 | 102 | 2 | 1199.98 |
3 | 3 | 103 | 3 | 89.97 |
INSERT INTO
commandVALUES
to the bottom of the table;
)Query Editor
and paste the following code:INSERT INTO drivers VALUES (2, 'Michael Schumacher', 'German', 53);
INSERT INTO drivers VALUES (3, 'Sebastian Vettel', 'German', 91);
INSERT INTO drivers VALUES (4, 'Fernando Alonso', 'Spanish', 32);
Or more efficiently:
INSERT INTO drivers (driver_id, driver_name, nationality, victories) VALUES
(2, 'Michael Schumacher', 'German', 53),
(3, 'Sebastian Vettel', 'German', 91),
(4, 'Fernando Alonso', 'Spanish', 32);
▶️️ Execute
️to run the codeSELECT
command*
) from the drivers
tableSELECT
is probably the most used command in SQL and is used to extract data from a databaseSELECT
to see the resultsClick on Query Editor
and create a table called students
with the following columns:
student_id
(integer not null primary key)student_name
(text)major
(text)gpa
(numeric)Add the following rows to the table:
student_id
: 1, student_name
: ‘John Doe’, major
: ‘Computer Science’, gpa
: 3.5
student_id
: 2, student_name
: ‘Jane Doe’, major
: ‘Mathematics’, gpa
: 3.8
Query the table to see the results
If the table already exists on your database, you can drop it with the following command:
PRAGMA table_info()
commandPRAGMA table_info()
command returns one row for each column in the tablecid
: Column IDname
: Column nametype
: Data typenotnull
: 1 if the column cannot contain NULL
values, 0 otherwisedflt_value
: Default value for the columnpk
: 1 if the column is part of the primary key, 0 otherwiseCREATE TABLE
command, as we saw beforeIF NOT EXISTS
clause to avoid errors if the table already exists
CREATE TABLE IF NOT EXISTS table_name (column1_name data_type, ...);
ALTER TABLE table_name ADD column_name data_type;
ALTER TABLE table_name DROP COLUMN column_name;
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;
DROP TABLE
command
DROP TABLE table_name;
DELETE FROM
command
DELETE FROM table_name WHERE condition;
students
tableALTER TABLE students ADD COLUMN age INTEGER;
ALTER TABLE students DROP COLUMN major;
ALTER TABLE students RENAME COLUMN gpa TO grade;
student_id = 2
: DELETE FROM students WHERE student_id = 2;
SUM()
or AVG()
AS
is an alias which assigns a name to the aggregate statisticSELECT SUM(victories) AS sum_victories,
COUNT(*) AS num_rows,
AVG(victories) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers;
SELECT
command to get the average victories (a subquery)WHERE
clause. See the reason why here. More about subqueries here.HAVING
)GROUP BY
to group data by a columnSELECT nationality,
SUM(victories) AS sum_victories,
AVG(victories) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers
GROUP BY nationality;
ROUND()
to round the valuesSELECT nationality,
SUM(victories) AS sum_victories,
ROUND(AVG(victories), 1) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers
GROUP BY nationality;
HAVING
to filter variables after aggregatingWHERE
, which is used to filter variables before aggregatingSELECT nationality,
SUM(victories) AS sum_victories,
ROUND(AVG(victories), 1) AS mean_victories,
MIN(victories) AS min_victories,
MAX(victories) AS max_victories
FROM drivers
GROUP BY nationality
HAVING SUM(victories) > 50;
HAVING
must be used after GROUP BY
and before ORDER BY
HAVING
is used to filter the results of the GROUP BY
clauseWHERE
is used before aggregating and HAVING
is used after aggregating!SQL Command | Pandas Equivalent |
---|---|
SELECT |
df[['column1', 'column2']] |
INSERT INTO |
df.loc[new_index] = new_row or df = df.append(new_row, ignore_index=True) |
WHERE |
df[df['column'] == value] |
GROUP BY |
df.groupby('column') |
HAVING |
df.groupby('column').filter(lambda x: condition) |
ROUND() |
df['column'].round(decimals) |
Aggregate Functions | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean', 'col3': 'min', 'col4': 'max'}) |
Other Tools
and then Export / Import
sqlite3 example.db .mode csv .output example.csv SELECT * FROM drivers;
sqlite3 example.db .mode csv .import example.csv drivers;
students
table and add the rowsSELECT driver_name, nationality, victories
FROM drivers
WHERE (nationality = 'German' OR nationality = 'British') AND victories > 50;
SELECT driver_name, victories
FROM drivers
WHERE victories > (SELECT AVG(victories) FROM drivers);