Lecture 17 - Introduction to SQL
30 October, 2024
postgres
. Please remember it, as you will need it to connect to the database5432
and the host is localhost
(no need to change them)postgres
and the default database is postgres
postgres
with the right mouse button, then click on Connect Server
postgres
databaseDatabases
, then postgres
below it. After this, click with the right mouse button and go to Query Tool
TABLE
called drivers
,
) that contain the column_name
and column_type
, respectivelydriver_id
is an int
, the driver_name
is a varchar
with a maximum of 30 characters, the nationality
is a varchar
with a maximum of 15 characters, and the victories
is an int
driver_id
is the primary key, which means it is unique for each driverExecute
(▶️) button to run the codeINSERT INTO
commandVALUES
to the bottom of the table;
)INSERT INTO drivers VALUES (1, 'Lewis Hamilton','British', 103);
INSERT INTO drivers VALUES (4, 'Fernando Alonso', 'Spanish', 32);
INSERT INTO drivers VALUES (3, 'Sebastian Vettel', 'German', 91);
INSERT INTO drivers VALUES (2, 'Michael Schumacher', 'German', 53);
Execute
(▶️) button to run the codeData Output
tab (bottom left) to see the results… but there’s nothing there! 😱:::
Data Output
tab is empty because we need to run a SELECT
command to see the data 😉SELECT
command soon, but for now let’s just run the following command:*
) from the drivers
tableExecute
(▶️) button to run the code (or press F5
)drivers
table on your owndriver_id
for each new driver;
)SQL Notebook
icon on the left side of the screenNew SQL Connection
localhost
(or any name you want)postgres
localhost
5432
postgres
postgres
Create
.sql
with SQL Notebook17-introduction-sql.sql
file in the repository for this lectureOpen With
menu option (right button). Then, select the SQL Notebook formatSUM()
or AVG()
AS
is an alias which assigns a name to the aggregate statisticnationality
and driver_id
for drivers with more than 60 victoriesGROUP BY
to group data by a columnROUND()
to round the valuesHAVING
to filter variables after aggregatingWHERE
, which is used to filter variables before aggregatingHAVING
SQL Command | Pandas Equivalent |
---|---|
SELECT |
df[['column1', 'column2']] |
INSERT INTO |
df.loc[new_index] = new_row or df = df.append(new_row, ignore_index=True) |
WHERE |
df[df['column'] == value] |
GROUP BY |
df.groupby('column') |
HAVING |
df.groupby('column').filter(lambda x: condition) |
ROUND() |
df['column'].round(decimals) |
Aggregate Functions | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean', 'col3': 'min', 'col4': 'max'}) |