Lecture 19 - SQL and Python
06 October, 2024
SELECT
FROM
WHERE
ORDER BY
GROUP BY
INSERT INTO
CREATE TABLE
DROP TABLE
psycopg2
is a PostgreSQL adapter for Python. Info here
sqlalchemy
is a SQL toolkit. More info here
pgAdmin
or SQL Notebook
anymore, as you can run SQL queries directly in Pythonpsycopg2
and sqlalchemy
🐘psycopg2
and sqlalchemy
using condas
or pip
create_engine()
creates a connection to the databasepostgresql+psycopg2
is the database driverpostgres:postgres
is the username and passwordlocalhost:5432
is the host and portpostgres
is the database nameto_sql()
method from pandas
to import the data into PostgreSQLresults.to_sql('results', connection, if_exists='replace', index=False)
races.to_sql('results', connection, if_exists='replace', index=False)
102
if_exists='replace'
parameter will replace the table if it already exists
DROP TABLE
and CREATE TABLE
in SQLindex=False
parameter will not import the index columncircuits
table into PostgreSQL\
) to define strins over multiple lines\
, otherwise it won’t workThis is a string defined over multiple lines
This will "supposedly" put double quotes inside a string
SELECT "driverId" FROM results;
using backslashpd.read_sql_query()
method to run SQL queriestext()
method from sqlalchemy
to format the querytext()
raceId | year | round | circuitId | name | date | time | url | fp1_date | fp1_time | fp2_date | fp2_time | fp3_date | fp3_time | quali_date | quali_time | sprint_date | sprint_time | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2009 | 1 | 1 | Australian Grand Prix | 2009-03-29 | 06:00:00 | http://en.wikipedia.org/wiki/2009_Australian_G... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
1 | 2 | 2009 | 2 | 2 | Malaysian Grand Prix | 2009-04-05 | 09:00:00 | http://en.wikipedia.org/wiki/2009_Malaysian_Gr... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
2 | 3 | 2009 | 3 | 17 | Chinese Grand Prix | 2009-04-19 | 07:00:00 | http://en.wikipedia.org/wiki/2009_Chinese_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
# Extract all data from a column
example1 = pd.read_sql(text("SELECT * FROM results;"), connection)
# Extract a subset of columns
example2 = pd.read_sql(text("SELECT points \
FROM results;"), connection)
# Subset based on a string condition
example3 = pd.read_sql(text("SELECT * \
FROM races \
WHERE name = 'Abu Dhabi Grand Prix';"), connection)
raceId
and constructorId
from resultsresults.csv
file to a table in PostgreSQL again, replacing the table if it already existscircuits
table into PostgreSQLcircuitId | circuitRef | name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.84970 | 144.9680 | 10 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.7380 | 18 | http://en.wikipedia.org/wiki/Sepang_Internatio... |
2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.03250 | 50.5106 | 7 | http://en.wikipedia.org/wiki/Bahrain_Internati... |
SELECT "driverId" FROM results;
using backslashSELECT "driverId" FROM results;
pd.read_sql_query()
method to run the query