Lecture 16 - Merging Data
28 October, 2024
.agg()
method.agg()
with .groupby()
.query()
to filter and summarise data
Type | Description | Example |
---|---|---|
One-to-One | One entity relates to exactly one other entity | One student, one mailing address |
One-to-Many | One entity relates to multiple instances of another entity | One store, many departments |
Many-to-Many | Multiple instances of both entities relate to each other | Students and courses |
resultId | raceId | driverId | constructorId | number | grid | position | positionText | positionOrder | points | laps | time | milliseconds | fastestLap | rank | fastestLapTime | fastestLapSpeed | statusId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 18 | 1 | 1 | 22 | 1 | 1 | 1 | 1 | 10.0 | 58 | 1:34:50.616 | 5690616 | 39 | 2 | 1:27.452 | 218.300 | 1 |
1 | 2 | 18 | 2 | 2 | 3 | 5 | 2 | 2 | 2 | 8.0 | 58 | +5.478 | 5696094 | 41 | 3 | 1:27.739 | 217.586 | 1 |
2 | 3 | 18 | 3 | 3 | 7 | 7 | 3 | 3 | 3 | 6.0 | 58 | +8.163 | 5698779 | 41 | 5 | 1:28.090 | 216.719 | 1 |
3 | 4 | 18 | 4 | 4 | 5 | 11 | 4 | 4 | 4 | 5.0 | 58 | +17.181 | 5707797 | 58 | 7 | 1:28.603 | 215.464 | 1 |
4 | 5 | 18 | 5 | 1 | 23 | 3 | 5 | 5 | 5 | 4.0 | 58 | +18.014 | 5708630 | 43 | 1 | 1:27.418 | 218.385 | 1 |
5 | 6 | 18 | 6 | 3 | 8 | 13 | 6 | 6 | 6 | 3.0 | 57 | \N | \N | 50 | 14 | 1:29.639 | 212.974 | 11 |
6 | 7 | 18 | 7 | 5 | 14 | 17 | 7 | 7 | 7 | 2.0 | 55 | \N | \N | 22 | 12 | 1:29.534 | 213.224 | 5 |
7 | 8 | 18 | 8 | 6 | 1 | 15 | 8 | 8 | 8 | 1.0 | 53 | \N | \N | 20 | 4 | 1:27.903 | 217.180 | 5 |
8 | 9 | 18 | 9 | 2 | 4 | 2 | \N | R | 9 | 0.0 | 47 | \N | \N | 15 | 9 | 1:28.753 | 215.100 | 4 |
9 | 10 | 18 | 10 | 7 | 12 | 18 | \N | R | 10 | 0.0 | 43 | \N | \N | 23 | 13 | 1:29.558 | 213.166 | 3 |
raceId | year | round | circuitId | name | date | time | url | fp1_date | fp1_time | fp2_date | fp2_time | fp3_date | fp3_time | quali_date | quali_time | sprint_date | sprint_time | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2009 | 1 | 1 | Australian Grand Prix | 2009-03-29 | 06:00:00 | http://en.wikipedia.org/wiki/2009_Australian_G... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
1 | 2 | 2009 | 2 | 2 | Malaysian Grand Prix | 2009-04-05 | 09:00:00 | http://en.wikipedia.org/wiki/2009_Malaysian_Gr... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
2 | 3 | 2009 | 3 | 17 | Chinese Grand Prix | 2009-04-19 | 07:00:00 | http://en.wikipedia.org/wiki/2009_Chinese_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
3 | 4 | 2009 | 4 | 3 | Bahrain Grand Prix | 2009-04-26 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Bahrain_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
4 | 5 | 2009 | 5 | 4 | Spanish Grand Prix | 2009-05-10 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Spanish_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
5 | 6 | 2009 | 6 | 6 | Monaco Grand Prix | 2009-05-24 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Monaco_Grand... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
6 | 7 | 2009 | 7 | 5 | Turkish Grand Prix | 2009-06-07 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Turkish_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
7 | 8 | 2009 | 8 | 9 | British Grand Prix | 2009-06-21 | 12:00:00 | http://en.wikipedia.org/wiki/2009_British_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
8 | 9 | 2009 | 9 | 20 | German Grand Prix | 2009-07-12 | 12:00:00 | http://en.wikipedia.org/wiki/2009_German_Grand... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
9 | 10 | 2009 | 10 | 11 | Hungarian Grand Prix | 2009-07-26 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Hungarian_Gr... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
circuitId | circuitRef | name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.84970 | 144.96800 | 10 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.73800 | 18 | http://en.wikipedia.org/wiki/Sepang_Internatio... |
2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.03250 | 50.51060 | 7 | http://en.wikipedia.org/wiki/Bahrain_Internati... |
3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.57000 | 2.26111 | 109 | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
4 | 5 | istanbul | Istanbul Park | Istanbul | Turkey | 40.95170 | 29.40500 | 130 | http://en.wikipedia.org/wiki/Istanbul_Park |
5 | 6 | monaco | Circuit de Monaco | Monte-Carlo | Monaco | 43.73470 | 7.42056 | 7 | http://en.wikipedia.org/wiki/Circuit_de_Monaco |
6 | 7 | villeneuve | Circuit Gilles Villeneuve | Montreal | Canada | 45.50000 | -73.52280 | 13 | http://en.wikipedia.org/wiki/Circuit_Gilles_Vi... |
7 | 8 | magny_cours | Circuit de Nevers Magny-Cours | Magny Cours | France | 46.86420 | 3.16361 | 228 | http://en.wikipedia.org/wiki/Circuit_de_Nevers... |
8 | 9 | silverstone | Silverstone Circuit | Silverstone | UK | 52.07860 | -1.01694 | 153 | http://en.wikipedia.org/wiki/Silverstone_Circuit |
9 | 10 | hockenheimring | Hockenheimring | Hockenheim | Germany | 49.32780 | 8.56583 | 103 | http://en.wikipedia.org/wiki/Hockenheimring |
races_raw["name"]
and circuits_raw["name"]
sort_values()
to make it easier to compare the variables['70th Anniversary Grand Prix' 'Abu Dhabi Grand Prix'
'Argentine Grand Prix' 'Australian Grand Prix' 'Austrian Grand Prix'
'Azerbaijan Grand Prix' 'Bahrain Grand Prix' 'Belgian Grand Prix'
'Brazilian Grand Prix' 'British Grand Prix' 'Caesars Palace Grand Prix'
'Canadian Grand Prix' 'Chinese Grand Prix' 'Dallas Grand Prix'
'Detroit Grand Prix' 'Dutch Grand Prix' 'Eifel Grand Prix'
'Emilia Romagna Grand Prix' 'European Grand Prix' 'French Grand Prix'
'German Grand Prix' 'Hungarian Grand Prix' 'Indian Grand Prix'
'Indianapolis 500' 'Italian Grand Prix' 'Japanese Grand Prix'
'Korean Grand Prix' 'Las Vegas Grand Prix' 'Luxembourg Grand Prix'
'Malaysian Grand Prix' 'Mexican Grand Prix' 'Mexico City Grand Prix'
'Miami Grand Prix' 'Monaco Grand Prix' 'Moroccan Grand Prix'
'Pacific Grand Prix' 'Pescara Grand Prix' 'Portuguese Grand Prix'
'Qatar Grand Prix' 'Russian Grand Prix' 'Sakhir Grand Prix'
'San Marino Grand Prix' 'Saudi Arabian Grand Prix' 'Singapore Grand Prix'
'South African Grand Prix' 'Spanish Grand Prix' 'Styrian Grand Prix'
'Swedish Grand Prix' 'Swiss Grand Prix' 'São Paulo Grand Prix'
'Turkish Grand Prix' 'Tuscan Grand Prix' 'United States Grand Prix'
'United States Grand Prix West']
circuits_raw["name"]
variable['AVUS' 'Adelaide Street Circuit' 'Ain Diab' 'Aintree'
'Albert Park Grand Prix Circuit' 'Autodromo Enzo e Dino Ferrari'
'Autodromo Internazionale del Mugello' 'Autodromo Nazionale di Monza'
'Autódromo Hermanos Rodríguez' 'Autódromo Internacional Nelson Piquet'
'Autódromo Internacional do Algarve' 'Autódromo José Carlos Pace'
'Autódromo Juan y Oscar Gálvez' 'Autódromo do Estoril'
'Bahrain International Circuit' 'Baku City Circuit' 'Brands Hatch'
'Buddh International Circuit' 'Charade Circuit' 'Circuit Bremgarten'
'Circuit Gilles Villeneuve' 'Circuit Mont-Tremblant'
'Circuit Park Zandvoort' 'Circuit Paul Ricard'
'Circuit de Barcelona-Catalunya' 'Circuit de Monaco'
'Circuit de Nevers Magny-Cours' 'Circuit de Pedralbes'
'Circuit de Spa-Francorchamps' 'Circuit of the Americas'
'Circuito da Boavista' 'Circuito de Jerez' 'Detroit Street Circuit'
'Dijon-Prenois' 'Donington Park' 'Fair Park' 'Fuji Speedway'
'Hockenheimring' 'Hungaroring' 'Indianapolis Motor Speedway'
'Istanbul Park' 'Jarama' 'Jeddah Corniche Circuit'
'Korean International Circuit' 'Kyalami' 'Las Vegas Street Circuit'
'Las Vegas Strip Street Circuit' 'Le Mans' 'Long Beach'
'Losail International Circuit' 'Marina Bay Street Circuit'
'Miami International Autodrome' 'Monsanto Park Circuit' 'Montjuïc'
'Mosport International Raceway' 'Nivelles-Baulers' 'Nürburgring'
'Okayama International Circuit' 'Pescara Circuit'
'Phoenix street circuit' 'Prince George Circuit' 'Red Bull Ring'
'Reims-Gueux' 'Riverside International Raceway' 'Rouen-Les-Essarts'
'Scandinavian Raceway' 'Sebring International Raceway'
'Sepang International Circuit' 'Shanghai International Circuit'
'Silverstone Circuit' 'Sochi Autodrom' 'Suzuka Circuit'
'Valencia Street Circuit' 'Watkins Glen' 'Yas Marina Circuit' 'Zeltweg'
'Zolder']
races_raw
and circuits_raw
datasets are differentrename()
method to do this[]
){}
with key-value pairs separated by colons (:
),
)my_dict = {"name": "Danilo", "course": "QTM151", "city": "Atlanta"}
car_dictionary = {"car_model": ["Ferrari","Tesla","BMW"],
"year": ["2018","2023","2022"]}
car_dictionary
{'car_model': ['Ferrari', 'Tesla', 'BMW'], 'year': ['2018', '2023', '2022']}
{'A': array([[1, 2, 3],
[2, 4, 5]]),
'string': 'ABC'}
del
keywordpd.DataFrame()
function{old_name: new_name}
formatrename()
method using the columns
argumentcircuitId | circuitRef | circuit_name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.84970 | 144.96800 | 10 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.73800 | 18 | http://en.wikipedia.org/wiki/Sepang_Internatio... |
2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.03250 | 50.51060 | 7 | http://en.wikipedia.org/wiki/Bahrain_Internati... |
3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.57000 | 2.26111 | 109 | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
4 | 5 | istanbul | Istanbul Park | Istanbul | Turkey | 40.95170 | 29.40500 | 130 | http://en.wikipedia.org/wiki/Istanbul_Park |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
72 | 75 | portimao | Autódromo Internacional do Algarve | Portimão | Portugal | 37.22700 | -8.62670 | 108 | http://en.wikipedia.org/wiki/Algarve_Internati... |
73 | 76 | mugello | Autodromo Internazionale del Mugello | Mugello | Italy | 43.99750 | 11.37190 | 255 | http://en.wikipedia.org/wiki/Mugello_Circuit |
74 | 77 | jeddah | Jeddah Corniche Circuit | Jeddah | Saudi Arabia | 21.63190 | 39.10440 | 15 | http://en.wikipedia.org/wiki/Jeddah_Street_Cir... |
75 | 78 | losail | Losail International Circuit | Al Daayen | Qatar | 25.49000 | 51.45420 | \N | http://en.wikipedia.org/wiki/Losail_Internatio... |
76 | 79 | miami | Miami International Autodrome | Miami | USA | 25.95810 | -80.23890 | \N | http://en.wikipedia.org/wiki/Miami_Internation... |
77 rows × 9 columns
circuits
DataFramedict_rename_circuits = {"name": "circuit_name"}
circuits = circuits_raw.rename(columns = dict_rename_circuits)
circuits.head()
circuitId | circuitRef | circuit_name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.84970 | 144.96800 | 10 | http://en.wikipedia.org/wiki/Melbourne_Grand_P... |
1 | 2 | sepang | Sepang International Circuit | Kuala Lumpur | Malaysia | 2.76083 | 101.73800 | 18 | http://en.wikipedia.org/wiki/Sepang_Internatio... |
2 | 3 | bahrain | Bahrain International Circuit | Sakhir | Bahrain | 26.03250 | 50.51060 | 7 | http://en.wikipedia.org/wiki/Bahrain_Internati... |
3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.57000 | 2.26111 | 109 | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
4 | 5 | istanbul | Istanbul Park | Istanbul | Turkey | 40.95170 | 29.40500 | 130 | http://en.wikipedia.org/wiki/Istanbul_Park |
name
to race_name
races_raw
datasetraces
pd.merge()
to combine datapd.merge()
function to combine dataon
argument specifies the column to merge onhow
argument specifies the type of join to performcircuitId | circuit_name | |
---|---|---|
0 | 1 | Albert Park Grand Prix Circuit |
1 | 2 | Sepang International Circuit |
2 | 3 | Bahrain International Circuit |
3 | 4 | Circuit de Barcelona-Catalunya |
4 | 5 | Istanbul Park |
pd.merge()
to combine datapd.merge(data1, data2, on, how)
data2
# The "pd.merge()" command combines the information from both datasets
# The first argument is the "primary" datasets
# The second argument is the "secondary" dataset (must include the "on" column)
# The "on" is the common variable that is used for merging
# how = "left" tells Python that the left dataset is the primary one
races_merge = pd.merge(races_raw[['raceId', 'year', 'circuitId']],
circuits[["circuitId","circuit_name", "location"]],
on = "circuitId",
how = "left")
pd.merge()
to combine dataraceId | year | circuitId | circuit_name | location | |
---|---|---|---|---|---|
0 | 1 | 2009 | 1 | Albert Park Grand Prix Circuit | Melbourne |
1 | 2 | 2009 | 2 | Sepang International Circuit | Kuala Lumpur |
2 | 3 | 2009 | 17 | Shanghai International Circuit | Shanghai |
3 | 4 | 2009 | 3 | Bahrain International Circuit | Sakhir |
4 | 5 | 2009 | 4 | Circuit de Barcelona-Catalunya | Montmeló |
... | ... | ... | ... | ... | ... |
1097 | 1116 | 2023 | 69 | Circuit of the Americas | Austin |
1098 | 1117 | 2023 | 32 | Autódromo Hermanos Rodríguez | Mexico City |
1099 | 1118 | 2023 | 18 | Autódromo José Carlos Pace | São Paulo |
1100 | 1119 | 2023 | 80 | Las Vegas Strip Street Circuit | Las Vegas |
1101 | 1120 | 2023 | 24 | Yas Marina Circuit | Abu Dhabi |
1102 rows × 5 columns
pd.merge()
to combine dataraceId | circuitId | circuit_name | |
---|---|---|---|
760 | 761 | 61 | AVUS |
434 | 435 | 29 | Adelaide Street Circuit |
335 | 336 | 29 | Adelaide Street Circuit |
319 | 320 | 29 | Adelaide Street Circuit |
370 | 371 | 29 | Adelaide Street Circuit |
... | ... | ... | ... |
501 | 502 | 40 | Zolder |
486 | 487 | 40 | Zolder |
470 | 471 | 40 | Zolder |
516 | 517 | 40 | Zolder |
608 | 609 | 40 | Zolder |
1102 rows × 3 columns
results_merge = pd.merge(results_raw,
races_raw[["raceId","date"]],
on = "raceId",
how = "left")
results_merge.head()
resultId | raceId | driverId | constructorId | number | grid | position | positionText | positionOrder | points | laps | time | milliseconds | fastestLap | rank | fastestLapTime | fastestLapSpeed | statusId | date | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 18 | 1 | 1 | 22 | 1 | 1 | 1 | 1 | 10.0 | 58 | 1:34:50.616 | 5690616 | 39 | 2 | 1:27.452 | 218.300 | 1 | 2008-03-16 |
1 | 2 | 18 | 2 | 2 | 3 | 5 | 2 | 2 | 2 | 8.0 | 58 | +5.478 | 5696094 | 41 | 3 | 1:27.739 | 217.586 | 1 | 2008-03-16 |
2 | 3 | 18 | 3 | 3 | 7 | 7 | 3 | 3 | 3 | 6.0 | 58 | +8.163 | 5698779 | 41 | 5 | 1:28.090 | 216.719 | 1 | 2008-03-16 |
3 | 4 | 18 | 4 | 4 | 5 | 11 | 4 | 4 | 4 | 5.0 | 58 | +17.181 | 5707797 | 58 | 7 | 1:28.603 | 215.464 | 1 | 2008-03-16 |
4 | 5 | 18 | 5 | 1 | 23 | 3 | 5 | 5 | 5 | 4.0 | 58 | +18.014 | 5708630 | 43 | 1 | 1:27.418 | 218.385 | 1 | 2008-03-16 |
name
column in races_raw
and circuits_raw
['raceId' 'year' 'round' 'circuitId' 'name_x' 'date' 'time' 'url'
'fp1_date' 'fp1_time' 'fp2_date' 'fp2_time' 'fp3_date' 'fp3_time'
'quali_date' 'quali_time' 'sprint_date' 'sprint_time' 'name_y']
name
column has been duplicated_x
and _y
to the column names to differentiate themname_x
and name_y
in the dataset races_merge_pitfall
to race_name
and circuit_name
.rename()
alt
, lng
, and lat
into the races
data using pd.merge()
circuitId
as the keypd.concat()
We can use the pd.concat()
function to concatenate data
We can concatenate data along the rows or columns
The method works best if columns are identical
There are also other advanced options if they are not, please check the documentation for more information
Let’s use .query()
to split the data into different parts
array(['Australia', 'Malaysia', 'Bahrain', 'Spain', 'Turkey', 'Monaco',
'Canada', 'France', 'UK', 'Germany', 'Hungary', 'Belgium', 'Italy',
'Singapore', 'Japan', 'China', 'Brazil', 'USA', 'United States',
'UAE', 'Argentina', 'Portugal', 'South Africa', 'Mexico', 'Korea',
'Netherlands', 'Sweden', 'Austria', 'Morocco', 'Switzerland',
'India', 'Russia', 'Azerbaijan', 'Saudi Arabia', 'Qatar'],
dtype=object)
circuits_concat = pd.concat([circuits_spain, circuits_usa, circuits_malaysia])
circuits_concat.head()
circuitId | circuitRef | circuit_name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
3 | 4 | catalunya | Circuit de Barcelona-Catalunya | Montmeló | Spain | 41.5700 | 2.261110 | 109 | http://en.wikipedia.org/wiki/Circuit_de_Barcel... |
11 | 12 | valencia | Valencia Street Circuit | Valencia | Spain | 39.4589 | -0.331667 | 4 | http://en.wikipedia.org/wiki/Valencia_Street_C... |
25 | 26 | jerez | Circuito de Jerez | Jerez de la Frontera | Spain | 36.7083 | -6.034170 | 37 | http://en.wikipedia.org/wiki/Circuito_Permanen... |
44 | 45 | jarama | Jarama | Madrid | Spain | 40.6171 | -3.585580 | 609 | http://en.wikipedia.org/wiki/Circuito_Permanen... |
48 | 49 | montjuic | Montjuïc | Barcelona | Spain | 41.3664 | 2.151670 | 79 | http://en.wikipedia.org/wiki/Montju%C3%AFc_cir... |
NaN
values will be added)circuits_spain_drop = circuits_spain.drop(columns=['circuitRef', 'location'])
pd.concat([circuits_spain_drop, circuits_usa])
circuitId | circuit_name | country | lat | lng | alt | url | circuitRef | location | |
---|---|---|---|---|---|---|---|---|---|
3 | 4 | Circuit de Barcelona-Catalunya | Spain | 41.5700 | 2.261110 | 109 | http://en.wikipedia.org/wiki/Circuit_de_Barcel... | NaN | NaN |
11 | 12 | Valencia Street Circuit | Spain | 39.4589 | -0.331667 | 4 | http://en.wikipedia.org/wiki/Valencia_Street_C... | NaN | NaN |
25 | 26 | Circuito de Jerez | Spain | 36.7083 | -6.034170 | 37 | http://en.wikipedia.org/wiki/Circuito_Permanen... | NaN | NaN |
44 | 45 | Jarama | Spain | 40.6171 | -3.585580 | 609 | http://en.wikipedia.org/wiki/Circuito_Permanen... | NaN | NaN |
48 | 49 | Montjuïc | Spain | 41.3664 | 2.151670 | 79 | http://en.wikipedia.org/wiki/Montju%C3%AFc_cir... | NaN | NaN |
66 | 67 | Circuit de Pedralbes | Spain | 41.3903 | 2.116670 | 85 | http://en.wikipedia.org/wiki/Pedralbes_Circuit | NaN | NaN |
18 | 19 | Indianapolis Motor Speedway | USA | 39.7950 | -86.234700 | 223 | http://en.wikipedia.org/wiki/Indianapolis_Moto... | indianapolis | Indianapolis |
22 | 80 | Las Vegas Strip Street Circuit | United States | 36.1147 | -115.173000 | \N | https://en.wikipedia.org/wiki/Las_Vegas_Grand_... | vegas | Las Vegas |
32 | 33 | Phoenix street circuit | USA | 33.4479 | -112.075000 | 345 | http://en.wikipedia.org/wiki/Phoenix_street_ci... | phoenix | Phoenix |
36 | 37 | Detroit Street Circuit | USA | 42.3298 | -83.040100 | 177 | http://en.wikipedia.org/wiki/Detroit_street_ci... | detroit | Detroit |
41 | 42 | Fair Park | USA | 32.7774 | -96.758700 | 139 | http://en.wikipedia.org/wiki/Fair_Park | dallas | Dallas |
42 | 43 | Long Beach | USA | 33.7651 | -118.189000 | 12 | http://en.wikipedia.org/wiki/Long_Beach,_Calif... | long_beach | California |
43 | 44 | Las Vegas Street Circuit | USA | 36.1162 | -115.174000 | 639 | http://en.wikipedia.org/wiki/Las_Vegas_Street_... | las_vegas | Nevada |
45 | 46 | Watkins Glen | USA | 42.3369 | -76.927200 | 485 | http://en.wikipedia.org/wiki/Watkins_Glen_Inte... | watkins_glen | New York State |
59 | 60 | Riverside International Raceway | USA | 33.9370 | -117.273000 | 470 | http://en.wikipedia.org/wiki/Riverside_Interna... | riverside | California |
62 | 63 | Sebring International Raceway | USA | 27.4547 | -81.348300 | 18 | http://en.wikipedia.org/wiki/Sebring_Raceway | sebring | Florida |
68 | 69 | Circuit of the Americas | USA | 30.1328 | -97.641100 | 161 | http://en.wikipedia.org/wiki/Circuit_of_the_Am... | americas | Austin |
76 | 79 | Miami International Autodrome | USA | 25.9581 | -80.238900 | \N | http://en.wikipedia.org/wiki/Miami_Internation... | miami | Miami |
raceId | year | round | circuitId | race_name | date | time | url | fp1_date | fp1_time | fp2_date | fp2_time | fp3_date | fp3_time | quali_date | quali_time | sprint_date | sprint_time | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2009 | 1 | 1 | Australian Grand Prix | 2009-03-29 | 06:00:00 | http://en.wikipedia.org/wiki/2009_Australian_G... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
1 | 2 | 2009 | 2 | 2 | Malaysian Grand Prix | 2009-04-05 | 09:00:00 | http://en.wikipedia.org/wiki/2009_Malaysian_Gr... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
2 | 3 | 2009 | 3 | 17 | Chinese Grand Prix | 2009-04-19 | 07:00:00 | http://en.wikipedia.org/wiki/2009_Chinese_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
3 | 4 | 2009 | 4 | 3 | Bahrain Grand Prix | 2009-04-26 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Bahrain_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
4 | 5 | 2009 | 5 | 4 | Spanish Grand Prix | 2009-05-10 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Spanish_Gran... | \N | \N | \N | \N | \N | \N | \N | \N | \N | \N |
Index(['raceId', 'year', 'round', 'circuitId', 'race_name', 'date', 'time',
'url', 'fp1_date', 'fp1_time', 'fp2_date', 'fp2_time', 'fp3_date',
'fp3_time', 'quali_date', 'quali_time', 'sprint_date', 'sprint_time'],
dtype='object')
rename = {"name_x": "race_name", "name_y": "circuit_name"}
races_merge_pitfall.rename(columns = rename).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1102 entries, 0 to 1101
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 raceId 1102 non-null int64
1 year 1102 non-null int64
2 round 1102 non-null int64
3 circuitId 1102 non-null int64
4 race_name 1102 non-null object
5 date 1102 non-null object
6 time 1102 non-null object
7 url 1102 non-null object
8 fp1_date 1102 non-null object
9 fp1_time 1102 non-null object
10 fp2_date 1102 non-null object
11 fp2_time 1102 non-null object
12 fp3_date 1102 non-null object
13 fp3_time 1102 non-null object
14 quali_date 1102 non-null object
15 quali_time 1102 non-null object
16 sprint_date 1102 non-null object
17 sprint_time 1102 non-null object
18 circuit_name 1102 non-null object
dtypes: int64(4), object(15)
memory usage: 163.7+ KB
races02 = pd.merge(races,
circuits[["circuitId","alt","lng","lat"]],
on = "circuitId",
how = "left")
races02.head()
raceId | year | round | circuitId | race_name | date | time | url | fp1_date | fp1_time | ... | fp2_time | fp3_date | fp3_time | quali_date | quali_time | sprint_date | sprint_time | alt | lng | lat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2009 | 1 | 1 | Australian Grand Prix | 2009-03-29 | 06:00:00 | http://en.wikipedia.org/wiki/2009_Australian_G... | \N | \N | ... | \N | \N | \N | \N | \N | \N | \N | 10 | 144.96800 | -37.84970 |
1 | 2 | 2009 | 2 | 2 | Malaysian Grand Prix | 2009-04-05 | 09:00:00 | http://en.wikipedia.org/wiki/2009_Malaysian_Gr... | \N | \N | ... | \N | \N | \N | \N | \N | \N | \N | 18 | 101.73800 | 2.76083 |
2 | 3 | 2009 | 3 | 17 | Chinese Grand Prix | 2009-04-19 | 07:00:00 | http://en.wikipedia.org/wiki/2009_Chinese_Gran... | \N | \N | ... | \N | \N | \N | \N | \N | \N | \N | 5 | 121.22000 | 31.33890 |
3 | 4 | 2009 | 4 | 3 | Bahrain Grand Prix | 2009-04-26 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Bahrain_Gran... | \N | \N | ... | \N | \N | \N | \N | \N | \N | \N | 7 | 50.51060 | 26.03250 |
4 | 5 | 2009 | 5 | 4 | Spanish Grand Prix | 2009-05-10 | 12:00:00 | http://en.wikipedia.org/wiki/2009_Spanish_Gran... | \N | \N | ... | \N | \N | \N | \N | \N | \N | \N | 109 | 2.26111 | 41.57000 |
5 rows × 21 columns
circuitId | circuitRef | circuit_name | location | country | lat | lng | alt | url | |
---|---|---|---|---|---|---|---|---|---|
18 | 19 | indianapolis | Indianapolis Motor Speedway | Indianapolis | USA | 39.7950 | -86.2347 | 223 | http://en.wikipedia.org/wiki/Indianapolis_Moto... |
22 | 80 | vegas | Las Vegas Strip Street Circuit | Las Vegas | United States | 36.1147 | -115.1730 | \N | https://en.wikipedia.org/wiki/Las_Vegas_Grand_... |
32 | 33 | phoenix | Phoenix street circuit | Phoenix | USA | 33.4479 | -112.0750 | 345 | http://en.wikipedia.org/wiki/Phoenix_street_ci... |
36 | 37 | detroit | Detroit Street Circuit | Detroit | USA | 42.3298 | -83.0401 | 177 | http://en.wikipedia.org/wiki/Detroit_street_ci... |
41 | 42 | dallas | Fair Park | Dallas | USA | 32.7774 | -96.7587 | 139 | http://en.wikipedia.org/wiki/Fair_Park |