import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
For these exercises we use a dataset provided by Airbnb for a Kaggle competition. It describes Airbnb's offer in New York City in 2019, including the types of apartments, prices, locations, etc.
Create a dataframe of a few lines with objects and their properties (e.g. fruits, their weight and colour). Calculate the mean of your dataframe.
fruits = pd.DataFrame({'fruits':['strawberry', 'orange','melon'], 'weight':[20, 200, 1000],'weight2':[20, 200, 1000], 'color': ['red','orange','yellow']})
fruits.describe()
fruits.mean(numeric_only=True)
Import the table called AB_NYC_2019.csv as a dataframe. It is located in the Data folder. Have a look at the beginning of the table (head).
Create a histogram of prices.
airbnb = pd.read_csv('Data/AB_NYC_2019.csv')
airbnb.head()
airbnb['price'].plot(kind = 'hist', bins = range(0,1000,10));
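The bins argument fixes the bin edges, so prices above the last edge are silently left out of the histogram. A minimal sketch with made-up prices (not taken from the dataset) illustrates this with np.histogram, which follows the same binning convention:

```python
import numpy as np

# Made-up prices for illustration only; not values from AB_NYC_2019.csv
prices = np.array([5, 15, 15, 995, 2000])

# Same edges as range(0, 1000, 10): 0, 10, ..., 990
edges = np.arange(0, 1000, 10)
counts, _ = np.histogram(prices, bins=edges)

# Number of prices that fell outside the binned range
n_outside = len(prices) - counts.sum()
print(counts[:3], n_outside)
```

If the expensive listings matter for the analysis, a wider range or a logarithmic axis may be a better choice.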
Create a new column in the dataframe by multiplying the "price" and "availability_365" columns to get an estimate of the maximum yearly income.
airbnb['yearly_income'] = airbnb['price']*airbnb['availability_365']
airbnb['yearly_income']
Create a new dataframe by first selecting yearly incomes between 1 and 100,000 and then removing listings with 0 reviews. Then make a scatter plot of yearly income versus number of reviews.
(airbnb.yearly_income>1)&(airbnb.yearly_income<100000)
sub_airbnb = airbnb[(airbnb.yearly_income>1)&(airbnb.yearly_income<100000)].copy()
sub_airbnb = sub_airbnb[sub_airbnb.number_of_reviews>0]
sub_airbnb.plot(x = 'number_of_reviews', y = 'yearly_income', kind = 'scatter', alpha = 0.01)
plt.show()
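The selection above relies on boolean masks. As a small sketch on a toy dataframe (the values are made up, and the names toy, income_ok and has_reviews are only illustrative), the two conditions can be built separately and then combined:

```python
import pandas as pd

# Toy listings, made up for illustration
toy = pd.DataFrame({
    'yearly_income': [0, 500, 50000, 250000, 8000],
    'number_of_reviews': [3, 0, 12, 7, 1],
})

# Each comparison returns a boolean Series; & combines them element-wise.
# Parentheses are required because & binds more tightly than > and <.
income_ok = (toy.yearly_income > 1) & (toy.yearly_income < 100000)
has_reviews = toy.number_of_reviews > 0

# .copy() gives an independent dataframe rather than a view-like slice
toy_sub = toy[income_ok & has_reviews].copy()
print(toy_sub)
```

Naming the masks makes it easy to check each condition on its own, for example with income_ok.sum().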
We provide below an additional table that contains the number of inhabitants of each of New York's boroughs ("neighbourhood_group" in the table). Use merge to add this population information to each element in the original dataframe.
boroughs = pd.read_excel('Data/ny_boroughs.xlsx')
boroughs
airbnb
merged = pd.merge(airbnb, boroughs, left_on = 'neighbourhood_group', right_on='borough')
merged.head()
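By default pd.merge performs an inner join, so rows whose key has no match in the other table are dropped. The sketch below uses two toy tables with made-up values (the names listings and areas and the population numbers are hypothetical) to compare the default with a left join:

```python
import pandas as pd

# Toy tables with made-up values, for illustration only
listings = pd.DataFrame({
    'neighbourhood_group': ['A', 'A', 'B', 'C'],
    'price': [100, 150, 80, 60],
})
areas = pd.DataFrame({
    'borough': ['A', 'B'],
    'population': [1000, 2000],
})

# Default inner join: the listing in 'C' has no match and is dropped
inner = pd.merge(listings, areas, left_on='neighbourhood_group', right_on='borough')

# Left join keeps every listing; unmatched rows get NaN for population.
# validate raises an error if a borough appears more than once on the right.
left = pd.merge(listings, areas, how='left', left_on='neighbourhood_group',
                right_on='borough', validate='many_to_one')
print(len(inner), len(left), left['population'].isna().sum())
```

Comparing the number of rows before and after a merge is a quick way to notice keys that did not match, for instance because of spelling differences.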
Using groupby, calculate the average price for each type of room (room_type) in each neighbourhood_group. What is the average price for an entire home in Brooklyn? Then apply unstack() to the result and create a bar plot with the resulting table.
airbnb.groupby(['neighbourhood_group','room_type']).mean(numeric_only=True)
summary = airbnb.groupby(['neighbourhood_group','room_type'])['price'].mean()
summary
summary[('Brooklyn','Entire home/apt')]
summary.unstack()
summary.unstack().plot(kind = 'bar', alpha = 0.5)
plt.show()
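Grouping by two columns returns a Series with a two-level index, and unstack() turns the inner level into columns. A minimal sketch on a toy dataframe (made-up prices and hypothetical group names 'A' and 'B') shows both steps:

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    'neighbourhood_group': ['A', 'A', 'A', 'B', 'B'],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Private room',
                  'Entire home/apt', 'Private room'],
    'price': [100, 200, 50, 80, 40],
})

# Series with a two-level index (neighbourhood_group, room_type)
toy_summary = toy.groupby(['neighbourhood_group', 'room_type'])['price'].mean()

# A tuple selects one combination of both index levels
a_entire = toy_summary[('A', 'Entire home/apt')]

# unstack() moves the inner level (room_type) to the columns
toy_wide = toy_summary.unstack()
print(a_entire)
print(toy_wide)
```

The wide table has one row per group and one column per room type, which is the layout a grouped bar plot expects.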
Using Seaborn, create a scatter plot where the x and y positions are longitude and latitude, the color reflects the price, and the shape of the marker reflects the borough (neighbourhood_group). Can you recognize parts of New York? Does the map make sense?
fig, ax = plt.subplots(figsize=(10,8))
g = sns.scatterplot(data = airbnb, y = 'latitude', x = 'longitude', hue = 'price',
                    style = 'neighbourhood_group', hue_norm=(0,200), s=10, palette='inferno', ax=ax)
plt.show()