Lecture 15 - Cloud Computing II
| Layer | On-premise | IaaS | PaaS | SaaS |
|---|---|---|---|---|
| Application | User | User | User | Provider |
| Middleware | User | User | Provider | Provider |
| OS | User | User | Provider | Provider |
| Virtualisation | User | Provider | Provider | Provider |
| Servers | User | Provider | Provider | Provider |
| Networking | User | Provider | Provider | Provider |

Each cell indicates whether the user or the cloud provider manages that layer of the stack.
Linux commands we will need:

- Basic navigation: `ls`, `cd`, `pwd`, etc.
- `sudo` to run commands as root (superuser)
- Package management: `apt update`, `apt upgrade`, `apt install`
- File transfer and permissions: `scp` and `chmod`

Source: DataCamp
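If you prefer to experiment from Python, the navigation commands above have standard-library counterparts. A quick sketch (nothing here is specific to this course):

```python
import os

# pwd: show the current working directory
cwd = os.getcwd()
print(cwd)

# ls: list the entries in the current directory
entries = sorted(os.listdir('.'))
print(entries)

# cd: os.chdir() changes the working directory (go up one level, then back)
os.chdir('..')
print(os.getcwd())
os.chdir(cwd)
```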
Using `apt` and preparing your key file:

- `apt` (Advanced Package Tool) is Ubuntu's package manager
- `apt update` refreshes the package list
- `apt install` installs a package
- `sudo apt install python3` installs Python 3
- `sudo apt install python3-pip` installs pip
- `chmod 400` sets the correct permissions on your key file (read-only for you, no access for anyone else)
- To go to your `/home/user` directory, type `cd ~` and then `pwd` in your WSL terminal
- You can move your key file there with the `mv` command in WSL

Launching an EC2 instance:

- Under Name and tags, name your instance. For example, name it `datasci350`
- Choose Ubuntu Server 24.04 LTS as the OS image and 64-bit (x86) as the architecture
- Choose `t2.micro` or `t1.micro` as the instance type, both of which are free tier eligible
- Click Create a new key pair and give it a name (e.g. `datasci350`)
- Choose ED25519 and `.pem` as the file format
- Click Create Key Pair and save it to a secure location
- Under Network settings, you may check Allow HTTPS traffic from the internet and Allow HTTP traffic from the internet so that you can access the web servers hosted on your EC2 VM
- Review the Root volume under Configure storage:
  - `gp3` is the default volume type, and it works fine for most use cases
  - `io2` is the fastest and most expensive volume type, usually used for high-performance databases (which require millisecond latency)
  - Choose `sc1` or `st1` for cost savings
- After you click Launch, you will be taken to the Instances page

Connecting to your instance:

- Click Connect to instance to see the instructions
- Select SSH client to see the command to connect to your instance
- Run `chmod 400 "name-of-your-key.pem"` and the example command provided by AWS (`ssh -i "name-of-your-key.pem" ubuntu@public-ip`)
- Click the Instances link on the left to see the details of your instance
- Once connected, run `sudo apt update` to refresh the package list and `sudo apt upgrade` to install the latest updates
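To see what `chmod 400` actually does to your key file, here is a small Python sketch. It creates a throwaway file (the name `demo-key.pem` is just a placeholder, not a real key) and restricts it to owner-read-only, the permission SSH requires before it will use a private key:

```python
import os
import stat

# Create a throwaway file standing in for a private key
with open('demo-key.pem', 'w') as f:
    f.write('not a real key\n')

# chmod 400: read permission for the owner only, no access for group/others
os.chmod('demo-key.pem', 0o400)

# Verify: S_IMODE extracts just the permission bits from the file's mode
mode = stat.S_IMODE(os.stat('demo-key.pem').st_mode)
print(oct(mode))  # 0o400
```

SSH refuses keys with looser permissions ("UNPROTECTED PRIVATE KEY FILE" error), which is why AWS tells you to run `chmod 400` before connecting.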
- Use the `-y` flag to automatically answer yes to all prompts, e.g., `sudo apt update && sudo apt upgrade -y`
- Install packages with `sudo apt install`, e.g., `sudo apt install python3` and `sudo apt install python3-pip`

Copying files with `scp` (secure copy):

- `scp` is a command-line tool that allows you to copy files securely
- The `-i` flag ("identity file") specifies which private key file to use for authentication (the `.pem` file you downloaded when creating the key pair)
- Upload (local to instance): `scp -i "name-of-your-key.pem" file-to-copy ubuntu@public-ip:/path/to/destination`
- Download (instance to local): `scp -i "name-of-your-key.pem" ubuntu@public-ip:/path/to/file-to-copy /path/to/destination`
- You can also use `scp` to copy files between two remote servers
- Replace XXXXXX with your public IP, and note that `:~` is your home directory on the instance. Don't forget to add it!
- You can then list (`ls`) and run (`python3`) the file on your instance

Running Jupyter on your instance:

- Install `jupyter` (as we did in the previous slides)
- Connect with port forwarding: `ssh -i "<your-key>.pem" ubuntu@<public_IPv4_DNS_address> -L 8000:localhost:8888`
- The `-L 8000:localhost:8888` part forwards port 8888 on the instance to port 8000 on your machine
- On the instance, run `sudo apt update && sudo apt upgrade -y`, then `sudo apt install -y python3 python3-pip python3-notebook`
- Run `source ~/.profile` and check the installation with `which python3`, `which pip3`, and `which jupyter`
- Start the server with `jupyter notebook`
- On your local machine, open `http://localhost:8000` and copy the token from the terminal to log in
- Alternatively, copy the full URL from the terminal (`http://localhost:8888/?token=...`) and change 8888 to 8000
- Try it out: `print('Hello, DATASCI350!')` (or any other code you like!)

Two ways to get files onto your instance:

- Local to cloud (`scp`): create a file on your local machine, then upload it to the instance. Use this when you have files on your own computer that are not available online (e.g., your own datasets, private code)
- Internet to cloud (`wget`): download a file directly from the internet to the instance. Use this when the file is already hosted somewhere (e.g., GitHub, a public dataset URL). This skips your local machine entirely

First, install the analysis libraries on the instance: `sudo apt install -y python3-numpy python3-pandas python3-matplotlib python3-seaborn`

Method 1: local to cloud with `scp`. On your local machine, create a weather dataset with the Python code below, or download it here: weather_data.py

```python
# weather_data.py
import pandas as pd
import numpy as np
import datetime

# Set seed for reproducibility
np.random.seed(42)

# Generate dates for the past 30 days
dates = pd.date_range(end=datetime.datetime.now(), periods=30).tolist()
dates = [d.strftime('%Y-%m-%d') for d in dates]

# Generate temperature data with some randomness
temp_high = np.random.normal(75, 8, 30)
temp_low = temp_high - np.random.uniform(10, 20, 30)
precipitation = np.random.exponential(0.5, 30)
humidity = np.random.normal(65, 10, 30)

# Create a structured dataset
weather_data = pd.DataFrame({
    'date': dates,
    'temp_high': temp_high,
    'temp_low': temp_low,
    'precipitation': precipitation,
    'humidity': humidity
})

# Save to a text file with a one-line comment header
with open('weather_data.txt', 'w') as f:
    f.write("# Weather data for the past 30 days\n")
    f.write(weather_data.to_string(index=False))

print("Weather data saved to weather_data.txt")
```

- Run this script on your local machine: `python3 weather_data.py`
- It will create a file called `weather_data.txt` with 30 days of weather data
- Now upload it to your EC2 instance using `scp` (from a local terminal): `scp -i <your-key>.pem weather_data.txt ubuntu@<your-instance-ip>:~/`
- You can verify it arrived by running `ls` on your instance
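Before (or after) uploading, you can sanity-check that the whitespace-separated format written by `weather_data.py` parses the way the analysis script will read it. A minimal round trip on a toy table (the values below are made up for illustration):

```python
from io import StringIO
import pandas as pd

# A tiny table in the same format as weather_data.txt:
# a comment header line, then whitespace-separated columns
text = """# Weather data (toy example)
      date  temp_high  temp_low
2025-01-01       71.2      58.3
2025-01-02       69.8      55.1
"""

lines = text.splitlines(keepends=True)
# Skip the comment header, then split columns on runs of whitespace
df = pd.read_csv(StringIO(''.join(lines[1:])), sep=r'\s+')

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['date', 'temp_high', 'temp_low']
```

This is exactly the read path used later in `weather_analysis.py`, so if it works on the toy table it will work on the real file.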
Method 2: internet to cloud with `wget`. Now let's get the analysis script directly onto the instance, without going through your local machine. Run this on your EC2 instance:
`wget https://raw.githubusercontent.com/danilofreire/datasci350/main/lectures/lecture-15/weather_analysis.py` (one line)

Here is the script for reference (no need to copy it; `wget` already downloaded it):

```python
# weather_analysis.py
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO

# Read the weather data
with open('weather_data.txt', 'r') as f:
    lines = f.readlines()

# Skip the header comment, then parse the whitespace-separated table
data_str = ''.join(lines[1:])
df = pd.read_csv(StringIO(data_str), sep=r'\s+')

# Print basic statistics
print("Weather Data Analysis:")
print("=====================")
print(f"Number of days: {len(df)}")
print(f"Average high temperature: {df['temp_high'].mean():.1f}°F")
print(f"Average low temperature: {df['temp_low'].mean():.1f}°F")
print(f"Maximum temperature: {df['temp_high'].max():.1f}°F on {df.loc[df['temp_high'].idxmax(), 'date']}")
print(f"Minimum temperature: {df['temp_low'].min():.1f}°F on {df.loc[df['temp_low'].idxmin(), 'date']}")
print(f"Days with precipitation > 1 inch: {len(df[df['precipitation'] > 1])}")

# Create a visualisation
plt.figure(figsize=(12, 6))
sns.set_style("whitegrid")

# Plot temperature range on the primary (left) axis
ax1 = plt.gca()
ax1.fill_between(df['date'], df['temp_low'], df['temp_high'], alpha=0.3, color='skyblue')
ax1.plot(df['date'], df['temp_high'], marker='o', color='red', label='High Temp')
ax1.plot(df['date'], df['temp_low'], marker='o', color='blue', label='Low Temp')

# Add precipitation as bars on a secondary (right) axis
ax2 = ax1.twinx()
ax2.bar(df['date'], df['precipitation'], alpha=0.3, color='navy', width=0.5, label='Precipitation')
ax2.set_ylabel('Precipitation (inches)', color='navy')
ax2.tick_params(axis='y', labelcolor='navy')

# Formatting (use ax1 explicitly so the labels land on the temperature axis)
ax1.set_title('30-Day Weather Report: Temperature Range and Precipitation', fontsize=16)
plt.setp(ax1.get_xticklabels(), rotation=45, ha='right')
ax1.set_ylabel('Temperature (°F)')
ax1.legend(loc='upper left')
plt.tight_layout()

# Save the figure
plt.savefig('weather_analysis.png')
print("Analysis complete. Results saved to 'weather_analysis.png'")
```

- You should now have both files on the instance: `weather_data.txt` (uploaded via `scp`) and `weather_analysis.py` (downloaded via `wget`)
- Run the analysis: `python3 weather_analysis.py`
- Alternatively, open the script in Jupyter (`jupyter notebook`) and run it there
- Download the resulting plot to your machine (`scp` again, from a local terminal):
`scp -i <your-key>.pem ubuntu@<your-instance-ip>:~/weather_analysis.png ./`

Recap: we used `scp` to move files between your computer and the instance, and `wget` to download files from the internet directly to the instance. Both are useful in different situations!
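As an aside, the `wget` step can also be scripted in Python with the standard library: `urllib.request.urlretrieve` is a rough analogue of `wget <url>`. The sketch below uses a local `file://` URL and placeholder filenames so it runs without network access:

```python
import pathlib
import urllib.request

# Stand-in for a remote file; using file:// keeps the example offline
src = pathlib.Path('remote_script.py').resolve()
src.write_text("print('hello from the downloaded script')\n")

# Roughly what `wget <url>` does: fetch the URL and save it to a local file
url = src.as_uri()  # e.g. file:///home/user/remote_script.py
urllib.request.urlretrieve(url, 'downloaded_script.py')

print(open('downloaded_script.py').read())
```

With a real `https://` URL (like the raw GitHub link above), the same two lines fetch the file from the internet.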
You’ve just completed a full data analysis workflow in the cloud 🎉
This workflow is similar to how data scientists use cloud resources for larger datasets and more complex analyses!
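One pattern from `weather_analysis.py` worth remembering: `idxmax()` returns the row label of a column's maximum, and `.loc` then reads any other column from that same row. A toy illustration with made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2025-01-01', '2025-01-02', '2025-01-03'],
    'temp_high': [71.2, 82.5, 69.8],
})

# idxmax gives the index of the hottest row; .loc looks up its date
hottest_day = df.loc[df['temp_high'].idxmax(), 'date']
print(f"Maximum temperature: {df['temp_high'].max():.1f}°F on {hottest_day}")
# Maximum temperature: 82.5°F on 2025-01-02
```

The same one-liner scales unchanged from this 3-row toy frame to the 30-day dataset on your instance.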