QTM 350 - Data Science Computing

Lecture 14 - Introduction to Cloud Computing

Danilo Freire

Emory University

Hello, everyone! 👋

Brief recap 📚

Local LLMs and AI agents

  • We saw many cool things last class!
  • There are many advantages to using local LLMs:
    • Privacy, security, control, and customisation
  • Ollama is a great tool for that!
    • It lets you download and run hundreds of LLMs locally
    • It’s free and open-source, and easy to customise with Modelfiles
  • We also saw how to use LM Studio and Hugging Face

Local LLMs and AI agents

  • We learned about LLM APIs and how to use them via OpenRouter
    • There are many free models available, just search for “free” in the search bar
  • We also saw how to use Roo Code and integrate APIs in VS Code
    • You can use any API with a friendly interface
    • You can fully automate your workflow with agents
  • Finally, we learned how to use Web-Ui
  • Web-Ui makes websites accessible for AI agents, and it can perform pretty much any task on the internet
  • Have a look at its parent project too: Browser Use
    • You can use it in Python and integrate it in your scripts!

Browser use

Add grocery items to cart, and checkout

Cloud computing ☁️

What is cloud computing?

  • Cloud computing is the on-demand delivery of computing resources through a cloud services platform via the internet with pay-as-you-go pricing
  • Cloud computing has increased in popularity in recent years because it offers many advantages
    • Scalability, flexibility, and cost-efficiency
  • There are three main types of cloud computing services: IaaS, PaaS, and SaaS
  • The most popular cloud computing platforms are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), the “Big Three”
  • They are so widely used that Amazon’s AWS is the most profitable division of the company
    • Although AWS is only 15% of Amazon’s total sales, it accounts for more than 50% of the company’s profits (~28 billion)

Source: GeekWire

Why should we care about cloud computing? 🤔

Imagine you are opening a business…

Traditional approach

  1. Estimate supply and demand
  2. Estimate infrastructural needs
  3. Purchase and deploy infrastructure
  4. Install and test your system
  5. Offer your services to clients
  • Infrastructure is very expensive
  • It takes time to deploy the infrastructure
  • What if the estimations were wrong?

Imagine you are opening a business…

Cloud approach

  1. Choose one or more cloud services providers
  2. Deploy your systems on the cloud
  3. Offer your services to clients
  4. Pay for what you use
Traditional process        | With cloud computing
❌ High investment risk    | ✅ Reduced risk
❌ Long time-to-market     | ✅ Shorter time-to-market
✅ Manages own data        | ❌ Trust the vendor?
✅ Completely in control   | ❌ Dependent on a specific vendor?
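
To make the investment-risk row concrete, here is a minimal sketch comparing the two approaches. Every number in it (server price, cloud rate, demand) is a made-up assumption for illustration, not a real quote.

```python
# Hypothetical numbers, for illustration only.
SERVER_PRICE = 5_000.0  # upfront cost of one owned server (assumed)
CLOUD_RATE = 0.10       # cost of one cloud server-hour (assumed)


def traditional_cost(servers_bought: int) -> float:
    """Upfront cost: you pay for every server you buy, used or not."""
    return servers_bought * SERVER_PRICE


def cloud_cost(server_hours_used: float) -> float:
    """Pay-as-you-go cost: you pay only for the hours you actually use."""
    return server_hours_used * CLOUD_RATE


# Suppose we estimated a need for 20 servers, but demand only kept
# 5 servers busy during the first 1,000 hours of operation.
estimated_servers = 20
actual_server_hours = 5 * 1_000

print(f"Traditional: ${traditional_cost(estimated_servers):,.2f}")
print(f"Cloud:       ${cloud_cost(actual_server_hours):,.2f}")
```

The sketch deliberately ignores electricity, staff, and long-running workloads, where owning hardware can end up cheaper. The point is only that a wrong estimate is paid for upfront in the traditional approach.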

Case study: Animoto

  • Animoto: Lets users create videos from their own photos/music
  • Auto-edits photos and aligns them with the music, so it “looks good”
  • Built using Amazon EC2+S3+SQS
  • Released a Facebook app in mid-April 2008
  • More than 750,000 people signed up within 3 days
  • EC2 usage went from 50 machines to 3,500 (a 70x increase!)
  • No way they could have done this with traditional infrastructure!

Source: Jeff Bezos’ talk at Stanford in 2008

Case study: The Washington Post

  • Hillary Clinton’s official White House schedule released to the public
  • 17,481 pages of non-searchable, low-quality PDF
  • Very interesting to journalists, but would have required hundreds of man-hours to evaluate
  • Peter Harkins, Senior Engineer at The Washington Post: “Can we make that data available more quickly, ideally within the same news cycle?”
  • Tested various Optical Character Recognition (OCR) programs; estimated required speed
  • Launched 200 EC2 instances; project was completed within nine hours using 1,407 hours of VM time ($144.62)
  • Results available on the web only 26 hours after the release
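
A quick back-of-the-envelope check of these figures, using only the numbers on the slide. The single-machine estimate assumes the work splits perfectly across machines, which is a simplification.

```python
# Figures reported on the slide.
instances = 200
vm_hours = 1_407
total_cost = 144.62  # US dollars

cost_per_vm_hour = total_cost / vm_hours
avg_hours_per_instance = vm_hours / instances
days_on_one_machine = vm_hours / 24

print(f"Cost per VM-hour:       ${cost_per_vm_hour:.3f}")
print(f"Avg hours per instance: {avg_hours_per_instance:.1f}")
print(f"Days on a single VM:    {days_on_one_machine:.1f}")
```

So each VM-hour cost roughly 10 cents, each instance worked for about 7 hours on average, and the same job on one machine would have taken almost two months.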

Cloud computing services ☁️

Cloud computing services

  • SaaS (Software as a Service):
    • Software is hosted on a cloud and accessed via the internet as a complete application
    • You don’t have to worry about updates, security, or maintenance
    • You (usually) pay a subscription fee (monthly or yearly)
    • Examples: Gmail, Office 365, Salesforce
  • PaaS (Platform as a Service):
    • Provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure
    • Usually, the platform offers a series of APIs
    • You pay for the platform and the resources you use (e.g., storage, bandwidth)
    • Examples: Google App Engine, AWS Elastic Beanstalk, Heroku

[Figures: SaaS, PaaS, FaaS, IaaS]

Comparison of cloud services

Layer           On-premise  IaaS      PaaS      SaaS
Application     User        User      User      Provider
Middleware      User        User      Provider  Provider
OS              User        User      Provider  Provider
Virtualisation  User        Provider  Provider  Provider
Servers         User        Provider  Provider  Provider
Networking      User        Provider  Provider  Provider

User = user manages; Provider = provider manages

What is virtualisation?

  • Suppose Alice has a machine with 4 CPUs and 8 GB of memory, and three customers:
    • Bob wants a machine with 1 CPU and 3GB of memory
    • Charlie wants 2 CPUs and 1GB of memory
    • Daniel wants 1 CPU and 4GB of memory
  • What should Alice do?
  • Alice can sell each customer a virtual machine (VM) with the requested resources
  • From each customer’s perspective, it appears as if they had a physical machine all by themselves (isolation)
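
The example above can be checked with a few lines of code. This is only a sketch of the bookkeeping: the `Resources` class and `fits_on_host` function are hypothetical names, and a real hypervisor also needs some resources for itself.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpus: int
    memory_gb: int


def fits_on_host(host: Resources, requests: dict[str, Resources]) -> bool:
    """Return True if the host can serve every request at the same time."""
    total_cpus = sum(r.cpus for r in requests.values())
    total_memory = sum(r.memory_gb for r in requests.values())
    return total_cpus <= host.cpus and total_memory <= host.memory_gb


alice = Resources(cpus=4, memory_gb=8)
customers = {
    "Bob": Resources(cpus=1, memory_gb=3),
    "Charlie": Resources(cpus=2, memory_gb=1),
    "Daniel": Resources(cpus=1, memory_gb=4),
}

print(fits_on_host(alice, customers))  # True: 4 CPUs and 8 GB in total
```

The three requests add up to exactly 4 CPUs and 8 GB, so Alice can serve all three customers from one physical machine.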

More about virtualisation

What is middleware?

  • Middleware is software that connects two separate applications
  • This is software we rarely see, but which is essential for the functioning of the internet
  • For instance, software that handles authentication, authorisation, and encryption, drivers that connect to databases, and software that handles messaging
  • Amazon SQS, Apache Kafka, and RabbitMQ are examples of middleware
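
To see what messaging middleware does, here is a small sketch that uses Python's built-in `queue.Queue` as an in-process stand-in for a broker such as SQS or RabbitMQ. The job names are made up, and a real broker would sit on the network between two separate programs.

```python
import queue
import threading

# In-process stand-in for a message broker such as SQS or RabbitMQ.
messages = queue.Queue()
processed = []


def producer(jobs):
    """Application A: puts jobs on the queue and moves on."""
    for job in jobs:
        messages.put(job)
    messages.put(None)  # sentinel: no more jobs


def consumer():
    """Application B: takes jobs off the queue at its own pace."""
    while True:
        job = messages.get()
        if job is None:
            break
        processed.append(f"rendered {job}")


worker = threading.Thread(target=consumer)
worker.start()
producer(["video-1", "video-2", "video-3"])
worker.join()

print(processed)
```

The producer never calls the consumer directly. Both only talk to the queue, which is what lets the two sides be scaled or replaced independently.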

The Big Three ☁️

Amazon Web Services (AWS)

  • Amazon Web Services is a collection of cloud-based services
  • It’s a very big one
  • Let me say it again: a VERY big one 😂
  • They offer a wide range of services:
    • Compute, storage, databases, analytics, machine learning, AI, IoT, security, and more
  • Many companies use AWS, including Netflix, Airbnb, and NASA
  • The most widely used services are EC2, S3, and RDS, for computing, storage, and databases, respectively
  • Let’s look at them in more detail

AWS services 🛠️

Amazon EC2 - Elastic Compute Cloud

  • Amazon EC2 is a web service that provides resizable compute capacity in the cloud
  • It’s designed to make web-scale cloud computing easier for developers
  • You can launch instances with a variety of operating systems (mainly Linux)
  • This means that, for the most part, you can use bash and run any software you want from the command line
  • You can also use a GUI, but it is not necessary
  • You can use EC2 to host a website, run a database, or run a machine learning model
  • However, please be mindful of the costs!
  • You pay for the instances you use, and the costs can add up quickly
  • Per-second (or per-hour) billing
  • Data transfer not included!
  • Persistent storage not included!
  • Scaling not included!
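
A rough sketch of how per-second billing adds up. The hourly rate is hypothetical, and the 60-second minimum is an assumption based on how per-second billing is commonly described for Linux instances, so please check the current pricing page. Data transfer and storage are not included here.

```python
def ec2_cost(seconds_running: int, hourly_rate: float,
             minimum_seconds: int = 60) -> float:
    """Estimate the compute charge for one instance billed per second."""
    # Assumption: very short runs are billed as if they lasted the minimum.
    billable_seconds = max(seconds_running, minimum_seconds)
    return billable_seconds * hourly_rate / 3600


HOURLY_RATE = 0.10  # hypothetical price in USD per hour

print(f"10 minutes:          ${ec2_cost(10 * 60, HOURLY_RATE):.4f}")
print(f"Left on for 30 days: ${ec2_cost(30 * 24 * 3600, HOURLY_RATE):.2f}")
```

A forgotten instance is the usual way small hourly prices turn into a noticeable bill.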

EC2 auto scaling

  • Scaling is the ability to increase or decrease the compute capacity of your application
  • Scale your application manually, on a scheduled basis or on demand
  • This is useful when you have fluctuating workloads, such as a website that gets more traffic during the day, or a machine learning model that gets more requests at certain times
  • You can set up auto scaling groups to automatically scale your application
  • You can also set up load balancers to distribute traffic across multiple instances

Source: AWS
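
The idea behind scaling on demand can be sketched with a simple proportional rule: keep a metric such as average CPU usage close to a target. This is a simplified illustration, not the algorithm AWS actually runs, and all names and numbers are assumptions.

```python
import math


def desired_capacity(current_instances: int, current_metric: float,
                     target_metric: float, min_size: int, max_size: int) -> int:
    """Pick an instance count that brings the metric back towards the target."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    wanted = math.ceil(current_instances * current_metric / target_metric)
    # Never go below min_size or above max_size.
    return max(min_size, min(max_size, wanted))


# 4 instances at 90% average CPU, aiming for 50%: scale out.
print(desired_capacity(4, 90.0, 50.0, min_size=2, max_size=10))
# 4 instances at 10% average CPU: scale in, but never below min_size.
print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))
```

The `min_size` and `max_size` bounds play the same role as the limits of an auto scaling group: they protect both availability and your bill.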

ELB - Elastic Load Balancer

  • ELB is a service that automatically distributes incoming application traffic across multiple targets
  • It can handle the varying load of your application traffic in a single availability zone or across multiple availability zones
  • ELB is a key component of auto scaling
  • As with EC2, you pay for the data transfer and the number of requests
  • And it can be quite expensive too
  • It can detect unhealthy instances and reroute traffic to healthy instances
  • So your application stays available even if some instances fail!
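
Here is a toy version of what a load balancer does: hand out requests in turn and skip targets marked as unhealthy. It is a sketch only. A real ELB offers several routing algorithms and runs its own health checks, and the instance names below are made up.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Send each request to the next healthy target in turn."""

    def __init__(self, targets: list[str]) -> None:
        self.targets = targets
        self.healthy = set(targets)
        self._order = cycle(targets)

    def mark_unhealthy(self, target: str) -> None:
        self.healthy.discard(target)

    def mark_healthy(self, target: str) -> None:
        if target in self.targets:
            self.healthy.add(target)

    def route(self) -> str:
        if not self.healthy:
            raise RuntimeError("no healthy targets available")
        # At most len(targets) steps are needed to reach a healthy target.
        for _ in range(len(self.targets)):
            target = next(self._order)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets available")


balancer = RoundRobinBalancer(["instance-a", "instance-b", "instance-c"])
print([balancer.route() for _ in range(4)])
balancer.mark_unhealthy("instance-b")
print([balancer.route() for _ in range(4)])
```

After `instance-b` is marked unhealthy, traffic simply alternates between the two remaining instances.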

Cloud storage

Amazon S3 - Simple Storage Service

  • Amazon has three main storage services: S3, EBS, and EFS
  • S3 is a web service that provides object storage through a web interface
  • You can store data in buckets, which are like folders
  • Data are distributed across a minimum of three availability zones
    • 99.999999999% durability (eleven nines!)
  • You can set up lifecycle policies to automatically move data to cheaper storage classes
    • Standard, Intelligent-Tiering, Glacier, Glacier Deep Archive
  • You can also set up versioning to keep multiple versions of an object (works like git)
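
A lifecycle policy is just a small piece of configuration. The sketch below builds one rule as a Python dictionary in the shape expected by the boto3 S3 call `put_bucket_lifecycle_configuration`. The prefix, rule ID, and day counts are assumptions chosen for illustration, and `apply_rule` expects a client you create yourself, for example with `boto3.client("s3")`.

```python
def archive_rule(prefix: str, days_to_glacier: int,
                 days_to_deep_archive: int) -> dict:
    """Build a lifecycle rule that moves ageing objects to cheaper classes."""
    if days_to_deep_archive <= days_to_glacier:
        raise ValueError("Deep Archive transition must come after Glacier")
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
            {"Days": days_to_deep_archive, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


def apply_rule(s3_client, bucket: str, rule: dict) -> None:
    """Attach the rule to a bucket. `s3_client` is e.g. boto3.client("s3")."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )


rule = archive_rule("logs/", days_to_glacier=90, days_to_deep_archive=365)
print(rule["Transitions"])
```

Passing the client in as an argument keeps the rule-building part usable without an AWS account, which is handy for testing.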

Cloud storage

Amazon EBS - Elastic Block Store

  • EBS is a web service that provides block storage volumes for use with EC2 instances
  • The difference between S3 and EBS is that EBS is block storage, while S3 is object storage
    • Block storage behaves like a hard drive attached to a machine, while object storage keeps whole files (objects) under unique keys
  • Why would you use it? Mainly for database (SQL) files
    • Low-latency, high-performance storage for frequent access
  • You can attach EBS volumes to your EC2 instances
  • You can also take snapshots of your volumes for backup (like git again!)
  • Up to 16 TB per volume

Database services

Amazon RDS - Relational Database Service

  • Amazon RDS is a web service that makes it easy to set up, operate, and scale a relational database
  • You can choose from six popular database engines:
    • Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server
  • You can also use Aurora Serverless to automatically scale your database
  • It’s a managed service, so you don’t have to worry about backups, patches, or updates
  • It works well with another AWS service called Athena, which allows you to query data in S3 using SQL
  • You can also use Redshift for data warehousing, and DynamoDB for NoSQL databases

Amazon SageMaker

  • Finally, let’s talk about Amazon SageMaker
  • SageMaker is a fully managed service that allows you to build, train, and deploy machine learning models at scale
  • It has several pre-configured frameworks, such as TensorFlow, PyTorch, and Scikit-learn
  • You can also bring your own models, and use Jupyter notebooks to train them
  • Lots of other models available in the AWS Marketplace, and you can also use SageMaker Studio to manage your projects
  • Machine Learning: application services
    • Comprehend (for NLP)
    • Rekognition (Visual Analysis)
    • Translate
    • Polly (text-to-speech)
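
These application services are ordinary API calls. As a minimal sketch, the two helper functions below (hypothetical names) wrap the boto3 calls `detect_sentiment` from Comprehend and `translate_text` from Translate. They expect clients you create yourself, for example `boto3.client("comprehend")` and `boto3.client("translate")`, and calling them with real clients requires an AWS account and may incur costs.

```python
def sentiment(comprehend_client, text: str, language_code: str = "en") -> str:
    """Return the overall sentiment label for a piece of text."""
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code
    )
    return response["Sentiment"]  # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"


def translate(translate_client, text: str, source: str, target: str) -> str:
    """Translate text from one language code to another."""
    response = translate_client.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target
    )
    return response["TranslatedText"]


# Example usage (not run here, needs AWS credentials):
# import boto3
# print(sentiment(boto3.client("comprehend"), "I love cloud computing!"))
# print(translate(boto3.client("translate"), "Hello, everyone!", "en", "pt"))
```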

Creating and managing an AWS account 🛠️

Creating an AWS account

You can do this later if you want

  • To create an AWS account, go to https://aws.amazon.com
  • Enter your account information, and then choose Verify email address
  • You will receive an email with a verification link
  • Select Personal and choose Continue
  • Enter your billing information, yes, you need to 😒
  • Enter the code displayed in the CAPTCHA, and then submit
  • Choose Complete sign up
  • And you’re done 🎉

Managing an AWS account

  • The first thing you should do is set up a billing alarm
    • It notifies you when your bill exceeds a certain amount
    • This is very important, as costs can add up quickly
  • Go to https://console.aws.amazon.com/costmanagement/
  • In the navigation pane, choose Billing Preferences (scroll down to the bottom)
  • Next to Alert preferences, choose Edit
  • Choose Receive AWS Free Tier Alerts
  • Choose Save preferences
  • Then go to https://us-east-1.console.aws.amazon.com/billing/home#/budgets/overview
  • Choose Create budget, choose a Zero spend budget template, add your email, then click Create budget
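
The same budget can, in principle, be created from code. The sketch below only builds the arguments for the boto3 Budgets call `create_budget`. The budget name, the 1 USD limit, and the 0.01 USD alert threshold are assumptions meant to mimic the zero spend template, so treat them as a starting point and not as the exact values the console uses.

```python
def zero_spend_budget(account_id: str, email: str) -> dict:
    """Build keyword arguments for the AWS Budgets `create_budget` call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "zero-spend-budget",  # hypothetical name
            "BudgetLimit": {"Amount": "1", "Unit": "USD"},  # assumed limit
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 0.01,  # assumed: alert on any real spend
                    "ThresholdType": "ABSOLUTE_VALUE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": email}
                ],
            }
        ],
    }


# Example usage (not run here, needs AWS credentials):
# import boto3
# kwargs = zero_spend_budget("123456789012", "you@example.com")
# boto3.client("budgets").create_budget(**kwargs)
```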

Amazon Transcribe 🎤

Audio transcription at scale

  • Imagine that you have a task similar to that of the Washington Post journalists
  • You found a series of audio files that you need to transcribe, and you have to do it quickly
  • So let’s create an S3 bucket (folder), upload an audio file, and transcribe it
  • The file we will use is available here: transcribe-sample.mp3
  • Select AWS Management Console to open the console, then search for S3 in the search bar
  • Click on Create bucket, give it a name, accept the permissions, and click Create bucket

Audio transcription at scale

S3 bucket

Then click on the bucket and upload the file
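
If you prefer code to clicking, the same two steps can be sketched with boto3. The function name and the bucket name in the usage comment are hypothetical, and the function expects an S3 client you create yourself with `boto3.client("s3")`.

```python
def upload_audio(s3_client, bucket: str, local_path: str, key: str) -> str:
    """Create the bucket, upload one file, and return its S3 URI."""
    # Outside us-east-1, create_bucket also needs a
    # CreateBucketConfiguration with a LocationConstraint.
    s3_client.create_bucket(Bucket=bucket)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example usage (not run here, needs AWS credentials):
# import boto3
# uri = upload_audio(boto3.client("s3"), "my-qtm350-bucket",
#                    "transcribe-sample.mp3", "transcribe-sample.mp3")
# print(uri)
```

Bucket names are shared across all AWS accounts, so you will need to pick one that nobody else is using.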

Create transcription job

  • Now let’s go to Amazon Transcribe
  • Just search for Transcribe in the search bar
  • Click on Create job
  • Give it a name, select the language, and choose the S3 bucket where the file is located
  • Click Create
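
The console steps above map onto two boto3 Transcribe calls, `start_transcription_job` and `get_transcription_job`. The sketch below wraps them in a hypothetical helper that starts a job and polls until it finishes. It assumes an mp3 file and a client created with `boto3.client("transcribe")`.

```python
import time


def transcribe(transcribe_client, job_name: str, media_uri: str,
               language_code: str = "en-US", poll_seconds: float = 5.0) -> dict:
    """Start a transcription job and wait until it completes or fails."""
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat="mp3",
        LanguageCode=language_code,
    )
    while True:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        time.sleep(poll_seconds)


# Example usage (not run here, needs AWS credentials and an uploaded file):
# import boto3
# job = transcribe(boto3.client("transcribe"), "my-first-job",
#                  "s3://my-qtm350-bucket/transcribe-sample.mp3")
# print(job["TranscriptionJobStatus"])
```

To transcribe many files, you would call this once per file with a different job name, which is where the "at scale" part comes in.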

Transcription job

Conclusion 🎉

What we learned today

  • Why cloud computing is important and how it can help your business
  • The Big Three cloud computing platforms: AWS, Azure, and GCP
  • The main types of cloud computing services: IaaS, PaaS, SaaS, FaaS
  • The most popular AWS services: EC2, S3, RDS, SageMaker
  • How to create an AWS account and set up a billing alarm
  • How to create an S3 bucket, upload an audio file, and transcribe it with Amazon Transcribe
  • And that’s just the beginning!
  • In the next sessions, we will learn more about AWS services, and how to use them to build and deploy applications
  • Remember to stop or terminate your EC2 instances and delete the S3 buckets you no longer need when you’re done!

And that’s a wrap! 🎬

Thank you very much!
See you next time! 😊🙏🏽