Lecture 20 - Parallel Computing
- `INSERT ... ON CONFLICT` to handle duplicates
- The `map` function and why it matters
- `joblib` for single-node parallelism
- `Parallel` and `delayed`
- Timing code with `%timeit`
- Dask for scalable computing
- `dask.delayed` for custom pipelines

(Aside: it was not excluded by `.npmignore`, so it shipped to every user who ran `npm install`. Set up your `.gitignore` properly! 😅)

There is a `map` function in the Python standard library. The built-in `map` function is much faster than mine (it's implemented in C), so of course you should use that one! 😂

`joblib` for parallel computing:

- Each `map` call is independent, so it is perfect for parallelism
- `joblib` makes this easy. Two things to know:
  - `Parallel(n_jobs=k)` runs `k` tasks at the same time
  - `delayed(f)` wraps `f` so joblib can schedule it
- Install with `pip install joblib`
- `n_jobs=-1` uses all available CPU cores automatically

Using the `bar` function and `foo` array from before, `joblib` creates 6 instances of `bar` and applies each one to a different element of `foo`.

`calculation` runs 10 heavy operations on 10 million numbers. Time it with `%timeit`.
The `calculation` function called on `n` inputs is O(n).
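A minimal sketch of the pattern above; the slide's `bar` function and `foo` array are not shown here, so their definitions (squaring the numbers 0 through 5) are assumptions:

```python
from joblib import Parallel, delayed

def bar(x):
    # Hypothetical stand-in for the slide's bar function
    return x ** 2

foo = [0, 1, 2, 3, 4, 5]

# Serial baseline: the built-in map applies bar to each element in turn
serial = list(map(bar, foo))

# Parallel version: joblib schedules one bar(x) task per element of foo,
# spread across all available CPU cores (n_jobs=-1)
parallel = Parallel(n_jobs=-1)(delayed(bar)(x) for x in foo)

print(serial)    # [0, 1, 4, 9, 16, 25]
print(parallel)  # identical result
```

For a function this cheap, the parallel version is typically slower, because spawning worker processes costs more than the work itself.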
import matplotlib.pyplot as plt
import numpy as np
# Simulate processing times
num_images = np.array([1, 10, 50, 100, 200])
sequential_time = num_images * 2 # 2 seconds per image
parallel_time = (num_images * 2) / 4 # 4 cores, ideal speedup
plt.figure(figsize=(8, 5))
plt.plot(num_images, sequential_time, 'o-', label='Serial O(n)', linewidth=2, markersize=8)
plt.plot(num_images, parallel_time, 's-', label='Parallel O(n/4)', linewidth=2, markersize=8)
plt.xlabel('Number of Images', fontsize=12)
plt.ylabel('Time (seconds)', fontsize=12)
plt.title('O(n) Scaling: Serial vs Parallel', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

With p cores, an O(n) problem becomes O(n/p + overhead).
Exercise: run `n` independent calculations on `n` inputs. Install `joblib` and NumPy if you haven't done so yet:
!pip install joblib numpy

Dask (website: https://www.dask.org/)
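Dask arrays mirror the NumPy API while staying lazy. A small sketch (the 1000×1000 shape is an assumption; the 100×100 chunking matches the description below):

```python
import numpy as np
import dask.array as da

# Wrap a NumPy array in a lazy Dask array split into 100x100 chunks
x = np.random.random((1000, 1000))
a = da.from_array(x, chunks=(100, 100))

# These calls only build a task graph; no work happens yet
total = a.sum()
first = a[:10, 5]   # first 10 rows of the 6th column

# .compute() runs the graph and returns plain NumPy results
print(total.compute())
print(first.compute())
```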
- `a` is a lazy wrapper around the original NumPy array, split into 100×100 chunks
- `a.sum()`, `a.mean()`, slicing, etc. all work
- Slice `a` to get the first 10 rows of the 6th column, then call the `.compute()` method to compute the result
- Element-wise operations (`+`, `*`, `exp`, `log`, etc.)
- Reductions (`sum()`, `mean()`, `std()`)
- Tensor contractions (`tensordot`)
- ...

| timestamp | name | id | x | y |
|---|---|---|---|---|
| 2000-01-01 00:00:00 | Oliver | 968 | -0.29 | -0.19 |
| 2000-01-01 00:00:01 | Ray | 1017 | 0.68 | 0.75 |
| 2000-01-01 00:00:02 | George | 968 | 0.39 | -0.86 |
| 2000-01-01 00:00:03 | Zelda | 1020 | -0.74 | -0.97 |
| 2000-01-01 00:00:04 | Laura | 995 | -0.99 | -0.02 |
Select the rows where y > 0, then compute the standard deviation of x per group:

Dask Series Structure:
npartitions=1
float64
...
Dask Name: getitem, 8 expressions
Expr=(((Filter(frame=ArrowStringConversion(frame=Timeseries(827f5a6)), predicate=ArrowStringConversion(frame=Timeseries(827f5a6))['y'] > 0))[['name', 'x']]).std(ddof=1, numeric_only=False, split_out=None, observed=False))['x']
- `df3` is still not shown; call the `.compute()` method to display the result
- Exercise: compute the mean of `x` and the maximum of `y`, grouped by `name`
- Do the heavy filtering and aggregation in Dask, then call `.compute()` to get a regular Pandas DataFrame for the final steps

The same workflow in code, ending with `.compute()` to convert to Pandas:

import dask
import dask.dataframe as dd
df = dask.datasets.timeseries()
# Step 1-2: filter and aggregate with Dask
summary = (
df[df.x > 0]
.groupby("name")
.agg({"x": "mean", "y": "std"})
)
# Step 3: bring to Pandas
pdf = summary.compute()
# Step 4: use Pandas normally
pdf.sort_values("x", ascending=False).head(5)

| name | x | y |
|---|---|---|
| George | 0.5 | 0.58 |
| Charlie | 0.5 | 0.58 |
| Zelda | 0.5 | 0.58 |
| Yvonne | 0.5 | 0.58 |
| Frank | 0.5 | 0.58 |
Install dask if you haven't done so yet:

!pip install dask

Find the right chunk size!
Create a Dask array with 10 million random numbers (or less if you have memory constraints)
Vary the chunk size and time the following operation:
Calculate mean(sqrt(x^2)) on the Dask array
See the code below for an example with three different chunk sizes. Which one worked best for you? Why do you think that is?
import numpy as np
import dask.array as da
size = 10_000_000
# Dask with SMALL chunks
da_data_small = da.random.random(size, chunks=100_000) # 100 chunks
%timeit da.sqrt(da_data_small**2).mean().compute()
# Dask with MEDIUM chunks
da_data_medium = da.random.random(size, chunks=2_000_000) # 5 chunks
%timeit da.sqrt(da_data_medium**2).mean().compute()
# Dask with LARGE chunks
da_data_large = da.random.random(size, chunks=5_000_000) # Only 2 chunks
%timeit da.sqrt(da_data_large**2).mean().compute()

DuckDB:

- Install with `pip install duckdb`
- A Pandas DataFrame `pdf` becomes the SQL table `pdf`
- You can query `.parquet` and `.csv` files directly in SQL

┌───────────────────────┐
│ mean_x │
│ double │
├───────────────────────┤
│ -0.016252195829227004 │
└───────────────────────┘
┌─────────┬─────────────────────┬───────┐
│ name │ mean_x │ n │
│ varchar │ double │ int64 │
├─────────┼─────────────────────┼───────┤
│ Yvonne │ 0.0973837018818252 │ 39 │
│ Hannah │ 0.09098737819584524 │ 36 │
│ Ray │ 0.08676713725913592 │ 37 │
│ Tim │ 0.08060252152376055 │ 40 │
│ Norbert │ 0.05389034574545834 │ 38 │
└─────────┴─────────────────────┴───────┘
- `.csv` is very common in data science (and for good reasons)
- Pandas handles `.csv` well, but it loads the entire file into memory
- There are `.csv` files in the `data` directory, one for each day in January 2000
- Read them all at once with `dd.read_csv`
- While `.csv` files are nice, newer formats like Parquet are gaining popularity

| Feature | CSV | Parquet |
|---|---|---|
| Storage | Row-based | Column-based |
| Compression | None | Snappy/gzip |
| Column selection | Reads all | Reads only needed |
| Data types | Text only | Typed (int, float, date) |
| File size (1M rows) | ~100 MB | ~25 MB |
The `dask.delayed` function makes any Python function lazy: apply the `@dask.delayed` decorator to the function:

@dask.delayed
def delayed_calculation(size=10000000):
arr = np.random.rand(size)
for _ in range(10):
arr = np.sqrt(arr) + np.sin(arr)
return np.mean(arr)
results = []
for _ in range(5):
results.append(delayed_calculation())
# Compute all results at once
%timeit final_results = dask.compute(*results)

771 ms ± 65.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Nothing runs until you call the `.compute()` method at the end:

@dask.delayed
def generate_data(size):
return np.random.rand(size)
@dask.delayed
def transform_data(data):
return np.sqrt(data) + np.sin(data)
@dask.delayed
def aggregate_data(data):
return {
'mean': np.mean(data),
'std': np.std(data),
'max': np.max(data)
}
# Compare execution
sizes = [1000000, 2000000, 3000000]
# Dask execution
dask_results = []
for size in sizes:
data = generate_data(size)
transformed = transform_data(data)
stats = aggregate_data(transformed)
dask_results.append(stats)
%timeit dask.compute(*dask_results)

33.5 ms ± 888 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Best practices:

- Use `.compute()` and `.persist()` sparingly: each call triggers execution, so batch your work
- When in doubt, let Dask choose chunk sizes with `chunks='auto'`
- Prefer `.parquet` over `.csv` for large datasets

Exercise timings, serial vs. parallel:

47.5 ms ± 668 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.84 s ± 9.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

`square(x)` is so fast that joblib's process-spawning overhead dominates. Parallel computing pays off when each task is heavy, not trivial.

Timings for `mean(sqrt(x^2))` with different chunk sizes:

40.9 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
17.5 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
27.7 ms ± 787 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
49.4 ms ± 736 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)