import glob, os
from dask.distributed import Client
from dask import delayed
import skimage.io
import skimage.filters
import numpy as np
import matplotlib.pyplot as plt
A very common problem in image processing is having a set of images in a folder and needing to apply a time-consuming operation to each of them.
Let's first get the names of all images:
filenames = glob.glob('../Data/BBBC032_v1_dataset/*.tif')
filenames
Dask workers do not necessarily share the notebook's working directory, so relative paths may fail to resolve; we therefore transform them into absolute paths:
filenames = [os.path.abspath(f) for f in filenames]
We can import a single image using the io module of scikit-image:
image = skimage.io.imread(filenames[0])
image.shape
It is a fairly large image representing volume data. Typical image filtering functions can be slow on arrays of this size, especially with large kernels. We are going to apply a Gaussian filter to only part of the volume and then measure the mean value of the result:
%%time
image = skimage.io.imread(filenames[0])
filtered = skimage.filters.gaussian(image[0:40, :, :], 0.1)
mean_val = np.mean(filtered)
If we execute these operations sequentially on all images, we are obviously going to spend about 1 min in total. Let's try to make it faster using Dask:
client = Client()
client
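`Client()` with no arguments starts a local cluster sized to the machine's cores. If you want explicit control over the worker layout, the constructor accepts `n_workers` and `threads_per_worker` (the values below are purely illustrative):

```python
from dask.distributed import Client

# Illustrative only: four single-threaded workers instead of the defaults.
# One thread per worker can help when tasks hold the GIL (pure-Python work).
client = Client(n_workers=4, threads_per_worker=1)
```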
%%time
all_vals = []
for f in filenames:
    im = skimage.io.imread(f)
    im = skimage.filters.gaussian(im[0:40, :, :], 0.1)
    mean_val = np.mean(im)
    all_vals.append(mean_val)
np.max(all_vals)
Now we wrap each function call with delayed: instead of executing immediately, every call just records a task in a graph:
all_vals = []
for f in filenames:
    im = delayed(skimage.io.imread)(f)
    im = delayed(skimage.filters.gaussian)(im[0:40, :, :], 0.1)
    mean_val = delayed(np.mean)(im)
    all_vals.append(mean_val)
max_mean = delayed(np.max)(all_vals)
So far nothing has been computed; we can visualize the task graph that has been assembled:
max_mean.visualize()
%%time
max_mean.compute()
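The speed-up mechanism is lazy evaluation: `delayed` runs nothing, it only records each call as a node in a task graph, and `.compute()` then executes the whole graph at once on the cluster. As a minimal pure-Python sketch of that principle (a toy illustration, not Dask's actual implementation):

```python
# Toy sketch of the idea behind dask.delayed: wrapping a function records
# the call instead of running it; compute() walks the recorded graph.

class Lazy:
    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve all (possibly lazy) arguments, then call the function.
        return self.func(*[_resolve(a) for a in self.args])

def _resolve(a):
    # Evaluate lazy arguments, also inside lists (Dask similarly
    # traverses collections), before the wrapped function runs.
    if isinstance(a, Lazy):
        return a.compute()
    if isinstance(a, list):
        return [_resolve(x) for x in a]
    return a

def lazy(func):
    def wrapper(*args):
        return Lazy(func, *args)
    return wrapper

# Building the graph runs nothing...
total = lazy(sum)([lazy(max)([1, 5]), lazy(min)([2, 7])])
# ...the work happens only on compute(): max([1, 5]) + min([2, 7]) = 7
print(total.compute())  # 7
```

Dask additionally schedules independent graph nodes in parallel across workers, which is where the actual speed-up over the sequential loop comes from.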