Whenever we want to try out an advanced technique that is not yet available as a nicely packaged tool like scikit-image, the best solution is to first search for open-source code that approximates what we want to do. One of the main places to find such code is GitHub. As an example, we will do semantic segmentation here, i.e. assigning a class label (cow, cat, background, etc.) to every pixel of an image.
import sys
import numpy as np
import skimage
import skimage.io
import skimage.transform
from matplotlib import pyplot as plt
Let's have a look at the repository we are going to use: https://github.com/bonlime/keras-deeplab-v3-plus, a Keras implementation of the DeepLabv3+ segmentation model.
We follow the instructions given in the repository's README. We first check which version of TensorFlow we have:
import tensorflow
tensorflow.__version__
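If we want to make that check explicit in code rather than just reading the version string, we can parse out the major version (a small sketch; which set of README instructions applies to which TensorFlow version is defined by the repository, not here):

tf_major = int(tensorflow.__version__.split('.')[0])  # e.g. '2.4.1' -> 2
print(f"TensorFlow major version: {tf_major}")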
So we have to follow the second set of instructions. These are Unix-style commands that we would normally type in a terminal. As Jupyter supports bash commands (here via the %%bash cell magic), we can also run them right here:
%%bash
git clone https://github.com/bonlime/keras-deeplab-v3-plus/
cd keras-deeplab-v3-plus/
git checkout 714a6b7d1a069a07547c5c08282f1a706db92e20
Since we only want to try out the package, we simply add its location to the Python path. If we try out multiple packages, this avoids overcrowding the conda environment with code we may never use again. If we want to use the package "in production", we can always install it properly later.
sys.path.append('keras-deeplab-v3-plus')
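Before importing anything, a quick sanity check (a sketch, not part of the original instructions) can confirm that the clone above actually produced the folder and the model.py module we are about to import:

# Sanity check: make sure the cloned folder and its model.py module exist
import os
assert os.path.exists('keras-deeplab-v3-plus/model.py'), \
    "Repository not found: run the git clone cell above first"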
Now we can finally import the package:
from model import Deeplabv3
We simply follow the instructions given in the repository to run the code. We only modify the image loading, as we use a different package (skimage). As always, there are some parameters set for pre-processing:
trained_image_width=512
mean_subtraction_value=127.5
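The mean-subtraction value of 127.5 maps 8-bit pixel values from the range [0, 255] to [-1, 1], the input range the pretrained network expects. A quick check (illustrative only):

# pixel 0 -> 0/127.5 - 1 = -1, pixel 255 -> 255/127.5 - 1 = +1
pixels = np.array([0., 127.5, 255.])
print((pixels / mean_subtraction_value) - 1)  # [-1.  0.  1.]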
Then we can pick the image of our choice:
image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/Cow_female_black_white.jpg/1920px-Cow_female_black_white.jpg')
#image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/3/33/Chat-affut.JPG')
#image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/1/18/TrailKitty.jpg')
image = image.astype('float')
And run the rest of the proposed code:
# resize to max dimension of images from training dataset
w, h, _ = image.shape
ratio = float(trained_image_width) / np.max([w, h])
resized_image = skimage.transform.resize(image,(int(ratio * w),int(ratio * h)))
#resized_image = np.array(Image.fromarray(image.astype('uint8')).resize((int(ratio * h), int(ratio * w))))
# apply normalization for trained dataset images
resized_image = (resized_image / mean_subtraction_value) - 1.
# pad array to square image to match training images
pad_x = int(trained_image_width - resized_image.shape[0])
pad_y = int(trained_image_width - resized_image.shape[1])
resized_image = np.pad(resized_image, ((0, pad_x), (0, pad_y), (0, 0)), mode='constant')
# make prediction
deeplab_model = Deeplabv3()
res = deeplab_model.predict(np.expand_dims(resized_image,0))
labels = np.argmax(res.squeeze(), -1)
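Before post-processing, it is worth inspecting what the network actually returned. With the default PASCAL VOC weights, `res` holds one score per pixel and per class, and `np.argmax` over the last axis turns these scores into a single label map. The shapes below are what we expect under those assumptions:

# Sketch: inspect the raw output (shapes assume the default PASCAL VOC
# weights, i.e. 21 classes, and the 512x512 padded input)
print(res.shape)     # expected: (1, 512, 512, 21) - batch, height, width, class scores
print(labels.shape)  # expected: (512, 512) - most likely class index per pixel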
Since we padded and resized the image in the pre-processing step, we now have to crop and resize the output labels back to the original image size:
if pad_x > 0:
    labels = labels[:-pad_x, :]
if pad_y > 0:
    labels = labels[:, :-pad_y]
labels = skimage.transform.resize(labels,(w, h),preserve_range=True, order=0)
plt.imshow(labels)
plt.show()
plt.imshow(image[:,:,0])
plt.show()
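To judge how well the segmentation matches the original image, one possible extra step (not part of the repository's example) is to overlay the label map, made semi-transparent, on the photograph:

# Overlay sketch: original image with the semi-transparent label map on top
plt.imshow(image.astype('uint8'))
plt.imshow(labels, alpha=0.5)
plt.show()

The model shipped with the repository is trained on the PASCAL VOC dataset, so we can translate the label indices found in the image back into class names: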
class_names = np.array(['background','aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor'])
class_names[np.unique(labels).astype(int)]
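As a small extension (not from the repository), we can also count how many pixels were assigned to each detected class, which gives a rough idea of how much of the image each object covers:

# Count the number of pixels per detected class
values, counts = np.unique(labels.astype(int), return_counts=True)
for v, c in zip(values, counts):
    print(f"{class_names[v]}: {c} pixels")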