Intro to Keras, TensorFlow and advanced NN, part 2

Summary of first part

Terminology

  • A dataset in supervised learning is made of a number of (features, label) pairs
  • Example: a dataset of diabetic patients is made of:
    • Features: information describing each patient (weight, height, blood pressure...)
    • Labels: whether each patient is diabetic or not (glucose levels higher or lower than...)
  • Each (features, label) pair is also called a sample or example. Basically a data point
  • Features are also sometimes called inputs when referring to something you feed to a NN
  • Labels are compared to the NN's outputs to see how well the network is doing compared to the truth

https://keras.io

  • Keras is a high-level neural networks API (front-end), written in Python
  • Capable of running on top of TensorFlow, CNTK, or Theano (backends)
  • Built to simplify access to more complex backend libraries

https://tensorflow.org

Use TensorFlow if you want a finer level of control:

  • Build your own NN layers
  • Custom cost functions
  • More complex architectures than those available on Keras

We will be mostly writing Python code using Keras libraries, but "under the hood" Keras is using TensorFlow libraries.

The documentation is at keras.io.

Here's what a NN layer looks like in TensorFlow:

  • 7 samples in batch
  • 784 inputs
  • 500 outputs
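
A minimal sketch of such a layer in raw TensorFlow, using the shapes above (the variable names and the ReLU activation are our own choices, not from the slide):

import tensorflow as tf

x = tf.random.normal((7, 784))                  # a batch of 7 samples, 784 features each
W = tf.Variable(tf.random.normal((784, 500)))   # weight matrix: 784 inputs -> 500 outputs
b = tf.Variable(tf.zeros(500))                  # one bias per output
y = tf.nn.relu(tf.matmul(x, W) + b)             # layer output, shape (7, 500)
print(y.shape)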

A neural network in Keras is called a Model

The simplest kind of model is of the Sequential kind:

In [6]:
from tensorflow.keras.models import Sequential

model = Sequential()

This is an "empty" model, with no layers, no inputs or outputs are defined either.

Adding layers is easy:

In [7]:
from tensorflow.keras.layers import Dense

model.add(Dense(units=3, activation='relu', input_dim=3))
model.add(Dense(units=2, activation='softmax'))

A "Dense" layer is a fully connected layer as the ones we have seen in Multi-layer Perceptrons. The above is equal to having this network:

If we want to see the layers in the Model this far, we can just call:

In [ ]:
model.summary()

Using "model.add()" keeps stacking layers on top of what we have:

In [ ]:
model.add(Dense(units=2, activation=None))
model.summary()

Part 2, more Keras layers (https://keras.io/api/layers/)

Common layers (we will cover most of these!)

  • Trainable

    • Dense (fully connected/MLP)
    • Conv1D (2D/3D)
    • Recurrent: LSTM/GRU/Bidirectional
    • Embedding
    • Lambda (apply your own function)
  • Non-trainable

    • Dropout
    • Flatten
    • BatchNormalization
    • MaxPooling1D (2D/3D)
    • Merge (add/subtract/concatenate)
    • Activation (Softmax/ReLU/Sigmoid/...)

Dropout is a regularization layer

  • It's applied to a previous layer's output
  • Takes those outputs and randomly sets them to 0 with probability p
  • The remaining outputs are scaled up by 1/(1 - p) so that the expected sum of the inputs remains unchanged
  • if p = 0.5: model.add(Dropout(0.5))
In [1]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dropout
from tensorflow.keras import backend as K

tf.random.set_seed(1)
drop = Dropout(0.5, input_shape=(4,))            # drop each value with probability p = 0.5
data = tf.reshape(tf.range(1.0, 13.0), (3, 4))   # 3 samples with 4 features each

print("Before:", data, sep="\n")
output = drop(data, training=True)               # dropout is only active when training=True
print("After:", K.eval(output), sep="\n")
Before:
tf.Tensor(
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]], shape=(3, 4), dtype=float32)
After:
[[ 0.  4.  6.  0.]
 [ 0. 12. 14.  0.]
 [18. 20. 22. 24.]]

Dropout is a regularization layer

  • Applying the same input twice will give different results
  • Means that it is harder for the network to memorize patterns
  • Helps curb overfitting
  • Especially used with Dense() layers, which are prone to overfitting
  • Active only at training time
In [2]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Dropout
from tensorflow.keras import backend as K

# tf.random.set_seed() is intentionally not called here: the dropout mask changes at every call
drop = Dropout(0.5, input_shape=(4,))
data = tf.reshape(tf.range(1.0, 13.0), (3, 4))

print("Before:", data, sep="\n")
output = drop(data, training=True)
print("After:", K.eval(output), sep="\n")
Before:
tf.Tensor(
[[ 1.  2.  3.  4.]
 [ 5.  6.  7.  8.]
 [ 9. 10. 11. 12.]], shape=(3, 4), dtype=float32)
After:
[[ 2.  0.  0.  8.]
 [10.  0.  0. 16.]
 [ 0. 20.  0. 24.]]

Lambda layers

  • Work like regular lambda functions
  • Inputs and outputs are tensors; the function inside must use Keras/TensorFlow operations
  • The function has to be differentiable
In [3]:
from tensorflow.keras.layers import Lambda
from tensorflow.keras import backend as K

def sum_two_tensors(inputs):

    x, y = inputs
    sum_of_tensors = x + y

    return sum_of_tensors

input_tensor_1 = tf.range(0, 9)
input_tensor_2 = tf.range(1, 10)
print(input_tensor_1)
print(input_tensor_2)
#lambda_out = Lambda(sum_two_tensors)([input_tensor_1, input_tensor_2])
lambda_layer = Lambda(sum_two_tensors)
lambda_out = lambda_layer([input_tensor_1, input_tensor_2])
K.eval(lambda_out)

#model.add(Lambda(sum_two_tensors))
tf.Tensor([0 1 2 3 4 5 6 7 8], shape=(9,), dtype=int32)
tf.Tensor([1 2 3 4 5 6 7 8 9], shape=(9,), dtype=int32)
Out[3]:
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17], dtype=int32)

Keras activations (https://keras.io/api/layers/activations/)

Activation functions for regression or inner layers:

  • Sigmoid
  • Tanh
  • ReLU
  • LeakyReLU
  • Linear (None)

THE activation function for classification (output layer only):

  • Softmax (outputs probabilities for each class)

Softmax

It's an activation function applied to an output vector z with K elements (one per class), and it outputs a probability distribution over the classes:
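
Written out explicitly (same $z$ and $K$ as above), the softmax output for class $i$ is:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K$$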

What makes softmax your favorite activation:

  • The K outputs sum to 1
  • The K probabilities are proportional to the exponentials of the inputs
  • No negative outputs
  • Monotonically increasing output with increasing input

Softmax is usually only used to activate the last layer of a NN

ReLU vs. old-school logistic functions

  • Historically, sigmoid and tanh were the most used activation functions
  • Easy derivative
  • Bounded outputs (e.g. from 0 to 1)
  • They look like this:

ReLU vs. old-school logistic functions

  • Problems arise when we are at large $|x|$
  • The derivative in that area becomes small (saturation)
  • Remember what the chain rule said?

ReLU vs. old-school logistic functions

  • When we have $n$ layers, we go through $n$ activation functions
  • At layer $n$ the derivative is proportional to: $$\begin{eqnarray} \frac{\partial L(w,b|x)}{\partial w_{l_n}} & \propto & \frac{\partial a_{l_n}}{\partial z_{l_n}} \end{eqnarray}$$
  • At layer 1 the derivative is proportional to: $$\begin{eqnarray} \frac{\partial L(w,b|x)}{\partial w_{l_1}} & \propto & \frac{\partial a_{l_n}}{\partial z_{l_n}} \times \frac{\partial a_{l_{n-1}}}{\partial z_{l_{n-1}}} \times \frac{\partial a_{l_{n-2}}}{\partial z_{l_{n-2}}} \times \ldots \times \frac{\partial a_{l_1}}{\partial z_{l_1}} \end{eqnarray}$$
  • It is the product of many numbers $< 1$
  • Gradient becomes smaller and smaller for the initial layers
  • Gradient vanishing problem
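
As a rough order-of-magnitude check (our own numbers, not from the slide): the sigmoid's derivative is at most $0.25$, so across 10 sigmoid layers the product above is at most $0.25^{10} \approx 10^{-6}$, and the first layers receive almost no gradient signal.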

ReLU is the first activation to address the issue

Used in "internal" layers, usually not at last layer

Pros:

  • Easy derivative (1 for x > 0, 0 elsewhere)
  • Derivative doesn't saturate for x > 0: alleviates gradient vanishing
  • Non-linear

Cons:

  • Not differentiable at 0
  • Dead neurons if x << 0 for all data instances
  • Potential gradient explosion
  • Let's try this on TensorFlow Playground: http://playground.tensorflow.org

Other ReLU-like activations

LeakyReLU/PReLU

  • $y = \alpha x$ for $x < 0$
  • In PReLU $\alpha$ is learned

Other ReLU-like activations

ELU

  • Differentiable at 0
  • Non-zero for $x < 0$
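
A short sketch (assuming the standard tf.keras layer names LeakyReLU, PReLU and ELU; the unit counts are our own) of how these variants can be used as separate activation layers:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, PReLU, ELU

model = Sequential()
model.add(Dense(units=16, input_dim=4))
model.add(LeakyReLU(alpha=0.1))   # fixed slope alpha for x < 0
model.add(Dense(units=16))
model.add(PReLU())                # the slope alpha is a trainable parameter
model.add(Dense(units=16))
model.add(ELU(alpha=1.0))         # smooth and differentiable at 0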
In [4]:
from IPython.display import IFrame 
IFrame('https://polarisation.github.io/tfjs-activation-functions/', width=860, height=470) 
Out[4]:

Setting activations in Keras

We can add activations as string parameters, or as functions:

In [8]:
model = Sequential() 
model.add(Dense(units=2, activation='sigmoid'))
model.add(Dense(units=2, activation='relu'))
model.add(Dense(units=2, activation=tf.keras.activations.relu))
model.add(Dense(units=2, activation='softmax'))

But we can also add them as separate layers:

In [9]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense

model = Sequential() 
model.add(Dense(units=2))
model.add(Activation('sigmoid'))
model.add(Dense(units=2))
model.add(Activation('relu'))
model.add(Dense(units=2))
model.add(Activation(tf.keras.activations.relu))
model.add(Dense(units=2))
model.add(Activation('softmax'))

Passing classes as parameters

  • Some parameters can be set by passing a string (optimizer='rmsprop')
  • We need to explicitly import the object if we want better control (optimizer=RMSprop())
In [10]:
from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(),                    #adaptive learning rate method
              loss='sparse_categorical_crossentropy', #loss function for classification problems with integer labels
              metrics=['accuracy'])                   #the metric doesn't influence the training

model.optimizer.get_config()
Out[10]:
{'name': 'RMSprop',
 'learning_rate': 0.001,
 'decay': 0.0,
 'rho': 0.9,
 'momentum': 0.0,
 'epsilon': 1e-07,
 'centered': False}

Passing classes as parameters

  • Some parameters can be set by passing a string (optimizer='rmsprop')
  • We need to explicitly import the object if we want better control (optimizer=RMSprop())
In [11]:
from tensorflow.keras.optimizers import RMSprop
model.compile(optimizer=RMSprop(learning_rate=1.0),   #adaptive learning rate method
              loss='sparse_categorical_crossentropy', #loss function for classification problems with integer labels
              metrics=['accuracy'])                   #the metric doesn't influence the training

model.optimizer.get_config()
Out[11]:
{'name': 'RMSprop',
 'learning_rate': 1.0,
 'decay': 0.0,
 'rho': 0.9,
 'momentum': 0.0,
 'epsilon': 1e-07,
 'centered': False}

There are multiple ways to pass data to fit()

  • You can load all of the data in memory and assign it to:
    • A numpy array or a list of arrays (if you have multiple inputs/outputs)
    • TensorFlow tensors
    • A dictionary mapping input names to arrays/tensors
data = np.genfromtxt('path/to/dataset.csv', delimiter=',')

X_train = data[:, 0:10]   # first 10 columns: features
y_train = data[:, 10]     # last column: labels

model.fit(X_train, y_train, ...)

There are multiple ways to pass data to fit()

  • Or you can pass it an object/function that generates data for you:
    • A generator() function
    • A keras.utils.Sequence object
    • A tensorflow.data.Dataset object

Here is a quick example of a generator that loads data from a list of files (images, pickle objects, csv files...) on the filesystem:

def generator(input_list):
    input_list_file = open(input_list, 'r')
    while True:
        for next_file in input_list_file:

            # load one file at a time instead of the whole dataset
            data = np.genfromtxt(next_file.strip(), delimiter=',')
            X = data[:, 0:10]
            y = data[:, 10]

            yield X, y
        input_list_file.seek(0)  # start again from the first file at the next epoch

model.fit(generator(train_data_list),...)
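
A tensorflow.data.Dataset can play the same role as the generator above; this minimal sketch assumes X_train and y_train are already in memory (the batch and buffer sizes are our own choices):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(32)   # shuffle and batch on the fly

model.fit(dataset, epochs=10)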

Even more Keras layers

  • Dense is the classic FFNN layer where all nodes in consecutive layers are connected
  • Most of the other layers seen today are not trainable
  • What other layers are trainable then?

Convolutional layers

  • Used where the spatial relationship between inputs is significant
  • Classic example: imaging
  • Different types: 1D, 2D, 3D
from tensorflow.keras.layers import Conv2D

model.add(Conv2D(filters, kernel_size, strides=(1, 1), padding="valid"))
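
A hypothetical concrete example (the filter count, kernel size and input shape are our own choices): 32 filters of size 3x3 applied to 28x28 grayscale images.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',
                 input_shape=(28, 28, 1)))     # 28x28 images, 1 channel
model.add(MaxPooling2D(pool_size=(2, 2)))      # downsample the feature maps
model.add(Flatten())                           # flatten before the Dense classifier
model.add(Dense(units=10, activation='softmax'))
model.summary()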

Convolutional layers


Recurrent layers

  • Used when the temporal relationship between inputs is significant
  • Examples: audio, text
  • Different types: LSTM, GRU...
from tensorflow.keras.layers import LSTM
model.add(LSTM(units, activation="tanh", recurrent_activation="sigmoid"))
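
A small example along the same lines (the sequence length, feature count and unit count are our own choices): an LSTM reading sequences of 100 timesteps with 8 features each, followed by a 2-class softmax.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=32, input_shape=(100, 8)))   # outputs the last hidden state
model.add(Dense(units=2, activation='softmax'))
model.summary()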

Recurrent layers

Embedding layers

  • Used to transform a discrete input into a vector
  • Example: text input is made of words; how do we translate those into NN inputs?
  • "cat" -> [0.1, 0.003, 1.2 ..., 0]
from tensorflow.keras.layers import Embedding
model.add(Embedding(input_dim, output_dim))
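
A hedged concrete example (the vocabulary size and vector dimension are our own choices): map integer word indices from a 10000-word vocabulary to 64-dimensional vectors.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=64))
# input:  (batch_size, sequence_length) integer word indices
# output: (batch_size, sequence_length, 64) dense vectors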

Embedding layers

  • Example: map amino acid names to 2D space
  • Which amino acids are most similar to tryptophan (W)?

The functional API in Keras

  • https://keras.io/guides/functional_api
  • Sequential() is quite simple, but limited
  • What if we want to have multiple input/output layers?
  • What if we want a model that is not just a linear sequence of layers?
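
A minimal functional-API sketch (our own toy architecture, not from the slides) with two inputs merged by concatenation, something Sequential() cannot express:

from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

input_a = Input(shape=(10,))                      # first input: 10 features
input_b = Input(shape=(5,))                       # second input: 5 features
x = Dense(8, activation='relu')(input_a)
y = Dense(8, activation='relu')(input_b)
merged = concatenate([x, y])                      # merge the two branches
output = Dense(2, activation='softmax')(merged)

model = Model(inputs=[input_a, input_b], outputs=output)
model.summary()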

Exercise 2/3 (reprise)

  • Remember the XOR classifier? Or the Boston housing dataset?
  • Can you apply some of the things we have learned today on the models from yesterday?
  • Do they help?

Exercise 4 (optional)

Classifying IMDB reviews into positive or negative.

Check the exercises notebook!