Commit 73476ae3 authored by Yucesan

touch up on notebooks

parent fbc98d6e
%% Cell type:markdown id:necessary-transparency tags:
# Building a Neural Network from Scratch with NumPy
%% Cell type:markdown id:commercial-relief tags:
Necessary imports
%% Cell type:code id:capital-scroll tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import random
# np.random.seed(11)
# random.seed(11)
```
%% Cell type:markdown id:lightweight-rough tags:
Create a toy problem with two inputs and a single output
%% Cell type:code id:moved-practitioner tags:
``` python
N = 500 #Number of points per input
x1 = np.linspace(0.0,1.0,N)
x2 = np.linspace(0.0,1.0,N)
def y_fn(x1,x2):
    return (x1 - 0.5) + (x2 - 0.5)**2 - np.sin(x1) + np.cos(x1*x2) #Test your own functions!
```
%% Cell type:markdown id:frank-cartridge tags:
Visualize the true input-output relationship
%% Cell type:code id:round-neutral tags:
``` python
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d') #Need to import Axes 3D to be able to plot surfaces
x1grid, x2grid = np.meshgrid(x1, x2) #Convert our inputs into a grid
x1grid_flat = np.ravel(x1grid) #Flattened version of x1 input of grid
x2grid_flat = np.ravel(x2grid) #Flattened version of x2 input of grid
xgrid = np.vstack((x1grid_flat,x2grid_flat)) #Stack the inputs to have (N,2) shaped input array
y_flat = np.array(y_fn(x1grid_flat, x2grid_flat)) #Pass the inputs to our function and get the output
ygrid = y_flat.reshape(x1grid.shape) #Reshape the output into grid
ax.plot_surface(x1grid, x2grid, ygrid)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
plt.show()
```
%% Cell type:markdown id:ordinary-disorder tags:
Determine the neural network architecture (# layers, # nodes, activation functions):
- Layers: depth of the network
  - Deep network: better abstraction of latent features (prone to overfitting)
  - Shallow network: better generalization
- Nodes (neurons): width of each layer
  - Wide network: more parameters per layer (same pros/cons as a deep network)
  - Narrow network: faster training / inference (prediction)
- Activation functions:
  - Introduce nonlinearity to the network
  - Must have available gradients (for backpropagation)
  - Pick one suitable for the data and problem (normalization / constraints)

**The output shape of each layer must match the input shape of the next layer in a fully connected network.**
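As a quick sanity check, the architecture defined in the next cell has

$$(2 \cdot 8 + 8) + (8 \cdot 32 + 32) + (32 \cdot 4 + 4) + (4 \cdot 1 + 1) = 24 + 288 + 132 + 5 = 449$$

trainable parameters (weights plus biases), which should match the count printed after initialization below.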
%% Cell type:code id:palestinian-wound tags:
``` python
nn_architecture = [
    {"input_dim": 2, "output_dim": 8, "activation": "tanh"},
    {"input_dim": 8, "output_dim": 32, "activation": "tanh"},
    {"input_dim": 32, "output_dim": 4, "activation": "tanh"},
    {"input_dim": 4, "output_dim": 1, "activation": "tanh"}
]
print("Number of layers: {}".format(len(nn_architecture)))
```
%% Cell type:markdown id:employed-measure tags:
Randomly initialize weights and biases:
- Bias is optional
- It is recommended to initialize with small weights

**Smart initialization can help the network.**
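As one example, the sketch below scales each layer's weights by $\sqrt{1/n_{in}}$, where $n_{in}$ is the number of inputs to the layer (a Xavier/Glorot-style heuristic). This is only an illustration and is not used by the `init_layers` function defined next.
%% Cell type:code id:xavier-sketch tags:
``` python
def xavier_init(fan_in, fan_out):
    #Scale the weights by sqrt(1/fan_in) so layer outputs keep a similar variance across layers.
    scale = np.sqrt(1.0 / fan_in)
    W = np.random.randn(fan_out, fan_in) * scale
    b = np.zeros((fan_out, 1)) #Biases are commonly initialized to zero.
    return W, b
```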
%% Cell type:code id:hairy-slovenia tags:
``` python
def init_layers(nn_architecture):
    number_of_layers = len(nn_architecture)
    params_values = {}
    layer_idx = 0
    for layer in nn_architecture:
        layer_idx += 1
        layer_input_size = layer["input_dim"]
        layer_output_size = layer["output_dim"]
        params_values['W' + str(layer_idx)] = np.random.randn(
            layer_output_size, layer_input_size) * 0.5 #Parameters scaled down to prevent large initial weights, which can lead to divergence.
        params_values['b' + str(layer_idx)] = np.random.randn(
            layer_output_size, 1) * 0.5
    return params_values
nn_params = init_layers(nn_architecture)
print(nn_params)
total_param_no = 0
for val in nn_params.values():
    total_param_no += val.shape[0] * val.shape[1]
print("Total number of trainable parameters: {}".format(total_param_no))
```
%% Cell type:markdown id:tender-leisure tags:
Mainstream activation functions and their derivatives:
- Sigmoid: maps inputs (regardless of sign) to the range (0, 1). Can be used as a switch.
- Hyperbolic tangent (tanh): maps inputs to the range (-1, 1). Can be used where the output is allowed to be negative.
- Rectified Linear Unit (ReLU): linear for positive inputs, zero otherwise. Recommended for hidden layers to mitigate the vanishing gradient problem, since sigmoid and tanh tend to saturate.
- Linear: passes the input through unchanged. Can contribute to exploding gradients (meaning, if the optimum solution is somewhere on Earth, we end up leaving the planet).
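
For reference, the derivatives implemented in the next cell are:

$$\sigma'(z) = \sigma(z)\,\big(1 - \sigma(z)\big), \qquad \tanh'(z) = 1 - \tanh^2(z), \qquad \text{ReLU}'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \le 0 \end{cases}, \qquad \text{linear}'(z) = 1$$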
%% Cell type:code id:changed-gilbert tags:
``` python
def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def relu(Z):
    return np.maximum(0,Z)

def tanh(Z):
    return np.tanh(Z)

def linear(Z):
    return Z

def sigmoid_backward(dA, Z):
    sig = sigmoid(Z)
    return dA * sig * (1 - sig)

def relu_backward(dA, Z):
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def tanh_backward(dA, Z):
    tnh = np.tanh(Z)
    return dA * (1 - tnh**2)

def linear_backward(dA, Z):
    return dA
```
%% Cell type:markdown id:noted-details tags:
Let's visualize the activation functions and their derivatives
%% Cell type:code id:twelve-concert tags:
``` python
x_act = np.linspace(-5.0,5.0,1000)
fig, ax = plt.subplots(2,2, figsize=(12,8))
ax[0][0].plot(x_act,sigmoid(x_act), label='sigmoid')
ax[0][0].plot(x_act,sigmoid_backward(np.repeat([1],x_act.shape[0]),x_act), label='sigmoid\'')
ax[0][0].legend()
ax[1][0].plot(x_act,tanh(x_act), label='tanh')
ax[1][0].plot(x_act,tanh_backward(np.repeat([1],x_act.shape[0]),x_act), label='tanh\'')
ax[1][0].legend()
ax[0][1].plot(x_act,relu(x_act), label='relu')
ax[0][1].plot(x_act,relu_backward(np.repeat([1],x_act.shape[0]),x_act), label='relu\'')
ax[0][1].legend()
ax[1][1].plot(x_act,linear(x_act), label='linear')
ax[1][1].plot(x_act,linear_backward(np.repeat([1],x_act.shape[0]),x_act), label='linear\'')
ax[1][1].legend()
```
%% Cell type:markdown id:prerequisite-manitoba tags:
We can wrap the forward pass of a single layer in a simple function:
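
In matrix form, for layer $l$ with activation function $g$:

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g\big(Z^{[l]}\big)$$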
%% Cell type:code id:discrete-polyester tags:
``` python
def single_layer_forward_propagation(A_prev, W_curr, b_curr, activation="tanh"):
    Z_curr = np.dot(W_curr, A_prev) + b_curr #Where the forward pass happens. As simple as linear algebra gets.
    if activation == "relu":
        activation_func = relu
    elif activation == "sigmoid":
        activation_func = sigmoid
    elif activation == "tanh":
        activation_func = tanh
    elif activation == "linear":
        activation_func = linear
    else:
        raise Exception('Non-supported activation function')
    return activation_func(Z_curr), Z_curr #Return the layer output both after and before activation,
                                           #as the pre-activation value is needed during backpropagation
```
%% Cell type:markdown id:inclusive-marine tags:
We can visualize the first full forward pass of our neural network (which is also the initial guess of our model):
%% Cell type:code id:tracked-vietnamese tags:
``` python
memory_fp = {}
inp_fp = xgrid #Make sure our initial input is our grid.
for layer_idx, layer in enumerate(nn_architecture): #Iterate over layers from first to last.
    print('Layer {}'.format(layer_idx+1))
    memory_fp['A'+str(layer_idx)] = inp_fp
    A_layer_fp, Z_layer_fp = single_layer_forward_propagation(inp_fp, nn_params['W'+str(layer_idx+1)],
                                                              nn_params['b'+str(layer_idx+1)], layer['activation']) #Forward pass through the current layer.
    memory_fp['Z'+str(layer_idx+1)] = Z_layer_fp
    inp_fp = A_layer_fp
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x1grid, x2grid, ygrid)
ax.plot_surface(x1grid, x2grid, A_layer_fp.reshape(x1.shape[0],x2.shape[0]))
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
print("Blue is true output, Orange is our initial guess per our initial parameters.")
```
%% Cell type:markdown id:shaped-mechanism tags:
Another function to conduct forward propagation across all layers of the network (note that we cache the intermediate values to use later during backpropagation):
%% Cell type:code id:advisory-network tags:
``` python
def full_forward_propagation(X, params_values, nn_architecture):
    memory = {}
    A_curr = X
    layer_idx = 0
    for layer in nn_architecture:
        layer_idx += 1
        A_prev = A_curr
        activ_function_curr = layer["activation"]
        W_curr = params_values["W" + str(layer_idx)]
        b_curr = params_values["b" + str(layer_idx)]
        A_curr, Z_curr = single_layer_forward_propagation(A_prev, W_curr, b_curr, activ_function_curr)
        memory["A" + str(layer_idx-1)] = A_prev
        memory["Z" + str(layer_idx)] = Z_curr
    return A_curr, memory #Spit out the final output and the cache of all hidden outputs
```
%% Cell type:markdown id:numeric-orbit tags:
Mean squared error (MSE) is a popular loss function for NNs:
- Measures the error between the model prediction and the true output.
- This is what we want to minimize by adjusting the model parameters.
- Squaring the error keeps it positive and penalizes large deviations more heavily.
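
For $m$ samples:

$$\mathcal{L}(\hat{y}, y) = \frac{1}{m} \sum_{i=1}^{m} \big(\hat{y}_i - y_i\big)^2, \qquad \frac{\partial \mathcal{L}}{\partial \hat{y}_i} = \frac{2}{m}\big(\hat{y}_i - y_i\big)$$

Note that `get_loss_grad` below returns $2(\hat{y} - y)$; the $1/m$ factor is applied later, inside the layer-wise gradient computation.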
%% Cell type:code id:bacterial-complex tags:
``` python
def get_loss_value(Y_hat, Y):
    m = Y.shape[-1]
    return np.sum((Y_hat - Y)**2) / m

def get_loss_grad(Y_hat, Y):
    return 2 * (Y_hat - Y)
```
%% Cell type:markdown id:essential-elite tags:
Print out the loss value obtained with the initial guess, and take its square root to get a sense of the order of magnitude of the prediction error on the same scale as the actual output:
%% Cell type:code id:protecting-indiana tags:
``` python
print(np.sqrt(get_loss_value(A_layer_fp, y_flat)))
```
%% Cell type:markdown id:optimum-speech tags:
Now the backpropagation for a single layer:
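
For layer $l$, given the incoming gradient $dA^{[l]}$ and the cached values $Z^{[l]}$ and $A^{[l-1]}$, the chain rule gives:

$$dZ^{[l]} = dA^{[l]} \odot g'\big(Z^{[l]}\big), \qquad dW^{[l]} = \frac{1}{m}\, dZ^{[l]} \big(A^{[l-1]}\big)^T, \qquad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l]}_{:,i}, \qquad dA^{[l-1]} = \big(W^{[l]}\big)^T dZ^{[l]}$$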
%% Cell type:code id:fresh-walker tags:
``` python
def single_layer_backward_propagation(dA_curr, W_curr, b_curr, Z_curr, A_prev, activation="tanh"):
    m = A_prev.shape[1]
    if activation == "relu":
        backward_activation_func = relu_backward
    elif activation == "sigmoid":
        backward_activation_func = sigmoid_backward
    elif activation == "tanh":
        backward_activation_func = tanh_backward
    elif activation == "linear":
        backward_activation_func = linear_backward
    else:
        raise Exception('Non-supported activation function')
    dZ_curr = backward_activation_func(dA_curr, Z_curr) #Gradient with respect to the pre-activation value Z.
    dW_curr = np.dot(dZ_curr, A_prev.T) / m #Derivative with respect to the weights: product of the activation gradient and the layer input.
    db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m #Derivative with respect to the bias.
    dA_prev = np.dot(W_curr.T, dZ_curr) #Gradient passed on to the previous (next in backward order) layer.
    return dA_prev, dW_curr, db_curr
```
%% Cell type:markdown id:annual-florence tags:
Since we have already completed the first forward pass and evaluated the initial loss value, we can evaluate the gradients with a single backward pass.
As we print out the gradient values, we can see a gradient entry corresponding to every parameter of our network:
%% Cell type:code id:independent-transmission tags:
``` python
grad_fp = {}
dA_prev_fp = get_loss_grad(A_layer_fp,y_flat) #Initial derivative to feed the last layer comes from the gradient of the loss function.
for layer_idx, layer in reversed(list(enumerate(nn_architecture))): #Notice we are moving backwards here.
    print('Layer {}'.format(layer_idx+1))
    dA_curr_fp = dA_prev_fp
    dA_prev_fp, dW_layer_fp, db_layer_fp = single_layer_backward_propagation(dA_curr_fp, nn_params['W'+str(layer_idx+1)],
                                                                             nn_params['b'+str(layer_idx+1)], memory_fp['Z'+str(layer_idx+1)],
                                                                             memory_fp['A'+str(layer_idx)], layer["activation"])
    grad_fp['dW'+str(layer_idx+1)] = dW_layer_fp
    grad_fp['db'+str(layer_idx+1)] = db_layer_fp
print(grad_fp)
```
%% Cell type:markdown id:turned-reducing tags:
Backpropagation over the entire network, wrapped in a function:
%% Cell type:code id:distributed-atlantic tags:
``` python
def full_backward_propagation(Y_hat, Y, memory, params_values, nn_architecture):
    grad_values = {}
    m = Y.shape[1]
    Y = Y.reshape(Y_hat.shape) #Make sure we have same shapes of predictions and true outputs.
    dA_prev = get_loss_grad(Y_hat, Y)
    for layer_idx_prev, layer in reversed(list(enumerate(nn_architecture))):
        layer_idx_curr = layer_idx_prev + 1
        activ_function_curr = layer["activation"]
        dA_curr = dA_prev
        A_prev = memory["A" + str(layer_idx_prev)]
        Z_curr = memory["Z" + str(layer_idx_curr)]
        W_curr = params_values["W" + str(layer_idx_curr)]
        b_curr = params_values["b" + str(layer_idx_curr)]
        dA_prev, dW_curr, db_curr = single_layer_backward_propagation(
            dA_curr, W_curr, b_curr, Z_curr, A_prev, activ_function_curr)
        grad_values["dW" + str(layer_idx_curr)] = dW_curr
        grad_values["db" + str(layer_idx_curr)] = db_curr
    return grad_values
```
%% Cell type:markdown id:legendary-might tags:
Now we update the parameters using the gradients calculated with backpropagation. This is where we decide the step size to take along the negative gradient when updating the parameters:
- Large learning rate: can escape local optima, but unstable because it may skip over minima.
- Small learning rate: steady convergence, but can get stuck in local optima.
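
The update rule for each layer is plain gradient descent with learning rate $\eta$:

$$W^{[l]} \leftarrow W^{[l]} - \eta\, dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \eta\, db^{[l]}$$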
%% Cell type:code id:rolled-manner tags:
``` python
def update(param_values, grad_values, nn_architecture, learning_rate):
    for layer_idx, layer in enumerate(nn_architecture):
        param_values["W" + str(layer_idx+1)] -= learning_rate * grad_values["dW" + str(layer_idx+1)] #Subtract because we want to descend.
        param_values["b" + str(layer_idx+1)] -= learning_rate * grad_values["db" + str(layer_idx+1)]
    return param_values
```
%% Cell type:markdown id:musical-facial tags:
Now we can visualize our updated model predictions after the first epoch (one full forward and backward pass):
%% Cell type:code id:published-mississippi tags:
``` python
nn_params = update(nn_params,grad_fp,nn_architecture,5e-2)
A_sp, memory_sp = full_forward_propagation(xgrid, nn_params, nn_architecture)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x1grid, x2grid, ygrid)
ax.plot_surface(x1grid, x2grid, A_layer_fp.reshape(x1.shape[0],x2.shape[0]))
ax.plot_surface(x1grid, x2grid, A_sp.reshape(x1.shape[0],x2.shape[0]))
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
print("Blue is true output, Orange is our initial guess per our initial parameters, and Green is updated model predictions.")
```
%% Cell type:markdown id:revised-identification tags:
Wrap the training steps into a single function.
%% Cell type:code id:apparent-romantic tags:
``` python
def train(X, Y, nn_architecture, epochs, learning_rate):
    param_values = init_layers(nn_architecture) #Initialize weights and biases.
    loss_history = []
    lr_change_ep = 0
    for i in range(epochs): #Run forward and backward passes for the predetermined number of epochs.
        Y_hat, memory = full_forward_propagation(X, param_values, nn_architecture) #Full forward pass.
        loss = get_loss_value(Y_hat, Y) #Evaluate loss.
        loss_history.append(loss) #Keep a record of the loss.
        print('Epoch: {}, Loss: {}'.format(i, loss))
        #A small subroutine for an adaptive learning rate. Monitor the loss improvement and reduce the LR
        #if the improvement stays below a threshold for a certain number of epochs.
        if i > 10:
            if np.all(np.diff(loss_history[-10:]) > -1e-8) and i - lr_change_ep > 50:
                lr_change_ep = i
                learning_rate = learning_rate * 0.8
                print("Reducing learning rate to {}".format(learning_rate))
        grad_values = full_backward_propagation(Y_hat, Y, memory, param_values, nn_architecture) #Full backward pass.
        param_values = update(param_values, grad_values, nn_architecture, learning_rate) #Update parameters.
    return param_values, loss_history, Y_hat
```
%% Cell type:markdown id:advised-growth tags:
Train the neural network
%% Cell type:code id:bound-synthesis tags:
``` python
n_epoch = 100
lr = 5e-2
params, loss, y_pred = train(xgrid, np.expand_dims(y_flat,0), nn_architecture, n_epoch, lr)
```
%% Cell type:markdown id:unnecessary-invitation tags:
Visualize prediction against true values
%% Cell type:code id:cloudy-accused tags:
``` python
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x1grid, x2grid, ygrid)
ax.plot_surface(x1grid, x2grid, y_pred.reshape(x1.shape[0],x2.shape[0]))
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
plt.show()
```
%% Cell type:markdown id:funded-concentrate tags:
Visualize pointwise prediction
%% Cell type:code id:adult-controversy tags:
``` python
plt.figure()
plt.plot(y_flat.flatten(),y_pred.flatten())
plt.scatter(y_flat.flatten(),y_pred.flatten(),s=0.001)
plt.plot([0.4,1.0],[0.4,1.0],'k--')
plt.xlabel('True')
plt.ylabel('Predicted')
plt.grid(True)
plt.tight_layout()
```
%% Cell type:code id:fuzzy-communications tags:
``` python
```
%% Cell type:markdown id:parental-mississippi tags:
# Using TensorFlow Keras for ML
%% Cell type:markdown id:straight-bottle tags:
Necessary imports
%% Cell type:code id:comfortable-leader tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import random
import tensorflow as tf
from tensorflow.keras import optimizers, initializers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import ReduceLROnPlateau
# np.random.seed(1)
# random.seed(1)
# tf.random.set_seed(1)
```
%% Cell type:markdown id:classical-section tags:
Create a toy problem with two inputs and a single output
%% Cell type:code id:pointed-magnet tags:
``` python
N = 500
x1 = np.linspace(0.0,1.0,N)
x2 = np.linspace(0.0,1.0,N)
def y_fn(x1,x2):
    return (x1 - 0.5) + (x2 - 0.5)**2 - np.sin(x1) + np.cos(x1*x2)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x1grid, x2grid = np.meshgrid(x1, x2)
x1grid_flat = np.ravel(x1grid)
x2grid_flat = np.ravel(x2grid)
y_flat = np.array(y_fn(x1grid_flat, x2grid_flat))
ygrid = y_flat.reshape(x1grid.shape)
ax.plot_surface(x1grid, x2grid, ygrid)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
plt.show()
```
%% Cell type:markdown id:local-london tags:
Function to create a neural network model with the Sequential API of TensorFlow Keras:
- Sequential API: layers are stacked one after the other; well suited to plain fully connected networks.
- Functional API: allows customized connections between layers (a minimal sketch follows the next cell).
%% Cell type:code id:fewer-volleyball tags:
``` python
def build_nn_model(lr):
    #Simply define layers with the number of neurons and the type of activation function.
    #We can pass the way we want to initialize the parameters.
    #If no initializer is passed, the layer falls back to its default random initializer.
    model = Sequential([
        Dense(8, activation='tanh'),
        Dense(32, activation='tanh', kernel_initializer=initializers.RandomNormal()),
        Dense(4, activation='tanh', kernel_initializer=initializers.RandomNormal()),
        Dense(1, activation='tanh'),
    ])
    optimizer = optimizers.Adam(lr) #Built-in Keras optimizer; takes the initial learning rate as input.
    model.compile(loss='mean_squared_error', #Loss, monitored metrics, and the optimizer are passed when the model is compiled.
                  optimizer=optimizer,
                  metrics=['mean_absolute_error'])
    return model
```
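%% Cell type:markdown id:functional-api-sketch tags:
For comparison, here is a minimal sketch of the same model written with the Functional API. It is equivalent for this architecture and is not used in the rest of this notebook; the `build_nn_model_functional` name is only for illustration.
%% Cell type:code id:functional-api-sketch-code tags:
``` python
from tensorflow.keras import Input, Model

def build_nn_model_functional(lr):
    inputs = Input(shape=(2,)) #Two input features: x1 and x2.
    x = Dense(8, activation='tanh')(inputs)
    x = Dense(32, activation='tanh')(x)
    x = Dense(4, activation='tanh')(x)
    outputs = Dense(1, activation='tanh')(x)
    model = Model(inputs=inputs, outputs=outputs) #Explicitly connect inputs and outputs.
    model.compile(loss='mean_squared_error',
                  optimizer=optimizers.Adam(lr),
                  metrics=['mean_absolute_error'])
    return model
```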
%% Cell type:markdown id:spatial-portsmouth tags:
Separate data into training, validation, and test sets:
- Training set: Portion of data used to train the network.
- Validation set: Portion of data used to monitor loss and metrics during training and tune the optimizer if needed.
- Test set: Portion of data left out of the training process. Used to evaluate model performance after training is complete.
%% Cell type:code id:synthetic-passenger tags:
``` python
input_array = np.vstack((x1grid_flat,x2grid_flat)).transpose()
output_array = np.expand_dims(y_flat,-1)
#Shuffle the data so that training, validation, and test data sets are sampled randomly throughout the domain.
permute = np.random.permutation(input_array.shape[0])
input_array_shuffled = input_array[permute,:]