
Building an Artificial Neuron in Python – the Perceptron

With so much recent hype around Deep Learning, I wanted to take a closer look at neural networks and what has motivated the resurgence of algorithms that have existed since the 1950s. Neural networks, in their simplest form, work in much the same way as a biological neuron. A neuron has dendrites to receive signals, a cell body to process those signals, and an axon to pass the processed signals on to other neurons. The artificial neuron, the building block of a neural network, similarly has a series of input channels, a processing stage, and an output.

We learn how to identify and classify new objects and new faces by being told what they are – neural networks are no different. The simplest neural network is called the perceptron, and was developed by Frank Rosenblatt in 1958. The perceptron is an algorithm for training binary classifiers: classifiers that decide whether an input, represented by a vector of numbers, belongs to a particular class or not.

The Mark I Perceptron machine was connected to an array of cadmium sulfide photocells that produced a 20×20 pixel image. The wires on the left are part of a patchboard that allowed the user to try different combinations of input features. The weights applied to each of the input features were encoded with potentiometers, and updates to the weights were made with electric motors.

Mathematical definition

The mathematical definition of the binary classifier is:

    \[f(x)=\begin{cases} 1 & \text{if $w \cdot x  + b > 0$}.\\ 0 & \text{otherwise}. \end{cases}\]

where x is a real-valued input vector and f(x) is the output (a single binary value). The vector of weights is w, and w \cdot x is the dot product

    \[\sum_{i=1}^{m}{w_ix_i}\]

where m is the number of inputs to the perceptron and b is the bias.

As the output of f(x) is a binary value, the perceptron will learn functions that classify the inputs x as either a positive or negative instance.
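To make the definition concrete, here is a minimal sketch of the classifier in Python (the function names are our own, chosen for illustration):

def dot_product(w, x):
    # Sum of element-wise products: w . x
    return sum(w_i * x_i for w_i, x_i in zip(w, x))

def classify(x, w, b):
    # f(x) = 1 if w . x + b > 0, otherwise 0
    return 1 if dot_product(w, x) + b > 0 else 0

# A tiny example with made-up weights: 0.5*2.0 + 1.0*(-1.0) - 0.2 = -0.2, so the output is 0
print(classify([2.0, -1.0], w=[0.5, 1.0], b=-0.2))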

A perceptron for detecting underwater mines

For this post, we focus on training a single-layer perceptron, as many of the same techniques are used to train more complex neural networks. To recreate Rosenblatt’s perceptron, we will look at an implementation in Python. We have adapted the code from Jason Brownlee’s excellent tutorial here.

To train our neural net, we used the Sonar data set from the UCI Machine Learning repository. From the data set description:

The file “sonar.mines” contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file “sonar.rocks” contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock. 

Source: Das Boot (1981). In WWII, detecting objects using sonar was a job for trained radiomen.

In our example, for an input vector of signal responses from the sonar chirp, the two possible outputs are either a rock or a metal cylinder.
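Each row in the combined sonar.all-data.csv file holds 60 numeric readings (the energy within a particular frequency band, integrated over a period of time) followed by a class label: ‘R’ for rock or ‘M’ for mine (the metal cylinder). The label is what our perceptron learns to predict.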

Learning algorithm

To train a classifier using the perceptron algorithm, the model is shown each training instance one at a time. A prediction is made by the model for that training instance. In our example, the model predicts either rock or metal cylinder for the set of sonar chirp responses. The error of the prediction is calculated, and the weights of the model are adjusted to reduce the error.

More formally, the model is trained using stochastic gradient descent. The prediction error defines our cost function, and we want to minimise this value. With the cost function defined, we take its derivative to find the gradient.

A good way to visualise stochastic gradient descent is to consider a ball in a bucket. The ball, representing the error, is trying to reach the lowest point of the bucket.

In simple terms, the model calculates the slope at the current position. If the slope is negative, the weights of the model are adjusted to make the ball move to the right. If the slope is positive, we adjust the weights to cause the ball to move to the left. The process repeats until the slope is 0, and the minimum error has been found. As part of the learning algorithm, we have to define the ‘learning rate’, which limits the amount each weight is corrected when the weights are updated. We also set a limit on the number of times to go over the training data while updating the weights with the number of learning ‘epochs’.
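As a toy illustration of the ball-in-the-bucket idea (separate from the perceptron code below), gradient descent on a one-dimensional bowl-shaped cost function looks like this:

def gradient_descent_1d(start, learning_rate, n_steps):
    # Cost function: a simple bowl, cost(w) = w**2, whose slope (derivative) is 2*w
    w = start
    for _ in range(n_steps):
        slope = 2 * w
        # Step against the slope: move right when the slope is negative, left when it is positive
        w = w - learning_rate * slope
    return w

# Starting at w = 5.0, the ball rolls towards the bottom of the bowl at w = 0
print(gradient_descent_1d(start=5.0, learning_rate=0.1, n_steps=50))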

Each weight is updated for each row in the training data, over each epoch. Weights are updated based on the error. The error is the difference between the expected output value (a correct prediction of a rock or a metal cylinder) and the prediction made by the model with the current weights.

w(t+1) = w(t) + learning_rate * (expected(t) - predicted(t)) * x(t)

The bias is updated in a similar way, but without being multiplied by an input value, since the bias is not tied to any particular sonar chirp response.

bias(t+1) = bias(t) + learning_rate * (expected(t) - predicted(t))
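As a worked example, with a learning rate of 0.1, an expected output of 1, a prediction of 0 and an input value of 0.5, the weight increases by 0.1 * (1 - 0) * 0.5 = 0.05 and the bias increases by 0.1 * (1 - 0) = 0.1; if the prediction had been correct, the error term would be 0 and neither would change.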

The full code for training the weights with stochastic gradient descent follows below:

def train_weights(train, l_rate, n_epoch):
    # One weight per input feature, plus the bias in weights[0], all initialised to zero
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            # Error: expected class (last value in the row) minus the prediction
            error = row[-1] - prediction
            sum_error += error**2
            # Update the bias, then each weight, using the perceptron update rule
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        # error_graph_data is a global list used later to plot the error per epoch
        error_graph_data.append((epoch, sum_error))
    return weights

For any model trained on labelled data, it is critically important to avoid overfitting on the training data. We use k-fold cross-validation to randomly split the training data into k equal-sized subsamples. Of these, a single subsample is kept for testing the model, and the remaining k-1 subsamples are used to train the model. The process is repeated k times, with each of the subsamples being used once as the test set. The results are then averaged to give a single estimate of the model's accuracy.

from random import randrange

def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    # Each fold holds an equal share of the rows (any remainder is left out)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            # Move a randomly chosen row from the copy of the dataset into the current fold
            index = randrange(0, len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split
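As a quick sanity check (a hypothetical toy example, not part of the sonar script), splitting ten single-value rows into three folds with the function above gives three folds of three rows each, with one row left unused because the fold size is truncated:

folds = cross_validation_split([[i] for i in range(10)], n_folds=3)
print([len(fold) for fold in folds])  # prints [3, 3, 3]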

The full code for loading the data and training the perceptron can be seen below.

from random import seed
from random import randrange
from csv import reader
import pandas as pd

def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column])

def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(0, len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold) 
        train_set = sum(train_set, []) 
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

def predict(row, weights):
    # weights[0] is the bias; the remaining weights line up with the input features
    activation = weights[0]
    for i in range(len(row)-1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error**2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row)-1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        error_graph_data.append((epoch, sum_error))
        if sum_error == 0:
            # Store a copy: the weights list is mutated in place on later epochs
            weights_zero_error.append(list(weights))
    return weights
        
def perceptron(train, test, l_rate, n_epoch):
    predictions = list()
    weights = train_weights(train, l_rate, n_epoch)    
    for row in test:
        prediction = predict(row, weights)
        predictions.append(prediction)
    return(predictions)
    
seed(1)
filename = 'sonar.all-data.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
str_column_to_int(dataset, len(dataset[0])-1)

n_folds = 3
l_rate = 0.04
n_epoch = 500

error_graph_data = list()
weights_zero_error = list()
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)

df = pd.DataFrame(error_graph_data, columns=['epoch', 'error'])

print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

error_plot = df['error'].plot(title = 'Perceptron for the detection of underwater mines \n' 'Mean accuracy: %.3f%%' % (sum(scores)/float(len(scores))))                            
error_plot.set_xlabel('epoch')
error_plot.set_ylabel('error')
error_plot

Tweaking the parameters

Fortunately for us, tweaking our Python perceptron model is significantly easier than it was for Rosenblatt with his Mark I Perceptron machine. Our model allows us to adjust the number of folds we split our training data into, the number of epochs of training, and the learning rate. For our first iteration, we set the number of folds to 3, the learning rate to 0.04 and the number of epochs to 500, matching the values in the script above.

Can we increase the accuracy of the model by increasing the training time?  For the second iteration, we set the number of epochs to 1000.
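Using the script above, this is just a change to the training parameters before re-running the evaluation, for example:

n_folds = 3
l_rate = 0.04
n_epoch = 1000

# Reset the lists that collect per-epoch errors before training again
error_graph_data = list()
weights_zero_error = list()
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))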

Increasing the number of training epochs improved the accuracy of our perceptron. We could continue to adjust the learning parameters to get a higher mean accuracy, but ultimately the strength of a prediction model is how well it does on new, unseen data.

Limitations of the perceptron

Rosenblatt invented the perceptron with funds from the United States Office of Naval Research at the Cornell Aeronautical Laboratory. After statements made by Rosenblatt at a press conference, the New York Times reported that the perceptron was “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”

Ultimately, the perceptron was unable to ‘learn’ several classes of patterns. One of these was the XOR function, with two binary inputs and a single output. XOR should return a true value if the two inputs are not equal, and a false value if they are equal. All of the possible inputs and outputs are shown below:

Input A  Input B  Output
0        0        0
0        1        1
1        0        1
1        1        0
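To see this failure in practice, here is a small sketch that reuses the predict and train_weights functions from the script above on the XOR truth table; because no straight line separates the two classes, the per-epoch error never reaches zero and the final predictions never match [0, 1, 1, 0]:

# Each row: the two binary inputs followed by the expected XOR output
xor_data = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

error_graph_data = list()
weights_zero_error = list()
xor_weights = train_weights(xor_data, l_rate=0.1, n_epoch=100)
print([predict(row, xor_weights) for row in xor_data])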

A limitation of the perceptron was that it was unable to learn patterns where the outputs were not linearly separable. Linear separability means that a single straight line can be drawn between the two classes of outputs. From Wikipedia, we see examples of outputs that are linearly separable:

and those that are not, where two lines must be drawn to separate the classes:

The single-layer perceptron draws a single line through the input space, so how can a neural net draw two lines and solve the XOR? The solution is to add additional layers of neurons, creating a multi-layer perceptron. Multiple layers allow neurons in a ‘hidden’ layer to evaluate the outputs of the neurons preceding them. Splitting the problem across layers of neurons allows the neural net to learn more complex patterns.

Adding two neurons in parallel allows the network to learn patterns that are not linearly separable, like the XOR function. The values inside the neurons are the biases and the values on the connectors are the weights.
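As a sketch of the idea (the weights and biases below are hand-picked for illustration, not necessarily the ones shown in the figure), a tiny two-layer network built from the same threshold activation can compute XOR:

def step(activation):
    # The same threshold activation the perceptron uses
    return 1 if activation >= 0 else 0

def xor_network(x1, x2):
    # Hidden neuron 1 behaves like OR, hidden neuron 2 like AND
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # The output neuron fires when OR is true but AND is not, which is exactly XOR
    return step(1.0 * h1 - 2.0 * h2 - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))  # prints 0, 1, 1, 0 in the last column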

 

The results of the multi-layer perceptron after learning the XOR function.

Despite the limitations of the single-layer perceptron, the work served as the foundation for further research on neural networks, and the development of Werbos’s backpropagation algorithm. We will investigate the capabilities of multi-layer perceptrons and deep learning in the next post.