Building an Artificial Neuron in Python – the Perceptron
With so much recent hype around Deep Learning, I wanted to take a closer look at neural networks and at what has motivated the resurgence of algorithms that have existed since the 1950s. Neural networks, in their simplest form, are modelled on the biological neuron. A neuron has dendrites to receive signals, a cell body to process those signals, and an axon to pass the processed signals on to other neurons. The artificial neuron, or neural network, likewise has a series of input channels, a processing stage, and an output.
We learn how to identify and classify new objects and new faces by being told what they are – neural networks are no different. The simplest neural network is called the perceptron, developed by Frank Rosenblatt in 1958. The perceptron is an algorithm for training binary classifiers: functions that decide whether an input, represented by a vector of numbers, belongs to a particular class or not.
The mathematical definition of the binary classifier is:

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $x$ is a real-valued input vector and $f(x)$ is the output value (a single binary value). The vector of weights is $w$, and $w \cdot x$ is the dot product

$$w \cdot x = \sum_{i=1}^{m} w_i x_i$$

where $m$ is the number of inputs to the perceptron and $b$ is the bias.
As the output of $f(x)$ is a binary value, the perceptron learns functions that classify each input as either a positive or a negative instance.
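The decision rule above can be sketched in a few lines of Python (the names here are illustrative only, not from the final program):

```python
def step_classify(x, w, b):
    """Return 1.0 if the weighted sum of inputs plus bias crosses zero, else 0.0."""
    activation = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 if activation >= 0.0 else 0.0

# With weights [0.5, 0.5] and no bias, the input [1.0, 2.0] activates
# the neuron, while [1.0, -2.0] does not.
print(step_classify([1.0, 2.0], [0.5, 0.5], 0.0))   # positive instance
print(step_classify([1.0, -2.0], [0.5, 0.5], 0.0))  # negative instance
```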
A perceptron for detecting underwater mines
For this post, we focus on training a single-layer perceptron, as many of the same techniques are used to train more complex neural networks. To recreate Rosenblatt’s perceptron, we will look at an implementation in Python. We have adapted the code from Jason Brownlee’s excellent tutorial here.
To train our neural net, we use the Sonar data set from the UCI Machine Learning Repository. From the data set description:
The file “sonar.mines” contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file “sonar.rocks” contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.
In our example, the two possible outputs are either a rock or a metal cylinder for the input vector of signal responses from the sonar chirp.
To train a classifier using the perceptron algorithm, the model is shown each training instance one at a time. A prediction is made by the model for that training instance. In our example, the model predicts either rock or metal cylinder for the set of sonar chirp responses. The error of the prediction is calculated, and the weights of the model are adjusted to reduce the error.
More formally, the model is trained using stochastic gradient descent. The error is our cost function, and we want to minimise its value. With the cost function defined, we take its derivative to find the gradient.
In simple terms, picture the cost function as a curved surface with a ball resting at the current error value. The model calculates the slope at the current position: if the slope is negative, the weights of the model are adjusted to move the ball to the right; if the slope is positive, we adjust the weights to move the ball to the left. The process repeats until the slope is 0 and the minimum error has been found. As part of the learning algorithm, we have to define the ‘learning rate’, which limits the amount each weight is corrected when the weights are updated. We also set a limit on the number of passes over the training data with the number of learning ‘epochs’.
Each weight is updated for each row in the training data, over each epoch. Weights are updated based on the error. The error is the difference between the expected output value (a correct prediction of a rock or a metal cylinder) and the prediction made by the model with the current weights.
w(t+1)= w(t) + learning_rate * (expected(t) - predicted(t)) * x(t)
The bias is trained in a similar way, except that its update is not multiplied by an input value, as the bias is not tied to any particular sonar chirp response.
bias(t+1) = bias(t) + learning_rate * (expected(t) - predicted(t))
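Put together, a single stochastic-gradient-descent update can be sketched as follows (a minimal illustration with made-up variable names, separate from the full program below):

```python
def sgd_step(w, b, x, expected, l_rate):
    # forward pass: weighted sum plus bias, thresholded at zero
    activation = b + sum(wi * xi for wi, xi in zip(w, x))
    predicted = 1.0 if activation >= 0.0 else 0.0
    error = expected - predicted
    # w(t+1) = w(t) + learning_rate * (expected - predicted) * x(t)
    w = [wi + l_rate * error * xi for wi, xi in zip(w, x)]
    # bias(t+1) = bias(t) + learning_rate * (expected - predicted)
    b = b + l_rate * error
    return w, b

# One step on a misclassified example nudges the weights and bias
# in the direction that reduces the error.
w, b = sgd_step([0.0, 0.0], 0.0, [1.0, 0.5], expected=0.0, l_rate=0.1)
```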
The full code for training the weights with stochastic gradient descent follows below:
```python
def train_weights(train, l_rate, n_epoch):
    # weights[0] is the bias; one weight per input feature
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error ** 2
            weights[0] = weights[0] + l_rate * error  # bias update
            for i in range(len(row) - 1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        error_graph_data.append((epoch, sum_error))
    return weights
```
For any model trained on labelled data, it is critically important to avoid overfitting on the training data. We use k-fold cross-validation to randomly split the training data into k equally sized subsamples. Of the k subsamples, a single subsample is kept for testing the model, and k-1 subsamples are used to train the model. The process is repeated k times, with each of the k subsamples being used once as the test set. The results are then averaged to give a single estimation of the model’s performance.
```python
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(0, len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split
```
The full code for loading the data and training the perceptron can be seen below.
```python
from random import seed
from random import randrange
from csv import reader
import pandas as pd

def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column])

def str_column_to_int(dataset, column):
    # map class labels ('R'/'M') to integers
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(0, len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])  # flatten remaining folds into one list
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None  # hide the label from the model
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

def predict(row, weights):
    # weights[0] is the bias
    activation = weights[0]
    for i in range(len(row) - 1):
        activation += weights[i + 1] * row[i]
    return 1.0 if activation >= 0.0 else 0.0

def train_weights(train, l_rate, n_epoch):
    weights = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0.0
        for row in train:
            prediction = predict(row, weights)
            error = row[-1] - prediction
            sum_error += error ** 2
            weights[0] = weights[0] + l_rate * error
            for i in range(len(row) - 1):
                weights[i + 1] = weights[i + 1] + l_rate * error * row[i]
        error_graph_data.append((epoch, sum_error))
        if sum_error == 0:
            weights_zero_error.append(weights)
    return weights

def perceptron(train, test, l_rate, n_epoch):
    predictions = list()
    weights = train_weights(train, l_rate, n_epoch)
    for row in test:
        prediction = predict(row, weights)
        predictions.append(prediction)
    return predictions

seed(1)
filename = 'sonar.all-data.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)
str_column_to_int(dataset, len(dataset[0]) - 1)
n_folds = 3
l_rate = 0.04
n_epoch = 500
error_graph_data = list()
weights_zero_error = list()
scores = evaluate_algorithm(dataset, perceptron, n_folds, l_rate, n_epoch)
df = pd.DataFrame(error_graph_data, columns=['epoch', 'error'])
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores) / float(len(scores))))
error_plot = df['error'].plot(
    title='Perceptron for the detection of underwater mines\n'
          'Mean accuracy: %.3f%%' % (sum(scores) / float(len(scores))))
error_plot.set_xlabel('epoch')
error_plot.set_ylabel('error')
```
Tweaking the parameters
Fortunately for us, tweaking our Python perceptron model is significantly easier than it was for Rosenblatt with his Mark I Perceptron machine. Our model allows us to adjust the number of folds we split our training data into, the number of epochs of training, and the learning rate. For our first iteration, we set the number of folds to 3, the learning rate to 0.04, and the number of epochs to 500.
Can we increase the accuracy of the model by increasing the training time? For the second iteration, we set the number of epochs to 1000.
Increasing the number of training epochs improved the accuracy of our perceptron. We could continue to adjust the learning parameters to get a higher mean accuracy, but ultimately the strength of a prediction model is measured by how well it performs on new, unseen data.
Limitations of the perceptron
Rosenblatt invented the perceptron with funds from the United States Office of Naval Research at the Cornell Aeronautical Laboratory. After statements made by Rosenblatt at a press conference, the New York Times reported that the perceptron was “the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.”
Ultimately, the perceptron was unable to ‘learn’ several classes of patterns. One of these was the XOR function, with two binary inputs and a single output. The XOR should return a true value if the two inputs are not equal, and a false value if they are equal. All of the possible inputs and outputs are shown below:

| Input 1 | Input 2 | XOR |
|---------|---------|-----|
| 0       | 0       | 0   |
| 0       | 1       | 1   |
| 1       | 0       | 1   |
| 1       | 1       | 0   |
A limitation of the perceptron is that it can only learn patterns whose outputs are linearly separable. Linear separability means that a single straight line can be drawn between the two classes of outputs. The XOR outputs are not linearly separable: no single line can divide the true cases from the false cases, so at least two lines must be drawn to separate the classes.
The single-layer perceptron draws a single line through the input space, so how can a neural net draw two lines and solve the XOR? The solution is to add additional layers of neurons, creating a multi-layer perceptron. Multiple layers allow neurons in the ‘hidden’ layer to evaluate the outputs of the neurons preceding them. Splitting the problem across layers of neurons allows the neural net to learn more complex patterns.
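To make this concrete, here is a hand-wired (not learned) two-layer network that computes XOR. The weights and thresholds are chosen by hand purely to illustrate how a hidden layer solves the problem: one hidden neuron computes OR, another computes NAND, and the output neuron ANDs them together.

```python
def step(z):
    # the same threshold activation used by the perceptron
    return 1.0 if z >= 0.0 else 0.0

def xor_mlp(x1, x2):
    # hidden layer: each neuron draws one of the two separating lines
    h1 = step(x1 + x2 - 0.5)    # OR:   fires unless both inputs are 0
    h2 = step(-x1 - x2 + 1.5)   # NAND: fires unless both inputs are 1
    # output layer: AND of the two hidden neurons
    return step(h1 + h2 - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

A single-layer perceptron cannot represent this function, but adding one hidden layer of two threshold neurons is enough.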
Despite the limitations of the single-layer perceptron, the work served as the foundation for further research on neural networks, and the development of Werbos’s backpropagation algorithm. We will investigate the capabilities of multi-layer perceptrons and deep learning in the next post.