Week 8 section

In this notebook we mainly cover Hopfield networks, which we didn't get to discuss much in class. This is one of the options for the w08 homework.

The other option is to build a feedforward one-neuron network to distinguish between two classes of letters. If you choose this option, we have a few tips at the end of this notebook.

Hopfield networks

For this week's homework, one option is to implement a Hopfield network that can hopefully distinguish letters. As always, the first hurdle is the data.

From the homework's website, here are lists populated with URLs that point to the reference and handwritten characters we are going to be using throughout:

In a similar fashion to what we've done in previous weeks, we are going to define a function that can read these letters from a URL. We know that these letters are going to be 5x5 grids populated with certain characters, and we might have more than one example of a letter at each URL:
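The exact parsing depends on how the files are laid out, so treat the following as a rough sketch rather than the definitive implementation: the choice of "#" as the "on" character, the idea that grids are stacked vertically, and the keyword arguments are all assumptions on our part.

```python
import urllib.request

import numpy as np


def load_letter(url, size=5, on_char="#"):
    """Download a text file of letter grids and return a list of +1/-1 arrays.

    Format assumptions: the file holds one or more `size` x `size` character
    grids stacked vertically, where `on_char` marks an "on" pixel and any
    other character is "off".
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")

    # Keep non-empty lines, then group them into blocks of `size` rows each.
    rows = [line.rstrip() for line in text.splitlines() if line.strip()]
    letters = []
    for start in range(0, len(rows) - size + 1, size):
        block = rows[start:start + size]
        grid = np.array([list(row[:size].ljust(size)) for row in block])
        letters.append(np.where(grid == on_char, 1, -1))  # map chars to +1/-1
    return letters
```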

Let's test it and see how we can visualize these letters:

There seem to be 9 examples of the letter A in this data. Let's go ahead and look at all of them:

How about the first example of each letter?

What about the handwritten examples of the letter A?

There seem to be 2000 examples of the handwritten character A, so let's only look at a couple of them:

The handwritten letters aren't as clean as the reference letters, but we will implement the Hopfield network and see how many of them can be recognized as A's.

How do we implement the Hopfield network?

From MacKay's textbook:

In this chapter we will discuss a fully connected feedback network called the Hopfield network. The weights in the Hopfield network are constrained to be symmetric, i.e., the weight from neuron i to neuron j is equal to the weight from neuron j to neuron i.

Hopfield networks have two applications. First, they can act as associative memories.

It then goes on to describe Hebbian learning in a way that is very intuitive. If you are struggling to understand what's going on conceptually, I recommend reading MacKay's section 42.1.

Let's break down a couple of things that you are going to want to be able to do in order to implement the weight matrix.

The weight matrix

Hopfield networks are composed of fully connected neurons. That means every neuron is connected to every other neuron, and vice versa. These connections describe the flow of information through the network via weights. In Hopfield networks, such connections are symmetric; therefore, $w_{1,2} = w_{2,1}$. In this notation, $w_{m,n}$ corresponds to the $m$ th row and $n$ th column of $\mathbf{W}$, the weight matrix. In other words:

$$\forall m,n: \quad w_{n,m} = w_{m,n}$$

The neurons do not have self-connections: the weight for a connection between a neuron and itself is zero. In other words:

$$w_{n,n} = 0 $$

This already gives us a pretty good idea of what such a weight matrix has to look like: zeros on the main diagonal, and symmetric about it.
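Once you have built a candidate weight matrix (here called `W`, which is just a placeholder name), a quick sanity check of these two properties might look like this sketch:

```python
import numpy as np

# Placeholder: build a toy symmetric matrix with a zero diagonal to check against.
W = np.random.randn(25, 25)
W = W + W.T              # force symmetry for this toy example
np.fill_diagonal(W, 0)   # remove self-connections

assert np.allclose(W, W.T), "W should be symmetric"
assert np.allclose(np.diag(W), 0), "diagonal of W (self-connections) should be zero"
```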

Now we want to convert our 5 by 5 image into a format that the network we are building can take as input. We can achieve this by linearizing the array in which we are storing the information, using NumPy's .flatten() method. For example:
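Here is a small illustration; the `letter` grid below is a made-up 5 x 5 pattern, not one of the homework letters.

```python
import numpy as np

# A hypothetical 5 x 5 letter: +1 marks an "on" pixel, -1 an "off" pixel.
letter = np.array([
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
    [ 1,  1,  1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
])

y = letter.flatten()
print(y)   # 25 values: the first row, then the second row, and so on
```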

Notice that this is just a 1-dimensional representation of the same image: simply the first row, then the second, and so on.

Given a 25-dimensional input vector $y$, and the 25 $\times$ 25 dimensional weight matrix $w$, then the activation $a$ of the $n$ th neuron, $a_n$, is

$$\begin{aligned} a_n &= \sum_{m=1}^{25} w_{nm} y_m \\ &= w_{n,1} y_1 + w_{n,2} y_2 + \dots + w_{n,25} y_{25} \end{aligned}$$
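In NumPy, all 25 activations can be computed at once as a matrix-vector product. A minimal sketch, with `W` and `y` as placeholder names for whatever you build in your own code:

```python
import numpy as np

W = np.zeros((25, 25))   # placeholder 25 x 25 weight matrix
y = np.ones(25)          # placeholder flattened pattern of +1/-1 values

a = W @ y                # a[n] = sum over m of W[n, m] * y[m]
```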

Let's say we trained our connectivity matrix by storing a perfect A. As seen in the lecture notes, the first row of $w$ is

0 1 -1 1 1 1 -1 1 -1 1 1 -1 -1 -1 1 1 -1 1 -1 1 -1 1 1 1 -1

Now let's say we provide as input to the network a perfect A -- we better get a perfect A as the output of the network!

Let's just focus on the first row of our 5 $\times$ 5 digit representation. Our perfect A input is $y = [-1, -1, +1, -1, -1, \dots ]$.

To get a perfect A output, we basically need this same vector back -- the first pixel needs to be off, as does the second pixel; the third pixel needs to be on; and pixels 4 and 5 need to be off.

Let's look at the output of the network when providing the above $y$ as input.

$$\begin{aligned} a_1 &= w_{1,1} y_1 + w_{1,2} y_2 + \dots + w_{1,25} y_{25} \\ &= 0(-1) +1(-1) -1(+1) +1(-1) +1(-1) + \dots \\ &= -24 \end{aligned}$$

Ignoring the first position (because neuron 1 can't talk to itself), every single time that $y$ has a positive 1 in an entry, the corresponding connectivity matrix entry is a negative 1, and vice versa. This makes sense -- if we want a perfect A as an output, we need the first pixel to be off. By storing the perfect A into the network, we've encoded the information that when the first pixel is off, so is the second pixel, and pixels 4 and 5, but pixel 3 is on. The result of the above operation is a very negative number -- that is the activation $a_n$. We get an activity $y_n$ via

$$y(a) = \begin{cases} +1 & a \geq 0 \\ -1 & a < 0 \end{cases}$$

This is where np.where() might come in handy: it can scan a vector for a condition, for example whether each entry is greater than or equal to 0, and replace the values with +1 or -1 accordingly. We use this function above in the load_letter function.
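For example, applying the threshold rule above to a small vector of made-up activations:

```python
import numpy as np

a = np.array([-24, 3, 0, -1, 7])    # some example activations
y_new = np.where(a >= 0, 1, -1)     # +1 where a >= 0, -1 otherwise
print(y_new)                        # [-1  1  1 -1  1]
```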

So by this rule, $y_1 = y(a_1) = y(-24) = -1$.

Now, what if we passed in a slightly corrupted version of an A? Again let's just restrict ourselves to the top row of the image. Say that we had the second pixel on, when ideally it should be off: $y_{corrupt} = [-1, +1, +1, -1, -1, \dots ]$.

$$\begin{aligned} a_1 &= w_{1,1} y_{corrupt, 1} + w_{1,2} y_{corrupt, 2} + \dots + w_{1,25} y_{corrupt, 25} \\ &= 0(-1) +1(+1) -1(+1) +1(-1) +1(-1) + \dots \\ &= -22 \end{aligned}$$

A-ha: we have a single mismatch at the second index, contributing a $+1$ to our summation. But there are so many matches at all the other pixels that the activation is still very negative, so the activity $y_1$ after passing in the imperfect A is still $-1$ and the pixel is off.

This is the fundamental property of the Hopfield network -- by storing patterns and the pairwise interactions between pixels, imperfect versions of the patterns that get passed into the network can get "corrected" by all the stored pairwise information between pixels -- you just need enough pixels to be on when they should be on, and off when they should be off, to fix imperfect patterns.
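Putting the last two pieces together, a minimal sketch of that recall step might look like the function below. The names `W` and `y`, the synchronous update scheme, and the stopping rule are all our own choices, not the required implementation.

```python
import numpy as np

def recall(W, y, max_iters=10):
    """Repeatedly apply y <- threshold(W @ y) until the pattern stops changing.

    This uses synchronous updates (all pixels at once); updating one neuron
    at a time (asynchronously) is another common choice.
    """
    y = y.copy()
    for _ in range(max_iters):
        y_new = np.where(W @ y >= 0, 1, -1)
        if np.array_equal(y_new, y):   # reached a stable pattern
            break
        y = y_new
    return y
```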

But how do we actually set the weights?

For that we are going to use Hebb's rule: we create a matrix of the appropriate size and populate it according to the rule. This is its form for a single pattern:

$$w_{nm} = y_n y_m$$

Let's say we wanted to add the memory A to the network; here is how the first row would look:
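A sketch of how you might compute that row with NumPy is below. The `perfect_A` grid is our own guess (it is reverse-engineered from the first row of $w$ listed earlier, so treat it as illustrative); the rule itself is just the outer product of the flattened pattern with itself, with the diagonal zeroed out.

```python
import numpy as np

# Assumed "perfect A" pattern, consistent with the first row of w shown above.
perfect_A = np.array([
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [-1,  1,  1,  1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
])

y = perfect_A.flatten()
W = np.outer(y, y)        # W[n, m] = y[n] * y[m]  (Hebb's rule, one pattern)
np.fill_diagonal(W, 0)    # no self-connections

print(W[0])               # the first row of the weight matrix
```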

If you wanted to store $K$ patterns, you add up the product of the $n$ th and $m$ th entries of each pattern:

$$\begin{aligned} w_{nm} &= y_n^{(1)} y_m^{(1)} + y_n^{(2)} y_m^{(2)} + \dots + y_n^{(K)} y_m^{(K)} \\ &= \sum_{k=1}^{K} y_n^{(k)} y_m^{(k)} \end{aligned}$$

In your own implementation, it would be tedious writing the above out by hand -- think about how you could use a for loop to build the weight matrix instead.
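A minimal sketch of that idea, assuming `patterns` is a list of flattened 25-element arrays of +1/-1 values (the function and variable names are placeholders):

```python
import numpy as np

def build_weights(patterns):
    """Hebb's rule for K patterns: sum the outer products, then zero the diagonal."""
    n = patterns[0].size
    W = np.zeros((n, n))
    for y in patterns:          # one pass of Hebb's rule per stored pattern
        W += np.outer(y, y)     # adds y[n] * y[m] to every entry W[n, m]
    np.fill_diagonal(W, 0)      # neurons have no self-connections
    return W
```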

Tips for feedforward neuron model