From feedforward through stacked recurrent

In NLP, a language model is a probability distribution over sequences on an alphabet of tokens. A central problem in language modeling is to learn a language model from examples, such as a model of English sentences from a training set of sentences.
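As a rough illustration of learning a distribution over token sequences from example sentences, here is a minimal sketch of a count-based bigram model. This is a deliberately simple stand-in, not the neural models the post goes on to discuss; the boundary tokens, the function names, and the tiny training set are all assumptions made for the example.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary tokens

def train_bigram_model(sentences):
    """Estimate P(next token | previous token) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts into a conditional distribution.
    return {
        prev: {tok: c / sum(nexts.values()) for tok, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def sequence_probability(model, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = [BOS] + sentence.split() + [EOS]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        # Unseen pairs get probability 0 (no smoothing in this sketch).
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob

# Toy training set, invented for illustration.
training_set = ["the dog barks", "the cat sleeps", "the dog sleeps"]
model = train_bigram_model(training_set)

print(sequence_probability(model, "the dog barks"))
print(sequence_probability(model, "the cat barks"))
```

Without smoothing, any sentence containing an unseen token pair gets probability zero, which is one of the limitations that motivates richer models.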

Language models have many uses. Such as…


On a ‘referrals’ example

PageRank is a key algorithm that assigns importance scores to nodes in a graph. You could say that it launched Google.

In this post, we use the simplest possible example to illustrate the key “elaborate behaviors” that PageRank can exhibit.

In the Google web search setting, the graph’s nodes are web pages. “Billions and billions” of them. If page A contains a hyperlink to page B, this is captured in the graph as an arc from A to B. Thus the graph is directed.
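To make the directed-graph picture concrete, here is a minimal sketch of PageRank computed by repeated score propagation along arcs. It follows the commonly used formulation with a damping factor; the value 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page graph are all assumptions for illustration, not details taken from this post.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to (arcs A -> B)."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_score = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page splits its damped score evenly among the pages it links to.
                share = damping * score[node] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * score[node] / n
                for target in nodes:
                    new_score[target] += share
        score = new_score
    return score

# Hypothetical graph: A -> B -> C -> A, plus D -> A. No page links to D.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
scores = pagerank(links)
for page, value in sorted(scores.items()):
    print(page, round(value, 4))
```

In this toy graph, D ends up with only the baseline share because nothing links to it, while A benefits from two incoming arcs.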

PageRank is applicable to any graph whose nodes need to be importance-scored, such as a social…


Illustrated with a simple example

Consider the following binary classification problem. The input is a binary sequence of arbitrary length. We want the output to be 1 if and only if a 1 occurred in the input, but not too recently. Specifically, the input must contain a 1 and its last n bits must all be 0.

We can also pose this as a language recognition problem. For n = 3, the language, described as a regular expression, is (0 or 1)*10000*: any prefix, then a 1, then at least three 0s.

Below are some labeled instances for the case n = 3.

000 → 0, 101 → 0, 0100100000 → 1, 1000 → 1
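The labeling rule can be checked mechanically. Below is a minimal sketch with two equivalent labelers: one built from the regular expression and one from the plain-language rule. The function names are hypothetical, and the input is assumed to be a string of '0' and '1' characters.

```python
import re

def label_by_regex(bits, n):
    """Label via the regular expression: any prefix, a 1, then at least n zeros."""
    pattern = re.compile("[01]*1" + "0" * n + "0*")
    return 1 if pattern.fullmatch(bits) else 0

def label_by_rule(bits, n):
    """Label via the rule: a 1 occurred, and the last n bits are all 0."""
    return 1 if "1" in bits and bits.endswith("0" * n) else 0

# The labeled instances from the text, for n = 3.
for bits in ["000", "101", "0100100000", "1000"]:
    print(bits, "→", label_by_regex(bits, 3), label_by_rule(bits, 3))
```

The two labelers agree because, once the last n bits are 0, the final 1 in the input is necessarily followed by at least n zeros.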

Why this seemingly strange problem? It requires…


Intuitive description with examples and discussion

In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. Like other graph-based clustering algorithms and unlike K-means clustering, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].)

This algorithm is very popular for clustering bioinformatics data, specifically protein sequences and genes from co-expression data [2]. It also lends itself to distributed computing [2]. …


What they are. When they are useful. How they relate to each other.

Consider the data set


In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. We start from the root of the tree and ask a particular question about the input. Depending on the answer, we go down to one or another of its children. The child we visit is the root of another tree, so we repeat the process, i.e. ask another question there. Eventually, we reach a leaf, i.e. a node with no children. This node contains the final answer, which we output before stopping.

This process is depicted below.
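The same traversal can also be sketched in code. This is a minimal illustration that assumes yes/no questions (so every internal node has exactly two children); the `Node` class, the `predict` function, and the small weather-style tree are hypothetical and not taken from the post.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    answer: Optional[str] = None                        # set only on leaves
    question: Optional[Callable[[dict], bool]] = None   # set only on internal nodes
    if_yes: Optional["Node"] = None
    if_no: Optional["Node"] = None

def predict(node, x):
    """Walk from this node down to a leaf and return the leaf's answer."""
    if node.answer is not None:
        return node.answer           # reached a leaf: output and stop
    # Ask this node's question, then treat the chosen child as the root
    # of another tree and repeat the process.
    child = node.if_yes if node.question(x) else node.if_no
    return predict(child, x)

# Hypothetical tree: first ask about rain, then about temperature.
tree = Node(
    question=lambda x: x["raining"],
    if_yes=Node(answer="stay in"),
    if_no=Node(
        question=lambda x: x["temperature"] >= 15,
        if_yes=Node(answer="go out"),
        if_no=Node(answer="stay in"),
    ),
)

print(predict(tree, {"raining": False, "temperature": 20}))
print(predict(tree, {"raining": True, "temperature": 20}))
```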


Fully-connected versus Convolutional

Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected and convolutional. The second is better suited to image processing problems, in which local features live in a space with geometry. The first is generally appropriate for problems in which there is no geometry and spatial locality of features is not paramount.

Single Neurons

Let’s start with models of single artificial neurons, the “Lego bricks” of neural networks. A neuron takes a vector x as input and derives a scalar output y from…


three clusters

The clustering problem is to group a set of data points into clusters. Clusters should be internally tight. Clusters should also be well-separated.
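One way to make "tight" and "well-separated" measurable is sketched below. The two measures used here (mean distance of points to their cluster centroid, and the smallest distance between any two centroids) are just one reasonable choice among many, and the three small 2D clusters are invented for illustration.

```python
from itertools import combinations
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def tightness(cluster):
    """Mean distance from each point to its cluster's centroid (smaller is tighter)."""
    center = centroid(cluster)
    return sum(dist(p, center) for p in cluster) / len(cluster)

def separation(clusters):
    """Smallest distance between any two cluster centroids (larger is better)."""
    centers = [centroid(c) for c in clusters]
    return min(dist(a, b) for a, b in combinations(centers, 2))

# Hypothetical data: three compact groups placed far apart.
clusters = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 10), (10, 11), (11, 10)],
    [(0, 10), (1, 10), (0, 11)],
]

for c in clusters:
    print("tightness:", round(tightness(c), 3))
print("separation:", round(separation(clusters), 3))
```

A good clustering keeps every tightness value small relative to the separation.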


A recurrent neural network (RNN) processes an input sequence arriving as a stream. It maintains state, i.e. memory. This captures whatever it has seen in the input to this point that it deems relevant for predicting the output (see below).

At each step, the RNN first derives a new state from the current state combined with the new input value. This becomes the new current state. It then outputs a value derived from its current state.
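The step just described can be sketched with a scalar state. The tanh nonlinearity and the fixed weight values below are assumptions chosen only to make the example runnable; in a real RNN the weights are learned and the state is typically a vector.

```python
import math

def rnn_step(state, x, w_state=0.5, w_input=1.0, bias=0.0, w_output=2.0):
    """One step: derive a new state, then derive the output from that state."""
    # New state from the current state combined with the new input value.
    new_state = math.tanh(w_state * state + w_input * x + bias)
    # Output derived from the (new) current state.
    output = w_output * new_state
    return new_state, output

def run_rnn(inputs, initial_state=0.0):
    """Process the input sequence as a stream, one value at a time."""
    state = initial_state
    outputs = []
    for x in inputs:
        state, y = rnn_step(state, x)
        outputs.append(y)
    return outputs

# The second input is 0, yet the second output is nonzero:
# the state still remembers the earlier 1.
print(run_rnn([1.0, 0.0]))
print(run_rnn([0.0, 0.0]))
```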

Thus, an RNN may be viewed as a transformer of an input sequence to an output sequence, with the state capturing whatever features it…

Arun Jagota

PhD in computer science (neural nets). 12+ years of industry experience as a data science algorithms developer. 17+ patents issued. 50 academic publications.
