In NLP, a language model is a *probability distribution* over sequences on an alphabet of tokens. A central problem in language modeling is to learn a language model from examples, such as a model of English sentences from a training set of sentences.
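As a rough illustration of learning such a distribution from examples, here is a minimal sketch of a bigram model fit by counting. The bigram assumption, the `<s>` and `</s>` boundary tokens, whitespace tokenization, and the function names are all choices made for this example, not something the post prescribes.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary tokens

def train_bigram_model(sentences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts

def sequence_probability(counts, sentence):
    """Probability of a whole sentence under the unsmoothed bigram counts."""
    tokens = [START] + sentence.split() + [END]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # context never seen in training
        prob *= counts[prev][curr] / total
    return prob

model = train_bigram_model(["the cat sat", "the dog sat"])
p_cat = sequence_probability(model, "the cat sat")
p_unseen = sequence_probability(model, "cat sat")
```

With this tiny training set the two training sentences split the probability mass evenly, and a sentence that starts with an unseen first token gets probability zero. A practical model would add smoothing so that unseen sequences are not ruled out entirely.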

Language models have many uses, such as…

**Suggest auto-completes**: The user types a few characters (or words) on a web search engine or a smartphone and likely extensions are suggested.

**Recognize handwriting**: An image-based recognition system augmented with a language model has improved accuracy. …

PageRank is a key algorithm that assigns importance scores to nodes in a graph. You could say that it launched Google.

In this post, we illustrate the key “elaborate behaviors” PageRank is able to exhibit with the simplest possible example.

In the Google web search setting, the graph’s nodes are web pages. “Billions and billions” of them. If page A contains a hyperlink to page B, this is captured in the graph as an arc from A to B. Thus the graph is directed.
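To make the graph view concrete, here is a minimal sketch that stores hyperlinks as a dictionary of arcs and scores the nodes with the commonly used power-iteration form of PageRank. The damping factor of 0.85, the even spreading of score from pages with no outgoing links, and the four-page example graph are assumptions for this example; the excerpt does not specify them.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score nodes of a directed graph given as {page: [pages it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's score equally among the pages it links to.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a page with no out-links spreads its score evenly.
                share = damping * scores[node] / n
                for target in nodes:
                    new_scores[target] += share
        scores = new_scores
    return scores

# Hypothetical web: A -> B -> C -> A, plus D -> A.
web = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
scores = pagerank(web)
```

Here page D has no incoming arcs, so it ends up with only the baseline share, while A collects score from both C and D.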

PageRank is applicable to any graph whose nodes need to be importance-scored, such as a social…

Consider the following binary classification problem. The input is a binary sequence of arbitrary length. We want the output to be 1 if and only if a 1 occurred in the input but not too recently. Specifically, the last *n* bits must be 0.

We can also pose this as a language recognition problem. For *n* = 4, the language, described as a regular expression, is `(0 or 1)*100000*`: any prefix, then a 1, then four 0s, then zero or more further 0s.

Below are some labeled instances for the case *n* = 3.

`000 → 0, 101 → 0, 0100100000 → 1, 1000 → 1`
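The labeling rule can be checked with a short sketch. The function names are hypothetical, and the input is assumed to be a string of `0` and `1` characters. The second function expresses the same rule as a regular expression built for a given *n*.

```python
import re

def label(bits, n):
    """Return 1 if bits contains a 1 and its last n characters are all 0."""
    return int("1" in bits and bits.endswith("0" * n))

def label_regex(bits, n):
    """Same rule as a regex: any prefix, a 1, n zeros, then more zeros."""
    pattern = "[01]*1" + "0" * n + "0*"
    return int(re.fullmatch(pattern, bits) is not None)

examples = ["000", "101", "0100100000", "1000"]
labels = [label(bits, 3) for bits in examples]
```

Both functions agree on the four labeled instances listed above for *n* = 3.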

Why this seemingly strange problem? It requires…

In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. Like other graph-based clustering algorithms and unlike *K*-means clustering, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].)

This algorithm is very popular in clustering bioinformatics data, specifically to cluster protein sequences and to cluster genes from co-expression data [2]. This algorithm also lends itself to distributed computing [2]. …

Consider the data set

In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. We start from the root of the tree and ask a particular question about the input. Depending on the answer, we go down to one or another of its children. The child we visit is the root of another tree. So we repeat the process, i.e. ask another question here. Eventually, we reach a leaf, i.e. a node with no children. This node contains the final answer which we output and stop.

This process is depicted below.
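The traversal described above can also be sketched in code. The `Node` class, the toy questions about temperature and rain, and the answers are all hypothetical; the sketch only shows prediction with an existing tree, not how a tree is learned.

```python
class Node:
    """A tree node: internal nodes hold a question, leaves hold an answer."""
    def __init__(self, answer=None, question=None, children=None):
        self.answer = answer
        self.question = question        # function mapping an input to a child key
        self.children = children or {}  # child key -> Node

def predict(node, x):
    """Walk from the root to a leaf, asking one question per node."""
    while node.children:                # a leaf has no children
        node = node.children[node.question(x)]
    return node.answer

# Hypothetical tree: first ask about temperature, then (if cool) about rain.
tree = Node(
    question=lambda x: x["temp"] > 25,
    children={
        True: Node(answer="swim"),
        False: Node(
            question=lambda x: x["rain"],
            children={True: Node(answer="read"), False: Node(answer="walk")},
        ),
    },
)

result = predict(tree, {"temp": 18, "rain": True})
```

The loop mirrors the recursive description: each child visited is treated as the root of a smaller tree until a leaf is reached.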

Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected versus convolutional. The second is better suited to image-processing problems, in which local features live in a space with geometry. The first is generally appropriate for problems in which there isn’t a geometry and spatial locality of features is not paramount.

**Single Neurons**

Let’s start with models of single artificial neurons, the “Lego bricks” of neural networks. A neuron takes a vector **x** as input and derives a scalar output y from…

The clustering problem is to group a set of data points into clusters. Clusters should be internally tight. Clusters should also be well-separated.

A recurrent neural network (RNN) processes an input sequence arriving as a stream. It maintains state, i.e. memory. This captures whatever it has seen in the input to this point that it deems relevant for predicting the output (see below).

At each step, the RNN first derives a new state from the current state combined with the new input value. This becomes the new current state. It then outputs a value derived from its current state.
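The two-part step just described (update the state, then emit an output) might be sketched as follows. This is a scalar toy version: the `tanh` update, the fixed weights, and the function names are assumptions for illustration, whereas a real RNN would use learned weight matrices and vector states.

```python
import math

def rnn_step(state, x, w_state=0.5, w_input=1.0, w_out=2.0):
    """One step: derive the new state, then an output from that state."""
    new_state = math.tanh(w_state * state + w_input * x)  # state update
    output = w_out * new_state                            # readout
    return new_state, output

def run_rnn(inputs, initial_state=0.0):
    """Process a stream of inputs, carrying the state from step to step."""
    state = initial_state
    outputs = []
    for x in inputs:
        state, y = rnn_step(state, x)
        outputs.append(y)
    return outputs

outputs = run_rnn([1.0, 0.0, 0.0])
```

Even though the second and third inputs are zero, the corresponding outputs are nonzero, because the state still remembers the earlier 1.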

Thus, an RNN may be viewed as a transformer of an input sequence to an output sequence, with the state capturing whatever features it…

PhD in computer science (neural nets). 12+ years of experience in industry as a data science algorithms developer. 17+ patents issued. 50 academic publications.