Consider the following binary classification problem. The input is a binary sequence of arbitrary length. We want the output to be 1 if and only if a 1 occurred in the input but not too recently. Specifically, the last *n* bits must be 0.

We can also pose this problem as one of language recognition. For *n* = 4, the language, described as a regular expression, is `(0 or 1)*100000*`, i.e. any prefix, then a 1 followed by at least four 0s.

Below are some labeled instances for the case *n* = 3.

`000 → 0, 101 → 0, 0100100000 → 1, 1000 → 1`
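The labeling rule can be written down in a few lines of Python. This is a minimal sketch rather than code from the post: the function name `label` and the choice to encode the input as a string of `0`/`1` characters are assumptions made for illustration, and *n* is assumed to be at least 1.

```python
def label(bits: str, n: int) -> int:
    """Return 1 if `bits` contains a 1 and its last n bits are all 0, else 0."""
    has_one = "1" in bits
    # The input must have at least n bits, and the final n must all be 0.
    tail_is_zero = len(bits) >= n and set(bits[-n:]) <= {"0"}
    return int(has_one and tail_is_zero)


# The labeled instances above, for n = 3.
examples = ["000", "101", "0100100000", "1000"]
labels = [label(s, 3) for s in examples]
```

Here `labels` reproduces the four labels listed above for *n* = 3.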

Why this seemingly strange problem? It requires…

In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. Like other graph-based clustering algorithms and unlike *K*-means clustering, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].)

This algorithm is very popular in clustering bioinformatics data, specifically to cluster protein sequences and to cluster genes from co-expression data [2]. This algorithm also lends itself to distributed computing [2]. …

Consider the data set

In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. We start at the root of the tree and ask a particular question about the input. Depending on the answer, we go down to one or another of its children. The child we visit is the root of another tree, so we repeat the process, i.e. ask another question there. Eventually, we reach a leaf, i.e. a node with no children. This node contains the final answer, which we output before stopping.

This process is depicted below.
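The traversal can also be sketched in code. The sketch below assumes binary yes/no questions; the `Node` class, the `predict` function, and the tiny weather tree are all hypothetical and exist only to illustrate the root-to-leaf walk.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Leaf: `answer` is set and there are no children.
    # Internal node: `question` maps an input to True/False and picks a child.
    answer: Optional[str] = None
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def predict(node: Node, x: dict) -> str:
    """Walk from the root to a leaf, asking one question per internal node."""
    while node.answer is None:  # not a leaf yet
        node = node.yes if node.question(x) else node.no
    return node.answer


# Hypothetical tree: decide whether to play outside.
tree = Node(
    question=lambda x: x["raining"],
    yes=Node(answer="stay in"),
    no=Node(
        question=lambda x: x["temperature"] > 30,
        yes=Node(answer="stay in"),
        no=Node(answer="play"),
    ),
)

prediction = predict(tree, {"raining": False, "temperature": 22})
```

Each pass through the loop corresponds to one question in the description above; the loop ends as soon as a node carries an answer.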

Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected versus convolutional. The second one is better suited to problems in image processing, in which there are local features in a space with geometry. The first one is generally appropriate for problems in which there isn’t a geometry and spatial locality of features is not paramount.

**Single Neurons**

Let’s start with models of single artificial neurons, the “Lego bricks” of neural networks. A neuron takes a vector **x** as input and derives a scalar output y from…

The clustering problem is to group a set of data points into clusters. Clusters should be internally tight. Clusters should also be well-separated.
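One way to make "tight" and "well-separated" concrete is to measure them. The sketch below is only one possible choice, not a definition taken from the post: it assumes Euclidean distance, uses mean pairwise distance inside a cluster as tightness and centroid-to-centroid distance as separation, and runs on made-up points.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Tightness proxy: average distance between pairs of points in one cluster."""
    pairs = list(combinations(points, 2))  # assumes at least two points
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# Hypothetical data: two small clusters in the plane.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

tightness = [mean_pairwise_distance(c) for c in clusters]
separation = math.dist(centroid(clusters[0]), centroid(clusters[1]))
```

A good clustering under this reading has small `tightness` values relative to `separation`.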

A recurrent neural network (RNN) processes an input sequence arriving as a stream. It maintains state, i.e. memory. This captures whatever it has seen in the input to this point that it deems relevant for predicting the output (see below).

At each step, the RNN first derives a new state from the current state combined with the new input value. This becomes the new current state. It then outputs a value derived from its current state.
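The two-part step (update the state, then emit an output) can be sketched as a loop. Everything specific here is an assumption for illustration: a scalar state, a `tanh` update, a linear output, and the hypothetical weight names `w_state`, `w_input`, and `w_output`.

```python
import math
from typing import Iterable, Iterator


def run_rnn(inputs: Iterable[float], w_state: float, w_input: float,
            w_output: float, initial_state: float = 0.0) -> Iterator[float]:
    """Yield one output per input value, carrying state across steps."""
    state = initial_state
    for x in inputs:
        # 1. Derive the new state from the current state and the new input.
        state = math.tanh(w_state * state + w_input * x)
        # 2. Derive the output from the (now updated) current state.
        yield w_output * state


outputs = list(run_rnn([1.0, 0.0, 0.0], w_state=0.5, w_input=1.0, w_output=2.0))
```

Because the state is carried from one iteration to the next, the outputs for the two trailing zeros still depend on the initial 1.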

Thus, an RNN may be viewed as a transformer of an input sequence to an output sequence, with the state capturing whatever features it…

In this post, we practice with Python comprehensions on a variety of examples in data preparation and feature extraction. We hope to give the reader a sense of the varied tasks one can do with this one Python mechanism. This post will also be of interest to data science neophytes.

First, let’s enumerate two core data patterns that occur repeatedly in data science modeling: in data preparation, in feature extraction, and even in machine learning algorithms, especially ones that operate on vectors and matrices.

The patterns are *aggregator* and *transformer*. Both operate on Python collections.

An aggregator computes…
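As a rough illustration of the two patterns, the sketch below reads "transformer" as mapping a collection to a new collection and "aggregator" as reducing a collection to a summary. That reading is an assumption, and the documents and variable names are made up.

```python
# Hypothetical data: one raw text document per entry.
docs = ["The cat sat", "the dog  barked", "A bird"]

# Transformer: collection in, collection out (lowercased token lists).
tokens = [doc.lower().split() for doc in docs]

# Another transformer: one length per document.
lengths = [len(t) for t in tokens]

# Aggregator: a generator expression reduced to a single number.
total_words = sum(len(t) for t in tokens)

# A set comprehension collects the vocabulary across all documents.
vocabulary = {word for t in tokens for word in t}

# A dict comprehension as a feature extractor:
# word -> number of documents containing it.
doc_freq = {w: sum(w in t for t in tokens) for w in vocabulary}
```

The same comprehension syntax covers lists, sets, and dicts, which is what makes it useful across data preparation and feature extraction.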

In NLP, a language model is a *probability distribution* over strings on an alphabet. In formal language theory, a language is a *set* of strings on an alphabet. The NLP version is a soft variant of the one in formal language theory.

The NLP version is better suited to modeling natural languages such as English or French. No hard rules dictate exactly which strings are in the language and which are not. Rather, we have observations to work with. People write. People talk. Their utterances characterize the language.
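The contrast between the hard and soft views can be shown on a toy alphabet. This is a minimal sketch under stated assumptions: the formal language (strings over {a, b} ending in b), the character-level unigram model, the end marker `END`, and the three observations are all hypothetical choices, not taken from the post.

```python
from collections import Counter
from typing import Dict, List

END = "$"  # hypothetical end-of-string marker, assumed not in the alphabet


def in_language(s: str) -> bool:
    """Formal-language view: hard membership (strings over {a, b} ending in 'b')."""
    return set(s) <= {"a", "b"} and s.endswith("b")


def fit_unigram(observations: List[str]) -> Dict[str, float]:
    """NLP view: estimate character probabilities from observed strings."""
    counts = Counter(ch for s in observations for ch in s + END)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}


def probability(s: str, model: Dict[str, float]) -> float:
    """Probability of the string: product over its characters and the end marker."""
    p = 1.0
    for ch in s + END:
        p *= model.get(ch, 0.0)  # unseen characters get probability 0
    return p


observations = ["ab", "aab", "b"]
model = fit_unigram(observations)
```

With these observations, the string `ba` is outside the formal language yet still receives a nonzero probability from the model, which is the sense in which the statistical version is soft.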

Importantly, the NLP statistical version is good for *learning* languages over strings from…

PhD in computer science (neural nets). 12+ years of experience in industry as a data science algorithms developer. 17+ patents issued. 50 academic publications.