Illustrated with a simple example

Consider the following binary classification problem. The input is a binary sequence of arbitrary length. We want the output to be 1 if and only if a 1 occurred in the input but not too recently. Specifically, the last n bits must be 0.

We can also frame this problem as one of language recognition. For n = 4, the language, written as a regular expression, is (0 or 1)*100000*: any string ending in a 1 followed by at least four 0s.

Below are some labeled instances for the case n = 3.

000 → 0, 101 → 0, 0100100000 → 1, 1000 → 1
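
One way to see the labeling rule in action is to write it down as a tiny checker. Below is a minimal sketch (our own illustration, not from the post) that labels a string 1 exactly when it contains a 1 followed by at least n trailing 0s:

    import re

    def label(s, n=3):
        # 1 iff the string contains a 1 followed by at least n trailing 0s
        return 1 if re.fullmatch(r"[01]*10{%d,}" % n, s) else 0

    for s in ["000", "101", "0100100000", "1000"]:
        print(s, "->", label(s))  # reproduces the labels above: 0, 0, 1, 1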


Intuitive description with examples and discussion

In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. Like other graph-based clustering algorithms and unlike K-means clustering, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].)

This algorithm is very popular for clustering bioinformatics data, specifically protein sequences and genes from co-expression data [2]. It also lends itself to distributed computing [2]. …
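
To make the algorithm's flavor concrete, here is a minimal numpy sketch of the two core MCL operations, expansion and inflation, applied to a toy graph of two triangles joined by one edge. The parameter values and the simplified cluster read-out are our own illustrative choices:

    import numpy as np

    def mcl(adjacency, expansion=2, inflation=2, iterations=50):
        # Markov clustering sketch: alternate expansion and inflation on a
        # column-stochastic matrix until the flow (hopefully) converges
        A = adjacency + np.eye(len(adjacency))        # self-loops stabilize the iteration
        M = A / A.sum(axis=0)                         # column-normalize: random-walk matrix
        for _ in range(iterations):
            M = np.linalg.matrix_power(M, expansion)  # expansion: take multi-step walks
            M = M ** inflation                        # inflation: strengthen strong flows
            M = M / M.sum(axis=0)                     # re-normalize columns
        # Simplified read-out: group each node with its strongest "attractor";
        # full MCL reads clusters off the nonzero rows of the limit matrix
        clusters = {}
        for j in range(M.shape[1]):
            clusters.setdefault(int(M[:, j].argmax()), []).append(j)
        return list(clusters.values())

    # Two triangles joined by the single edge 2-3: expect two clusters
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]], dtype=float)
    print(mcl(A))  # e.g. [[0, 1, 2], [3, 4, 5]]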


What they are. When they are useful. How they relate to each other.

Consider the data set


In the context of supervised learning, a decision tree is a tree for predicting the output for a given input. We start from the root of the tree and ask a particular question about the input. Depending on the answer, we go down to one or another of its children. The child we visit is the root of another tree, so we repeat the process, i.e., ask another question there. Eventually, we reach a leaf, i.e., a node with no children. This node contains the final answer, which we output, and then we stop.

This process is depicted below.
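
The same traversal is easy to sketch in code. Below is a toy example of our own, with a hypothetical question-and-children node layout:

    class Node:
        # Hypothetical layout: an internal node asks a question of the input and
        # routes to the child keyed by the answer; a leaf holds the final answer
        def __init__(self, question=None, children=None, answer=None):
            self.question = question          # function: input -> answer key
            self.children = children or {}    # answer key -> child Node
            self.answer = answer              # set only at leaves

    def predict(node, x):
        # Start from the root; keep asking questions until we reach a leaf
        while node.children:
            node = node.children[node.question(x)]
        return node.answer                    # the leaf's answer is the output

    # Toy tree (our own example): bucket a person by height
    short, medium, tall = Node(answer="short"), Node(answer="medium"), Node(answer="tall")
    tree = Node(question=lambda x: x["height_cm"] > 170,
                children={True: tall,
                          False: Node(question=lambda x: x["height_cm"] > 150,
                                      children={True: medium, False: short})})
    print(predict(tree, {"height_cm": 180}))  # tall
    print(predict(tree, {"height_cm": 160}))  # medium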


Fully-connected versus Convolutional

Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected and convolutional. The second is better suited to problems in image processing, in which there are local features in a space with a geometry. The first is generally appropriate for problems in which there isn’t a geometry and spatial locality of features is not paramount.

Single Neurons

Let’s start with models of single artificial neurons, the “Lego bricks” of neural networks. A neuron takes a vector x as input and derives a scalar output y from…
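
The preview cuts off here. The standard single-neuron model derives y by taking a weighted sum of the inputs plus a bias and passing it through a nonlinearity. A minimal sketch, with illustrative weights and a sigmoid as one common choice of nonlinearity:

    import numpy as np

    def neuron(x, w, b):
        # Weighted sum of inputs plus bias, passed through a nonlinearity
        # (a sigmoid here; one common choice among several)
        z = np.dot(w, x) + b
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0])   # input vector
    w = np.array([0.8, 0.2, -0.5])   # illustrative weights
    print(neuron(x, w, b=0.1))       # scalar output y in (0, 1)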


three clusters

The clustering problem is to group a set of data points into clusters. Clusters should be internally tight and well-separated from one another.
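
Both criteria are easy to quantify on a toy example. The sketch below (our own) compares the mean pairwise distance within clusters to the mean distance between them:

    import numpy as np

    # Toy 2-D points forming two visually obvious clusters (our own example)
    points = np.array([[0, 0], [0, 1], [1, 0],          # cluster A
                       [10, 10], [10, 11], [11, 10]])   # cluster B
    labels = np.array([0, 0, 0, 1, 1, 1])

    def mean_pairwise(P, Q):
        # Mean Euclidean distance over all pairs drawn from P and Q
        return np.mean(np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1))

    # Rough tightness measure (includes zero self-pair distances)
    within = np.mean([mean_pairwise(points[labels == k], points[labels == k])
                      for k in (0, 1)])
    between = mean_pairwise(points[labels == 0], points[labels == 1])
    print(within, between)  # good clustering: small "within", large "between"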


A recurrent neural network (RNN) processes an input sequence arriving as a stream. It maintains state, i.e. memory. This captures whatever it has seen in the input to this point that it deems relevant for predicting the output (see below).

At each step, the RNN first derives a new state from the current state combined with the new input value. This becomes the new current state. It then outputs a value derived from its current state.

Thus, an RNN may be viewed as a transformer of an input sequence to an output sequence, with the state capturing whatever features it…
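
A minimal numpy sketch of the step just described; the weight shapes, the tanh nonlinearity, and the random initialization are our own illustrative choices:

    import numpy as np

    def rnn_step(h, x, W_h, W_x, W_y, b_h, b_y):
        # Fold the new input into the state, then emit an output from the state
        h_new = np.tanh(W_h @ h + W_x @ x + b_h)  # new current state
        y = W_y @ h_new + b_y                     # output derived from the state
        return h_new, y

    # Process a stream one value at a time (shapes and weights are illustrative)
    rng = np.random.default_rng(0)
    state_dim, in_dim, out_dim = 4, 1, 1
    W_h = rng.normal(size=(state_dim, state_dim))
    W_x = rng.normal(size=(state_dim, in_dim))
    W_y = rng.normal(size=(out_dim, state_dim))
    b_h, b_y = np.zeros(state_dim), np.zeros(out_dim)

    h = np.zeros(state_dim)                       # initial state: empty memory
    for x_t in [1.0, 0.0, 0.0]:
        h, y = rnn_step(h, np.array([x_t]), W_h, W_x, W_y, b_h, b_y)
        print(y)                                  # one output per input step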


For data preparation & feature extraction

In this post, we practice with Python comprehensions on a variety of examples in data preparation and feature extraction. We hope to give the reader a sense of the varied tasks one can do with this one Python mechanism. This post will also be of interest to data science neophytes.

Let’s start by enumerating two core data patterns that occur repeatedly in data science modeling: in data preparation, in feature extraction, and even in machine learning algorithms, especially ones that operate on vectors and matrices.

The patterns are aggregator and transformer. Both operate on Python collections.

An aggregator computes…
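
The preview cuts off here. To make the two patterns concrete, here is a quick sketch of each in comprehension form (the examples are our own):

    values = [3, 1, 4, 1, 5, 9, 2, 6]

    # Aggregator: collapse a collection to a single value
    total = sum(v for v in values)                 # generator expression feeding sum
    n_even = sum(1 for v in values if v % 2 == 0)  # count items matching a predicate

    # Transformer: map a collection to a new collection of the same length
    scaled = [v / max(values) for v in values]     # rescale to [0, 1]
    labels = ["even" if v % 2 == 0 else "odd" for v in values]

    print(total, n_even)
    print(scaled)
    print(labels)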


From simple to ++, with use cases, examples & code snippets

In NLP, a language model is a probability distribution over strings on an alphabet. In formal language theory, a language is a set of strings on an alphabet. The NLP version is a soft variant of the one in formal language theory.

The NLP version is better suited to modeling natural languages such as English or French. No hard rules dictate exactly which strings are in the language and which are not. Rather, we have observations to work with. People write. People talk. Their utterances characterize the language.

Importantly, the NLP statistical version is good for learning languages over strings from…
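
To ground the statistical view, here is a minimal sketch of perhaps the simplest such model, a unigram model estimated from a toy corpus of observed utterances (our own illustration). Every string gets a probability rather than a hard in-or-out verdict:

    from collections import Counter

    # Toy corpus of observed "utterances"
    corpus = ["the cat sat", "the dog sat", "the cat ran"]

    # Unigram model: P(word) estimated from observed frequencies
    words = [w for line in corpus for w in line.split()]
    counts = Counter(words)
    total = sum(counts.values())
    unigram = {w: c / total for w, c in counts.items()}

    def string_prob(s):
        # Probability of a string under the unigram model; unseen words get 0
        p = 1.0
        for w in s.split():
            p *= unigram.get(w, 0.0)
        return p

    print(string_prob("the cat sat"))   # > 0: consistent with the observations
    print(string_prob("the cat flew"))  # 0.0: contains an unseen word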

Arun Jagota

PhD in computer science (neural nets). 12+ years of industry experience as a data science algorithms developer. 17+ patents issued. 50 academic publications.
