Published in Towards Data Science · Nov 21 · Member-only
Answering Questions Posed In Natural Language
Issues involving information retrieval, natural language processing, and machine learning
Imagine a software system that is able to answer questions posed in natural language effectively. Such a system is very useful. It is also very interesting from the perspective of what happens under the hood. Let’s start by enumerating some practical use cases of such a system. Online Shopping: Answering…
Vector Space Model · 16 min read

Published in Towards Data Science · Nov 8 · Member-only
Generative Adversarial Learning
From generative to “plus adversarial”
Say we have a dataset of real images, such as pictures of lions in various settings. From this dataset, we want to learn to generate new images that look like the real ones. Generative Adversarial Networks, GANs for short, are a compelling approach to this problem. A GAN comprises…
Unsupervised Learning · 7 min read

Published in Towards Data Science · Oct 10 · Member-only
Estimating Event-specific Counts in Streaming Data
The simple and compelling CountMin Sketch
Imagine a stream of incoming symbols a, b, a, c, a, …. We’d like to know how many times a certain symbol has arrived. This problem has many uses. To calculate the number of times a certain query was made to Google, a certain video watched on YouTube, or a…
Data Sketches · 13 min read

Sep 30 · Member-only
The Majority Event In Streaming Data
An interesting (and idiosyncratic) streaming algorithm
Imagine a possibly unbounded stream of incoming symbols. At a particular point, say there is some symbol that has occurred more than 50% of the time till then. We call it a majority symbol. It is obviously unique if it exists. This post discusses an efficient streaming algorithm that “half”…
Data Sketches · 4 min read

Published in Towards Data Science · Sep 2 · Member-only
Counting Distinct Events in Streams
Big data statistics in distributed settings
Imagine an infinite stream of incoming symbols. We’d like to know the number of distinct values received so far at any point in time. This problem has a number of uses. …
Big Data Analytics · 11 min read

Published in Towards Data Science · Apr 16 · Member-only
Structure Prediction and Learning
Combining predictive modeling with structure inference
Supervised machine learning involves predicting the value of an outcome variable from some input. Typically, the outcome is real-valued or categorical. Structured prediction generalizes this to predicting outcomes that either have explicit structure or are composed of multiple interacting scalar outcomes. Let’s see examples involving explicit structure.
Viterbi Algorithm · 11 min read

Published in Towards Data Science · Jan 21 · Member-only
Fuzzy String Search: Pruning The Search Space
Phonetic keys, Locality Sensitive Hashing
This is the problem of finding approximate matches of a string in a given dictionary of strings. Let’s see an example. We want to search for Jonahtan in a dictionary of clean first names of people. What we really mean is we want to find approximate matches, after taking spelling…
Hashing · 13 min read

Published in Towards Data Science · Dec 23, 2021 · Member-only
Fuzzy String Matching Algorithms
Levenshtein, Phonetic
Often the same entity may be expressed as different strings. For instance, the first name of the same person may plausibly be written as Kathy or Cathy, or as Jonathan or Jonahtan. Matching and inferring that two strings are plausible expressions of the same entity has several use cases. …
Dynamic Programming · 7 min read

Dec 21, 2021 · Member-only
Graph Theory Primer
Basic concepts & examples in an academic style
Graphs and Variants: A graph is a set of nodes, some pairs of which are connected by edges. A graph is directed if its edges are oriented. Directed edges are called arcs. A graph is weighted if its nodes or edges have numeric weights on them. …
Nodes · 5 min read

Published in Towards Data Science · Dec 21, 2021 · Member-only
Basics of Recommender Systems
User Similarity, Item Similarity, Collaborative Filtering, Content-Based Models, Latent Space Models
Recommender systems proactively recommend relevant items to users, when appropriate. “Proactively” means the items just show up: users don’t need to search for them or even be aware of their existence. “Relevant” means users tend to engage with them when they show up. What exactly “engage with them” means…
Collaborative Filtering · 27 min read