TL;DR: there’s a Shiny app too. I write this post as a follow-up to Erik Bernhardsson’s post “More MCMC – Analyzing a small dataset with 1-5 ratings.” Therein, Erik builds a simple multinomial regression to model explicit, 1-5 feedback scores for different variants of Better’s website. I like his approach for the rigor and mathematical fidelity it …
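A simplified sketch of the modeling problem Erik tackles: per-variant counts of 1-5 ratings. His post fits a multinomial regression with MCMC; as a cheap conjugate stand-in (not his actual model), we can give each variant's rating distribution a flat Dirichlet prior and update it with observed counts. All counts below are invented for illustration.

```python
import numpy as np

# Invented tallies of ratings 1..5 for two hypothetical site variants
counts = {
    'variant_a': np.array([5, 8, 20, 40, 27]),
    'variant_b': np.array([9, 12, 25, 30, 24]),
}
prior = np.ones(5)  # flat Dirichlet(1, ..., 1) prior

expected_rating = {}
for name, c in counts.items():
    posterior = prior + c                     # Dirichlet-multinomial conjugacy
    mean_probs = posterior / posterior.sum()  # posterior mean of each rating's probability
    expected_rating[name] = float(mean_probs @ np.arange(1, 6))
```

This collapses the full posterior to its mean; the appeal of Erik's MCMC approach is that it keeps the whole distribution over expected scores rather than a point estimate.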

# Intercausal Reasoning in Bayesian Networks

The work for this post is contained in the following Jupyter notebook. Below is a brief introduction to what’s inside. I’m currently taking a course on probabilistic graphical models in which we algebraically compute conditional probability estimates for simple Bayesian networks given ground-truth probabilities of the component parts. Since ground-truth probabilities are rarely available in practice, I sought …
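The kind of algebraic computation the course asks for can be sketched by brute-force enumeration over a tiny network. Below, two independent causes (rain, sprinkler) share one effect (wet grass); all probabilities are made up for illustration, not taken from the post's notebook.

```python
from itertools import product

P_rain, P_sprinkler = 0.2, 0.3
# P(wet = 1 | rain, sprinkler)
P_wet = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.98}

def joint(rain, sprinkler, wet):
    """Full joint probability of one assignment to all three variables."""
    p = (P_rain if rain else 1 - P_rain) * (P_sprinkler if sprinkler else 1 - P_sprinkler)
    p_wet = P_wet[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)

def p_rain_given_wet(sprinkler=None):
    """P(rain=1 | wet=1[, sprinkler]) by enumerating the joint."""
    num = den = 0.0
    for rain, spr in product((0, 1), repeat=2):
        if sprinkler is not None and spr != sprinkler:
            continue
        p = joint(rain, spr, wet=1)
        den += p
        if rain:
            num += p
    return num / den

# Intercausal reasoning ("explaining away"): learning the sprinkler was
# on lowers the probability that the rain caused the wet grass.
print(p_rain_given_wet())             # ≈ 0.442
print(p_rain_given_wet(sprinkler=1))  # ≈ 0.214
```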

# Bayesian Inference via Simulated Annealing

I recently finished a course on discrete optimization and am currently working through Richard McElreath’s excellent textbook Statistical Rethinking. Combining the two, and duly jazzed by this video on the Traveling Salesman Problem, I thought I’d build a toy Bayesian model and try to optimize it via simulated annealing. This work was brief, amusing and experimental. The …
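Here's a minimal sketch of the idea, not the post's actual model: a Normal(0, 1) prior on a mean, a unit-variance Normal likelihood, and simulated annealing hunting for the MAP estimate. The data and cooling schedule are invented for illustration.

```python
import math
import random

random.seed(0)

data = [1.2, 0.7, 1.5, 0.9, 1.1]  # made-up observations

def log_posterior(mu):
    log_prior = -0.5 * mu ** 2
    log_likelihood = sum(-0.5 * (x - mu) ** 2 for x in data)
    return log_prior + log_likelihood

mu = best = 5.0      # deliberately bad starting point
temperature = 1.0
for _ in range(5000):
    proposal = mu + random.gauss(0, 0.5)  # random-walk proposal
    delta = log_posterior(proposal) - log_posterior(mu)
    # Always accept improvements; accept worse moves with prob exp(delta / T)
    if delta > 0 or random.random() < math.exp(delta / temperature):
        mu = proposal
        if log_posterior(mu) > log_posterior(best):
            best = mu
    temperature = max(temperature * 0.999, 1e-3)  # geometric cooling
```

For this conjugate setup the MAP has a closed form, sum(data) / (len(data) + 1) = 0.9, so `best` should land very close to it; the fun (and the point of the experiment) is that annealing gets there without knowing any of that.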

# RescueTime Inference via the “Poor Man’s Dirichlet”

RescueTime is “a personal analytics service that shows you how you spend your time [on the computer], and provides tools to help you be more productive.” Personally, I’ve been a RescueTime user since late-January 2016, and while its pings do make me feel guilty for harking back to a dangling Facebook tab in Chrome, I haven’t yet …

# Generating World Flags with Sparse Auto-Encoders

I’ve always been enchanted by the notion of encoding real-world entities into lists of numbers. In essence, I believe that hard truth is pristinely objective – like the lyrics of a song – and mathematics is the universe’s all-powerful tool in expressing incontrovertibility. One of the expressed goals of machine learning is to learn structure …

# Docker and Kaggle with Ernie and Bert

This post is meant to serve as an introduction to what Docker is and why and how to use it for Kaggle. For simplicity, we will primarily speak about Sesame Street and cupcakes in lieu of computers and data. One Monday morning, Ernie from the ‘Street climbs out from under his red-and-blue pinstriped covers, puts both …

# While We Were Busy with Prosperity

I address this post to my peers – to my liberal, driven, University-educated and multi-cultural counterparts. Like most of you, I spent the day of Donald Trump’s election in a state of disbelief, paralysis and exasperation. Like many more, I had several long, critical conversations about what had just happened and where we go from here. In one conversation, …

# Recurrent Neural Network Gradients, and Lessons Learned Therein

I’ve spent the last week hand-rolling recurrent neural networks. I’m currently taking Udacity’s Deep Learning course, and arriving at the section on RNNs and LSTMs, I decided to build a few for myself. What are RNNs? On the outside, recurrent neural networks differ from typical, feedforward neural networks in that they take a sequence of inputs instead …
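The sequence-of-inputs distinction can be sketched in a few lines: a vanilla RNN forward pass where the hidden state is updated at each timestep from the current input and the previous hidden state. Shapes and names are illustrative, not taken from the post's notebooks.

```python
import numpy as np

rng = np.random.default_rng(42)

input_size, hidden_size, seq_len = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))  # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size)) # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a sequence of inputs
h = np.zeros(hidden_size)                    # initial hidden state

hs = []
for x in xs:
    # The recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    hs.append(h)

hs = np.stack(hs)  # (seq_len, hidden_size): one hidden state per timestep
```

A feedforward network would map each `x` to an output independently; here the shared `W_hh` threads information from earlier timesteps into later ones, which is also exactly what makes the gradients (the subject of the post) interesting.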

# Simulating the Colombian Peace Vote: Did the “No” Really Win?

On October 2nd, 2016, I watched in awe as Colombia’s national plebiscite for its just-signed peace accord narrowly failed. For the following week, I brooded over the result: the disinformation campaign, Uribe’s antics, and just how good the deal really seemed to be. Two days ago, I chanced upon this post, which reminds us that the …
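One way such a razor-thin margin can be stress-tested is to assume each recorded vote had some small, symmetric chance of being misrecorded and simulate the margin many times. The totals and error rate below are illustrative placeholders, not the official plebiscite figures, and this noise model is a sketch, not the linked post's methodology.

```python
import numpy as np

rng = np.random.default_rng(7)

no_votes, yes_votes = 6_430_000, 6_380_000  # placeholder totals, not official
error_rate = 0.005  # assumed per-vote misrecording probability

trials = 10_000
no_to_yes = rng.binomial(no_votes, error_rate, size=trials)  # flips away from "No"
yes_to_no = rng.binomial(yes_votes, error_rate, size=trials) # flips toward "No"

# Each No->Yes flip shrinks the "No" margin by 2; each Yes->No flip grows it.
simulated_margin = (no_votes - yes_votes) - 2 * no_to_yes + 2 * yes_to_no
p_no_still_wins = (simulated_margin > 0).mean()
```

With totals this large, independent per-vote noise barely dents a 50,000-vote gap; the more interesting questions, and the ones the post chases, involve correlated errors and whether the count itself faithfully reflects intent.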

# My Open-Source Machine Learning Masters (in Casablanca, Morocco)

The Open-Source Machine Learning Masters (OSMLM) is a self-curated deep-dive into select topics in machine learning and distributed computing. Educational resources are derived from online courses (MOOCs), textbooks, predictive modeling competitions, academic research (arXiv), and the open-source software community. In machine learning, both the quantity and quality of these resources – all available for free …