> Open-Source "Masters", Machine Learning and Statistics

Casablanca, Morocco, August 2016 to July 2017

A ten-month, self-curated deep dive into select topics in machine learning and statistics. Educational resources were drawn from massive open online courses (MOOCs), textbooks, predictive modeling competitions and academic research (arXiv).


Notable courses, books:


Repositories I published (or contributed to) unrelated to the publications above.

  • tensorflow-models: Some basic models in TensorFlow
  • vanilla-neural-nets: A straightforward and highly readable implementation of vanilla neural nets
  • dotify: A web application that recommends songs via "country arithmetic" and hand-rolled Implicit Matrix Factorization
  • n-queens-sympy: A simple solver for the N-Queens Problem using SymPy
  • markdown-insert-screenshot: A lightweight Atom plugin for saving an interactive screen capture to a relative file destination

> The Pennsylvania State University

University Park, Pennsylvania, September 2007 to December 2011

> Royal Melbourne Institute of Technology

Melbourne, Australia, March to July 2010

  • Semester abroad experience
  • Mathematics and engineering courses in credit towards undergraduate degree

Employment History

> NLP, ASAPP, Inc.

New York, New York, August 2017 to Present

  • Research Engineer, January 2019 to Present
  • Lead Machine Learning Engineer, October 2018 to January 2019
  • Machine Learning Engineer, August 2017 to October 2018

AI for enterprise. Stealth for now.

> Data Scientist, ShopKeep

New York, New York, March 2015 to July 2016

  • Built “Merchants Like You” endpoint with word embeddings, Spark, Scala, Python, Flask and various clustering, nearest-neighbor and dimensionality reduction techniques
  • Deployed production models to predict merchant churn
  • Authored software package offering composable predictive modeling objects persisted to S3
  • Built and maintained internal ETL infrastructure

> Data Science Mentor, Thinkful

New York, New York, August 2015 to December 2017

  • Mentored students throughout a 10-to-12-week, project-based data science curriculum
  • Main topics included: Python scientific and inferential stack, databases and APIs, Git and version control, probability and statistics, and machine learning algorithms and their applications

> Data Science Consultant, Data-Pop Alliance

New York, New York, September 2015 to February 2016

  • Contributed to several research efforts aimed at using cell phone calling records to predict population measures in West Africa and Latin America

> Data Science and Backend Web Development, LiveAuctioneers

New York, New York, August 2014 to March 2015

  • Lead engineer on item classification initiative: building machine learning models to classify more than 20,000,000 archived items into a given taxonomy using primarily text-based features
  • Rendered models into an internal API used to classify newly uploaded items on an hourly basis
  • Emphasis on classification models, ensemble methods, natural language processing, clustering methods and test-driven development

> Online Poker

Internet, January 2005 to January 2011

  • Made roughly $150,000 in net earnings from $50 initial investment, playing 6 or more tables simultaneously
  • Emphasis on statistical analysis, combinatorics and data-driven problem-solving
  • Coached several players from around the world through virtual platforms, teaching complex mathematical and psychological topics in easy-to-understand ways

Mathematical Tools

  • Neural networks: feed-forward, recurrent, sequence-to-sequence and multi-objective networks for classification, regression and compression
  • Bayesian probabilistic models: random effects models, generalized linear models, mixture models and more for regression, classification and clustering
  • Recommender models: content-based and collaborative methods for explicit and implicit feedback problems
  • Natural language processing: word-embedding models for classification and similarity
  • Probabilistic graphical models: Bayesian and Markov networks for reasoning about medium-sized systems
  • Approximate inference techniques: MCMC sampling methods and variational inference
  • Discrete optimization techniques: dynamic programming, constraint programming, mixed-integer programming and local search methods

Engineering Tools

  • Languages: Python scientific and inferential stack (Pandas, NumPy, SciPy, scikit-learn, TensorFlow, Keras, PyMC, Edward), R (tidyverse, Stan), Scala, SQL, JavaScript (React), HTML, CSS, Git
  • Databases and web frameworks: Amazon Redshift, PostgreSQL, MySQL, Alembic, Flask, Tornado Web Server
  • Cloud computing services, data pipelines and containerization: Amazon Web Services (Data Pipeline, S3, EC2, EMR), Google Compute Engine, Docker, Luigi


> Travel

Worldwide, January 2012 to March 2014

After undergrad, before work.

  • Completed a 26.5-month solo backpacking and cycling trip around the world, visiting roughly 40 countries throughout East Africa, West Africa, South America, Scandinavia, Central Asia and Southeast Asia
  • Taught a primary school class, in Spanish, in Colombia, and Spanish and physics courses, in French, to university and middle-school students in Guinea-Conakry
  • Pedaled a bicycle 7,600 kilometers from Istanbul, Turkey to Bishkek, Kyrgyzstan

Writing, Code, Social

Talks, Teaching

  • "A Practical Guide to the Open-Source Machine Learning Masters", NYC Machine Learning, August 2019
  • "Minimizing the Negative Log-Likelihood, In English", Boston Bayesians, August 2018
  • "You've Been Doing Statistics All Along", Facebook Developer Circles Casablanca, May 2017
  • Platzi Data Science Courses, Bogotá, Colombia, September 2016
    • Beginner's course (Spanish)
    • Advanced course


  • Languages: English (native), Spanish (business level), French (business level), Russian (beginner)
  • Interests: Hiking, languages, traveling by bicycle, mathematics

© Will Wolf 2020
