The Open-Source Machine Learning "Master's" (OSMLM) is a self-curated deep-dive into select topics in machine learning and distributed computing. Educational resources are derived from online courses (MOOCs), textbooks, predictive modeling competitions, academic research (arXiv), and the open-source software community. In machine learning, both the quantity and quality of these resources - all available for free or at a trivial cost - are truly f*cking amazing.
Why Am I Doing This?
I want to become more of a technical expert in machine learning. I want to use this expertise to solve real-world problems that actually matter.
To this end, I see two main roads: a traditional graduate program, and the OSMLM.
Why Not Graduate School
For me, graduate school is suboptimal for 3 key reasons:
- It's expensive. A quick Google search suggests that a 2-year graduate program would cost, conservatively, $80,000 in tuition fees alone. This is a wholly nontrivial sum of money that would impact how I structure the next 10 years of my life.
- There are far more dependencies. I have to apply. I have to get accepted. I have to find the right professor. I have to find a city suitable to my broader interests and lifestyle. This takes time.
- By the time I finish, the field of machine learning will look fundamentally different than it did when I started. This is the most important point of all. The only way to remain current with the latest tools and techniques is to keep learning them, continuously. Given the furious, ever-accelerating pace at which machine learning is moving, this requires much more than just a few hours on the weekend.
Why the OSMLM
- I think the higher education paradigm is changing. Access to critical academic knowledge is increasingly democratic: Khan Academy can teach me about the Central Limit Theorem as well as any statistics professor. The ~$250,000 in tuition fees commanded by an undergraduate education at a private American university is, for some, several decades of debt and concession, and for others, prohibitive beyond comedy, reason and fantasy alike. If hard skills are your end, online self-education is an immensely attractive, intuitive, and practical road to follow - especially in an industry as meritocratic as tech.
- I'm keenly aware of how productive I am in a self-teaching environment. I'm largely self-taught in data science. Before that, it was online poker: a 5-year, $50 to $150,000 journey of instructional videos, online forums, critical discussion with other players and personal coaching - all from the comfort of my bedroom. I'm very effective at learning things online.
- Some of the most impactful projects I've completed professionally stemmed directly from those I'd completed personally. I would not know how to ensemble models if not for Kaggle. I would not know how to perform hierarchical Bayesian inference if not for Bayesian Methods for Hackers. The open-source data science community continues to teach me creative ways to use data to solve challenging problems. To this end, I want to consume, consume, consume.
- The road to further technical expertise is a function of little more than time and effort. I have a few years' industry experience as a Data Scientist. I can write clean code and productionize machine learning things. For me, the OSMLM is nothing more than taking all of the extra-curricular time spent learning new tools and algorithms and making it a full-time job.
- I'm extremely motivated. The thought of studying machine learning all day has me smiling from ear to ear. Simply put, I f*cking love this stuff.
How Long is the OSMLM?
9-12 months. Not forever.
I aim to speak French and Spanish as fluently as a native speaker by the time I'm 30. I'm currently 27. The Spanish box is largely checked. With 6-9 months in Francophone Morocco, the French box will be largely checked as well.
Furthermore, I've always wanted to live in a Muslim country: I grew up in a predominantly Jewish suburb of Philadelphia, and have had fantastic experiences traveling the Muslim world.
How Will I Spend My Time?
I'll be spending my best 8-10 hours of the day working from a co-working space. I'll be taking online courses, reading textbooks, participating in machine learning competitions and publishing open-source code. I intend to post frequently to this blog.
What Will I Learn?
I have 4 main areas of focus:
- "Deep Learning" with flavors of: auto-encoders, recommendation, and natural language processing. I remain obsessed with encoding real-world entities as lists of numbers. I like applications that seek to understand people better than they understand themselves. Free-form text is everywhere (and relatively quick to process).
- Bayesian Inference. Because they taught me frequentist statistics in school.
- Game Theory and Reinforcement Learning. I wrote an undergraduate thesis in game theory and group dynamics and remain eager to tackle more. Reinforcement Learning seems like the hipster way to solve such problems these days.
- Apache Spark and Distributed Computing. I have a bit of professional experience with Spark. As data continues to grow in size, distributed computing will move from a thing Google does to a no-duh occupational necessity.
What Does Success Look Like?
Success has a few faces:
- Technical. Have the technical expertise to lead teams focused on each of the above 4 topics (weighted towards the first 3, realistically).
- Personal. Learning how I best learn. How do I structure my ideal working day? Do I prefer working alone, or indeed as part of a team? What is my optimal balance of reading, thinking, and coding?
- Language. I intend to speak French like it's my mother tongue.
What Happens Afterwards?
I'm likely headed back to the Americas, where I intend to devote myself to an impossibly awesome technology project and team for a period of several years. I'd like a technical mentor as well.
How Can You Help?
In addition to self-study, I'd like to assist a few fascinating Moroccan technology organizations with their data problems. As such, if you know anyone in-country with even the most fleeting shared interest, please put me in touch.
In Two Sentences
The Open-Source Machine Learning "Master's" in Casablanca, Morocco allows me to pursue several significant personal goals at the same time. This is my Francophone machine learning adventure.
Update: Now Finished, Here's What I Did
- Neurally Embedded Emojis
- Random Effects Neural Networks in Edward and Keras
- Further Exploring Common Probabilistic Models
- Minimizing the Negative Log-Likelihood, in English
- Transfer Learning for Flight Delay Prediction via Variational Autoencoders
- Deriving the Softmax from First Principles
- Approximating Implicit Matrix Factorization with Shallow Neural Networks
- Ordered Categorical GLMs for Product Feedback Scores
- Intercausal Reasoning in Bayesian Networks
- Bayesian Inference via Simulated Annealing
- RescueTime Inference via the "Poor Man's Dirichlet"
- Generating World Flags with Sparse Auto-Encoders
- Docker and Kaggle with Ernie and Bert
- Recurrent Neural Network Gradients, and Lessons Learned Therein
- Simulating the Colombian Peace Vote: Did the "No" Really Win?
Notable courses and books:
- Statistical Rethinking: A Bayesian Course with Examples in R and Stan
- Probabilistic Graphical Models: Representation, Stanford University
- Probabilistic Graphical Models: Inference, Stanford University
- Probabilistic Graphical Models: Learning, Stanford University
- Practical Deep Learning For Coders, fast.ai
- Discrete Optimization, University of Melbourne
- Artificial Intelligence Nanodegree (Part 1), Udacity
- Deep Learning, Udacity
- Apache Kafka, Udemy
Repositories I published (or contributed to), unrelated to the publications above:
- tensorflow-models: Some basic models in TensorFlow
- vanilla-neural-nets: A straightforward and highly readable implementation of vanilla neural nets
- dotify: A web application that recommends songs via "country arithmetic" and hand-rolled Implicit Matrix Factorization
- n-queens-sympy: A simple solver for the N-Queens Problem using SymPy
- markdown-insert-screenshot: A lightweight Atom plugin for saving an interactive screen capture to a relative file destination