The Open-Source Machine Learning "Master's" (OSMLM) is a self-curated deep-dive into select topics in machine learning and distributed computing. Educational resources are derived from online courses (MOOCs), textbooks, predictive modeling competitions, academic research (arXiv), and the open-source software community. In machine learning, both the quantity and the quality of these resources - all available for free or at a trivial cost - are truly f*cking amazing.

Why Am I Doing This?

I want to become more of a technical expert in machine learning. I want to use this expertise to solve real-world problems that actually matter.

To this end, I see two main roads: a traditional graduate program, and the OSMLM.

Why Not Graduate School

For me, graduate school is suboptimal for 3 key reasons:

  1. It's expensive. A quick Google search suggests that a 2-year graduate program would cost, conservatively, $80,000 in tuition fees alone. This is a wholly nontrivial sum of money that would impact how I structure the next 10 years of my life.
  2. There are far more dependencies. I have to apply. I have to get accepted. I have to find the right professor. I have to find a city suitable to my broader interests and lifestyle. This takes time.
  3. By the time I finish, the field of machine learning will look fundamentally different than it did when I started. This is the most important point of all. The only way to remain current with the latest tools and techniques is to actively keep up with them. Given the furious and ever-accelerating pace at which machine learning is moving, this requires much more than just a few hours on the weekend.

Why the OSMLM

  1. I think the higher education paradigm is changing. Access to critical academic knowledge is increasingly democratic: Khan Academy can teach me about the Central Limit Theorem as well as any statistics professor. The ~$250,000 in tuition fees commanded by an undergraduate education at a private American university is, for some, several decades of debt and concession, and for others, prohibitive beyond comedy, reason and fantasy alike. If hard skills are your goal, online self-education is an immensely attractive, intuitive, and practical road to follow - especially in an industry as meritocratic as tech.
  2. I'm keenly aware of how productive I am in a self-teaching environment. I'm largely self-taught in data science. Before that, it was online poker: a 5-year, $50 to $150,000 journey of instructional videos, online forums, critical discussion with other players and personal coaching - all from the comfort of my bedroom. I'm very effective at learning things online.
  3. Some of the most impactful projects I've completed professionally stemmed directly from those I'd completed personally. I would not know how to ensemble models if not for Kaggle. I would not know how to perform hierarchical Bayesian inference if not for Bayesian Methods for Hackers. The open-source data science community continues to teach me creative ways to use data to solve challenging problems. To this end, I want to consume, consume, consume.
  4. The road to further technical expertise is a function of little more than time and effort. I have a few years' industry experience as a Data Scientist. I can write clean code and productionize machine learning things. For me, the OSMLM is nothing more than taking all of the extra-curricular time spent learning new tools and algorithms and making it a full-time job.
  5. I'm extremely motivated. The thought of studying machine learning all day has me smiling from ear to ear. Simply put, I f*cking love this stuff.

How Long is the OSMLM?

9-12 months. Not forever.

Why Morocco?

I aim to speak French and Spanish with native-like fluency by the time I'm 30. I'm currently 27. The Spanish box is largely checked. With 6-9 months in Francophone Morocco, the French box will be largely checked as well.

Furthermore, I've always wanted to live in a Muslim country: I grew up in a predominantly Jewish suburb of Philadelphia, and have had fantastic experiences traveling the Muslim world.

How Will I Spend My Time?

I'll be spending my best 8-10 hours of the day working from a co-working space. I'll be taking online courses, reading textbooks, participating in machine learning competitions and publishing open-source code. I intend to post frequently to this blog.

What Will I Learn?

I have 4 main areas of focus:

  1. "Deep Learning" with flavors of auto-encoders, recommendation, and natural language processing. I remain obsessed with encoding real-world entities as lists of numbers. I like applications that seek to understand people better than they understand themselves. Free-form text is everywhere (and relatively quick to process).
  2. Bayesian Inference. Because they taught me frequentist statistics in school.
  3. Game Theory and Reinforcement Learning. I wrote an undergraduate thesis in game theory and group dynamics and remain eager to tackle more. Reinforcement Learning seems like the hipster way to solve such problems these days.
  4. Apache Spark and Distributed Computing. I have a bit of professional experience with Spark. As data continues to grow in size, distributed computing will move from a thing Google does to a no-duh occupational necessity.

What Does Success Look Like?

Success has a few faces:

  1. Technical. Have the technical expertise to lead teams focused on each of the above 4 topics (weighted towards the first 3, realistically).
  2. Personal. Learning how I best learn. How do I structure my ideal working day? Do I prefer working alone, or indeed as part of a team? What is my optimal balance of reading, thinking, and coding?
  3. Language. I intend to speak French like it's my mother tongue.

What Happens Afterwards?

I'm likely headed back to the Americas, where I intend to devote myself to an impossibly awesome technology project and team for a period of several years. I'd like a technical mentor as well.

How Can You Help?

In addition to self-study, I'd like to assist a few fascinating Moroccan technology organizations with their data problems. As such, if you know anyone in-country with even the most fleeting shared interest, please put me in touch.

In Two Sentences

The Open-Source Machine Learning "Master's" in Casablanca, Morocco allows me to pursue several significant personal goals at the same time. This is my Francophone machine learning adventure.

Update: Now Finished, Here's What I Did


Notable courses, books:


Repositories I published (or contributed to) that are unrelated to the publications above:

  • tensorflow-models: Some basic models in TensorFlow
  • vanilla-neural-nets: A straightforward and highly readable implementation of vanilla neural nets
  • dotify: A web application that recommends songs via "country arithmetic" and hand-rolled Implicit Matrix Factorization
  • n-queens-sympy: A simple solver for the N-Queens Problem using SymPy
  • markdown-insert-screenshot: A lightweight Atom plugin for saving an interactive screen capture to a relative file destination