Machine Learning Engineer
Note that for this position, we're looking for candidates located in the following timezones: UTC, UTC+1, UTC+2, UTC+3.
Parse.ly is a real-time content measurement layer for the entire web. Our analytics platform helps digital storytellers and content creators at some of the web's best sites, such as Ars Technica, The New Yorker, TechCrunch, Wired, The Intercept, Slate, and many more. In total, our analytics system handles over 65 billion monthly events from over 1 billion monthly unique visitors.
As a Machine Learning Engineer, you will conceive of, prototype, and build systems that perform these kinds of tasks:
- Detecting entities, linking them with an ontology such as Wikipedia
- Grouping content into temporally coherent ‘stories’ à la Google News
- Making text-based content recommendations
Your ML code will run in production in a modern public cloud environment, and will power authenticated HTTP/JSON APIs used by customers and other teams.
To solve these challenges, we’re looking for engineers who apply expert techniques such as:
- Deep Learning for Natural Language Processing, for example via transformer models (e.g., GPT-3, BERT)
- Word embedding models (e.g., word2vec, fastText)
- Bayesian reasoning via generative models (e.g., expertise in defining and learning the parameters of GMMs, HMMs, RBMs, LDA)
- Other topic modeling and clustering techniques
- Time-series analysis and anomaly detection
As an integral part of a small R&D team delivering innovative and differentiated ML and NLP approaches, you also need to be a strong communicator, so that you can present your findings and your prototypes to the rest of the team. You should feel comfortable rolling up your sleeves to get these models into production and into the hands of other teams and our customers.
- Fluent in the Python Machine Learning ecosystem. This involves familiarity with: NumPy or SciPy; scikit-learn; PyTorch, Keras, or TensorFlow. If you know even a couple of these, you’ll likely be in a good place.
- Master’s or PhD in Machine Learning (or similar field), or equivalent industry experience. The key skill we’re looking for is an academic fluency to read Machine Learning journal articles, to understand them, and to apply their ideas to real working software.
- Interest in productionizing models (experience is a plus). This is an applied systems builder role, so your models will not sit in IPython Notebooks; they will end up running 24x7 in real-world deployments. You’ll therefore learn how to build ETL pipelines, how to ship maintainable code, and how to schedule/operationalize it (e.g. with Apache Airflow or similar tools).
- Interest in cluster/cloud computing at scale (experience is a plus). For example, if you’ve used libraries like PySpark or Dask, or if you’ve used cloud computing tools like AWS Athena or Google BigQuery, you’ll find that handy as we have very large datasets stored in cloud filesystems.
- Break user needs down into tractable machine learning problems. This might be via brainstorming with your team and exploratory data analysis. Your broad experience with a wide variety of modelling techniques should guide you to a short list of techniques to test. Your deep experience with one or two modelling techniques will bring new ideas to our team. Your data exploration skills should inform your feature engineering requirements.
- Independently assemble the data you need. Write and execute the ETL jobs you need to train your models -- you’ll often need to utilize a cluster computing framework (such as Spark, Dask, or BigQuery) to process several terabytes of data efficiently.
- Train and evaluate your models. Understand how well your models work and their sensitivity to hyperparameters via appropriate cross validation and experiment design.
- Keep your work tidy. Use notebooks for quick analysis, but keep boilerplate and copypasta to a minimum by writing clean, tested code that’s maintained and used by the entire team. You will work with other engineers to make sure there’s “one right way” to process data.
Fully distributed team, European ML team
Parse.ly is a fully distributed team, with engineers working from home offices. There is no central office. People with past experience working in this “remote-first” way will be prioritized. The machine learning team is timezone-clustered, so you should be based in a country/city with one of the following timezones: GMT, GMT+1, GMT+2, or GMT+3.
Though this ML team is based in Europe, it will work closely with the rest of the Parse.ly product team that is based in US timezones (GMT-7 through GMT-3). About once per week, you should be available for an 11am ET (NYC time) video meeting, since that’s a time slot where we can catch the full distributed team without being too early or too late for anyone.
A Note About Automattic & Benefits
In February 2021, Parse.ly was acquired by Automattic's enterprise software division, WPVIP. Automattic is one of the biggest champions of open source and the open web, and also one of the top fully distributed employers in the world. Though you will interview for the Parse.ly team within WPVIP and Automattic, if you receive a job offer, it will be to become an Automattician, which means having colleagues who observe the Automattic creed and getting access to Automattic's excellent employee benefits. Your benefits will include:
- Time off: Our open vacation policy (no set number of days per year) is designed to help you to be at your best! There is no minimum or maximum, but we encourage you to take at least 25 days of time off per year.
- Health care: Automattic pays 100% of plan premiums for you, your spouse/domestic partner, and eligible dependents.
- 401(k): 100% match on your contributions up to 6% of annual earnings. You’re eligible to participate in Automattic’s 401(k) plan on your first day of employment. Match is fully vested from day one.
- Parental leave: Open parental leave (including maternity, paternity, LGBTQ+, and adoption for all parents). If you’ve been with Automattic for 12 months, up to 6 months of your leave is fully paid.
Automattic's benefits can vary by location. The above describes US-based staff. See our benefits based on where you are in the world.
A Note About Diversity & Inclusion at Parse.ly
We’re improving diversity in the tech industry. At Parse.ly and our parent company, Automattic, we want people to love their work and show respect and empathy to all. We welcome differences and strive to increase participation from traditionally underrepresented groups.
Studies have shown that women and people of color tend not to apply for jobs unless they feel they meet every item in the job description. We believe that high-performing teams include people from different backgrounds and experiences who can challenge each other’s assumptions with fresh perspectives. We encourage you to apply, even if you don’t match everything listed in the job description. Let us know why this role appeals to you, why you will be great, and which requirements will stretch you.
We are an equal opportunity employer who values and encourages diversity and belonging at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
- Apply now by sending your CV and/or LinkedIn profile and a GitHub link (or similar) to firstname.lastname@example.org. Make sure to indicate you are applying for the "Machine Learning Engineer" role.
- Include a 1-3 paragraph intro to why you're interested in this role.
- Tell us about an interesting project you worked on and point us toward a piece of code you wrote.