Blog posts about machine learning in production
The notebook anti-pattern
In the past few years there has been a large increase in tools trying to solve the challenge of bringing machine learning models to production. One thing these tools seem to have in common is the incorporation of notebooks into production pipelines. This article aims to explain why this drive towards the use of notebooks in production is an anti-pattern, giving some suggestions along the way.
Testing your ML pipelines
When it comes to data products, there is a common misconception that they cannot be put through automated testing. Although some parts of the pipeline cannot go through traditional testing methodologies due to their experimental and stochastic nature, most of the pipeline can. In addition, the more unpredictable algorithms can be put through specialised validation processes.
Monitoring ML pipelines
I have spoken a lot in this blog about the process of bringing machine learning code to production. However, once the models are in production you are not done; you are just getting started. The model will have to face its worst enemy: The Real World!
Automated model serving to mobile devices
The most common approach to deploying machine learning models is to expose an API endpoint. This endpoint is generally called with a POST request whose body contains the input data for the model, and it responds with the output of the model. However, an API endpoint is not always the most appropriate solution for your use case.
Terraforming a Spark cluster on Amazon
This post is about setting up the infrastructure to run your Spark jobs on a cluster hosted on Amazon.
Spark Word2Vec: lessons learned
This post summarises some of the lessons learned while working with Spark’s Word2Vec implementation. You may also be interested in the previous post “Problems encountered with Spark ml Word2Vec”.
Conference talks
AMLD 2020
Testing your machine learning pipelines, AI & Industry track
Switzerland
DV Hive 2018
Let evolution do the guessing - how to evolve neural networks
Berlin
MCubed 2020
Monitoring your ML Pipelines
London
DV Hive 2019
Flaming the notebook: ML in the real world
Berlin | group presentation
IEEE CEC 2014
Cooperative DynDE for temporal data clustering
Beijing
DV Hive 2020
A Geospatial Dig
Berlin
DV Hive 2019
Buzz can crack too: How to test your ML pipelines
Berlin
IEEE CEC 2013
A cooperative multi-population approach to clustering temporal data
Cancun
DV Hive 2018
Deployment of machine learning models on mobile devices
Berlin | group presentation
LNCS
Dynamic differential evolution algorithm for clustering temporal data
Sofia