#data science #data engineering #machine learning

Data Versioning

Productionizing machine learning/AI/data science is a challenge. Not only are the outputs of machine-learning algorithms often compiled artifacts that need to be incorporated into existing production services, but the languages and techniques used to develop these models are usually very different from those used to build the actual service. In this post, I want to explore how the degrees of freedom in versioning machine learning systems pose a unique challenge. I'll identify four key axes on which machine learning systems have a notion of version and offer some brief recommendations for simplifying things a bit. ...