Continuous Delivery in Data Science

As I discussed in a previous article, Data Science is in desperate need of DevOps. Fortunately, some DevOps patterns are finally emerging to support Data Science development, and Databricks itself is providing many of them.

Two concepts keep popping up in these DevOps patterns: “Continuous Integration / Continuous Deployment” and “Test-Driven Development” (which is moving toward “Behavior-Driven Development”, though that term is not yet widely used).

Metaphors used in Data Science

People say data science is difficult, which it is, but explaining it to other people is even harder!

Data Science itself is to blame for this, mostly because we don’t have a concrete definition of it either, which has created a few problems. There are companies promoting ‘Data Science’ tools as ways to enable all your analysts to become “Data Scientists”. The job market is full of people who took a course on Python and now call themselves “Data Scientists”. And some businesses are so focused on reporting that they think all Analytics, Data Science included, is just about getting data faster and making it prettier.

But the tools we use are just that: tools. The code we use requires specialized knowledge to apply effectively. The data pipelines we create exist to monitor the success and failure of our models; it’s an added bonus that they help with reporting. To mitigate these challenges we have to come up with some clever metaphors. Let’s explore them a little more deeply.