Sep 29, 2020
Data science has gained significant mindshare across many industries over the last decade. Today, organizations are accumulating vast amounts of data and want to efficiently extract as much value from it as possible. The interest in data science, however, has often not translated into business impact. Though data science has exploded in popularity and there has been a proliferation of tools for individuals, there is a big gap between the tools that data scientists prefer and those that meet requirements for the enterprise.
Why is this the case? While data has been in the limelight for quite some time, many organizations have only just been laying the foundation for meaningful data analysis and data products. They have been focused on the basics – ingestion, storage, quality, and simple transformations. The most immediate value that organizations can extract from data is business intelligence (BI). Data science, by contrast, is an investment with a longer time horizon. As a result, both the data engineering and BI ecosystems are much more mature than data science, which is still playing catch-up.
The reality is that most data scientists in organizations still download data to their laptops and run code locally with open source tools. While this is a very familiar workflow, they run into issues with collaboration and reproducibility. After all, these tools were not designed with collaboration as a first-class citizen. Unfortunately, existing data science tools tend to fall into one of two camps. Either they are not enterprise-ready, because they neither support collaboration nor scale to large volumes of data, or they are not user-friendly, because they require users to work with tools and languages they are not familiar with.
Data scientists would rather solve the data problem at hand than worry about managing environments, configuring clusters, figuring out why a computation is running so slowly, monitoring resource usage, or ensuring compliance with security requirements. As I discussed in a previous article, a disruption to the data science workflow, even one that appears small, can have a significant impact on productivity. Sometimes the disruptions are not so small: for example, an entire data science organization may be unable to use a fundamental data manipulation library because the organization has adopted a platform that requires them to learn a new one. I argue that one of the key reasons data science teams are not as productive as they should be is that they lack tools that allow them to work productively and collaboratively.
This is where Coiled comes in. Coiled accelerates data science adoption and increases the productivity of data science teams by removing distractions so that data scientists can focus on solving data problems. Founded by Matthew Rocklin, Hugo Bowne-Anderson and Rami Chowdhury, Coiled is uniquely positioned to solve the problem of data science for the enterprise. Matt is the creator of Dask, which is the most widely used Python library for parallelization. Through his work at Continuum Analytics and NVIDIA, he has developed deep expertise in enabling data science in large organizations and in scaling data science workloads. Hugo and Rami also have deep roots in the data science and open source communities through their work at DataCamp and Continuum Analytics.
Coiled solves the problems of enterprise-readiness and usability in many ways. For example, Coiled runs from anywhere you can run Python, including other web services, automated jobs, and even your own laptop.
Coiled will be the crucial connective tissue necessary for enterprises to do data science at scale. That is why we are incredibly excited to partner with them. Congratulations to Matt, Hugo, Rami and the rest of the Coiled team!