The Future of Data Science Is About Removing the Shackles on AI



Before embracing his destiny as a superhero, Superman was raised on the Kent family farm in Smallville, Kansas, where his superpowers lay dormant and unutilized. And even in adulthood as a reporter for The Daily Planet, Clark Kent, a well-liked but unremarkable guy, still needed time to reach his true potential as the savior of humanity. Much the same can be said of the origin story of artificial intelligence (AI).

AI is finally justifying the hype that has enshrouded it for decades. While not (yet) the savior of humanity, AI has grown from concept to reality, and practical applications are changing our world for the better.

However, much like Clark Kent, many of AI’s wondrous feats are hidden, and its effects can only be observed when you look beyond the disguise of the mundane. Take BNP Paribas Cardif, a major insurance company operating in over 30 countries. The company fields more than 20 million customer calls per year. By leveraging speech-to-text technology and natural language processing, it is able to analyze the content of those calls to serve specific business needs: control sales quality, understand what customers are expressing and what they need, build a sentiment barometer, and more.
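The shape of such a pipeline can be sketched in a few lines. The following is a deliberately minimal illustration, not BNP Paribas Cardif’s actual system: it assumes transcripts have already been produced by a speech-to-text step, and it uses a toy word lexicon where a production system would use trained NLP models.

```python
# Toy sentiment "barometer" over call transcripts.
# The lexicons below are invented for illustration only.
POSITIVE = {"great", "thanks", "helpful", "resolved", "happy"}
NEGATIVE = {"frustrated", "cancel", "unhappy", "waiting", "problem"}

def sentiment_score(transcript: str) -> float:
    """Return a score in [-1, 1] from sentiment-bearing words."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

calls = [
    "Thanks, that was very helpful, my claim is resolved.",
    "I have been waiting for weeks and I want to cancel.",
]
# Averaging per-call scores yields the fleet-wide barometer.
barometer = sum(sentiment_score(c) for c in calls) / len(calls)
```

Real deployments replace the lexicon with supervised models, but the aggregation step, rolling per-call scores up into a business-level metric, works the same way.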

Or look at AES, a top renewable energy producer in the United States and globally. Renewable energy requires far more devices to manage and monitor than traditional energy. Data science and AI drive AES’ next-level operational efficiency with automation and provide data-driven insights that augment the actions and decisions of performance engineers. This ensures uptime requirements are met and clean energy is delivered to customers as quickly, efficiently, and cost-effectively as possible. Like Superman, AES is doing its part to help save the world.


These and the myriad AI applications already in production are just the front-runners. They stand out because until now, AI’s potential has been limited by three key constraints:

  1. Insufficient compute power;
  2. The need for data to be tied to specific (centralized) locations;
  3. A lack of training data.

However, thanks to a few key technological innovations, a sea change is happening that is freeing AI of these tethers, and enterprises must prepare to leverage this powerful technology.

Let’s look at these constraints – the shackles holding AI back – and how they are being broken.

AI Shackle 1: Compute Power

Traditionally, enterprises have not had enough processing power to fuel AI models and keep them up and running. Enterprises have been left wondering whether they should rely exclusively on cloud environments for the resources they need, or whether it is better to split their compute investments between cloud and on-premise resources.

In-house, on-prem GPU clusters now give enterprises a choice. Today, several larger, more advanced organizations looking at production use cases are investing in their own GPU clusters (e.g., NVIDIA DGX SuperPOD). GPU clusters give enterprises the dedicated horsepower they need to run massive training models, provided they harness a software-based distributed compute framework. Such a framework abstracts away the difficulty of manually partitioning training workloads across different GPU nodes.
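The core mechanic such a framework automates is data-parallel training: scatter a batch across nodes, let each node compute a local gradient, then average (all-reduce) the results. A toy sketch in plain Python, with the "nodes" simulated as list shards rather than real GPUs, shows why this works: the averaged gradient equals the gradient over the whole batch.

```python
def gradient(w: float, batch: list[tuple[float, float]]) -> float:
    """Mean-squared-error gradient dL/dw for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, num_nodes):
    """Split a batch into equal-sized per-node shards."""
    size = len(batch) // num_nodes
    return [batch[i * size:(i + 1) * size] for i in range(num_nodes)]

w = 0.5
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# What a framework does under the hood: scatter, compute, all-reduce.
local_grads = [gradient(w, s) for s in shard(batch, num_nodes=2)]
avg_grad = sum(local_grads) / len(local_grads)

# Equivalent to computing the gradient on one giant node.
assert abs(avg_grad - gradient(w, batch)) < 1e-12
```

With equal shard sizes the equivalence is exact, which is why frameworks can split a workload across many GPU nodes without changing what the model learns.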

AI Shackle 2: Centralized Data

Data has typically been collected, processed and stored in a centralized location, often known as a data warehouse, to create a single source of truth for companies to work from.


Maintaining a single data repository makes data easy to regulate, monitor, and iterate on. But just as companies now have a choice between on-prem and cloud compute capacity, a movement has emerged in recent years to create flexibility in data warehousing by decentralizing data.


Data localization laws can make it impossible to aggregate a distributed enterprise’s data in one place. And a rapidly emerging collection of edge use cases for data models is making a single, central data warehouse less and less tenable.

Today, most organizations operate hybrid clouds, so gone are the days of data needing to be tied to one specific location. As businesses continue to leverage hybrid cloud, they gain all its benefits, including the flexibility of deploying models at the edge.

AI Shackle 3: Training Data

A lack of useful data has been a major obstacle to AI proliferation. While we are technically surrounded by data, collecting and storing it can be extremely time-consuming, tedious, and expensive. There’s also the issue of bias. AI models need to be balanced and free of bias to ensure that they generate insights that have value and do not cause harm. But just as the real world has bias, so does data. And in order to scale the use of models, enterprises need lots and lots of data.

To overcome these challenges, enterprises are turning to synthetic data. In fact, synthetic data is on a meteoric rise: Gartner estimates that by 2024, 60% of the data used for AI applications will be synthetic. For data scientists, the nature of the data (real or synthetic) is irrelevant; what matters is its quality. Synthetic data can be generated to control for bias. It’s also easy to scale and cheaper to source. With synthetic data, businesses also have the option to get data that is pre-tagged, dramatically decreasing the time and resources it takes to produce the feedstock for training their models.
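The idea of pre-tagged synthetic data can be illustrated with a minimal sketch: fit simple per-class statistics on a small real sample, then draw as many labeled synthetic records as needed. Real synthetic-data tools use far richer generative models; the label names and the single feature here are invented for illustration.

```python
import random
import statistics

# A hypothetical insurer's small real sample: (claim_amount, label) pairs.
real_sample = [
    (120.0, "routine"), (135.0, "routine"), (110.0, "routine"),
    (900.0, "escalate"), (1100.0, "escalate"), (950.0, "escalate"),
]

def fit(sample):
    """Per-label mean and standard deviation of the feature."""
    params = {}
    for label in {lbl for _, lbl in sample}:
        values = [v for v, lbl in sample if lbl == label]
        params[label] = (statistics.mean(values), statistics.stdev(values))
    return params

def generate(params, n, seed=0):
    """Draw n synthetic records, each already tagged with its label."""
    rng = random.Random(seed)
    labels = sorted(params)
    return [(rng.gauss(*params[lbl]), lbl)
            for lbl in (rng.choice(labels) for _ in range(n))]

# Six real records become a thousand labeled synthetic ones.
synthetic = generate(fit(real_sample), n=1000)
```

Because every record is emitted with its label attached, no separate annotation pass is needed, which is where the time and cost savings come from.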

The Ascension of AI

As AI is liberated from the data quality, compute and location shackles, more use cases and more accurate models touching our day-to-day lives will emerge. We are already seeing leading organizations optimize business processes with AI, and those that don’t make moves to keep up will be at a significant competitive disadvantage.

In order to fully reap the benefits of AI, implementation needs to come from the top down. While data scientists do the hard work of model development and deployment, the C-suite must also be educated on the concepts in order to best incorporate AI into their business strategy. Executive leaders who understand the technology and its potential can make better strategic investments in AI and, therefore, in their businesses.

Conversely, when they don’t know how AI can effectively support business objectives, they may simply sink money into innovation centers and hope that new research projects leveraging AI and ML bear fruit. This bottom-up approach is suboptimal. Instead, the C-suite needs to partner with the data science practitioners and leaders on staff to learn how best to incorporate these technologies into their regular business plans.

It took time for Clark Kent to grow into his role as protector of humanity. Now that AI’s shackles have been loosened, if not fully broken, the time has come for enterprises to help unleash AI’s full potential by investing in the solutions that will make the world a better place for us all and, in turn, help these enterprises remain competitive in today’s digital economy.

About the author: Kjell Carlsson is the head of data science strategy and evangelism at Domino Data Lab. Previously, he covered AI, ML, and data science as a Principal Analyst at Forrester Research where he wrote reports on AI topics ranging from computer vision, MLOps, AutoML, and conversation intelligence to augmented intelligence, next-generation AI technologies, and data science best practices. He has spoken in countless keynotes, panels, and webinars, and has been frequently quoted in the media. Carlsson received his Ph.D. in Business Economics from the Harvard Business School.
