06/01/2023

Crafting a Strategic Approach for an AI Platform

The big AI question has shifted away from the old build vs. buy debate. The conversation now revolves around: should we purchase an end-to-end AI platform, or buy the best tool for each area and then build the connections between them?

The 100% Build vs. Buy Debate Is a Thing of the Past

Let’s step back and ask ourselves: what happened to the build vs. buy debate? Most organizations won’t build a platform from scratch today, for a variety of reasons. One is the hidden technical debt in machine learning systems that Google has identified.

Building an AI platform from scratch that allows AI efforts to scale (see Figure 1) is effectively impossible because there is too much “glue”: so much of the required functionality sits outside the core task of simply building a machine learning model.

The reality is that building an AI platform entirely from scratch is not feasible. So what are the alternatives?

The New Build vs. Buy Discussion

Most organizations have two choices when it comes to building a modern AI platform (and scaling an enterprise AI strategy).

  1. Buy a single platform that can cover the entire data science, machine learning, and AI lifecycles (Figure 2) – from raw data ingestion to ETL and building models, to operationalization and AI systems.
  2. Buy the best tool for each step or part of the lifecycle, then combine those tools into a platform customized for your organization.

In many cases, this second option is situational: existing investments dictate it (e.g., we have already invested in x, y, and z; what else do we need to complete the stack, and how can we connect it all?). The organization may be driven more by implicit choices than by an explicit decision to make the new investments that best suit its needs.

Buying the best tool for ETL, AutoML, data cataloging, model management, and so on (see Figure 3) lets each team select the technology it wishes to use. This is an attractive prospect for organizations trying to maintain consensus. But the “glue” that holds these pieces together, while not as difficult as building everything from scratch, is still a challenge.

Beyond the glue issue, important end-to-end aspects of the lifecycle are lost when moving from one tool to another. For example:

  • It is hard to trace data lineage across different tools. This is a problem for organizations of all sizes and industries, because transparency and explainability are essential to building trust both internally and externally. In some industries (like financial services and pharmaceuticals), it is even required by law. With a patchwork of tools, it becomes difficult, if not downright impossible, to tell at a glance which data is used by which models, or how that data is treated.
  • Combining best-in-class tools also complicates the handoffs between teams (for example, between analysts, data scientists, and the software or IT engineers who deploy into production), since critical information must be shared across tool boundaries.
  • From team handoffs and collaboration between data practitioners comes the challenge of managing approval chains. How can businesses reduce risk by ensuring the right checks and sign-offs as AI projects move from one stage to the next? They should be looking out for model bias, fairness issues, data privacy concerns, and so on.
  • Option two also means lost opportunities for automation between stages of the lifecycle: for example, triggering an automated action when the data underlying a model or AI system in production has fundamentally changed.
  • In the same vein, how do teams version and audit the artifacts shared between these tools? How can one tell, for example, which version of the model in tool B matches which version of the data pipeline in tool A?
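The versioning and lineage questions above can be addressed with content-addressed fingerprints shared across tools. Below is a minimal sketch in Python; the function names (`artifact_fingerprint`, `record_lineage`) and the JSON registry file are illustrative assumptions, not any particular platform’s API. Hashing each artifact yields a stable version ID, and a shared registry links a model version to the exact data version it was built from.

```python
import hashlib
import json

def artifact_fingerprint(path: str) -> str:
    """Content hash of an artifact file, usable as a version ID across tools."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()[:12]

def record_lineage(model_path: str, data_path: str,
                   registry_path: str = "lineage.json") -> dict:
    """Append a model-version -> data-version link to a shared JSON registry."""
    entry = {
        "model": artifact_fingerprint(model_path),
        "data": artifact_fingerprint(data_path),
    }
    try:
        with open(registry_path) as f:
            registry = json.load(f)
    except FileNotFoundError:
        registry = []  # first entry: start a new registry
    registry.append(entry)
    with open(registry_path, "w") as f:
        json.dump(registry, f, indent=2)
    return entry
```

Because the fingerprint depends only on file contents, any tool in the stack can recompute it independently and answer “which data version produced this model?” without a shared database.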

The End-to-End Advantage

Given the challenges outlined above, organizations should not spend their energy building an AI platform by stitching together tools from different stages of the lifecycle. Doing so loses the overall picture of the data pipeline and adds technical debt. Investing in an end-to-end AI platform provides the following benefits:

1. Reuse Saves Money

Seeing AI pipelines end-to-end in one place allows data artifacts to be reused and capitalized on across the entire organization. Data that has been cleaned and prepped by analysts can be used by data scientists in other business areas, saving them time and maximizing the ROI of AI. For organizations wishing to scale their AI strategy, it is crucial to embed the concepts of reuse, capitalization, and repurposing into their very fabric.

2. Focus on Implementing High-Impact Technologies

An end-to-end AI platform serves as a central abstraction layer, allowing IT and architecture teams to focus on evolving the underlying technologies that benefit the whole organization, rather than maintaining the interactions between tens of different tools across business units.

3. Easy Governance and Monitoring

For most organizations, governance means more than data governance: it includes all the controls and processes a business must implement to reduce risk, both in its operations and in its compliance with regulatory requirements. Having one tool that everyone in the organization uses simplifies the effort to reduce AI risks, which grow as AI is democratized, and to keep up with increasing data privacy regulations.

Monitoring is largely handled by MLOps. MLOps must be integrated into the enterprise’s larger DevOps strategy to bridge the gap between traditional machine learning and CI/CD: DevOps teams should be able to automate machine learning tests just as they automate software tests. This level of automation is achievable (and straightforward) with a single platform that covers the entire lifecycle, such as Dataiku. With multiple tools, it can quickly become messy.
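To illustrate what “automating machine learning tests like software tests” can look like in practice, here is a minimal quality-gate sketch in Python. The function name and thresholds are hypothetical, not part of any specific CI system or platform API: the idea is simply that a candidate model must clear a hard accuracy floor and must not regress too far against the model currently in production before deployment proceeds.

```python
def model_quality_gate(candidate_accuracy: float,
                       production_accuracy: float,
                       hard_floor: float = 0.80,
                       max_regression: float = 0.02) -> bool:
    """Return True only if the candidate model is safe to promote.

    Mirrors a unit test in a software CI pipeline: when the check
    fails, the build fails and deployment is blocked automatically.
    """
    if candidate_accuracy < hard_floor:
        return False  # below the absolute minimum quality bar
    if candidate_accuracy < production_accuracy - max_regression:
        return False  # regressed too far versus the production model
    return True

# In a CI job, a failed assertion fails the pipeline:
assert model_quality_gate(candidate_accuracy=0.87, production_accuracy=0.85)
```

In a real pipeline, the same pattern extends to checks on bias metrics, data drift statistics, or latency budgets; each one is just another assertion that gates the promotion step.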

The End-to-End Risk

The fear is that investing in a platform that covers everything ties your organization to a single vendor. This is not a small or insignificant risk: with lock-in, the company becomes dependent on the vendor’s decisions and roadmap.

It is therefore important to invest in open, extensible technology that allows organizations to take advantage of their existing data architecture and to adopt the best available technologies in terms of storage, computation, algorithms, programming languages, frameworks, and so on.

When evaluating AI tools, ask not just about the platform’s ability to integrate with current technologies (programming languages, machine learning model libraries, data storage systems, etc.), but also about the vendor’s vision and overall AI strategy. The platform should be open enough that future technologies the company might want to adopt can be easily integrated.

About the author

Kobe Digital is a unified team of performance marketing, design, and video production experts. Our mastery of these disciplines is what makes us effective. Our ability to integrate them seamlessly is what makes us unique.