Video: The Modern AI Architecture: The Foundation for GenAI and AI-Driven Agents | Duration: 3640s | Summary: The Modern AI Architecture: The Foundation for GenAI and AI-Driven Agents | Chapters: Welcome and Introduction (4.72s), Modern AI Architecture (301.755s), Demonstrating Dataiku's Capabilities (927.89496s), Evaluating AI Architecture (2908.83s), LLM Customization Options (3189.25s), Customer Support Services (3277.17s), Cloud Partner Comparison (3344.4148s), LLM Integration Capabilities (3447.535s), Concluding Remarks (3531.415s)
Transcript for "The Modern AI Architecture: The Foundation for GenAI and AI-Driven Agents": Okay. Good morning. Good afternoon, everyone. I know we're all in different parts of the world. So some people, it might be mid afternoon. For some people, it might be the crack of dawn. We are just happy that you are joining us today on Dataiku Live for our next webinar, a part of our Agent April series. I see a couple of highs and hellos in the chat. It's good to see you all. If you can all post, this is my favorite part of any webinar, if you could post where you are watching from, I always find that the most interesting. So go ahead and say hi in the chat and where you are posting from while I share my screen, and we get started. Alright. So I see London. I see Alberta, Canada. I see Ottawa, Canada. France, London, Germany, Texas, Dublin, Michigan. So like I said, for some of you, it is well into the afternoon, and your day is probably finishing up. For some of us, we are just getting started. I know Dmitri is on the West Coast in Canada, so I know his day is just getting started, as we will just get started. As I said before, welcome everyone to our Dataiku Live webinar. We're really glad that you can join us today. Hey, let's face it, Agentic AI is everywhere right now. Whether you're in manufacturing, healthcare, retail, finance, you're probably hearing about AI agents and wondering how they fit into your business. But before jumping on that agent bandwagon, we need to take a step back and really understand what's actually needed under the hood to make agents and AI work. So that's what today's session is about. We're going to highlight the modern AI architecture and what groundwork must be in place before you can really have value from agents and Gen AI. I will go ahead and properly have us be acquainted. My name is Chad Covin. I'm the host of today's session. I'm a Senior Technical Specialist in product marketing here at Dataiku, and I've been with Dataiku for almost three years now. I'll go ahead and pass it over to Dmitri so he can introduce himself. Hi, everyone. My name is Dmitri Ryssev. I'm a Solution Architect here at, Dataiku. Been here about four years now and I'll be leading the product demonstration today. Nice to meet you all. Right upfront, where are we going today? First, we'll discover the powerful modern architecture behind today's most successful AI and agentic implementations, and we'll show you how a flexible framework can drastically reduce time to insights. We've already seen companies like Novartis experience a 90% reduction in time to insights using this approach. Second, we'll demonstrate how systematic evaluation of language models can optimize both performance and cost in a landscape with hundreds of available models. Knowing which one delivers the best results for your specific use case can make the difference between wasted resources and that competitive advantage. Finally, we'll explore how proper governance controls protect your organization from all types of risk, including financial risk for Gen AI adoption. According to Gartner, most organizations will face cost increases of at least 40% for Gen AI by 2027. We'll show you how to implement controls that maintain budget predictability without compromising security or innovation. That's where we're going. How How are we going to get there? First, we're going to take a quick stroll down memory lane and understand how we got to this current AI landscape. 
Then we'll explore the three critical points that most organizations face when trying to scale their agents and Gen AI offerings. Next, we'll see a live demo run by Dmitri in Dataiku so that we can understand how Dataiku solves some of these critical points. Finally, we'll go through a quick conclusion and leave some time at the end for questions. Also, I recommend for anyone that has questions during the session, please use the Q and A section. It should be all the way towards the right, right above the chat window. There you can type out any questions and we can also type out responses. If we're not able to type out a response to you during the session, we'll address them towards the end in the Q and A section. Let's start with that quick stroll down memory lane, as I mentioned, and we'll talk about why exactly we need this sort of modern AI architecture. We all know that the tempo of AI's evolution is just at a breakneck speed. It is at a crazy pace. This is not going to be an all-encompassing list, but over the past twelve months, we have had Anthropic release Claude 3 and its myriad of models. We had, not to be outdone, that same spring GPT-4o released. Then Mistral came out with their Large 2 model, and then Google released Gemini 2.0 in fall of last year. Not to be outdone by DeepSeek, which essentially changed the world and was the talk of the town for a couple months. Then Anthropic released Claude 3.7 Sonnet, not to be outdone again by OpenAI with GPT-4.5, and then we had Gemini 2.5 released just a couple of weeks ago. We all know where this is going, so what's next? For IT leaders, this isn't just a timeline, it's really a challenge. Each release represents a pivotal point. You invest months integrating, let's say, Claude 3 into your workflows, and then GPT-4o arrives with capabilities that could benefit different departments. As your team scrambles to evaluate it while maintaining existing systems, two more major models enter the market. Remember, these are just the headline models. We haven't even touched on specialized domain-specific language models or multimodal models that might actually deliver better results for particular use cases. This relentless speed of LLM innovation we just saw really creates a fundamental challenge for scaling Gen AI initiatives. Building enterprise solutions requires managing complexity across four interdependent layers that must work together seamlessly. First, the generative model layer, where what works best today might not work best tomorrow. Second, the feedback layer, where implementing effective learning loops across multiple models will compound complexity. Third, the deployment layer, which must integrate new models without disrupting operations. And the monitoring layer, which becomes increasingly challenging with each additional model and use case. So McKinsey's research shows that leading organizations are tackling this through component-based architecture, designing systems with swappable components that can be updated independently, allowing them to incorporate new capabilities without rebuilding the entire stack. This brings us to the fundamental tension that organizations face, control versus innovation. Right? On one hand, you need tight control over cost, security, governance, agents, and the LLM models themselves. And on the other hand, you need the flexibility to quickly adopt new models and capabilities to stay competitive.
Traditional architectural approaches force you to choose one or the other. Either you lock down your infrastructure for control but slow innovation, or you move fast but potentially lose control of cost and security. This tension is at the heart of why scaling agents and GenAI has been so challenging. When this isn't done correctly, organizations run into three critical failure points, so let's understand them. The three critical failure points organizations face are being trapped in a rigid, inflexible architecture that can't adapt, lacking the ability to select the optimal model for each specific use case while balancing performance and cost, and ultimately losing control over the security and the total cost of their Gen AI and agent applications. Let's look at each of these a little further. Let's start with the first critical point, rigid architecture. When organizations build applications tightly coupled to specific models, they face the rebuild-or-fall-behind dilemma. So according to ITRex, this rebuild cycle triples development costs as teams must repeatedly reconstruct applications with each model switch. Meanwhile, industry leaders have already figured this out. Algonize reports 93% of Fortune 500 companies are using at least three Gen AI providers for their applications, many of which are using four or five Gen AI models and providers for their applications. The most successful companies aren't asking which model should we standardize on. They're asking how do we enable an architecture that's flexible enough to use the best model for each specific need. Without this flexibility, you're constantly rebuilding rather than innovating. Number two is the failure point of model selection. Too many organizations default to whatever model is trending rather than what's right for their specific needs. But model selection is not just a performance issue. It's also a security issue. According to Harmonic, 8.5% of employee prompts contain sensitive data. Selecting the optimal model might mean using a leading provider's API-hosted solution for general tasks, but it might also mean using a locally hosted model when sensitive data requires enhanced security controls. The most successful organizations match each use case to the optimal model based on factors like data sensitivity, performance requirements, and cost. And without this strategy, we're either overpaying for capabilities that we don't need or we're compromising on security for sensitive information. Speaking of security, the third critical failure point, which really is the most dangerous of all, is security and cost governance failures. We are currently in an environment where AI spending is outpacing IT budgets by nearly three times, while about 70% of executives admit to prioritizing speed over security. So we're watching organizations really create that perfect storm. This combination of unchecked spending growth and deliberate security compromises doesn't just threaten budgets; it creates vulnerabilities that could lead to data breaches and compliance violations, as well as AI and agents possibly going rogue. The most successful organizations implement governance that provides visibility and protection while still enabling innovation. Now we understand the three critical failure points of ineffective architecture. Naturally, the success points of a modern AI architecture are the inverse. The first key is flexibility.
As we've seen, models evolve at breakneck speed, so your architecture must adapt without requiring complete rebuilds. This means creating a foundation that connects new models to your existing layers without disrupting operations. Second is knowing the best LLM for each use case. This means systematically matching models to business needs, whether that's an API-based model for general tasks or a locally hosted solution for sensitive data. This approach optimizes both performance and cost across your entire architecture. Finally, maintaining full control over usage and cost balances centralized governance with innovation. The right controls provide visibility and predictability while still enabling your teams to move quickly. So what ultimately is the solution? And naturally, you will not be surprised. It is Dataiku. Dataiku provides the modern AI architecture in a few ways. First, flexibility is delivered through the Dataiku LLM Mesh, which creates a component-based architecture connecting all these leading GenAI providers you see here. This also allows you to seamlessly integrate and swap models without rebuilding your applications, which future-proofs your GenAI strategy against the rapid pace of innovation we saw earlier. Second, Dataiku's Quality Guard ensures you're using the optimal model for each specific use case, standardizing output quality while balancing performance needs with cost considerations. This systematic evaluation takes the guesswork out of model selection. Finally, Safe Guard and Cost Guard provide the governance controls that prevent the security vulnerabilities and budget overruns we discussed. Safe Guard reduces operational risk through security guardrails, while Cost Guard gives you visibility into Gen AI spending and the ability to proactively block excessive costs, which is critical when AI spending is growing nearly three times faster than overall technology budgets. Together, this creates a modern AI architecture that delivers on all three critical success points. Now, let's see what this looks like in practice. I'll hand it over to Dmitri to show us how Dataiku actually handles these critical points. Awesome. Thanks, Chad. So while Dmitri is getting set up, I'm gonna go ahead and open a poll. We always love some, you know, audience participation. I'll open a poll, so you can go to the poll tab and vote, and Dmitri will lead us in our demo. Perfect. Thanks, Chad. So, what's amazing about Dataiku is you can leverage the platform for different use cases, whether they're Gen AI based or not. But in this case, I wanted to start with providing a few examples of how you can leverage Dataiku to perform some of the most common types of Gen AI related use cases. So for example, let's say that you are someone working in the insurance industry. Maybe you are a loan adviser, and you wanna have a really easy way to be able to have a natural language conversation with the underlying datasets related to your customers and to your documents. In this case, we've provided an interface that looks very intuitive for the end user, somebody maybe on the business side, to be able to ask questions of the data. So for example, what's the average age of my loan applicants? And what Dataiku can support is this text-to-SQL paradigm where any question a user asks can be translated into SQL, and then answered in natural language by the LLM. And in the background, you can actually see what the SQL query was. You can continue the conversation.
You can ask something like, and what's the minimum age of these applicants? And it'll go back and forth. It will generate another SQL query, generate a response, and provide it for the user. So now we have more users able to interact with that data through natural language that's automatically translated to SQL. Another example can be working not just with datasets, but with documents. So in this case, we have what's known as Dataiku Answers, a web interface connected to an underlying set of documentation for loan approval policies and our credit policies. And so I can ask questions of those documents, like, what is our minimum credit score for approval, for example. And what's gonna happen is it's going to automatically find the most relevant pieces of documentation. It's going to sift through that information, pass it off to an LLM, and the LLM is going to respond back again in natural language. And it's even going to cite the particular passages or documents that the user can see, that they can reference, so they can make sure everything is actually accurate in terms of what the LLM is responding with in its answer. How great is that? And if you're looking for something even more cohesive where maybe you have one interface, one stop shop where users can come in, they can ask questions of datasets, and it'll do text-to-SQL. They can ask questions about the policy documents. They can ask customer dataset questions and so much more. That's when Dataiku Agent Connect is very powerful, because you effectively give this application access to a whole host of agents and their corresponding tools such that the LLM can reason about which tool and which agent make sense for the particular task at hand based on the user's query. And so the LLM can then respond back with the appropriate information that is gathered from those agents and from those tools, in this case answering questions with a reference to our credit scoring guide. It will actually show which agent it used and which tool that agent used in order to provide this information. And, again, you can connect up this agent with as many tools as make sense for the user's application and for the context at hand. Now you're probably asking, well, how do I actually go about building a use case such as this or others in Dataiku such that we can benefit from having this sort of interface for our end users? Well, let's go through it. So I will open up a flow in Dataiku, and I expect some of the audience are familiar with Dataiku, some less so. So I'll start off by giving a bit of an introduction. This is a flow in Dataiku verbiage, and the blue squares refer to datasets, as it were. And so we're developing, like, a pipeline or a flow that is representative of the use case we're trying to accomplish. In between the blue squares, you may see circles or other colorful objects. Those refer to recipes. So some transformation or, typically, some action we're taking on the data. And because of Dataiku's flexibility with respect to how you can work with datasets, these can be visual recipes, these can be code-based recipes, these can be recipes that are visual but call an LLM. And the idea is that whether you're technical or nontechnical, whether you have a proclivity for working in code or prefer to work with visual tools, everybody has an ability to work with data in Dataiku.
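To make the text-to-SQL pattern described above a bit more concrete, here is a minimal illustrative sketch. It is not Dataiku's implementation: `call_llm` is a hypothetical placeholder for whichever chat model your connection routes to, and the schema, table, and file names are made up.

```python
# Minimal text-to-SQL sketch (illustrative only, not Dataiku's implementation).
# `call_llm` is a placeholder for whatever chat model your platform routes to.
import sqlite3

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM call; wire this to your provider of choice."""
    raise NotImplementedError

# Hypothetical table describing loan applicants.
SCHEMA = "loan_applicants(applicant_id INTEGER, age INTEGER, credit_score INTEGER)"

def answer_question(question: str, db_path: str = "loans.db") -> str:
    # 1. Ask the model to translate the natural-language question into SQL.
    sql = call_llm(
        f"Table schema: {SCHEMA}\n"
        f"Write a single SQLite query answering: {question}\n"
        "Return only the SQL."
    )
    # 2. Execute the generated SQL against the warehouse (SQLite here for brevity).
    #    A real implementation would validate or sandbox the query first.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()
    # 3. Ask the model to phrase the raw result back in natural language.
    return call_llm(
        f"Question: {question}\nSQL: {sql}\nResult rows: {rows}\n"
        "Answer the question in one plain-English sentence."
    )
```

The shape stays the same regardless of provider: one call turns the question into SQL, the warehouse does the heavy lifting, and a second call phrases the raw rows back in plain English, which is why swapping the underlying model does not require rebuilding the application.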
And so that leads to further collaboration and democratization of access to data and LLMs, such that companies get more value from these resources and from these models. Of course, if I'm working with structured or tabular data, then I can open up a dataset here. I can begin to look at the distribution of the data, understand it, and begin to build out recipes to work with it. But in fact, in this case, we're gonna focus on the RAG, or retrieval augmented generation, section of what I showed earlier. So where the LLM can pull information from a set of structured or unstructured documents. The beauty of Dataiku again is we can work just as well with data coming from, say, a file share as with tabular data. So this could be a file share living in OneDrive or in SharePoint or in a bucket in cloud storage or on-prem. It really doesn't matter from Dataiku's perspective because we have connectors into various different data sources. Again, the data can be structured or unstructured. As you can see here, we're working with a few different PDF files that are within this folder. We have tooling that's in place for the user to be able to work with this data in a very clean and easy way. So for example, we have what's known as this Embed documents recipe, where it can point to a folder of documents like we're doing here, and it's going to go through each of the files within that folder and recognize the format of those documents. And for the cases where it's more of, say, image-like, so maybe it's a PDF file or a PowerPoint file, it's going to actually take a snapshot of each page of that document, and it's going to send it off to a large language model with the capability of understanding images, so a multimodal model as it were. And it's going to ask the vision language model, in this case the VLM, to provide a detailed description of that page. So this is the first place in which Dataiku's flexibility and optionality with respect to models becomes very apparent. You can actually select which model you want to leverage, which vision model you wanna use in order to be able to make that happen. So as you can see here, I select GPT-4o mini, but you can just as easily select a Vertex model, a Bedrock model, an OpenAI model, any of these other models. And again, the list here will depend on what level of access I have as the user. As a Dataiku user, your list might be different depending on what level of access your company has given you. This is an example. And then for the cases where it's not a PDF or a PowerPoint file, if it's just text, it will simply send that text as is off for the embedding process, which is what's happening here. If we run that recipe, we are left with what's known as a knowledge bank, which is basically the collection of embeddings of a set of documents that we can then point a large language model, or an agent in another case, to in order to ask questions of those documents. And the reason that this exists is because you don't necessarily want to have the large language model probe every single document and every page of that document if you're working with, like, hundreds or thousands of documents and they're really big documents.
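As a rough sketch of what a knowledge bank boils down to, assuming a generic embedding model rather than any particular Dataiku recipe: documents are split into chunks, each chunk gets an embedding vector, and the vectors are kept for later similarity search. `embed` is a hypothetical stand-in for whichever embedding connection you provision; the image-like pages mentioned above would first be described by a vision model and then embedded as text the same way.

```python
# Rough knowledge-bank sketch: chunk documents, embed each chunk, keep the vectors.
# `embed` is a hypothetical placeholder for an embedding-model call.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: list[float]

def embed(text: str) -> list[float]:
    """Stand-in for an embedding-model call."""
    raise NotImplementedError

def build_knowledge_bank(documents: dict[str, str], chunk_size: int = 800) -> list[Chunk]:
    bank = []
    for doc_id, text in documents.items():
        # Naive fixed-size chunking; production pipelines usually split on structure instead.
        for start in range(0, len(text), chunk_size):
            piece = text[start:start + chunk_size]
            bank.append(Chunk(doc_id=doc_id, text=piece, vector=embed(piece)))
    return bank
```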
So the point of this is that at runtime, when you ask a model to answer questions of that data, it will go ahead and retrieve the number of documents that you can specify here, and then it will provide them as a part of the context, or as a part of the prompt, to the LLM, which again, you can specify which LLM you want this to be here. So now we're talking about which LLM is going to connect to that knowledge bank and then respond appropriately. And you can specify again, maybe you want this to be Claude, one of Claude's models, maybe you want it to be Bedrock. Many different options available to you. You can even do local models that are downloaded via Hugging Face. We can go into a little bit more depth around how that can look. But the idea again is that we provide you that flexibility such that you're able to pick and choose, using a drop down, which model makes sense for your use case. And you may be asking yourself, well, Dmitri, how am I supposed to know which model makes sense? I'm gonna have to test this for every single model. Well, we actually provide a really nice interface to help with this. This is something called Prompt Studio. And within Prompt Studio, you can write as many prompt examples as you like, send one as an example to the LLM of choice, and then get a response back. And that will be a qualitative way for you to be able to understand what that model is responding with and if it's sufficient for your needs. So in this case, I have an example of a model, that's Claude 3 Sonnet, responding to my question of what is a typical approval process. I can ask a similar question using GPT-4o, which is what we're doing here. So I can run this. It's going to send that query off to the LLM. It's going to get a response back. And if I, you know, have a few of these different prompts and I wanna be able to see their results side by side, I can click the compare button and I get a comparison view between both of the models and what their responses were. So again, a qualitative means, certainly for me as I'm developing this use case, in order to get a sense of which model may be useful for myself in order to move forward. So now once I have decided which model I want to utilize, then I can begin actually running it against a set of reference answers. So this is kind of like my ground truth, quote, unquote, data, where I have some example questions that are sort of my gold standard questions. And then I have my reference answers, my labeled data that a human has written, and I want to compare the results of the LLM against this reference answer to see how well the LLM is doing against these gold standard questions. And so what I can do is I can actually set up a prompt recipe, which I have here, connected to that knowledge bank. And I can say, for each of these questions in my list of questions, answer it. And I'm gonna point it to a specific LLM, and I can mix and match between different LLMs here. I can also specify different settings with respect to giving examples for the LLM so it has a better sense of the structure I expect. You can also configure things like temperature and max number of tokens. We're gonna leave that as is for right now. But as we go through this, as you can see, we get a set of responses, which is the LLM output that you see here, and then we're going to compare that to the reference answer that's provided.
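A minimal sketch of that runtime retrieval step, again with hypothetical `embed` and `call_llm` placeholders rather than Dataiku's actual API: the question is embedded, the top-k most similar chunks are pulled from the knowledge bank, and only those chunks are packed into the prompt, which is why the LLM never has to read every page of every document.

```python
# Illustrative retrieval-augmented answering step (not Dataiku's internals).
import math

def embed(text: str) -> list[float]:
    """Stand-in for an embedding-model call."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-model call."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def answer_from_documents(question: str,
                          bank: list[tuple[str, str, list[float]]],
                          k: int = 5) -> str:
    # bank holds (doc_id, chunk_text, chunk_vector) triples produced at indexing time.
    query_vec = embed(question)
    top = sorted(bank, key=lambda item: cosine(query_vec, item[2]), reverse=True)[:k]
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text, _ in top)
    return call_llm(
        "Answer using only the context below and cite the [doc_id] you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```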
And the way that we do that is through this recipe that we see called the Evaluate LLM recipe. So this is the ability for us to more quantitatively assess how the model is performing. So if we take a look at the settings of this recipe, what we can see here is we can specify the input datasets. Really nicely, actually, Dataiku allows us to specify the type of task that we're doing as well. So you can see we have the option for question answering, summarization, translation. And based off of our answer to this question, or to this drop down, it will give us different options and different recommendations for the type of quantitative metrics we wanna have as output. Because, again, we had the qualitative view from the Prompt Studios, but now we wanna make this a little bit more robust. We wanna be able to actually take a look at some quantitative metrics that we can use for comparison between these different models. We'll also specify the ground truth column, which was our reference answer that you saw earlier, and then the metrics that are of interest for us. And these are out-of-the-box metrics. So nothing that you need to do in order to be able to define these. Of course, if you're unfamiliar with what these terms mean, we have a lot of documentation, both within the platform and within our actual documentation and our Academy, that helps to describe what the difference between answer correctness and answer similarity is. But if I take a few of these options as metrics to compute, then I'll be able to get a better, more quantitative sense of how my model is doing for this specific task. If I have my own custom metrics that I'm interested in, then I can choose to apply them here as well. In fact, I've created a simple one here that's taking a look at the average cost per response. So I'm taking a look at the number of tokens, multiplying it by the cost per token for each of these models, and I'm getting an average cost per response as well. You can create your own custom metrics. We actually have some code samples for different types of metrics that you can see here that can help you to develop your own. But the idea is you can define as many of these as you like. You can standardize off of them for your particular use case because, again, some LLMs are better for some use cases, some are better for others. As we understand, and as Chad has described, there's not one model that's perfect for every use case, and it is often a comparison and a trade-off between performance and cost. But if I run this recipe, which I have pointed to GPT-4o, I'm gonna be able to do a comparison between how GPT-4o did versus other models that I've also applied to my standard set of questions. And so now I'm gonna be able to run a more kind of cohesive comparison between all of these different LLMs. So as that's running, I'm going to head back to the flow, and I'm going to check in on the result. So as you can see, I've already applied this to four different models: GPT-4o mini, and I've actually applied this process to a DeepSeek model as well, and Claude 3.7 Sonnet is the latest model there, along with o1 just for fun, because I wanted to see what the difference would have been between the number of tokens and the cost for a reasoning model versus a pure completion model. And I see GPT-4o just finished, so I'll refresh my page, and I should be able to see GPT-4o as well. Perfect.
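As an illustration of the kind of custom metric described above, here is a hypothetical average-cost-per-response calculation: token counts multiplied by a per-token price, averaged over the reference questions. The model names and prices are invented placeholders, not real rates, and the row format is just an example rather than Dataiku's evaluation output schema.

```python
# Hypothetical custom metric: average cost per response = tokens used x price per token.
# Prices below are made-up placeholders; plug in your provider's actual rates.
PRICE_PER_1K_TOKENS = {"gpt-4o": 0.01, "gpt-4o-mini": 0.0006, "o1": 0.06}

def avg_cost_per_response(rows: list[dict], model: str) -> float:
    """rows: one dict per evaluated question, each with a 'total_tokens' count."""
    price = PRICE_PER_1K_TOKENS[model] / 1000
    costs = [row["total_tokens"] * price for row in rows]
    return sum(costs) / len(costs)

# Example: compare two models over the same reference questions (invented token counts).
runs = {"gpt-4o": [{"total_tokens": 950}, {"total_tokens": 1200}],
        "o1": [{"total_tokens": 4100}, {"total_tokens": 3800}]}
for model, rows in runs.items():
    print(model, round(avg_cost_per_response(rows, model), 4))
```

This is why a reasoning model can top the correctness chart while also carrying the highest cost per response: it simply spends far more tokens per answer.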
You'll be able to see graphs and information that correspond to these different models and their performance metrics. Of course, right now, we just see one entry of the performance metrics that you see here. But over time, as you have the model go through different questions that may be asked, and this can be pointed not just to reference answers but to actual questions that may be asked by users, then they can give their feedback back as well. You'll be able to see the performance metrics potentially go up and down over time, and you can use that information to inform whether or not you want to switch the LLM that you're using for your use case. So as you can see here, we have a few different model options available. The one that was highest performing from an answer correctness perspective was o1, but it also had, as you can imagine, the highest cost per response compared to these other models, which is, of course, not surprising considering it's a reasoning model. And so you have to make that trade-off. For this use case, is that uplift between o1 and 4o worth the additional cost per response? I can't say definitively one way or the other. That's up to you as a user, as a business, to decide. But if I want to dig into these metrics further and really understand, for each question, what was the answer similarity? What was the BERT score? What does that mean? I can open up the evaluation in more detail. And on a row-by-row, answer-by-answer basis, I can see what the output of the LLM was compared to the reference answer that I see here. And I can also see, on the right hand side here, the associated performance metrics. And keep in mind as well, these performance metrics, or many of them, are based off of the concept of LLM as a judge. So I'm actually pointing to an LLM that is taking a comparison between this reference answer and the answer generated by my original LLM. And that kind of referee LLM, or the judge as it were, is making a determination of what the score should be. So it's not gonna be perfect, but over time, it's gonna give you a sense or a direction of what the relative performance of these models is, especially if you keep that LLM as a judge consistent. Now taking a step back, you're probably asking yourself, well, how easy is it to connect to these different models in Dataiku and how do I provision access, especially from an IT or an admin perspective? How easy is that really? I'm gonna open up the connection window in Dataiku so that you can see how easy it really is. As you can see in this admin portal that we have for the LLM Mesh, we have options for connections to all of the major providers. Right? Not surprising, we can connect to OpenAI, Cohere, Bedrock. We can host local models. We can connect to Mistral, Snowflake, Databricks, etcetera. And the way that you can connect is very straightforward. For example, for OpenAI, all I need to do is put in my API key and then specify the models that I want to provision access to. And this could be a lot of different types of models. So maybe you want 4o, maybe you want 4.5 preview, which just came out, and o1. And maybe you wanna turn on different models for embedding as well. And then maybe you also want to provision access at a per-user or, more likely, per-group level. So in that case, you can specify which groups you want to have access to these particular models. And you can do that for OpenAI, of course.
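A minimal sketch of the LLM-as-a-judge idea, assuming a generic `call_llm` placeholder for whatever referee model you choose: a separate, fixed judge model compares each candidate answer against the human-written reference and returns a score, which keeps scores roughly comparable across the models under evaluation.

```python
# LLM-as-a-judge sketch (illustrative; real evaluation metrics use more structured prompts).
def call_llm(prompt: str) -> str:
    """Stand-in for the fixed 'referee' model; keep this model constant across runs."""
    raise NotImplementedError

def judge_answer(question: str, reference: str, candidate: str) -> float:
    verdict = call_llm(
        "You are grading an answer against a reference.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Return only a correctness score between 0.0 and 1.0."
    )
    # The judge's output is parsed as a number; imperfect, but directionally useful
    # when averaged over many questions with the same judge model.
    return float(verdict.strip())
```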
You can also do that, for example, if anyone's using AWS Bedrock; with Bedrock, you can simply specify which models you wanna have access to. We already have connections into their most recent release of their Nova series of models, but also to their hosted Anthropic models. So Claude 3.7 Sonnet is already available in this environment and, of course, many, many others that you see here. If you have your own custom deployments in Bedrock or other cloud providers, like with Azure OpenAI, for example, you could also bring them in here as well, staying within your secured cloud environments. And then last but not least, if you prefer to actually work with your own locally hosted model, meaning either on-prem or in your private cloud environment, but downloaded via Hugging Face, then we also have this option to connect to a Hugging Face model. So I think I already have a Hugging Face connection that's set here. And all you need to do is add the Hugging Face API key and then determine which models from Hugging Face you want to have downloaded to your Dataiku environment and then running on top of a Kubernetes cluster that has GPUs allocated. And with Dataiku Cloud specifically, we make it incredibly easy for you to get started with running Hugging Face models, both from an infrastructure perspective and also from a user perspective. You can, of course, specify which models you wanna have working within your environment. We have numerous model presets that you can see here for different model families. So think Llama, Mistral, Gemma, many, many others. And you can also bring your own. So if you have your own model that you wanna bring in from Hugging Face that is not a part of this list, but you know is available, simply add in the Hugging Face ID, and it will become available to you. This is actually an example I gave with DeepSeek's 7B chat model. I brought it in, and it's working within the Dataiku environment as well. One other thing I want to call out is, for each of these connections that you are supplying for your users, as Chad mentioned, you are also going to want to have a set of guardrails, with the ability to filter out, potentially both at the query and at the response level, what information is either being sent to the model or being responded back as a part of the completion. So you can add these guardrails at either a connection level or, even more granularly, at like a project or a use case level. But looking at this from a connection level, if I wanted to do things like filter for toxic content, for example, I can add that guardrail to my Hugging Face connection. I can enable it on both queries and responses, and I can specify if I'm gonna use the OpenAI moderation API or use my preferred local Hugging Face model as an example. If I wanted to add additional guardrails, such as if I have a dataset that has a list of, like, custom forbidden terms that I wanna make sure never get sent out to the model, and if I want to also filter for PII, or personally identifiable information, names, emails, addresses, phone numbers, this sort of thing, I can set that. I can filter for prompt injection attempts. I can also provide kind of a validity check around the response format if I wanted it to always be, like, a JSON format, for example.
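To illustrate the query-side checkpoint those guardrails represent, here is a toy sketch that screens a prompt for forbidden terms and obvious PII patterns before it leaves for a provider. The term list and regexes are invented examples; the connection-level filters shown in the demo rely on dedicated moderation models rather than hand-rolled patterns like these.

```python
# Toy query-side guardrail: block forbidden terms, redact obvious PII patterns.
import re

FORBIDDEN_TERMS = {"project nightingale", "internal codename x"}  # hypothetical list
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),        # email addresses
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),   # US-style phone numbers
]

def screen_prompt(prompt: str) -> str:
    lowered = prompt.lower()
    if any(term in lowered for term in FORBIDDEN_TERMS):
        raise ValueError("Blocked: prompt contains a forbidden term.")
    for pattern in PII_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)  # redact rather than block
    return prompt

print(screen_prompt("Email jane.doe@example.com about her 555-123-4567 callback."))
```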
So these are all things that you can set at a connection level from an IT perspective and then provision governed access for your users, your specific groups of users as it were, so that they can have governed access when working with these models within the Dataiku environment. And again, making it as easy as possible using our visual recipes to be able to interact with the right model at the right time for that use case. We've talked about performance already with respect to working with these models, and you've seen with the evaluation recipe how you can effectively take the results of the models and then have them stored for the performance of your use case. Over time, as you begin to develop a more robust kind of tracking mechanism for the performance of these models, you'll be able to leverage our capabilities with respect to, say, scenarios and automation in order to automatically alert yourself if the performance of your model drops below a specific threshold. So I can say, if my model's answer correctness drops below 0.55, for example, automatically alert me as the owner of this project so that I can come in and review what's going on and potentially sub in a different model that's performing better, at least based on my reference answers or my Prompt Studios. But that's from a performance perspective. What about with respect to cost? That's a very important question as well when working with these models. Dataiku automatically logs every response and every completion that is being sent to and from Dataiku through the LLM Mesh. And as such, you're able to track both the performance, as you saw, but also the number of tokens and the cost of these models. And the way this can look is through something like a dashboard like this, where you'll be able to see the overall cost across all of your projects and all of your users and connections. Over time, you'll be able to see that broken down by project, but you're also able to filter it. So for example, by a particular connection type. If I wanna see how much I'm paying out to OpenAI, I can filter on this. And then if I want it for a particular project, I can do that as well. If I want to take a look more kind of holistically, of course, I can scroll down and I can see the total cost per project over time. So on a weekly or on a monthly basis, are we spending more or less on these LLM calls? Who are the users who are spending the most? So maybe I wanna dig in further and understand who these individuals are and what sort of projects they're working on, and if the cost-to-benefit ratio is worth it from their perspective. And then, of course, which provider we're spending the most on, because we have, with LLMs, the option for as many providers as we like. But this is more of like a historical look at how much we're spending. What if I wanted to get in front of these costs? Because maybe we have a user who's going rogue and sending a lot of queries all at once. I wanna be able to stop that before it becomes a really big issue. Well, with Dataiku's Cost Guard, you are able to support that. We have this concept called proactive blocking through cost controls, where you can specify as many quotas as you want for your models or for your projects or for your users, such that if any of those quotas are reached, you can take immediate, decisive action.
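The roll-up behind a dashboard like that is conceptually simple, as in this sketch: every call is logged with its project, user, provider, and cost, and spend is aggregated along whichever dimension you want to slice by. The log records below are invented examples, not Dataiku's internal log format.

```python
# Sketch of a cost roll-up over an LLM call log (invented records, not Dataiku's schema).
from collections import defaultdict

call_log = [
    {"project": "loan-assistant", "user": "avery", "provider": "openai", "cost": 0.042},
    {"project": "loan-assistant", "user": "jordan", "provider": "anthropic", "cost": 0.031},
    {"project": "claims-summaries", "user": "avery", "provider": "openai", "cost": 0.250},
]

def spend_by(dimension: str) -> dict[str, float]:
    """Total spend grouped by 'project', 'user', or 'provider'."""
    totals: dict[str, float] = defaultdict(float)
    for record in call_log:
        totals[record[dimension]] += record["cost"]
    return dict(totals)

print(spend_by("project"))   # spend per project
print(spend_by("provider"))  # spot which provider you're paying the most
```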
We have an example of a quota set here, where the logic is: if the connection is any of our OpenAI connections, or I can even make this more specific and say if the provider is OpenAI, as an example, then I want the quota to be a max of $500 per month. And I want the reset period to be every month. You can actually configure that here. But I want it to be $500 every month. You can see the current cost, so you know how close we are to this quota, and then you can specify whether you want this to be a hard or a soft quota as well. So you can specify that if that $500 threshold is actually reached, block any subsequent queries from happening such that there are no further queries being sent to OpenAI in this case. You can also make it a soft threshold such that maybe I'll be alerted at a specific threshold, maybe 75%, and then also at a hundred, I'll be alerted of that being reached. But our users are still able to make those calls. But I'm gonna, for sure, go ahead and talk to them about why they're spending so much on this connection. By the way, this isn't strictly at the connection level. You can set this to be by a particular provider, by a particular connection itself, or even for a particular user or set of users. Maybe we provision a specific level of compute costs or API costs for each of our user groups, and after that, they run out. Or you can do it by project as well. And then at, let's say, a production level, once you have these quotas available, you also wanna make sure that it never reaches that threshold. So you can set as many of these cost controls and these quotas as you like, so that you can be sure that your cost with respect to LLMs never goes above a specific threshold. And if it does, you can be alerted of it, and you can take steps to ensure that we don't spend any more than necessary, so you can feel confident in your budgets as it relates to working with these models. Awesome. And then last but not least, I also wanna call out the fact that Dataiku has governance capabilities baked in throughout the platform through our Govern node. And what this allows for is the ability to provide a comprehensive view of all of the use cases that are leveraging LLMs, in this case. And not just that they exist, but also who owns that use case, what model provider is being utilized. You can provide documentation throughout the Govern node such that you'll be able to understand, you know, what sort of intake process did we do, who approved this use case to happen. And so all of this documentation, and there's a workflow engine that's baked into the Govern node that you'll be able to leverage, such that you're gonna be able to have full governance and oversight over any of these use cases that I described earlier, whether it's text-to-SQL, whether it's an agent related use case, whether it's a RAG related use case. All of that can be cataloged and then set through an intake or a workflow process that you can fully customize and define, and provide the capability for sign offs prior to moving into, say, a production state. And all of that being facilitated through the Dataiku Govern node. Of course, we'll dive into this further in a subsequent session or subsequent webinar, Chad, which we can give a hint to.
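Here is a small sketch of what a hard versus soft quota check like that amounts to in code, under the assumption of a simple month-to-date spend counter; the provider names, limits, and thresholds are examples only, not Dataiku's Cost Guard internals.

```python
# Illustrative hard/soft quota check before dispatching an LLM call.
MONTHLY_QUOTAS = {"openai": {"limit": 500.0, "hard": True, "warn_at": 0.75}}  # example config

def check_quota(provider: str, month_to_date: float, next_call_cost: float) -> None:
    quota = MONTHLY_QUOTAS.get(provider)
    if quota is None:
        return  # no quota configured for this provider
    projected = month_to_date + next_call_cost
    if projected >= quota["limit"]:
        if quota["hard"]:
            # Hard quota: block the call outright once the limit is reached.
            raise RuntimeError(f"{provider}: monthly quota of ${quota['limit']} reached; call blocked.")
        # Soft quota: let the call through but raise an alert.
        print(f"ALERT: {provider} exceeded its soft quota of ${quota['limit']}.")
    elif projected >= quota["limit"] * quota["warn_at"]:
        print(f"Warning: {provider} at {projected / quota['limit']:.0%} of its monthly quota.")

check_quota("openai", month_to_date=372.40, next_call_cost=4.10)
```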
But hopefully, this gives you a good sense of some of the capabilities in Dataiku, solving for those three key pillars that we described earlier, with respect to flexibility, knowing the right LLM for each use case through evaluating their performance metrics both qualitatively and quantitatively, and having full control over the usage and cost of these particular models. Totally. Thank you so much, Dmitri. That was a fantastic demonstration of Dataiku and how we solve for those three critical success points. Just to round everything out, I'm gonna go ahead and share my screen just one more time for old time's sake. And then we'll go ahead and hop into many of your guys' questions. So we covered absolutely a lot of ground today by exploring those three critical factors for a modern, successful AI architecture. So I'll leave you with the three takeaways that we talked about at the top of this session. First, a modern AI architecture powers both your agents and GenAI applications while dramatically reducing your time to insights, by up to 90% as I spoke about earlier. So organizations that implement flexible, component-based architectures avoid the costly rebuild cycles we discussed and can adapt quickly as models evolve. Second, systematic LLM evaluation is no longer optional. We need it to be mandatory. By implementing a strategic approach to model selection, you can optimize cost while simultaneously improving performance for each specific use case. This balanced approach really ensures you're using the right tool for the right job. Lastly, proper governance controls provide dual protection: safeguarding you from unexpected cost increases as vendors' prices change, and maintaining robust security protocols that protect your organization's sensitive data. So the bottom line is this: moving from reactive, fragmented approaches to a cohesive AI architecture is what separates organizations that merely want to experiment with GenAI from those that can actually put it in production and have it transform their business. So thank you all for joining us today. Dmitri and I are happy to answer any of the questions that you have. I see there are a ton of questions in the Q and A section, so we do appreciate you all. So we'll go ahead and start answering them. So one that I have not addressed via chat is: you showed the user selecting a model from a drop down. Any thoughts or plans on dynamic model selection? Meaning, based on the use case and the context of the request, route to the most appropriate model to respond to that request. Most appropriate can be most accurate, most fine-tuned, lowest cost, lowest latency. In our roadmap, I don't believe we have any plans of creating dynamic model selection. I do think that's a very interesting use case that can definitely be done in Dataiku. Dmitri, I don't know if you want to expand on that. Yeah. I actually can see it being done through agents. It would be some work to define the flow such that you're able to accomplish this because, essentially, if you wanted to do it quantitatively, you'd have to be able to execute a process to run, like, an eval recipe against that model and generate the performance metrics for some set of data, and then have an agent, essentially have a tool, be able to pick up that information and then respond to the request from the user dynamically based on the results of that evaluation. So technically, it would be possible.
Not something, to Chad's point, we have out of the box today, but certainly doable with the extensibility and the flexibility of the Dataiku platform and our agent features. Yep. Alright. So moving forward, we also have a question about verifying my model quality. Can it be one shot for qualitative comparison as well? So when you say qualitative comparison, I get the understanding that it's, you know, looking at these answers or these responses side by side, or even having the LLM as a judge kind of look at them side by side and provide to you some sort of, you know, answer. And that's a yes. I think Dmitri showed that off during the demo within the evaluate recipe. You can see the row by row analysis. And in this demo, we showed just text. That's also the case for multimodal. If you were looking at images, in the near future, you will be able to see these images side by side as well to understand how these models are processing. Yeah. There's that. We will continue through these questions. Somebody said, in other words, there is no need to fine-tune LLMs in Dataiku as various models are available as per use case. Correct? And, Dmitri, I'll go ahead and pass that one to you. Yeah. We don't wanna make a blanket statement that there's no need to fine-tune LLMs, because there is some valid reasoning sometimes to fine-tune models. And in fact, in Dataiku, you can fine-tune LLMs through our interface, either through code or through visual recipes. So that is a capability that we provide. But you are right in the sense that the more options that you provide for your users with respect to different LLMs, the less likely it is you'll need to fine-tune, because it'll be less work just to pick and choose the right model for your use case versus going through the process of fine-tuning. This is true. Okay. Moving forward, will Dataiku provide customer support initially, until the users become acquainted with the Dataiku UI and infrastructure? Yes. That's something that we definitely do. We are big on our services, and we offer many different services to make sure that our customers are up and running smoothly and can actually generate value from Dataiku. So we have, you know, data scientists, we have use case ideation, we have field engineers that'll get you up and running in terms of infrastructure. So there are many different ways that we support our customers once they become Dataiku customers and make sure they're up and running smoothly. Which agent framework do we support? I think I may have answered this one, but on the back end, we support LangChain as well as LangGraph. So you can create agents visually and with code. There are webinars that we've done previously that show that off. There's also a webinar that's coming up, which I'll highlight in just a little bit, that'll show that off as well. How do you compare your offering with Azure and AWS? You know, Dmitri, I'll actually pass this one off to you since you work a lot with our partners. Yeah. It's a very fair question. I would say that, you know, Azure, AWS, they provide a lot of different services and different offerings. Typically, we actually partner with them. We can deploy Dataiku in your preferred cloud and run Dataiku along with connecting to various services that those companies provide. So for example, for Azure, it would be like Azure OpenAI or working with Synapse or Fabric or other services.
And then with AWS, it would be similarly Bedrock, SageMaker, Redshift, etcetera. They also have, to a degree, their own capabilities with respect to working with, of course, LLMs and also doing data processing and machine learning. But they are all their own, I would say, service or product within that suite. So Dataiku is unique in the sense that it's holistic. You have one product that does everything from data access and data prep all the way through to machine learning, working with GenAI models, and then the governance and auditability around all of that, like we showed you earlier. And what's more is that our interface supports both coders and non-coders alike, so that you have flexibility with respect to how you're working within the environment. So you can certainly try to make a comparison, but I would argue that we are quite unique in the market as it relates to our offering. Totally. I would definitely agree with that statement. Heading to the next question, can the Dataiku tool suite work with an in-house hosted LLM versus the commercial LLMs? Yes, we definitely can. We showed it a tiny bit in the demo of working with a locally hosted Hugging Face model, that model being DeepSeek. But it could have been really any model that we want to choose, whether it was an open source model like Mistral or, if you had your own locally hosted model, we could definitely use that through that connection, to be used in your flows and in our visual recipes and whatnot. We have another question. We have standardized on Red Hat's OpenShift for Kubernetes. How does this factor into implementation of Dataiku? Is this a problem? Dmitri, I'll pass that to you. Yeah. No, it's not a problem. We work with many customers who are leveraging OpenShift for their Kubernetes cluster. We would integrate and support that connection. We would use, like, the base Kubernetes process. And we would be able to, for example, run a Hugging Face model leveraging that OpenShift Kubernetes cluster, along with many other processes we can do on Kubernetes. Awesome. Awesome. We have one final question. We are exploring how Dataiku can help us evolve with AI/ML, including automated score analysis, natural language report generation, coaching workflow support, and research assistance. Can we get a personalized demo later sometime? So I would definitely say reach out to us, and we can definitely get something set up with one of our fabulous SEs or one of our CSEs, depending on your account. But yeah, I would definitely say reach out to your customer support team, and we can look at making that happen. Last but not least, do you support creation of MCP servers? So from my understanding, not currently, but that is something that's on the roadmap that we are definitely thinking about. So when there are any updates around our integration with MCP, we will definitely let you know. Real quick before we go, that seems like all the questions. I just wanted to make everyone aware that we do have another webinar coming up next week at 11 AM, Show and Tell: Inside the Agent Toolbox. So we didn't talk about agents a lot today. We referenced it. We showed it off a little bit, but we didn't dive in. This will dive into agents. So if you are interested in agentic technology, definitely come back, and you will see agents in all their glory in Dataiku. So thank you all for joining, and you will get a recording of this very soon.
We appreciate you all and hope you have a great day.