AI workflows as living systems
AI workflows fail if you treat them like one-and-done automation projects
By Yoav Naveh
Most leaders make the mistake of thinking about AI workflows the way they think about RPA: document the process, collect data, and automate. In fact, a Forrester-commissioned study found that 60% of global AI leaders view RPA-driven platforms as the most valuable for AI-driven process management, which signals an alarming tendency to treat those workflows like traditional automation.
But treating AI the same as RPA won’t work. RPA is limited to executing your workflows exactly as documented, and it simply breaks whenever human judgment is required. AI agents, on the other hand, are expected to make human-like decisions. They fail when they’re set up like one-and-done RPA implementations, then run headfirst into gaps left by undocumented institutional knowledge and endless edge cases. The solution is to start treating them as living systems that continuously learn, as you would a human team.
4 reasons old workflow design breaks before launch
Most automation projects fail because the design assumptions are wrong. Leaders expect processes to be neat, fully captured, and stable. In practice, they’re messy, undocumented, and always in flux, which is why “document and automate” falls apart before you ever get to scale.
1. Your data probably isn’t good enough
Leaders often assume that applying AI to their data or documentation will produce reliable results, even when they have no clear picture of the quality of their data management practices. The quality of your AI inputs determines the quality of your AI outputs.
A Gartner survey of 1,203 data management leaders in July 2024 found that 63% of organizations either do not have or are unsure if they have the right data management practices for AI. The same research warned that companies often fail to recognize how different AI-ready data requirements are from traditional data management. Due to this gap in preparation, Gartner predicts that through 2026, organizations will abandon 60% of AI projects that are not supported by AI-ready data.
2. Hallucinations and edge cases are the rule, not the exception
In software development, teams often default to testing the “happy path,” or the ideal scenario where everything goes right. AI workflows fall into the same trap because leaders assume their teams run on neat playbooks, but most processes aren’t written down at all. When they are, the documentation is usually outdated, and what actually drives execution is institutional knowledge: tacit shortcuts, workarounds, and expertise that live in people’s heads, not in process docs.
A Panopto survey of more than 1,000 U.S. workers found that 42% of institutional knowledge is unique to the individual and not shared with coworkers. When an employee leaves, nearly half of their role becomes impossible for the rest of the team to perform. If AI is meant to pick up even part of that slack, it’ll break or hallucinate without that institutional knowledge.
Hallucinations are a byproduct of missing context, but they also happen because of how language models themselves work. These systems are trained to generate fluent answers, and when they don’t know, they tend to guess rather than stay silent. That means hallucinations will appear even in well-documented processes, and edge cases only make the problem worse.
3. You can’t solve for mistakes with a massive amount of historical data
It’s tempting to think you could solve for breakdowns or hallucinations caused by edge cases by training the AI on a massive amount of historical data that covers every scenario. In practice, that’s impossible: you’ll never have a complete archive of everything that could happen, structured in a way that’s easy to extract. Even if you did, it probably wouldn’t be enough.
A recent article in Issues Journal argues that large datasets systematically omit certain kinds of information because the very methods of standardization and scalability filter out the nuanced, context-dependent details that often decide how a process really works. Plus, even if you design for today’s edge cases, tomorrow’s will look different.
This breakdown has a name: model drift. It’s the decline in accuracy that happens when the world shifts from the conditions the model was trained on. Research from AI Multiple shows that 91% of machine learning models degrade within one to two years of deployment, which means almost every AI system you launch will lose effectiveness unless it has a mechanism to keep learning.
4. You can’t treat AI like an extension of your system of record
Most enterprises have a clear “system of record,” which could be their ERP, core database, or CRM. These systems are designed to be rigid, with strict permissions and governance to maintain data integrity. But AI workflows fail when companies try to make their system of record do double duty as the place where work actually gets done.
What’s missing is a “system of work,” a fluid layer where decisions happen, exceptions get handled, and processes adapt as business conditions change. Unlike your system of record, which needs to be deterministic and unchanging, your system of work needs to evolve like a human would.
AI isn’t an opportunity to improve on your system of record, which already has its place. Instead, think of AI as an opportunity to build that system of work on top of your existing systems of record. It can pull the structured data it needs while operating with the adaptability and judgment that actual work requires. AI integrations are about building that layer to make decisions and take actions while keeping your core systems stable and governed.
A new playbook makes AI workflows resilient
If the old way of designing workflows fails because it assumes processes are complete and stable, the new way starts by admitting reality: your processes are messy, incomplete, and constantly changing.
Instead of chasing perfect documentation or exhaustive data, build resilience by starting small, integrating with the systems you already use, and designing guardrails so the AI can admit when it doesn’t know. Most importantly, keep humans in the loop to teach, audit, and retrain the system over time.
Start with 20 samples and scale through human-in-the-loop learning
Building an AI workflow is less like flipping a switch and more like assembling a puzzle. You don’t need every piece to start, just enough to see the edges and begin filling it in. Twenty real-world samples are often enough to give the AI a baseline for what “normal” looks like. For a global carrier, that might be 20 past shipment updates to teach the system how status reporting should flow. For a bank, it could be 20 recent account openings to show what valid documentation looks like. From there, the AI runs on what it already understands.
Even 10,000 samples won’t capture every edge case or predict future ones, so chasing completeness is pointless. What matters is building a clear sense of what “normal” looks like and relying on human-in-the-loop interactions when something falls outside that pattern. When it hits a case it doesn’t recognize, like a missing tracking ID in a carrier email or a payslip in an unfamiliar format, the system escalates to a human. The human does the work, and each correction is captured and applied as new training data. Over time, the AI learns directly from these human-in-the-loop interactions, filling in more of the puzzle until institutional knowledge and edge cases are captured.
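As a rough illustration of that loop, here is a minimal sketch in Python. The classify call, review queue, and training store are hypothetical placeholders rather than any particular product’s API; the point is the shape of the loop: act when confident, escalate when not, and keep every human correction as new training data.

```python
# Minimal sketch of a human-in-the-loop escalation loop. The model, review queue,
# and training store are illustrative stand-ins, not a specific product's API.

CONFIDENCE_THRESHOLD = 0.85  # below this, the system asks a human instead of guessing

def process_item(item, model, human_review_queue, training_examples):
    """Handle one work item: act when confident, escalate when not."""
    label, confidence = model.classify(item)   # assumed to return (label, confidence)

    if confidence >= CONFIDENCE_THRESHOLD:
        return label                           # the AI handles the "normal" case itself

    # Edge case, e.g. a missing tracking ID or an unfamiliar payslip format:
    # escalate to a person and let them do the work.
    human_label = human_review_queue.ask(item)

    # Capture the correction so the next training run covers this edge case too.
    training_examples.append({"input": item, "label": human_label})
    return human_label
```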
Integrate with the systems you already use
Most enterprises still run on legacy stacks that might include ERPs, TMSs, CRMs, and EHRs. Replacing them isn’t practical, and forcing employees to work in a parallel AI tool only adds friction. Being able to integrate with those legacy systems is what makes AI useful.
Take a hospital as an example. If AI is meant to review prescriptions but sits outside the EHR, it can’t see the patient records in context. It will miss the edge cases that matter most, which any hospital would consider a failure. But integrated directly into the EHR, it could surface missing test results or flag a potential prescription conflict while the physician is writing the order.
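To make the contrast concrete, a hedged sketch of that integration point might look like the following. The ehr_client object and its methods are hypothetical; in a real deployment they would sit behind whatever interface the EHR vendor actually exposes.

```python
# Hypothetical sketch: reviewing a prescription *inside* the EHR context.
# `ehr_client` and `checker` and their methods are illustrative placeholders,
# not a real vendor API.

def review_prescription(order, ehr_client, checker):
    """Check a new prescription against the patient's record before it is signed."""
    active_meds = ehr_client.get_active_medications(order.patient_id)
    recent_labs = ehr_client.get_recent_lab_results(order.patient_id)

    findings = []

    # Flag potential conflicts with what the patient is already taking.
    findings += checker.find_interactions(order.drug, active_meds)

    # Surface missing test results the drug normally requires (assumed helper).
    for required_test in checker.required_tests(order.drug):
        if required_test not in recent_labs:
            findings.append(f"Missing result: {required_test}")

    # Show findings to the physician in the ordering screen rather than blocking silently.
    return findings
```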
Design guardrails to stop hallucinations and add confidence layers
Guardrails keep AI systems inside boundaries. In practice, that means limiting the information sources the AI draws from, constraining the actions it’s allowed to take, and setting thresholds for when it must escalate to a human instead of guessing. In tandem, confidence layers act like checkpoints between the AI model and the workflow: they force the system to test its certainty before acting, and if confidence falls below a set threshold, the AI stops, signals “I don’t know,” and hands the case to a human.
A law firm that wants to use AI to review contracts might set guardrails requiring the AI to analyze documents only against the firm’s validated precedent library, excluding the open web, with every output tied back to an exact clause in the contract. A government agency processing benefit applications might rely on confidence layers to protect accuracy: the AI can auto-approve an income document if it matches known templates, but if it can’t validate the extracted values, the confidence layer intervenes and routes the case to a caseworker.
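A minimal sketch of that kind of confidence gate, loosely following the benefits-agency example, could look like this. The extractor, template matcher, and 0.9 threshold are assumptions for illustration, not a reference implementation.

```python
# Minimal sketch of a confidence layer in front of an auto-approval step.
# The extractor, template matcher, and 0.9 threshold are illustrative assumptions.

AUTO_APPROVE_THRESHOLD = 0.9

def handle_income_document(doc, extractor, known_templates, caseworker_queue):
    """Auto-approve only when extraction is confident; otherwise route to a caseworker."""
    result = extractor.extract(doc)  # assumed to return fields plus a confidence score

    matches_template = any(template.matches(doc) for template in known_templates)

    if matches_template and result.confidence >= AUTO_APPROVE_THRESHOLD:
        return {"decision": "auto_approved", "fields": result.fields}

    # The confidence layer intervenes: the AI says "I don't know" instead of guessing.
    caseworker_queue.add(doc, reason="low confidence or unrecognized template")
    return {"decision": "escalated"}
```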
Monitor constantly to catch drift early
Once AI goes live, you need systems that detect when accuracy starts to slip. In practice, that means feeding the AI a steady stream of labeled samples, comparing its outputs against ground truth, and setting thresholds that trigger alerts when error rates rise.
A trading platform that uses AI to verify source-of-funds documents might automatically route 5% of deposits each week to compliance officers for manual review. If the AI starts misclassifying new types of payslips or missing details in property statements, the variance could be flagged before it spreads. Another layer could be input tracking. If the mix of document types suddenly shifted or new formats appeared, the system could raise a warning that its training data no longer reflects reality.
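Sketched in code, those two layers, spot-checking accuracy against human review and tracking the input mix, might look like this. The 5% sample rate, error threshold, and distance measure are illustrative values, not recommendations.

```python
# Illustrative sketch of two drift checks: spot-check accuracy against human review,
# and watch the input mix for distribution shift. All thresholds are made up for the example.

import random
from collections import Counter

SAMPLE_RATE = 0.05            # fraction of items routed for manual review each week
ERROR_ALERT_THRESHOLD = 0.10  # alert when the error rate on reviewed items exceeds this

def sample_for_review(items):
    """Route a random slice of production traffic to human reviewers."""
    return [item for item in items if random.random() < SAMPLE_RATE]

def accuracy_alert(reviewed):
    """Compare AI outputs with human labels on the reviewed slice."""
    errors = sum(1 for r in reviewed if r["ai_label"] != r["human_label"])
    error_rate = errors / max(len(reviewed), 1)
    return error_rate > ERROR_ALERT_THRESHOLD, error_rate

def input_shift_alert(current_docs, baseline_mix, tolerance=0.15):
    """Warn when the mix of document types drifts away from the training baseline."""
    counts = Counter(d["doc_type"] for d in current_docs)
    total = max(sum(counts.values()), 1)
    current_mix = {k: v / total for k, v in counts.items()}
    drift = sum(abs(current_mix.get(k, 0.0) - baseline_mix.get(k, 0.0))
                for k in set(current_mix) | set(baseline_mix))
    return drift > tolerance, drift
```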
AI should be trained continuously and automatically
AI workflows succeed when you treat them as living systems that adapt, learn, and improve with your business. The advantage of human-led workflows is the human ability to make decisions by adjusting as processes and conditions evolve. AI has to be able to do the same to be successful. The companies that win with AI will be the ones that build a foundation for continuous co-learning where the AI sits on top of the systems they already use, absorbs new edge cases as they appear, and strengthens over time.
Think of AI less like a project you finish and more like an infrastructure you grow. With the right partner, you don’t have to rebuild your organization around it; you just make AI part of how your organization already works, with the same adaptability you expect from your people, and let it compound from there.

