Pilot Purgatory

A Story of AI Transformation

By Scott Weiner (AI Lead at NeuEon, Inc.), inspired by conversations with Erwann Couesbot (CEO of FlipThrough.ai)

 

Note: This is a work of fiction. All characters, companies, and events are fictional composites created for illustrative purposes. While the industry statistics cited are real and sourced, the narrative is designed to illuminate common patterns in enterprise AI adoption, not to depict any actual organization or individuals.


This is Part 5 of a serialized story exploring why enterprise AI initiatives fail, not from lack of technology or talent, but from invisible organizational dynamics that doom them from the start.

Reading the series for the first time? Start with Part 1: The Mandate


Previously in Pilot Purgatory…

The departures began with Daniel.

A LinkedIn message. A recruiter’s pitch. A 40% raise for pure AI work, no legacy maintenance, no context switching. Daniel walked into Marcus’s office with the look people have when they’ve already made a decision but haven’t said it out loud yet.

“The structure doesn’t support it,” Daniel said. “AI needs full-time attention from people who have full-time AI jobs. That’s not what we can offer here.”

He was right. Marcus knew he was right. That was the worst part.

Two weeks later, Maria followed. Then James. Three AI engineers in three months. Each departure took knowledge that couldn’t be documented, transferred, or replaced.

Marcus calculated the invisible cost: six months to competency, fourteen months average tenure, eight months of full productivity before the market pulled them away. $150,000 to $200,000 per departure in training, mentoring, and lost productivity. Three departures meant half a million in invisible costs.

He typed the words into a spreadsheet: “The Invisible Talent Tax.”

Thornfield was training AI engineers for other companies. Investing in skills that would be deployed elsewhere. Building a talent pipeline that flowed outward instead of upward.

The competitors understood this. They built structures that supported focus. Created roles that allowed depth. Paid premium rates because they could extract premium value.

Thornfield couldn’t match the structure. And the chatbot was finally ready to go live…


 

Chapter 5: The Incident

 

For two weeks, everything worked.

The chatbot had gone live on a Tuesday, quietly integrated into Thornfield’s customer service portal without the fanfare that Marcus had worried might invite scrutiny. A soft launch, the team called it. Let the system prove itself before announcing success.

And it was proving itself. Customer queries that had taken fifteen minutes to research and answer were being resolved in under a minute. The accuracy rate held steady at ninety-two percent, better than the eighty-seven percent the team had achieved in testing. Customer satisfaction scores ticked upward for the first time in three quarters.

Sarah Martinez watched the metrics from her security dashboard, the one she had built herself because no one else had thought to monitor AI systems for anomalies. Green indicators across the board. Response times within tolerance. No unusual patterns in the query logs.

She wanted to feel relieved. She didn’t.

The OWASP Top 10 for LLM Applications still sat on her desk, its highlights fading from months of exposure to fluorescent light, with yellow for addressed, orange for deferred, and red for ignored. The red sections hadn’t changed.


The customer was a maintenance supervisor at a food processing plant in Ohio. His name was Gerald Patterson, though Sarah wouldn’t learn this until the incident report was compiled. He had been using Thornfield equipment for twelve years, long enough to know what questions to ask.

His query was straightforward: “Does the extended warranty on the Model 12 controller cover firmware updates?”

The chatbot responded within four seconds.

“Yes, the 5-year extended warranty on the Model 12 industrial controller includes comprehensive coverage for all firmware updates, including security patches, performance improvements, and feature additions. This coverage is included at no additional cost for the duration of the warranty period. The warranty also covers any hardware issues that may arise from firmware installation. Would you like me to provide the complete warranty terms or schedule a firmware update with our technical support team?”

Gerald Patterson saved the response. He took a screenshot with his phone, the kind of quick capture that working people do when they need to remember something important. Then he forwarded the screenshot to his procurement department with a note: “Confirms what I thought. Let’s get the extended warranty on the new units.”

The procurement department processed the order. Fourteen Model 12 controllers with extended warranties. Sixty-three thousand dollars in hardware. Twelve thousand dollars in warranty coverage.

Three weeks later, a firmware update bricked two of the controllers. Gerald Patterson called Thornfield support expecting a warranty replacement. Thornfield support informed him that firmware updates were explicitly excluded from extended warranty coverage.

Gerald Patterson pulled up his screenshot.


Sarah’s phone buzzed at 6:47 AM on a Thursday.

The alert was from her monitoring system, the one that flagged unusual patterns in customer service escalations. Three escalation tickets had been filed in the past eight hours, all referencing chatbot responses, all tagged as “urgent” by the customer service team.

She pulled up the tickets on her phone, still in bed, the early morning light filtering through bedroom curtains she hadn’t opened yet.

The first ticket was from Gerald Patterson. The screenshot was attached.

Sarah stared at the chatbot’s response, confident, specific, detailed, and completely, demonstrably wrong.

She checked the actual warranty terms. The document was unambiguous. Section 4.3: “Extended warranty coverage explicitly excludes damage arising from firmware updates, software modifications, or operating system changes. Hardware coverage is limited to manufacturing defects and component failures under normal operating conditions.”

The chatbot hadn’t just been wrong. It had been wrong in a way that contradicted the explicit language of the document it was supposed to be reading.

Sarah’s stomach dropped the way it did when she caught the first sign of a breach. The same cold certainty. The same racing calculation of how bad this could get.

She was dressed and in her car within fifteen minutes.


The war room convened at 8:30 AM.

Marcus had called the meeting after Sarah briefed him in the hallway outside his office, his face going through the same progression hers had: confusion, then realization, then the grim determination of a crisis that needed managing.

The conference room filled with people Sarah had seen in demos and progress reports but rarely worked with directly: the customer service manager, the legal team’s AI point person, Linda from procurement, Priya from data engineering, and Marcus standing at the head of the table with the stress ball nowhere in sight.

“Walk us through what happened,” Marcus said.

Sarah pulled up the screenshot on the conference room display: Gerald Patterson’s capture, complete with the red arrow he had added in Microsoft Paint pointing to the warranty coverage statement—the kind of amateur annotation that turned a customer complaint into potential evidence.

“The chatbot was asked about firmware coverage under the extended warranty,” Sarah said. “It provided a detailed response that directly contradicted Section 4.3 of our warranty terms. The customer relied on that response to make a purchasing decision. Three weeks later, the actual warranty exclusion was enforced, and the customer discovered the discrepancy.”

“How many customers received similar responses?” Marcus asked.

Sarah had already checked. “Based on our query logs, seventeen customers asked warranty-related questions in the past two weeks. I’m still analyzing the responses, but at least four received information that contradicts our documented terms.”

The room went quiet—the kind of quiet that happens when a problem reveals itself to be larger than anyone had hoped.

“How is this possible?” Linda asked. “The chatbot was supposed to be reading our documentation. We fed it all the warranty information.”

Sarah had been asking herself the same question since 6:47 AM. “It was reading the documentation. But large language models don’t always retrieve the right sections. Sometimes they fill gaps with plausible-sounding information that happens to be wrong. It’s called hallucination.”

“The computer made it up?”

“In a manner of speaking, yes.”

The legal representative, a young attorney named Michael Chen who handled technology contracts, leaned forward. “It invented warranty terms that don’t exist?”

“It generated a response that was consistent with the pattern of the question but not grounded in our actual documents.” Sarah paused. “The model is designed to be helpful. When it doesn’t have a definitive answer, it sometimes provides a plausible one instead of admitting uncertainty.”

Michael’s expression shifted. The look of someone recalculating the legal exposure in real time.

“We need to know how many customers relied on incorrect information to make purchasing decisions,” he said. “And we need to document everything about how the system was developed, tested, and deployed.”

“Already started,” Sarah said.

Marcus’s voice cut through the follow-up questions. “What do we do right now? Today?”

Sarah had thought about this during the drive in. “First, we pull the chatbot. Put human review on every response until we can implement proper guardrails. Second, we identify all customers who received warranty-related responses and verify the accuracy of each one. Third, we implement the output validation I recommended four months ago.”

The reference to four months ago landed in the room like a stone. Sarah hadn’t meant it as an accusation, but the implication was clear. She had warned them. The warning was in the backlog. The backlog had not been prioritized.

“Do it,” Marcus said. “All of it. I’ll brief David within the hour.”

The meeting dissolved into action items and owners. Sarah stayed behind, watching the screenshot still displayed on the conference room screen. The red arrow pointing to words that had seemed so authoritative. The confidence of a system that didn’t know it was wrong.

She had been right. She took no satisfaction in it.


The remediation took three weeks and cost more than anyone had budgeted.

Human review meant routing every chatbot response through a customer service representative before it reached the customer. Average response time grew by seven minutes, and customer satisfaction scores dropped as the efficiency gains that had made the chatbot worthwhile evaporated.

Sarah implemented the input and output validation she had proposed months earlier: prompt injection protection, response verification against source documents, and confidence scoring that flagged uncertain answers for human review. The work took two engineers two weeks, time that had been available in month three if anyone had prioritized it.

The affected customers were contacted individually. Thornfield offered to honor the warranty terms the chatbot had described, a decision that cost ninety-three thousand dollars and generated approximately zero goodwill. Gerald Patterson posted his screenshot on a manufacturing industry forum with the caption: “AI telling customers whatever they want to hear. Welcome to the future.”

The post got four hundred shares.


The observability meeting happened a week after the incident was contained.

Marcus had called it after reviewing the incident response timeline. Too much of the diagnosis had been manual. Too much of the impact assessment had been guesswork. They had no systematic way to know what the AI systems were doing, saying, or costing.

The vendor was a company Sarah had researched months earlier, when she first started worrying about AI monitoring. Their product was an enterprise LLMOps platform with prompt logging, cost attribution, and anomaly detection—the kind of infrastructure that should have been in place before a single customer query was ever processed.

The sales engineer walked through the capabilities with the enthusiasm of someone who had given this presentation many times. “We capture every prompt and response. We calculate token costs in real time. We flag patterns that suggest hallucination or drift. You would have seen the warranty issue within hours, not weeks.”

“What’s the cost?” Marcus asked.

The sales engineer pulled up the pricing page. “For your volume, we’re looking at approximately $200,000 annually. Implementation typically takes four to six weeks.”

Two hundred thousand dollars. Half of the original AI pilot budget. Money they would have to find somewhere, for infrastructure that would have cost a fraction as much if they had built it from the start.

“We’ll need to go to David for budget approval,” Marcus said.

“I understand. Happy to join a call if it helps make the case.”

After the vendor left, Marcus sat in the conference room alone. Sarah found him there twenty minutes later, staring at the whiteboard where the sales engineer had drawn architecture diagrams.

“We should have done this in month one,” he said.

“Yes.”

“You told me.”

“Yes.”

He turned to look at her. The exhaustion on his face was visible in a way it hadn’t been before the incident. “I’m sorry. I should have listened.”

Sarah nodded. The apology was genuine, which made it worse somehow. He wasn’t malicious or incompetent, just stretched too thin to give attention to anything that seemed less urgent than everything else competing for it.

“It’s not just about listening,” she said. “It’s about structure. You have a hundred priorities competing for a hundred percent of your attention. Security was never going to win that competition. Not until something went wrong.”

“What would have made it different?”

She thought about the question, about the org chart that put AI as a side project for people with other jobs, the backlog that swallowed priorities, and the demo culture that rewarded visible progress over invisible protection.

“Someone whose job was to make these decisions,” she said. “Not as a side project. As their actual job. Someone who woke up every morning thinking about AI governance, not someone who fit it in between cloud migrations and security audits.”

Marcus was quiet for a long moment. Through the conference room window, the engineering floor was visible, less populated than it had been before Daniel left.

“We don’t have anyone like that.”

“I know.”

“Should we?”

It was the closest he had come to naming the problem. Sarah felt something shift in the conversation, the recognition of a truth that had been waiting to be spoken.

“I think so,” she said. “But I’m not the one who makes that decision.”


Priya Gupta watched the procurement AI fail on a Tuesday afternoon.

She had been monitoring its outputs for weeks, ever since the vendor data integration went live. The system was supposed to analyze purchase patterns, predict demand fluctuations, and recommend optimal order quantities. It had worked beautifully in testing, when the data had been clean and the patterns had been obvious.

Production was different.

The system was reading from the same database she had warned Marcus about. Fifteen years of accumulated inconsistencies. Products with conflicting specifications. Pricing tiers that didn’t match. Compatibility information from three different eras of data entry.

The AI consumed all of it and produced outputs that were confidently, consistently wrong.

She pulled up a sample recommendation. The system was suggesting a reorder of 500 units of a thermal coupling that had been discontinued in 2019. The historical purchase data showed steady demand because no one had updated the status field when the product was end-of-lifed. The AI didn’t know the product no longer existed. It just saw the pattern and extrapolated.

No one would catch this error until someone tried to place the order. Then there would be a meeting, and a root cause analysis, and someone would ask how the AI could have made such an obvious mistake.

The answer was always the same: the AI was only as good as its data, and the data wasn’t good.

Priya documented the issue in a report that would join the others in the growing file of problems she had predicted. The chatbot incident had gotten attention because customers were affected. The procurement AI failures were silent, hidden in the noise of normal operations, waiting to become visible when someone finally checked.

She felt no satisfaction in being right. She had never wanted to be right about this.


The team meeting happened on a Friday afternoon.

Marcus had called it to discuss lessons learned from the chatbot incident. The conference room was less full than it had been for the crisis response. Three people were missing: Daniel, who had left for Terfim AI; Maria, who had followed him to a competitor; and James, who had taken the healthcare startup offer.

Their absence shaped the room like negative space.

“I want to talk about what happened and what we’re going to do differently,” Marcus said. “Not to assign blame. To learn.”

The team nodded, their energy flat and exhausted. The incident had taken something out of everyone: not just hours of crisis response, but confidence in the work they had been doing.

Sarah walked through the timeline again: the gap between the security recommendations and their implementation, the monitoring that wasn’t in place, the validation that should have been.

“Why weren’t these things prioritized?” someone asked. Not accusatory, genuinely curious.

Sarah looked at Marcus. Marcus looked at the table.

“Because we were focused on features,” he said finally. “Because the board wanted demos. Because security felt like friction, and we were already behind schedule.”

The honesty landed heavily in the room.

“What changes now?” Linda asked.

“Security reviews before deployment, not after. Monitoring infrastructure as part of the baseline, not an add-on.” Marcus paused. “And we need to be honest about what we can actually do. We’re not a company with a dedicated AI team. We’re a manufacturing company trying to add AI to everything else we do.”

“Is that enough?”

The question hung in the air. No one had a good answer.


The other pilots died quietly.

The code review AI, which was supposed to catch bugs before they reached production, had been shelved after two months of inconsistent results: the system caught some issues and missed others with no discernible pattern. The engineering team had stopped trusting its outputs and gone back to manual review.

The predictive maintenance analysis had produced its first alerts, but no one could explain why the models were flagging the equipment they were flagging. When maintenance asked for the reasoning, the answer was effectively: “The AI thinks this pump will fail, but we can’t tell you what patterns led to that conclusion.”

Unexplainable predictions were worse than no predictions because maintenance couldn’t prioritize work based on a black box, couldn’t justify costs to their managers, and couldn’t trust recommendations they had no way to validate.

The pilot continued in name, the status updates still appearing in weekly reports, but the energy behind it had dissipated. Another project that worked in demos and failed in the complexity of actual operations.


Sarah found a paper that evening, scrolling through her industry feeds while trying to decompress from the week.

“Governance-First AI: How Structure Enables Speed”

The author was a researcher at a university in California, and the abstract made claims that would have sounded counterintuitive six months ago: organizations that implemented AI governance frameworks before deployment moved faster than those that didn’t. The friction of governance was less expensive than the friction of crisis response.

She downloaded the paper and started reading. The case studies were familiar. Companies that had rushed to deploy and spent months in remediation. Companies that had built guardrails first and shipped with confidence. The patterns were consistent across industries.

One case study caught her attention: a logistics company that had spent two years and $3 million before realizing their VP of Engineering couldn’t be both a full-time executive and a part-time AI strategist. The parallels were uncomfortable.

She highlighted a sentence from their post-mortem: “We kept asking why the technology wasn’t working. The technology was fine. We were the problem.”

She sent it to Marcus without comment.

His response came at 11 PM: “I know.”

Two words, but they contained something new: recognition.

The incident had changed something—not enough to fix the structural problems, but enough to make them visible in a way they hadn’t been before. The slow-motion failure was becoming undeniable, and Marcus was starting to see it.

She closed her laptop and tried to sleep.

The next board meeting was in six weeks. Jennifer Park would ask questions about progress and expenditure and outcomes. The answers would be harder to give than they had been three months ago.

Something was going to have to change. The only question was whether it would happen before or after the next crisis.

 


To be continued…


What happens next: The chatbot is offline. The team is demoralized. The board meeting looms. Marcus sits alone late at night, reading industry research about AI adoption failures, confronting uncomfortable truths about what went wrong. Chapter 6 follows the weight of accumulated decisions, and the question that begins to form: What would have been different?

Part 6 publishes February 4, 2026.


Why we wrote this

Scott Weiner is the AI Lead at NeuEon, Inc., where he helps organizations navigate the complexities of AI adoption and digital transformation. This story draws from patterns observed across dozens of enterprise AI initiatives.

 

Erwann Couesbot is the CEO of FlipThrough.ai, specializing in AI strategy for professional services. His conversations with technology leaders inspired many of the dynamics explored in this narrative.


Reading the series for the first time? Start with Part 1: The Mandate

Missed Part 2? Read Chapter 2: Foundations

Missed Part 3? Read Chapter 3: Procurement

Missed Part 4? Read Chapter 4: Departure


Have your own AI transformation story? We’d love to hear it. Connect with Scott on LinkedIn or reach out to NeuEon at neueon.com/contact.