You're Paying an AI Vendor for Output a Free Chatbot Could Produce

A lot of “AI consulting” being sold to small and mid-sized businesses today is someone running the buyer’s problem through Claude or ChatGPT, lightly editing the result, and sending it back as a deliverable. The buyer pays for the engagement. The vendor pays $20 a month for the chat window. The gap between those two numbers is the entire business model. The deliverable looks the part, and the budget is gone before the issue surfaces.
What “Agent Washing” Looks Like in Practice
The practice has a name. Gartner calls it “agent washing,” and its research estimates that only about 130 of the thousands of vendors marketing “agentic AI” actually deliver real agentic capabilities. The same pattern scales down from enterprise vendors to the AI consulting market serving small and mid-sized businesses, where it shows up as hollow integration scopes, unverified evaluation metrics, missing fallback protocols, and blind monitoring layers.
These are some common forms it takes:
A chat window can summarize a customer complaint. A real AI engagement monitors the support queue for incoming complaints, cross-references the issue with the user’s recent account activity, determines the root cause, drafts a tailored resolution, and flags the ticket for a support agent with a ready-to-send fix.
A chat window can write a quote. A real AI engagement watches the intake form, recognizes the kind of project from past wins, prices it against the same logic an experienced estimator would use, and sends the customer a quote in twenty minutes instead of two days.
A chat window can suggest a refund policy. A real AI engagement watches the support inbox, identifies refund requests as they come in, pulls the order history, decides if the request qualifies, processes the refund, and routes the edge cases to a human with a one-paragraph summary explaining what it found and why it stopped.
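As a rough illustration of the refund example, the sketch below covers only the decision step: approve the clear-cut requests, and stop on edge cases with a summary a human can act on. The `RefundRequest` fields, the 30-day window, the $200 auto-approval limit, and the prior-refund rule are hypothetical policy values invented for this example. A real system would read order history from the buyer's order platform and call its payment API rather than return a record.

```python
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
REFUND_WINDOW_DAYS = 30
AUTO_APPROVE_LIMIT = 200.00
MAX_PRIOR_REFUNDS = 1


@dataclass
class RefundRequest:
    order_id: str
    amount: float
    days_since_purchase: int
    prior_refunds: int  # refunds already granted to this customer


@dataclass
class Decision:
    action: str   # "refund" or "escalate"
    summary: str  # short explanation for the human reviewer or the audit log


def triage(req: RefundRequest) -> Decision:
    """Approve clear-cut requests; route anything unusual to a human."""
    reasons = []
    if req.days_since_purchase > REFUND_WINDOW_DAYS:
        reasons.append(
            f"purchased {req.days_since_purchase} days ago, "
            f"outside the {REFUND_WINDOW_DAYS}-day window"
        )
    if req.amount > AUTO_APPROVE_LIMIT:
        reasons.append(f"amount ${req.amount:.2f} exceeds the auto-approval limit")
    if req.prior_refunds > MAX_PRIOR_REFUNDS:
        reasons.append(f"customer already has {req.prior_refunds} prior refunds")

    if reasons:
        # Edge case: stop and explain what was found and why processing halted.
        summary = f"Order {req.order_id} held for review: " + "; ".join(reasons) + "."
        return Decision("escalate", summary)
    return Decision("refund", f"Order {req.order_id} qualifies under standard policy.")


routine = triage(RefundRequest("A-1001", 49.00, 5, 0))
edge_case = triage(RefundRequest("A-1002", 350.00, 45, 0))
```

The point of the sketch is the second return path: the system acts on its own only when every rule passes, and every escalation carries its reasons with it.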
The difference is not intelligence. The difference is that the second version actually does the work. The first version is what gets sold as “AI consulting” when the vendor’s primary tool is the same chat window the buyer already has access to.
For a deeper look at the agent washing phenomenon and the data behind it, see why 40% of agentic AI projects are at risk of cancellation.
Why the Pattern Survives
The reason most buyers do not catch it is a literacy gap, not carelessness. Most small and mid-sized businesses do not have internal AI teams. Nobody on staff can look at a deliverable and determine whether the work behind it required engineering, integration, and testing, or whether it’s just a simulated workflow mapped out by a model.
AI-generated content is articulate, well-structured, and reads as professional. A strategy deck produced by a language model and polished by a designer looks the same as one produced through weeks of original research and analysis. Without the expertise to evaluate the substance, the buyer evaluates the presentation.
Not every vendor running this pattern is acting in bad faith. Some genuinely believe what they deliver is real AI consulting. They use the same language, the same proposal formats, and the same engagement structures as firms doing deep technical work. The difference is invisible from the outside unless the buyer knows what questions to ask.
The broader failure data backs this up. According to IDC research, 88% of AI pilots never make it into production. A significant share of those projects were never built to operate in the first place. They were built as demonstrations, and the controlled environment was the final product.
What Real AI Development Actually Includes
The clearest way to tell a real AI engagement from a superficial system design is to look at the underlying infrastructure. A model can map out workflows, simulate logic, and outline processes. It cannot connect itself to actual business operations.
Integration: The system does the work, not the team. Integration is what turns the AI from a tool someone has to open into a system that runs in the background. Without it, the AI lives in a separate window and the team still has to copy results into the CRM, the ERP, or the scheduling tool. With it, the AI updates those systems directly, which is how a job actually gets done instead of just analyzed.
Evaluation: The team knows whether it is working. Evaluation is how anyone confirms the system is doing the job it was hired to do. Without it, performance is a guess based on whether someone happens to complain. With it, the system reports accuracy against the buyer’s actual data, and leadership knows the day quality starts slipping instead of finding out a quarter later.
Fallback handling: The system knows when to ask for help. Real systems are honest about uncertainty. Without fallback handling, the AI guesses on every edge case and the team learns about the bad guesses through customer complaints. With it, the system flags what it is not sure about, routes the hard ones to a human, and logs every override so the team can spot patterns and improve the model.
Monitoring: Problems show up before they show up in revenue. Models drift. Data changes. Customer behavior shifts. Without monitoring, the system silently produces worse results month over month and nobody catches it until the numbers come in soft. Monitoring is the early-warning layer that keeps the system from quietly degrading.
Ownership: The system still works six months from now. Models get updated. Vendor APIs change. The business adds a new product line. Without a named team responsible for the system, every one of those events breaks something, and the buyer finds out only when an operator notices a result that looks off. Ownership is the difference between a system that runs and a system that decays.
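The evaluation and monitoring items above come down to a small amount of measurable logic. The sketch below is a minimal illustration, assuming the system's outputs can be compared against human-confirmed labels. The `accuracy` helper, the `AccuracyMonitor` class, the 0.90 threshold, and the 50-case window are all hypothetical choices made for this example, not a standard.

```python
from collections import deque


def accuracy(predictions, labels):
    """Share of predictions that match the human-confirmed labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


class AccuracyMonitor:
    """Tracks accuracy over the most recent reviewed cases and flags a drop."""

    def __init__(self, threshold=0.90, window=50):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # one True/False per reviewed case

    def record(self, prediction, label):
        self.results.append(prediction == label)

    def rolling_accuracy(self):
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        # Wait until the window is full so a handful of cases cannot trigger an alert.
        if len(self.results) < self.results.maxlen:
            return False
        return self.rolling_accuracy() < self.threshold


# One-off evaluation against a labeled sample of the buyer's data.
baseline = accuracy(["refund", "deny", "refund", "deny"],
                    ["refund", "deny", "deny", "deny"])

# Ongoing monitoring with a deliberately tiny window for demonstration.
monitor = AccuracyMonitor(threshold=0.75, window=4)
for predicted, confirmed in [("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")]:
    monitor.record(predicted, confirmed)
healthy = monitor.needs_attention()
for predicted, confirmed in [("a", "x"), ("b", "y")]:
    monitor.record(predicted, confirmed)
degraded = monitor.needs_attention()
```

One plausible source for the confirmed labels is the override log from the fallback step, so that each human correction also serves as evaluation data.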
For a full evaluation framework covering these criteria, see how to evaluate an AI development partner.
Five Questions That Separate a Real Vendor From a Middleman
Most of the time, separating real AI work from a chat-window resale takes about five minutes and the right questions. Bring these into the next vendor conversation and pay attention to where the answers get vague.
Ask what part of the deliverable could not have been produced in a free chat window. A real vendor names specific components: an integration, an evaluation harness, a fine-tuned model, a workflow that runs without human input. A wrapper engagement pivots to “expertise” or “industry knowledge,” both of which a chat window can also produce on demand.
Ask which of your systems the engagement will connect to, and how. A real engagement names systems out loud: the CRM, the ERP, the data warehouse, the document store. A wrapper engagement stays vague, refers to “future integrations,” or proposes a manual handoff between the AI output and your tools. Manual handoff is the tell.
Ask how the team will know if the system is wrong. Real engagements describe evaluation methods, accuracy thresholds, and a process for catching failures. Wrapper engagements describe “quality assurance” or “human review” without specifics. If the vendor cannot name the test cases they plan to run on your data, the system has not been built to be tested.
Ask what happens after launch. Real engagements include monitoring, maintenance, a named owner, and a plan for model updates. Wrapper engagements end at delivery. If post-launch support is “available upon request,” the vendor expects the work to be over once the PDF lands in your inbox.
Ask how many engineering hours go into the project. Real AI engagements involve engineers writing code, configuring infrastructure, building integration logic, and setting up monitoring. Wrapper engagements involve consulting hours and slide design. If the proposed staffing leans heavily on strategists and barely on engineers, the outcome is just a concept, not a running system.
If the vendor cannot give concrete answers to at least three of these five questions, the engagement is most likely a repackaging of work the buyer could do alone.
The Path That Actually Works
The signal of a real partner shows up early. In the first scoping conversation, they propose integration work, evaluation criteria, fallback logic, and a maintenance plan. Those four items together indicate an engagement designed to survive live operations, not just to reach a launch date. If all four show up in the proposal, the engagement is likely real.
The point of this post is not to discourage every AI engagement. It is to give the buyer the tools to tell the difference between an engagement that builds something operational and one that delivers a formatted chat output.
Research from MIT found that engagements in which a specialized partner builds the system and hands ownership to the internal team succeed roughly twice as often as pure internal builds. The question is not whether to work with a partner. It is how to choose one that does real work.
A strategic consulting engagement starts by mapping where AI fits the buyer’s operations and what the integration, evaluation, and monitoring requirements look like before any building starts. A partner who can answer the five questions above in specific terms is worth evaluating further.
What to Do This Week
Three steps that take less than an hour:
Pull any active AI vendor proposal off the desk and run it through the five questions above. Pay attention to which ones get concrete answers and which ones get deflected.
Audit an AI project you already paid for against those same five questions. Look at the actual backend. Did the vendor build the integration, evaluation, fallback, and monitoring layers, or did they just leave you with a non-operational mockup?
If the answers are unclear, ask the vendor for the integration plan, evaluation criteria, and monitoring approach in writing. A vendor that can provide those three items in specific terms is doing real work. A vendor that cannot is selling formatted output.
Ready to Make AI Work for Your Operation?
We map the highest-impact opportunities in your business and build systems that run in production.