We're Not Building Agentic Search. We're Building Agentic Discovery.
Why the "give an agent an API" thesis breaks in travel, real estate, and automotive, and what replaces it.
In short: Conversational AI search works well when a buyer already knows what they want, where it mostly translates a sentence into filters. It works poorly at the discovery stage, because most buyers in high-consideration categories arrive with weak intent, and a language model cannot retrieve an answer the buyer has not yet formed. Decades of consumer research show that preferences are constructed during the decision, not retrieved from memory. The job is therefore not to execute intent faster but to help buyers build it. We call this agentic discovery rather than agentic search.
There is a clean and appealing thesis moving through AI commerce right now. Expose your product catalog through an MCP server or a set of tools, let the user say what they want in natural language, and let an agent do the rest. The interface collapses into a conversation. The engineering collapses into integration. It is an elegant story, and for a narrow class of problems it is correct.
For the categories we work on at Kleio, which are travel, real estate, and automotive, it is mostly wrong. The reason is not model quality or tool design. It is an assumption buried inside the thesis about how buyers arrive: that they show up with strong, formed intent that only needs to be executed. In high-consideration purchases, that assumption rarely holds.
The strong-intent assumption
The thesis works beautifully when the buyer already knows the answer and the only remaining cost is retrieval and execution. "Book the 8:40 to Lyon." "Reorder the same set of parts." That is genuine strong intent, and an agent with an API beats every other interface for it. A lot of customer service looks like this. So does repeat commerce.
High-consideration categories are different in kind. The buyer is not retrieving a decision. They are trying to reach one, often for the first time, with real money and real consequences attached.
Most of our buyers start undecided
The data on this is not subtle.
In automotive, roughly 60 percent of shoppers begin the process without a specific make or model in mind [1]. The industry has a name for the early stage: the discovery phase [2]. And the path does not simply narrow toward a pre-selected answer. Research on the consumer decision journey shows the consideration set often expands during active evaluation, as buyers encounter options they had not initially considered [3]. Intent is being assembled along the way.
In real estate, the trigger is usually an emotion before it is a specification. A first-time buyer feels a pull toward stability or belonging long before they can state a budget, a district, or a room count [4]. Those hard parameters emerge during the search, and they shift as the buyer sees what is actually available.
In travel, even the dominant platform in the category has effectively conceded the point. Its own team has said the part they were always good at was the last mile, moving a customer from search to booking, and that the genuinely hard problem was meeting customers earlier, while they were still working out what they wanted [5]. That is an admission that intent at the top of the journey is weak.
Across all three verticals, the buyer walks in without the thing the automation thesis assumes they already have.
The filter paradox
Here is the structural trap for conversational search.
When intent is strong, natural language is a worse interface than a filter. Describing a preference in a sentence and waiting for an agent to parse it and reason over it is slower than three clicks on price, dates, and location. Structured search wins on speed and on control, and the buyer who already knows what they want will reach for it.
When intent is weak, a better language parser does not rescue the buyer either. It cannot answer a question they have not yet formed. A more fluent way to ask "what do I actually want?" returns nothing useful, because the answer does not exist to be retrieved.
So a pure conversational search layer lands in an uncomfortable middle. It is too slow for the decided and too shallow for the undecided. This is the structural reason the semantic-search promise in these categories has disappointed for years, and it holds independent of how good the underlying model is.
Intent translation versus intent creation
Look closely at what most so-called semantic features actually do. The dominant pattern, including at the largest players, is to take a sentence and map it onto filters that already existed. Type "a family villa with a pool near the beach" and the system applies the relevant filters automatically [6]. That is genuinely useful, and it is also, precisely, intent translation. It serves a preference the buyer already holds. It does nothing to create the preference the buyer lacks.
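To make the distinction concrete, here is a minimal sketch of intent translation. It is a toy, not a description of any platform's implementation: the phrase table, the filter names, and the `translate_intent` function are all hypothetical, and a production system would use a learned parser rather than keyword matching. The point is only the shape of the operation: a sentence goes in, and filters that already existed come out.

```python
import re

# Hypothetical phrase-to-filter table. Every filter here already exists
# in the structured search UI; the parser only selects among them.
PHRASE_FILTERS = {
    "villa": ("property_type", "villa"),
    "pool": ("has_pool", True),
    "near the beach": ("max_beach_distance_km", 1.0),
    "family": ("min_bedrooms", 2),
}

def translate_intent(query: str) -> dict:
    """Map phrases found in a sentence onto pre-existing filters."""
    text = query.lower()
    filters = {}
    for phrase, (name, value) in PHRASE_FILTERS.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            filters[name] = value
    return filters

# A buyer with formed intent: the sentence maps cleanly onto filters.
print(translate_intent("a family villa with a pool near the beach"))

# A buyer with weak intent: there is nothing to translate.
print(translate_intent("somewhere that feels right for us"))
```

The second call returns an empty set of filters. A better parser would extract more from the first sentence, but it has nothing more to extract from the second.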
Independent reviews of these tools make the ceiling concrete. Bolted-on AI search over a fixed catalog tends to return a sorted list rather than a decision, and it suits the traveler who already knows the broad shape of the trip and just wants to fill in the logistics [7]. That is, once again, the strong-intent case, where filters were already adequate.
It is worth steel-manning the opposition here, because the strongest counter-argument is a real one. The dominant platform claims its conversational features are working, reporting that users shifted from single-keyword queries toward richer, discovery-style prompts, and that engagement rose as a result [5]. Two things are worth separating. First, engagement is not the same as a confident, completed, high-value decision, and vendor case studies almost never publish the conversion delta against a strong filter baseline. Second, and more important, surfacing more of an existing catalog in response to a richer query is still retrieval. It widens the funnel. It does not, by itself, help a buyer construct a preference they did not arrive with. The open question is whether richer queries reflect intent the buyer already had and merely expressed more fluently, or intent the tool actually helped create. We believe it is mostly the former, and that is the gap worth building into.
Preferences are constructed, not retrieved
The automation thesis inherits a very old and very wrong model of the buyer: a rational agent carrying stable, stored preferences that only need to be revealed. Decades of consumer research say close to the opposite.
Bettman, Luce, and Payne established that consumers frequently do not have well-defined preferences waiting in memory, and instead construct them during the decision itself, using strategies that depend on the task and on how the options are presented [8]. Preference is an output of the decision process, not an input to it. If that is true, then a system whose only job is to execute a stated preference has misunderstood the moment it is operating in.
This is also why more options can make outcomes worse rather than better. In the well-known field study by Iyengar and Lepper, shoppers presented with a large assortment of jams were markedly less likely to buy than those shown a small one [9]. The effect has been debated, and later meta-analytic work finds that it is moderator-dependent rather than universal [10]. The careful reading is not "less choice is always better." It is that when buyers cannot evaluate a large set, they defer or abandon. A raw catalog sitting behind a chat box is exactly that: a large set, presented through a browsing interface that is worse than the one it replaced.
If preferences are constructed, then the system's real job is to help construct them. This is precisely what critique-based, or example-critiquing, recommenders were designed to do. The buyer starts from a concrete example, reacts to it ("cheaper," "more like this but closer to the center"), and the model refines its estimate of their preferences across several cycles [11]. The literature is explicit that this approach matters most in high-risk, first-purchase domains, where users have no fixed preferences to begin with and reach a confident decision only through iterative critiquing [12]. Travel, property, and cars are the textbook instances, not the edge cases.
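The critiquing cycle described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the method of any cited system: the `Listing` fields, the two-phrase critique vocabulary, and the `apply_critique` function are hypothetical, and real critiquing recommenders handle many more attributes and compound critiques. Each reaction moves the buyer to a listing that satisfies the critique while staying as close as possible to the current example on everything else.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Listing:
    name: str
    price_k: float        # asking price, in thousands
    km_to_center: float   # distance to the city center

# Hypothetical critique vocabulary: each phrase asks for less of one attribute.
CRITIQUES = {"cheaper": "price_k", "closer to the center": "km_to_center"}
ATTRIBUTES = ("price_k", "km_to_center")

def apply_critique(current, catalog, critique):
    """Return the listing that satisfies the critique while staying closest
    to the current example on the attributes the buyer did not criticise."""
    attr = CRITIQUES[critique]
    candidates = [x for x in catalog if getattr(x, attr) < getattr(current, attr)]
    if not candidates:
        return current  # nothing improves on this attribute; keep the example

    def drift(item):
        # How far the item moves away on the attributes left uncriticised.
        return sum(abs(getattr(item, a) - getattr(current, a))
                   for a in ATTRIBUTES if a != attr)

    return min(candidates, key=drift)

catalog = [
    Listing("A", 500, 2.0),
    Listing("B", 420, 6.0),
    Listing("C", 450, 3.0),
    Listing("D", 380, 9.0),
    Listing("E", 470, 0.5),
]

example = catalog[0]          # the buyer starts from a concrete example
path = [example.name]
for reaction in ["cheaper", "closer to the center"]:
    example = apply_critique(example, catalog, reaction)
    path.append(example.name)

print(path)  # ['A', 'C', 'E']
```

Note that the buyer never stated a budget or a maximum distance. Those limits surfaced only as reactions to concrete listings, which is the sense in which the preference is an output of the process.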
From agentic search to agentic discovery
All of this reframes the product.
Agentic discovery is the use of AI agents to help a buyer construct a preference by exploring and reacting to real products, in contrast to agentic search, which executes a preference the buyer already holds. The goal is not to route a strong signal to an API faster. It is to take a buyer who cannot yet specify what they want and move them, through interaction with real products, from a vague pull toward a choice they trust.
In practice that means letting people see and manipulate the catalog rather than only query it. It means presenting a small, diverse, legible set instead of a long ranked list. It means treating a buyer's reactions to concrete products as the raw material for an evolving preference model, and using conversation to structure the exploration and expose tradeoffs rather than to replace the act of looking. The conversation is scaffolding for discovery. It is not the destination.
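One way to read "a small, diverse, legible set" is as a selection problem: from a long ranked list, pick a handful of items that are as different from one another as possible, so that each reaction tells you something new. The sketch below uses greedy farthest-point selection. It is an assumption on our part for illustration, not a description of a production system, and the names `normalise` and `pick_diverse`, the two features, and the sample cars are all hypothetical.

```python
import math

def normalise(rows):
    """Rescale each feature column to the range 0..1 so no feature dominates."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

def pick_diverse(names, features, k):
    """Greedy farthest-point selection: start from the first (top-ranked) item,
    then repeatedly add the item farthest from everything already chosen."""
    points = normalise(features)
    chosen = [0]
    while len(chosen) < min(k, len(points)):
        def gap(i):
            # Distance from item i to its nearest already-chosen item.
            return min(math.dist(points[i], points[j]) for j in chosen)
        remaining = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(remaining, key=gap))
    return [names[i] for i in chosen]

names = ["city hatchback", "compact hatchback", "family SUV",
         "sports coupe", "seven-seat van"]
features = [(18, 5), (19, 5), (35, 5), (60, 2), (40, 7)]  # (price in thousands, seats)

print(pick_diverse(names, features, k=3))
# ['city hatchback', 'sports coupe', 'seven-seat van']
```

A ranked list would have shown the two near-identical hatchbacks side by side. The diverse set spends its three slots on three different answers to the question of what kind of car this buyer wants.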
The distinction is not cosmetic. Search assumes the answer exists and has to be found. Discovery assumes the answer has to be built.
Where that leaves MCP and agents
None of this is an argument against agents or against MCP. Both are real, and both are useful. But they are plumbing. An MCP server exposes a catalog. It does not know how to help a person come to want a product they had not imagined. Without genuine mastery of the conversation and of the catalog beneath it, an agent simply automates, faster, a search the buyer was never equipped to run in the first place.
The defensible advantage was never the protocol, which everyone will eventually have. It is the discovery experience built on top of it, and the model of the buyer that experience encodes.
The teams treating high-consideration commerce as a pure automation problem are solving the easy 20 percent, the strong-intent tail, and mistaking it for the whole market. The hard and valuable part is the weak-intent majority, and it is not an automation problem. It is a preference-construction problem.
So the question to ask of any AI commerce product is a simple one. Does it help someone who already knows what they want go faster, or does it help someone who does not yet know arrive somewhere good? The first is a feature. The second is a business.
Frequently asked questions
Does conversational AI search work for booking travel, cars, or homes?
It works well when the buyer already knows what they want, where it mainly translates a sentence into filters. It works poorly at the discovery stage, because most buyers in these categories start without a formed preference, and a language model cannot retrieve an answer the buyer has not yet constructed.
What is the difference between agentic search and agentic discovery?
Agentic search executes a preference the buyer already holds, routing a strong signal to an API. Agentic discovery helps a buyer who cannot yet specify what they want build a preference by exploring and reacting to real products. Search assumes the answer exists and has to be found; discovery assumes it has to be built.
Why does semantic search underperform in high-consideration purchases?
Because it sits in an awkward middle. When intent is strong, filters are faster than natural language, and when intent is weak, a better parser cannot answer a question the buyer has not yet formed. Most semantic features only translate a sentence into existing filters, which serves intent rather than creating it.
Is exposing a catalog through an MCP server enough to build an AI shopping agent?
No. An MCP server is plumbing that connects a catalog to an agent. It does nothing to help a buyer construct a preference, so on its own it just automates a search the buyer was never equipped to run. The defensible advantage is the discovery experience built on top, not the protocol.
What does consumer research say about how buyers form preferences?
Research on constructive choice (Bettman, Luce, and Payne, 1998) finds that consumers frequently lack well-defined preferences and build them during the decision itself. Critique-based recommender systems were designed for exactly these first-purchase, high-risk domains, helping buyers reach confident decisions through iterative feedback.
References
[1] Porch Group Media, "25 Statistics on How Consumers Shop for Cars" (citing Autotrader), 2025. https://porchgroupmedia.com/blog/25-amazing-statistics-on-how-consumers-shop-for-cars/
[2] 67 Degrees, "How Consumers Research Cars Before Buying." https://www.67degrees.co.uk/blog/how-consumers-research-cars-before-buying/
[3] McKinsey & Company, "The Consumer Decision Journey," 2009. https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-consumer-decision-journey
[4] Product Marketing Alliance, "The Five Steps of the Consumer Buying Process," 2026. https://www.productmarketingalliance.com/what-are-the-5-steps-in-the-consumer-buying-process/
[5] OpenAI, "Booking.com and OpenAI Personalize Travel at Scale" (case study). https://openai.com/index/booking-com/
[6] Booking.com, "Booking.com Enhances Travel Planning with New AI-Powered Features," 30 October 2024. https://news.booking.com/bookingcom-enhances-travel-planning-with-new-ai-powered-features--for-easier-smarter-decisions/
[7] Stardrift, "Best AI Tools to Consolidate Flight and Hotel Search," 2026. https://stardrift.ai/resources/ai-tools-consolidate-flight-hotel-search
[8] Bettman, James R., Mary Frances Luce, and John W. Payne. "Constructive Consumer Choice Processes." Journal of Consumer Research 25, no. 3 (1998): 187–217. https://doi.org/10.1086/209535
[9] Iyengar, Sheena S., and Mark R. Lepper. "When Choice Is Demotivating: Can One Desire Too Much of a Good Thing?" Journal of Personality and Social Psychology 79, no. 6 (2000): 995–1006. https://business.columbia.edu/faculty/research/when-choice-demotivating-can-one-desire-too-much-good-thing
[10] Discussion of the choice-overload replication record and moderator effects (referencing Chernev, Böckenholt, and Goodman's 2015 meta-analysis). https://atticusli.com/replication-crisis/choice-overload-jam-study/
[11] Chen, Li, and Pearl Pu. "Critiquing-Based Recommenders: Survey and Emerging Trends." User Modeling and User-Adapted Interaction 22, no. 1–2 (2012): 125–150. https://link.springer.com/article/10.1007/s11257-011-9108-6
[12] On preference construction and iterative critiquing in first-purchase domains. https://www.sciencedirect.com/science/article/abs/pii/S1071581917301313




