How LLMs Retrieve Your Content: RAG, Embeddings, and Chunk-Level Ranking

Pushkar Sinha

Co-Founder & Head of SEO Research

Last Updated: Feb 10, 2026




What You'll Learn

To engineer content for AI retrieval, you need to understand how AI systems actually process content. Not at the PhD level, but enough to inform structural decisions.

The mechanics are surprisingly consistent across Google's AI Overviews, ChatGPT, Perplexity, Claude, and Gemini. They all use variations of the same underlying architecture.

This article covers:

  • How Retrieval-Augmented Generation (RAG) works at a practical level
  • Why embeddings and semantic similarity matter more than keywords
  • How AI systems chunk and extract passages from your content

The goal: Understand the technical mechanics well enough to make informed content structure decisions.

How Modern AI Search Actually Works

Most AI search systems, including Google's AI Overviews, Perplexity, ChatGPT with browsing, and enterprise knowledge assistants, operate on a pattern called Retrieval-Augmented Generation (RAG).

What Is RAG?

RAG systems combine two capabilities:

Retrieval: The system searches a knowledge base (the web, a document corpus, or an internal database) to find content relevant to the user's query.

Generation: A language model synthesizes the retrieved content into a coherent response, often citing or summarizing the sources it used.

This architecture means the language model does not rely solely on its training data. It grounds its response in retrieved content. The quality of the generated answer depends heavily on the quality and structure of the content retrieved.
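The retrieve-then-generate loop can be sketched in a few lines. This is a toy illustration only: `embed` here is a bag-of-words stand-in and `generate` is a stub, where a real system would call an embedding model and an LLM.

```python
import re

def embed(text: str) -> set[str]:
    # Toy "embedding": a bag of lowercase words. Real systems use dense
    # vectors produced by a trained embedding model.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, corpus: list[str]) -> str:
    # Score each passage by overlap with the query; return the best match.
    q = embed(query)
    return max(corpus, key=lambda passage: len(q & embed(passage)))

def generate(query: str, context: str) -> str:
    # Stand-in for the LLM call: the answer is grounded in retrieved text.
    return f"Answer to {query!r}, grounded in: {context!r}"

corpus = [
    "RAG combines retrieval with generation.",
    "Embeddings map text to vectors in a semantic space.",
]
print(generate("What is RAG?", retrieve("What is RAG?", corpus)))
```

The key property the sketch preserves: the generator never answers from nothing. Whatever passage the retriever selects is what gets synthesized into the response.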

This is why Content Engineering matters. You are not just writing for humans. You are writing for retrieval systems that will decide whether to surface your content to language models.

Why Keywords No Longer Tell the Full Story

RAG systems do not match keywords. They match meaning. This is accomplished through embeddings.

What Are Embeddings?

An embedding is a mathematical representation of text that captures semantic content as a high-dimensional vector. Think of it as converting words and sentences into coordinates in a meaning space.

When content is indexed, each passage is converted into a vector. When a user submits a query, that query is also converted into a vector. The system then finds passages whose vectors are closest to the query vector in semantic space.
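The "closest vector" comparison is typically cosine similarity. The sketch below uses made-up 3-dimensional vectors for illustration; real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings: the numbers are invented for this example.
passages = {
    "sales pipeline management tools": [0.9, 0.1, 0.2],
    "chocolate cake recipes":          [0.0, 0.8, 0.1],
}
query_vec = [0.8, 0.2, 0.3]  # pretend embedding of "best CRM software"

best = max(passages, key=lambda p: cosine(query_vec, passages[p]))
print(best)  # → sales pipeline management tools
```

Note that the winning passage shares no keywords with the query. Proximity in the vector space, not word overlap, is what determines retrieval.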

What This Means for Your Content

  • Keyword optimization is insufficient: A passage can be retrieved even if it does not contain the exact keywords in the query, as long as it addresses the same semantic concept.
  • Conceptual clarity matters: Ambiguous or vague passages produce embeddings that do not cluster cleanly with any particular query intent. If your content could mean several things, it will not match strongly with anything.
  • Self-contained passages perform best: Passages that require surrounding context to make sense produce less focused embeddings. The meaning gets diluted.

Key Insight: Semantic Similarity Over Keyword Matching

A page perfectly optimized for "best CRM software" may be outperformed by a passage about "sales pipeline management tools" if that passage better matches the user's actual intent. AI systems understand meaning, not just words.

How AI Systems Chunk Your Content

RAG systems do not retrieve entire documents. They retrieve chunks. Understanding chunking is essential for Content Engineering.

How Chunking Works

When your content is indexed, the system splits it into chunks, typically 200-500 tokens (roughly 150-400 words). These chunks are the atomic units that get embedded, searched, and retrieved.

Chunking happens automatically during indexing. You do not control it directly. But you can design your content so that when it gets chunked, the results are coherent.
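A simplified chunker makes the mechanics concrete: split on paragraph boundaries, then pack paragraphs into chunks up to a size budget. Real indexers count tokens and vary in strategy; the word budget below is a stand-in for the 200-500 token range mentioned above.

```python
def chunk(text: str, max_words: int = 150) -> list[str]:
    # Pack whole paragraphs into chunks of at most max_words words.
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because this splitter respects paragraph boundaries, a paragraph that is a self-contained thought survives chunking intact. A thought spread across several paragraphs may not, which is exactly the failure mode described next.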

What Happens When Chunking Goes Wrong

If your content is not designed with chunk boundaries in mind, the system may create chunks that:

  • Split a coherent thought across multiple chunks: The first half of your answer ends up in one chunk, the second half in another. Neither chunk is complete enough to be useful, so neither gets retrieved.
  • Combine unrelated thoughts into a single chunk: Your point about pricing gets merged with your point about implementation. The resulting embedding is unfocused and matches neither query well.
  • Miss critical context: The chunk contains an answer but lacks the setup that makes the answer meaningful. The AI system cannot use it effectively.

How to Design for Clean Chunking

Content Engineering addresses this by designing passages that are self-contained knowledge blocks. Each passage should be a complete thought that retains meaning regardless of how it gets chunked.

The principle: if someone copied a single passage out of your content and read it in isolation, would it make sense? If yes, it will chunk well. If no, it will not.

Action Checklist

Understand Your Current State

  • Test your key pages against AI systems (ChatGPT, Perplexity, Google AI Overview)
  • Note which passages get surfaced and which get ignored
  • Identify patterns in what's working vs. what's not

Check for Chunking Problems

  • Review paragraphs longer than 400 words (split them)
  • Identify paragraphs that cover multiple topics (separate them)
  • Find critical context buried far from the main point (consolidate)
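The first item in the checklist above is easy to automate. The helper below is hypothetical, not a published tool: it flags paragraphs over the 400-word threshold so you know which ones to split.

```python
def flag_long_paragraphs(text: str, limit: int = 400) -> list[tuple[int, int]]:
    """Return (paragraph_index, word_count) for paragraphs over the limit."""
    flagged = []
    for i, para in enumerate(text.split("\n\n")):
        n = len(para.split())
        if n > limit:
            flagged.append((i, n))
    return flagged

doc = "short intro\n\n" + "word " * 450
print(flag_long_paragraphs(doc))  # → [(1, 450)]
```

Multi-topic detection is harder to script reliably, so treat those two checklist items as a manual review.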

Design for Self-Containment

  • Copy individual passages out of context
  • Read them in isolation
  • If meaning is lost, refactor for self-containment

Key Takeaways

AI systems retrieve passages, not pages. The unit of optimization has shifted from documents to 200-500 token chunks.

Embeddings match meaning, not keywords. Conceptual clarity matters more than keyword density. Ambiguous content produces unfocused embeddings that match nothing well.

Chunking happens automatically. You cannot control how systems chunk your content, but you can design passages that remain coherent regardless of where chunk boundaries fall.

Self-contained passages perform best. If a passage cannot stand alone when copied out of context, it will not be retrieved cleanly.

Written By:
Pushkar Sinha

Co-Founder & Head of SEO Research

Reviewed By:
Ameet Mehta

Co-Founder & CEO

FAQs

Do different AI platforms chunk content differently?

Yes, chunking strategies vary by platform. However, the principle of designing self-contained passages applies universally. Content that works well when chunked at 200 tokens will also work when chunked at 500 tokens. Design for the smallest likely chunk size.

Does this mean long-form content is dead?

No. Long-form content can contain many excellent passages. The shift is in design approach: instead of designing for overall narrative flow, design each section as a standalone unit that could be extracted independently.

Should I optimize differently for Google AI Overviews versus ChatGPT?

The underlying mechanics (RAG, embeddings, chunking) are similar enough that the same principles apply. Google may weight traditional SEO signals more heavily in its retrieval step, but self-contained passage design matters across all platforms.
