Executive Summary
Each AI model lives in its own citation universe. We sent the same prompts to ChatGPT, Google AI Overview, Gemini, Grok, and Copilot and analyzed over 2 million cited sources. 71% of the websites they cited appeared in only one model's responses. At the specific-page level, the fragmentation is even sharper: 89% of URLs are exclusive to a single model.
The average pair of AI models shares just 14.4% of their cited domains. Even the two most overlapping models (AI Overview and Grok) agree on only about 1 in 5 sources. ChatGPT and AI Overview, the two most widely used platforms, share fewer than 1 in 5 domains. Grok emerged as the model with the widest and most unique source footprint. Gemini sits at the opposite end: over 80% of its citations come from domains that at least one other model also references.
For brands and publishers managing their AI visibility, the implication is direct: a source that performs well on one platform may be invisible on another. Each model draws from its own distinct slice of the web, and a single-platform optimization strategy leaves significant blind spots.
Highlights
- Fewer than 1 in 5 domains are shared between ChatGPT and Google AI Overviews. The two most widely used AI platforms overlap on just 17.4% of their cited sources, and they're far from the least similar pair.
- 71% of domains are exclusive to a single AI model. More than 7 in 10 websites cited by AI appear in only one model's responses. The web that each model sees is largely its own.
- 89% of URLs appear in only one model's responses. Models occasionally share the same website, but they almost never cite the same page. At the page level, AI sources are nearly entirely fragmented.
- 14.4% average domain overlap between any two models. Pick any two AI models and on average they share fewer than 1 in 7 of the domains they cite.
Most Sources Are Exclusive to a Single Model
The most fundamental finding: the vast majority of sources cited by AI models are not shared. Out of all the distinct domains across all 5 models, 71.1% appear in only one model's responses.
The gap between domain and URL exclusivity tells its own story. Models occasionally land on the same website, but they almost never cite the same page. The domain overlap rate is 28.9%, while the URL overlap rate drops to just 11.2%.
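The domain-vs-URL gap comes down to the granularity of comparison: two models can cite the same site without citing the same page. A minimal sketch of that distinction, using illustrative URLs rather than the study's data:

```python
# Toy sketch: two models may share a domain without sharing a single URL.
# The URLs below are illustrative, not from the actual dataset.
from urllib.parse import urlparse

model_a_urls = {"https://example.com/page-1", "https://news.example.org/a"}
model_b_urls = {"https://example.com/page-2", "https://docs.other.net/b"}

def domains(urls: set[str]) -> set[str]:
    """Reduce full URLs to their hostnames."""
    return {urlparse(u).netloc for u in urls}

shared_urls = model_a_urls & model_b_urls
shared_domains = domains(model_a_urls) & domains(model_b_urls)

print(len(shared_urls))     # 0 -- no identical pages
print(len(shared_domains))  # 1 -- both cite example.com
```

Here URL-level overlap is zero even though both models cite example.com, which is exactly the pattern behind the 28.9% vs 11.2% split.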
As you move from "shared by 1 model" to "shared by all 5," the numbers collapse:
Only 1.6% of domains are cited by every model. Fewer than 1 in 60 domains are truly universal across AI.
"Out of the 2 million cited sources we analyzed, only 1.6% of domains are recognized by every model. The rest belong to one platform's world or another."
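The exclusivity and universality figures above reduce to counting, for each domain, how many models cite it. A hedged sketch with toy sets (model names are real, the domains are invented for illustration):

```python
# Toy sketch: count domains cited by exactly one model (exclusive)
# and by all models (universal). Domain sets are illustrative.
from collections import Counter

model_domains = {
    "GPT-4o":      {"a.com", "b.com", "c.com"},
    "AI Overview": {"b.com", "d.com"},
    "Gemini":      {"b.com", "d.com", "e.com"},
    "Grok":        {"f.com", "g.com", "c.com"},
    "Copilot":     {"h.com"},
}

# For each domain, how many models cite it.
citation_counts = Counter(
    domain for domains in model_domains.values() for domain in domains
)

total = len(citation_counts)
exclusive = sum(1 for n in citation_counts.values() if n == 1)
universal = sum(1 for n in citation_counts.values() if n == len(model_domains))

print(f"{exclusive / total:.1%} exclusive")  # 62.5% exclusive
print(f"{universal / total:.1%} universal")  # 0.0% universal
```

In the study, the same computation over the full dataset yields 71.1% exclusive and 1.6% universal domains.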


Every Model Has Its Own Source Footprint
Each model maintains a meaningfully distinct set of sources. Grok casts the widest net: 57.7% of its domains aren't cited by any other model. Gemini is the opposite: only 19.8% of its domains are exclusive, meaning over 80% of what it cites is also referenced by at least one competitor.
85.8% of the specific pages Grok cites appear in no other model's responses. Each platform is looking at a different internet.
"Grok cites more exclusive domains than most models cite in total. Each platform is drawing from its own map of the web."

Any Two Models Share Fewer Than 1 in 7 Domains
Across all 10 possible model pairs, the average domain overlap rate is 14.4%. The highest pair (AI Overview and Grok) shares just 20.6%. The lowest (Grok and Copilot) shares 7.6%.
| Model A | Model B | Overlap Rate |
| --- | --- | --- |
| AI Overview | Grok | 20.6% |
| AI Overview | Gemini | 19.7% |
| GPT-4o | Grok | 18.3% |
| GPT-4o | AI Overview | 17.4% |
| Gemini | Grok | 14.7% |
| GPT-4o | Gemini | 13.5% |
| Gemini | Copilot | 11.6% |
| GPT-4o | Copilot | 10.4% |
| AI Overview | Copilot | 9.9% |
| Grok | Copilot | 7.6% |
Copilot is the most isolated model: it appears in all three of the lowest-overlap pairs. AI Overview shows the highest overlap with both Grok (20.6%) and Gemini (19.7%), suggesting a broader retrieval profile that partially intersects with multiple competitors.
Even in the best case, 4 out of 5 domains are not shared. No two models come close to citing a majority of the same sources.
"Even the two most similar AI models disagree on 4 out of 5 sources they cite. The least similar share almost nothing."

Smaller Models Draw From Mainstream Sources
When you flip the question from "how much overlap" to "how much of one model's sources are already known to another," a hierarchy emerges. Smaller models' citations are largely subsets of the bigger ones.
| Smaller Model | Contained in → | % Contained |
| --- | --- | --- |
| Gemini | Grok | 68.3% |
| Gemini | AI Overview | 55.0% |
| Copilot | Grok | 53.6% |
| AI Overview | Grok | 48.7% |
| Copilot | GPT-4o | 46.9% |
| Gemini | GPT-4o | 43.0% |
| Copilot | AI Overview | 41.1% |
| GPT-4o | Grok | 41.0% |
Grok is the most frequent "contained in" target, appearing on the right side of this table more than any other model. Over two-thirds of what Gemini cites, Grok already knows. Nearly half of AI Overview's domains appear in Grok's set. Smaller models tend to cite from a more mainstream, well-established pool of sources, while Grok ventures further into the long tail of the web.
"Two-thirds of what Gemini cites, Grok already knows. Smaller models stick to the mainstream whereas larger ones explore the edges."
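Containment, unlike the symmetric overlap rate, is directional: it asks what fraction of model A's domains model B also cites, so A-in-B and B-in-A generally differ. A minimal sketch with illustrative domain sets:

```python
# Toy sketch of the asymmetric containment measure: the share of A's
# cited domains that also appear in B's set. Sets are illustrative.
def containment(a: set[str], b: set[str]) -> float:
    """Fraction of A's domains that B also cites: |A ∩ B| / |A|."""
    return len(a & b) / len(a) if a else 0.0

gemini = {"a.com", "b.com", "c.com", "d.com"}
grok   = {"a.com", "b.com", "c.com", "e.com", "f.com", "g.com"}

print(f"{containment(gemini, grok):.1%}")  # 75.0% of Gemini is in Grok
print(f"{containment(grok, gemini):.1%}")  # 50.0% of Grok is in Gemini
```

The asymmetry is the point of the table above: a smaller model can be largely contained in a bigger one without the reverse being true.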
Context
This analysis draws from source citation data collected across the Temso platform, which monitors how brands appear in AI-generated responses. Every time an AI model answers a prompt and cites a source, that citation is captured and tracked.
The dataset includes 2,045,102 source citations from 134,673 AI responses, spanning 5 models: ChatGPT (GPT-4o), Google AI Overview, Gemini, Grok, and Copilot. Unlike a typical observational study, this analysis is controlled: it only includes prompts where all 5 models responded, ensuring every model is compared on exactly the same set of questions. These findings reflect the citation behavior of AI models as observed through brand-related and commercial prompts.
Methodology
How We Measured This
For each of the 5 models, we collected the complete set of domains and URLs cited in responses to the qualifying prompts. We then compared every possible pair of models (10 pairs) by calculating the overlap rate: the number of shared domains divided by the total distinct domains across both models. This gives a clean, symmetric measure of how similar two models' citation sets are.
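The overlap rate described above (shared domains over total distinct domains across both models) is the Jaccard index. A sketch of the pairwise comparison, with toy sets and generic model names standing in for the real data:

```python
# Toy sketch of the symmetric overlap rate: |A ∩ B| / |A ∪ B|
# (the Jaccard index), computed for every pair of models.
# Domain sets and model names are illustrative.
from itertools import combinations

def overlap_rate(a: set[str], b: set[str]) -> float:
    """Shared domains over total distinct domains; symmetric in A and B."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

model_domains = {
    "Model 1": {"a.com", "b.com", "c.com"},
    "Model 2": {"b.com", "c.com", "d.com"},
    "Model 3": {"e.com"},
}

for (name_a, a), (name_b, b) in combinations(model_domains.items(), 2):
    print(f"{name_a} vs {name_b}: {overlap_rate(a, b):.1%}")
```

With 5 models, `combinations` yields the 10 pairs the study compares; symmetry is what makes a single rate per pair well-defined.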
The key design choice in this study was the controlled prompt set. By restricting to prompts where all 5 models responded, we ensured that differences in citation behavior reflect genuine model-level preferences, not just differences in what each model was asked about. Every model in every comparison answered the same questions.
All estimates include confidence intervals at the 95% level. The difference between domain and URL overlap rates was tested for statistical significance, confirming that the two levels of analysis produce meaningfully different results.
