
Where ChatGPT, Gemini and Perplexity Get Their Sources
Wikipedia takes 47.9% of ChatGPT’s top-10 citations, Reddit 46.7% of Perplexity’s. Where each engine reads, and how to get into its sources.
The short version:
- Each AI engine has its own source ecosystem: among the top 10 most-cited domains, Wikipedia accounts for 47.9% of ChatGPT’s citations and Reddit for 46.7% of Perplexity’s, according to Profound’s citation study.
- These hierarchies are volatile: the share of ChatGPT answers citing Reddit collapsed from about 60% to about 10% in six weeks, per Semrush.
- The engines barely overlap: only 11% of domains cited by ChatGPT are also cited by Perplexity (Averi benchmark corroborated by Whitehat SEO). Working a single ecosystem is never enough.
- Measured tactics exist: citing sources and adding statistics and quotations can lift a site’s visibility by up to 40%, according to Princeton’s GEO study.
Why sources decide your AI visibility
When a buyer asks ChatGPT, Gemini or Perplexity a question, the engine builds its answer from the sources it reads: comparison sites, forums, encyclopedias, brand websites. If those sources don’t talk about you, you don’t exist in its answers, however good your product is.
One vocabulary point first: a mention (your brand is named in the answer) is not a citation (your site is listed as a source), and our article on AI visibility covers the difference. This article is about what comes before both: the documents each engine consults before it answers.
Engines expose their sources very unevenly: Perplexity displays its citations prominently, while ChatGPT often answers from memory without showing where anything came from. Visible or not, the raw material is the same: what the web says about you, in the exact places where the engine reads.
Each engine’s source ecosystem
There is no single “AI web”: each engine has built its own hierarchy of sources. The large-scale studies (Profound, hundreds of millions of citations between August 2024 and June 2025; Yext, 6.8 million citations) draw three profiles: encyclopedic for ChatGPT, community-driven for Perplexity, brand-site-first for Gemini.
ChatGPT: Wikipedia first, and a very volatile mix
Among the top 10 domains ChatGPT cites most, Wikipedia alone accounts for 47.9% of citations, according to Profound. And the rest of the mix moves fast: Semrush’s November 2025 study, which tracked 230,000 prompts and 100+ million citations over 13 weeks, watched the share of ChatGPT answers citing Reddit collapse from about 60% in early August 2025 to about 10% by mid-September.
What it means for you: ChatGPT favors established, factual references. Being present in encyclopedic pages, studies and the recognized press of your industry weighs more than an optimized blog. And nothing is permanent: OpenAI can reweight its sources within weeks, which makes any single-source strategy fragile.
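The “among the top 10 domains” qualifier matters when reading that 47.9%: the denominator is the citations going to the ten most-cited domains, not all citations. Here is a minimal sketch of the difference, assuming you have a count of citations per domain. The function names and the sample counts are invented for illustration and do not come from Profound’s data.

```python
from collections import Counter

def top_n_share(citations: Counter, domain: str, n: int = 10) -> float:
    """Share of `domain` among citations going to the n most-cited domains."""
    top = dict(citations.most_common(n))
    total = sum(top.values())
    return top.get(domain, 0) / total if total else 0.0

def overall_share(citations: Counter, domain: str) -> float:
    """Share of `domain` among all citations, long tail included."""
    total = sum(citations.values())
    return citations[domain] / total if total else 0.0

# Invented counts, for illustration only (not study data).
sample = Counter({
    "encyclopedia.example": 50,
    "forum.example": 30,
    "longtail.example": 20,
})

print(top_n_share(sample, "encyclopedia.example", n=2))  # 0.625
print(overall_share(sample, "encyclopedia.example"))     # 0.5
```

The same domain shows a higher share once the long tail is excluded, which is why a top-10 share should not be read as a share of every citation.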
Perplexity: Reddit, directories and comparison sites
Perplexity is the most community-driven engine: Reddit accounts for 46.7% of its citations among its top 10 domains, again per Profound. The Yext study from October 2025 adds that Perplexity favors niche directories, where the other engines look elsewhere.
What it means for you: this is the engine where authentic discussions and industry lists matter most. Being present in your sector’s discussion threads, specialized directories and independent comparisons directly raises your odds of appearing in its answers.
Gemini: your own site and the Google ecosystem
The same Yext study shows that Gemini pulls the majority of its citations from brand-owned websites, while ChatGPT leans first on third-party sites. Google’s engine puts more trust in what you publish yourself, backed by its own index of the web.
What it means for you: for Gemini, the first lever is your own site. Pages that answer your buyers’ questions precisely and a solid FAQ are your best entry points; a clean presence across the Google ecosystem (business profile, YouTube) completes the picture.
How to get into the sources, ecosystem by ecosystem
These three ecosystems barely overlap: only 11% of domains cited by ChatGPT are also cited by Perplexity, according to the Averi benchmark corroborated by Whitehat SEO on 118,000 responses (we break down these divergences in a separate article). Working a single ecosystem makes you visible on one engine and leaves the other two to your competitors.
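If you collect the URLs each engine cites on your own questions, you can compute a comparable overlap figure yourself. The sketch below uses one possible definition: the share of domains cited by engine A that engine B also cites. That definition is an assumption, since the benchmark’s exact method is not described here, and the helper names and sample URLs are hypothetical. Subdomains are treated as distinct domains, which is a simplification.

```python
from urllib.parse import urlparse

def domain_of(url: str) -> str:
    """Lowercased hostname without a leading 'www.' (empty if unparsable)."""
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")  # requires Python 3.9+

def overlap_share(urls_a: list[str], urls_b: list[str]) -> float:
    """Share of domains cited by engine A that engine B also cites."""
    domains_a = {domain_of(u) for u in urls_a} - {""}
    domains_b = {domain_of(u) for u in urls_b} - {""}
    if not domains_a:
        return 0.0
    return len(domains_a & domains_b) / len(domains_a)

# Hypothetical citation lists, for illustration only.
engine_a = [
    "https://en.wikipedia.org/wiki/Example",
    "https://www.example.com/pricing",
    "https://example.com/faq",
    "https://news.example.org/review",
]
engine_b = [
    "https://www.reddit.com/r/example/",
    "https://example.com/blog",
]

print(f"{overlap_share(engine_a, engine_b):.0%} of A's domains are also cited by B")
```

Note that the measure is asymmetric: the share of A’s domains found in B is generally not the share of B’s domains found in A.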
Generative engine optimization is not magic: it has been measured. Princeton’s GEO study (Aggarwal et al., KDD 2024), on a 10,000-query benchmark, found that citing sources and adding statistics and quotations can increase a site’s visibility in AI answers by up to 40%.
Your own site: the Gemini lever
Turn your pages into answers. One page per question your buyers actually ask, a direct answer in the first paragraph, numbers attributed to their sources, attributed quotations: these are the tactics the Princeton study measured as the most effective. For Gemini, which cites brand sites first, this is the highest-return investment.
Directories, reviews and comparison sites: the ChatGPT and Perplexity lever
Identify the comparison sites and directories the engines already cite on your questions, then make your brand exist there: a complete, up-to-date listing, solicited customer reviews, presence in industry rankings. Every list you’re missing from is an answer where a competitor speaks in your place.
Forums and UGC: powerful but volatile
Reddit and forums feed Perplexity massively, and ChatGPT intermittently. Participate where your customers already talk, with useful answers and full transparency about who you are: astroturfing gets spotted and communities punish it. Reddit’s collapse in ChatGPT (from about 60% of answers to about 10% in six weeks, per Semrush) is a reminder that no UGC channel is a permanent asset.
Industry press: the slow lever
One article in a recognized publication in your sector feeds all three engines at once: live web search can read it today, and it can enter the models’ memory in future training runs. It is the slowest lever, and the most durable one.
One honest caveat to close: none of these tactics guarantees a mention. Effects show up in weeks or months, and engines reweight their sources without notice. The only robust approach: measure, act, then measure again.
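One way to “measure again” is to track, week by week, the share of answers that cite a given domain, which is the kind of metric behind the Reddit figures quoted above. This is a minimal sketch, assuming you log each answer with a week label and the list of domains it cited. The data layout, the function name and the sample records are all hypothetical.

```python
from collections import defaultdict

def weekly_answer_share(answers, domain: str) -> dict[str, float]:
    """For each week, the fraction of answers citing `domain` at least once.

    `answers` is an iterable of (week_label, cited_domains) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, cited_domains in answers:
        totals[week] += 1
        if domain in cited_domains:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in totals}

# Hypothetical log, for illustration only.
log = [
    ("2025-W32", ["reddit.com", "wikipedia.org"]),
    ("2025-W32", ["reddit.com"]),
    ("2025-W38", ["wikipedia.org"]),
    ("2025-W38", ["reddit.com", "example.com"]),
]

for week, share in weekly_answer_share(log, "reddit.com").items():
    print(week, f"{share:.0%}")
```

This counts answers rather than citations: an answer that cites the domain five times counts once, so the result is not comparable with a share-of-citations figure.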
And in your market?
All the large studies cited here are English-language: English prompts, mostly US sources. Yet the exact source hierarchy varies by language and by sector. Your local directories, specialized comparison sites and trade press don’t carry the same weight in your market’s answers as their US equivalents do in English-language ones.
The only way to know where the engines get their information on your questions, in your language, is to measure it. Pythie’s free audit asks 10 questions from your market to ChatGPT, Gemini and Perplexity, and shows you the exact sources each engine used, URL by URL, in one minute, no account needed. You then know which directory, comparison site or forum to work on first.
Frequently asked questions
Do you need an llms.txt file?
No rush. According to Ahrefs’ June 2026 study of 137,210 domains running Ahrefs Web Analytics, 28% publish an llms.txt, yet 97% of those files received zero requests in May 2026. Publish one if it costs you nothing; expect nothing measurable from it.
Is a Wikipedia page essential?
No, but it weighs heavily on ChatGPT: Wikipedia accounts for 47.9% of its citations among its top 10 domains, per Profound. You cannot create a page on demand, though: Wikipedia’s notability criteria are strict. If you don’t qualify, comparison sites, directories and industry press remain real levers across all three engines.
Do AI engines read customer reviews?
Yes, indirectly: review platforms and comparison sites are part of the third-party sites ChatGPT leans on and the niche directories Perplexity favors, per the Yext study. Recent, detailed, authentic reviews raise your odds of being picked up. The free audit shows you which platforms your engines actually cite.