When an idea gets its own page, does AI find the answer?

ContentGrapher sometimes tells you an idea on your page deserves a page of its own. We took 30 real pages where it said that, built the recommended page, and checked whether AI systems could find the answers.

The short answer: yes, and the gap is not subtle. With the recommended page in place, AI found the answer to 84% of the questions. Without it, 4%.

That is the whole study in one comparison, so the rest of this page does two jobs: it explains how we made the comparison fair, and it tells you the one honest catch we found along the way.

How often the answer was found

Hub with the recommended page: 84%

Hub without it: 4%

Share of 166 real search questions where the strictest check found an answer, averaged across all 30 page sets.

What we actually tested

When ContentGrapher analyzes a page, it sorts every idea on it: this one is the point of the page, that one supports it, and that one over there really belongs on its own separate page. Our agreement study measured how consistently AI models make that “belongs elsewhere” call. This study asks the question that actually matters to you: if you act on the call and build that separate page, does anything improve?

So for each of 30 real pages, from sites like Adobe, AWS, Semrush, Zendesk, Buffer, and gov.uk, we built two versions of a small website around it. One version added the pages ContentGrapher recommended. The other, the look-alike, added the same number of new pages with the same amount of writing, but built around ideas the analysis said should stay put. If building any extra pages helps, both versions should improve. If picking the right ideas is what matters, only the first one should.
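The look-alike is only a fair control if it matches the real version on everything except which ideas got pages. Here is a minimal sketch of that check. It assumes each added page is described by its idea's role label and its word count. The `Page` type, the role names, and the 5% word-count tolerance are hypothetical and are not ContentGrapher's actual schema or threshold.

```python
from dataclasses import dataclass


@dataclass
class Page:
    idea: str
    role: str  # hypothetical labels: "belongs_elsewhere" or "stays"
    word_count: int


def is_fair_lookalike(recommended: list[Page], lookalike: list[Page],
                      word_tolerance: float = 0.05) -> bool:
    """True if the look-alike differs from the real version only in which ideas got pages."""
    # Same number of new pages on each side.
    if len(recommended) != len(lookalike):
        return False
    # The real version uses only ideas flagged as belonging elsewhere;
    # the look-alike uses only ideas the analysis said should stay put.
    if any(p.role != "belongs_elsewhere" for p in recommended):
        return False
    if any(p.role != "stays" for p in lookalike):
        return False
    # The total amount of writing must match within the (hypothetical) tolerance.
    rec_words = sum(p.word_count for p in recommended)
    look_words = sum(p.word_count for p in lookalike)
    if rec_words == 0:
        return look_words == 0
    return abs(rec_words - look_words) / rec_words <= word_tolerance
```

The check deliberately ignores what the pages say. Matching page count and total length is what rules out "any extra pages help" as an explanation.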

Then we asked questions. Not questions we made up: 166 real questions pulled from Google's “People Also Ask” box, the things people genuinely type when they search for these topics. For each question we checked whether an AI retrieval system, the kind that powers AI search and chat-with-your-docs tools, could find an answer in each version of the site.
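The exact scoring rules are in the methodology. As a rough sketch of how this kind of retrieval check generally works: the system turns the question and every passage into vectors, ranks passages by cosine similarity, and counts a hit if any of the top k passages answers the question. The toy vectors, the passage ids, and k are hypothetical stand-ins. This is not the study's "strictest check."

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_found(question_vec, answer_ids, passages, k=3):
    """True if any of the k passages most similar to the question answers it.

    passages: list of (passage_id, vector) pairs for one version of the site.
    answer_ids: ids of passages judged to answer the question.
    """
    ranked = sorted(passages, key=lambda p: cosine(question_vec, p[1]), reverse=True)
    return any(pid in answer_ids for pid, _ in ranked[:k])


def hit_rate(questions, passages, k=3):
    """Share of (question_vec, answer_ids) pairs whose answer is retrieved."""
    hits = sum(answer_found(q, ids, passages, k) for q, ids in questions)
    return hits / len(questions)
```

You run the same questions against each version of the site, so the only thing that changes between the two hit rates is which passages exist to be retrieved.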

It was not luck

With 30 pages you can ask whether a result is a fluke. We checked two ways, and here is the plain reading of both.

Question by question, the recommended page never lost. We compared every single question head to head: did the version with the recommended page do better, worse, or the same as the look-alike? Out of 166 questions, the recommended page won 164, tied 2, and lost 0.
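A standard way to read a record of 164 wins and 0 losses, with ties set aside, is an exact sign test. If neither version were really better, each decided question would be a coin flip. This page does not say this is the test the study used. The sketch is here only to make "not a fluke" concrete.

```python
from math import comb


def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact sign test p-value; ties are dropped before calling."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Chance of a split at least this lopsided in one direction under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double it to count lopsided splits in either direction, capped at 1.
    return min(1.0, 2 * tail)


print(sign_test_p(164, 0))  # about 8.6e-50
```

A 164 to 0 split under a fair coin has a probability of 2 to the power of minus 163, far below any threshold anyone uses.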

All 166 questions, one dot each

164 won by the recommended page · 2 ties · 0 won by the look-alike

And the gap survives bad luck. We re-dealt our 30 pages thousands of times, the standard way statisticians stress-test a result, to see how small the gap could plausibly get if we had happened to pick an unlucky set of pages. Even at the pessimistic end, the gap stays above 72 percentage points. The bar we set in advance for calling this real was 5. It cleared that bar fourteen times over.
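Re-dealing the pages in this way is a bootstrap. You resample the 30 pages with replacement, recompute the average gap each time, and read a low percentile as the pessimistic end. Here is a minimal sketch. It assumes one gap per page (the recommended-page hit rate minus the look-alike hit rate, in percentage points), the 2.5th percentile, and 10,000 resamples, and the methodology may make different choices. The five example gaps are made up for illustration and are not the study's data.

```python
import random


def bootstrap_lower_bound(gaps, n_resamples=10_000, percentile=2.5, seed=0):
    """Pessimistic-end estimate of the mean gap via the percentile bootstrap."""
    rng = random.Random(seed)
    n = len(gaps)
    means = []
    for _ in range(n_resamples):
        # Re-deal the pages: draw n pages with replacement.
        sample = rng.choices(gaps, k=n)
        means.append(sum(sample) / n)
    means.sort()
    return means[int(len(means) * percentile / 100)]


example_gaps = [90.0, 75.0, 100.0, 60.0, 85.0]  # hypothetical per-page gaps
lower = bootstrap_lower_bound(example_gaps)
pre_registered_bar = 5.0  # the bar set in advance, in percentage points
print(lower, lower > pre_registered_bar)
```

Resampling whole pages instead of individual questions reflects the worry in the text, which is an unlucky choice of pages.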

It works on every kind of page

Pages do different jobs. Some explain what something is, some walk you through a task, some compare options, some help you decide if a product fits, and some sell. We made sure all five jobs were represented in the 30 pages, because a result that only works on explainer articles would be a much smaller result.

Answers found, by the job the page does

guide · how-to pages: 97.2% (without: 11.1%)

compare · X vs Y pages: 85.0% (without: 0.0%)

convert · product and pricing pages: 83.3% (without: 2.8%)

explain · what-is pages: 79.2% (without: 4.2%)

evaluate · is-it-right-for-me pages: 76.7% (without: 0.0%)

“Without” is the look-alike hub for the same pages. The gap holds on every page job.

The weakest page type still beat its look-alike by 73 percentage points. We had a specific worry that comparison and evaluation pages would drag the result down, because the way those pages are built gives the analysis less to work with. They did not drag it down much. Comparison pages came in at 85%, above the explainer pages. Evaluation pages were the lowest of the five at 76.7%, but they still far outscored their look-alike.

And it is not a quirk of one AI system

Every AI retrieval setup reads content a little differently. So we re-ran the measurement on three different reading systems: two from OpenAI that are common in production tools, and one open-source system that runs on an ordinary computer with no cloud connection at all. The gap showed up on all three, between 95 and 98 percentage points on the ten clearest pages. Whatever is doing the work here, it is the page, not the particular AI reading it.

We also tried it on pages that already exist

Everything above uses destination pages we wrote for the test. A fair question: does the pattern show up when the recommended page already exists on a real website? For one company's site, teramind.co, ContentGrapher's recommendations happened to point at pages the site already had, so we could check directly.

The honest result is mixed. On two of the four pages we could test, the existing page did the job well, answering 83% to 100% of the questions. On the other two, the existing pages were written as product marketing and did not answer the kinds of questions people actually ask, so they lost badly to pages written specifically for the idea. The pattern is clear across these four pages, but the sample is tiny (four pages and fourteen questions), so we report it as an illustration, not a finding.

The honest catch

Here is the thing we found that we were not hoping to find. Run ContentGrapher's analysis on the same page twice and you do not get exactly the same list of “give this its own page” recommendations. The lists overlap by 61.5% on average. We set ourselves a target of 70% before we started, and the system came in under it.
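One common way to score the overlap between two lists is Jaccard similarity: the shared items divided by all distinct items, averaged over every pair of repeat runs. The sketch below assumes that metric, although the methodology may define overlap differently. The recommendation lists are made up.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared items divided by all distinct items; 1.0 for two empty lists."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(runs: list[set[str]]) -> float:
    """Average Jaccard overlap across every pair of repeat runs on one page."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure overlap")
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


runs = [
    {"pricing tiers", "api limits", "sso setup"},
    {"pricing tiers", "api limits", "data retention"},
]
print(mean_pairwise_overlap(runs))  # 2 shared / 4 distinct = 0.5
```

A per-page figure like this, averaged over the pages tested, is what you would compare against a 70% bar.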

Run the analysis twice: how much of the list repeats

Average overlap: 61.5% (target: 70%)

Average overlap between the “give this its own page” lists from repeat runs on the same 10 pages, against the 70% bar we set in advance. It came in under, and we publish that.

What does that do to the result? The ideas that show up run after run are the ones carrying the 84%. The ideas that appear in one run and not the next are the marginal calls. So the practical advice is: treat a single analysis as a strong draft, and trust the recommendations that keep showing up over any one run's exact list. We decided before running the study that if this number came in low we would publish it rather than quietly drop it, and the engineering work to tighten it is already scheduled.
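In practice, you could act on "trust the recommendations that keep showing up" by running the analysis several times and keeping only the ideas that appear in at least half of the runs. The function name, the `min_share` threshold of 0.5, and the example lists are hypothetical illustrations and not a ContentGrapher setting.

```python
from collections import Counter


def stable_recommendations(runs: list[set[str]], min_share: float = 0.5) -> list[str]:
    """Ideas recommended in at least `min_share` of runs, most frequent first."""
    # Each run is a set, so an idea is counted at most once per run.
    counts = Counter(idea for run in runs for idea in run)
    needed = min_share * len(runs)
    stable = [idea for idea, c in counts.items() if c >= needed]
    # Sort by how often the idea appeared, then alphabetically for a stable order.
    return sorted(stable, key=lambda idea: (-counts[idea], idea))


runs = [
    {"pricing tiers", "api limits", "sso setup"},
    {"pricing tiers", "api limits", "data retention"},
    {"pricing tiers", "audit logs"},
]
print(stable_recommendations(runs))  # ['pricing tiers', 'api limits']
```

Raising `min_share` trades recall for confidence. Ideas that appear only once are the marginal calls the text describes.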

What we cannot claim

01 A single analysis run does not give you the definitive list. The result belongs to the recommendations that repeat across runs, and we measured exactly how much the list moves.
02 This is a controlled measurement of AI retrieval, not a study of live AI search products like ChatGPT or Google AI Overviews. Nobody outside those companies can run that study; this is the closest controllable stand-in.
03 The destination pages were written by AI for the test, so both sides got the same treatment. Pages that already exist on real sites did less well in our small side test: two of the four answered most questions and two did not. We show that data rather than hide it.
04 We tested three AI reading systems, not all of them. A fourth was planned and did not run, and we say so in the methodology rather than pretend it was never planned.

The answer

When ContentGrapher says an idea deserves its own page and you build that page, AI systems find the answers people are searching for 84% of the time, compared with 4% without it. The gap held on all 30 pages, on every kind of page, and on three different AI reading systems. The recommended page also won 164 of the 166 head-to-head questions, with 2 ties and no losses. Building pages around ideas the analysis did not flag produced almost nothing, so the value is in the picking, not the publishing.

And the system that does the picking is not perfectly consistent yet. We measured that too, told you what it means for how to read your results, and put the fix on the roadmap. That is the deal this research series offers: the result, and the catch, in the same report.

This study closes the loop the decoy study opened. That one showed that filling the right gaps within a page beats filling random ones. This one shows that moving the right ideas off the page works too, and the agreement study sits between them, measuring how reliably models make the call in the first place.
