Data capture
We collect data from the live, publicly available consumer interfaces of each platform, including ChatGPT, Perplexity, Google AI Overviews, Google AI Mode, Gemini, Grok, and Microsoft Copilot. We run real browser sessions against the same interfaces your customers use.
We do not use the platforms' APIs. API outputs are sanitized and draw on different sources than the consumer product, so they do not reflect what a real user sees. Capturing the rendered consumer experience is the only way to measure the answers your customers actually get.
We also run from the locations that matter to you. The platforms localize their answers, so a result captured from the right geography is the result a real user in that market receives. For every run we record the full rendered response and every source URL it cites. That raw response is what every metric is computed from.
Measurement through sampling
A Temso score is a rate, not a single answer. For a given topic we measure the proportion of responses, across repeated runs, in which a given brand or source appears. That proportion is the metric.
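As a rough illustration of "proportion of responses in which a brand or source appears", the sketch below counts hits across a handful of captured runs. The `CapturedResponse` class, the `appearance_rate` function, and the example brands and URLs are all hypothetical names made up for this example; they are not Temso's internal data model, and the plain substring matching is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class CapturedResponse:
    """One captured run: the rendered answer text and the URLs it cited (hypothetical shape)."""
    text: str
    cited_urls: list[str] = field(default_factory=list)


def appearance_rate(responses, brand=None, source_domain=None):
    """Share of responses that mention `brand` or cite a URL containing `source_domain`."""
    if not responses:
        raise ValueError("need at least one captured response")
    hits = 0
    for r in responses:
        # Naive case-insensitive substring match; real matching would need to be stricter.
        mentions_brand = brand is not None and brand.lower() in r.text.lower()
        cites_source = source_domain is not None and any(
            source_domain in url for url in r.cited_urls
        )
        if mentions_brand or cites_source:
            hits += 1
    return hits / len(responses)


# Four made-up runs of the same prompt.
runs = [
    CapturedResponse("Acme and Globex are popular.", ["https://acme.example/pricing"]),
    CapturedResponse("Globex is a common choice.", ["https://globex.example/"]),
    CapturedResponse("Many teams pick Acme.", []),
    CapturedResponse("Initech is another option.", ["https://initech.example/"]),
]

rate = appearance_rate(runs, brand="Acme")  # Acme is mentioned in 2 of 4 runs
```

Here the brand appears in two of four runs, so the rate is 0.5. The same function with `source_domain="globex.example"` measures how often a given source is cited instead.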
Two kinds of error affect it: random variance and systematic bias.
Random variance is the platform answering differently from one run to the next. It averages out as the sample grows. The margin of error shrinks roughly in proportion to one over the square root of the number of runs, so quadrupling the runs roughly halves the margin and produces a tighter, more stable number.
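The square-root relationship can be made concrete with the standard normal-approximation margin of error for a proportion. This is a textbook formula used here as a sketch, assuming independent runs and a 95% confidence level (z = 1.96); it is not necessarily the exact interval Temso reports.

```python
import math


def margin_of_error(observed_rate, n_runs, z=1.96):
    """Normal-approximation margin of error for a proportion (z=1.96 is roughly 95%)."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    if not 0 <= observed_rate <= 1:
        raise ValueError("observed_rate must be between 0 and 1")
    return z * math.sqrt(observed_rate * (1 - observed_rate) / n_runs)


# Same observed rate, four times as many runs.
moe_100 = margin_of_error(0.5, 100)
moe_400 = margin_of_error(0.5, 400)
```

With an observed rate of 0.5, 100 runs give a margin of about 0.098 (roughly 10 points) and 400 runs give about 0.049 (roughly 5 points): four times the runs, half the margin.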
Systematic bias is different. It does not average out, no matter how many runs you add. We remove it by holding the testing conditions constant: fixed location, language, session state, and cadence. Because the conditions never move, a change in the data reflects a real change in the AI or in your content, never a change in how we measured.
Sample size is the main lever
The single biggest factor in accuracy is sample size. More runs mean a tighter confidence interval, a more stable reading, and less chance of a skewed result.
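To see how sample size translates into precision, the same normal-approximation formula can be inverted to estimate how many runs a target margin of error would need. This is a planning sketch under stated assumptions (independent runs, 95% confidence, worst-case rate of 0.5 unless told otherwise); the function name `runs_needed` is hypothetical.

```python
import math


def runs_needed(target_margin, expected_rate=0.5, z=1.96):
    """Approximate number of runs needed to reach `target_margin` at the given rate."""
    if not 0 < target_margin < 1:
        raise ValueError("target_margin must be between 0 and 1")
    if not 0 <= expected_rate <= 1:
        raise ValueError("expected_rate must be between 0 and 1")
    n = (z ** 2) * expected_rate * (1 - expected_rate) / target_margin ** 2
    return max(1, math.ceil(n))


runs_for_5_points = runs_needed(0.05)   # margin of +/- 5 percentage points
runs_for_10_points = runs_needed(0.10)  # margin of +/- 10 percentage points
```

Under these assumptions, a margin of 10 points needs about 97 runs while a margin of 5 points needs about 385. Halving the margin costs roughly four times the sample, which is why precision requirements drive sampling frequency.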
Sample size is also a lever you control. Enterprise plans run your prompt sets at higher daily frequency, scaled to the precision your use case requires. The more sensitive the decision riding on the data, the more often we sample.
Modeling how real users ask
AI platforms do not answer the raw prompt. They decompose each query into multiple sub-queries (fan-out), retrieve against those, and synthesize the result. Semantically equivalent phrasings produce overlapping sub-query sets, so they converge on the same retrieved sources and substantially the same answer. Measuring every possible wording is unnecessary.
The variable that does move the answer is the context of the asker: the offering, the persona, and the use case behind the question. Temso provides a schema to encode those dimensions into your prompt library, so the set represents your actual demand rather than generic questions.
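One way to picture encoding offering, persona, and use case into a prompt library is a small structured record per prompt. The sketch below is purely illustrative: `PromptContext`, `build_prompt_library`, and the prompt template are assumptions made for this example and do not describe Temso's actual schema.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PromptContext:
    """Hypothetical record for the three context dimensions behind a question."""
    offering: str
    persona: str
    use_case: str

    def to_prompt(self):
        # Illustrative template only; real prompts would be written per market.
        return f"As a {self.persona}, what is the best {self.offering} for {self.use_case}?"


def build_prompt_library(offerings, personas, use_cases):
    """Cross every offering with every persona and use case."""
    return [PromptContext(o, p, u) for o, p, u in product(offerings, personas, use_cases)]


library = build_prompt_library(
    offerings=["CRM software"],
    personas=["startup founder", "enterprise IT lead"],
    use_cases=["lead tracking", "customer support"],
)
first_prompt = library[0].to_prompt()
```

One offering, two personas, and two use cases yield four prompts. Each varies the context of the asker rather than the surface wording of the question.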
We validate that the set is representative against independent signals: keyword search volumes, which weight prompts by demand, and your own first-party data through integrations such as Google Search Console.
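Weighting prompts by demand can be sketched as a volume-weighted average of per-prompt rates. The function name, prompts, and volumes below are made-up illustrations, assuming monthly keyword search volume stands in for demand; the actual weighting Temso applies may differ.

```python
def demand_weighted_rate(rates, volumes):
    """Average per-prompt appearance rates, weighted by each prompt's search volume."""
    missing = [p for p in rates if p not in volumes]
    if missing:
        raise KeyError(f"no volume for prompts: {missing}")
    total_volume = sum(volumes[p] for p in rates)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(rates[p] * volumes[p] for p in rates) / total_volume


# Hypothetical per-prompt rates and monthly search volumes.
rates = {"best crm for startups": 0.60, "crm for enterprise support": 0.20}
volumes = {"best crm for startups": 3000, "crm for enterprise support": 1000}

weighted = demand_weighted_rate(rates, volumes)
unweighted = sum(rates.values()) / len(rates)
```

The unweighted mean of the two rates is 0.40, while the demand-weighted figure is 0.50, because the higher-volume prompt counts for three times as much.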
Verifying the data
The numbers are observed, not estimated. Every metric traces back to a specific captured response and the exact source URLs that drove it. If a score moves, you can drill into the response that caused it. Re-run it and check.
Beyond that, two independent cross-checks let you confirm the signal against your own data.
The first is AI crawler activity on your site. Temso correlates that crawl activity against citation trends, so you can confirm that rising citations line up with real crawler behavior on your site. It is built in and only needs to be activated on your account.
The second is Google Search Console. Comparing query and impression patterns in Search Console against citation patterns in Temso gives you a second independent data source. When independent instruments move together, the measurement is reflecting reality rather than an artifact of how it was captured.
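A simple way to check whether two independent series "move together" is a Pearson correlation over matching time periods. The sketch below uses made-up weekly figures and a hand-written `pearson` helper; it is an assumption about how such a check could be run on your own exports, not a description of Temso's built-in correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly data: AI crawler hits from server logs, citation rate from captured runs.
weekly_crawler_hits = [120, 150, 170, 210, 260]
weekly_citation_rate = [0.10, 0.12, 0.15, 0.18, 0.22]

r = pearson(weekly_crawler_hits, weekly_citation_rate)
```

A coefficient near 1 means the two series rise and fall together. Correlation over a few weeks is only a sanity check, so it is best read alongside the underlying responses rather than as proof on its own.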
What the data is and what it is not
The platforms are non-stationary. They update their models, run server-side tests, and change their behavior over time. We measure that too. A shift that hits every brand at once signals a platform change. A shift isolated to your brand signals a change in your content or visibility.
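The distinction between a platform-wide shift and a brand-specific one can be sketched by comparing your brand's change with the median change among the other tracked brands. The function name, the 5-point threshold, and the example figures are all assumptions chosen for illustration, not Temso's detection logic.

```python
from statistics import median


def classify_shift(before, after, brand, threshold=0.05):
    """Label a change in `brand`'s rate as platform-wide, brand-specific, or immaterial."""
    deltas = {b: after[b] - before[b] for b in before}
    own = deltas[brand]
    peers = [d for b, d in deltas.items() if b != brand]
    if not peers:
        raise ValueError("need at least one other brand to compare against")
    peer_median = median(peers)
    if abs(own) < threshold:
        return "no material change"
    # Peers moved by a similar amount in the same direction: likely the platform changed.
    if abs(peer_median) >= threshold and (own > 0) == (peer_median > 0):
        return "platform-wide"
    return "brand-specific"


before = {"Acme": 0.40, "Globex": 0.35, "Initech": 0.30}
after_everyone_drops = {"Acme": 0.30, "Globex": 0.25, "Initech": 0.21}
after_only_acme_drops = {"Acme": 0.30, "Globex": 0.36, "Initech": 0.30}

label_a = classify_shift(before, after_everyone_drops, "Acme")
label_b = classify_shift(before, after_only_acme_drops, "Acme")
```

In the first scenario every brand falls by about 10 points, so the shift is labeled platform-wide. In the second only Acme falls, so it is labeled brand-specific.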
Scores are best read as trends over time, not as exact daily readings. And the data measures visibility, not audience: it shows what appears in AI answers, not how many people saw it.
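Reading scores as trends usually means smoothing the daily series before interpreting it. A trailing rolling mean is one common choice, shown here as a sketch; the window length and the daily values are arbitrary assumptions for the example.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; early points average over however many values exist so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical noisy daily appearance rates.
daily = [0.30, 0.50, 0.40, 0.20, 0.60]
trend = rolling_mean(daily, window=3)
```

The daily values swing between 0.20 and 0.60, while the 3-day trend stays between 0.30 and 0.40. That steadier line is the reading to act on.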
If your team wants to validate the methodology against your own data, our team is happy to walk through it directly.

