Methodology
The LLM Visibility Index is built in public. The methodology, prompt corpus, and database schema are all open by design.
1. Corpus generation
Each sector starts with ~15 buyer-intent seed queries per region, authored by the RefinedAI editorial team. For sectors that span multiple regions (e.g. AU + SG), we author a separate seed set per region to capture local market language.
The seeds are fed into RefinedAI's Discover and Refine modules, which fan out across all four tracked LLMs (Claude, OpenAI, Gemini, and Perplexity) and walk the refinement tree to a depth of 4 to 5. The raw output is the set of follow-up questions the LLMs naturally ask when they need more context to answer a query.
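The tree walk can be pictured as a depth-limited, level-by-level expansion. The sketch below is illustrative only: `walkRefinementTree`, the `Refiner` callback, and `maxDepth` are hypothetical names, and the stub refiner is synchronous, whereas the real Discover and Refine modules call live LLMs.

```typescript
// Hypothetical sketch of a depth-limited walk over a refinement tree.
// `Refiner` stands in for "ask an LLM which follow-up questions it would
// raise for this query"; it is an assumption, not the real module interface.
type Refiner = (query: string) => string[];

interface TreeNode {
  query: string;
  depth: number; // 0 for a seed query
}

function walkRefinementTree(
  seeds: string[],
  refine: Refiner,
  maxDepth: number
): TreeNode[] {
  const visited: TreeNode[] = [];
  let frontier: TreeNode[] = seeds.map((q) => ({ query: q, depth: 0 }));

  // Expand one level at a time until the frontier is empty.
  while (frontier.length > 0) {
    const nextLevel: TreeNode[] = [];
    for (const node of frontier) {
      visited.push(node);
      if (node.depth >= maxDepth) continue; // do not expand past the limit
      for (const followUp of refine(node.query)) {
        nextLevel.push({ query: followUp, depth: node.depth + 1 });
      }
    }
    frontier = nextLevel;
  }
  return visited;
}

// Stub refiner: every query yields two follow-up questions.
const stubRefiner: Refiner = (q) => [q + " / budget?", q + " / location?"];
const sampleNodes = walkRefinementTree(["best seo agency"], stubRefiner, 2);
```

With one seed, two follow-ups per query, and a depth limit of 2, the walk visits 1 + 2 + 4 = 7 nodes. The raw tree grows quickly with depth, which is why deduplication and review follow.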
The raw tree is deduplicated by normalised hash, then reviewed by a human. The final corpus contains 300-500 approved prompts per sector, marked as APPROVED in the database. Rejected prompts stay in the database (status REJECTED) for transparency.
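As a rough illustration of deduplication by normalised hash, the sketch below lowercases each prompt, strips punctuation, collapses whitespace, and hashes the result. The normalisation rules, the choice of 32-bit FNV-1a, and the function names are all assumptions made for the example; the actual pipeline may normalise and hash differently.

```typescript
// Assumed normalisation: lowercase, drop non-alphanumerics, collapse spaces.
// Note this simple rule also drops non-ASCII letters.
function normalisePrompt(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // replace punctuation with spaces
    .replace(/\s+/g, " ")         // collapse runs of whitespace
    .trim();
}

// 32-bit FNV-1a, picked here only because it needs no imports.
// A production pipeline would more likely use a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Keep the first spelling seen for each normalised hash.
function dedupePrompts(prompts: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const prompt of prompts) {
    const key = fnv1a(normalisePrompt(prompt));
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(prompt);
    }
  }
  return unique;
}

const dedupedSample = dedupePrompts([
  "Best SEO agency in Sydney?",
  "best seo agency in  sydney",
  "Top PPC agencies",
]);
```

Here the first two prompts normalise to the same string, so only the first spelling survives alongside "Top PPC agencies".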
2. Monthly tracking runs
On the first of each month, every approved prompt in the corpus is sent to all four LLMs through two parallel data feeds:
- RefinedAI Monitor — direct calls via the Vercel AI SDK to Claude, OpenAI, Gemini, and Perplexity
- DataForSEO LLM Responses API — an independent third-party feed used as cross-verification
Running two independent feeds lets each one verify the other. When both feeds agree on a citation, we treat it as highly reliable. Disagreements are flagged in the per-agency scorecards.
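One simple way to think about the cross-check is as a set comparison of the URLs each feed reports for the same prompt and LLM. The sketch below assumes citations have already been extracted into plain URL lists; `compareFeeds` and the `FeedComparison` shape are hypothetical and not taken from the RefinedAI codebase.

```typescript
// Hypothetical result of comparing the two feeds for one prompt/LLM pair.
interface FeedComparison {
  agreed: string[];         // cited by both feeds
  onlyMonitor: string[];    // cited by the direct feed only
  onlyThirdParty: string[]; // cited by the third-party feed only
}

// Remove duplicates while preserving first-seen order.
function uniqueUrls(urls: string[]): string[] {
  return urls.filter((u, i) => urls.indexOf(u) === i);
}

function compareFeeds(
  monitorUrls: string[],
  thirdPartyUrls: string[]
): FeedComparison {
  const monitor = uniqueUrls(monitorUrls);
  const thirdParty = uniqueUrls(thirdPartyUrls);
  return {
    agreed: monitor.filter((u) => thirdParty.indexOf(u) !== -1),
    onlyMonitor: monitor.filter((u) => thirdParty.indexOf(u) === -1),
    onlyThirdParty: thirdParty.filter((u) => monitor.indexOf(u) === -1),
  };
}

const feedCheck = compareFeeds(
  ["https://a.example/seo", "https://b.example/ppc"],
  ["https://a.example/seo", "https://c.example/ads"]
);
// Anything outside `agreed` would be a candidate for a scorecard flag.
const needsFlag =
  feedCheck.onlyMonitor.length > 0 || feedCheck.onlyThirdParty.length > 0;
```

In this example one URL is confirmed by both feeds and two are seen by only one feed each, so the pair would be flagged.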
LLMs and exact model versions tracked:
- CLAUDE
- OPENAI
- GEMINI
- PERPLEXITY
3. Scoring
For each tracking run, we extract structured data from every LLM response: which agencies were mentioned, which were cited (with a URL on the agency's domain), what position they appeared in, and the sentiment of the mention.
- Share of Voice (SoV) — the agency's share of total mentions across the corpus, weighted by position
- Mention Rate — % of prompts where the agency was mentioned at all
- Citation Rate — % of prompts where a URL on the agency's domain was cited
- Average Position — average rank when listed in a ranked response
- Sentiment — -1 to +1 score on the tone of the mention
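The metrics above can be sketched for a single tracking run as follows. The exact position weighting is not specified here, so the example assumes a weight of 1 / position and a placeholder weight of 0.5 for mentions in unranked responses; the `Mention` shape, `scoreAgency`, and `UNRANKED_WEIGHT` are likewise hypothetical.

```typescript
// Hypothetical record extracted from one LLM response.
interface Mention {
  promptId: string;
  agency: string;
  position: number | null; // 1-based rank, null if the response was unranked
  cited: boolean;          // a URL on the agency's domain was cited
  sentiment: number;       // -1 to +1
}

interface AgencyScore {
  shareOfVoice: number;
  mentionRate: number;
  citationRate: number;
  averagePosition: number | null;
  sentiment: number | null;
}

const UNRANKED_WEIGHT = 0.5; // placeholder assumption

// Assumed weighting: earlier positions count more (1, 1/2, 1/3, ...).
function positionWeight(position: number | null): number {
  return position === null ? UNRANKED_WEIGHT : 1 / position;
}

function mean(values: number[]): number | null {
  if (values.length === 0) return null;
  return values.reduce((a, b) => a + b, 0) / values.length;
}

function scoreAgency(
  mentions: Mention[],
  agency: string,
  totalPrompts: number
): AgencyScore {
  let totalWeight = 0;
  let agencyWeight = 0;
  const mentionedPrompts = new Set<string>();
  const citedPrompts = new Set<string>();
  const positions: number[] = [];
  const sentiments: number[] = [];

  for (const m of mentions) {
    const weight = positionWeight(m.position);
    totalWeight += weight; // denominator covers every agency's mentions
    if (m.agency !== agency) continue;
    agencyWeight += weight;
    mentionedPrompts.add(m.promptId);
    if (m.cited) citedPrompts.add(m.promptId);
    if (m.position !== null) positions.push(m.position);
    sentiments.push(m.sentiment);
  }

  return {
    shareOfVoice: totalWeight === 0 ? 0 : agencyWeight / totalWeight,
    mentionRate: totalPrompts === 0 ? 0 : mentionedPrompts.size / totalPrompts,
    citationRate: totalPrompts === 0 ? 0 : citedPrompts.size / totalPrompts,
    averagePosition: mean(positions),
    sentiment: mean(sentiments),
  };
}

const sampleMentions: Mention[] = [
  { promptId: "p1", agency: "A", position: 1, cited: true, sentiment: 0.5 },
  { promptId: "p1", agency: "B", position: 2, cited: false, sentiment: 0 },
  { promptId: "p2", agency: "A", position: 2, cited: false, sentiment: 1 },
];
const scoreA = scoreAgency(sampleMentions, "A", 4);
```

For agency A over a 4-prompt corpus, the weighted mentions are 1 + 0.5 out of a total of 2, giving a Share of Voice of 0.75, a Mention Rate of 0.5, a Citation Rate of 0.25, an Average Position of 1.5, and a mean sentiment of 0.75.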
4. Page audits
Every newly cited URL is scraped via Firecrawl and audited for on-page signals: JSON-LD schema types, entity-attribute-value density, word count, and title structure. The results are surfaced on the per-agency scorecards and will feed into a planned "schema completeness" score.
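To make the audit concrete, here is a minimal sketch that takes already-scraped HTML and pulls out JSON-LD `@type` values, a visible word count, and the page title. It does not call Firecrawl, it skips entity-attribute-value density because that metric is not defined here, and it uses regular expressions purely for brevity where a real HTML parser would be more robust. `auditHtml`, `collectTypes`, and `PageAudit` are hypothetical names.

```typescript
interface PageAudit {
  schemaTypes: string[];
  wordCount: number;
  title: string | null;
}

// Recursively collect every "@type" value in a parsed JSON-LD document,
// including nested entities and "@graph" arrays.
function collectTypes(value: unknown, found: string[]): void {
  if (Array.isArray(value)) {
    for (const item of value) collectTypes(item, found);
    return;
  }
  if (value === null || typeof value !== "object") return;
  const record = value as { [key: string]: unknown };
  const declared = record["@type"];
  const typeNames = Array.isArray(declared) ? declared : [declared];
  for (const typeName of typeNames) {
    if (typeof typeName === "string" && found.indexOf(typeName) === -1) {
      found.push(typeName);
    }
  }
  for (const key of Object.keys(record)) {
    if (key !== "@type") collectTypes(record[key], found);
  }
}

function auditHtml(html: string): PageAudit {
  const schemaTypes: string[] = [];
  const jsonLd =
    /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = jsonLd.exec(html)) !== null) {
    try {
      collectTypes(JSON.parse(match[1] || ""), schemaTypes);
    } catch (_err) {
      // Malformed JSON-LD is skipped rather than failing the audit.
    }
  }

  // Visible text: drop script/style/title blocks, then strip remaining tags.
  const text = html
    .replace(/<(script|style|title)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ");
  const words = text.split(/\s+/).filter((w) => w.length > 0);

  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = titleMatch ? (titleMatch[1] || "").trim() : null;

  return { schemaTypes, wordCount: words.length, title };
}

const sampleHtml =
  "<html><head><title>Acme SEO</title>" +
  '<script type="application/ld+json">' +
  '{"@context":"https://schema.org","@type":"Organization",' +
  '"name":"Acme","address":{"@type":"PostalAddress"}}' +
  "</script></head>" +
  "<body><h1>Hello world</h1><p>We do SEO.</p></body></html>";
const sampleAudit = auditHtml(sampleHtml);
```

For the sample page this finds the types Organization and PostalAddress, a title of "Acme SEO", and five visible words in the body.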
5. Disclosure
Aemorph (the parent company of RefinedAI) is the publisher of this index. To preserve the credibility of the public rankings, Aemorph is tracked privately but excluded from public ranking tables. This is enforced at the database level via the includedInPublic field on every agency record.
Internal versions of every monthly snapshot include Aemorph for sales and operational use. The two views are produced from the same data, using different filter rules.
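The two views can be illustrated with an in-memory filter on `includedInPublic`. This is only a sketch: the real rule is enforced at the database level, and apart from `includedInPublic` the record fields and the `rankingTable` function below are assumptions made for the example.

```typescript
// Only `includedInPublic` comes from the methodology; the other fields
// are hypothetical stand-ins for whatever the agency record contains.
interface AgencyRecord {
  name: string;
  shareOfVoice: number;
  includedInPublic: boolean;
}

type RankingView = "public" | "internal";

// Same data, different filter rule per view; sorted by Share of Voice.
function rankingTable(
  agencies: AgencyRecord[],
  view: RankingView
): AgencyRecord[] {
  return agencies
    .filter((a) => view === "internal" || a.includedInPublic)
    .sort((a, b) => b.shareOfVoice - a.shareOfVoice);
}

const sampleAgencies: AgencyRecord[] = [
  { name: "Aemorph", shareOfVoice: 0.4, includedInPublic: false },
  { name: "Agency A", shareOfVoice: 0.3, includedInPublic: true },
  { name: "Agency B", shareOfVoice: 0.35, includedInPublic: true },
];

const publicTable = rankingTable(sampleAgencies, "public");
const internalTable = rankingTable(sampleAgencies, "internal");
```

With the sample data (the Share of Voice figures are made up), the public table lists Agency B and Agency A, while the internal table lists Aemorph first. Because `filter` returns a new array, sorting does not reorder the input.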
6. Open by design
The methodology, corpus, schema, and Inngest pipeline functions are all intended to be public. We will publish the full Prisma schema and link to the open-source RefinedAI repository as part of v1.1.
Build-in-public is the moat. If anyone else wants to replicate this index, they should be able to.