AI Search Proof Lab

GEO Measurement System: A 4-week AI search visibility experiment

Using Otterly.AI for GEO tracking and Claude for repeatable workflow automation

By Ellen Tuckett · AI Search, AEO, GEO & SEO Strategy · June 2026 · v1.0
Proof Lab · Cloudflare test brand · Week 1 baseline · June 2026
Week 1 at a glance

46% generative share of voice · 1.05 average brand position · 62% brand coverage across 15 tracked buyer-intent prompts.

Cloudflare is the example. The system is the deliverable.

Overview

Ranking is no longer the full visibility story.

In the retrieval era, brands are discovered, interpreted, summarized, compared, and shortlisted by AI systems before a buyer ever reaches a website. It is no longer enough to rank in an index. A brand also needs to be retrieved, accurately described, cited from trusted sources, and positioned clearly against competitors inside AI-generated answers.

This Proof Lab shows how I built a repeatable Generative Engine Optimization (GEO) measurement system using Cloudflare as the test brand. The experiment uses Otterly.AI as the GEO tracking layer and Claude as the workflow automation layer, with human review turning the outputs into strategy and an executive-ready weekly report.

The goal is not a one-time AI visibility snapshot. The goal is a practical GEO operating model that can be run every week: measure the answers, analyze the sources, identify the narrative gaps, track competitive movement, and turn the findings into prioritized action.

Who this is for

This Proof Lab is for B2B SaaS marketing leaders, SEO leads, GEO practitioners, product marketers, demand generation teams, technical content teams, and growth leaders building AI search visibility programs.

It is especially relevant for technical B2B companies competing in complex categories where buyers ask answer engines for recommendations, comparisons, definitions, vendor shortlists, and category guidance before they ever visit a website.

Primary prompt this page answers

Primary prompt: How do you build a repeatable GEO measurement system for AI search visibility?

Supporting prompts:

The operating model

The system runs on a simple division of labor:

Otterly.AI measures the GEO signals: Prompt tracking, brand coverage, share of voice, average position, top prompts, and domain citations. It is not a screenshot tool. It is the measurement layer.
Claude operationalizes the workflow: organizing exports, normalizing data, classifying prompt outcomes, drafting first-pass insights, and packaging the weekly report. Claude is not writing a report from scratch. It acts as a workflow partner.
Human review turns outputs into strategy: Prompt selection, data validation, narrative judgment, prioritization, and final editing.
The weekly report makes the findings usable for executives and cross-functional teams.

The workflow is designed to automate the repetitive parts of GEO measurement without automating away strategic interpretation.

What this experiment measures

Measurement area | What it answers
Inclusion rate | Is the brand appearing in AI answers for priority prompts?
Generative share of voice | How visible is the brand compared with competitors?
Average brand position | When the brand appears, where does it show up?
Narrative fidelity | Is the AI describing the brand accurately and consistently?
Citation quality | Which owned, earned, competitor, community, and media sources shape the answer?
Prompt-level gaps | Which prompts are wins, competitive risks, losses, or whitespace?
Recommended actions | What should the brand improve, publish, clarify, or distribute next?
Weekly movement | Is the brand gaining, holding, or losing ground over time?

Why Cloudflare as the test brand

Cloudflare competes across multiple technical narratives at once: CDN, DDoS protection, web application firewall, edge compute, Zero Trust, VPN replacement, developer infrastructure, and integrated security platform.

That makes it ideal for testing how AI systems handle overlapping product categories, technical terminology, competitor comparisons, and citation sources.

The goal was not to evaluate Cloudflare as a company. The goal was to use a visible, technically complex brand to stress-test a workflow that applies to any B2B company competing for AI search visibility.

Experiment design

For Week 1, I created a fixed library of 15 buyer-intent prompts in Otterly.AI, scoped to the United States, spanning Cloudflare's major narratives: global CDN, DDoS protection, WAF, edge compute, Zero Trust, VPN replacement, origin egress cost reduction, and integrated CDN/WAF/DNS platform positioning.

Figure: Fixed 15-prompt library tracked in Otterly.AI for the Cloudflare GEO experiment, scoped to the United States.
Source: Otterly.AI Cloudflare brand report, United States, captured 18 June 2026.

Each weekly run uses the same prompt set, tracking structure, scoring approach, and interpretation process so movement can be monitored over time.
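One way to enforce the "same prompt set every week" rule is to fingerprint the library and compare fingerprints between runs. The sketch below is illustrative only: the `TrackedPrompt` fields, the `library_fingerprint` helper, and the placeholder prompt texts are assumptions made for this example, not part of Otterly.AI or the tracked library.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrackedPrompt:
    text: str            # the buyer-intent question as entered in the tracker
    narrative: str       # e.g. "CDN", "Zero Trust"
    country: str = "US"  # the experiment is scoped to the United States


def library_fingerprint(prompts: list[TrackedPrompt]) -> str:
    """Return a stable SHA-256 hash of the prompt set, independent of list order."""
    rows = sorted(json.dumps(asdict(p), sort_keys=True) for p in prompts)
    return hashlib.sha256("\n".join(rows).encode("utf-8")).hexdigest()


# Placeholder prompts for illustration only; these are not the tracked prompts.
week1 = [
    TrackedPrompt("placeholder CDN prompt", "CDN"),
    TrackedPrompt("placeholder Zero Trust prompt", "Zero Trust"),
]
week2 = list(reversed(week1))  # same set, different order
edited = week1[:1] + [TrackedPrompt("reworded prompt", "Zero Trust")]

same_library = library_fingerprint(week1) == library_fingerprint(week2)
changed_library = library_fingerprint(week1) != library_fingerprint(edited)
```

If the fingerprint changes between weeks, the prompt set was edited and week-over-week comparisons should be treated with caution.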

How the tools work together

Workflow stage | Tool | What happens | Human review
Prompt library setup | Otterly.AI | Build fixed buyer-intent prompts across narratives, personas, and competitive themes | Confirm prompts reflect real buyer questions
GEO tracking | Otterly.AI | Track inclusion, mentions, share of voice, position, competitors, and citations | Verify outputs match market, prompt set, and brand report
Evidence capture | Otterly.AI | Capture prompt lists, brand ranking, coverage, top prompts, and citations | Verify date range, country, engines, and screenshots
Data organization | Claude | Organize weekly exports and screenshots into a repeatable structure | Confirm files are complete and labeled
Metric normalization | Claude | Convert raw outputs into a structured GEO tracker | Spot-check rows against Otterly source
Prompt classification | Claude + human | Draft Win, Competitive, Loss, or Whitespace labels | Adjust based on intent and strategic importance
Narrative fidelity | Claude + human | Draft where the brand is accurate, weak, missing, or competitor-led | Validate accuracy and refine interpretation
Citation gap analysis | Claude + human | Identify which sources shape AI answers | Decide which gaps are worth acting on
Weekly reporting | Claude + human | Build the polished weekly report | Review, edit, and approve executive version

Week 1 baseline findings

1. Cloudflare leads the tracked competitive set

Figure: Otterly.AI brand ranking table comparing Cloudflare, Akamai, Fastly, and Imperva. Cloudflare leads with 37 mentions, 62% brand coverage, and 46% share of voice.
Source: Otterly.AI Cloudflare brand report, United States, captured 18 June 2026.
Brand | Mentions | Brand coverage | Share of voice | Average position
Cloudflare | 37 | 62% | 46% | 1.05
Akamai | 21 | 35% | 26% | 2.15
Fastly | 17 | 28% | 21% | 2.17
Imperva | 6 | 10% | 7% | 3.00

The standout signal is Cloudflare's average position of 1.05: when it appears in an answer, it is almost always the first brand named. That matters because placement, not just presence, shapes perceived authority, trust, and buyer recall.
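The share-of-voice column can be sanity-checked from the mention counts alone. The sketch below uses one common definition (a brand's mentions divided by all tracked-brand mentions), which reproduces the reported percentages. Otterly.AI's exact formula is not stated in this write-up, so treat the definition as an assumption. The coverage check additionally assumes a denominator of 60 tracked responses. That figure is inferred from the numbers in the table, not taken from the report.

```python
# Mention counts from the Week 1 brand ranking table.
mentions = {"Cloudflare": 37, "Akamai": 21, "Fastly": 17, "Imperva": 6}

# ASSUMPTION: 60 tracked responses (for example, 15 prompts across several
# engines). This denominator is inferred, not stated in the source report.
ASSUMED_TRACKED_RESPONSES = 60


def share_of_voice(counts: dict[str, int]) -> dict[str, int]:
    """Each brand's mentions as a rounded percentage of all tracked-brand mentions."""
    total = sum(counts.values())
    return {brand: round(100 * n / total) for brand, n in counts.items()}


def coverage(counts: dict[str, int], tracked_responses: int) -> dict[str, int]:
    """Each brand's mentions as a rounded percentage of tracked responses."""
    return {brand: round(100 * n / tracked_responses) for brand, n in counts.items()}


sov = share_of_voice(mentions)
cov = coverage(mentions, ASSUMED_TRACKED_RESPONSES)
print(sov)  # {'Cloudflare': 46, 'Akamai': 26, 'Fastly': 21, 'Imperva': 7}
print(cov)  # {'Cloudflare': 62, 'Akamai': 35, 'Fastly': 28, 'Imperva': 10}
```

Recomputing the headline metrics from raw counts each week is a cheap spot-check that an export has been read correctly.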

2. Coverage is strong but not uniform

Figure: Otterly.AI brand coverage chart comparing Cloudflare, Akamai, Fastly, and Imperva, showing the Week 1 brand coverage baseline across tracked AI engines.
Source: Otterly.AI Cloudflare brand report, United States, captured 18 June 2026.

Cloudflare leads at 62% coverage, ahead of Akamai at 35%, Fastly at 28%, and Imperva at 10%. The value of this chart is not the single data point. It is the trend line it enables across Weeks 2 through 4.

3. Cloudflare owns several high-intent prompt clusters

Figure: Otterly.AI table of top Cloudflare prompts by brand mentions, showing where Cloudflare is most frequently mentioned.
Source: Otterly.AI Cloudflare brand report, United States, captured 18 June 2026.

Strongest prompts cluster around low-latency global app deployment, unmetered DDoS and global CDN, best global CDNs for high-traffic sites, WAFs with DDoS protection, reliable edge compute, and enterprise CDN/WAF/DNS consolidation.

Softer prompts, including Zero Trust, VPN replacement, and origin egress cost reduction, show lower mention counts and represent the clearest opportunity to strengthen narrative and citation-quality assets.

4. Citation analysis: where AI answers pull from

Figure: Otterly.AI domain citations report and category distribution for Cloudflare answers, covering owned, competitor, community, and third-party sources.
Source: Otterly.AI Cloudflare brand report, United States, captured 18 June 2026.

Key Week 1 citation findings:

A brand can have strong owned content and still lose narrative ground if competitor or third-party sources are more frequently retrieved. A strong GEO strategy must know which sources are retrieved, which are trusted, and which shape the answer.
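A first-pass citation mix can be drafted by bucketing cited URLs into source categories. The sketch below is a minimal illustration: the category sets, the `categorize` helper, and the sample URLs are hypothetical and would need to be replaced with the domains from an actual citation export.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical category lists for illustration; extend them per brand.
OWNED = {"cloudflare.com"}
COMPETITOR = {"akamai.com", "fastly.com", "imperva.com"}
COMMUNITY = {"reddit.com", "stackoverflow.com"}


def base_domain(url: str) -> str:
    """Naive last-two-labels domain; wrong for suffixes such as .co.uk."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def categorize(url: str) -> str:
    domain = base_domain(url)
    if domain in OWNED:
        return "owned"
    if domain in COMPETITOR:
        return "competitor"
    if domain in COMMUNITY:
        return "community"
    return "third-party"


def citation_mix(urls: list[str]) -> Counter:
    """Count cited URLs per source category."""
    return Counter(categorize(u) for u in urls)


# Placeholder URLs, not real citations from the report.
sample = [
    "https://developers.cloudflare.com/placeholder",
    "https://blog.cloudflare.com/placeholder",
    "https://www.akamai.com/placeholder",
    "https://www.reddit.com/r/placeholder",
    "https://example.com/placeholder",
]
mix = citation_mix(sample)
```

The resulting counts per category give a quick view of whether owned, competitor, community, or third-party sources dominate a prompt cluster.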

Prompt-level classification

Classification | Meaning | Typical action
Win | Included, accurately framed, competitively positioned | Protect and monitor
Competitive | Appears, but competitors are strong or better framed | Comparison and authority work
Loss | Competitors appear, brand absent or weak | Owned content, schema, documentation, distribution
Whitespace | No brand owns the answer | Thought leadership opportunity

This classification turns AI visibility data into an action map rather than a static dashboard.
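The four labels above can be drafted with a simple rule before a human adjusts them. The sketch below is one possible first-pass rule, not the scoring used in this experiment: the `PromptResult` fields and the `top_n` threshold are assumptions, and it simplifies the table by treating a weak but present brand as Competitive rather than Loss.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptResult:
    brand_position: Optional[int]  # 1 = first brand named; None = brand absent
    competitors_named: int         # number of tracked competitors in the answer
    accurately_framed: bool        # narrative-fidelity judgment from human review


def draft_label(result: PromptResult, top_n: int = 2) -> str:
    """Return a draft label for human review, not a final classification."""
    if result.brand_position is None:
        # Brand absent: Loss if competitors fill the answer, otherwise Whitespace.
        return "Loss" if result.competitors_named > 0 else "Whitespace"
    if result.accurately_framed and result.brand_position <= top_n:
        return "Win"
    return "Competitive"


print(draft_label(PromptResult(1, 2, True)))      # Win
print(draft_label(PromptResult(3, 2, True)))      # Competitive
print(draft_label(PromptResult(None, 3, False)))  # Loss
print(draft_label(PromptResult(None, 0, False)))  # Whitespace
```

Keeping the rule this small makes every draft label easy to audit and override during review.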

The GEO workflow

Each weekly run follows the same eight steps:

1. Build the prompt library from buyer intent, product narratives, personas, and competitive pressure
2. Run prompt tracking across AI engines in Otterly.AI
3. Capture brand visibility metrics: mentions, coverage, share of voice, and position
4. Analyze citations to see which sources shape answers
5. Evaluate narrative fidelity: is the brand described accurately and on-strategy?
6. Classify prompt outcomes as Win, Competitive, Loss, or Whitespace
7. Translate gaps into actions across content, technical, documentation, distribution, and authority
8. Track movement over time to flag improvement, decline, and emerging competitors

The goal is not to admire the data. The goal is to improve the next run.
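The final step, tracking movement, can be reduced to a small comparison between two weekly values. The sketch below is a hypothetical helper: the `movement` function and its tolerance values are assumptions, and the Week 2 numbers in the usage lines are placeholders, not results.

```python
def movement(previous: float, current: float,
             tolerance: float = 1.0, higher_is_better: bool = True) -> str:
    """Label a week-over-week change as gaining, holding, or losing."""
    delta = current - previous
    if not higher_is_better:
        delta = -delta  # for average position, a lower number is better
    if delta > tolerance:
        return "gaining"
    if delta < -tolerance:
        return "losing"
    return "holding"


# Week 1 values come from the baseline; Week 2 values are placeholders.
coverage_trend = movement(previous=62, current=65)
flat_trend = movement(previous=46, current=46.5)
position_trend = movement(previous=1.05, current=1.40,
                          tolerance=0.1, higher_is_better=False)
print(coverage_trend, flat_trend, position_trend)  # gaining holding losing
```

The tolerance keeps small week-to-week noise from being reported as a real gain or loss. The right threshold is a judgment call for each metric.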

The Claude automation layer

Week 1 was run hands-on to prove the measurement model. The next step is to automate the repeatable parts with Claude, without handing over strategy.

In a production GEO program, the same work recurs every week: maintaining the prompt library, capturing and organizing exports, normalizing metrics, classifying outcomes, reviewing citations, summarizing narrative gaps, flagging competitive change, drafting actions, and producing an executive-readable report.

Claude supports the workflow across five modules:

Module | What Claude helps produce | Human review
Prompt library manager | A consistent prompt set, grouped by theme, persona, intent, and narrative | Confirm prompts reflect strategic priorities
Data intake assistant | A clean weekly folder of exports, screenshots, notes, and reporting files | Verify date range, country, engines, and sources
Metric normalizer | A structured GEO tracker with prompt, brand, competitor, mention, position, citation, and status fields | Spot-check rows against the source
Insight engine | First-pass classifications, narrative gaps, citation gaps, and recommended actions | Review judgment calls, adjust for business context
Reporting layer | A polished weekly report with charts, insights, competitor movement, and priorities | Edit and approve the executive version
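As an illustration of the metric normalizer row, the sketch below turns a raw CSV export into structured tracker rows. The column names in `RAW`, the `TrackerRow` fields, and the sample rows are assumptions made for this example. They do not reflect Otterly.AI's actual export format.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackerRow:
    week: int
    prompt: str
    brand: str
    is_competitor: bool
    mentioned: bool
    position: Optional[int]       # None when the brand is not mentioned
    citations: int
    status: str = "Unclassified"  # filled in later during classification


# Hypothetical export with placeholder rows; real column names may differ.
RAW = """Prompt,Brand,Mentioned,Position,Citations
placeholder prompt A,Cloudflare,yes,1,3
placeholder prompt A,Akamai,Yes ,2,
placeholder prompt B,Cloudflare,no,,0
"""


def normalize(raw_csv: str, week: int, tracked_brand: str = "Cloudflare") -> list[TrackerRow]:
    """Clean whitespace, casing, and blank cells into typed tracker rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        brand = rec["Brand"].strip()
        position = rec["Position"].strip()
        citations = rec["Citations"].strip()
        rows.append(TrackerRow(
            week=week,
            prompt=rec["Prompt"].strip(),
            brand=brand,
            is_competitor=brand != tracked_brand,
            mentioned=rec["Mentioned"].strip().lower() == "yes",
            position=int(position) if position else None,
            citations=int(citations) if citations else 0,
        ))
    return rows


tracker = normalize(RAW, week=1)
```

Typed rows like these are what make the human spot-check practical: each field can be compared directly against the source export.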

The strategist should not spend time copying data, sorting screenshots, and rebuilding the same report every week. That time is better spent deciding what the data means and what should happen next.

The weekly executive report

The intended output is a clean weekly GEO report, shareable with executives, marketing leaders, product marketing, content, demand generation, web, and technical SEO stakeholders.

The report is built around six parts:

Executive summary
KPI snapshot, including coverage, share of voice, mentions, average position, prompt wins/risks, citation mix, and week-over-week movement
Prompt-level findings
Narrative fidelity review
Citation and authority review
Prioritized action plan

The report is designed to answer the three questions every executive asks: what changed, what does it mean, and what should we do next?

Sample weekly report output

GEO visibility snapshot

Metric | Week 1 result | What it means
Brand coverage | 62% | Cloudflare appears across the majority of tracked prompts
Generative share of voice | 46% | Cloudflare leads the tracked competitive set
Brand mentions | 37 | Most-mentioned brand across the prompt library
Average position | 1.05 | When Cloudflare appears, it is usually ranked first
Closest competitor | Akamai | Trails Cloudflare but remains visible across core prompts
Highest-pressure competitor | Fastly | Frequent in CDN, edge, and performance answer sets

Executive readout

Cloudflare leads the tracked category in Week 1 with the strongest coverage, highest share of voice, and best average position. The main opportunity is not inclusion. Cloudflare is already visible. The opportunity is narrative depth in softer clusters: Zero Trust, VPN replacement, and origin egress cost reduction. These clusters show room to strengthen supporting content and citation-quality assets.

Recommended next actions

Priority | Action | Why it matters
High | Review Zero Trust and VPN replacement answer text for narrative fidelity | These prompts show softer visibility and may need clearer positioning
High | Map weaker prompts to existing Cloudflare pages and documentation | Identifies whether the gap is content, structure, citation, or messaging
Medium | Strengthen comparison and use-case content around competitor-heavy prompts | Improves retrieval when Akamai or Fastly are strongly present
Medium | Review citation sources by prompt cluster | Shows which sources shape each answer
Medium | Build an executive trend view for Weeks 2 through 4 | Turns the baseline into a measurable operating cadence

The report turns GEO from a specialist audit into a cross-functional operating rhythm: the web team acts on structure and schema, product marketing on narrative and comparison gaps, content on prompt clusters and documentation, demand generation on distribution, and PR on third-party source gaps.

4-week roadmap

Why this matters

AI search visibility is becoming a measurable growth channel, but the work requires more than checking whether a brand appears in ChatGPT, Perplexity, Gemini, or Google AI Overviews.

It requires a system: a prompt library, competitive benchmarks, inclusion tracking, generative share of voice, narrative fidelity review, citation analysis, technical and content diagnosis, third-party authority strategy, a testing cadence, a feedback loop from findings to action, an automation layer that makes it repeatable, and a reporting layer that makes it usable.

This Proof Lab demonstrates that system in motion. Cloudflare is the test brand, but the framework applies to any technical B2B company that needs to understand how AI systems retrieve, describe, cite, and rank it against competitors.

The deliverable is not a one-time AI visibility report. It is a repeatable GEO measurement workflow connecting prompts, sources, narratives, competitors, automation, reporting, and action.

It helps a team stop asking "Are we showing up in AI answers?" and start asking better questions:

Follow the experiment

This is Week 1 of a 4-week public build. New findings, narrative fidelity scoring, and the final repeatable framework will be published over the next three weeks.

Related Proof Lab work

This experiment connects to other Proof Lab assets that support AI search, AEO, GEO, citation strategy, and executive reporting.

FAQ

What is a GEO measurement system?

A system that tracks how a brand appears in AI-generated answers across priority prompts, competitors, citations, and narratives. It measures inclusion, share of voice, average position, narrative fidelity, and source quality, then turns findings into actions.

How does Otterly.AI support GEO tracking?

It monitors brand mentions, prompt visibility, competitor presence, share of voice, average position, and domain citations across AI search surfaces.

How does Claude support GEO workflow automation?

It helps organize exports, normalize metrics, classify prompts, draft insight summaries, identify narrative and citation gaps, and prepare weekly executive-ready reporting.

Why does narrative fidelity matter in GEO?

Because inclusion alone can mislead. A brand can appear in an answer yet be described generically, incompletely, or in a way that favors a competitor's framing.

Can this workflow be reused for other brands?

Yes. The prompt library, tracker, scoring, and reporting structure are brand-agnostic and can be applied to any technical B2B company or market.