Using Otterly.AI for GEO tracking and Claude for repeatable workflow automation
46% generative share of voice · 1.05 average brand position · 62% brand coverage across 15 tracked buyer-intent prompts.
Cloudflare is the example. The system is the deliverable.
Ranking is no longer the full visibility story.
In the retrieval era, brands are discovered, interpreted, summarized, compared, and shortlisted by AI systems before a buyer ever reaches a website. It is no longer enough to rank in an index. A brand also needs to be retrieved, accurately described, cited from trusted sources, and positioned clearly against competitors inside AI-generated answers.
This Proof Lab shows how I built a repeatable Generative Engine Optimization (GEO) measurement system using Cloudflare as the test brand. The experiment uses Otterly.AI as the GEO tracking layer and Claude as the workflow automation layer, with human review turning the outputs into strategy and an executive-ready weekly report.
The goal is not a one-time AI visibility snapshot. The goal is a practical GEO operating model that can be run every week: measure the answers, analyze the sources, identify the narrative gaps, track competitive movement, and turn the findings into prioritized action.
This Proof Lab is for B2B SaaS marketing leaders, SEO leads, GEO practitioners, product marketers, demand generation teams, technical content teams, and growth leaders building AI search visibility programs.
It is especially relevant for technical B2B companies competing in complex categories where buyers ask answer engines for recommendations, comparisons, definitions, vendor shortlists, and category guidance before they ever visit a website.
Primary prompt: How do you build a repeatable GEO measurement system for AI search visibility?
Supporting prompts:
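The system runs on a simple division of labor: Otterly.AI handles GEO tracking, Claude handles workflow automation, and human review turns the outputs into strategy.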
The workflow is designed to automate the repetitive parts of GEO measurement without automating away strategic interpretation.
| Measurement area | What it answers |
|---|---|
| Inclusion rate | Is the brand appearing in AI answers for priority prompts? |
| Generative share of voice | How visible is the brand compared with competitors? |
| Average brand position | When the brand appears, where does it show up? |
| Narrative fidelity | Is the AI describing the brand accurately and consistently? |
| Citation quality | Which owned, earned, competitor, community, and media sources shape the answer? |
| Prompt-level gaps | Which prompts are wins, competitive risks, losses, or whitespace? |
| Recommended actions | What should the brand improve, publish, clarify, or distribute next? |
| Weekly movement | Is the brand gaining, holding, or losing ground over time? |
Cloudflare competes across multiple technical narratives at once: CDN, DDoS protection, web application firewall, edge compute, Zero Trust, VPN replacement, developer infrastructure, and integrated security platform.
That makes it ideal for testing how AI systems handle overlapping product categories, technical terminology, competitor comparisons, and citation sources.
The goal was not to evaluate Cloudflare as a company. The goal was to use a visible, technically complex brand to stress-test a workflow that applies to any B2B company competing for AI search visibility.
For Week 1, I created a fixed library of 15 buyer-intent prompts in Otterly.AI, scoped to the United States, spanning Cloudflare's major narratives: global CDN, DDoS protection, WAF, edge compute, Zero Trust, VPN replacement, origin egress cost reduction, and integrated CDN/WAF/DNS platform positioning.
Each weekly run uses the same prompt set, tracking structure, scoring approach, and interpretation process so movement can be monitored over time.
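One way to keep the prompt set fixed from week to week is to fingerprint the library and compare the digest before each run. The sketch below is an illustration only: the `Prompt` fields, the `library_fingerprint` helper, and the placeholder entries are assumptions, not the actual prompts or any Otterly.AI feature.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Prompt:
    """One tracked buyer-intent prompt (field names are hypothetical)."""
    prompt_id: str
    text: str
    narrative: str
    country: str = "US"


def library_fingerprint(prompts):
    """Return a SHA-256 digest that changes if any prompt is edited, added, or removed."""
    # Sort by id so that reordering the list does not change the digest.
    rows = sorted((asdict(p) for p in prompts), key=lambda r: r["prompt_id"])
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder entries only; the real library holds 15 prompts.
LIBRARY = [
    Prompt("p01", "<prompt text for the global CDN narrative>", "global CDN"),
    Prompt("p02", "<prompt text for the Zero Trust narrative>", "Zero Trust"),
]

if __name__ == "__main__":
    print(library_fingerprint(LIBRARY))
```

Storing the Week 1 digest alongside the report would make it possible to confirm that later weeks measured the same prompts before any movement is compared.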
| Workflow stage | Tool | What happens | Human review |
|---|---|---|---|
| Prompt library setup | Otterly.AI | Build fixed buyer-intent prompts across narratives, personas, and competitive themes | Confirm prompts reflect real buyer questions |
| GEO tracking | Otterly.AI | Track inclusion, mentions, share of voice, position, competitors, and citations | Verify outputs match market, prompt set, and brand report |
| Evidence capture | Otterly.AI | Capture prompt lists, brand ranking, coverage, top prompts, and citations | Verify date range, country, engines, and screenshots |
| Data organization | Claude | Organize weekly exports and screenshots into a repeatable structure | Confirm files are complete and labeled |
| Metric normalization | Claude | Convert raw outputs into a structured GEO tracker | Spot-check rows against Otterly source |
| Prompt classification | Claude + human | Draft Win, Competitive, Loss, or Whitespace labels | Adjust based on intent and strategic importance |
| Narrative fidelity | Claude + human | Draft where the brand is accurate, weak, missing, or competitor-led | Validate accuracy and refine interpretation |
| Citation gap analysis | Claude + human | Identify which sources shape AI answers | Decide which gaps are worth acting on |
| Weekly reporting | Claude + human | Build the polished weekly report | Review, edit, and approve executive version |
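The data organization stage in the table above could be scaffolded with a small script. This is a minimal sketch under assumptions: the folder names and the `scaffold_week` and `missing_evidence` helpers are hypothetical and only mirror the kinds of files the workflow mentions (exports, screenshots, notes, and report files).

```python
from pathlib import Path

# Hypothetical subfolder names, one per kind of weekly evidence.
SUBFOLDERS = ("exports", "screenshots", "notes", "report")


def scaffold_week(root, week):
    """Create the folder tree for one weekly run and return its path."""
    week_dir = Path(root) / f"week-{week:02d}"
    for name in SUBFOLDERS:
        (week_dir / name).mkdir(parents=True, exist_ok=True)
    return week_dir


def missing_evidence(week_dir):
    """List subfolders that are still empty, as a completeness check for human review."""
    return [
        name for name in SUBFOLDERS
        if not any((Path(week_dir) / name).iterdir())
    ]
```

Running `missing_evidence` before analysis starts gives the reviewer a quick list of what has not been captured yet.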
| Brand | Mentions | Brand coverage | Share of voice | Average position |
|---|---|---|---|---|
| Cloudflare | 37 | 62% | 46% | 1.05 |
| Akamai | 21 | 35% | 26% | 2.15 |
| Fastly | 17 | 28% | 21% | 2.17 |
| Imperva | 6 | 10% | 7% | 3.00 |
The standout signal is Cloudflare's average position of 1.05. When it appears, it is almost always named first. Placement, not just presence, drives perceived authority, trust, and buyer recall.
Cloudflare leads at 62% coverage, ahead of Akamai at 35%, Fastly at 28%, and Imperva at 10%. The value of this chart is not the single data point. It is the trend line it enables across Weeks 2 through 4.
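The Week 1 table can be sanity-checked with simple arithmetic. In the sketch below, share of voice is each brand's mentions divided by total mentions across the four tracked brands (81), which reproduces the reported percentages. The coverage figures also match mentions divided by 60, but that denominator is an inference from the numbers rather than something the report states, so it is labeled as an assumption.

```python
# Mention counts from the Week 1 table.
MENTIONS = {"Cloudflare": 37, "Akamai": 21, "Fastly": 17, "Imperva": 6}

# ASSUMPTION: 60 tracked responses. This value is inferred because it
# reproduces the reported coverage figures; the report does not state it.
ASSUMED_TRACKED_RESPONSES = 60


def share_of_voice(mentions):
    """Each brand's mentions as a whole-number percentage of all tracked mentions."""
    total = sum(mentions.values())
    return {brand: round(100 * n / total) for brand, n in mentions.items()}


def coverage(mentions, tracked_responses):
    """Mentions as a whole-number percentage of tracked responses (assumed definition)."""
    return {brand: round(100 * n / tracked_responses) for brand, n in mentions.items()}


if __name__ == "__main__":
    print(share_of_voice(MENTIONS))
    print(coverage(MENTIONS, ASSUMED_TRACKED_RESPONSES))
```

A check like this is useful during the spot-check step: if the recomputed values drift from the exported ones, the export, date range, or competitor set has probably changed.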
Strongest prompts cluster around low-latency global app deployment, unmetered DDoS and global CDN, best global CDNs for high-traffic sites, WAFs with DDoS protection, reliable edge compute, and enterprise CDN/WAF/DNS consolidation.
Softer prompts, including Zero Trust, VPN replacement, and origin egress cost reduction, show lower mention counts and represent the clearest opportunity to strengthen narrative and citation-quality assets.
Key Week 1 citation findings:
A brand can have strong owned content and still lose narrative ground if competitor or third-party sources are more frequently retrieved. A strong GEO strategy depends on knowing which sources are retrieved, which are trusted, and which shape the answer.
| Classification | Meaning | Typical action |
|---|---|---|
| Win | Included, accurately framed, competitively positioned | Protect and monitor |
| Competitive | Appears, but competitors are strong or better framed | Comparison and authority work |
| Loss | Competitors appear, brand absent or weak | Owned content, schema, documentation, distribution |
| Whitespace | No brand owns the answer | Thought leadership opportunity |
This classification turns AI visibility data into an action map rather than a static dashboard.
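A first-pass version of the four labels could be drafted by rule before human review. The sketch below is one possible reading of the table, not the scoring used in this experiment: the function name, its inputs, and the `max_win_position` threshold are all assumptions, and "weak" brand presence is deliberately left to the reviewer.

```python
from enum import Enum


class Status(Enum):
    WIN = "Win"
    COMPETITIVE = "Competitive"
    LOSS = "Loss"
    WHITESPACE = "Whitespace"


def draft_status(brand_mentions, competitor_mentions, brand_position=None,
                 accurately_framed=True, max_win_position=1.5):
    """Draft a label for one prompt; a human adjusts it for intent and importance.

    competitor_mentions is the strongest competitor's mention count.
    max_win_position is a hypothetical cutoff, not a documented threshold.
    """
    if brand_mentions == 0:
        # Brand absent: Loss if a competitor shows up, Whitespace if nobody does.
        return Status.LOSS if competitor_mentions > 0 else Status.WHITESPACE
    leads = brand_mentions >= competitor_mentions
    well_placed = brand_position is not None and brand_position <= max_win_position
    if leads and well_placed and accurately_framed:
        return Status.WIN
    # Present, but out-mentioned, poorly placed, or inaccurately framed.
    return Status.COMPETITIVE
```

For example, `draft_status(0, 3)` drafts `Status.LOSS`, while `draft_status(4, 2, brand_position=1.0)` drafts `Status.WIN`. The `accurately_framed` flag comes from the narrative fidelity review, so it stays a human input.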
Each weekly run follows the same eight steps:
The goal is not to admire the data. The goal is to improve the next run.
Week 1 was run hands-on to prove the measurement model. The next step is to automate the repeatable parts with Claude, without handing over strategy.
In a production GEO program, the same work recurs every week: maintaining the prompt library, capturing and organizing exports, normalizing metrics, classifying outcomes, reviewing citations, summarizing narrative gaps, flagging competitive change, drafting actions, and producing an executive-readable report.
Claude supports the workflow across five modules:
| Module | What Claude helps produce | Human review |
|---|---|---|
| Prompt library manager | A consistent prompt set, grouped by theme, persona, intent, and narrative | Confirm prompts reflect strategic priorities |
| Data intake assistant | A clean weekly folder of exports, screenshots, notes, and reporting files | Verify date range, country, engines, and sources |
| Metric normalizer | A structured GEO tracker with prompt, brand, competitor, mention, position, citation, and status fields | Spot-check rows against the source |
| Insight engine | First-pass classifications, narrative gaps, citation gaps, and recommended actions | Review judgment calls, adjust for business context |
| Reporting layer | A polished weekly report with charts, insights, competitor movement, and priorities | Edit and approve the executive version |
The strategist should not spend time copying data, sorting screenshots, and rebuilding the same report every week. That time is better spent deciding what the data means and what should happen next.
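The metric normalizer module could look roughly like the following. This is a sketch under assumptions: the `TrackerRow` fields loosely follow the tracker fields listed in the table, and the raw column names (`Prompt`, `Brand`, `Mentions`, `Avg Position`, `Citations`) are hypothetical and not taken from an actual Otterly.AI export.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class TrackerRow:
    """One normalized row of the weekly GEO tracker (hypothetical schema)."""
    week: int
    prompt: str
    brand: str
    mentions: int
    position: Optional[float]
    citations: int
    status: str = ""  # filled in later by classification and human review


def normalize(raw, week):
    """Convert one raw export row (a dict of strings) into a typed TrackerRow."""
    position = (raw.get("Avg Position") or "").strip()
    return TrackerRow(
        week=week,
        prompt=raw["Prompt"].strip(),
        brand=raw["Brand"].strip(),
        mentions=int(raw.get("Mentions") or 0),
        # A missing position means the brand did not appear for this prompt.
        position=float(position) if position else None,
        citations=int(raw.get("Citations") or 0),
    )


def to_csv(rows):
    """Serialize tracker rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(TrackerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

Keeping the column order fixed means each week's file can be appended to the previous ones, which makes the human spot-check against the source a row-by-row comparison.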
The intended output is a clean weekly GEO report, shareable with executives, marketing leaders, product marketing, content, demand generation, web, and technical SEO stakeholders.
The report is built around six parts:
The report is designed to answer the three questions every executive asks: what changed, what does it mean, and what should we do next?
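The "what changed" part can be drafted mechanically once a second week exists. The sketch below compares two weekly snapshots and labels each metric as gaining, holding, or losing. The Week 1 values are the reported ones; the Week 2 values, the tolerances, and the `movement` helper are hypothetical and exist only to show the calculation.

```python
# Week 1 values as reported.
WEEK_1 = {"coverage_pct": 62.0, "share_of_voice_pct": 46.0, "avg_position": 1.05}

# HYPOTHETICAL Week 2 values for illustration only; these are not measured results.
WEEK_2_EXAMPLE = {"coverage_pct": 65.0, "share_of_voice_pct": 46.0, "avg_position": 1.20}

# Assumed noise thresholds: changes at or below these count as "holding".
TOLERANCE = {"coverage_pct": 1.0, "share_of_voice_pct": 1.0, "avg_position": 0.05}

# For position, a smaller number is better.
LOWER_IS_BETTER = {"avg_position"}


def movement(previous, current):
    """Return {metric: (delta, label)} comparing two weekly snapshots."""
    result = {}
    for metric, before in previous.items():
        delta = round(current[metric] - before, 2)
        improvement = -delta if metric in LOWER_IS_BETTER else delta
        if abs(delta) <= TOLERANCE[metric]:
            label = "holding"
        elif improvement > 0:
            label = "gaining"
        else:
            label = "losing"
        result[metric] = (delta, label)
    return result


if __name__ == "__main__":
    for metric, (delta, label) in movement(WEEK_1, WEEK_2_EXAMPLE).items():
        print(f"{metric}: {delta:+} ({label})")
```

The labels only describe direction. What the change means and what to do next remain interpretation steps for the reviewer.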
| Metric | Week 1 result | What it means |
|---|---|---|
| Brand coverage | 62% | Cloudflare appears across the majority of tracked prompts |
| Generative share of voice | 46% | Cloudflare leads the tracked competitive set |
| Brand mentions | 37 | Most-mentioned brand across the prompt library |
| Average position | 1.05 | When Cloudflare appears, it is usually ranked first |
| Closest competitor | Akamai | Trails Cloudflare but remains visible across core prompts |
| Highest-pressure competitor | Fastly | Frequent in CDN, edge, and performance answer sets |
Cloudflare leads the tracked category in Week 1 with the strongest coverage, highest share of voice, and best average position. The main opportunity is not inclusion. Cloudflare is already visible. The opportunity is narrative depth in softer clusters: Zero Trust, VPN replacement, and origin egress cost reduction. These clusters show room to strengthen supporting content and citation-quality assets.
| Priority | Action | Why it matters |
|---|---|---|
| High | Review Zero Trust and VPN replacement answer text for narrative fidelity | These prompts show softer visibility and may need clearer positioning |
| High | Map weaker prompts to existing Cloudflare pages and documentation | Identifies whether the gap is content, structure, citation, or messaging |
| Medium | Strengthen comparison and use-case content around competitor-heavy prompts | Improves retrieval when Akamai or Fastly is strongly present |
| Medium | Review citation sources by prompt cluster | Shows which sources shape each answer |
| Medium | Build an executive trend view for Weeks 2 through 4 | Turns the baseline into a measurable operating cadence |
The report turns GEO from a specialist audit into a cross-functional operating rhythm: the web team acts on structure and schema, product marketing on narrative and comparison gaps, content on prompt clusters and documentation, demand generation on distribution, and PR on third-party source gaps.
AI search visibility is becoming a measurable growth channel, but the work requires more than checking whether a brand appears in ChatGPT, Perplexity, Gemini, or Google AI Overviews.
It requires a system: a prompt library, competitive benchmarks, inclusion tracking, generative share of voice, narrative fidelity review, citation analysis, technical and content diagnosis, third-party authority strategy, a testing cadence, a feedback loop from findings to action, an automation layer that makes it repeatable, and a reporting layer that makes it usable.
This Proof Lab demonstrates that system in motion. Cloudflare is the test brand, but the framework applies to any technical B2B company that needs to understand how AI systems retrieve, describe, cite, and rank it against competitors.
The deliverable is not a one-time AI visibility report. It is a repeatable GEO measurement workflow connecting prompts, sources, narratives, competitors, automation, reporting, and action.
It helps a team stop asking "Are we showing up in AI answers?" and start asking better questions:
This is Week 1 of a 4-week public build. New findings, narrative fidelity scoring, and the final repeatable framework will be published over the next three weeks.
This experiment connects to other Proof Lab assets that support AI search, AEO, GEO, citation strategy, and executive reporting.
A GEO measurement system tracks how a brand appears in AI-generated answers across priority prompts, competitors, citations, and narratives. It measures inclusion, share of voice, average position, narrative fidelity, and source quality, then turns findings into actions.
Otterly.AI monitors brand mentions, prompt visibility, competitor presence, share of voice, average position, and domain citations across AI search surfaces.
Claude helps organize exports, normalize metrics, classify prompts, draft insight summaries, identify narrative and citation gaps, and prepare weekly executive-ready reporting.
Narrative fidelity matters because inclusion alone can mislead. A brand can appear in an answer yet be described generically, incompletely, or in a way that favors a competitor's framing.
The framework works beyond Cloudflare. The prompt library, tracker, scoring, and reporting structure are brand-agnostic and can be applied to any technical B2B company or market.