The PROOF scoring guide

PROOF scores your page across nine dimensions, totalling 100 points. Each section below explains what we check, why it matters, and how to fix it. Use this as your reference when working through the recommendations PROOF generates.

01

Technical SEO

The plumbing of your page. Search engines and AI crawlers need clean, well-structured HTML signals to understand what the page is and who it serves.

Browser title length

What: The title tag in your page head, shown in browser tabs and search results.
Why: Google truncates titles over 60 characters. Too short and you miss ranking opportunities. Too long and your full pitch never shows.
Fix: Aim for 50-60 characters. Lead with your keyword, end with your brand.

Meta description length

What: The meta description tag, often used as the snippet under your link in search results.
Why: A compelling 150-160 character description boosts click-through rate, which Google uses as a ranking signal.
Fix: Write 150-160 characters with the value proposition, the keyword, and a verb that drives action.
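Taken together, the two checks above map to a head block like this (the title and description text are placeholder examples, not recommended copy):

```html
<head>
  <!-- Target 50-60 characters: keyword first, brand last -->
  <title>Keyword Phrase Guide for 2025 | YourBrand</title>
  <!-- Target 150-160 characters: value proposition, keyword, action verb -->
  <meta name="description" content="Learn how keyword phrase works, why it matters, and how to apply it today. Start improving your pages with this step-by-step guide.">
</head>
```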

Single H1 tag

What: The main heading at the top of the page.
Why: Multiple H1s confuse both search engines and screen readers about what the page is primarily about.
Fix: Use exactly one H1 per page. Demote any others to H2.

URL length and structure

What: The full web address of the page.
Why: Short, descriptive URLs are easier to share, look more trustworthy, and rank slightly better.
Fix: Keep under 75 characters. Use hyphens between words. Include your keyword.

Mobile viewport tag

What: The viewport meta tag that tells mobile browsers how to scale your page.
Why: Without it, mobile users see a tiny desktop view. Google penalises non-mobile-friendly pages.
Fix: Add <meta name="viewport" content="width=device-width, initial-scale=1"> inside your <head>.

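A minimal head showing where the viewport tag sits (the charset line is included only for completeness):

```html
<head>
  <meta charset="utf-8">
  <!-- Required for correct mobile scaling -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
</head>
```
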
02

On-page SEO

How well the page signals what it is about for a specific search intent. Where your target keyword shows up, and how naturally.

Keyword in browser title

What: Whether your target keyword appears in the page title tag.
Why: Title is the single strongest on-page ranking signal. If your keyword is not here, you are fighting uphill.
Fix: Add the keyword near the start of your title.

Keyword in H1

What: Whether your target keyword appears in the main page heading.
Why: The H1 reinforces topical relevance to both users and crawlers when it matches the title intent.
Fix: Include the keyword in your H1 phrasing, naturally.

Keyword in meta description

What: Whether your keyword appears in the meta description tag.
Why: When users see their search term highlighted in the snippet, click-through rates jump.
Fix: Mention the keyword once in your description, naturally, near the start.

Keyword in URL slug

What: Whether the keyword appears in the page URL.
Why: Search engines use URL words as a relevance signal. Users also trust descriptive URLs more.
Fix: Use a slug like /your-keyword-phrase rather than /post-12345.

Keyword in opening 100 words

What: Whether the keyword appears in the first 100 words.
Why: Search engines weight the opening of the page heavily. LLMs disproportionately quote opening paragraphs.
Fix: Mention your keyword naturally in the intro paragraph.

Keyword density

What: How often your keyword appears relative to total word count.
Why: Too low and the page looks irrelevant. Too high and it looks like keyword stuffing.
Fix: Aim for 0.5 to 3 percent. For 600 words, that is 3 to 18 mentions.

03

Topical coverage

Whether your page demonstrates topical authority by using multiple related phrases across different page zones, not just repeating one keyword.

Phrases distributed across page zones

What: How many distinct phrases related to your target keyword appear across multiple page zones (title, H1, H2, meta description, body).
Why: Pages that score highly with both Google and AI engines use 3 to 5 related phrases that appear in multiple zones, not piled into the body alone. Distribution across zones signals genuine topical depth, not keyword stuffing.
Fix: Identify 5 to 7 variations of your target keyword. Make sure each variation appears in at least 2 places, for example title plus body, or H2 plus meta description.

Total breadth of related phrases

What: Total count of distinct related phrases that appear at least twice anywhere in your content.
Why: AI engines look for vocabulary breadth around a topic. A page mentioning only your exact keyword 20 times looks thin compared to one that uses 7 related phrases naturally.
Fix: Brainstorm related phrases: synonyms, sub-topics, common questions about the keyword, related products or services. Weave 5 to 7 of these naturally into your content.

04

Citation worthiness

How likely your page is to be cited by ChatGPT, Claude, Gemini, and Perplexity. AI engines cite complex, nuanced content 77 percent of the time versus only 23 percent for simple content.

Multi-word question depth

What: How many headings on your page ask substantive, multi-word questions (5 or more words).
Why: AI engines cite content that answers specific, nuanced questions. A heading like "What is SEO?" is too generic. "What is the difference between SEO and AEO?" is the kind of question AI engines quote.
Fix: Rewrite at least 3 H2 headings as multi-word questions your readers actually ask. Aim for 5 or more words per question.

Sentence depth and nuance

What: What percentage of your sentences are long (15+ words) or contain multi-clause structures (a comma followed by connectives such as because, while, or although).
Why: Short, simple sentences read as surface-level content. AI engines prefer pages that demonstrate analytical depth through layered sentence structures, which signals genuine expertise.
Fix: Mix sentence lengths intentionally. Keep some short and punchy. Use longer multi-clause sentences when you need to explain causation, comparison, or qualification.

Concrete data and specifics

What: How many specific numbers, dates, statistics, and proper nouns appear in your content.
Why: AI engines cite content rich in concrete data because there is something specific to extract. Vague content with no numbers or named examples rarely gets cited.
Fix: Add specific numbers (percentages, dollar amounts, counts), year references, and named examples (companies, people, places) wherever you can do so honestly.

05

Content quality

Whether the page is substantial enough to rank and structured cleanly enough for both humans and AI to extract from.

Word count

What: Total words in the body content.
Why: Thin pages rarely rank. Most ranking pages for competitive terms are 800+ words. LLMs need substance to extract anything quotable.
Fix: Expand anything below 600 words. For competitive terms, 1500+ words tends to perform best.

Average sentence length

What: Mean words per sentence.
Why: Long sentences are harder for humans to parse and harder for LLMs to chunk into citation-quality snippets.
Fix: Keep the average under 25 words. Break up anything over 30.

Heading hierarchy

What: How many H2 sub-headings structure your content.
Why: Pages with 2 to 5 H2s scan better, rank better, and give LLMs natural extraction points.
Fix: Add 2-3 H2 headings to any article over 500 words.

Image alt text coverage

What: Percentage of images that have descriptive alt text attributes.
Why: Alt text is required for accessibility, helps images rank in image search, and gives LLMs context.
Fix: Add descriptive alt text to every image. Avoid generic filenames like "image1.jpg" or empty alt attributes.

Visual content present

What: Whether the page has at least one image.
Why: Pages with visuals get more dwell time and shares, which improves rankings indirectly.
Fix: Add at least one relevant image, infographic, or diagram.

06

Structured data

Schema.org JSON-LD markup tells search engines and AI crawlers exactly what kind of thing your page is, who created it, and what facts it contains.

Schema.org JSON-LD present

What: Whether the page has any JSON-LD structured data.
Why: JSON-LD is the format Google, Bing, and AI crawlers prefer. Without it, machines have to guess at your page's meaning.
Fix: Add a <script type="application/ld+json"> block in your <head> with at least Article or WebPage schema.

Article or WebPage schema

What: A specific declaration that this is an article or content page.
Why: Tells crawlers how to interpret the rest of your data. A prerequisite for richer schemas like FAQ.
Fix: Add "@type": "Article" (or "WebPage") with headline, datePublished, and author fields.
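A minimal sketch of such a block (all values are placeholders; place it inside a <script type="application/ld+json"> tag in the <head>):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article Headline",
  "datePublished": "2025-01-15",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  }
}
```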

Organization or Author schema

What: Schema that declares who published the content.
Why: Establishes E-E-A-T (Experience, Expertise, Authoritativeness, Trust). LLMs cite content from named, credible sources more often.
Fix: Add "@type": "Organization" or "Person" with name, url, and optional logo and sameAs links.
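An Organization sketch along those lines (names and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://x.com/exampleco"
  ]
}
```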

FAQ or QAPage schema

What: Structured Q&A data attached to the page.
Why: The highest-impact AEO move available. LLMs preferentially extract from FAQ-shaped content.
Fix: Add 3-5 FAQ entries via FAQPage schema, each with a question and a clear answer.

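A single-entry FAQPage sketch showing the required shape (the question and answer text are placeholders; real pages should carry 3-5 entries):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does the PROOF score measure?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "It measures a page across nine dimensions, totalling 100 points."
      }
    }
  ]
}
```
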
07

AEO readiness

Answer Engine Optimization. How well the page is shaped to be quoted, cited, or summarised by ChatGPT, Claude, Gemini, and Perplexity.

Page answers a clear question

What: Whether the page has at least one explicit question pattern in headings or body.
Why: LLMs are trained to answer questions. Pages that pre-frame the question get extracted more often.
Fix: Add at least one question, in an H2 or in the opening line.

Direct answer in opening

What: Whether the first 40-80 words give a substantive, quotable answer.
Why: LLMs heavily weight opening content. A wandering intro means LLMs grab a bad snippet, or none at all.
Fix: Lead with a direct answer to the page topic. The first paragraph should be self-contained.

Question-style sub-headings

What: Whether your H2s are framed as actual questions.
Why: H2 questions match the way users phrase queries to LLMs, dramatically increasing your retrieval probability.
Fix: Reframe at least 2 H2s as questions: "What is X?" rather than "About X".

Lists and structured content

What: Whether the page uses bulleted or numbered lists.
Why: Lists are pre-chunked content. LLMs preferentially extract from list items because they are self-contained.
Fix: Convert at least one prose paragraph into a 3-5 item bulleted or numbered list.

Citation or source language

What: Whether the page references named external sources, studies, or reports.
Why: LLMs rate well-cited content as more authoritative. Phrases like "according to" signal credibility.
Fix: Reference at least one external study, report, or named source by name.

Author or attribution signal

What: Whether a named author byline is visible.
Why: Anonymous content gets cited less. LLMs use authorship as a credibility filter (E-E-A-T).
Fix: Add a visible byline plus a meta author tag.

Date or freshness signal

What: Whether the page declares when it was published or updated.
Why: LLMs and search engines favour fresh content. Without dates, content looks stale.
Fix: Show the publish/updated date in visible content AND in the article:published_time meta tag.

Definitions of key terms

What: Whether the page explicitly defines what its key terms mean.
Why: LLMs cite definitions because they are extractable as standalone facts.
Fix: Define your key terms with explicit phrases like "X is defined as" or "X refers to".

08

AI Discoverability

Whether the AI crawlers from major LLM providers can actually access your site. Many sites accidentally block them and have no idea.

robots.txt present

What: A robots.txt file at the root of your domain.
Why: Without it, crawler behaviour is undefined. Best practice is to declare your rules explicitly.
Fix: Add a robots.txt at yoursite.com/robots.txt with explicit User-agent rules.

GPTBot allowed

What: OpenAI's crawler for ChatGPT and GPT model training.
Why: Block this and you are invisible to ChatGPT, which has hundreds of millions of weekly users.
Fix: In robots.txt, do NOT include "Disallow: /" under "User-agent: GPTBot".

ClaudeBot allowed

What: Anthropic's crawler for Claude.
Why: Block this and Claude cannot cite your content. Claude has a fast-growing enterprise user base.
Fix: In robots.txt, do NOT block "User-agent: ClaudeBot".

PerplexityBot allowed

What: Perplexity's crawler.
Why: Perplexity is the fastest-growing AI search engine. Blocking it removes a real growth channel.
Fix: In robots.txt, do NOT block "User-agent: PerplexityBot".

Google-Extended allowed

What: Google's opt-in flag for Gemini and AI training.
Why: Block this and Gemini cannot use your content.
Fix: In robots.txt, do NOT block "User-agent: Google-Extended".

CCBot allowed

What: The Common Crawl bot. Many AI training datasets derive from Common Crawl.
Why: Block this and you reduce the chance of appearing in next-generation models.
Fix: In robots.txt, do NOT block "User-agent: CCBot".

Applebot-Extended allowed

What: Apple's crawler for Apple Intelligence.
Why: Apple Intelligence is rolling out across iOS and macOS, so this is a long-term visibility signal.
Fix: In robots.txt, do NOT block "User-agent: Applebot-Extended".

llms.txt declaration

What: A new proposed standard file at yoursite.com/llms.txt that gives AI a curated map of your important content.
Why: An emerging community standard. Helps AI tools navigate your site efficiently.
Fix: Create an llms.txt file with markdown links to your most important pages.

Sitemap declared in robots.txt

What: A Sitemap directive in your robots.txt pointing to your XML sitemap.
Why: Helps both Google and AI crawlers discover all your pages quickly without crawling your entire site.
Fix: Add a line at the bottom of robots.txt: Sitemap: https://yoursite.com/sitemap.xml

Explicit AI crawler rules

What: Whether your robots.txt specifically names AI crawlers (GPTBot, ClaudeBot, etc.) rather than just having no rules.
Why: Explicitly naming the AI crawlers signals intentional permission. AI tools can distinguish between "you forgot to set rules" and "you actively welcome us".
Fix: Add explicit User-agent blocks for GPTBot, ClaudeBot, PerplexityBot, etc., each followed by Allow: / to show you intentionally permit them.
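A robots.txt that satisfies the checks in this section (the sitemap domain is a placeholder):

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```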

llms.txt quality

What: Whether your llms.txt has substantial content beyond just a title: a description, structured links, and context.
Why: A sparse llms.txt is barely better than none. AI engines use llms.txt as a curated map of your site, so empty or thin files give them nothing to work with.
Fix: Include a description of your site, markdown links to your most important pages organised by section, and a brief context line for each link.

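A sketch of an llms.txt along those lines (all names, paths, and descriptions are placeholders):

```markdown
# Example Co

> Example Co helps teams improve how their pages score with search and AI engines.

## Guides

- [Scoring guide](https://example.com/docs/scoring): what each check measures and how to fix it
- [Getting started](https://example.com/docs/start): run your first page scan

## Company

- [About](https://example.com/about): who we are and why we built the tool
```
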
09

Authority signals

External and internal signals that the page is trustworthy and well-connected.

Internal link depth

What: How many links the page has to other pages on your own site.
Why: Internal links spread authority across your site and help crawlers discover related content.
Fix: Link to 3-5 related pages on your site. Use descriptive anchor text.

External citations

What: Links to credible sources outside your domain.
Why: External links to authoritative sources signal that you have done your research.
Fix: Cite at least 2 credible external sources with descriptive anchor text.

Author metadata

What: A meta author tag in the page head.
Why: Reinforces authorship for crawlers that do not parse visible bylines.
Fix: Add <meta name="author" content="Your Name"> to the <head>.

Date metadata

What: article:published_time and article:modified_time meta tags.
Why: Machine-readable date signals that visible dates do not always provide.
Fix: Add meta property tags for article:published_time and article:modified_time with ISO 8601 dates.
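The author and date metadata from this section can be declared together in the head (dates and name are placeholders):

```html
<head>
  <meta name="author" content="Jane Doe">
  <!-- Open Graph article dates, ISO 8601 format -->
  <meta property="article:published_time" content="2025-01-15T09:00:00+00:00">
  <meta property="article:modified_time" content="2025-03-02T14:30:00+00:00">
</head>
```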