Synthetic Personas for Landing Page Optimization: Using LLMs as AI Focus Groups

Tyler Gargula February 12, 2026

Turning LLMs into Focus Groups: What the SSR Paper Means for Product and Marketing Teams

Our team has spent years analyzing how search engines evaluate content. We’ve watched the industry cycle through countless methods for understanding user intent. Focus groups, A/B tests, survey panels — they all share a common problem: they’re expensive, slow, and difficult to scale. A recent paper from PyMC Labs suggests we may be approaching a turning point.

What the PyMC Labs Paper Says

“LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings” (Maier et al., 2025) presents a method called Semantic Similarity Rating (SSR) that achieves roughly 90% of human test-retest reliability when simulating consumer survey responses. Real survey panels are never perfectly consistent with themselves, so there is a ceiling on how closely any method can match them. The 90% figure means that when synthetic consumers rank a set of products, their ranking tracks the real human ranking about 90% as closely as one sample of real respondents tracks another.

The Research Behind It

The researchers tested their approach against 57 different consumer surveys with over 9,300 human responses and found that LLM-generated synthetic consumers could replicate human preference distributions with surprising accuracy.

The methodology was validated across real-world consumer products, including Colgate-Palmolive product surveys, demonstrating that synthetic consumer responses can meaningfully approximate human preferences for actual market offerings—not just hypothetical scenarios.

How SSR Works (and Why it Matters)

The SSR approach is straightforward but powerful. Instead of asking an LLM to rate something on a scale of 1-5, you ask it to respond in natural language as a specific persona would.

To do this, you create reference statements for each point on the scale — what a “1 – Strongly Disagree” response sounds like versus a “5 – Strongly Agree.” You then measure how semantically similar the LLM’s response is to each of your reference statements. A response like “I’d definitely recommend this to colleagues” maps closer to “5 – Strongly Agree” than to “3 – Neutral” based on embedding similarity, not because the model was told to output a number.

This matters because asking an LLM to rate based on a numerical scale produces unreliable, overconfident answers. The SSR approach is more nuanced. Rather than saying “this is definitely a 4,” it might say “this is 60% likely to be a 4 and 30% likely to be a 5.” That uncertainty is more realistic — it better reflects how real people think.
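To make the mapping concrete, here is a minimal Python sketch of the SSR step. It assumes the persona's response and the five reference statements have already been embedded by some sentence-embedding model (the toy three-dimensional vectors below are made up for illustration), and it turns similarities into probabilities with a temperature-scaled softmax, as the tool described later in this post does. The paper's own normalization may differ in detail, and the function names and the `temperature` value are hypothetical.

```python
import math

# A vector produced by some sentence-embedding model (toy values below).
Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores: list[float], temperature: float = 0.05) -> list[float]:
    """Temperature-scaled softmax; a lower temperature gives a sharper distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ssr_distribution(
    response: Vector, references: dict[int, Vector], temperature: float = 0.05
) -> dict[int, float]:
    """Map one embedded response to a probability distribution over Likert points."""
    points = sorted(references)
    sims = [cosine_similarity(response, references[p]) for p in points]
    return dict(zip(points, softmax(sims, temperature)))


def ssr_score(distribution: dict[int, float]) -> float:
    """Expected Likert value under the SSR distribution."""
    return sum(point * prob for point, prob in distribution.items())


# Hypothetical embeddings of the reference statements for points 1..5.
references = {
    1: [1.0, 0.0, 0.0],  # "Strongly disagree"
    2: [0.8, 0.6, 0.0],
    3: [0.0, 1.0, 0.0],  # "Neutral"
    4: [0.0, 0.6, 0.8],
    5: [0.0, 0.0, 1.0],  # "Strongly agree"
}
# Hypothetical embedding of "I'd definitely recommend this to colleagues".
response = [0.0, 0.3, 0.95]

dist = ssr_distribution(response, references)
print({point: round(p, 2) for point, p in dist.items()}, round(ssr_score(dist), 2))
```

With these toy vectors the probability mass splits between 4 and 5 instead of collapsing onto a single number, which is the kind of hedged answer the paragraph above describes.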

What This Means for SEO and Marketing

For those of us in search and content, this research opens a practical question: can we use synthetic consumers to evaluate how well our product pages communicate value? The connection isn’t immediately obvious, but consider the problem we’re trying to solve.

The AI Overview Problem

Google’s AI Overviews now directly recommend products and brands for commercial queries. When a potential customer asks about “enterprise project management software,” they see a synthesized answer that positions certain brands favorably and others not at all. Understanding both where you stand in that AI-generated recommendation and how your actual landing page performs for different buyer personas becomes increasingly valuable.

How the SSR Methodology Can Help Us Evaluate Landing Pages

Rather than relying on intuition or expensive user testing, the SSR approach allows us to generate diverse personas and have each one evaluate your landing page from their specific perspective. For example, you could create personas such as:

  • A procurement manager focused on compliance
  • A technical buyer concerned with API capabilities
  • An end user who just wants something that works

Now, the resulting scores from their evaluations can’t perfectly predict conversions, but they can help us identify gaps. If synthetic personas consistently flag that your pricing page lacks transparency while competitors score higher on that dimension, that’s directional feedback worth investigating.

The Project: Persona Product Pages Review

Based on this research, we built an R&D prototype that combines AI Overview analysis with SSR-based persona evaluation. Here’s how it works.

Step 1: Define the Product Category

You start by entering what the tool calls an Explicit Product Description (EPD) — a clear, specific description of the product or service category you want to analyze. Something like “construction project management software” or “AI-powered SEO platform for enterprise teams.” The more specific the EPD, the more relevant the competing brands and personas will be. You also select whether this is a B2B or B2C context, which shapes how personas are generated downstream.

Step 2: Discover Competing Brands via AI Overview

The tool queries Google and extracts brands mentioned in the AI Overview — the AI-generated summary that appears at the top of search results for your EPD. This ensures the analysis focuses on brands that Google’s AI considers relevant to your product category, not just brands you already know about.

For each discovered brand, the tool calculates an AI Visibility Salience Score — a composite metric (scored 1–5) that combines four dimensions of how the brand appears in the AI Overview. Those four dimensions include:

  • Its position in the text
  • Depth of coverage
  • Sentiment of the mention
  • Strength of any recommendation language

This score captures not just whether a brand was mentioned, but how prominently and favorably.
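The post doesn't specify how the four dimensions are combined, so the following is only a plausible sketch: it assumes each dimension has already been scored on a 1–5 scale and takes a weighted average, with equal weights as a placeholder. The field names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalienceDimensions:
    """Per-brand dimension scores, each assumed to be on a 1-5 scale (hypothetical fields)."""
    position: float        # how early the brand appears in the AI Overview text
    coverage: float        # how much of the overview discusses the brand
    sentiment: float       # how favorable the mention is
    recommendation: float  # strength of explicit recommendation language


# Placeholder weights; the tool's actual weighting is not documented in this post.
DEFAULT_WEIGHTS = {"position": 0.25, "coverage": 0.25, "sentiment": 0.25, "recommendation": 0.25}


def salience_score(d: SalienceDimensions, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the four dimensions, which keeps the result on the 1-5 scale."""
    total_weight = sum(weights.values())
    weighted = sum(getattr(d, name) * w for name, w in weights.items())
    return weighted / total_weight


example = SalienceDimensions(position=5, coverage=4, sentiment=4, recommendation=3)
print(salience_score(example))  # 4.0 with equal weights
```

A weighted average is just one simple choice; it keeps the composite interpretable on the same scale as its inputs.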

Step 3: Review and Refine the Brand List

This is where you stay in the loop. Before analysis proceeds, the tool presents the discovered brands along with their identified domains and landing pages. You can remove irrelevant competitors, correct landing page URLs (pointing to a specific product page rather than a homepage, for example), mark one brand as “My Brand” for comparative framing, or add brands that weren’t automatically discovered.

This review step matters because the tool searches for each brand’s most relevant product page based on the EPD, but automated page discovery isn’t perfect. A direct product page will produce more meaningful persona feedback than a generic homepage.

Step 4: Extract and Evaluate Landing Page Content

For each brand’s landing page, the tool fetches the page content and converts it to clean text for analysis. It’s worth noting what this captures and what it doesn’t: the extraction works well for text-based content but won’t evaluate visual design, interactive elements like demos or calculators, content loaded dynamically via JavaScript after initial render, or anything behind a login or form gate.

Navigation elements (headers, footers, menus) are included in the extracted content, which can occasionally influence persona feedback in unexpected ways.
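The post doesn't say which fetcher or HTML-to-text converter the tool uses. As an illustration of the trade-offs above, here is a minimal standard-library sketch that strips tags from already-fetched HTML. Like the extraction described, it never executes JavaScript, so anything injected after the initial render is missed, and it keeps navigation and footer text. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from static HTML, skipping script and style contents."""

    SKIP_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return one line per text node; nav, header, and footer text are kept."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = """
<html><body>
  <nav>Home | Pricing | Login</nav>
  <h1>Project management for construction teams</h1>
  <p>Transparent pricing from day one.</p>
  <script>document.write("Loaded later by JavaScript");</script>
  <footer>Copyright 2026 Example Co.</footer>
</body></html>
"""
print(html_to_text(page))
```

Note that the navigation line survives extraction, which is exactly how menus can end up influencing persona feedback.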

Step 5: Generate Synthetic Personas

The tool generates a set of synthetic buyer personas with realistic demographic attributes appropriate to the market context. Each persona has distinct goals, challenges, and buying preferences that shape how they evaluate landing page content.

  • For B2B: Personas include specific job titles, seniority levels, company sizes, industry-specific pain points, etc.
  • For B2C: Personas include household income, household size, education level, employment status, location type (urban, suburban, rural), purchase preferences (in-store, online, mixed), etc.

This differentiation matters because a VP of Engineering evaluating API documentation has entirely different priorities than a suburban parent evaluating price and convenience. The personas need context-appropriate attributes to produce meaningful feedback.
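One way to represent these context-specific attributes is a small data model whose attribute set changes with the market context. The sketch below is hypothetical (the post doesn't describe the tool's schema); it shows how a B2B or B2C persona might be rendered into the role-play instructions the evaluating LLM receives.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class Persona:
    """A synthetic buyer persona; attribute names are illustrative, not the tool's schema."""
    name: str
    context: Literal["B2B", "B2C"]
    goals: list[str]
    challenges: list[str]
    attributes: dict[str, str] = field(default_factory=dict)

    def to_prompt(self) -> str:
        """Render the persona as role-play instructions for the evaluating LLM."""
        lines = [f"You are {self.name}, evaluating a product page as a {self.context} buyer."]
        lines += [f"{key}: {value}" for key, value in self.attributes.items()]
        lines.append("Goals: " + "; ".join(self.goals))
        lines.append("Challenges: " + "; ".join(self.challenges))
        return "\n".join(lines)


b2b = Persona(
    name="Dana, VP of Engineering",
    context="B2B",
    goals=["reliable API", "fast onboarding"],
    challenges=["legacy integrations"],
    attributes={"Seniority": "VP", "Company size": "500-1,000 employees"},
)
b2c = Persona(
    name="Sam, suburban parent",
    context="B2C",
    goals=["good value", "convenience"],
    challenges=["limited time"],
    attributes={
        "Household income": "$75k-$100k",
        "Location type": "suburban",
        "Purchase preference": "online",
    },
)
print(b2b.to_prompt())
```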

Step 6: Persona Evaluation via SSR

Each persona evaluates each brand’s landing page content. The evaluation produces two complementary scores:

  • SSR (Semantic Similarity Rating): The persona provides a free-text response evaluating the landing page. Our tool then asks the LLM to generate short, distinct summaries of those responses before mapping them to the reference statements anchored to each point on the 1–5 Likert scale. Each summary is embedded and compared via cosine similarity to the reference statements, and the similarities are converted to a probability distribution using softmax. The expected value of that distribution is the SSR score. This captures implicit sentiment: what the persona’s language reveals about their reaction, even beyond what they might consciously rate.
  • FLR (Follow-up Likert Rating): A traditional 1–5 rating collected from the persona after it provides its free-text evaluation. This is the explicit, stated preference.

Why Both Scores Matter

Comparing SSR and FLR for the same persona-brand pair can surface useful discrepancies:

  • When SSR runs lower than FLR: Implicit concerns in the written feedback that weren’t fully penalized in the numerical rating
  • When FLR runs lower than SSR: More explicit criticism than the written language suggested
  • When agreement is below 70%: The discrepancy is significant enough that reviewing the qualitative feedback will provide important additional context
  • When agreement is 70% or higher: The persona’s implicit and explicit reactions are reasonably aligned
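The post doesn't define how “agreement” is computed. Purely as an assumption, the sketch below scores agreement as one minus the absolute SSR-FLR gap divided by the widest possible gap on a 1–5 scale (4 points), then applies the 70% threshold and the direction rules listed above. The function name and the formula are hypothetical.

```python
def compare_ratings(ssr: float, flr: float, threshold: float = 0.70) -> dict:
    """Classify an SSR/FLR pair; the agreement formula is an assumption, not the tool's."""
    gap = ssr - flr
    agreement = 1 - abs(gap) / 4  # 4 = widest possible gap on a 1-5 scale
    if gap < 0:
        direction = "SSR lower: implicit concerns not reflected in the numeric rating"
    elif gap > 0:
        direction = "FLR lower: explicit criticism harsher than the written language"
    else:
        direction = "no difference"
    return {
        "agreement": agreement,
        "review_feedback": agreement < threshold,  # below 70%: read the qualitative feedback
        "direction": direction,
    }


print(compare_ratings(ssr=2.6, flr=4.0))
```

Under this definition, a 1.4-point gap already falls below the 70% line, so the threshold is fairly sensitive; a different agreement measure would move that boundary.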

What You Get: Analysis Outputs

The tool delivers five different types of output, each designed for different team needs and decision contexts.

Quadrant Chart Visualization

A two-axis plot showing where each brand sits competitively:

  • X-axis: AI Visibility (salience score from the AI Overview)
  • Y-axis: Landing Page Effectiveness (SSR rating from persona evaluations)

Brands in the upper-right quadrant of the plot have both high visibility and strong landing pages. Confidence intervals (95%) appear as circles around each data point, reflecting the spread of persona ratings. This immediately surfaces misalignments: brands that rank well in AI search but have weak landing pages, or strong pages that aren’t getting AI visibility.
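The post doesn't say how the intervals or quadrant boundaries are computed. Here is a minimal sketch, assuming a normal-approximation 95% interval (mean ± 1.96 × standard error) over each brand's persona SSR scores and a hypothetical midpoint of 3 on both 1–5 axes to split the quadrants. The brand data is made up.

```python
import math
import statistics


def mean_ci95(scores: list[float]) -> tuple[float, float, float]:
    """Mean and normal-approximation 95% CI; needs at least two scores."""
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - 1.96 * se, mean + 1.96 * se


def quadrant(visibility: float, effectiveness: float, midpoint: float = 3.0) -> str:
    """Place a brand on the visibility/effectiveness plot (the midpoint is an assumption)."""
    vis = "high visibility" if visibility >= midpoint else "low visibility"
    eff = "strong page" if effectiveness >= midpoint else "weak page"
    return f"{vis}, {eff}"


brands = {
    # brand: (AI salience score, SSR scores from each persona), made-up numbers
    "Brand A": (4.5, [3.9, 4.2, 4.0, 3.7]),
    "Brand B": (4.0, [2.4, 2.9, 2.2, 2.7]),
    "Brand C": (1.8, [4.1, 3.8, 4.4, 4.0]),
}
for name, (salience, ssr_scores) in brands.items():
    mean, low, high = mean_ci95(ssr_scores)
    print(f"{name}: SSR {mean:.2f} [{low:.2f}, {high:.2f}] -> {quadrant(salience, mean)}")
```

With only a handful of personas per brand, a Student's t multiplier instead of 1.96 would give a wider, more conservative interval.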

Per-Brand Results

Each brand receives an average SSR score across all personas, a score breakdown by persona type (e.g., “Technical Buyer: 4.2, Procurement Manager: 3.1”), both SSR and FLR ratings for comparison, and the four-dimension AI salience metrics — position, coverage, sentiment, and recommendation strength.
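The per-persona-type breakdown is a grouped average. Assuming persona-level results are available as simple (brand, persona type, SSR, FLR) records, which is a hypothetical shape, it could be computed like this:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical persona-level records: (brand, persona type, SSR, FLR)
records = [
    ("Brand A", "Technical Buyer", 4.3, 4),
    ("Brand A", "Technical Buyer", 4.1, 4),
    ("Brand A", "Procurement Manager", 3.1, 3),
    ("Brand A", "End User", 3.8, 4),
]


def breakdown_by_persona_type(rows, brand: str) -> dict[str, dict[str, float]]:
    """Average SSR and FLR per persona type for one brand."""
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for row_brand, persona_type, ssr, flr in rows:
        if row_brand == brand:
            groups[persona_type].append((ssr, flr))
    return {
        persona_type: {
            "ssr": round(mean(s for s, _ in pairs), 2),
            "flr": round(mean(f for _, f in pairs), 2),
        }
        for persona_type, pairs in groups.items()
    }


print(breakdown_by_persona_type(records, "Brand A"))
```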

Qualitative Feedback Aggregation

For each brand, the tool aggregates what synthetic personas responded positively to (“clear pricing tiers,” “comprehensive API documentation”) and what they flagged for improvement (“no case studies for our industry,” “unclear implementation timeline”), categorized by sentiment. When multiple personas independently flag the same issue, that signal strengthens.
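A sketch of the “signal strengthens” idea: count how many distinct personas raised each item, so that one persona repeating itself doesn't inflate the count. It assumes feedback has already been normalized into short labels like the examples above; the data and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical persona feedback for one brand: (persona id, sentiment, normalized label)
feedback = [
    ("p1", "negative", "no case studies for our industry"),
    ("p1", "negative", "no case studies for our industry"),  # repeat from the same persona
    ("p2", "negative", "no case studies for our industry"),
    ("p3", "negative", "unclear implementation timeline"),
    ("p2", "positive", "clear pricing tiers"),
    ("p4", "positive", "clear pricing tiers"),
]


def aggregate(rows):
    """Map (sentiment, label) to the number of distinct personas that raised it."""
    personas_per_item: dict[tuple[str, str], set[str]] = defaultdict(set)
    for persona_id, sentiment, label in rows:
        personas_per_item[(sentiment, label)].add(persona_id)
    counts = {item: len(ids) for item, ids in personas_per_item.items()}
    # Most widely shared signals first
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


for (sentiment, label), n in aggregate(feedback):
    print(f"{sentiment:8} {n} persona(s): {label}")
```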

Competitive Gap Analysis

Side-by-side comparison showing where your brand (or a target brand) stands relative to competitors on both dimensions, with specific feedback on what’s driving the differences.

Exportable Datasets

The tool provides structured data for further analysis and integration into existing workflows:

  • SSR Metrics Dataset: Cross-sectional data showing each brand’s SSR scores and associated metrics
  • Feedback Summary Dataset: Aggregated positive and negative feedback grouped by brand and category
  • Persona-Level Dataset: Individual persona responses with FLR/SSR ratings per brand, including key persona attributes
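For the persona-level dataset, a flat CSV is the easiest shape to load into a spreadsheet or BI tool. The column names below are assumptions chosen to mirror the fields listed above, not the tool's actual export format.

```python
import csv
import io

# Hypothetical persona-level rows mirroring the fields described above
rows = [
    {"brand": "Brand A", "persona": "Technical Buyer", "context": "B2B",
     "seniority": "Director", "ssr": 4.2, "flr": 4},
    {"brand": "Brand A", "persona": "Procurement Manager", "context": "B2B",
     "seniority": "Manager", "ssr": 3.1, "flr": 3},
]


def to_csv(records: list[dict]) -> str:
    """Serialize persona-level results to CSV text (write it to a file in practice)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(rows))
```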

How You Can Use Our Tool in Practice

Our tool supports different workflows depending on your role, timing in the development cycle, and what questions you’re trying to answer. Here’s how we recommend applying it:

By Team Function

  • Product marketing: Use persona feedback to identify messaging gaps before launching campaigns. If the “enterprise IT director” persona consistently rates competitor pages higher on security messaging, that’s a content priority.
  • SEO teams: Get visibility into AI Overview positioning alongside traditional SERP metrics. Understanding whether Google’s AI considers your brand authoritative for a product category (and why) informs content strategy in a landscape where AI-generated results increasingly shape first impressions.
  • Product managers: Validate whether landing page content actually communicates the features you’ve built. A feature that exists but doesn’t register with synthetic buyers may need better positioning.
  • Competitive intelligence: Run this analysis periodically to track how competitors’ messaging evolves and how AI search positioning shifts over time.
  • Cross-functional alignment: Showing a product manager, a marketer, and an SEO the same synthetic persona feedback on their page creates a shared vocabulary for discussing improvements. The data provides a neutral starting point for prioritization conversations.

By Use Case

  • Competitive intelligence before a page exists: If you’re building a new product page, you can run competitor pages through persona evaluation first. The aggregated feedback tells you what’s working in your category and what gaps you might fill.
  • Rapid iteration on messaging: Testing three different value proposition framings with a focus group takes weeks. Testing them with synthetic personas takes minutes. The results aren’t equivalent, but they can help narrow down which directions merit real user testing.
  • AI search positioning analysis: Understanding where your brand appears (or doesn’t) in AI Overviews, and how that positioning compares to your actual landing page strength, helps identify misalignments between search visibility and product reality. The salience score breakdown (position, coverage, sentiment, recommendation) gives specificity to what can otherwise feel like an opaque system.
  • Diagnosing SSR-FLR discrepancies: When a brand’s implicit sentiment (SSR) diverges meaningfully from its explicit rating (FLR), the qualitative feedback usually explains why. These discrepancies can point to subtle messaging issues that wouldn’t surface in a simple satisfaction survey.
  • Pre-launch validation: Before committing to a new landing page design, run it through persona evaluation alongside your current page. If scores don’t improve, that’s a signal to revisit the approach before investing in development.

Limitations Worth Acknowledging

Like any research-based methodology, there are boundaries worth understanding before you rely on it for decision-making. Some limitations come from the underlying research, others from how we’ve implemented it in our tool, and some from the practical realities of automated content extraction.

Survey Type Matters

The SSR paper is careful to note that this approach works better for some survey types than others. Consumer preference surveys about products show strong correlation with human data. However, surveys about sensitive personal topics or highly specialized knowledge do not. The researchers report two key reliability metrics: SSR achieves 90% of human test-retest reliability for product ranking correlation, and maintains realistic response distributions with KS (Kolmogorov-Smirnov) similarity greater than 0.85 compared to real survey data. When these thresholds aren’t met, the signal becomes unreliable.
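The KS similarity mentioned above can be read as one minus the Kolmogorov-Smirnov statistic, the largest gap between two cumulative distributions. For five-point Likert responses that is easy to compute directly. This sketch assumes that reading and uses made-up distributions; see the paper for its exact definitions.

```python
from itertools import accumulate


def ks_similarity(p: list[float], q: list[float]) -> float:
    """1 - max |CDF_p - CDF_q| for two distributions over the same ordered Likert points."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same scale points")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return 1 - max(abs(a - b) for a, b in zip(cdf_p, cdf_q))


human = [0.05, 0.10, 0.25, 0.35, 0.25]      # share of respondents choosing 1..5 (made up)
synthetic = [0.03, 0.12, 0.30, 0.33, 0.22]  # averaged SSR distribution (made up)
print(round(ks_similarity(human, synthetic), 3))  # 0.95
```

Here the largest cumulative gap is 0.05 (at point 3), so this made-up synthetic panel would sit comfortably above the 0.85 level the paper reports.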

For landing page evaluation, we’re in reasonably favorable territory. We’re asking about purchase intent and product-market fit, not about personal beliefs or domain expertise. Still, the outputs should be treated as directional signals rather than ground truth. If four out of five synthetic personas flag the same issue with your page, that’s worth investigating. But it doesn’t mean four out of five real customers would have the same reaction.

The Persona Fidelity Question

The paper demonstrates that LLMs can simulate diverse demographic perspectives, but the simulation is limited by what the model learned during training. A synthetic “small business owner in the industrial supply sector” may or may not accurately reflect the priorities of real buyers in that segment.

Content Extraction Constraints

The tool evaluates text content only. Visual design, brand aesthetics, interactive features, video content, and anything behind a login won’t factor into persona scores. Pages that rely heavily on JavaScript rendering or dynamic content loading may not be fully captured. For example, a beautifully designed page with mediocre copy might score lower than it deserves, and vice versa.

This project is still in the R&D phase. The tool is functional and available to try, but our focus remains on validating whether the methodology produces actionable insights in practice rather than refining it for large-scale deployment. We’re interested in how the approach performs across different product categories and market contexts.

If the prototype continues to validate, there are natural extensions: tracking how persona scores change after page updates, comparing SSR scores to actual conversion data, fine-tuning persona generation for specific industries, and expanding the content extraction to handle more complex page architectures.

For now, we’re interested in whether the underlying research holds up when applied to a concrete use case. The SSR paper suggests that LLMs can approximate human judgment in consumer contexts with reasonable fidelity. Evaluating landing pages against purchase intent seems like a reasonable test of that claim.

The full paper is available on arXiv (2510.08338v3) for those who want to dig into the methodology. You can try the tool and see how your landing pages stack up.

Special Shoutout: Thanks to Matthew Kay for testing and providing feedback on the tool.
