Focus groups at scale are still focus groups

The shape of the hype

Simile raised a Series A in February 2026. Aaru, Listen Labs, Qualtrics Edge, Ipsos, and Kantar are all selling some version of the same idea: run simulated conversations with AI panelists that behave like human respondents, at a fraction of the cost and a thousand times the scale. The pitch is clean. Focus groups are expensive, slow, and limited to a dozen people. A synthetic panel can produce a million opinions before lunch.

The category has named itself. Synthetic research. Synthetic respondents. AI focus groups. Each name reveals the mental model: qualitative research with better actors.

Synthetic panels at a million agents are still focus groups. More of the same signal does not give you a new kind of forecast.

Why scale is not the answer

Salganik, Dodds, and Watts published the Music Lab experiment[1] in 2006. They let participants listen to and download songs by unknown bands in an artificial social environment. Participants were split across eight parallel worlds, each starting with identical songs and identical seed conditions. The eight worlds produced eight different hit lists. The correlation between quality and success was real but weak. Scale did not reveal the hits. Path dependence did.

Martin et al. in 2016[2] formalized this. They looked at how much predictive accuracy you can extract from individual-level features when outcomes are driven by social amplification. The ceiling is lower than practitioners assume. More panelists do not help when the outcome is a lottery.

Duncan Watts made the same point in Everything Is Obvious[3]. Post-hoc explanations of viral success describe a path that could have unrolled differently with a lucky change in the first hundred adopters. You cannot scale your way out of that. You can only integrate more signals that together constrain the space of futures.
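The path dependence described in this section can be illustrated with a toy cumulative-advantage simulation. This is a minimal sketch, not the Music Lab protocol: the 48-song catalogue, the quality scores, the `social_weight` parameter, and the choice rule are all assumptions made for illustration.

```python
import random

def run_world(quality, n_listeners, social_weight, rng):
    """Simulate one world and return the download count per song."""
    songs = list(range(len(quality)))
    downloads = [0] * len(quality)
    for _ in range(n_listeners):
        # Appeal mixes intrinsic quality with the downloads seen so far.
        weights = [q + social_weight * d for q, d in zip(quality, downloads)]
        choice = rng.choices(songs, weights=weights, k=1)[0]
        downloads[choice] += 1
    return downloads

def ranking(downloads):
    """Song indices ordered from most to least downloaded."""
    return sorted(range(len(downloads)), key=lambda s: -downloads[s])

# Hypothetical catalogue: 48 songs with fixed "quality" scores.
quality_rng = random.Random(0)
quality = [quality_rng.uniform(0.5, 1.5) for _ in range(48)]

# Eight worlds share the songs and the zero starting counts.
# Only the random sequence of listener choices differs between them.
worlds = [run_world(quality, 2000, 1.0, random.Random(seed)) for seed in range(8)]
top_fives = [ranking(w)[:5] for w in worlds]

for seed, top in enumerate(top_fives):
    print(f"world {seed}: top five songs {top}")
```

Setting `social_weight` to 0 removes the social signal, so downloads track quality alone. Raising `n_listeners` makes each world's counts larger without making the worlds agree with each other, which is the point of the section.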

The weather forecasting parallel

Lewis Fry Richardson published a six-hour weather forecast, computed by hand, in 1922. It took him weeks, and the forecast was wrong. The failure was not arithmetic. It was that a single hand-run model, started from unbalanced observations, had no way to absorb new observations fast enough to correct itself.

Weather forecasting got good between 1950 and today for two reasons. First, Charney, Fjortoft, and von Neumann ran a numerical model on the ENIAC computer in 1950[4] and demonstrated that a simplified physical model of the atmosphere could be integrated numerically to produce a realistic forecast. Second, ECMWF developed ensemble forecasting through the 1980s and made it operational in 1992, alongside data assimilation: not one simulation, but many, fused against incoming observations.

Bauer, Thorpe, and Brunet called this the "quiet revolution of numerical weather prediction" in a 2015 Nature review[5]. Forecast skill improved by roughly one day per decade. The useful forecast horizon in 1960 was one day out. In 2015 it was six days out. Not because the models got bigger. Because the integration got better.
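The difference between a free-running model and an ensemble that keeps absorbing observations can be sketched on a toy chaotic system. The example below is an illustration under stated assumptions, not how any weather center works: the logistic map stands in for the atmosphere, and a simple nudging step with a hypothetical `gain` parameter stands in for real data assimilation.

```python
import random
import statistics

def step(x, r=3.9):
    """One step of the logistic map, used here as a toy chaotic 'atmosphere'."""
    return r * x * (1.0 - x)

def clamp(x):
    """Keep the state inside the map's valid range."""
    return min(max(x, 0.001), 0.999)

def mean_forecast_error(n_members, n_steps, gain, seed=0):
    """Average one-step forecast error of the ensemble mean.

    gain = 0.0 means no assimilation (a free run).
    gain > 0.0 nudges every member toward each new noisy observation.
    """
    rng = random.Random(seed)
    truth = 0.4
    # Members start from slightly perturbed copies of the true state.
    members = [clamp(truth + rng.gauss(0, 0.01)) for _ in range(n_members)]
    errors = []
    for _ in range(n_steps):
        truth = step(truth)
        members = [step(m) for m in members]
        # Score the forecast before the new observation is used.
        errors.append(abs(statistics.fmean(members) - truth))
        obs = truth + rng.gauss(0, 0.02)  # noisy observation of the truth
        members = [clamp(m + gain * (obs - m)) for m in members]
    return statistics.fmean(errors)

free_run = mean_forecast_error(n_members=50, n_steps=200, gain=0.0)
assimilated = mean_forecast_error(n_members=50, n_steps=200, gain=0.7)
print(f"free run error:    {free_run:.3f}")
print(f"assimilated error: {assimilated:.3f}")
```

Both runs use the same model and the same ensemble size. Only the assimilation step differs, which mirrors the claim that integration, not model size, drives the gain.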

The medical AI parallel

Single-modality medical AI plateaued early. Esteva et al. in 2017[6] trained a dermatology CNN that matched board-certified dermatologists on lesion classification. Rajpurkar's CheXNet hit similar benchmarks for chest radiographs. The headline results stopped improving.

The Acosta et al. Nature Medicine review in 2022[7] found multimodal biomedical AI, fusing imaging with clinical notes, labs, and genomics, consistently lifted AUC by five to fifteen points over the best single-modality baselines. The pattern is structural. When outcome generation is multi-causal, single-signal systems saturate.

The superforecasting parallel

Tetlock and Gardner's Superforecasting[8] reported that the Good Judgment Project's top forecasters beat the intelligence community's baseline by roughly thirty percent on geopolitical questions. Not because any one forecaster was brilliant. Because aggregating frequently updated forecasts from multiple informed participants, with calibration feedback, is how forecasting accuracy is produced.

Nate Silver made the same case in The Signal and the Noise[9]. Surowiecki's Wisdom of Crowds[10] is the popular version. Each treats prediction as an integration problem with aggregation, updating, and calibration as the primitives. Not as a sampling problem.
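The aggregation primitive can be made concrete with the Brier score, the mean squared error between a probability forecast and the 0/1 outcome. The sketch below uses simulated forecasters and a plain unweighted average, which is an assumption for illustration and not the Good Judgment Project's aggregation method. The question count, forecaster count, and noise level are arbitrary.

```python
import random
import statistics

def brier(probs, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return statistics.fmean([(p - o) ** 2 for p, o in zip(probs, outcomes)])

rng = random.Random(1)
n_questions, n_forecasters = 200, 25

# Hypothetical questions, each with a hidden true probability.
true_p = [rng.random() for _ in range(n_questions)]
outcomes = [1 if rng.random() < p else 0 for p in true_p]

def noisy_view(p):
    """One forecaster's estimate: the true probability plus personal noise."""
    return min(max(p + rng.gauss(0, 0.2), 0.0), 1.0)

forecasts = [[noisy_view(p) for p in true_p] for _ in range(n_forecasters)]

individual_scores = [brier(f, outcomes) for f in forecasts]
# Aggregate by averaging the forecasters' probabilities per question.
crowd = [statistics.fmean(per_question) for per_question in zip(*forecasts)]
crowd_score = brier(crowd, outcomes)

print(f"mean individual Brier: {statistics.fmean(individual_scores):.4f}")
print(f"aggregated Brier:      {crowd_score:.4f}")
```

Because squared error is convex, the averaged forecast can never score worse than the average individual score. How much better it scores depends on how independent the forecasters' errors are.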

What applies to behavioral prediction

Behavioral outcomes are multi-causal. A creative's performance depends on the cortical response it produces, on the linguistic features of its message, on its distance from the zeitgeist it lands in, and on the structural history of creatives like it. Four independent signal families, each instrumentable at commodity cost in 2026 for the first time.

We wrote the framework piece separately (see the four signals framework). Neural, linguistic, cultural, historical. Fusion with cross-family attention. Calibration against published field outcomes. That is the shape of a forecasting system, not a focus group.

Figure 01: Five short-form archetypes overlaid on the seven-network radar. A synthetic-panel vendor returns one signal family per ad. A fusion system returns the whole shape. Different problems, different instruments.
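To make "fusion with cross-family attention" less abstract, here is one way the mechanism could look in miniature. This is a hypothetical sketch, not OpenAffect's implementation: the embeddings are made-up numbers, there are no learned projections, and the function name `cross_family_attention` is an assumption. A trained system would learn query, key, and value weights.

```python
import math

FAMILIES = ["neural", "linguistic", "cultural", "historical"]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_family_attention(embeddings):
    """Let each signal family attend over all families.

    Uses scaled dot-product attention with identity projections.
    Returns the fused vectors and the attention weights per family.
    """
    dim = len(embeddings[FAMILIES[0]])
    scale = math.sqrt(dim)
    fused, weights = {}, {}
    for name in FAMILIES:
        query = embeddings[name]
        scores = [dot(query, embeddings[other]) / scale for other in FAMILIES]
        attn = softmax(scores)
        weights[name] = attn
        # Weighted sum of every family's vector, one dimension at a time.
        fused[name] = [
            sum(a * embeddings[other][i] for a, other in zip(attn, FAMILIES))
            for i in range(dim)
        ]
    return fused, weights

# Made-up 3-dimensional embeddings for a single creative.
example = {
    "neural": [0.9, 0.1, 0.3],
    "linguistic": [0.2, 0.8, 0.1],
    "cultural": [0.4, 0.4, 0.7],
    "historical": [0.6, 0.2, 0.5],
}
fused, weights = cross_family_attention(example)
for name in FAMILIES:
    print(name, [round(w, 3) for w in weights[name]])
```

The attention weights show how much each family leans on the others for a given creative, which is the information a single-signal score cannot carry.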

What this means for the category

Synthetic research is qualitative research with better actors. It is still qualitative research. It is not infrastructure for prediction. It can be useful. It cannot be sufficient.

The category's choice is whether to name its ceiling honestly or to spend another decade rebranding. Neuromarketing spent twenty years calling single-signal EEG "neural insight," ignored the reverse-inference critique, and consolidated into NielsenIQ with a cut-down unit (we covered that history). Synthetic research is about to repeat the same cycle unless it reorganizes around signal fusion.

What it means for the reader

If you are a brand, stop buying point tools that produce single-signal scores and calling the number prediction. Ask every vendor in your stack what they publish about calibration, what their error bars are on out-of-distribution creative, and what happens when two of their signals disagree.

If you are an investor, stop funding more focus-group-at-scale companies. The market is not demand-constrained; it is ceiling-constrained. The returns compound for whoever builds the fusion layer first, not the hundredth synthetic panel.

If you are a researcher, calibrate. Publish your error bars. Publish your failure modes. Make the category honest by doing the science in public.
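Asking what a vendor publishes about calibration is easier with a concrete check in hand. A minimal sketch, assuming you have a vendor's predicted probabilities and the matching 0/1 field outcomes: the function names, the bin count, and the sample numbers are illustrative, not taken from any vendor's tooling.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Group forecasts into probability bins.

    Returns rows of (bin index, count, mean predicted, observed hit rate).
    Empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        hit_rate = sum(o for _, o in items) / len(items)
        rows.append((idx, len(items), mean_p, hit_rate))
    return rows

def expected_calibration_error(probs, outcomes, n_bins=5):
    """Count-weighted average gap between predicted and observed rates."""
    n = len(probs)
    rows = reliability_table(probs, outcomes, n_bins)
    return sum(count / n * abs(mean_p - hit) for _, count, mean_p, hit in rows)

# A forecaster whose confidence roughly matches reality.
honest = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
# A forecaster who says 90% on everything and is right half the time.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])

print(f"honest ECE:        {honest:.2f}")
print(f"overconfident ECE: {overconfident:.2f}")
```

A lower value means stated confidence tracks observed frequency more closely. The table is only meaningful when the outcomes come from the field, not from the same panel that produced the predictions.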

Our posture

OpenAffect is built for fusion, not for scale. We integrate four signal families, we publish our calibration studies (see why we publish), and we treat uncertainty the way weather services and medical systems treat it: as a first-class output with honest error bars, not as a footnote to the marketing deck.

The last twenty years of behavioral prediction were spent scaling single instruments. The next decade will be won by whoever fuses many.

References

[1] Salganik, Dodds, Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science 2006.
[2] Martin, Hofman, Sharma, Anderson, Watts. Exploring limits to prediction in complex social systems. WWW 2016.
[3] Watts. Everything Is Obvious: Once You Know the Answer. Crown 2011.
[4] Charney, Fjortoft, von Neumann. Numerical integration of the barotropic vorticity equation. Tellus 1950.
[5] Bauer, Thorpe, Brunet. The quiet revolution of numerical weather prediction. Nature 2015.
[6] Esteva et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017.
[7] Acosta, Falcone, Rajpurkar, Topol. Multimodal biomedical AI. Nature Medicine 2022.
[8] Tetlock and Gardner. Superforecasting: The Art and Science of Prediction. Crown 2015.
[9] Silver. The Signal and the Noise. Penguin 2012.
[10] Surowiecki. The Wisdom of Crowds. Doubleday 2004.
[11] Lorenz. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences 1963.

