TLDR
- The output of LLMs can be deliberately influenced by the content of the training corpus
- There is a lot of academic literature describing this as ‘data poisoning’
- Niche topics can require relatively small amounts of content to alter outputs
- For most SEOs, it’s probably better to focus on RAG, but it does make sense for big brands and/or very niche topics
Data poisoning
Most SEO for LLMs (‘GEO’, ‘AI SEO’, ‘LEO’ - whatever you want to call it) is likely to be influencing the material incorporated via RAG (aka ‘grounding’).
However, it is also desirable to alter the underlying model’s output, whether because LLM-based tools like ChatGPT sometimes choose not to integrate external data, or because even when external data is included you still want to influence the model to:
- recommend your brand or product over competitors’
- use specific adjectives and attributes to describe your products
There is academic research on this topic going back to 2017, and unfortunately it’s dramatically referred to in the field as ‘data poisoning’. ☠️
Specifically, an SEO producing content solely designed to influence the output of ChatGPT for their employer would be a… stealthy, black box, single pattern, clean-label data poisoning attack:
- Stealthy: We don’t want OpenAI to find out
- Single-pattern: We are interested in only the outputs related to one topic, not influencing the whole model (AKA ‘targeted integrity’)
- Black box: We don’t know the GPT models’ corpus, weights, etc.
- Clean-label: Our content serves a genuine purpose, i.e. appears to be “clean”, unlike “dirty”-label attacks which use obviously incorrect information
- Data poisoning: Influencing the model during training rather than at inference, i.e. when a user prompts it
How to poison a Large Language Model
Paraphrasing Zhao et al (2025), data poisoning is the injection of carefully crafted ‘malicious’ samples into the training dataset of a model, with the intent of manipulating its behavior.
The number of times a specific message needs to appear in a model’s training corpus to influence the LLM’s output depends on how well the topic is already covered. For example, a corpus that includes Reddit will have plenty of data on pop culture and cats, but coverage of very niche product categories or local content is probably very sparse.
A simple quantification looks like this:
I_t ≈ α · β · (P_t / D_t)
- I_t: Influence over model outputs on a topic
- D_t: Total number of training tokens for that topic
- P_t: Number of poisoned tokens injected on the topic
- α: Pattern repetition or memorability (e.g. how repetitive your ‘poisoned’ content is)
- β: Model sensitivity to rare topic tokens (overfitting or undertraining for sparse topics)
Explanation
- P_t / D_t: This is your share of the topic’s training data
- α: Amplifies influence if your poison is repetitive, stylistically consistent, or semantically reinforced
- β: Reflects that some topics are learned less robustly, so models are more sensitive to the data they do get
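To make the heuristic concrete, here is a toy calculation. It is a sketch of the formula above with made-up numbers (the token counts, α and β are illustrative placeholders, not measured values), comparing the same injected content against a well-covered topic and a sparse niche one:

```python
# Toy calculation of the influence heuristic: I_t ≈ α · β · (P_t / D_t)
# All numbers are illustrative placeholders, not measurements.

def influence(poisoned_tokens, topic_tokens, alpha, beta):
    """Estimated influence over a model's outputs on one topic."""
    return alpha * beta * (poisoned_tokens / topic_tokens)

# Popular topic: hundreds of millions of tokens already in the corpus,
# so 50,000 injected tokens barely register.
popular = influence(poisoned_tokens=50_000, topic_tokens=500_000_000, alpha=1.5, beta=1.0)

# Niche topic: the corpus is sparse, and the model over-weights what little data exists (beta > 1).
niche = influence(poisoned_tokens=50_000, topic_tokens=2_000_000, alpha=1.5, beta=2.0)

print(f"popular topic: {popular:.6f}")  # ~0.00015
print(f"niche topic:   {niche:.6f}")    # ~0.075, roughly 500x more influence for the same content
```

The point of the exercise is the ratio, not the absolute numbers: the same content buys orders of magnitude more influence where the existing corpus is thin.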
Caveats
An important caveat to this formula is that the relationship between input and output is usually non-linear: Kandpal et al. show that a sequence appearing 10 times in the training data is, on average, generated approximately 1,000 times more frequently than a sequence that appears only once.
Also, different types of content are learned at different rates: a study by Tirumala et al. on memorization of different parts of speech reveals that nouns and numbers are memorized significantly faster than other parts of speech.
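To see how strong that non-linearity is: a 10× increase in occurrences producing a roughly 1,000× increase in generation frequency is consistent with generation scaling as the cube of the occurrence count. The exponent below is inferred from that single data point and is purely illustrative, not a fitted parameter:

```python
# Illustrative only: Kandpal et al.'s "10x occurrences -> ~1,000x more frequent generation"
# is consistent with generation frequency scaling roughly as occurrences**3.
def relative_generation_rate(occurrences, exponent=3):
    return occurrences ** exponent

for n in (1, 2, 10, 100):
    print(n, relative_generation_rate(n))
# 1 -> 1, 2 -> 8, 10 -> 1000, 100 -> 1000000 (if the same scaling held, which is not guaranteed)
```

The practical takeaway is that moving a message from one mention to a handful can matter far more than the raw counts suggest.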
Avoiding defenses
As the commercial benefit of influencing LLMs puts model creators and SEOs on opposite sides, LLM creators will almost certainly fight back, much as Google did with ‘Spam Brain’:
1) Filtering ‘poisoned’ documents
How models defend themselves
Commercial LLM training pipelines employ filtering techniques to exclude manipulative content:
- Data filtering and outlier detection algorithms flag statistically unusual content patterns, a bit like Google’s ‘Spam Brain’ (a toy version of this idea is sketched after this list)
- Membership inference detection (Carlini et al.) removes suspected poisoned data
- Differential privacy techniques during training dilute the influence of any single training example
- Semantic binding analysis (Kurita et al.) identifies content with unusual word associations designed to influence models
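None of this is publicly documented, so the sketch below is a hypothetical, toy stand-in for the first bullet: flagging documents whose brand-mention density is a statistical outlier relative to a corpus. It is mostly useful for auditing your own pages before someone else’s pipeline does something similar:

```python
import re
import statistics

def mention_density(text, brand):
    """Brand mentions per 100 words: one crude 'content pattern' a filter might measure."""
    words = re.findall(r"\w+", text.lower())
    mentions = len(re.findall(re.escape(brand.lower()), text.lower()))
    return 100 * mentions / max(len(words), 1)

def flag_outliers(documents, brand, z_threshold=3.0):
    """Return indices of documents whose mention density sits far above the corpus norm."""
    densities = [mention_density(doc, brand) for doc in documents]
    mean = statistics.mean(densities)
    stdev = statistics.pstdev(densities) or 1e-9  # avoid division by zero on uniform corpora
    return [i for i, d in enumerate(densities) if (d - mean) / stdev > z_threshold]
```

Real pipelines will look at far richer signals than mention counts, but the principle is the same: content that is statistically unlike the rest of the corpus is the content most likely to be dropped.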
Evading filtering defenses
The academic literature calls this “clean-label poisoning” - embedding your content designed to influence/poison within legitimately valuable information.
This maintains the document’s usefulness, while including the ‘influencing’ content. I think of it as a “spoonful of sugar” - the valuable content helps the influencing content get past filters.
Specifically:
- Ensure your content is accessible to training crawlers by allowing user agents like GPTBot, Google-Extended, CCBot and anthropic-ai (a quick way to check is sketched after this list)
- Maintain high information value alongside the ‘influencing’ content
- Avoid statistical anomalies that might trigger filters, i.e. don’t overdo it and don’t say weird stuff
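One quick way to run the crawler-access check mentioned above is Python’s built-in robots.txt parser. The domain and page below are placeholders for your own site, and the user-agent list is just the commonly cited training crawlers:

```python
from urllib.robotparser import RobotFileParser

# User agents commonly associated with model training crawls.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]

def training_access(site, page):
    """Report which training crawlers robots.txt allows to fetch a given page."""
    rp = RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()  # fetches and parses the live robots.txt
    return {bot: rp.can_fetch(bot, page) for bot in TRAINING_BOTS}

# Example (swap in your own domain and a key commercial page):
print(training_access("https://www.example.com", "https://www.example.com/products/"))
```

If any of those come back False, nothing else in this section matters for that page.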
2) Collapsing or downweighting repetition
How models defend themselves
LLM training pipelines are likely to identify and downweight content that appears to be attempting to artificially influence the model through repetition:
- Collapsing duplicate or near-duplicate content
- Detecting and downweighting content clusters with suspiciously similar messaging
- Reducing the weight of content that appears designed to over-represent certain viewpoints
Evading repetition defenses
Simply rewording content isn’t sufficient according to recent academic research. Jia et al. demonstrated that effective semantic variation requires specific optimization approaches to maintain influence while appearing diverse to defense mechanisms:
- Creating genuinely different content pieces that arrive at the same conclusions
- Varying not just wording but structure
- Distributing influence across different contexts, domains, and document types
- Maintaining natural language patterns that don’t trigger repetition detectors (a rough self-check is sketched below)
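A cheap version of that last self-check is character-shingle Jaccard similarity, a rough approximation of what dedup pipelines do at scale with techniques like MinHash or SimHash. The 0.6 threshold below is an arbitrary rule of thumb, not anyone’s published cut-off:

```python
def shingles(text, k=5):
    """Set of overlapping k-character shingles (lowercased, whitespace-collapsed)."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts (0 = disjoint, 1 = identical)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def too_similar(pieces, threshold=0.6):
    """Return index pairs of content pieces likely to be collapsed as near-duplicates."""
    return [
        (i, j)
        for i in range(len(pieces))
        for j in range(i + 1, len(pieces))
        if jaccard(pieces[i], pieces[j]) > threshold
    ]
```

If two of your ‘different’ articles score high here, a dedup pipeline will probably treat them as one document, and the second one bought you nothing.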
Is this ethical?
Data poisoning sounds awful; it’s not something I’ll tell people I do at parties. However, I would suggest it is more akin to political lobbying, in that it can be both good and bad.
How it can benefit models and users
- It adds missing data to a model’s training corpus
- If the messaging is accurate, it improves the quality of the model’s output
- Most major brands have now disallowed model providers from training on their content, making model development more difficult. The upside of being represented in training data may encourage them to open their content up again.
How it can harm models and users
- It can obviously introduce inaccuracies in responses, e.g. if your brand truly isn’t the best/cheapest/shiniest and you convince a model it is, the model will mislead users
- “Transferability”: poisoning in one domain can affect model behavior in adjacent domains in unpredictable ways
Is it worth the effort?
If you’re creating content on the web for SEO and that content is open to training, you are already influencing models, even if not deliberately.
As to where to put your efforts, influencing LLM output is much more readily achievable via RAG. Research has shown that RAG influence techniques can be just as harmful, if not more so, and more effective at altering output.
Influencing the underlying model, by contrast, will be a slow process, as retraining happens infrequently: frontier model releases currently happen only once or twice per year due to the huge costs associated with them.
| Year | OpenAI | Google | Anthropic |
| --- | --- | --- | --- |
| 2020 | | | |
| 2021 | GPT-3 | LaMDA | |
| 2022 | | LaMDA 2, PaLM | |
| 2023 | GPT-4 | Bard, PaLM 2, Gemini | Claude, Claude 2 |
| 2024 | GPT-4o | Gemini 2.0 | Claude 3, Claude 3.5 |
| 2025 | | Gemini 2.5 | Claude 3.7 Sonnet |
Realistically, LLM data poisoning will work for either:
- The biggest brands. Work by Goldstein et al. suggests that influence capability follows power law distributions that favor established entities
- Brands with a small but profitable niche
| | Popular topic | Niche topic |
| --- | --- | --- |
| Big brand | Yes | Probably no need |
| Small brand | Little chance of success | Yes |
Specifically, what should I do?
- Include the influencing messaging on your own site
- 43% of the top 100 news websites in the world still allow training for OpenAI, and the share is even higher for smaller model providers, so include the messaging in your external comms too
Cited studies / more reading
Zhao, P., Zhu, W., Jiao, P., Gao, D., & Wu, O. (2025). Data Poisoning in Deep Learning: A Survey. arXiv preprint arXiv:2503.22759. PDF
Fu, T., Sharma, M., Torr, P., Cohen, S. B., Krueger, D., & Barez, F. (2024). POISONBENCH: Assessing Large Language Model Vulnerability to Data Poisoning. arXiv preprint arXiv:2410.08811. PDF
Zhang, Y., Rando, J., Evtimov, I., Chi, J., Smith, E. M., Carlini, N., Tramer, F., & Ippolito, D. (2024). Persistent Pre-Training Poisoning of LLMs. arXiv preprint arXiv:2410.13722. PDF
Chen, B., Guo, H., Wang, G., Wang, Y., & Yan, Q. (2024). The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs. arXiv preprint arXiv:2409.00787. PDF
Jiang, S., Kadhe, S. R., Zhou, Y., Cai, L., & Baracaldo, N. (2023). Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks. arXiv preprint arXiv:2312.04748. PDF
He, P., Xu, H., Xing, Y., Liu, H., Yamada, M., & Tang, J. (2024). Data Poisoning for In-context Learning. arXiv preprint arXiv:2402.02160. PDF
Satvaty, A., Verberne, S., & Turkmen, F. (2024). Undesirable Memorization in Large Language Models: A Survey. arXiv preprint arXiv:2410.02650. PDF