TLDR
- The output of LLMs can be deliberately influenced by the content of the training corpus
- There is a lot of academic literature describing this as ‘data poisoning’
- Niche topics can require relatively small amounts of content to alter outputs
- For most SEOs, it’s probably better to focus on RAG, but it does make sense for big brands and/or very niche topics
Data poisoning
Most SEO for LLMs (‘GEO’, ‘AI SEO’, ‘LEO’ - whatever you want to call it) is likely to be influencing the material incorporated via RAG (aka ‘grounding’).
However, it is also desirable to alter the underlying model’s output, whether because LLM-based tools like ChatGPT sometimes choose not to integrate external data, or because even when external data is included you still want to influence the model to:
- recommend your brand or product over competitors’
- use specific adjectives and attributes to describe your products
There is academic research on this topic going back to 2017, and unfortunately it’s dramatically referred to in the field as ‘data poisoning’. ☠️
Specifically, an SEO producing content solely designed to influence the output of ChatGPT for their employer would be a… stealthy, black box, single pattern, clean-label data poisoning attack:
- Stealthy: We don’t want OpenAI to find out
- Single-pattern: We are interested in only the outputs related to one topic, not influencing the whole model (AKA ‘targeted integrity’)
- Black box: We don’t know the GPT models’ corpus, weights, etc.
- Clean-label: Our content serves a genuine purpose, i.e. appears to be “clean”, unlike “dirty”-label attacks which use obviously incorrect information
- Data poisoning: Influencing the model during training rather than at inference, i.e. when a user prompts it
How to poison a Large Language Model
Paraphrasing Zhao et al (2025), data poisoning is the injection of carefully crafted ‘malicious’ samples into the training dataset of a model, with the intent of manipulating its behavior.
The number of times a specific message needs to appear in a model’s training corpus to influence the LLM’s output depends on how well the topic is already covered. For example, a corpus that includes Reddit will have plenty of data on pop culture and cats, but coverage of very niche product categories or local content is probably very sparse.
A simple quantification looks like this:
I_t ≈ α · β · (P_t / D_t)
- I_t: Influence over model outputs on a topic
- D_t: Total number of training tokens for that topic
- P_t: Number of poisoned tokens injected on the topic
- α: Pattern repetition or memorability (e.g. how repetitive your ‘poisoned’ content is)
- β: Model sensitivity to rare topic tokens (overfitting or undertraining for sparse topics)
Explanation
- P_t / D_t: This is your share of the topic’s training data
- α: Amplifies influence if your poison is repetitive, stylistically consistent, or semantically reinforced
- β: Reflects that some topics are learned less robustly, so models are more sensitive to the data they do get
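To make the heuristic concrete, here is a toy calculation. It is a sketch of the formula above with made-up numbers (the token counts, α and β are illustrative placeholders, not measured values), comparing the same injected content against a well-covered topic and a sparse niche one:

```python
# Toy calculation of the influence heuristic: I_t ≈ α · β · (P_t / D_t)
# All numbers are illustrative placeholders, not measurements.

def influence(poisoned_tokens, topic_tokens, alpha, beta):
    """Estimated influence over a model's outputs on one topic."""
    return alpha * beta * (poisoned_tokens / topic_tokens)

# Popular topic: hundreds of millions of tokens already in the corpus,
# so 50,000 injected tokens barely register.
popular = influence(poisoned_tokens=50_000, topic_tokens=500_000_000, alpha=1.5, beta=1.0)

# Niche topic: the corpus is sparse, and the model over-weights what little data exists (beta > 1).
niche = influence(poisoned_tokens=50_000, topic_tokens=2_000_000, alpha=1.5, beta=2.0)

print(f"popular topic: {popular:.6f}")  # ~0.00015
print(f"niche topic:   {niche:.6f}")    # ~0.075, roughly 500x more influence for the same content
```

The point of the exercise is the ratio, not the absolute numbers: the same content buys orders of magnitude more influence where the existing corpus is thin.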
Caveats
An important caveat to this formula is that the relationship between input and output is usually non-linear: Kandpal et al. show that a sequence appearing 10 times in the training data is, on average, generated approximately 1,000 times more frequently than a sequence that appears only once.
Also, different types of content are learned at different rates: a study by Tirumala et al. on memorization of different parts of speech reveals that nouns and numbers are memorized significantly faster than other parts of speech.
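To see how strong that non-linearity is: a 10× increase in occurrences producing a roughly 1,000× increase in generation frequency is consistent with generation scaling as the cube of the occurrence count. The exponent below is inferred from that single data point and is purely illustrative, not a fitted parameter:

```python
# Illustrative only: Kandpal et al.'s "10x occurrences -> ~1,000x more frequent generation"
# is consistent with generation frequency scaling roughly as occurrences**3.
def relative_generation_rate(occurrences, exponent=3):
    return occurrences ** exponent

for n in (1, 2, 10, 100):
    print(n, relative_generation_rate(n))
# 1 -> 1, 2 -> 8, 10 -> 1000, 100 -> 1000000 (if the same scaling held, which is not guaranteed)
```

The practical takeaway is that moving a message from one mention to a handful can matter far more than the raw counts suggest.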
Avoiding defenses
As the commercial benefit of influencing LLMs puts model creators and SEOs on opposite sides, LLM creators will almost certainly fight back, much as Google did with ‘Spam Brain’:
1) Filtering ‘poisoned’ documents
How models defend themselves
Commercial LLM training pipelines employ filtering techniques to exclude manipulative content:
- Data filtering and outlier detection algorithms flag statistically unusual content patterns, a bit like Google’s ‘Spam Brain’ (a toy version of this idea is sketched after this list)
- Membership inference detection (Carlini et al.) removes suspected poisoned data
- Differential privacy techniques during training dilute the influence of any single training example
- Semantic binding analysis (Kurita et al.) identifies content with unusual word associations designed to influence models
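None of this is publicly documented, so the sketch below is a hypothetical, toy stand-in for the first bullet: flagging documents whose brand-mention density is a statistical outlier relative to a corpus. It is mostly useful for auditing your own pages before someone else’s pipeline does something similar:

```python
import re
import statistics

def mention_density(text, brand):
    """Brand mentions per 100 words: one crude 'content pattern' a filter might measure."""
    words = re.findall(r"\w+", text.lower())
    mentions = len(re.findall(re.escape(brand.lower()), text.lower()))
    return 100 * mentions / max(len(words), 1)

def flag_outliers(documents, brand, z_threshold=3.0):
    """Return indices of documents whose mention density sits far above the corpus norm."""
    densities = [mention_density(doc, brand) for doc in documents]
    mean = statistics.mean(densities)
    stdev = statistics.pstdev(densities) or 1e-9  # avoid division by zero on uniform corpora
    return [i for i, d in enumerate(densities) if (d - mean) / stdev > z_threshold]
```

Real pipelines will look at far richer signals than mention counts, but the principle is the same: content that is statistically unlike the rest of the corpus is the content most likely to be dropped.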
Evading filtering defenses
The academic literature calls this “clean-label poisoning” - embedding your content designed to influence/poison within legitimately valuable information.
This maintains the document’s usefulness, while including the ‘influencing’ content. I think of it as a “spoonful of sugar” - the valuable content helps the influencing content get past filters.
Specifically:
- Ensure your content is accessible to training crawlers by allowing user agents like GPTBot, Google-Extended, CCBot and anthropic-ai (a quick way to check is sketched after this list)
- Maintain high information value alongside the ‘influencing’ content
- Avoid statistical anomalies that might trigger filters, i.e. don’t overdo it and don’t say weird stuff
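One quick way to run the crawler-access check mentioned above is Python’s built-in robots.txt parser. The domain and page below are placeholders for your own site, and the user-agent list is just the commonly cited training crawlers:

```python
from urllib.robotparser import RobotFileParser

# User agents commonly associated with model training crawls.
TRAINING_BOTS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai"]

def training_access(site, page):
    """Report which training crawlers robots.txt allows to fetch a given page."""
    rp = RobotFileParser()
    rp.set_url(f"{site}/robots.txt")
    rp.read()  # fetches and parses the live robots.txt
    return {bot: rp.can_fetch(bot, page) for bot in TRAINING_BOTS}

# Example (swap in your own domain and a key commercial page):
print(training_access("https://www.example.com", "https://www.example.com/products/"))
```

If any of those come back False, nothing else in this section matters for that page.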
2) Collapsing or downweighting repetition
How models defend themselves
LLM training pipelines are likely to identify and downweight content that appears to be attempting to artificially influence the model through repetition:
- Collapsing duplicate or near-duplicate content
- Detecting and downweighting content clusters with suspiciously similar messaging
- Reducing the weight of content that appears designed to over-represent certain viewpoints
Evading repetition defenses
Simply rewording content isn’t sufficient according to recent academic research. Jia et al. demonstrated that effective semantic variation requires specific optimization approaches to maintain influence while appearing diverse to defense mechanisms:
- Creating genuinely different content pieces that arrive at the same conclusions
- Varying not just wording but structure
- Distributing influence across different contexts, domains, and document types
- Maintaining natural language patterns that don’t trigger repetition detectors (a rough self-check is sketched below)
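A cheap version of that last self-check is character-shingle Jaccard similarity, a rough approximation of what dedup pipelines do at scale with techniques like MinHash or SimHash. The 0.6 threshold below is an arbitrary rule of thumb, not anyone’s published cut-off:

```python
def shingles(text, k=5):
    """Set of overlapping k-character shingles (lowercased, whitespace-collapsed)."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts (0 = disjoint, 1 = identical)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def too_similar(pieces, threshold=0.6):
    """Return index pairs of content pieces likely to be collapsed as near-duplicates."""
    return [
        (i, j)
        for i in range(len(pieces))
        for j in range(i + 1, len(pieces))
        if jaccard(pieces[i], pieces[j]) > threshold
    ]
```

If two of your ‘different’ articles score high here, a dedup pipeline will probably treat them as one document, and the second one bought you nothing.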
Is this ethical?
Data poisoning sounds awful; it’s not something I’ll tell people I do at parties. However, I would suggest it is more akin to political lobbying, in that it can be both good and bad.
How it can benefit models and users
- It adds missing data to a model’s training corpus
- If the messaging is accurate, it improves the quality of the model’s output
- Most major brands have now disallowed model providers from training on their content, making model development more difficult. The upside of being represented in training data may encourage them to open their content up again.
How it can harm models and users
- It can obviously introduce inaccuracies in responses, e.g. if your brand truly isn’t the best/cheapest/shiniest and you convince a model it is, the model will mislead users
- “Transferability”: poisoning in one domain can affect model behavior in adjacent domains in unpredictable ways
Is it worth the effort?
If you’re creating content on the web for SEO and that content is open to training, you are already influencing models, even if not deliberately.
As to where to put your efforts, influencing LLM output is much more readily achievable via RAG. Research has shown that RAG influence techniques can be just as harmful, if not more so, and more effective at altering output.
Influencing the underlying model, by contrast, will be a slow process, as retraining happens infrequently: frontier model releases currently happen only once or twice per year due to the huge costs associated with them.
| Year | OpenAI | Google | Anthropic |
| --- | --- | --- | --- |
| 2020 | | | |
| 2021 | GPT-3 | LaMDA | |
| 2022 | | LaMDA 2, PaLM | |
| 2023 | GPT-4 | Bard, PaLM 2, Gemini | Claude, Claude 2 |
| 2024 | GPT-4o | Gemini 2.0 | Claude 3, Claude 3.5 |
| 2025 | | Gemini 2.5 | Claude 3.7 Sonnet |
Realistically, LLM data poisoning will work for either:
- The biggest brands. Work by Goldstein et al. suggests that influence capability follows power law distributions that favor established entities
- Brands with a small but profitable niche
| | Popular topic | Niche topic |
| --- | --- | --- |
| Big brand | Yes | Probably no need |
| Small brand | Little chance of success | Yes |
Specifically, what should I do?
- Include the influencing messaging on your own site
- 43% of the top 100 news websites in the world still allow training for OpenAI, and the share is even higher for smaller model providers, so include the messaging in your external comms too
Cited studies / more reading
Zhao, P., Zhu, W., Jiao, P., Gao, D., & Wu, O. (2025). Data Poisoning in Deep Learning: A Survey. arXiv preprint arXiv:2503.22759. PDF
Fu, T., Sharma, M., Torr, P., Cohen, S. B., Krueger, D., & Barez, F. (2024). POISONBENCH: Assessing Large Language Model Vulnerability to Data Poisoning. arXiv preprint arXiv:2410.08811. PDF
Zhang, Y., Rando, J., Evtimov, I., Chi, J., Smith, E. M., Carlini, N., Tramer, F., & Ippolito, D. (2024). Persistent Pre-Training Poisoning of LLMs. arXiv preprint arXiv:2410.13722. PDF
Chen, B., Guo, H., Wang, G., Wang, Y., & Yan, Q. (2024). The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs. arXiv preprint arXiv:2409.00787. PDF
Jiang, S., Kadhe, S. R., Zhou, Y., Cai, L., & Baracaldo, N. (2023). Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks. arXiv preprint arXiv:2312.04748. PDF
He, P., Xu, H., Xing, Y., Liu, H., Yamada, M., & Tang, J. (2024). Data Poisoning for In-context Learning. arXiv preprint arXiv:2402.02160. PDF
Satvaty, A., Verberne, S., & Turkmen, F. (2024). Undesirable Memorization in Large Language Models: A Survey. arXiv preprint arXiv:2410.02650. PDF