I'm planning to publish a webpage with a noindex tag. My goal is to keep it out of our main navigation while still letting AI models access and cite it. Do the major AI companies, such as OpenAI and Anthropic, still crawl noindex pages and use their content for citations? I've heard that Google said some months ago that a noindex tag would keep a page out of its AI summaries, but I'd like clarification on this.
3 Answers
It's like following rules on a subreddit: sometimes it works, but often it doesn't. noindex is more of a request than a hard rule, so AI systems can still use that content if they choose to.
Not necessarily! The noindex tag tells search engines not to list the page in their results, but it doesn't stop crawlers from fetching it. In fact, a crawler has to fetch the page before it can even see the tag. Many AI crawlers may simply not honor it, especially if the content is valuable or relevant for training.
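To make that concrete, here is a rough sketch of what a well-behaved crawler has to do before it can honor the tag: download the HTML, then look for a robots meta tag. The names RobotsMetaParser and has_noindex and the sample page are made up for illustration, and this only covers the meta tag form, not the equivalent X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration; not taken from any real crawler.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the already-downloaded page asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Sample page, invented for this example.
page = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Hello</body></html>"
)
print(has_noindex(page))
```

Note that has_noindex only runs after the page content is already in hand, which is why the tag cannot block access by itself. Whether a crawler acts on the result is up to whoever runs it.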
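robots.txt is the mechanism aimed at web crawlers, including those built for AI, and it is separate from noindex: robots.txt asks crawlers not to fetch a page, while noindex asks search engines not to list it. From what I've seen, though, a lot of AI crawlers simply ignore robots.txt. So a noindex tag doesn't guarantee your page won't be used.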
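If you want to see what a compliant crawler would conclude from your robots.txt, Python's standard urllib.robotparser can evaluate the rules locally. This is only a sketch: the rules and the example.com URLs are invented, and GPTBot and ClaudeBot are, as far as I know, the user-agent tokens OpenAI and Anthropic publish for their crawlers, so check their current documentation before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration: keep GPTBot out of /private/,
# and allow every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a crawler that obeys robots.txt would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
for agent in ("GPTBot", "ClaudeBot"):
    for url in (
        "https://example.com/private/page.html",
        "https://example.com/public/page.html",
    ):
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

This only tells you what the file asks for. It can't show whether a particular crawler actually respects it; for that you'd have to look at your server logs.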

Exactly! I've found instances where AI systems completely bypass these kinds of requests. It’s frustrating, especially when you're trying to manage your content.