I've been brainstorming about designing a website that would be unindexable and challenging for AI to scrape. The idea involves using a custom font where each character maps to a random Unicode character. To users, the text appears normal and readable, but behind the scenes, it turns into a jumbled mess of Unicode that wouldn't make sense to AI scrapers. I'm curious if this would be effective in stopping AI from indexing or copying content from the site. Is this concept practical, or is there something I'm missing?
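To make the idea concrete, here is a minimal sketch of the scrambling side, assuming each visible character is remapped to a codepoint in Unicode's Private Use Area (the font generation itself is not shown; a companion @font-face font would need to draw each PUA codepoint with the original character's glyph so the page still renders normally):

```python
import random
import string

def build_mapping(seed: int = 42) -> dict[str, str]:
    # Map letters and digits to random Private Use Area codepoints
    # (U+E000-U+F8FF), so the markup carries no meaningful text.
    chars = string.ascii_letters + string.digits
    pua = random.Random(seed).sample(range(0xE000, 0xF8FF), len(chars))
    return {c: chr(cp) for c, cp in zip(chars, pua)}

def obfuscate(text: str, mapping: dict[str, str]) -> str:
    # Characters without a mapping (spaces, punctuation) pass through.
    return "".join(mapping.get(c, c) for c in text)

mapping = build_mapping()
scrambled = obfuscate("Hello World", mapping)
assert scrambled != "Hello World"
assert all(0xE000 <= ord(c) < 0xF8FF for c in scrambled if c != " ")
```

A real deployment would also have to regenerate both the mapping and the matching font regularly, otherwise the fixed substitution can be learned once and reversed forever.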
5 Answers
Your method could deter AI scraping to some extent, but it isn't foolproof. If enough websites adopted the technique, AI developers could adapt their scrapers to decode it. I'd also worry about accessibility: if the CSS or the custom font fails to load, users will see the scrambled characters instead of readable text.
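The "scrapers could adapt" point is worth spelling out: a static character map is just a substitution cipher. Once a scraper recovers the mapping (for example by parsing the font file's cmap table, or by OCR-ing a rendered sample), decoding is a one-line dictionary inversion. A toy illustration, with a tiny hand-written mapping standing in for one extracted from a real font:

```python
# A stand-in for a mapping recovered from the site's custom font.
mapping = {"h": "\ue001", "i": "\ue002", " ": " "}
scrambled = "".join(mapping[c] for c in "hi hi")

# The scraper's side: invert the mapping and read the page anyway.
inverse = {v: k for k, v in mapping.items()}
decoded = "".join(inverse.get(c, c) for c in scrambled)
assert decoded == "hi hi"
```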
This isn't a brand-new idea! There are npm packages, like @noscrape, that already implement similar techniques. Facebook has also used methods like jumbling text in HTML to confuse scrapers while still presenting a clear view to users. It's definitely a fascinating field to explore, especially with how AI is evolving.
I've actually used a similar strategy in emails to protect contact details from scraping. It’s all about adding that random twist to make it non-standard.
Another approach would be to render all text content as images. That way, it’s not text at all, making scraping a lot harder!
I'm curious how this impacts screen readers. Would they struggle with your setup? My guess is yes: any assistive technology that reads the underlying text rather than using OCR would likely break.
