I'm building a B2B analytics tool that tracks brand visibility across AI platforms such as ChatGPT. For accuracy, I need to collect responses from the ChatGPT website itself (chat.openai.com) rather than from the OpenAI API, because the website's outputs differ from the API's due to system prompts and other factors. The target volume is around 30,000 prompts per day. I'm considering headless browser tools like Playwright, Puppeteer, or Selenium, but I know a scraper is hard to scale because of Cloudflare protections and frequent UI changes. I'm also open to managed services that handle browser automation and account management for me. What approach or service would you recommend for this?
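For a sense of scale, here is a rough back-of-the-envelope sketch of what 30,000 prompts per day means in sustained throughput and parallel sessions. The 30-second average time per prompt is purely an assumption for illustration, not a measured figure, and `required_concurrency` is a hypothetical helper name.

```python
import math

PROMPTS_PER_DAY = 30_000        # volume stated in the question
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_concurrency(prompts_per_day: int, avg_seconds_per_prompt: float) -> int:
    """Estimate parallel sessions via Little's law:
    in-flight requests = arrival rate * time spent per request."""
    arrival_rate = prompts_per_day / SECONDS_PER_DAY  # prompts per second
    return math.ceil(arrival_rate * avg_seconds_per_prompt)


rate = PROMPTS_PER_DAY / SECONDS_PER_DAY
print(f"{rate:.3f} prompts/second")                 # 0.347 prompts/second
# ASSUMPTION: each prompt takes about 30 s end to end.
print(required_concurrency(PROMPTS_PER_DAY, 30.0))  # 11
```

So under that assumed latency, the workload needs roughly a dozen sessions running around the clock, before allowing for retries or downtime.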
2 Answers
Honestly, scraping is your best bet here. You’d need to find ways to bypass their bot detection and keep up with UI changes, though, and maintaining all of that could turn into a whole product of its own. Just be prepared for the headaches that come with it!
Keep in mind that public APIs exist precisely so you don't have to deal with issues like this. If the API gives you different responses than the website, that might just be the model's non-deterministic nature, or a result of how you're calling the API. It may be worth focusing on getting your API usage to match what the website produces rather than scraping.
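One way to check how far apart the two sources really are is to compare a small sample side by side. The sketch below is only an illustration: it makes no API calls, the two response strings are made-up placeholders standing in for an API answer and a manually copied website answer, and `similarity` and `brand_mentioned` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two responses, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def brand_mentioned(response: str, brand: str) -> bool:
    """Case-insensitive check for whether a brand name appears in a response."""
    return brand.lower() in response.lower()


# Placeholder texts (assumed, not real model output).
api_response = "Popular options include Acme Analytics and DataCo."
web_response = "Popular options include DataCo and Acme Analytics."

print(round(similarity(api_response, web_response), 2))
print(brand_mentioned(api_response, "acme analytics"))  # True
print(brand_mentioned(web_response, "acme analytics"))  # True
```

For a brand-visibility metric, agreement on whether a brand is mentioned may matter more than word-for-word similarity, so the two checks are kept separate.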

I get that, but the different internal prompts on the site mean I can’t rely on the API if I want to show my clients exactly what’s happening on ChatGPT.