I'm trying to use curl to fetch JSON data from a Reddit post, but I'm getting back a web page with a lot of text instead of the raw JSON I expected. Here's the command I'm using: `curl --url https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json`. Can anyone explain why this is happening and how I can modify my command to get the pure JSON response? Thanks!
4 Answers
The site seems to block requests with the default curl user-agent string. If you send a different user-agent, like `curl -H "User-Agent: foobar" "https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json"`, it should work!
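If you call this from a script, it may help to wrap the request in a small function. This is only a sketch, assuming the block really depends on the User-Agent header alone; `fetch_json` and the `my-script/0.1` value are made-up names for illustration, not anything the site requires.

```shell
#!/bin/sh
# Placeholder User-Agent (an assumption); override it by exporting UA first.
UA="${UA:-my-script/0.1}"

# Hypothetical helper: fetch a URL with an explicit User-Agent.
#   -s  silent (no progress meter)
#   -S  still print errors
#   -f  exit non-zero on HTTP errors (status >= 400)
#   -L  follow redirects
#   -A  set the User-Agent header
fetch_json() {
    curl -sSfL -A "$UA" "$1"
}

# Usage (needs network access):
# fetch_json "https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json" > post.json
```

With `-f`, a blocked request makes curl exit non-zero instead of quietly saving an error page, so the script can notice the failure.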
A quick fix is to change the user agent. Try adding `-A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'`. This often works to bypass restrictions.
You could also try using wget for this task. It behaves similarly to curl but can sometimes get around these kinds of restrictions more easily.
wget is in the same position as curl here. Both default user agents are blocked, so you'd still need to spoof a user agent with either tool.
It seems like the site might not allow curl requests by default. Sometimes sites serve different content based on how the request is made. You can check what response curl is downloading to diagnose the issue better. Here's a link that shows a visual example: [link](https://ibb.co/Z6BKf7NW).
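To check what curl is actually downloading, you can save the body to a file and look at the status code and content type. A rough sketch follows; `body_kind` is a hypothetical helper that only guesses from the first non-whitespace character, so treat its answer as a hint rather than real validation.

```shell
#!/bin/sh
# Hypothetical helper: guess whether a saved response body is JSON or HTML
# by inspecting its first non-whitespace character.
body_kind() {
    first=$(tr -d ' \t\r\n' < "$1" | head -c 1)
    case "$first" in
        '{'|'[') echo json ;;
        '<')     echo html ;;
        *)       echo unknown ;;
    esac
}

# Usage (needs network access):
# url="https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json"
# curl -sS -o body.txt -w '%{http_code} %{content_type}\n' "$url"
# body_kind body.txt
```

The `-w` line prints the HTTP status and the `Content-Type` the server sent, which usually shows right away whether you received JSON or an HTML page.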
Oh. I see. Looks like my script needs to send some authentication. This is why it doesn't work in my script but does in the browser. This used to work a while ago, but no longer.
Ok. Will try this when I get home later.