I'm trying to use Curl to fetch JSON data from a Reddit post, but instead of getting the JSON I expect, I get a bunch of HTML. Can someone explain why this is happening and how I can set Curl to return the correct JSON format? Here's the command I'm using:
`curl --url https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json`
Thanks!
4 Answers
It seems like Reddit is specifically blocking Curl's default user-agent string. If you pass a different User-Agent header, for example `curl -H "User-Agent: foobar" "https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json"`, it should work just fine!
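If you want this in a script rather than a one-liner, here is a minimal sketch of the same idea. The `fetch_json` function name and the `my-reddit-script/0.1` user-agent value are placeholders I made up for illustration; the only assumption is that any non-default user-agent gets through, as described above.

```shell
#!/bin/sh
# Hypothetical user-agent value; replace it with something that describes your script.
USER_AGENT="my-reddit-script/0.1"

# fetch_json is an illustrative helper name, not part of curl.
# -sS hides the progress meter but still prints errors.
# -A sets the User-Agent header (equivalent to -H "User-Agent: ...").
fetch_json() {
    curl -sS -A "$USER_AGENT" "$1"
}

# Example call (left commented out so sourcing this file makes no request):
# fetch_json "https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json"
```

Keeping the user-agent in one variable makes it easy to change later if the value you picked stops working.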
Did you know? Wget can fetch this too, but like Curl it needs a custom user-agent, since the default user-agents of both tools are usually blocked.
It might be that the site is blocking Curl requests. You can check what Curl is downloading to see if that's the case. Some sites require an appropriate user-agent.
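One way to check what Curl is actually getting back is to print the status code and content type instead of the body. This is only a rough sketch; `show_response_info` is a name I invented for the example. If the content type comes back as HTML rather than JSON, the request is probably being rejected or redirected.

```shell
#!/bin/sh
# show_response_info is an illustrative helper name, not part of curl.
# -o /dev/null discards the response body.
# -w prints the HTTP status code and Content-Type after the transfer finishes.
show_response_info() {
    curl -sS -o /dev/null -w '%{http_code} %{content_type}\n' "$1"
}

# Example call (left commented out so sourcing this file makes no request):
# show_response_info "https://www.reddit.com/r/IAmA/comments/16h7303/i_am_a_sleep_expert_ask_me_anything/.json"
```

Running it once with the default user-agent and once with `-A` added lets you compare the two responses directly.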
Got it! Looks like I need to add some authentication to my script. It works in the browser but not in Curl. This used to work fine before.
Try changing the user-agent in your Curl command. Adding `-A 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'` should do the trick!
Okay, I’ll give that a shot when I’m home later!
True! Wget can do it, but it needs the same user-agent tweak as Curl.