I'm working on a project to analyze Reddit posts from 2022 to 2025, but I'm running into some major hurdles. I'm not really tech-savvy and have just started learning Python, so it's been pretty tough.
Initially, I used PRAW to fetch posts and comments through the Reddit API, but hit the rate limit at about 57,000 posts, which isn't enough for my analysis. I then tried Pushshift, which is usually better for historical data, but it appears to be broken, with lots of missing or incomplete data for recent years. I also checked Hugging Face for datasets, but they seem to stop at 2021.
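For anyone following along, here is a minimal sketch of the kind of PRAW loop described above, narrowed to the 2022 to 2025 window. It is an illustration under assumptions, not the exact script used: the helper names (`in_range`, `filter_posts`, `fetch_new`) and the stored fields are made up for the example, the credentials are placeholders you would get from your own Reddit app, and the `praw` package has to be installed separately. The date filter is kept apart from the fetch so it can be tried on fake data without touching the API.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Analysis window: 2022-01-01 (inclusive) to 2026-01-01 (exclusive), in UTC.
START = datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2026, 1, 1, tzinfo=timezone.utc).timestamp()


def in_range(created_utc, start=START, end=END):
    """Return True if a Unix timestamp falls inside the analysis window."""
    return start <= created_utc < end


def filter_posts(posts):
    """Keep posts inside the window.

    `posts` is any iterable of objects with .id, .title and .created_utc,
    which is what PRAW submissions expose.
    """
    rows = []
    for post in posts:
        if in_range(post.created_utc):
            rows.append(
                {"id": post.id, "title": post.title, "created_utc": post.created_utc}
            )
    return rows


def fetch_new(subreddit_name, client_id, client_secret, user_agent, limit=1000):
    """Hypothetical helper: fetch the newest posts of one subreddit via PRAW.

    Not called in this sketch because it needs real credentials and network
    access. Reddit listings only go back roughly 1,000 items, so this alone
    will not reach far into the past for busy subreddits.
    """
    import praw  # third-party; imported here so the rest runs without it

    reddit = praw.Reddit(
        client_id=client_id, client_secret=client_secret, user_agent=user_agent
    )
    return filter_posts(reddit.subreddit(subreddit_name).new(limit=limit))


# Try the filter on two fake posts instead of live data.
sample = [
    SimpleNamespace(
        id="a1",
        title="old post",
        created_utc=datetime(2021, 6, 1, tzinfo=timezone.utc).timestamp(),
    ),
    SimpleNamespace(
        id="b2",
        title="recent post",
        created_utc=datetime(2023, 3, 15, tzinfo=timezone.utc).timestamp(),
    ),
]
kept = filter_posts(sample)
print([row["id"] for row in kept])
```

Only the 2023 post survives the filter here. With real credentials, `fetch_new` would return the same kind of list for a live subreddit.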
BigQuery looks like it might be a solution, but I'm not sure about the costs involved, and I could really use a public dataset. If anyone has tips or resources for getting Reddit data from 2022 onwards, I'd really appreciate any help or simple steps since I'm still getting the hang of Python.
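On the BigQuery cost question, on-demand queries are billed by how much data they scan, so a rough estimate is simple arithmetic. The sketch below assumes a price of 6.25 USD per TiB scanned and a free first 1 TiB per month. Both figures are assumptions to check against the current Google Cloud pricing page, and `estimate_query_cost` is a made-up helper name.

```python
TIB = 1024 ** 4  # bytes in one tebibyte

# Assumed figures; verify against the current BigQuery pricing page.
PRICE_PER_TIB_USD = 6.25
FREE_TIB_PER_MONTH = 1.0


def estimate_query_cost(
    bytes_scanned,
    already_used_tib=0.0,
    price_per_tib=PRICE_PER_TIB_USD,
    free_tib=FREE_TIB_PER_MONTH,
):
    """Estimate the on-demand cost in USD of scanning `bytes_scanned` bytes.

    `already_used_tib` is how much of this month's free allowance
    has been consumed by earlier queries.
    """
    scanned_tib = bytes_scanned / TIB
    free_left = max(free_tib - already_used_tib, 0.0)
    billable_tib = max(scanned_tib - free_left, 0.0)
    return round(billable_tib * price_per_tib, 2)


# A 200 GiB scan fits inside the assumed free allowance.
small = estimate_query_cost(200 * 1024 ** 3)
# A 3 TiB scan leaves 2 TiB billable after the free allowance.
large = estimate_query_cost(3 * TIB)
print(small, large)
```

BigQuery can also run a query as a dry run, which reports the bytes it would process without charging for it, so that number can be fed into an estimate like this before running anything for real.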
3 Answers
Finding a budget-friendly solution is tricky, since scraping around the API violates Reddit's terms of service and can get your IP banned. I came across a tool called Axiom that might help you scrape Reddit data. It's around $50 and could potentially get all the data you need for your analysis!
Reddit started charging for API access back in 2023, so finding a free way to collect that data is going to be tough now. You might have more luck focusing on academic or research partnerships for potential access, but it won't be straightforward for personal projects.
Have you looked into how much it actually costs to pay for API access? Depending on how much data you need, it might be worth considering if it's within your budget. Just curious if you're okay with paying for this project.
Not sure yet! I just wanted a side project while in university, and analyzing Reddit seemed like a good idea!

I feel you! I was hoping there might be a workaround or something for limited access, especially for research purposes.