Hey everyone! I'm working on a project that uses the Reddit API to scrape posts containing pain points from various subreddits, as I can't afford GummySearch. The idea is to identify what people are struggling with and brainstorm business concepts based on that. However, I'm having a tough time with the filtering logic to ensure I accurately capture posts with pain points, similar to GummySearch.
Right now, I have a set of 'pain keywords' that I'm using to filter, but it's not effective—I only get about 5-6 relevant posts back. I'm considering using the OpenAI SDK to analyze the JSON data returned by Reddit, but I'm unsure if that would work since I'm pulling in a lot of posts at once (up to 50 per subreddit).
If anyone has experience with something similar, I'd love to hear how you approached it!
2 Answers
It seems like your filtering might be too strict since you're looking for exact matches of those phrases. Maybe try to expand your filtering logic to recognize variations of those terms instead. For instance, if you're looking for 'I hate', consider using keyword variations like 'hate', 'frustrating', and so on. You might also want to use a text analysis tool to help identify relevant posts by sentiment analysis or more advanced filtering algorithms.
Have you thought about using regular expressions or a simple natural language processing model to identify posts? These can help pick out relevant content more flexibly. Using keywords is a start, but NLP could give you a broader context and potentially better results!
Exactly, NLP can help you understand context better. Instead of matching strict phrases, it can analyze the intent behind the words, which is crucial for your project!
Yeah, that makes sense! By broadening the keyword approach, you can catch more posts that convey frustration even if they don't use the exact phrases you're checking for.