I've been struggling with reviewing pull requests for documentation. It's hard to get a clear picture of what actually changed. For instance, if I tweak a paragraph for clarity, standard `git diff` works line by line, so a paragraph stored as one long line shows up as fully deleted and re-added, leaving me with confusing walls of red and green text. I end up sifting through everything just to make sure no crucial details, like deadlines or prices, were accidentally altered while I improved the phrasing. Last weekend, I tackled this by building a prototype tool that focuses on the meaning of the text rather than just the syntax. It ignores minor phrasing changes but flags significant content alterations. I'm curious whether others would find this useful in their continuous integration (CI) workflows or if I'm just over-complicating things.
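To make "flag content changes, ignore phrasing" concrete, here is a rough sketch. It is not the prototype's actual code: it assumes the only "facts" worth tracking are prices, ISO dates, and numbers, and it finds them with a simple regex. The names `FACT_PATTERN`, `extract_facts`, and `fact_changes` are made up for illustration.

```python
import re
from collections import Counter

# Illustrative pattern (an assumption, not the prototype's logic):
# currency amounts, ISO dates, then plain numbers or percentages.
FACT_PATTERN = re.compile(
    r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?"  # prices such as $49 or €1,200.50
    r"|\d{4}-\d{2}-\d{2}"                # ISO dates such as 2024-03-01
    r"|\d+(?:\.\d+)?%?"                  # bare numbers and percentages
)


def extract_facts(text):
    """Return a multiset of fact-like tokens found in the text."""
    return Counter(m.group(0) for m in FACT_PATTERN.finditer(text))


def fact_changes(old, new):
    """Return (removed, added) fact tokens, ignoring word order and phrasing."""
    old_facts, new_facts = extract_facts(old), extract_facts(new)
    removed = sorted((old_facts - new_facts).elements())
    added = sorted((new_facts - old_facts).elements())
    return removed, added
```

With this sketch, reordering or rewording a sentence that keeps the same price and date yields two empty lists, while changing `$49` to `$59` reports one removed and one added token. A real semantic diff would need much more than a regex, so treat this as the simplest possible baseline.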
5 Answers
This is actually a really useful idea! We've faced similar problems with documentation PRs where important content changes get lost in the formatting clutter. In CI pipelines, your tool could help by separating meaningful content changes from mere style adjustments. Just a heads-up though: performance can be tricky, since semantic analysis is resource-intensive. Maybe consider letting users choose between semantic and traditional diffs depending on the situation. It's also worth looking at how tools like Prettier normalize Markdown formatting, since normalizing both sides before comparing removes pure formatting noise.
Glad you like the idea! Shipping it as a GitHub Action could make adoption easier, since people could drop it into their existing workflows.
Doesn’t that mean you have to check two things instead of one? Seems like a lot of work.
The idea is that it should actually reduce your workload. If my tool indicates that there are no factual changes, you can skim quickly instead of meticulously reviewing every word. It's a triage aid, not something to rely on blindly.
You might want to check out diff-so-fancy, which does a decent job with word diffs. It could complement your tool!
I love diff-so-fancy, but it doesn't help with meaning changes. That's the unique angle I'm tackling!
Sounds interesting! Even though it's primarily aimed at documentation, I can see applications beyond web development. There are definitely other fields where this could be a game-changer.
Thanks! I was mainly focused on documentation, but I'd love to hear what other uses you're thinking about. Maybe legal documents or general automation?
Have you considered trying out some visual diff tools? They usually highlight changed words quite nicely.
I've used diff-so-fancy, and while it does clean up the output, it still operates on syntax and shows every single word change. I'm trying to catch changes in meaning rather than changes in syntax.
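To illustrate the distinction, here is a minimal word-level diff built on Python's standard `difflib`. It stands in for word-diff tools in general and is not how diff-so-fancy is implemented; `word_diff` is a made-up helper name.

```python
import difflib


def word_diff(old, new):
    """Return a list of (tag, old_words, new_words) for every non-equal span."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", or "insert"
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

A pure rephrasing still produces a list of changes here, because the comparison only sees tokens. That is the gap a meaning-aware check is meant to fill.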

Awesome suggestions! I agree that the performance cost calls for flexibility. A hybrid approach sounds smart, maybe limiting semantic checks to Markdown files or to PRs with specific labels.
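A hybrid selector along those lines might look like the sketch below. The extension set and the label names (`docs`, `semantic-diff`) are assumptions for illustration, as is the `choose_diff_mode` function itself.

```python
from pathlib import PurePosixPath

# Assumed configuration: which files and PR labels opt in to the slower check.
SEMANTIC_EXTENSIONS = {".md", ".markdown"}
SEMANTIC_LABELS = {"docs", "semantic-diff"}  # hypothetical label names


def choose_diff_mode(path, pr_labels):
    """Return "semantic" for Markdown files or opted-in PRs, else "traditional"."""
    if PurePosixPath(path).suffix.lower() in SEMANTIC_EXTENSIONS:
        return "semantic"
    if SEMANTIC_LABELS & set(pr_labels):
        return "semantic"
    return "traditional"
```

For example, `choose_diff_mode("docs/pricing.md", [])` picks the semantic path, while a Python source file in an unlabeled PR falls back to the traditional diff, so the expensive analysis only runs where it is likely to pay off.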