In large Python projects, what tools do you use to find duplicate or very similar functions? I'm specifically looking for static analysis or command-line tools, not AI-based ones. I built a small library called DeepCSim to help with this problem, but I'd like to hear what other developers use in real-world projects. Thanks!
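For context, one stdlib-only way to catch functions that are "very similar" because they differ only in identifier names and literal values is to compare normalized ASTs. The sketch below illustrates that assumption. It is not how DeepCSim or any tool mentioned in this thread works, and `_Normalizer`, `fingerprint` and `find_clones` are hypothetical names. It only matches functions whose structure is identical after renaming, so reordered statements or one extra line of logic will not match.

```python
import ast
import copy
import hashlib
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Replace names, arguments and constants with placeholders (hypothetical helper)."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_f"  # the function's own name should not affect the match
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Name(self, node):
        # Every variable/global reference becomes "_v"; attribute names like .price are kept.
        return ast.copy_location(ast.Name(id="_v", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_a"
        node.annotation = None  # ignore type hints
        return node

    def visit_Constant(self, node):
        # All literals (numbers, strings, None) collapse to one placeholder value.
        return ast.copy_location(ast.Constant(value=0), node)


def fingerprint(func: ast.AST) -> str:
    """Hash the normalized AST of one function; work on a copy so names stay intact."""
    normalized = _Normalizer().visit(copy.deepcopy(func))
    dumped = ast.dump(normalized, annotate_fields=False)
    return hashlib.sha1(dumped.encode()).hexdigest()


def find_clones(source: str) -> list[list[str]]:
    """Group function names whose normalized ASTs are identical."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            groups[fingerprint(node)].append(node.name)
    return [names for names in groups.values() if len(names) > 1]


SOURCE = '''
def total_price(items, tax):
    subtotal = sum(i.price for i in items)
    return subtotal * (1 + tax)

def order_cost(products, rate):
    base = sum(p.price for p in products)
    return base * (1 + rate)

def unrelated(x):
    return [x] * 3
'''

print(find_clones(SOURCE))  # → [['total_price', 'order_cost']]
```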
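You might want to try Pylint, which has a `duplicate-code` check (R0801). Keep in mind that it can be a bit slow. It looks for runs of identical lines (you can tune the minimum run length with `--min-similarity-lines` and choose to ignore comments, docstrings, imports or signatures), so it won't catch code that is merely similar, such as the same logic with renamed variables.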
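To see why a line-based check misses near-duplicates, here is a rough sketch of the general idea: drop blank and comment lines, slide a window of N consecutive lines over each file, and report any window that occurs in more than one place. This is an illustration, not Pylint's actual implementation; `duplicate_windows` is a hypothetical helper and N is set to 4 here.

```python
from collections import defaultdict


def _normalize(source: str) -> list[tuple[int, str]]:
    """Keep (line_number, stripped_text) for non-blank, non-comment lines."""
    kept = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if text and not text.startswith("#"):
            kept.append((lineno, text))
    return kept


def duplicate_windows(files: dict[str, str], min_lines: int = 4):
    """Map each run of `min_lines` identical lines to every (file, start line) it appears at."""
    seen = defaultdict(list)
    for name, source in files.items():
        lines = _normalize(source)
        for i in range(len(lines) - min_lines + 1):
            window = tuple(text for _, text in lines[i : i + min_lines])
            seen[window].append((name, lines[i][0]))
    return {window: locs for window, locs in seen.items() if len(locs) > 1}


a = '''
def load(path):
    with open(path) as fh:
        rows = fh.read().splitlines()
    rows = [r for r in rows if r]
    return rows
'''
b = a.replace("load", "read_rows")  # same body, renamed function
c = a.replace("rows", "lines")      # same logic, renamed variable

for window, locations in duplicate_windows({"a.py": a, "b.py": b, "c.py": c}).items():
    print(locations)  # → [('a.py', 3), ('b.py', 3)]
```

Renaming the function still leaves four identical lines, so `a.py` and `b.py` are reported, while renaming `rows` to `lines` changes almost every line and `c.py` goes unnoticed.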
Have you considered SonarQube? The self-hosted Community Edition is free, and the hosted SonarQube Cloud has a free plan (I believe for private projects up to 50,000 lines of code). I find it pretty effective for tracking down duplicated blocks.
Honestly, I think a well-structured architecture often prevents duplicate code in the first place. Chasing down code that merely looks similar just to eliminate duplication can make a codebase harder to maintain, because merging it couples parts that may need to evolve separately. Dan Abramov's blog post "Goodbye, Clean Code" discusses this balance between clean code and practicality and might resonate with you: it isn't about Python, but the principles apply widely.

I came here to mention SonarQube too! It’s a solid choice!