I'm looking to implement a system to limit the amount of data a user can upload to my server each day based on their IP address, but I want to do this without directly storing the IP addresses for privacy and security reasons. I'm concerned about the risks if my database is ever compromised. While I know I could use a hash to obfuscate the IPs, I wonder if there are algorithms that avoid the issues of collisions or that don't allow easy reconstruction of the original IPs. Any recommendations?
5 Answers
To be honest, it's better not to rely solely on IPs for rate limiting. You risk affecting legitimate users who share IPs. Maybe consider using user accounts or implementing some kind of fingerprinting instead.
While IP addresses aren't always sensitive, they can still be considered personally identifiable information under regulations like GDPR. Even though reuse or sharing can happen, they should be treated with caution. It's best to err on the side of keeping them protected if your application falls under such regulations.
It's important to remember that rate limiting by IP requires the server to recognize when two requests come from the same address, so it has to keep some per-address state for as long as the limit window lasts. That is where the privacy challenge comes from. One solid approach is to run rate limiting on a separate, hardened server that keeps all counters in memory and never writes IPs to disk, where they would be more exposed in a breach.
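To make the in-memory idea concrete, here is a minimal sketch of a daily upload counter that lives only in process memory. The class name `DailyUploadLimiter`, the method `try_consume`, and the 100-byte limit in the usage lines are all made up for illustration; a real deployment would also need locking or a shared in-memory store if several workers handle uploads.

```python
from datetime import date


class DailyUploadLimiter:
    """Tracks bytes uploaded per client key for the current day, in memory only."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self._day = None   # the day the current counters belong to
        self._used = {}    # client key -> bytes uploaded so far today

    def try_consume(self, client_key, nbytes, today=None):
        """Return True and record the upload if it fits in today's quota."""
        today = today or date.today()
        if today != self._day:
            # New day: drop every counter, so nothing older than a day is kept.
            self._used.clear()
            self._day = today
        used = self._used.get(client_key, 0)
        if used + nbytes > self.limit_bytes:
            return False
        self._used[client_key] = used + nbytes
        return True


# Usage (hypothetical values): the key can be a raw IP or any derived identifier.
limiter = DailyUploadLimiter(limit_bytes=100)
limiter.try_consume("203.0.113.7", 60)   # accepted
limiter.try_consume("203.0.113.7", 50)   # rejected, would exceed 100 bytes
```

Because the counters are wiped at the day boundary and never persisted, a restart loses the current totals; whether that trade-off is acceptable depends on how strict the limit needs to be.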
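You could compute an HMAC, keyed with a server-side secret, over the normalized IP plus a date bucket. You store the HMAC output instead of the raw IP and count uploads against it, so the actual address is never written down. Because the date is part of the input, the stored identifier changes every day and can't be linked across days without the key.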
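A rough sketch of that HMAC approach, using only the Python standard library. The names `SECRET_KEY`, `normalize_ip`, and `rate_limit_key` are hypothetical, and so are the choices of SHA-256 as the HMAC digest and of grouping IPv6 clients by /64 prefix; treat these as assumptions to adapt, not as the answer's exact method.

```python
import hashlib
import hmac
import ipaddress
from datetime import datetime, timezone

# Hypothetical placeholder: load a random secret from configuration instead.
SECRET_KEY = b"replace-with-a-random-secret"


def normalize_ip(raw):
    """Return a canonical string so the same client always maps to the same input."""
    ip = ipaddress.ip_address(raw)
    if ip.version == 6:
        if ip.ipv4_mapped is not None:
            # ::ffff:a.b.c.d is the same client as a.b.c.d
            return str(ip.ipv4_mapped)
        # Assumption: treat one IPv6 /64 as one client.
        net = ipaddress.ip_network(f"{ip}/64", strict=False)
        return str(net.network_address)
    return str(ip)


def rate_limit_key(raw_ip, day=None, key=SECRET_KEY):
    """HMAC over the date bucket and normalized IP; store this instead of the IP."""
    day = day or datetime.now(timezone.utc).date()
    message = f"{day.isoformat()}|{normalize_ip(raw_ip)}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Usage: the returned hex string is the row key for today's upload counter.
counter_key = rate_limit_key("198.51.100.23")
```

Rotating or discarding the key after each day is one option to consider, since old identifiers then can't be recomputed even by the server.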
But storing that secret key could still be a security risk in itself.
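Using a hash function like SHA-224 could work, since the function itself can't be inverted. Be aware, though, that IPv4 has only about 4.3 billion possible addresses, so anyone who gets the hashes can hash every address and match them up; combine the IP with a secret key or salt to prevent that. Just remember to delete IP data once it's no longer needed to stay compliant with privacy laws. Also, keep in mind complications from NAT environments, where many users may appear to share an IP.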
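One caveat with a plain, unkeyed hash is worth a quick sketch: the set of possible inputs is small enough to enumerate. The helper names `sha224_ip` and `recover_ip` below are made up for illustration, and the search is limited to a single /24 documentation range so it runs instantly; the same loop over all of IPv4 is about 4.3 billion hashes, which is well within reach of ordinary hardware.

```python
import hashlib
import ipaddress


def sha224_ip(ip):
    """Unkeyed SHA-224 of the textual IP, as a hex string."""
    return hashlib.sha224(ip.encode()).hexdigest()


def recover_ip(target_digest, network):
    """Try every address in `network` and return the one whose hash matches."""
    for candidate in ipaddress.ip_network(network):
        if sha224_ip(str(candidate)) == target_digest:
            return str(candidate)
    return None


# Usage: a digest taken from a leaked table is matched back to its address.
leaked = sha224_ip("192.0.2.77")
found = recover_ip(leaked, "192.0.2.0/24")
```

Mixing in a secret that is not stored alongside the hashes removes this shortcut, because the attacker can no longer compute candidate digests.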

Even if IP addresses can be shared, treating them as sensitive data helps cover all bases legally.