I've recently joined an organization that runs SSSD on its cloud-based Linux VMs, which are connected to multiple domains. On one particular Linux server, retrieving group information for users in the '.bad.com' domain takes an eternity: around 4 minutes! This is really frustrating because SSH sessions often time out before I even get a password prompt. Interestingly, the 'id' command returns in milliseconds for users in other domains, and occasionally it's fast for '.bad.com' users too, but that's rare. I've confirmed the issue is isolated to this one server; other servers in the same subnet work fine. I've ruled out network issues, since pings and traceroutes all check out. I've tried tweaking various SSSD settings and enabled debug logging, but haven't had much luck. I'm looking for some creative troubleshooting ideas, as I'm running out of options!
4 Answers
You should double-check the routing setup and overall network health. If local tests on the server look fine, the problem may lie deeper in the network. Look at the timestamps in the SSSD debug logs for long gaps or other strange behavior, and check your LDAP server's logs for timeouts recorded against this client. Also examine system load indicators; heavy swapping or a high load average can slow everything down.
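One way to find those gaps is a small script that reports every silence longer than a few seconds in an SSSD debug log, plus the current load average. This is a rough sketch: it assumes the newer SSSD timestamp prefix "(YYYY-MM-DD HH:MM:SS)" with an optional ":microseconds" suffix; older SSSD versions use a different format, so the regex may need adjusting for your logs.

```python
import os
import re
import sys
from datetime import datetime, timedelta

# Assumed SSSD debug-line prefix, e.g. "(2023-05-04 10:15:02): [be[bad.com]] ...".
# The optional ":NNNNNN" group covers logs written with microsecond timestamps.
TS_RE = re.compile(r"^\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?::(\d{1,6}))?\)")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None  # continuation line or a different timestamp format
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    if m.group(2):
        ts += timedelta(microseconds=int(m.group(2)))
    return ts


def find_gaps(lines, threshold=5.0):
    """Return (seconds, line_before, line_after) for every silence >= threshold seconds."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= threshold:
                gaps.append((delta, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Load averages over 1, 5 and 15 minutes (Unix only).
    print("load average:", os.getloadavg())
    with open(sys.argv[1], errors="replace") as fh:
        for secs, before, after in find_gaps(fh):
            print(f"{secs:8.1f}s gap\n  before: {before}\n  after:  {after}")
```

Run it against the per-domain log, e.g. `python3 sssd_gaps.py /var/log/sssd/sssd_bad.com.log` (the file name depends on your domain section name). A multi-minute gap shows which step the back end was waiting on when the lookup stalled.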
This might be an Active Directory-specific issue. Since lookups for users in other domains are fast and only '.bad.com' users are slow, it sounds like the domain controllers being queried might not hold the full LDAP data for that domain (a global catalog only carries a partial set of attributes for objects in other domains). Consider configuring all domain controllers as global catalog servers, or making the DC that responds slowly a global catalog server; just make sure it doesn't hold the infrastructure master role unless every DC in the domain is a global catalog. Running tools like 'dcdiag' can also help you spot any underlying replication issues.
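To see which DCs currently advertise themselves as global catalogs, you could compare the domain's DC locator SRV records (`_ldap._tcp.dc._msdcs.<domain>`) with the forest's `_gc._tcp.<forest>` records. The sketch below assumes `dig` is installed on the Linux box and that you know the forest root name; it only parses `dig +short` output and is not an official AD tool.

```python
import subprocess
import sys


def parse_srv(text):
    """Parse `dig +short SRV` output into a set of lower-case target host names."""
    targets = set()
    for line in text.splitlines():
        parts = line.split()
        # Each answer line is "priority weight port target."
        if len(parts) == 4 and parts[2].isdigit():
            targets.add(parts[3].rstrip(".").lower())
    return targets


def dcs_without_gc(dc_srv_text, gc_srv_text):
    """Return DCs listed in the domain's locator records but not in the GC records."""
    return sorted(parse_srv(dc_srv_text) - parse_srv(gc_srv_text))


def dig_srv(name):
    """Run `dig +short SRV name` and return its stdout (requires dig)."""
    result = subprocess.run(
        ["dig", "+short", "SRV", name], capture_output=True, text=True, timeout=30
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    domain, forest = sys.argv[1], sys.argv[2]
    dc_text = dig_srv(f"_ldap._tcp.dc._msdcs.{domain}")  # all DCs of the domain
    gc_text = dig_srv(f"_gc._tcp.{forest}")  # all GCs in the forest
    for host in dcs_without_gc(dc_text, gc_text):
        print("not advertised as a global catalog:", host)
```

Usage would be something like `python3 gc_check.py bad.com corp.example`, where `corp.example` is a placeholder for your forest root. To test the theory from the Linux side, SSSD's AD provider has an `ad_enable_gc` option; setting it to `False` in the domain section makes SSSD use the regular LDAP port instead of the global catalog (cross-domain group memberships still need the GC, so treat this as a diagnostic, not a fix).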
This definitely strikes me as a DNS issue. In my experience, similar problems arise when DNS is misconfigured, especially when PTR (reverse lookup) records are missing or don't match the forward records. When a lookup can't be resolved, the client waits for it to time out, and those waits add up until SSH logins time out. It's worth checking that the forward and reverse records for this server and for the '.bad.com' domain controllers are correct.
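A quick way to check this is forward-confirmed reverse DNS: resolve each IP to its PTR name, resolve that name back, and confirm the original IP is in the result, timing each round trip. This is a minimal IPv4-only sketch using the system resolver via Python's `socket` module; which IPs to test (this server, the '.bad.com' DCs) is up to you.

```python
import socket
import sys
import time


def fcrdns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Check that ip -> PTR name -> A records leads back to ip.

    Returns (ok, detail, seconds). The resolver functions are parameters so the
    logic can be exercised with fakes instead of real DNS.
    """
    start = time.monotonic()
    try:
        name, _aliases, _addrs = reverse(ip)
    except (socket.herror, socket.gaierror) as exc:
        return False, f"no PTR record for {ip}: {exc}", time.monotonic() - start
    try:
        _canon, _aliases, addrs = forward(name)
    except (socket.herror, socket.gaierror) as exc:
        return False, f"PTR name {name} has no A record: {exc}", time.monotonic() - start
    detail = f"{ip} -> {name} -> {', '.join(addrs)}"
    return ip in addrs, detail, time.monotonic() - start


if __name__ == "__main__":
    for ip in sys.argv[1:]:  # e.g. this server's IP and the IPs of the .bad.com DCs
        ok, detail, secs = fcrdns(ip)
        print(f"{'OK ' if ok else 'BAD'} {secs:6.2f}s {detail}")
```

A lookup that succeeds but takes several seconds is as suspicious as one that fails. On the SSH side, sshd's `UseDNS` setting controls whether it reverse-resolves connecting clients.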
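This is a puzzling problem, but it could be related to network or firewall issues. Have you checked whether IPv4 or IPv6 traffic is being blocked? If the server tries one address family first and those packets are silently dropped, every connection attempt has to time out before it falls back to the other. It might also be worth using ldapsearch to test the LDAP connection directly and see whether you get any errors that way.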
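To see whether one address family is the slow path, you could attempt a plain TCP connection to the LDAP port on every address a DC resolves to and time each attempt. The sketch below assumes port 389 and a 5-second timeout; the host names you pass in are your own DCs.

```python
import socket
import sys
import time


def probe(host, port=389, timeout=5.0):
    """TCP-connect to every address of host and report per-family timing."""
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        start = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            status = "ok"
        except OSError as exc:  # includes timeouts and refused connections
            status = f"failed: {exc}"
        results.append((label, sockaddr[0], status, time.monotonic() - start))
    return results


if __name__ == "__main__":
    for host in sys.argv[1:]:  # e.g. the .bad.com domain controllers
        for label, addr, status, secs in probe(host):
            print(f"{host} {label} {addr}: {status} in {secs:.2f}s")
```

If IPv6 attempts hang for the full timeout while IPv4 connects instantly, the `lookup_family_order` option in sssd.conf (for example `ipv4_only`) is worth experimenting with. For the LDAP layer itself, timing an anonymous rootDSE query such as `ldapsearch -x -H ldap://<dc> -b "" -s base` shows whether the directory answers promptly.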
Totally agree! Basic checks like traceroute or ping aren't enough; sometimes the problem is in the configuration files themselves. I had a similar situation where a wrong gateway was causing huge delays. Make sure to also check any proxies or other devices between the server and the domain controllers that do more than simple switching, such as firewalls or load balancers.
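To compare gateways between the slow server and a healthy one in the same subnet, you could read the default routes straight from `/proc/net/route` on each. This is a Linux-only sketch; it assumes the usual `/proc/net/route` layout, where addresses are printed as raw 32-bit hex in the host's byte order.

```python
import os
import socket
import struct

RTF_GATEWAY = 0x2  # route goes via a gateway (linux/route.h)


def default_gateways(route_text):
    """Return (interface, gateway_ip) for each default route in /proc/net/route text."""
    gateways = []
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest, gateway, flags = fields[:4]
        if dest == "00000000" and int(flags, 16) & RTF_GATEWAY:
            # The kernel prints the raw address in host byte order, so pack it back
            # with native ("=") order to recover the dotted-quad form.
            ip = socket.inet_ntoa(struct.pack("=I", int(gateway, 16)))
            gateways.append((iface, ip))
    return gateways


if __name__ == "__main__" and os.path.exists("/proc/net/route"):
    with open("/proc/net/route") as fh:
        for iface, ip in default_gateways(fh.read()):
            print(f"default route via {ip} on {iface}")
```

Running it (or `ip route`) on both servers makes a mismatched gateway easy to spot.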