I've been dealing with a frustrating Active Directory issue, mostly affecting one of our clients running Server 2019. Occasionally the domain controller (DC) becomes unresponsive, making it impossible to perform tasks like creating users or changing passwords, even though everything seems fine at first glance. When this happens, trying to load the AD application just results in errors, and the only fix I've found is to reboot the server.
We do have another DC available, but it doesn't seem to accept changes in place of the first one, which holds the FSMO roles. This becomes a bigger issue if the first DC crashes overnight: users can log in with cached credentials, but they can't access mapped drives or print. I've checked the logs but haven't found a clear cause, just repeating errors. I'm posting some logs from the start of the issue, and I hope someone has insight into resolving this or at least identifying what's happening.
4 Answers
It might be worth transferring your FSMO roles to another DC. Often, having a backup DC can make a significant difference, and it could stabilize things for you. If issues persist, consider opening a support ticket with Microsoft; their AD team used to be really helpful, though experiences can vary.
First off, check Active Directory Sites and Services on your PDC. Make sure there are no odd custom site links messing things up; AD typically handles links well on its own unless they're broken. Also, run some replication checks with `repadmin /replsum` and `repadmin /showrepl` to see if your DCs are out of sync. If they are, you might need to bring them back in sync one by one before troubleshooting further. Just be cautious when changing settings, as a mistake can cause additional problems.
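If you want to run that replication check on a schedule, something like the following might help. It is only a rough sketch: it assumes `repadmin /showrepl * /csv` is run on a Windows host with the AD tools installed, and it assumes the CSV has a `Number of Failures` column (that matches the output I remember, but verify it against your own before relying on it). The function names are made up for illustration.

```python
import csv
import io
import subprocess


def run_showrepl_csv():
    """Run `repadmin /showrepl * /csv` and return its raw text output.

    Only works on a Windows machine where repadmin is available.
    """
    result = subprocess.run(
        ["repadmin", "/showrepl", "*", "/csv"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def failing_links(csv_text, failures_column="Number of Failures"):
    """Return the CSV rows whose failure count is greater than zero.

    The column name is an assumption about repadmin's CSV header;
    pass a different name if your output differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    failing = []
    for row in reader:
        raw = (row.get(failures_column) or "").strip()
        if raw.isdigit() and int(raw) > 0:
            failing.append(row)
    return failing
```

On a DC you would call `failing_links(run_showrepl_csv())` and print or alert on whatever comes back. An empty list only means no link reported failures at that moment, not that the DC is healthy.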
I think you're right; it's all about keeping those DCs synced. If the DC holding the PDC emulator role isn't working, password changes made on another DC can't be forwarded to it right away, so they may seem not to stick until replication catches up. I'd suggest replacing the faulty DC as a last resort, just to avoid ongoing issues.
You’ve got a situation where the unreliable PDC affects how other DCs function, especially regarding trusts and resources. Consider a fresh setup when it comes to your DCs, especially if issues keep occurring. If something's not replicating correctly, that could be why workstations can't access resources properly. Document everything you do for a smooth transition.
You definitely want to make sure the new DC takes over the FSMO roles, then demote the faulty one. After you've promoted the new DC, keep an eye out for event ID 4604 in the DFS Replication log, which indicates that SYSVOL replication initialized successfully. Cleaning up all references to the old DC is crucial, especially in DNS, so that clients don't keep trying to reach a DC that no longer exists. Just follow the guidelines to maintain network stability in the process!
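To check for that event without clicking through Event Viewer, here is a small sketch that shells out to `wevtutil`. It assumes the log is named `DFS Replication` on your DC and that you run it on the DC itself; the helper names are hypothetical.

```python
import subprocess


def build_event_query(event_id, log_name="DFS Replication", count=5):
    """Build a wevtutil command that lists the newest events with the given ID."""
    return [
        "wevtutil", "qe", log_name,
        f"/q:*[System[(EventID={event_id})]]",  # XPath filter on the event ID
        f"/c:{count}",                           # limit the number of events
        "/rd:true",                              # newest first
        "/f:text",                               # human-readable output
    ]


def recent_events(event_id, log_name="DFS Replication", count=5):
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_event_query(event_id, log_name, count),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `recent_events(4604)` on the new DC should return the matching entries if there are any; empty output just means no such event was found in that log.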

I had a similar experience a couple of months ago with Microsoft support. It took a while just to get an engineer assigned. In the end, we figured out the issue on our own.