I'm currently testing a 2-node cluster with Pacemaker, Corosync, and DRBD (Active/Passive). Node 1 is set as the Primary and Node 2 as the Secondary, with Node 1 having a location preference score of 50. Here's the situation:
1. I simulated a failure on Node 1 (commands after this list), and the resources successfully transferred to Node 2.
2. While resources were running on Node 2, I started a large file transfer to the DRBD mount point.
3. After a while, I brought Node 1 back online.
4. Pacemaker immediately switched resources back to Node 1, killing the file transfer on Node 2 and leaving me with a corrupted file.
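For reference, a failure like the one in step 1 can be simulated with standby mode; this is just one way to do it, and a hard power-off exercises more of the failure path (pcs 0.10+ syntax; older releases spell it `pcs cluster standby`):

```
# Step 1: take node1 out of service so resources fail over to node2
pcs node standby node1

# Step 3: bring node1 back into the cluster
pcs node unstandby node1
```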
I thought Pacemaker or DRBD would hold off on resource switching until ongoing write operations or synchronization completed. But clearly, that's not the case.
1. Is this behavior typical? Does Pacemaker not consider active user jobs?
2. How can I change the cluster's configuration to keep it on Node 2 until all transfers and syncs are complete? I do need Node 1 to always be the master when it's available.
3. Should I be worried about filesystem corruption from this, or is it just interrupted transactions?
Here's a glimpse of my configuration (pcs equivalents below):
- stonith-enabled=false (I know this isn't safe; I'm just testing)
- default-resource-stickiness=0
- Location Constraint: Resource prefers node1=50
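In pcs terms, the relevant parts look roughly like this (`ha_group` is a placeholder for my actual resource group name):

```
# Cluster-wide properties (test setup only; fencing intentionally off)
pcs property set stonith-enabled=false

# Stickiness 0: a running resource gains nothing from staying put
pcs resource defaults resource-stickiness=0

# Location preference: the group prefers node1 with score 50
pcs constraint location ha_group prefers node1=50
```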
Any advice would be appreciated!
3 Answers
Yep, that's expected behavior. Pacemaker has no knowledge of active I/O or user sessions; it only evaluates scores and constraints. With stickiness at 0, the location preference of 50 wins the moment Node 1 rejoins, so resources move back immediately. DRBD likewise won't delay a promotion just because the Secondary has writes in flight.

To keep resources on Node 2 until you're ready (see the sketch below):

- Raise resource-stickiness above the location score, e.g. to 100 or more, so a running resource outweighs Node 1's preference.
- Alternatively, ban the resource from Node 1 and only lift the ban manually, or after DRBD reports a full sync.
- Don't rely on a prefers=50 constraint alone for master selection; use DRBD Master/Slave (promotable clone) constraints or manual promotion.
- And definitely enable STONITH. Without fencing, a failed node can cause split-brain, and that's where real corruption happens.

As for corruption: as long as DRBD replication stays healthy, you're looking at interrupted writes, i.e. a truncated or partial file (data loss for that transfer), not a damaged filesystem.
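A minimal sketch of those changes in pcs syntax (`ha_group` and the DRBD resource `r0` are placeholders; adapt the names to your configuration):

```
# Stickiness must beat the location score of 50, or the preference
# for node1 keeps forcing a move
pcs resource defaults resource-stickiness=100

# During a long transfer, pin resources away from node1 ...
pcs resource ban ha_group node1

# ... check replication and wait until both peers show UpToDate ...
drbdadm status r0

# ... then lift the ban once the transfer and sync are done
pcs resource clear ha_group node1

# Before anything resembling production, turn fencing back on
pcs property set stonith-enabled=true
```

`pcs resource ban` creates exactly the kind of temporary -INFINITY location constraint mentioned above, and `pcs resource clear` removes it.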
Thanks for the confirmation! I’ll implement these changes.
If possible, use three nodes for your Pacemaker setup. With an odd number of nodes, the surviving majority keeps quorum when one node drops: the two remaining nodes can tell that they, not the lost peer, are still the cluster, and continue operating safely. Everything in the previous answer still holds; the extra node just adds resilience.
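For a 2-node cluster, corosync relies on the two_node workaround instead of a real majority; with a third voter you can drop it. A hedged example of the relevant corosync.conf section:

```
# /etc/corosync/corosync.conf (excerpt)
quorum {
    provider: corosync_votequorum
    # Two-node clusters need "two_node: 1", which bypasses real
    # majority voting; with a third node you can remove it
    # two_node: 1
}
```

If a full third node is too much, a corosync-qdevice arbiter provides the tie-breaking vote without having to run resources on it.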
Great suggestion! I often recommend adding a node so the cluster can keep quorum after a failure.
You might want to check out Linstor as well. It provides better orchestration for DRBD volumes, but be aware that if your underlying DRBD setup is misbehaving, Linstor won't fix that by itself.
True! It's good for orchestration, but not a fix if your base DRBD setup is flawed.

I can confirm this! I've set my resource stickiness to 1000 to avoid the issues you're facing.