I'm working with two systems: a critical server with power backup, and a workstation that currently has none. After I pushed a git commit to the server, a power outage hit and I lost the staged changes on my workstation. The server still shows the commit, so I'm trying to understand what went wrong on the workstation from an OS or filesystem perspective. Does Linux have a mechanism to prevent this kind of data loss? Is it related to journaling, caching, or some kind of checkpoint system? How often are these checkpoints or changes written out, and how does the system decide how far back to roll if needed?
4 Answers
What filesystem are you using on your workstation? Many newer distributions default to btrfs, which commits changes to disk every 30 seconds by default. If that's what you have, your changes likely hadn't been synced yet. In general, Linux queues writes in memory and completes them when convenient, so data can look saved while it is still only sitting in RAM.
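If you're not sure, you can check the filesystem type and the mount options in effect. A rough sketch (the root mount point is just an example; use wherever your repository lives):

```sh
# Show the filesystem type and mount options for the root filesystem
findmnt -no FSTYPE,OPTIONS /

# On btrfs, a non-default interval appears as commit=<seconds> in the
# options; if it's absent, the 30-second default is in effect
grep ' / ' /proc/mounts
```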
You could turn off write caching for safety, but that would kill your performance; the other option is a RAID controller with a battery-backed write cache, which keeps cached writes safe until power returns.
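As a concrete sketch of what "turning off caching" can mean at the drive level (assuming a SATA disk at /dev/sda, which is only a placeholder), hdparm can query and disable the drive's own volatile write cache:

```sh
# Query the drive's write-cache state
sudo hdparm -W /dev/sda

# Disable the write cache: safer against power loss, much slower writes
sudo hdparm -W 0 /dev/sda
```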
You could technically mount your drives with the sync option to avoid this kind of loss, but it would definitely slow things down. No filesystem can fully protect against power loss; if a write is in flight at that exact moment, you're out of luck. A UPS is the best way to guard against these issues.
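For example, a sketch of what mounting with the sync option looks like (the device and mount point are placeholders):

```sh
# Remount an already-mounted filesystem with synchronous writes
sudo mount -o remount,sync /home

# Or persistently via /etc/fstab, e.g.:
# /dev/sdb1  /home  ext4  defaults,sync  0  2
```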
Some SSDs come with power-loss protection: capacitors that provide enough power for the drive to flush its cache when power fails. RAID controllers often include battery-backed caches for the same reason.
Linux journaling filesystems ensure that filesystem updates are either fully applied or not at all, but what bit you here is more likely the page cache. Your changes were probably still sitting in RAM and hadn't been written to the disk when the power cut hit. Linux caches aggressively and waits for a good moment to write dirty data out, so when you rebooted you came back to the last state that had actually been flushed.
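To make the cadence concrete, the kernel exposes the writeback timing as sysctls (values in centiseconds; the defaults in the comments are typical but distribution-dependent), and `sync` forces a flush by hand:

```sh
# How often the writeback threads wake up (default 500 = 5 seconds)
sysctl vm.dirty_writeback_centisecs

# How old a dirty page may get before it must be written out (default 3000 = 30 seconds)
sysctl vm.dirty_expire_centisecs

# Flush all dirty pages to disk right now
sync
```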
True, they are connected: the journal keeps the filesystem itself consistent while pages are being flushed, so a power cut in the middle can leave some files updated and others not, but not a broken filesystem. Also, the default limits for dirty pages are quite high; that's why you sometimes notice sluggishness during large data transfers.
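If that sluggishness bothers you, the thresholds can be lowered; a rough, non-persistent sketch (defaults of 10/20 percent of RAM are typical but vary by distribution):

```sh
# Background flushing starts at dirty_background_ratio; writers are
# throttled once dirty_ratio of RAM is dirty
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Lower them temporarily so less unwritten data accumulates in RAM
sudo sysctl -w vm.dirty_background_ratio=5 vm.dirty_ratio=10
```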
Right? I completely forgot about the page cache! I've always thought about it with external drives but never connected the dots for internal ones. I run `sync` on USB sticks but never thought to do it for my main system.
I’m using btrfs. Can I adjust that 30-second commit time?