> they found that every single piece of software they tested except for SQLite in one particular mode had at least one bug.
This is why whenever I need to persist any kind of state to disk, SQLite is the first tool I reach for. Filesystem APIs are scary, but SQLite is well-behaved.
Of course, it doesn't always make sense to do that, like the dropbox use case.
Before becoming too overconfident in SQLite note that Rebello et al. (https://ramalagappan.github.io/pdfs/papers/cuttlefs.pdf) tested SQLite (along with Redis, LMDB, LevelDB, and PostgreSQL) using a proxy file system to simulate fsync errors and found that none of them handled all failure conditions safely.
In practice I believe I've seen SQLite databases corrupted due to what I suspect are two main causes:
1. The device powering off during the middle of a write, and
2. The device running out of space during the middle of a write.
I'm pretty sure that's not where I originally saw his comments. I remember his criticisms being a little more pointed. Although I guess "This is a bunch of academic speculation, with a total absence of real world modeling to validate the failure scenarios they presented" is pretty pointed.
I believe it is impossible to prevent dataloss if the device powers off during a write. The point about corruption still stands and appears to be used correctly from what I skimmed in the paper. Nice reference.
> I believe it is impossible to prevent dataloss if the device powers off during a write.
Most devices write sectors atomically, and so you can build a system on top of that that does not lose committed data. (Of course if the device powers off during a write then you can lose the uncommitted data you were trying to write, but the point is you don't ever have corruption, you get either the data that was there before the write attempt or the data that is there after).
Only way I know of is if you have e.g. a RAID controller with a battery-backed write cache. Even that may not be 100% reliable but it's the closest I know of. Of course that's not a software solution at all.
That's uh, not running out of power in the middle of the write. That's having extra special backup power to finish the write. If your battery dies mid cache-write-out, you're still screwed.
This is why whenever I need to persist any kind of state to disk, SQLite is the first tool I reach for. Filesystem APIs are scary, but SQLite is well-behaved.
Of course, it doesn't always make sense to do that, like the dropbox use case.