In Search of a Distributed File System

Is it just me, or do the distributed filesystems available even today just plain suck?

I've used NFSv3 and Samba Shares to share filesystems, and various Version Control packages like CVS, Subversion, and now Git... but I still haven't come across anything that is my DREAM SHARED FILESYSTEM.

Requirements of My Ideal Distributed File System

  • Files must be available and Read/Write on the Local System's media.
  • On File Write (file save), all changes from the Local System must be immediately Replicated to all "mirrors" of the Distributed File System.
  • All incoming changes received from a Remote System must "lock" the changed file while the updates are pending.
  • Applications that have "changed files" open must be able to notice the underlying file change (check the timestamp), and notify the user as appropriate.
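
Here's a rough sketch of how the last two requirements might look on the receiving side. It's only an illustration, not a design: instead of a real lock it stages the incoming file and swaps it into place with an atomic rename, so a reader sees either the old file or the new one, never a half-written one. The function names (apply_incoming_change, changed_since) are made up for this example.

```python
import os
import shutil
import tempfile


def apply_incoming_change(src_path, dest_path):
    """Copy src_path over dest_path without exposing a half-written file.

    Assumption: an atomic rename stands in for the "lock" requirement.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    # Stage the new contents next to the destination so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    os.close(fd)
    try:
        shutil.copy2(src_path, tmp_path)  # copies data and timestamps
        os.replace(tmp_path, dest_path)   # swaps the new file into place
    except BaseException:
        # Clean up the staging file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def changed_since(path, last_seen_mtime_ns):
    """Return True if the file's timestamp differs from the one last seen."""
    return os.stat(path).st_mtime_ns != last_seen_mtime_ns
```

An application would remember st_mtime_ns when it opens a file and call changed_since before saving, or on a timer, to decide whether to warn the user.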

OK, I know, most of those requirements are screaming "Use a Version Control System!", but here's where I diverge from that line of thought.

First, I don't really want to track individual changes in this Distributed File System -- all the change control and replication should be transparent.  I should be able to use a real Version Control package like Subversion or Git alongside this Distributed File System if I want to track the changes.  This makes the most sense to me, as most of my file-write changes are under 10 lines long.  (I do mostly prototyping and R&D work, so small and frequent changes are the normal workflow for me.)

Second, I want multiple "Filesystem Mirrors" of different flavors.  They could be networked filesystems like NFS or Samba; Read/Write mirrors via rsync or scp; ReadOnly mirrors via ftp; or full mirrors on an Encrypted Filesystem on a USB Memory Stick, mounted elsewhere on the Local System.  This "Memory Stick" option is particularly important to me, as I'd like the option (and comfort) of carrying a secure, physical backup of my work home at the end of the day. 

Wouldn't it be great to pop that stick into my Home Desktop, continue working on the project after hours, and return to work the next morning?  Just pop that stick into the Office Computer, and let this Distributed File System solution discover the changes and propagate them to all other Mirrors.  Basically, that memory stick becomes a disconnected "Briefcase on Steroids".  No network necessary, which is equally ideal if you plan to do some work on an airplane or something.
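
The "discover the changes" step could start out as simple as comparing timestamps between the stick and the local copy. This is a minimal sketch under that assumption; compare_mirrors and snapshot are hypothetical names, and modification times alone can't tell you when the same file was edited on both sides, so a real tool would need more than this.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its modification time in ns."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            state[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return state


def compare_mirrors(stick_root, local_root):
    """Return (newer_on_stick, newer_on_local) as sorted lists of paths.

    A file missing on one side counts as newer on the side that has it.
    Deletions and two-sided edits are NOT detected by this sketch.
    """
    stick = snapshot(stick_root)
    local = snapshot(local_root)
    newer_on_stick = sorted(
        rel for rel, mtime in stick.items() if mtime > local.get(rel, -1)
    )
    newer_on_local = sorted(
        rel for rel, mtime in local.items() if mtime > stick.get(rel, -1)
    )
    return newer_on_stick, newer_on_local
```

Everything in the first list would be copied from the stick to the Office Computer (and on to the other Mirrors); everything in the second list would go the other way.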

These Ideals Scream of a FUSE FileSystem Based Solution

Yes, they certainly do -- there's no way something like this would work out in Kernel Space without some serious help and regression-testing.  Thus, I'm starting to design the architecture for the Local FileSystem portions, and once I have all the file-write hooks working right, I'll start to design the Remote Receiver portions.  The Remote Receivers might be small comet listeners that receive update notifications and implement the changes on specific filesystem solutions (NFS/Samba, rsync/scp, ftp, Encrypted Stick, etc.).

Grumble Grumble Grumble

I'm gonna have to do some more research on this, to see if there isn't already something out there that I could cobble together.  This just screams "obvious" to me, but I just don't see anything out there at the moment.  Tips, hints, or pointers, anyone?  Am I reinventing File System Clustering or something along those lines?

I want this too, mostly

I just found your blog through TechHui, sorry to comment so long after the date of your post.

I once set up a pair of servers with failover, heartbeat, DRBD and all that. It was fun to do, but in testing, I determined that it was so tricky to administer that I would most likely increase my downtime through the inevitable human error. So I set up database replication and some rsync scripts in cron and those servers have gone over 4 years with no unplanned downtime and no data loss.

So while I like the idea of automatic syncing (and there are filesystems that were built for this, but they weren't stable enough last time I looked), I don't like the complexity that usually goes with it.

The beauty of Unix is that everything is configurable. The curse of Unix is that everything has to be configured...

It should be possible to build a dnotify config that syncs files on change to any number of destinations, but is that elegant enough? Does that solve the problem that inspired this post?

Briefcase Sync Daemon for Replication

Hi Kurt,

Yeah, I haven't thought about this much since posting the blog article.  Initially I was thinking of some kind of custom FUSE Filesystem, but have since reconsidered, and would now design it as a daemon that listens for inotify events and propagates the changes.
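
To give a feel for the daemon idea, here's a minimal sketch. It is not the inotify version: Python's standard library has no inotify binding, so this polls the source tree and diffs snapshots instead, which is a cruder stand-in for real change notifications. The mirrors are assumed to be plain directories (a mounted stick, an NFS mount), and sync_once, snapshot, and run are names invented for the example.

```python
import os
import shutil
import time


def snapshot(root):
    """Map each file path (relative to root) to its modification time in ns."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            state[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return state


def sync_once(source, mirrors, previous):
    """Propagate changes made since `previous`; return the new snapshot."""
    current = snapshot(source)
    # New or modified files: copy them to every mirror.
    for rel, mtime in current.items():
        if previous.get(rel) != mtime:
            for mirror in mirrors:
                target = os.path.join(mirror, rel)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(os.path.join(source, rel), target)
    # Files that disappeared from the source: remove them from every mirror.
    for rel in previous.keys() - current.keys():
        for mirror in mirrors:
            target = os.path.join(mirror, rel)
            if os.path.exists(target):
                os.remove(target)
    return current


def run(source, mirrors, interval=1.0):
    """Poll forever; an inotify-based daemon would block on events instead."""
    state = {}
    while True:
        state = sync_once(source, mirrors, state)
        time.sleep(interval)
```

Starting from an empty snapshot means the first pass copies everything, which doubles as the initial mirror. Swapping the polling loop for real inotify events would only change run; sync_once could stay as it is.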

Once I got to that conclusion, a quick Google search turned up a few options that might be workable.  I'll probably look into this further when I have more time.