Building Filesystems the Way You Build Web Apps

Posted in Fun, Programming on July 8th, 2010 by Evan Broder

FUSE is awesome. While most major Linux filesystems (ext3, XFS, ReiserFS, btrfs) are built into the Linux kernel, FUSE is a library that lets you instead write filesystems as userspace applications. When something attempts to access the filesystem, those accesses get passed on to the FUSE application, which can then return the filesystem data.

It lets you quickly prototype and test filesystems that can run on multiple platforms without writing kernel code. You can easily experiment with strange and unusual interactions between the filesystem and your applications. You can even build filesystems without writing a line of C code.

FUSE has a reputation for being used only for toy filesystems (when are you actually going to use flickrfs?), but that’s really not fair. FUSE is currently the best way to read NTFS partitions on Linux, the way non-GNOME and legacy applications can access files over SFTP, SMB, and other protocols, and the only way to run ZFS on Linux.

But because the FUSE API calls a separate function for each system call (e.g. getattr, open, read), in order to write a useful filesystem you need boilerplate code to translate requests for a particular path into a logical object in your filesystem, and you need to do this in every FUSE API function you implement.
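
To make that boilerplate concrete, here is a minimal sketch in plain Python. It does not use the real FUSE bindings: the ToyFS class, its _resolve helper, and the nested-dict TREE are all hypothetical stand-ins. The point is only that every handler has to begin by turning the path back into an object:

```python
import errno

# Hypothetical in-memory tree: dicts are directories, strings are file contents.
TREE = {'etc': {'motd': 'hello\n'}, 'README': 'toy filesystem\n'}


class ToyFS(object):
    """Stand-in for a FUSE filesystem class; this is not the real FUSE API."""

    def _resolve(self, path):
        # Walk the tree one path component at a time; None means "not found".
        node = TREE
        for part in path.strip('/').split('/'):
            if part == '':
                continue
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
        return node

    def getattr(self, path):
        node = self._resolve(path)      # path translation, first copy
        if node is None:
            return -errno.ENOENT
        return 'dir' if isinstance(node, dict) else 'file'

    def readdir(self, path):
        node = self._resolve(path)      # ...second copy
        if node is None:
            return -errno.ENOENT
        if not isinstance(node, dict):
            return -errno.ENOTDIR
        return sorted(node)

    def read(self, path, size, offset):
        node = self._resolve(path)      # ...and a third
        if node is None:
            return -errno.ENOENT
        if isinstance(node, dict):
            return -errno.EISDIR
        return node[offset:offset + size]
```

Every new operation added to a class like this needs the same resolve-and-check preamble, which is the repetition the rest of this post tries to get rid of.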

Take a page from web apps

This is the kind of problem that web development frameworks have also had to solve, because it has been a long time since a URL always mapped directly onto a file on the web server. And while there are a handful of approaches to URL dispatch, I’ve always been a fan of the style popularized by routing in Ruby on Rails, which was later ported to Python as the Routes library.

Routes dissociates an application’s URL structure from your application’s internal organization, so that you can connect arbitrary URLs to arbitrary controllers. However, a more common use of Routes involves embedding variables in the Routes configuration, so that you can support a complex and potentially arbitrary set of URLs with a comparatively simple configuration block. For instance, here is the (slightly simplified) Routes configuration from a Pylons web application:

from routes import Mapper

def make_map():
    map = Mapper()
    map.minimization = False

    # The ErrorController route (handles 404/500 error pages); it should
    # likely stay at the top, ensuring it can always be resolved
    map.connect('/error/{action}/{id}', controller='error')

    map.connect('/', controller='status', action='index')
    map.connect('/{controller}', action='index')
    map.connect('/{controller}/{action}')
    map.connect('/{controller}/{action}/{id}')

    return map

In this example, {controller}, {action}, and {id} are variables which can match any string within that path component. So, for instance, if someone were to access /spend/new within the web application, Routes would find a controller named spend, and would call the new action on that controller.
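
As a rough illustration of how that matching works, the sketch below converts a {variable} pattern into a regular expression and tries each pattern in order. This is a simplified stand-in, not how the Routes library is actually implemented; compile_pattern, match, and the ROUTES list are hypothetical names made up for the example.

```python
import re


def compile_pattern(pattern):
    """Turn a pattern like '/{controller}/{action}' into a compiled regex."""
    regex = ''
    # Splitting on a capturing group alternates literal text and variable names.
    for i, piece in enumerate(re.split(r'\{(\w+)\}', pattern)):
        if i % 2:
            regex += '(?P<%s>[^/]+)' % piece    # variable: one path component
        else:
            regex += re.escape(piece)           # literal text
    return re.compile('^' + regex + '$')


def match(routes, path):
    """Return the variables for the first pattern that matches, else None."""
    for pattern, defaults in routes:
        m = compile_pattern(pattern).match(path)
        if m:
            result = dict(defaults)
            result.update(m.groupdict())
            return result
    return None


# The same shape as the configuration above, minus the error route.
ROUTES = [
    ('/', {'controller': 'status', 'action': 'index'}),
    ('/{controller}', {'action': 'index'}),
    ('/{controller}/{action}', {}),
    ('/{controller}/{action}/{id}', {}),
]

print(match(ROUTES, '/spend/new'))
```

Here /spend/new falls through the first two patterns and matches the third, yielding spend as the controller and new as the action.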

RouteFS: URL routing for filesystems

Just as URLs take their inspiration from the filesystem, we can use the ideas from URL routing in our filesystem. And to make this easy, I created a project called RouteFS. RouteFS ties together FUSE and Routes, and it’s great because it lets you specify your filesystem in terms of the filesystem hierarchy instead of in terms of the system calls to access it.

RouteFS was originally developed as a generalized solution to a real problem I faced while working on the Invirt project at MIT. We wanted a series of filesystem entries that were automatically updated when our database changed (specifically, we were using .k5login files to control access to a server), so we used RouteFS to build a filesystem where every filesystem lookup was resolved by a database query, ensuring that our filesystem always stayed up to date.

Today, however, we’re going to be using RouteFS to build the very thing I lampooned FUSE for: toy filesystems. I’ll be demonstrating how to build a simple filesystem in fewer than 60 lines of code. I want to continue the popular theme of exposing Web 2.0 services as filesystems, but I’m also a software engineer at a very Git- and Linux-heavy company. The popular Git repository hosting site Github has an API for interacting with the repositories hosted there, so we’ll use the Python bindings for the API to build a Github filesystem, or GithubFS. GithubFS lets you examine the Git repositories on Github, as well as the different branches of those repositories.

Getting started

If you want to follow along, you’ll first need to install FUSE itself, along with the Python FUSE bindings – look for a python-fuse or fuse-python package. You’ll also need a few third-party Python packages: Routes, RouteFS, and github2. Routes and RouteFS are available from the Python Cheeseshop, so you can install those by running easy_install Routes RouteFS. For github2, you’ll need the bleeding edge version, which you can get by running easy_install http://github.com/ask/python-github2/tarball/master

Now then, let’s start off with the basic shell of a RouteFS filesystem:

#!/usr/bin/python

import routes
import routefs

class GithubFS(routefs.RouteFS):
    def make_map(self):
        m = routes.Mapper()
        return m

if __name__ == '__main__':
    routefs.main(GithubFS)

As with the web application code above, the make_map method of the GithubFS class creates, configures, and returns a Python Routes mapper, which RouteFS uses for dispatching accesses to the filesystem. The routefs.main function takes a RouteFS class and handles instantiating the class and mounting the filesystem.

Populating the filesystem

Now that we have a filesystem, let’s put some files in it:

#!/usr/bin/python

import routes
import routefs

class GithubFS(routefs.RouteFS):
    def __init__(self, *args, **kwargs):
        super(GithubFS, self).__init__(*args, **kwargs)

        # Maps user -> [projects]
        self.user_cache = {}

    def make_map(self):
        m = routes.Mapper()
        m.connect('/', controller='list_users')
        return m

    def list_users(self, **kwargs):
        # Only users with at least one known project show up in the root
        return [user
            for user, projects in self.user_cache.items()
            if projects]

if __name__ == '__main__':
    routefs.main(GithubFS)

Here, we add our first Routes mapping, connecting '/', or the root of the filesystem, to the list_users controller, which is just a method on the filesystem’s class. The list_users controller returns a list of strings. When the controller that a path maps to returns a list, RouteFS automatically makes that path into a directory. To make a path a file instead, you just return a single string containing the file’s contents.
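
The list-versus-string convention can be pictured with a small sketch. The mode_for helper below is hypothetical and only mimics the rule just described; RouteFS’s own handling of controller return values may differ in detail, and treating None as "no such entry" is an assumption made for the example.

```python
import stat


def mode_for(result):
    """Map a controller's return value to a file mode, or None if absent."""
    if result is None:
        return None                        # assumed to mean "no such entry"
    if isinstance(result, (list, tuple)):
        return stat.S_IFDIR | 0o555        # a listing becomes a directory
    if isinstance(result, str):
        return stat.S_IFREG | 0o444        # a string becomes a read-only file
    raise TypeError('unexpected controller result: %r' % (result,))


print(stat.S_ISDIR(mode_for(['ebroder'])))       # list -> directory
print(stat.S_ISREG(mode_for('file contents\n')))  # string -> regular file
```

The design choice worth noticing is that the controller never says "I am a directory"; the type of the value it returns carries that information.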

We’ll use the user_cache attribute to keep track of the users that we’ve seen and their repositories. This will let us auto-populate the root of the filesystem as users get looked up.

Let’s add some code to populate that cache:

#!/usr/bin/python

from github2 import client
import routes
import routefs

class GithubFS(routefs.RouteFS):
    def __init__(self, *args, **kwargs):
        super(GithubFS, self).__init__(*args, **kwargs)

        # Maps user -> [projects]
        self.user_cache = {}
        self.github = client.Github()

    def make_map(self):
        m = routes.Mapper()
        m.connect('/', controller='list_users')
        m.connect('/{user}', controller='list_repos')
        return m

    def list_users(self, **kwargs):
        # Only users with at least one known project show up in the root
        return [user
            for user, projects in self.user_cache.items()
            if projects]

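    def list_repos(self, user, **kwargs):
        # Query Github at most once per user; later accesses hit the cache
        if user not in self.user_cache:
            try:
                self.user_cache[user] = [r.name
                    for r in self.github.repos.list(user)]
            except Exception:
                # Remember failed lookups too, so they are not retried
                self.user_cache[user] = None

        return self.user_cache[user]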

if __name__ == '__main__':
    routefs.main(GithubFS)

That’s enough code that we can start interacting with the filesystem:

opus:~ broder$ ./githubfs /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs/ebroder
anygit	    githubfs	 pyhesiodfs	 python-simplestar
auto-aklog  ibtsocs	 python-github2  python-zephyr
bluechips   libhesiod	 python-hesiod
debmarshal  ponyexpress  python-moira
debothena   pyafs	 python-routefs
opus:~ broder$ ls /mnt/githubfs
ebroder
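
The transcript shows the root directory starting out empty and gaining an ebroder entry only after that user has been looked up. The sketch below reproduces that behavior without FUSE or network access. FakeRepos is a made-up stand-in for the Github client, and UserCache is a hypothetical class that isolates the caching logic from the listing above.

```python
class FakeRepos(object):
    """Made-up stand-in for the Github client; it knows about one user."""
    DATA = {'ebroder': ['githubfs', 'python-routefs']}

    def list(self, user):
        return self.DATA[user]              # raises KeyError for unknown users


class UserCache(object):
    def __init__(self, repos):
        self.repos = repos
        self.user_cache = {}                # maps user -> [projects], or None

    def list_users(self):
        # Only users with at least one known project appear in the root.
        return sorted(user for user, projects in self.user_cache.items()
                      if projects)

    def list_repos(self, user):
        if user not in self.user_cache:
            try:
                self.user_cache[user] = list(self.repos.list(user))
            except Exception:
                self.user_cache[user] = None    # remember the failed lookup
        return self.user_cache[user]


fs = UserCache(FakeRepos())
print(fs.list_users())              # nothing has been looked up yet
print(fs.list_repos('ebroder'))     # first access populates the cache
print(fs.list_repos('nobody'))      # unknown user is cached as None
print(fs.list_users())              # only users with projects are listed
```

Because failed lookups are stored as None, a typo such as ls /mnt/githubfs/nobody does not leave a stray entry in the root listing.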

Users and projects and branches, oh my!

You can see a slightly more fleshed-out filesystem on (where else?) Github. GithubFS lets you look at the current SHA-1 for each branch in each repository for a user:

opus:~ broder$ ./githubfs /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs/ebroder
anygit	    githubfs	 pyhesiodfs	 python-simplestar
auto-aklog  ibtsocs	 python-github2  python-zephyr
bluechips   libhesiod	 python-hesiod
debmarshal  ponyexpress  python-moira
debothena   pyafs	 python-routefs
opus:~ broder$ ls /mnt/githubfs/ebroder/githubfs
master
opus:~ broder$ cat /mnt/githubfs/ebroder/githubfs/master
cb4fc93ba381842fa0c2b34363d52475c4109852
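
One way the deeper levels could be laid out is sketched below. The controller names, the dispatch-by-depth function, and the sample data are assumptions made for illustration and are not taken from the GithubFS source; dispatch stands in for what Routes and RouteFS would do, and the single BRANCHES entry simply mirrors the transcript above.

```python
# Sample data in the shape user -> repo -> branch -> commit SHA-1.
BRANCHES = {
    'ebroder': {
        'githubfs': {'master': 'cb4fc93ba381842fa0c2b34363d52475c4109852'},
    },
}


def list_users():
    return sorted(BRANCHES)                         # list -> directory


def list_repos(user):
    repos = BRANCHES.get(user)
    return sorted(repos) if repos is not None else None


def list_branches(user, repo):
    branches = BRANCHES.get(user, {}).get(repo)
    return sorted(branches) if branches is not None else None


def show_branch(user, repo, branch):
    sha = BRANCHES.get(user, {}).get(repo, {}).get(branch)
    return sha + '\n' if sha else None              # string -> file contents


# Hypothetical routing table, keyed by the number of path components.
CONTROLLERS = {0: list_users, 1: list_repos, 2: list_branches, 3: show_branch}


def dispatch(path):
    parts = [p for p in path.split('/') if p]
    controller = CONTROLLERS.get(len(parts))
    if controller is None:
        return None                                 # path is too deep
    return controller(*parts)


print(dispatch('/ebroder/githubfs'))
print(dispatch('/ebroder/githubfs/master'))
```

The first three levels return lists and so would appear as directories, while the branch level returns a string and so would appear as a file.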

What next?

Want to see more examples of RouteFS? RouteFS itself includes some example filesystems, and you can see how we used RouteFS within the Invirt project. But most importantly, because RouteFS is open source, you can incorporate it into your own projects.

So, what cool tricks can you think of for dynamically generated filesystems?

Stop worrying about in-kernel filesystems

Still rebooting for this month’s round of ext4 bugs? Let Ksplice Uptrack fix the filesystems in your kernel without rebooting, so you can spend your time writing userspace filesystems instead!

  1. xaque208 says:

    Fuse is not the only way to run ZFS: http://wiki.github.com/behlendorf/zfs/

  2. Eric Blue says:

    Awesome article! I would definitely like to give this a try. It would be an interesting experiment to build a FS on top of MediaWiki (SMW in particular). Being able to traverse categories, semantic properties, and cat or even edit articles would be very nice.

  3. Jeffrey Bosboom says:

    @Eric Blue: You might try WikipediaFS at http://wikipediafs.sourceforge.net/ , although the page notes the project is currently looking for maintainers.

  4. Greg says:

    Hi!
    Thanks for the inspiring post… I sat down right away to play with this a bit – pulling public Twitter posts… Good fun, plenty to improve as well, as I ran into limitations in python-fuse, python-routefs and python-twitter. It’s all well, a way of learning. Will post an issue on github in a bit….
    Cheers!
    Greg

  5. Chuck says:

    Nice post. Reminds me a lot of QNX where all filesystems exist in userspace and they have just this sort of map/rerouting virtual filesystem stuff. All in a very nice service kind of way where you can register filesystem extensions easily. It’s very elegant, although poorly documented, but worth looking into some of their ideas…

  6. Paul Stone says:

    This is awesome. Although the _get_file method should check for type of unicode as well as str.

  7. iirekm says:

    I think that more and more parts of the kernel should be done in userspace – the Linux kernel should move from a monolithic approach to a microkernel.
    The Linux kernel was created for ‘slow’ i386 processors – every system call cost a lot there, so it was more efficient to put all the code in one big kernel.
    Now that we have fast processors, I think there would be no huge performance overhead even if ‘core’ filesystems and drivers were moved to user space.
    Currently the kernel can be developed only in C, by some gurus who know its ugly sources well. With a move to user space it could be developed by everyone, in any language, like the Python FUSE filesystem in this example.

  8. Dieter_be says:

    Cool stuff. Nice work.
    But re: “GithubFS lets you look at the current SHA-1 for each branch in each repository for a user:”,
    I would try to avoid writing “git frontends” like this. At some point the filesystem could just present a sparse git clone, so the user has all the git features for free. This is probably more expensive than limiting the filesystem contents to some sha1 hashes and whatever, but still, maybe with an efficient api call and clientside caching…

  9. “But because the FUSE API calls separate functions for each system call (i.e. getattr, open, read, etc.), in order to write a useful filesystem you need boilerplate code to translate requests for a particular path into a logical object in your filesystem, and you need to do this in every FUSE API function you implement.”

    You don’t actually need to do that. It’s true that the API does pass the path to every API function, which allows simple filesystems to be stateless and translate the path each time. However, the FUSE API does allow you to store state associated with the file handle in open() and the directory handle in opendir(), and then make other functions ignore the path and just use the stored state. This makes it easier to write more complicated filesystems, and is required if you want to give your filesystem proper POSIX rename() semantics.

    See the fusexmp_fh.c example distributed with FUSE (as contrasted with fusexmp.c).

    This functionality is exposed in python-fuse as the file_class and dir_class members of Fuse, which let you associate a Python class with file and directory handles; file_class is demonstrated in the xmp.py example that comes with python-fuse.

  10. Last I checked (which was about a year ago, when I wanted to use these bindings for coursework) the Python FUSE bindings didn’t really support the dir_class setup in a sane/useful way. Has this been improved?
