Nebula: a prototype of some ideas on file stores

I've been toying around lately with some ideas for a file store; I finally got around to implementing some of those ideas as a project called Nebula. There is source code available (written in Common Lisp) and a demo HTTP frontend. It's a very rough idea demonstrator, and the HTTP frontend exists mostly so I could test the idea faster: caveat user.

The motivation section is where I step onto my soapbox and muse on the thoughts that led me to this. You can skip this section if you're not fond of such diatribes and go directly to the nebula section.

Motivation

Nebula was motivated by dissatisfaction with the current state of systems. Without going into too much detail on the subject, solutions like Docker [1] and container orchestration seem more primitive than what we're capable of. I'm also of the heretical opinion that Unix isn't the be-all and end-all of operating systems [2]. This isn't to impugn the people working on these; right now, this is the best we have. I just think we can do better. But given the trend of worse is better, I don't think we'll see any useful improvement on the status quo (for a variety of reasons, primarily non-technical); we'll probably just keep adding more and more layers [3], as we do now.

I'd much rather see something more along the lines of the Mirage OS / unikernel approach. Packaging an application shouldn't require bringing in an entire Linux installation in a container [4], or using a container system with pseudo-isolation. Applications should be properly isolated, like those running as unikernels on Mirage OS.

In thinking about this problem, I was strongly motivated by the paper "A Security Kernel Based on the Lambda Calculus" [5]; I'm not arguing that the security kernel introduced there is a good solution to this problem, but it has a good discussion of the characteristics of a security kernel.

On Security Kernels

The core tenet of the paper is that a secure system provides three guarantees:

  1. Completely isolated environments [6]. Applications cannot access each other's environments directly. This is a subtle guarantee: in a capability system [7], for example, it doesn't necessarily mean that two applications can't access the same memory. Environments may also have different scopes: they can apply to individual threads or to application instances, depending on how the system is architected.
  2. Inter-environment communications. This is the environment analogue of IPC. Many of the same techniques apply: sockets, shared memory, and message passing, for example.
  3. Safe cooperation between environments, provided by access mediation. The security kernel guarantees the previous two properties, and also provides the resource constraints needed to ensure that applications can run harmoniously.

As I envision it, an operating system for the future would employ a unikernel approach to running processes; a small supervisor or hypervisor would mediate access and resource control, and it would probably employ a capability system to control access. In considering this, I wondered how the filesystem would play into this.

On File Systems

Files are fundamental to the Unix way. However, the POSIX model doesn't work well with a capability system: it employs the access-control-list security model, which is the inverse of the capability model. In a capability system, the user presents some capability [8] to the security kernel or file server, which translates that capability to some resource. If user G707 [9] wants to share a file with G708, then instead of adding G708 to an access roster (whether via a Unix group or by modifying the world permissions on the file), G707 supplies a capability to G708 that G708 can present to the file server to retrieve the file. In fact, any user with that capability can access the file. As I'll discuss later, there are ways for G707 to grant a capability to G708 such that G708's access can later be revoked without restricting G707's access.

This granting could be done via a message passing system, for example.

Imagine an editor and compiler running as unikernels; the editor will have created some source file, and wants to grant the compiler access to it. In this case, the editor sends the capability to the compiler, and the compiler can access the source file. Neither the editor nor the compiler would necessarily need to know about the file-sharing mechanism; it's likely that a filesystem driver would be included that knows how to translate between filenames and this capability-based access. It's similar to how we don't access files by their disk and inode; we have the abstraction of paths and hierarchical filesystems at our disposal.

I basically got to this point thinking about the problem, then decided to play with what such a file server would look like.

Nebula

I like to start projects with a clearly defined problem statement [10]; what is it exactly that I'm trying to solve? Accordingly, the problem statement I came up with for Nebula is this:

Users have data: they want to be able to keep track of revisions to this data, and they would like to be able to share this data with other users. Users would also like the ability to cluster into groups to share and collaborate on this data.

Some secondary goals that I wanted to accomplish were to:

  • build real world experience designing, implementing, and operating capability systems
  • characterise the behaviour of capability systems in the real world

Characterising the solution

The next step after defining the problem statement is to figure out what the characteristics of this system are.

  • Users must be able to store and retrieve data.
  • Users must be able to view the history [11] of their data.
  • Users should be able to share data with other users.
  • A user should be able to refer to a piece of data as a leaf in a history tree, as a node in the tree, or as an isolated snapshot with the history information stripped.
  • Users should have some assurance as to the integrity and confidentiality of their data: one user should not be able to read another user’s file unless permission has been explicitly granted or unless the other user has their own copy of that data.

Towards a solution

I'll be upfront: Nebula doesn't meet all of these characteristics. It's a playground for seeing how some of them turn out. Specifically, it runs as an HTTP API (sort of REST-like, if you squint hard enough) on a Linux server; it can't make guarantees about what happens to its data outside of the system. Internally, it's consistent, but since it isn't running on an isolated system, it can't make strong guarantees [12].

Towards this, Nebula is built on two concepts:

  • blobs
  • entries

A blob is some chunk of data. The server treats it as an octet string [13]; internally, blobs are identified by the hex-encoded SHA-256 digest of their contents.
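
As a minimal sketch, computing a blob's identifier might look like the following, assuming the ironclad library (BLOB-ID is an illustrative name, not necessarily what Nebula's source uses):

;;; Illustrative sketch: content-addressing a blob with ironclad.
(defun blob-id (data)
  "Return the hex-encoded SHA-256 digest identifying DATA, an octet
vector of type (SIMPLE-ARRAY (UNSIGNED-BYTE 8))."
  (ironclad:byte-array-to-hex-string
   (ironclad:digest-sequence :sha256 data)))

Since the identifier derives purely from the contents, two users storing the same data end up referring to the same blob.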

Referring to data by blob alone is only useful for retrieving its contents, and it poses several problems:

  • Information leakage: if G3210 wants to determine whether someone already has a copy of some data, they can request its SHA-256 digest from the filestore; the server will return the data if it exists. The data itself is of no consequence to G3210, who must already have had a copy in order to produce the hash; what leaks is confirmation that someone else has stored it.
  • Adding any metadata changes the contents of the file, causing the SHA-256 identifier to change.
  • If G5907 wishes to delete their copy of a blob, the server would have no good way to know that G2081 also has a copy of that data and doesn't want it to be deleted.

The blob, then, is used for the immutable data: a piece of data as it exists at some point in time.

Entries contain metadata, and they have four main properties in this version of Nebula:

  • An identifier, in the form of a randomly generated version 4 UUID. This is a large enough object that collisions [14] are highly unlikely.
  • A target, which stores either a blob identifier (meaning the entry points directly at a blob), or a UUID (the entry points to another entry). This pointing at other entries will be discussed shortly.
  • A timestamp for when the entry was created.
  • A parent, which contains either nothing or a UUID pointing to the parent entry from which this entry derives.
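
As a concrete sketch, an entry with these properties might be modelled as the following CLOS class; the slot and accessor names are illustrative, not necessarily those in Nebula's source:

;;; Illustrative sketch of the entry structure described above.
(defclass entry ()
  ((id        :initarg :id        :reader entry-id
              :documentation "Randomly generated version 4 UUID.")
   (target    :initarg :target    :reader entry-target
              :documentation "A blob's SHA-256 digest, or another entry's UUID.")
   (timestamp :initarg :timestamp :reader entry-timestamp
              :documentation "Time at which the entry was created.")
   (parent    :initarg :parent    :reader entry-parent :initform nil
              :documentation "UUID of the parent entry, or NIL for a root.")))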

For this prototype, I used Postgres to store entries [15]; during some discussion on IRC, it was mentioned that the simpler storage option for entries would be to serialise them to disk. That would have made garbage collection much more difficult: it's much easier to query the database for any remaining entries with a certain target than to scan a bunch of files or try to ensure some in-memory data structure is consistently serialised to disk [16].
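
As a sketch of that garbage-collection check, assuming an entries table with a target column (the table and function names are hypothetical), the Postmodern query might look like:

;;; Hypothetical sketch: is any entry still pointing at TARGET?
;;; If not, whatever TARGET names is safe to collect.
(defun target-referenced-p (target)
  (plusp (postmodern:query
          (:select (:count '*) :from 'entries
           :where (:= 'target target))
          :single)))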

I mentioned earlier that one user could grant access without jeopardising their own access, and the discussion of the target property hints at the solution. In capability systems, this is known as the "proxy" [17] pattern: user G7926 proxies some entry E8057 [18], creating a new entry E3002 whose target is set to E8057. When Nebula goes to resolve the target of E3002, it sees that it points to E8057, looks up E8057's target (some blob), and returns that data. If access to the data via E3002 is no longer needed or desired, E3002 can be removed; E8057 remains accessible. If instead E8057 is removed, Nebula knows to also remove E3002, as it would point to an invalid target.
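
A sketch of that resolution in terms of the entry class above (LOOKUP-ENTRY, BLOB-ID-P, and READ-BLOB are hypothetical helpers):

;;; Illustrative sketch: follow entry-to-entry targets until we
;;; reach an entry whose target names a blob, then return its data.
(defun resolve-target (entry)
  (let ((target (entry-target entry)))
    (if (blob-id-p target)                        ; a SHA-256 digest?
        (read-blob target)                        ; return the blob's octets
        (resolve-target (lookup-entry target))))) ; else: another entry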

To prevent information leakage, proxying defaults to creating a new entry with no parent (but it will preserve the timestamp). It's possible to walk back the list of parents of the original entry, building a history list (called a lineage here). This list can then be proxied in full, providing a proxied copy of the original data and its history.
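
Sketches of both operations, continuing with the illustrative names from above (MAKE-V4-UUID and LOOKUP-ENTRY are hypothetical helpers):

;;; Illustrative sketch: proxy ENTRY by pointing a fresh entry at
;;; it, preserving the timestamp but not the parent, so the
;;; original's history isn't exposed.
(defun proxy-entry (entry)
  (make-instance 'entry
                 :id (make-v4-uuid)
                 :target (entry-id entry)
                 :timestamp (entry-timestamp entry)
                 :parent nil))

;;; Illustrative sketch: walk the parent chain, newest first, to
;;; build a lineage.
(defun lineage (entry)
  (when entry
    (cons entry
          (when (entry-parent entry)
            (lineage (lookup-entry (entry-parent entry)))))))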

A capability in Nebula, then, is an entry identifier [19]. Anyone who presents this identifier can read the associated data.
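
Presenting a capability, then, amounts to looking up the entry and resolving its target; a sketch under the same illustrative names:

;;; Illustrative sketch: a capability is just an entry's UUID;
;;; presenting it yields the data, or NIL for an unknown entry.
(defun fetch (capability)
  (let ((entry (lookup-entry capability)))
    (when entry
      (resolve-target entry))))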

Conclusions

There's plenty of room for more work on this.

  • I'd like to figure out a suitable non-HTTP API; thoughts include a Common Lisp interface, a 9P interface, and some sort of networked editor.
  • There's no data-at-rest plan for blobs. They're stored unencrypted. This would be an interesting problem.
  • If I am to stick with the HTTP API, it should be made much better.
  • There's no notion of files yet, which would make this much friendlier to use. I'm not sure this is an abstraction that belongs at this layer.
  • Along with files, it would be useful to have a history tree, not just a single lineage. This would involve finding some parent node, then walking back to build a tree of all the nodes with that parent at their root.
  • It would be useful to build at least one system that uses Nebula.
  • I'm still not sure this is an accurate depiction of a capability system.
  • Given its origins as a prototype for a hypothetical operating system's filestore, Nebula is intended to provide a filestore for one host. What would it look like to distribute Nebula?
  • Nebula doesn't enforce any sort of resource restrictions (maximum file size, quotas, etc…). I'm not sure that belongs at this abstraction layer, but it might make sense at some other layer.
  • I'm not entirely happy with the entries being stored in Postgres, as it adds the overhead of running Postgres. I've thought about investigating elephant or manardb, or building a modernised CLOS persistence store, though the license (both use the LLGPL) gives me second thoughts about the viability of modifications to or derivatives of either.
  • There is lots of documentation to be written.

Testing Nebula

You'll need

  • a working Lisp; I use SBCL, but there are other options.
  • Quicklisp; I'll assume it's been set up to load when the Lisp system starts. This is done with an invocation of (ql:add-to-init-file).
  • Postgres, with a database and user (with write access to the database)

Quicklisp normally installs to $HOME/quicklisp and I'll assume that's the case.

Create the database credentials file in $HOME/.nebula.lisp with the following template (but be sure to fill in the right information):

;;; Example names taken from the postmodern docs.
((:DB-HOST "localhost")    ;; hostname of database
 (:DB-NAME "testdb")       ;; database name
 (:DB-PASS "surveiller")   ;; password
 (:DB-PORT 5432)           ;; port
 (:DB-USER "focault"))     ;; username

The following sequence will get the project running:

$ cd ~/quicklisp/local-projects
$ git clone https://github.com/kisom/cl-nebula
$ git clone https://github.com/kisom/cl-nebula-www
$ sbcl
* (ql:quickload "nebula-www")
* (nebula-www:startup)

The API documentation contains a list of endpoints.

Going forward

The next step is to build something on top of Nebula. My first thought is to map a POSIX-like file system onto it as a proof of concept: it would handle some notion of identity and groupings, and allow referring to entries by a friendly string identifier. One specific idea was a sort of collaborative text editor built as a web interface. I'm not entirely sure about this, though; it still needs some thinking.

Feel free to send your comments to kyle (at) tyrfingr (dot) is, or see the contact links on the comms page.

[1]I run quite a few services for myself inside Docker containers, and we use them quite a bit at work. Most of my thoughts here are informed by this experience.
[2]

That being said, I'm one of those Linux-on-the-desktop people. You'll find me armed with a Thinkpad running Ubuntu server, NixOS, or OpenBSD, using StumpWM as my window manager and spending most of my time in emacs or xterm. For those who point this out and say something to the effect of "just use OS X", I reply with a "been there, done that, lost enough time". Having spent a considerable amount of time running OS X, I find it gets in the way of my usage patterns more often than not. Perhaps I'm not so much a user as a refugee. This may be a case of "worse is better"; it's an actively maintained operating system that still allows me to be productive.

An interesting argument has been made previously that I should adapt to the OS X usage model. I take the opposite stance: I prefer to adapt the computer to my usage model. This subject could make a blog post of its own; I touched on it some in a previous post.

[3]There's an important difference between adding layers of abstraction, which can streamline certain things and allow higher-level reasoning about a system, and adding layers of obfuscation. For example, a half-truth is that the CPU doesn't differentiate between data types (not strictly true, but generally true); in assembly, you deal mostly with the structure and interpretation of computer memory. In a higher-level language like C, it's useful to differentiate between integers of various sizes and signedness, characters, or structs of other data. These data types are only abstractions of the programming language. The layering I see in solutions like Docker strikes me as more of the "yo dawg, I herd u liek to lunix while you lunix so i stuck some lunix on your lunix" variety.
[4]I am aware of RancherOS. It's not something that I see many people using, and the fact that it is still a Linux distribution notwithstanding, my other arguments apply.
[5]J. Rees. “A Security Kernel Based on the Lambda Calculus”, A.I. Memo 1564, MIT, 1996. It's available on the web; I've also typeset the paper.
[6]The term environment here means the lexical environment; it shouldn’t be confused with the concept of Unix environments. A lexical environment is the mapping of names to their values.
[7]Wikipedia defines a capability as "a communicable, unforgeable token of authority. It refers to a value that references an object along with an associated set of access rights." A capability system uses capabilities (instead of access control lists) as the primary mechanism for access control.
[8]Figuring out what the hell a capability actually is has been an interesting area of personal research and really one of the main motivators for Nebula.
[9]I'll tend to gensym names in this post.
[10]This approach has been directly inspired by Rich Hickey's "Hammock Driven Development" talk.
[11]Other feelings about VMS aside, this was a useful feature.
[12]It probably could, though this would be hacky and wouldn't actually be useful in the service of prototyping the idea.
[13]A blob has the Lisp type (SIMPLE-ARRAY (UNSIGNED-BYTE 8)); in C, this would be a uint8_t[].
[14]A collision would be two distinct entries getting the same identifier, even if one of those entries no longer exists.
[15]There is no evidence that this choice of data store was primarily motivated by a desire to use postmodern. You can't prove a thing.
[16]Postmodern turned out to be the perfect tool for this task. I'd written this originally in Clojure for reasons that aren't germane to this discussion, and attempted to port it to Haskell. The Clojure version used SQLite, and the Haskell version used serialisation. As a totally-arbitrary comparison, the Common Lisp entry and database code (e.g. entry.lisp and db.lisp) is 101 SLOC; the Clojure version (e.g. entry.clj and db.clj) is 213 SLOC; and the Haskell version (e.g. Entry.hs) uses 111 SLOC, all as counted by cloc. I didn't get garbage collection working in the Haskell version, which turned out to be a fatal flaw that led me to just use CL. Certainly much of this compactness of the Common Lisp code is due to a deeper familiarity with the language.
[17]I think.
[18]To avoid entering full UUIDs here, I'm using modified gensyms for entry IDs.
[19]I think.
