Regular expressions (regexes) are among the more cryptic dialects of code, adding dense clusters of undifferentiated punctuation even to programs written by people who otherwise go to great lengths to favor clarity. Continually confounding for beginning programmers, they are all but unavoidable, appearing in nearly identical forms across many programming languages. Take, for instance, this quite common regex:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

As we can all see, it is how one determines whether an email address is valid.

8051Enthusiast has created a tool called regex2fat which translates regexes like the one above into disk images, creating a labyrinth of folders one can navigate through to find matches for their expression. Asked how he came up with the idea, he said:

I unfortunately can't really remember the moment it clicked, but I must have thought something like "Oh no that's terrible."

For an example of how this works, let's start with a simpler regex than the one above, perhaps:

AB+C

The plus sign is a metacharacter called a Kleene plus, indicating one or more of the previous symbol. The expression above would match "ABC" but also "ABBBBC" or "NNNABC" (which contains the ABC sequence within it).
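To make the "one or more" behavior concrete, here is a minimal sketch using Python's standard `re` module. The helper name `contains_match` is only illustrative, and `search` is used (rather than a full-string match) on the assumption that a match anywhere in the string counts, as the "NNNABC" example suggests.

```python
import re

# AB+C: an A, then one or more B's, then a C.
PATTERN = re.compile(r"AB+C")

def contains_match(text):
    """Return True if AB+C occurs anywhere inside text."""
    return PATTERN.search(text) is not None

for candidate in ["ABC", "ABBBBC", "NNNABC", "AC"]:
    print(candidate, contains_match(candidate))
```

"AC" is the only candidate here that fails, since the plus sign demands at least one B.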

After running regex2fat, I have a FAT32 image I can mount in Windows like an external drive and navigate through:

At the root is a directory list containing each letter of the alphabet, space (as "SPACE"), numerals, and the most common punctuation. I select one, open it, and again am presented with the same exhaustive list of every choice for the next letter of my string. Each choice I make brings me to a folder with the same list. If I choose a path that matches my regular expression, I will find an empty file called "MATCH", along with the same set of folders again, to go in deeper, and perhaps find more MATCHes. Above is the string "V@ABBC" in the AB+C regex.

8051Enthusiast explains why this transformation works:

Regular regexes (i.e. no backreferences and similar advanced features) can be turned into a so-called DFA (deterministic finite automaton). This is basically a bunch of arrows going between states, where an arrow is labeled with a letter so that a letter in a state causes the current state to go along the arrow to another state, with a subset of states being accepting.

The set of states for our AB+C example is the algebraic representation of this DFA:

Starting at state 0, an A moves us to the first state, followed by one or more B's, and so on. At state 3, we find our MATCH. Ordinarily, we don't expect folders to loop, as the B's do above.
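The walk through the states can be sketched as a small transition table. This is a hand-built illustration, not output from regex2fat: the state numbers, the fallback to state 0 on any unexpected character, and the assumption that a path counts as a match when it ends in AB+C are all choices made for this example, and the real tool's anchoring may differ.

```python
# Hypothetical DFA for AB+C, matched at the end of whatever has been read so far.
# Any character not listed for a state falls back to state 0.
TRANSITIONS = {
    0: {"A": 1},                  # nothing useful seen yet
    1: {"A": 1, "B": 2},          # seen "A"
    2: {"A": 1, "B": 2, "C": 3},  # seen "AB", "ABB", ... (the B loop)
    3: {"A": 1},                  # seen "AB+C": the accepting state
}
ACCEPTING = {3}

def run_dfa(text):
    """Follow one arrow per character and return the final state."""
    state = 0
    for ch in text:
        state = TRANSITIONS[state].get(ch, 0)
    return state

def ends_in_match(text):
    """True if the characters read so far end in a match for AB+C."""
    return run_dfa(text) in ACCEPTING

print(ends_in_match("V@ABBC"))
print(ends_in_match("ABCX"))
```

The entry `"B": 2` inside state 2 is the loop: every additional B leaves the automaton exactly where it was.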

While FAT32 normally has a tree-like structure, each directory just references blocks anywhere on the file system, so the same block can be referenced from multiple directories. The directories also have no explicit field for parent directories, so one can leave .. [ED: parent directory] out. This allows for graph structures inside a file system, which a DFA basically is.
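As a rough in-memory analogy for the idea of directories sharing blocks, the sketch below builds one Python dict per DFA state and lets entries in different "directories" point at the same dict object. Nothing here touches a real FAT32 image or reflects regex2fat's actual code; the tiny `ALPHABET`, the state table, and the helper names are assumptions made for illustration.

```python
# Tiny stand-in for the full list of folder names in each directory.
ALPHABET = "ABCNV@"

# Hypothetical state table for AB+C; unlisted characters lead back to state 0.
TABLE = {0: {"A": 1}, 1: {"A": 1, "B": 2}, 2: {"A": 1, "B": 2, "C": 3}, 3: {"A": 1}}

def build_directories():
    """Return the root 'directory'; each state is one shared dict."""
    dirs = [dict() for _ in TABLE]
    for state, directory in enumerate(dirs):
        for name in ALPHABET:
            # Store a reference, not a copy: many entries share one target.
            directory[name] = dirs[TABLE[state].get(name, 0)]
    dirs[3]["MATCH"] = None  # the empty marker file in the accepting state
    return dirs[0]

def has_match(root, path):
    """Open one folder per character, then look for the MATCH file."""
    current = root
    for name in path:
        current = current[name]
    return "MATCH" in current

root = build_directories()
print(has_match(root, "V@ABBC"))
print(root["A"]["B"]["B"] is root["A"]["B"])
```

The second line printed shows the loop: opening another B folder lands in the very same object, which is the in-memory counterpart of two directory entries referencing one block. Characters outside `ALPHABET` raise a `KeyError` in this sketch.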

It's not just turning the expected tree into a graph pattern that makes regex2fat so odd: it's also the mix of spatial metaphors. Folders, after all, are not just a hierarchy; they are containers that hold other things. On a file system, an empty folder should be cleaned up or deleted, since it indicates poor organization. regex2fat turns every string into a series of endlessly expanding containers for every possibility of language, a single letter at a time, like a remedial Borges's Library. If I want to see whether the word "FOLDER" matches my regex, the answer is six folders deep.

regex2fat speaks to the elasticity of metaphor in computing, and the ease of replacing one vision of the same data with another, a common feature of multicoding esolangs and of glitch art techniques like sonification. Computers are metaphor machines. Melanie Hoff, an artist and educator, uses this in their peer-to-peer folder poetry class, where tree structures are employed to build "unfolding narratives, rhythmic prose, and choose-your-own-adventure poetry" (see their example program garden-of-forking-paths). In their class, they point out that the folder symbol is itself an arbitrary metaphor betraying its bureaucratic origins (it was first created for the ERMA system, developed in the 1950s for use by Bank of America), and could as easily have been represented by, say, a purse. If a series of purses holding smaller purses sounds absurd, consider classic Manila folders on a desk, holding a sequence of smaller ones inside.



Submitted as a regex2fat "issue" on GitHub

However, in addition to the choice of representing these strings as directories, there's the specific choice of the FAT32 file system, which has not been the default since Windows XP.

Before I finished this project, I didn't even know if it would actually cleanly work with FAT32 ... but there are a few reasons I chose FAT32:

  • FAT32 is probably the most widely supported filesystem
  • It is actually quite simple to implement, if you ignore long filenames
  • You can easily leave out the parent directory, since it is just a regular directory entry
  • I like FAT32

In the end, whether a filesystem works with this is mostly dependent on the implementation, since the specification (if there is one) probably leaves behaviour on directory loops undefined.

One way that it differs from FAT32 defaults is that regex2fat always assumes the Windows 1252 OEM codepage, which means characters outside that codepage can't be used. It would be nice to see this include other character sets in future versions.

While the project has found an enthusiastic audience on Hacker News and Reddit, 8051Enthusiast also had this to say about some of the feedback:

You know the kind of people that respond to things like this with "but what practical use does this have?" I sure hope that someday they'll stop when enough of these projects exist.