Writing a FUSE filesystem for viewing Ren'Py archives

Ren’Py is an open source visual novel engine used by a lot of games, including Slay the Princess and Doki Doki Literature Club.

Often, Ren’Py VNs store their assets in the engine’s proprietary archive format. Extractors for these archives exist, but I would much rather mount rpa archives as virtual filesystems: disk space may be cheap, but taking up no extra space at all is still better, especially since I only look at the assets occasionally, when I’m curious about some detail.

On Linux, FUSE (Filesystem in Userspace) lets you write file systems that run in user space rather than in the kernel, so that’s what I’ll be using in this post.

First things first, I should familiarize myself with rpa archives, so it’s time to clone the Ren’Py codebase.

After some git grepping and searching for files, I’m looking at launcher/game/archiver.py, which seems promising!

Like most formats, Ren’Py archives (which use the .rpa file extension) start with a magic value, which seems to be the ASCII string RPA-3.0. It is followed by the offset of the zlib-compressed file index and by the encryption key, both written as hexadecimal text and separated by spaces, and finally by a newline character. I should note here that Ren’Py’s own documentation states that this format is meant to prevent casual copying, but isn’t very secure.

It’s already possible to start writing some code to parse this first part of the file:

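#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char RPA3_MAGIC[] = {'R', 'P', 'A', '-', '3', '.', '0'};

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <archive.rpa>\n", argv[0]);
        return EXIT_FAILURE;
    }
    // `header` is a NUL-terminated string containing the file header including the newline character
    char header[64] = {0};
    FILE *f = fopen(argv[1], "rb");
    if (!f || !fgets(header, sizeof(header), f) || memcmp(header, RPA3_MAGIC, sizeof(RPA3_MAGIC)) != 0) {
        fprintf(stderr, "Could not read an RPA-3.0 header from %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    fclose(f);
    printf("%s", header);

    // skip the magic and the space after it; strtoull stops at the space before the key
    char *prev = NULL;
    uint64_t compressed_index_offset = (uint64_t)strtoull(&header[sizeof(RPA3_MAGIC) + 1], &prev, 16);
    uint64_t xor_key = (uint64_t)strtoull(prev + 1, NULL, 16);
    printf("compressed_index_offset=%"PRIx64" xor_key=%"PRIx64"\n", compressed_index_offset, xor_key);
    return EXIT_SUCCESS;
}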

Here is the output from running it on a random archive I have lying around, confirming that it works:

RPA-3.0 0000000041fc7a26 42424242
compressed_index_offset=41fc7a26 xor_key=42424242
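As an alternative, the whole header line can be checked and parsed with a single sscanf() call. This is a sketch that assumes the header always has the layout seen in the output above (the magic, 16 hex digits, 8 hex digits, separated by spaces); parse_rpa3_header is a hypothetical helper, not part of the original program.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: parses an "RPA-3.0 <index offset> <key>\n" header line.
// Returns false if the line doesn't follow that layout.
static bool parse_rpa3_header(const char *header, uint64_t *index_offset, uint32_t *key)
{
    int consumed = 0;
    // The literal "RPA-3.0" must match exactly; %n stores how many characters
    // were consumed, so we can check that nothing but the newline follows the key.
    if (sscanf(header, "RPA-3.0 %" SCNx64 " %" SCNx32 "%n", index_offset, key, &consumed) != 2)
        return false;
    return header[consumed] == '\n' || header[consumed] == '\0';
}
```

Checking the character right after the key also rejects headers that carry unexpected extra fields.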

Next, it’s time to decompress the file index, which Ren’Py compresses with a plain call to Python’s zlib.compress() function.

Since most Linux distros already ship zlib, I won’t try to roll my own decompressor here (maybe in another post). This is what I ended up with after an hour or so of coding¹, based on the zlib documentation and usage example:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

// Output buffer growth step (an arbitrary value; any positive size works)
#define DECODE_CHUNK (64 * 1024)

int decompress_file_index(uint64_t compressed_file_index_sz,
                          uint8_t compressed_file_index[static compressed_file_index_sz],
                          uint64_t *decompressed_file_index_sz,
                          uint8_t **decompressed_file_index)
{
    z_stream strm = {
        .zalloc = Z_NULL,
        .zfree = Z_NULL,
        .opaque = Z_NULL,
        .avail_in = (uInt)compressed_file_index_sz,
        .next_in = compressed_file_index,
    };
    uint64_t capacity = 0;
    *decompressed_file_index_sz = 0;
    *decompressed_file_index = NULL;
    int ret = inflateInit(&strm);
    if (ret != Z_OK) {
        fprintf(stderr, "Failed to init zlib\n");
        return ret;
    }
    do {
        // grow the output buffer by one chunk and resume right after the bytes inflated so far
        capacity += DECODE_CHUNK;
        uint8_t *grown = realloc(*decompressed_file_index, capacity);
        if (!grown) {
            ret = Z_MEM_ERROR;
            break;
        }
        *decompressed_file_index = grown;
        strm.next_out = grown + strm.total_out;
        strm.avail_out = (uInt)(capacity - strm.total_out);
        ret = inflate(&strm, Z_NO_FLUSH);
    } while (ret == Z_OK); // Z_OK: progress was made, but the stream hasn't ended yet

    uLong total = strm.total_out;
    inflateEnd(&strm);
    if (ret != Z_STREAM_END) {
        // Z_NEED_DICT shouldn't happen for zlib.compress() output, and Z_BUF_ERROR
        // here means the input ran out before the end of the stream
        fprintf(stderr, "zlib error %d\n", ret);
        free(*decompressed_file_index);
        *decompressed_file_index = NULL;
        return ret == Z_MEM_ERROR ? Z_MEM_ERROR : Z_DATA_ERROR;
    }
    // shrink the buffer to the actual decompressed size
    *decompressed_file_index_sz = total;
    if (total > 0) {
        uint8_t *shrunk = realloc(*decompressed_file_index, total);
        if (shrunk)
            *decompressed_file_index = shrunk;
    }
    return Z_OK;
}

This isn’t the nicest API (the fprintf calls should really be left to the caller), but it works for now.
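Before decompress_file_index can run, the compressed bytes have to be read from the archive. The sketch below assumes that the compressed index is the last thing in the archive, so that it runs from compressed_index_offset to the end of the file. It also adds a cheap sanity check of the two-byte zlib header from RFC 1950 (compression method 8, and the first two bytes, read as a big-endian number, being a multiple of 31) before handing the data to zlib. read_index_blob and looks_like_zlib are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Reads everything from `offset` to the end of the file into a malloc'd buffer.
// Assumes the compressed index is stored last in the archive. Returns NULL on failure.
static uint8_t *read_index_blob(int fd, uint64_t offset, uint64_t *sz)
{
    struct stat st;
    if (fstat(fd, &st) < 0 || (uint64_t)st.st_size <= offset)
        return NULL;
    *sz = (uint64_t)st.st_size - offset;
    uint8_t *buf = malloc(*sz);
    if (!buf)
        return NULL;
    for (uint64_t done = 0; done < *sz; ) {
        ssize_t n = pread(fd, buf + done, *sz - done, (off_t)(offset + done));
        if (n <= 0) { // read error, or the file shrank underneath us
            free(buf);
            return NULL;
        }
        done += (uint64_t)n;
    }
    return buf;
}

// RFC 1950 header check: CM (low nibble of the first byte) must be 8 (deflate),
// CINFO at most 7, and CMF * 256 + FLG a multiple of 31. FDICT (0x20 in FLG) is
// rejected too, since zlib.compress() doesn't use a preset dictionary.
static bool looks_like_zlib(const uint8_t *buf, uint64_t sz)
{
    if (sz < 2)
        return false;
    uint8_t cmf = buf[0], flg = buf[1];
    return (cmf & 0x0f) == 8 && (cmf >> 4) <= 7 && (cmf * 256 + flg) % 31 == 0 && !(flg & 0x20);
}
```

With the default compression level, Python’s zlib.compress() output starts with the bytes 0x78 0x9c, which passes this check.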

Ren’Py actually compresses the output of pickle.dumps() on the index, using the highest pickle protocol supported by its Python version. This is convenient for Ren’Py, which can just call pickle.loads(zlib.decompress(file_index)), but for our purposes it means further processing. The pickled data is essentially a dictionary of lists: the dictionary is keyed by file name, and each list contains exactly one tuple. The tuples are triplets of (offset ^ secret_key, file_size ^ secret_key, b""). I am not sure why the trailing empty bytestring is needed, but we’ll have to handle it as well. Note that the previously extracted key is used to XOR both values.

Ren’Py 8.4.1, the latest version available at the time of writing, uses Python 3.12. The newest pickle protocol, version 5, was introduced in Python 3.8, so that is my target.

The closest thing to a specification of the actual opcodes I could find was CPython’s implementation. There’s also pickletools.dis, which prints the opcodes of a pickled buffer:

>>> import pickle
>>> import pickletools
>>> print(pickletools.dis(pickle.dumps({'foo':list((1,2,b""))})))
    0: \x80 PROTO      4
    2: \x95 FRAME      21
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: \x8c SHORT_BINUNICODE 'foo'
   18: \x94 MEMOIZE    (as 1)
   19: ]    EMPTY_LIST
   20: \x94 MEMOIZE    (as 2)
   21: (    MARK
   22: K        BININT1    1
   24: K        BININT1    2
   26: C        SHORT_BINBYTES b''
   28: \x94     MEMOIZE    (as 3)
   29: e        APPENDS    (MARK at 21)
   30: s    SETITEM
   31: .    STOP
highest protocol among opcodes = 4
None

Disassembling a dump of the decompressed index confirms that my particular archive uses protocol version 5:

>>> f = open('dump.txt', 'rb')
>>> data = f.read()
>>> print(pickletools.dis(data))
    0: \x80 PROTO      5
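Since the first opcode is PROTO (0x80) followed by a single byte holding the protocol number, a loader can cheaply refuse pickles it wasn’t written for before parsing anything else. A minimal sketch, where pickle_protocol is a hypothetical helper: pickles written with protocol 0 or 1 don’t start with PROTO at all, so it reports -1 for them.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the protocol announced by a leading PROTO opcode (0x80 followed by
// one byte), or -1 if the buffer doesn't start with PROTO.
static int pickle_protocol(const uint8_t *buf, size_t len)
{
    if (len < 2 || buf[0] != 0x80)
        return -1;
    return buf[1];
}
```

A caller targeting the protocol 5 output described above could, for instance, bail out whenever this returns anything outside 2 to 5.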

Now take a look at this excerpt from my data dump:

   66: \x8c     SHORT_BINUNICODE 'images/e1/d1_drive.webp'
   91: \x94     MEMOIZE    (as 5)
   92: ]        EMPTY_LIST
   93: \x94     MEMOIZE    (as 6)
   94: J        BININT     1111571984
   99: J        BININT     1111801854
  104: h        BINGET     3
  106: \x87     TUPLE3
  107: \x94     MEMOIZE    (as 7)
  108: a        APPEND
  109: \x8c     SHORT_BINUNICODE 'images/e1/e1i1.webp'
  130: \x94     MEMOIZE    (as 8)
  131: ]        EMPTY_LIST
  132: \x94     MEMOIZE    (as 9)
  133: J        BININT     1112029277
  138: J        BININT     1111674630
  143: h        BINGET     3
  145: \x87     TUPLE3
  146: \x94     MEMOIZE    (as 10)
  147: a        APPEND
  148: \x8c     SHORT_BINUNICODE 'images/e1/e1i10.webp'
  170: \x94     MEMOIZE    (as 11)
  171: ]        EMPTY_LIST
  172: \x94     MEMOIZE    (as 12)
  173: J        BININT     1112098102
  178: J        BININT     1112155806
  183: h        BINGET     3
  185: \x87     TUPLE3
  186: \x94     MEMOIZE    (as 13)
  187: a        APPEND
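To make the XOR step concrete, here are the three entries of this excerpt decoded by hand, assuming the dump comes from an archive with the same key 0x42424242 as the header output earlier (the resulting offsets line up neatly, which suggests it does). Each asset starts exactly 17 bytes after the previous one ends, presumably some padding written by the archiver between assets. The array and decode_entry are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

// Assumed archive key, taken from the header output earlier in this post
#define RPA_KEY 0x42424242u

// Raw BININT arguments copied from the dump above (still XOR-encoded)
static const uint32_t RAW[][2] = {
    {1111571984u, 1111801854u}, // images/e1/d1_drive.webp -> offset 228434, size 458172
    {1112029277u, 1111674630u}, // images/e1/e1i1.webp     -> offset 686623, size 36164
    {1112098102u, 1112155806u}, // images/e1/e1i10.webp    -> offset 722804, size 550108
};

// Undoes the XOR for entry i: out[0] is the offset, out[1] the size
static void decode_entry(size_t i, uint32_t out[2])
{
    out[0] = RAW[i][0] ^ RPA_KEY;
    out[1] = RAW[i][1] ^ RPA_KEY;
}
```

For example, 228434 + 458172 = 686606, and the next asset starts at 686623, 17 bytes later.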

The data always comes in the same order: file name, offset, file size. This means we can skip everything that isn’t a SHORT_BINUNICODE or a BININT and extract the values in exactly this order, so the only opcodes we care about are the ones that represent an (unsigned) integer or a string of characters. I’ll list all the opcodes we might care about², since I haven’t bothered to check whether Ren’Py places any restrictions on asset sizes:

enum opcode {
    BININT           =    'J',
    BININT1          =    'K',
    LONG             =    'L',
    BININT2          =    'M',
    STRING           =    'S',
    BINSTRING        =    'T',
    SHORT_BINSTRING  =    'U',
    UNICODE          =    'V',
    BINUNICODE       =    'X',
    BINFLOAT         =    'G',
    LONG1            = '\x8a',
    LONG4            = '\x8b',
    BINUNICODE8      = '\x8d',
    SHORT_BINUNICODE = '\x8c',
};

But of course I’m only going to implement BININT and SHORT_BINUNICODE³. A BININT opcode is followed by a 4-byte signed integer, least-significant byte first (the values here are never negative, so reading them as unsigned is fine), and SHORT_BINUNICODE is followed by a single-byte length and then the string itself, which isn’t NUL-terminated. Anyway, here is what I came up with after two hours of digging and coding:

#include <inttypes.h>
#include <stdlib.h>
#include <string.h>

// `enum opcode` from the previous snippet must be in scope

struct rpa_entry {
    char *name;
    uint32_t offset, size;
};

enum next_binint {
    NEXT_OFFSET = 0,
    NEXT_SIZE = 1,
};

void unpickle_index(const uint64_t file_index_sz,
                    const uint8_t file_index[static file_index_sz],
                    uint32_t key,
                    unsigned *entry_count,
                    struct rpa_entry **entries)
{
    const uint8_t *p = file_index;
    const uint8_t *end = file_index + file_index_sz;
    // preallocate 10k entries, the array is doubled whenever it fills up
    unsigned nb_entries = 10000, entry_idx = 0;
    enum next_binint next_binint = NEXT_OFFSET;
    uint32_t val = 0;
    *entries = calloc(nb_entries, sizeof(struct rpa_entry));
    while (p < end) {
        switch(*p) {
            case BININT:
                if (end - p < 5) {
                    p = end; // truncated opcode, stop parsing
                    break;
                }
                // little-endian; cast before shifting so `<< 24` can't overflow a signed int
                val = ((uint32_t)p[1] <<  0 |
                       (uint32_t)p[2] <<  8 |
                       (uint32_t)p[3] << 16 |
                       (uint32_t)p[4] << 24) ^ key;
                switch (next_binint) {
                    case NEXT_OFFSET:
                        (*entries)[entry_idx].offset = val;
                        break;
                    case NEXT_SIZE:
                        (*entries)[entry_idx].size = val;
                        entry_idx++;
                        if (entry_idx == nb_entries) {
                            nb_entries *= 2;
                            *entries = realloc(*entries, sizeof(struct rpa_entry) * nb_entries);
                        }
                        break;
                }
                next_binint = !next_binint;
                p += 5;
                break;
            case (uint8_t) SHORT_BINUNICODE:
                // 1-byte length followed by the string itself, which isn't NUL-terminated
                if (end - p < 2 || end - p - 2 < p[1]) {
                    p = end; // truncated string, stop parsing
                    break;
                }
                val = p[1];
                (*entries)[entry_idx].name = calloc(val + 1, 1);
                memcpy((*entries)[entry_idx].name, p + 2, val);
                p += 2 + val;
                break;
            default:
                p++;
        }
    }

    *entry_count = entry_idx;
    *entries = realloc(*entries, (*entry_count) * sizeof(struct rpa_entry));
}

The code just parses byte by byte, but even on an i5-3340 it feels instantaneous, since we are only talking about kilobytes of data. Vectorizing it might be a fun challenge, though. Anyway, we can confirm that it works with the following dirty snippet:

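// dump the first asset into the current directory, named after its last path component
lseek(rpafd, entries[0].offset, SEEK_SET);
{
    const char *slash = strrchr(entries[0].name, '/');
    const char *file_name = slash ? slash + 1 : entries[0].name;
    FILE *f = fopen(file_name, "wb");
    uint8_t *data = malloc(entries[0].size);
    ssize_t nread = read(rpafd, data, entries[0].size);
    if (f && data && nread > 0)
        fwrite(data, 1, (size_t)nread, f);
    if (f)
        fclose(f);
    free(data);
}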

This correctly dumps an image from the archive⁴.
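Because the loop in unpickle_index advances one byte at a time past any opcode it doesn’t handle, a byte inside another opcode’s argument (for example the 8-byte length that follows FRAME) can be mistaken for BININT or SHORT_BINUNICODE. Below is a sketch of a stricter walker that skips each opcode’s argument bytes, and also accepts BININT1, BININT2 and LONG1, which pickle uses for non-negative integers below 256, below 65536, and from 2^31 upward respectively. It is my own illustration, not what the actual tool does: struct index_entry and walk_index are hypothetical names, it only knows the opcodes listed in its switch, and it assumes all integers are non-negative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical output record; `name` points into the pickle buffer and is NOT NUL-terminated
struct index_entry {
    const uint8_t *name;
    size_t name_len;
    uint64_t offset, size;
};

// Little-endian read of n <= 8 bytes
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

// Walks the pickle opcode by opcode, skipping argument bytes so they are never
// read as opcodes. Returns 0 once STOP is reached, -1 on truncated input,
// an unknown opcode, or a full `out` array.
static int walk_index(const uint8_t *p, size_t len, uint64_t key,
                      struct index_entry *out, size_t max, size_t *count)
{
    const uint8_t *end = p + len;
    const uint8_t *name = NULL;
    size_t name_len = 0, nints = 0;
    uint64_t ints[2];
    *count = 0;

    while (p < end) {
        uint8_t op = *p++;
        size_t left = (size_t)(end - p), arg = 0;
        bool is_int = false;
        uint64_t value = 0;

        switch (op) {
        case '(': case '.': case '}': case ']': case ')': case 'a': case 'e': case 's':
        case 'u': case 'N': case 0x85: case 0x86: case 0x87: case 0x88: case 0x89: case 0x94:
            break;                                       // opcodes without arguments
        case 0x80: case 'h': case 'q': arg = 1; break;   // PROTO, BINGET, BINPUT
        case 'j': case 'r': arg = 4; break;              // LONG_BINGET, LONG_BINPUT
        case 0x95: arg = 8; break;                       // FRAME
        case 'K': case 'M': case 'J':                    // BININT1, BININT2, BININT
            arg = op == 'K' ? 1 : op == 'M' ? 2 : 4;
            if (left < arg) return -1;
            value = read_le(p, arg);
            is_int = true;
            break;
        case 0x8a:                                       // LONG1: 1-byte length, then LE bytes
            if (left < 1 || p[0] > 8 || left - 1 < p[0]) return -1;
            value = read_le(p + 1, p[0]);
            arg = 1 + (size_t)p[0];
            is_int = true;
            break;
        case 0x8c: case 'C':                             // SHORT_BINUNICODE, SHORT_BINBYTES
            if (left < 1 || left - 1 < p[0]) return -1;
            if (op == 0x8c) { name = p + 1; name_len = p[0]; nints = 0; }
            arg = 1 + (size_t)p[0];
            break;
        case 'X': case 'B':                              // BINUNICODE, BINBYTES: 4-byte length
            if (left < 4 || left - 4 < read_le(p, 4)) return -1;
            if (op == 'X') { name = p + 4; name_len = read_le(p, 4); nints = 0; }
            arg = 4 + (size_t)read_le(p, 4);
            break;
        default:
            return -1;                                   // an opcode this sketch doesn't know
        }
        if (left < arg) return -1;
        p += arg;
        if (op == '.') return 0;
        if (is_int && name) {
            ints[nints++] = value ^ key;
            if (nints == 2) {                            // name, offset, size: one complete entry
                if (*count == max) return -1;
                out[(*count)++] = (struct index_entry){ name, name_len, ints[0], ints[1] };
                name = NULL;
                nints = 0;
            }
        }
    }
    return -1;                                           // ran out of data before STOP
}
```

Failing on unknown opcodes is deliberate here: skipping them blindly is exactly what lets argument bytes leak into the parse.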

Now it’s time to turn this into a FUSE file system! Thankfully, FUSE file systems don’t have to support every file system operation, which is good, because we only want to allow reading. The documentation consists of the library’s example directory and the Doxygen pages for its API; I also used the manual page as a reference. The gist of it is that we have to fill out a struct fuse_operations with all the file system operations we support, and then call fuse_main.

One issue is that file systems are hierarchical, but I stored everything in a flat array rather than a directory tree, which would make actually showing a directory hierarchy rather tricky. I’ve also never had to implement a directory structure before, so it’s about time I got my hands dirty. This will of course multiply the number of allocations, but the few extra milliseconds at startup shouldn’t matter. Here’s how I ended up implementing the function that populates our directory structure:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rpa_node {
    char *name;
    union {
        struct {
            uint32_t offset, size;
        } file;
        struct {
            struct rpa_node **entries;
            int nb_entries;
        } dir;
    } node;
    bool is_dir; // selects which member of the `node` union is valid
};

// Returns a pointer to the next '/' in p, or to the terminating '\0' if there is none
const char *next_slash(const char *p)
{
    const char *q = p;
    while (*q && (*q) != '/') q++;
    return q;
}

void add_node_to_tree(struct rpa_node *root, const char *path, uint32_t offset, uint32_t size)
{
    struct rpa_node *node = root;
    const char *p = path;
    size_t path_len = strlen(path);
    while (p < path + path_len) {
        const char *q = next_slash(p);
        size_t component_len = q - p;
        bool found = false;
        for (int i = 0; i < node->node.dir.nb_entries; i++) {
            struct rpa_node *e = node->node.dir.entries[i];
            size_t node_name_len = strlen(e->name);
            if (component_len == node_name_len && strncmp(e->name, p, component_len) == 0) {
                found = true;
                node = e;
                break;
            }
        }
        if (found && !node->is_dir && *q)
            return; // an existing file can't contain further components
        if (!found) {
            node->node.dir.entries = realloc(node->node.dir.entries, (node->node.dir.nb_entries + 1) * sizeof(struct rpa_node *));
            struct rpa_node *new_node = calloc(1, sizeof(struct rpa_node));
            new_node->is_dir = true; // everything starts off as a directory
            new_node->name = calloc(component_len + 1, 1);
            new_node->node.dir.nb_entries = 0;
            memcpy(new_node->name, p, component_len);
            node->node.dir.entries[node->node.dir.nb_entries] = new_node;
            node->node.dir.nb_entries++;
            node = new_node;
        }
        if (!(*q)) {
            // last component: turn the node into a regular file
            node->is_dir = false;
            node->node.file.offset = offset;
            node->node.file.size = size;
            p = q; // set to the '\0' byte to break the loop
        } else {
            p = q + 1; // next component
        }
    }
}

It’s not pretty, and I’ll be honest: I did spend an hour or so with the GDB and printf combo debugging segmentation faults⁵. Each node can be either a regular file or a directory, so, as lots of C codebases do, I used a union plus a tag field (is_dir). The algorithm simply walks each component of a file path (as read from a SHORT_BINUNICODE) and creates any nodes that don’t exist yet. Every node starts out as a directory, and the node for the last component is then turned into a file.
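The tree is never freed in the code above, which is fine for a process that lives until unmount, but if you want to tear it down (for example from libfuse’s destroy callback), a recursive post-order walk is enough. This sketch repeats the struct rpa_node definition so it compiles on its own, and assumes every child node and name was heap-allocated by add_node_to_tree while the root itself may be a static variable; free_children and count_files are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Same layout as struct rpa_node above
struct rpa_node {
    char *name;
    union {
        struct {
            uint32_t offset, size;
        } file;
        struct {
            struct rpa_node **entries;
            int nb_entries;
        } dir;
    } node;
    bool is_dir;
};

// Counts the regular files below a directory node
static unsigned count_files(const struct rpa_node *dir)
{
    unsigned n = 0;
    for (int i = 0; i < dir->node.dir.nb_entries; i++) {
        const struct rpa_node *child = dir->node.dir.entries[i];
        n += child->is_dir ? count_files(child) : 1;
    }
    return n;
}

// Frees everything below a directory node (children before their parent),
// then leaves the directory itself empty, so a static root is never passed to free()
static void free_children(struct rpa_node *dir)
{
    for (int i = 0; i < dir->node.dir.nb_entries; i++) {
        struct rpa_node *child = dir->node.dir.entries[i];
        if (child->is_dir)
            free_children(child);
        free(child->name);
        free(child);
    }
    free(dir->node.dir.entries);
    dir->node.dir.entries = NULL;
    dir->node.dir.nb_entries = 0;
}
```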

Of course we must change unpickle_index⁶:

diff --git a/unpickle.c b/unpickle.c
index cc03f53..13f61eb 100644
--- a/unpickle.c
+++ b/unpickle.c
@@ -21,7 +21,7 @@
 #include <string.h>
 #include <stdio.h>
 
-#include "archive_entry.h"
+#include "fs.h"
 #include "unpickle.h"
 
 // Relevant opcodes taken from CPython
@@ -50,15 +50,15 @@ enum next_binint {
 void unpickle_index(const uint64_t file_index_sz,
                     const uint8_t file_index[static file_index_sz],
                     uint32_t key,
-                    unsigned *entry_count,
-                    struct rpa_entry *entries[static *entry_count])
+                    struct rpa_node *root)
 {
     const uint8_t *p = file_index;
     // preallocate 10k entries, will reallocate later
     unsigned nb_entries = 10000, entry_idx = 0;
     enum next_binint next_binint = NEXT_OFFSET;
     uint32_t val = 0;
-    *entries = calloc(sizeof(struct rpa_entry) * nb_entries, 1);
+    char *path = NULL;
+    uint32_t size, offset;
     while (p < file_index + file_index_sz) {
         switch(*p) {
             case BININT:
@@ -68,15 +68,12 @@ void unpickle_index(const uint64_t file_index_sz,
                        ((*(p + 4)) << 24)) ^ key;
                 switch (next_binint) {
                     case NEXT_OFFSET:
-                        (*entries)[entry_idx].offset = val;
+                        offset = val;
                         break;
                     case NEXT_SIZE:
-                        (*entries)[entry_idx].size = val;
+                        size = val;
                         entry_idx++;
-                        if (entry_idx == nb_entries) {
-                            nb_entries *= 2;
-                            *entries = realloc(*entries, sizeof(struct rpa_entry) * nb_entries);
-                        }
+                        add_node_to_tree(root, path, offset, size);
                         break;
                 }
                 next_binint = !next_binint;
@@ -84,15 +81,16 @@ void unpickle_index(const uint64_t file_index_sz,
                 break;
             case (uint8_t) SHORT_BINUNICODE:
                 val = (*(p + 1));
-                (*entries)[entry_idx].name = calloc(val + 1, 1);
-                memcpy((*entries)[entry_idx].name, p + 2, val);
+                if (path) {
+                    free(path);
+                    path = NULL;
+                }
+                path = calloc(val + 1, 1);
+                memcpy(path, p + 2, val);
                 p += 2 + val;
                 break;
             default:
                 p++;
         }
     }
-
-    *entry_count = entry_idx;
-    *entries = realloc(*entries, (*entry_count) * sizeof(struct rpa_entry));
 }

We also need a way to traverse this tree:

static struct rpa_node *find_node(struct rpa_node *root, const char *path)
{
    // "/" is the mount point itself, i.e. the root of our tree
    if (path[0] == '\0' || strcmp(path, "/") == 0) {
        return root;
    }
    struct rpa_node *node = root;
    size_t path_len = strlen(path);
    const char *p = path;
    if ((*p) == '/') p++;
    while (p < path + path_len) {
        const char *q = next_slash(p);
        size_t component_len = q - p;
        bool found = false;
        for (int i = 0; i < node->node.dir.nb_entries; i++) {
            struct rpa_node *e = node->node.dir.entries[i];
            size_t node_name_len = strlen(e->name);
            if (component_len == node_name_len && strncmp(e->name, p, component_len) == 0) {
                node = e;
                found = true;
                break;
            }
        }
        if (!found)
            return NULL;
        if (*q) {
            // more components follow, which only makes sense if this node is a directory
            if (!node->is_dir)
                return NULL;
            p = q + 1;
        } else {
            p = q;
        }
    }

    return node;
}

This function is already libfuse-aware: all of our file system operations are passed absolute paths in which "/" is the file system’s mount point, like "/images/ch1/background.jpeg". The loop is mostly a simpler version of the one in add_node_to_tree. Whenever a component isn’t a child of the current node, we return NULL.

And now we can finally wire up libfuse! I’ll start by writing a really basic file system so I can iterate on it. But first, I have to modify my argument handling: libfuse handles a bunch of command-line options, like -o allow_root, which allows the root user to access the files the program exposes, and -f, which keeps the program in the foreground.

// libfuse 3 API: the getattr/readdir signatures used later are libfuse 3's
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

// rpa_init, rpa_getattr and the `root` node are defined elsewhere in the program
static const struct fuse_operations rpa_ops = {
    .init = rpa_init,
    .getattr = rpa_getattr,
};

static struct rpa_options {
    const char *archive_name;
} rpa_opts;

// maps "--archive=<path>" to rpa_opts.archive_name
#define OPTION(t, p) { t, offsetof(struct rpa_options, p), 1 }
static const struct fuse_opt fuse_opts[] = {
    OPTION("--archive=%s", archive_name),
    FUSE_OPT_END,
};

int main(int argc, char **argv)
{
    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
    if (fuse_opt_parse(&args, &rpa_opts, fuse_opts, NULL) < 0) {
        fprintf(stderr, "Usage: rpafs --archive=</path/to/archive.rpa> </path/to/mount/point>\n");
        return EXIT_FAILURE;
    }

    if (!rpa_opts.archive_name) {
        fprintf(stderr, "No archive name supplied\n");
        return EXIT_FAILURE;
    }

    // [...]

    int ret = fuse_main(args.argc, args.argv, &rpa_ops, &root);
    fuse_opt_free_args(&args);
    return ret;
}

I found libfuse’s hello example to be a good reference for the argument handling. We specify the RPA file with --archive=, and we also get -f to run the file system in the foreground, and -d for FUSE-level debugging! Note that in my testing only output written to stderr showed up in the terminal, so for any sort of printf debugging we must use fprintf(stderr, ...) instead.

Now onto the file system… We must fill out a structure of file operation handlers (rpa_ops in the above snippet). Luckily, as mentioned above, our callbacks only get passed paths that we can already process, which makes our job easier. Based on the example code from the manual page, a basic file system must implement readdir, read, open, and getattr. As far as I understand, getattr handles stat() calls and the like, and the rest are self-explanatory :)

To be able to view directory contents, we need readdir and getattr. Here’s how I implemented them:

#include <errno.h>
#include <string.h>

// `root`, `archive_blocksize` and find_node() come from the rest of the program
int rpa_getattr(const char *path, struct stat *st, struct fuse_file_info *fi)
{
    (void) fi;
    struct rpa_node *node = find_node(&root, path);
    if (!node)
        return -ENOENT;

    memset(st, 0, sizeof(*st));
    st->st_blksize = archive_blocksize;
    st->st_nlink = 1;
    if (!node->is_dir) {
        st->st_size = node->node.file.size;
        st->st_mode = 0444 | S_IFREG;
    } else {
        st->st_mode = 0555 | S_IFDIR;
    }

    return 0;
}

int rpa_readdir(const char *path,
                void *data,
                fuse_fill_dir_t filler,
                off_t offset,
                struct fuse_file_info *ffi,
                enum fuse_readdir_flags flags)
{
    (void) offset;
    (void) ffi;
    (void) flags;
    struct rpa_node *node = find_node(&root, path);
    if (!node)
        return -ENOENT;
    if (!node->is_dir)
        return -ENOTDIR;

    filler(data, ".", NULL, 0, 0);
    filler(data, "..", NULL, 0, 0);
    for (int i = 0; i < node->node.dir.nb_entries; i++) {
        struct rpa_node *e = node->node.dir.entries[i];
        filler(data, e->name, NULL, 0, 0);
    }

    return 0;
}

Pretty straightforward so far. Our readdir handler gets a fuse_fill_dir_t, which is a function pointer that we must call to expose each directory entry. After adding the mandatory(?) . and .. paths⁷, we just look up the directory in our internal tree and then add all of its entries.

For getattr, we fill in some attributes. I don’t know whether st_nlink must be at least 1, but I set it to 1 to be safe, because the example code also had such a statement⁸. I also store the RPA file’s block size in archive_blocksize when calling stat() on the archive, so that programs can use it as a hint for how large a chunk to read at a time. As our FUSE file system is read-only, directories are readable and executable (the execute bit on a directory grants search permission, which is needed to access anything inside it), and regular files are read-only. On errors we must return negative errno values, so for non-existent entries we return -ENOENT.

Now, it’s time to implement the actual file opening and reading.

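#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int rpa_open(const char *path, struct fuse_file_info *fi)
{
    struct rpa_node *node = find_node(&root, path);
    if (!node)
        return -ENOENT;
    if (node->is_dir)
        return -EISDIR;
    // O_RDONLY is 0, so the access mode has to be masked out with O_ACCMODE before comparing
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

int rpa_read(const char *path,
             char *buf,
             size_t sz,
             off_t offset,
             struct fuse_file_info *fi)
{
    (void) fi;
    struct rpa_node *node = find_node(&root, path);
    if (!node || node->is_dir)
        return -ENOENT;
    off_t asset_offset = node->node.file.offset;
    off_t asset_size = node->node.file.size;
    if (offset >= asset_size)
        return 0; // at or past the end of the asset
    if ((off_t)sz > asset_size - offset)
        sz = (size_t)(asset_size - offset);
    // pread() leaves the shared file offset alone, so reads served concurrently
    // by libfuse's worker threads can't move each other's position
    size_t total_read = 0;
    while (total_read < sz) {
        ssize_t ret = pread(rpafd, buf + total_read, sz - total_read,
                            asset_offset + offset + (off_t)total_read);
        if (ret < 0)
            return -errno;
        if (ret == 0)
            break; // the archive is shorter than the index claims
        total_read += (size_t)ret;
    }
    return (int)total_read;
}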

Our open handler only has to return 0 if all goes well; the file descriptors themselves are handled by the kernel. We also make sure not to allow anything but read-only opens. Handling reads is just as straightforward: we read at most sz bytes, starting at the asset’s offset within the archive. And with that done, our file system works!
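Clamping the request is the subtle part of a read handler like this one: readers may ask for more bytes than remain, or for an offset at or past the end of the asset, where asset_size - offset would go negative. Isolated as a small pure function (clamp_read_size is a hypothetical name), the rule is easy to check on its own:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

// Number of bytes a read of `sz` bytes at `offset` may return from an asset of
// `asset_size` bytes: 0 at or past the end, otherwise at most what's left.
static size_t clamp_read_size(uint32_t asset_size, off_t offset, size_t sz)
{
    if (offset < 0 || (uint64_t)offset >= asset_size)
        return 0;
    uint64_t left = (uint64_t)asset_size - (uint64_t)offset;
    return sz < left ? sz : (size_t)left;
}
```

For a 100-byte asset, a 4096-byte read at offset 90 yields 10 bytes, and any read at offset 100 or beyond yields 0, which FUSE reports to the reader as end of file.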

Screenshot: sxiv displaying sprites from Slay the Princess: The Pristine Cut

This way I no longer have to unpack gigabyte-sized archives just to inspect some assets! It was also a nice exercise in string handling, and I’d never written a FUSE file system before. Overall, I’m happy with this weekend hack. The program’s source code is available on GitHub under the GPLv3 license. Enjoy!

¹ I have not written a serious C program in probably half a year, so I am quite rusty.
² These opcodes are taken from the CPython codebase.
³ After writing this post, I did some further testing, and my limited implementation ended up biting me. This will of course be fixed in the actual repository of the tool.
⁴ Note to self: take screenshots while writing code, not afterwards. If you read further, you’ll see that the flat array is eventually replaced with a tree-like structure, so that snippet no longer works for me, and I’m too lazy to check out a previous commit, so you’ll just have to take my word for it.
⁵ Not sure if using Rust would have made this easier, as AFAIK you need unsafe {} for these kinds of data structures unless you want to bring in a dependency, but you’re welcome to correct me, as I’m very much a novice in Rust.
⁶ I noticed some unused variables remained in the code. Oops…
⁷ I’m not sure if this is the correct way to handle these paths, as I just wrote it like it was in the example code, but it seems to work.
⁸ My assumption is that a file is internally a hard link to its data, but as always, please correct me if I’m wrong.