Thrashing

I’m in cognitive overload. At work I’ve been solo on a project that’s big enough that every time I turn to look at a different piece of it, the last thing I worked on gets swapped out of my memory and I have to relearn the new current thing.

At the turn of the month we drove a total of 28 hours over 3 days to Oklahoma and back for a wedding. With four childrens. I’m ready for more! Almost recovered. This was massively disruptive as everything I was thinking about with regard to evalrus got swapped out of my brain completely, including enthusiasm for it.

I now have my very own not-for-sale-outside-the-Indian-subcontinent copy of The Dragon Book. It’s a lot more accessible than I remember! I often think I’d get a lot more out of my CS degree now than I did 18 years ago.

Copying Collection and Lifetimes

And now, some thoughts on managed memory and Rust semantics.

If anybody cared to look at memory.rs they might have considered that Ptr<T> references memory in an Arena but the lifetime of Ptr<T> is not limited to the lifetime of the Arena it is connected to. I’ve let some possible use-after-free unsafety leak out.

I thought about this, and tried adding an explicit lifetime to Ptr<T>, and thought about it some more. These lifetimes are viral and start cluttering everything up. I don’t like it, yet it would be the right thing to do.

I’m not going to do it. Here’s why:

If I tie a Ptr<T> to the lifetime of an Arena, the compiler can reasonably assume that a borrow of a Ptr<T> can last the lifetime of the Arena.

If I want to implement a copying collector, an object that is moved has an unpredictable lifetime from Rust’s point of view. The object continues to exist but any references to it would be invalid pointers.

If I implement a copying collector, I don’t want to be able to take long term references to Ptr<T>s anywhere - I have to be able to identify every Ptr<T> and update it to point to the new object location after it has been moved.

It seems to me that there’s something of a fundamental incompatibility between lifetimes and runtime garbage collection, especially if objects can be relocated. I don’t know what the answer is, if any. A compromise that leaks unsafety under specific circumstances may be the best outcome.

Part of my problem here is that I still don’t fully grasp the power of lifetimes and Rust’s type system. I come from a C/C++ and Python background so I’m used to unsafety. Creating safe abstractions is still a new challenge.

Symbol Mapping

Here’s a light memory management problem that had my brain tied in pretzels for a bit.

A symbol has a name represented by a string, but should be refered to in the interpreter by an address for simplicity and performance sake. Each symbol should be unique, there shouldn’t be a duplicate of any, so comparing any two symbols of the same name should, under the hood, compare their pointers to find equality.

The simple problem in my code is that a Symbol should be stored in an Arena - runtime managed memory. But where should it’s str representation live? Additionally, I need to map strs to Symbols bidirectionally. That suggests a HashMap but a HashMap is entirely Rust-managed.

I finally arrived at a solution. There are probably others, possibly better ones.

pub struct SymbolMap {
    map: HashMap<String, Ptr<Symbol>>
}

where a Symbol holds a copy of the raw &str fat pointer representation of the String key.

#[derive(Copy, Clone)]
pub struct Symbol {
    name_ptr: *const u8,
    name_len: usize,
}

The entire impl of Symbol is

impl Symbol {
    pub fn new<M>(value: &String, mem: &mut M) -> Ptr<Symbol> where M: Allocator {
        mem.alloc(Symbol {
            name_ptr: value.as_str().as_ptr(),
            name_len: value.as_str().len(),
        })
    }

    pub fn as_str(&self) -> &str {
        unsafe {
            let slice = slice::from_raw_parts(self.name_ptr, self.name_len);
            str::from_utf8(slice).unwrap()
        }
    }
}

SymbolMaps implementation is also simple:

impl SymbolMap {
    pub fn new() -> SymbolMap {
        SymbolMap {
            map: HashMap::new()
        }
    }

    pub fn lookup<M>(&mut self, name: &String, mem: &mut M) -> Ptr<Symbol>
        where M: Allocator
    {
        // Can't take a map.entry(name) without providing an owned String, i.e. cloning 'name'
        // Can't insert a new entry with just a reference without hashing twice, and cloning 'name'
        // Which is the lesser weevil? Perhaps making lookups fast and inserts slower.

        { // appease le borrow chequer inside this block
            if let Some(ptr) = self.map.get(name) {
                return ptr.clone();
            }
        }

        let name = name.clone();
        let ptr = Symbol::new(&name, mem);
        self.map.insert(name, ptr);
        ptr
    }
}

As the comments say, I decided to make name lookups the fast path and creating new symbols the slow path.

Symbols are helpfully immutable - SymbolMap doesn’t allow modifying a Symbol name after it has been created. This means that the internal pointer and size of the name won’t ever change and we can safely take copies of them for the Symbol type. So long as the HashMap outlives any Arenas containing Symbols we should be ok. Enforcing that relationship at compile time? Your suggestions most desirous.

At least now the RPL prints out symbol names correctly and that is very good!

If you look through the source code, you’ll see that I abstracted the Arena interface out into an Allocator trait. This will make it easier to refactor memory management down the road.

Up next…

Not sure. Still trying to regain momentum.