Peter Liniker

[ about ] - [ blog ]

tl;dr I realized I can’t parse into an AST until I have a place to put an AST, so I need an allocator first.

ParseError

I have spent an inordinate amount of time this week reading the 100 Cupboards series, and not thinking about this project. The same author wrote Outlaws of Time which I couldn’t put down last year even though I was in bed with a fever at the time. N.D. Wilson is the kind of writer who can paint a thousand pictures with every word, rendering an action movie in real time in your imagination. Coincidentally, that book was inspired by a dream while he himself was down with a fever.

Anyway enough distractions, I should put some thought into my own grammar!

Lisp is represented by S-expressions, making it an easy grammar by any standards, certainly Mr Wilson’s. I’m not dealing with hash-map, string or numeric syntaxes yet either. All I have is lists of symbols.

Despite picking the easiest thing in the world to parse, it’s been nearly half a lifetime since I did this at university. This is going to take me a while to relearn. I’m definitely going to feel not very smart for a while.

Things I have to consider:

Internal representation

The data structure to represent these S-expressions isn’t immediately obvious. In the more Common Lisps, the empty list () is also represented by the symbol nil. Does that mean empty list really is a symbol? Or is it a cons cell with both values empty? Or both?

Thinking about it, it can only be one thing: a special symbol. Symbols and cons cells are the only constructs available and if a cons cell has both values empty, then what is the type of empty? Empty must be the nil symbol, alternatively written ().

Nextly, should we just use Vec to represent lists or should we go down that road, using a Cons-ish type to make a linked list? I’m inclined toward the latter, since it’s more historically accurate and I like history. And pain, apparently.

I don’t really like pain and making a linked list structure in Rust does seem inadvisable. It would also probably mean Rc<Pair> being a pervasive type throughout my project, which I’m not excited about. It’s not the memory managment model I’m interested in exploring further down the road.

What this is beginning to look like is that we’re not quite ready to parse.

I should have seen this coming. I’ve spent a couple years daydreaming about creating a programming language and have always come back to the notion that memory management is so fundamental that it must be the starting point. We’re going to have to switch tracks briefly.

There are so many options available and none, with the exception of Rc, are going to be easy. I don’t want to use Rc.

I just happen to have a copy of The Garbage Collection Handbook and I will now spend some time in it’s pages.

Back soon…

Memory management

I have decided. Further down the line we’ll need a full fledged GC. Before we get there, we’ll need an allocator and an API to it that will remain reasonably stable.

My provisional plan is to build a hybrid mark-sweep/copying collector but that’s a little way off. That GC will need only a basic bump-pointer allocator, which I’m happy about. We’re kicking the complexity can down the road.

Quite coincidentally, a solution presented itself today for backing malloc with Vec so that’s where we’ll begin. I won’t even have to write it myself because Jonathan Reem has already done so!

This setup will be temporary. For the full-fledged GC, we’ll have to replace that with our own custom allocator.

Because we’re not going to have a full GC yet, we’ll just be allocating into the Vec without freeing anything. When the Vec is full, we’ll panic with out-of-memory.

Implementing

New state of code tagged here. I also made some minor changes to the lexer: renaming some things and switching from line/char based iterating to purely char based iterating.

Here’s what I started with for an allocator:

pub struct Arena {
    buffer: *mut u8,
    size: isize,
    bump: isize
}

where bump is the index to the next free location to allocate an object into and buffer is the pointer to a contiguous segment of memory that will hold our objects.

An Arena is constructed simply thusly where allocate() comes from the memalloc crate:

impl Arena {
    pub fn new(size: isize) -> Arena {
        let buffer = unsafe { allocate(size as usize) };

        if buffer == ptr::null_mut() {
            panic!("could not allocate memory!");
        }

        Arena {
            buffer: buffer,
            size: size,
            bump: 0
        }
    }
}

Now we’ll walk through the interesting part of the code: allocating space for a new object and writing it into the Arena. To start with we’ll write a test that calls the Arena::allocate<T>() function and attempts to dereference the pointer, testing that the memory location contains the expected data. This allocate() function starts out empty, returning a null pointer.

The pointer is wrapped in a Ptr<T> type for which we implement Deref and DerefMut:

As expected, cargo test fails with a segfault for dereferencing a null pointer:

pub struct Ptr<T> {
    ptr: *mut T
}

impl Arena {
    pub fn allocate<T>(&mut self, object: T) -> Ptr<T> {
        // return a NULL pointer
        Ptr { ptr: ptr::null_mut() }
    }
}

#[cfg(test)]
{
    #[test]
    fn test_alloc_struct() {
        let mut mem = Arena::new(1024);
        let ptr = mem.allocate(Thing::new());
        assert!(ptr.check());  // dereference the pointer and check memory contents
    }
}

Now we’ll try to get the test to pass.

The first thing to do is check that there’s enough buffer space left:

impl Arena {
    pub fn allocate<T>(&mut self, object: T) -> Ptr<T> {
        let next_bump = self.bump + (mem::size_of::<T>() as isize);
        if next_bump > self.size {
            panic!("out of memory!");
        }

        // return a NULL pointer
        Ptr { ptr: ptr::null_mut() }
    }
}

The test continues to segfault. We need to copy object into the Arena and return a Ptr with a valid pointer.

impl Arena {
    pub fn allocate<T>(&mut self, object: T) -> Ptr<T> {
        let next_bump = self.bump + (mem::size_of::<T>() as isize);
        if next_bump > self.size {
            panic!("out of memory!");
        }

        let p = unsafe {
            let p = self.buffer.offset(self.bump) as *mut T;
            ptr::write(p, object);
            p
        };

        self.bump = next_bump;

        Ptr { ptr: p }
    }
}

What did we do there? We used std::ptr::offset() to create a new pointer from the Arena buffer pointer plus the value in bump. Then we used std::ptr::write() to copy object to the buffer starting at that new pointer address.

Finally we increment bump and return the wrapped pointer in a Ptr<T> abstraction.

Now the test passes!

I’ll add another test to make sure our out-of-memory panic works:

#[cfg(test)]
{
    #[test]
    #[should_panic]
    fn test_out_of_memory() {
        let mut mem = Arena::new(1024);
        loop {
            let _ptr = mem.allocate(Thing::new());
        }
    }
}

and yes! It does! After implementing Drop for Arena so that buffer gets deallocated properly, we have a basic allocator.

We will not implement dropping the objects inside the allocator as that amounts to implementing finalizers and I’m not going there. Whatever goes into Arena will have to be OK with not having it’s own drop() method being called.

In the making of this stage I consulted:

Up next…

Next time we might get to do some parsing (^‿^)