tl;dr I realized I can’t parse into an AST until I have a place to put an AST, so I need an allocator first.
I have spent an inordinate amount of time this week reading the 100 Cupboards series, and not thinking about this project. The same author wrote Outlaws of Time which I couldn’t put down last year even though I was in bed with a fever at the time. N.D. Wilson is the kind of writer who can paint a thousand pictures with every word, rendering an action movie in real time in your imagination. Coincidentally, that book was inspired by a dream while he himself was down with a fever.
Anyway enough distractions, I should put some thought into my own grammar!
Lisp is represented by S-expressions, making it an easy grammar by any standards, certainly Mr Wilson’s. I’m not dealing with hash-map, string or numeric syntaxes yet either. All I have is lists of symbols.
Despite picking the easiest thing in the world to parse, it’s been nearly half a lifetime since I did this at university. This is going to take me a while to relearn. I’m definitely going to feel not very smart for a while.
Things I have to consider:
(a . b)
(a b c d e)
The data structure to represent these S-expressions isn’t immediately obvious. In
the more Common Lisps, the empty list
() is also represented by the symbol
Does that mean empty list really is a symbol? Or is it a cons cell with both
values empty? Or both?
Thinking about it, it can only be one thing: a special symbol. Symbols and cons cells
are the only constructs available and if a cons cell has both values empty, then
what is the type of empty? Empty must be the
nil symbol, alternatively written
Nextly, should we just use
Vec to represent lists or should we go
down that road,
using a Cons-ish type to make a linked list? I’m inclined toward the latter, since
it’s more historically accurate and I like history. And pain, apparently.
I don’t really like pain and making a linked list structure in Rust does seem
inadvisable. It would also probably mean
Rc<Pair> being a pervasive type throughout
my project, which I’m not excited about. It’s not the memory managment model I’m
interested in exploring further down the road.
What this is beginning to look like is that we’re not quite ready to parse.
I should have seen this coming. I’ve spent a couple years daydreaming about creating a programming language and have always come back to the notion that memory management is so fundamental that it must be the starting point. We’re going to have to switch tracks briefly.
There are so many options available and none, with the exception of
Rc, are going
to be easy. I don’t want to use
I just happen to have a copy of The Garbage Collection Handbook and I will now spend some time in it’s pages.
I have decided. Further down the line we’ll need a full fledged GC. Before we get there, we’ll need an allocator and an API to it that will remain reasonably stable.
My provisional plan is to build a hybrid mark-sweep/copying collector but that’s a little way off. That GC will need only a basic bump-pointer allocator, which I’m happy about. We’re kicking the complexity can down the road.
This setup will be temporary. For the full-fledged GC, we’ll have to replace that with our own custom allocator.
Because we’re not going to have a full GC yet, we’ll just be allocating into the
Vec without freeing anything. When the
Vec is full, we’ll panic with
New state of code tagged here. I also made some minor changes to the lexer: renaming some things and switching from line/char based iterating to purely char based iterating.
Here’s what I started with for an allocator:
bump is the index to the next free location to allocate an object into and
is the pointer to a contiguous segment of memory that will hold our objects.
Arena is constructed simply thusly where
allocate() comes from the
Now we’ll walk through the interesting part of the code: allocating space for a
new object and writing it into the
Arena. To start with we’ll write a test
that calls the
Arena::allocate<T>() function and attempts to dereference
the pointer, testing that the memory location contains the expected data.
allocate() function starts out empty, returning a null pointer.
The pointer is wrapped in a
Ptr<T> type for which we implement
cargo test fails with a segfault for dereferencing a null pointer:
Now we’ll try to get the test to pass.
The first thing to do is check that there’s enough
buffer space left:
The test continues to segfault. We need to copy
object into the
Ptr with a valid pointer.
What did we do there? We used
to create a new pointer from
buffer pointer plus the value in
bump. Then we used
object to the
buffer starting at that new pointer address.
Finally we increment
bump and return the wrapped pointer in a
Now the test passes!
I’ll add another test to make sure our out-of-memory panic works:
and yes! It does! After implementing
Arena so that
deallocated properly, we have a basic allocator.
We will not implement dropping
the objects inside the allocator as that amounts to implementing finalizers
and I’m not going there. Whatever goes into
Arena will have to be OK with
not having it’s own
drop() method being called.
In the making of this stage I consulted:
Next time we might get to do some parsing (^‿^)