tl;dr I realized I can’t parse into an AST until I have a place to put an AST, so I need an allocator first.
I have spent an inordinate amount of time this week reading the 100 Cupboards series, and not thinking about this project. The same author wrote Outlaws of Time which I couldn’t put down last year even though I was in bed with a fever at the time. N.D. Wilson is the kind of writer who can paint a thousand pictures with every word, rendering an action movie in real time in your imagination. Coincidentally, that book was inspired by a dream while he himself was down with a fever.
Anyway enough distractions, I should put some thought into my own grammar!
Lisp is represented by S-expressions, making it an easy grammar by any standards, certainly Mr Wilson’s. I’m not dealing with hash-map, string or numeric syntaxes yet either. All I have is lists of symbols.
Despite picking the easiest thing in the world to parse, it’s been nearly half a lifetime since I did this at university. This is going to take me a while to relearn. I’m definitely going to feel not very smart for a while.
Things I have to consider:
()
, or nil
(a . b)
(a b c d e)
((a) b)
etcThe data structure to represent these S-expressions isn’t immediately obvious. In
the more Common Lisps, the empty list ()
is also represented by the symbol nil
.
Does that mean empty list really is a symbol? Or is it a cons cell with both
values empty? Or both?
Thinking about it, it can only be one thing: a special symbol. Symbols and cons cells
are the only constructs available and if a cons cell has both values empty, then
what is the type of empty? Empty must be the nil
symbol, alternatively written ()
.
Nextly, should we just use Vec
to represent lists or should we go
down that road,
using a Cons-ish type to make a linked list? I’m inclined toward the latter, since
it’s more historically accurate and I like history. And pain, apparently.
I don’t really like pain and making a linked list structure in Rust does seem
inadvisable. It would also probably mean Rc<Pair>
being a pervasive type throughout
my project, which I’m not excited about. It’s not the memory managment model I’m
interested in exploring further down the road.
What this is beginning to look like is that we’re not quite ready to parse.
I should have seen this coming. I’ve spent a couple years daydreaming about creating a programming language and have always come back to the notion that memory management is so fundamental that it must be the starting point. We’re going to have to switch tracks briefly.
There are so many options available and none, with the exception of Rc
, are going
to be easy. I don’t want to use Rc
.
I just happen to have a copy of The Garbage Collection Handbook and I will now spend some time in it’s pages.
Back soon…
I have decided. Further down the line we’ll need a full fledged GC. Before we get there, we’ll need an allocator and an API to it that will remain reasonably stable.
My provisional plan is to build a hybrid mark-sweep/copying collector but that’s a little way off. That GC will need only a basic bump-pointer allocator, which I’m happy about. We’re kicking the complexity can down the road.
Quite coincidentally, a
solution presented itself today
for backing malloc
with Vec
so that’s where we’ll begin. I won’t even have to write
it myself because Jonathan Reem has already done so!
This setup will be temporary. For the full-fledged GC, we’ll have to replace that with our own custom allocator.
Because we’re not going to have a full GC yet, we’ll just be allocating into the
Vec
without freeing anything. When the Vec
is full, we’ll panic with
out-of-memory.
New state of code tagged here. I also made some minor changes to the lexer: renaming some things and switching from line/char based iterating to purely char based iterating.
Here’s what I started with for an allocator:
where bump
is the index to the next free location to allocate an object into and buffer
is the pointer to a contiguous segment of memory that will hold our objects.
An Arena
is constructed simply thusly where allocate()
comes from the memalloc
crate:
Now we’ll walk through the interesting part of the code: allocating space for a
new object and writing it into the Arena
. To start with we’ll write a test
that calls the Arena::allocate<T>()
function and attempts to dereference
the pointer, testing that the memory location contains the expected data.
This allocate()
function starts out empty, returning a null pointer.
The pointer is wrapped in a Ptr<T>
type for which we implement Deref
and DerefMut
:
As expected, cargo test
fails with a segfault for dereferencing a null pointer:
Now we’ll try to get the test to pass.
The first thing to do is check that there’s enough buffer
space left:
The test continues to segfault. We need to copy object
into the Arena
and
return a Ptr
with a valid pointer.
What did we do there? We used
std::ptr::offset()
to create a new pointer from
the Arena
buffer
pointer plus the value in bump
. Then we used
std::ptr::write()
to copy object
to the buffer
starting at that new pointer address.
Finally we increment bump
and return the wrapped pointer in a Ptr<T>
abstraction.
Now the test passes!
I’ll add another test to make sure our out-of-memory panic works:
and yes! It does! After implementing Drop
for Arena
so that buffer
gets
deallocated properly, we have a basic allocator.
We will not implement dropping
the objects inside the allocator as that amounts to implementing finalizers
and I’m not going there. Whatever goes into Arena
will have to be OK with
not having it’s own drop()
method being called.
In the making of this stage I consulted:
Next time we might get to do some parsing (^‿^)