Well this is turning out deeper than I anticipated. In glorious personal tradition, I went into the parsing problem keyboard blazing and quickly ran into gotchas. I’ve rewritten and refactored my parser twice, reinventing the old problems that standard parsing techniques have been proven to solve before I was born.
Many many years ago I took a compilers course based on The Dragon Book. Perhaps we covered lexing and parsing… I have no recollection of any of it because, while Roger Bailey was a delight, he taught us the course using Haskell without teaching us Haskell and I don’t think I really understood anything.
I’ve spent the past two weeks learning by failing, reading articles, failing to completely understand the articles, failing some more and finally beginning to understand something of parsing. While I may not be a fast learner, in my defence I am a thorough learner.
My very first hastily scribbled attempt didn’t have any kind of lookahead and I found that I couldn’t accurately determine if an s-expression had correctly terminated or not.
I also hated the code. A very clumsy state machine, full of holes no doubt.
Behold and scorn!
Blech.
My second, more thoughtful, attempt was after I kind of began to understand recursive descent parsing. The parser sort of worked but I totally failed to implement lookahead correctly so the result was no better than the first attempt. In the commit above I tried to correct my peek code and came face to face with the wagging finger of the borrow checker.
Bother. Here I am at time of writing, staring at this error and rethinking my iteration data structures. I like the overall code pattern a whole lot better and I’m pretty sure I’m on the right track to a partially recursive slightly descending parser. That’s encouraging.
The problem is the TokenStream
struct which has the wrong structure and lifetimes
to solve this problem.
The TokenStream::peek()
function can’t return a reference
to a Token
that outlives more than zero recursive descents into the parser
because the peek
value gets overwritten by the value from next()
from the tokens
iterator.
The fanfare to Also sprach Zarathustra is playing in my head as I scroll through the documentation for the Peekable struct. This is the tool that will transform me from compiler-error-generating-keyboard-monkey into Rust-wielding spacefaring modern man.
Replacing TokenStream
with Peekable
does it. The tests pass, the RPL rpls.
That twice quoted section of code now reads like this:
I’ve moved the Pair
and Value
types into a separate file called… types.rs
.
Also in this update, I implemented fmt::Display
for Value
to print more better.
Here is the source tree in it’s new state.
If you looked at the code you’d have noticed that the Symbol
enum variant does not
carry the name of the symbol in it yet. That’s because I haven’t figured out how I
want to represent symbols internally yet. There are questions that need answering.
Symbols have names but names are inefficient representations. Addresses are better!
How do I want to map names to addresses and back again? Is this the same as an
environment or does it at least overlap in functionality somewhat? Where should the
symbol name strings live? In the Arena
or in a separate structure?
Tune in next time to find out zero or more answers!
Here’s a list of articles I have found to be the most helpful in this stage:
In addition to mandatory attendance of a full time job and the usual dinner, children to bed, collapse routine, my spare time is thinning out even more as I’m coaching football (“soccer”) for a team of eight-year-olds. Nevertheless, I will maintain my current level of mediocre productivity!