Back to old tricks, or, baby steps in Rust

(donsbot.wordpress.com)

173 points | by todsacerdoti 37 days ago

11 comments

  • enitihas 37 days ago

    I completed the second interpreter in the craftinginterpreters book in rust. One thing I noticed was all the places the author was explicitly keeping track of lifetimes, I could do it via the rust compiler. For example, there is a function which should only be called with strings which will live for the life of the program. Easy to enforce with rust.

    The only place where I couldn't find a zero cost abstraction was enum encoding, where enums are of different sizes.

    For example

    pub enum OpCode { OpReturn, OpConstant(u8), OpNegate, OpAdd, ... }

    It would be great if the compiler could use only 1 byte for the rest of the enums other than OpConstant, and more bytes for OpConstant, and use maybe the first 2 bits to store the number of bytes used, kind of like sds library by antirez. This is the only place where I felt my rust implementation was less efficient than the book's c implementation.

    Off course, I could go the same route as the c code, and not use type safe enums by just keeping an array of bytes and doing the decoding myself, but I wanted to have full compile time type safety.

    • Twisol 37 days ago

      > The only place where I couldn't find a zero cost abstraction was enum encoding, where enums are of different sizes.

      Unfortunately, variable-sized types interact poorly with arrays -- you can no longer access any element in constant time. UTF-8 has essentially the same issue.

      Did you end up abstracting over the byte array at all? I would imagine that some kind of `BytecodeBuffer` could give pretty reasonable ergonomics over a raw buffer without sacrificing efficiency or correctness.

      • masklinn 37 days ago

        > Unfortunately, variable-sized types interact poorly with arrays

        Unsized types interact badly with everything, really. That's why you usually shove them behind a pointer.

      • jacb 37 days ago

        If you want random access into an array of enums, you need them all to be the same size (there are also alignment concerns, which is why it's tough to use only 1 bit of overhead to discriminate an enum with two variants). I guess you could add indirection and have an enum with variants A(Box<AType>), B(Box<BType>), etc. This is good practice if you care about space and you have a variant with a type that's a few hundred bytes (not heap-allocated like with Vec). But it's not worth adding the indirection and potential cache miss to save a byte, when most cache lines can fit at least 32 bytes.

        • comex 37 days ago

          But you don't always need random access; often you only need sequential access, in which case you can pack differently-sized values into a single buffer. However, Rust enums don't support that use case. A macro could probably do it without too much loss of ergonomics, but I'm not sure if there's a good crate for it.

          • jacb 37 days ago

            That's a good point! I guess you'd still have to make sure that variants with alignment requirements don't get unaligned. One way to do that would be to add leading padding if they follow an unaligned variant. Adds some complexity to the representation, I guess, and I suspect there's other ways of doing it. Neat problem.

      • adamnemecek 37 days ago

        I've been writing Rust for some time and I think that people overuse closures. I've been looking at a lot of GUI framework implementations and a lot of them use closures for event handling. I think that the ECS (https://en.wikipedia.org/wiki/Entity_component_system) paradigm is a much better match. I've been working on an GPU-first, ECS based GUI framework for my app (http://ngrid.io) and couldn't be happier. I might open source parts of it. The code ends up being really short and and uses little memory.

        A browser might have multiple representations of a DOM node in memory (the pipeline from DOM to GPU is really long). I have like one.

        I realize this is only tangentially related.

        • kvark 37 days ago

          Also seeing closures as a universal instrument to many cases, often overused.

          Problem with them is disrupted control flow and implicit capturing of the context. However, in cases where you have to guarantee something to happen after user code, the RAII guards can't replace the closures, because drop() isn't guaranteed to be called even in safe code...

          The ECS GUI integration is an interesting direction. I think Rust ecosystem looked at it, and it was hard.

        • amelius 37 days ago

          I just followed your link to the ECS Wikipedia page, and it seems to me that ECS is a very dynamic way of creating objects and defining their behavior, almost like how you can assign functions to objects in JavaScript without defining them in a class. Doesn't this conflict with the more formal approach of using a strong type system?

          • arc-in-space 37 days ago

            This feels hard to answer, and I feel like you're making some fundamental category error. Having a type system doesn't mean "don't do suspiciously-dynamic-sounding things", it means "don't accidentally mix up data with incompatible structure". Rust, C++, Haskell, etc. are all perfectly happy with polymorphism, and a typical ECS certainly is well-typed. It's just another way of doing mixins.

            • twic 36 days ago

              It arguably conflicts a bit, because in an ECS, you can have notions of, say, Movable and Damageable, but there's no way to write a function applies to things which are both Movable and Damageable, right? The closest you could get would be a function which takes both a Movable and a Damageable, and has to trust that they correspond to the same entity.

              • adamnemecek 36 days ago

                No actually, you query both and as you iterate through the results, the current element is one that has both. Look at some examples of the Rust ECS framework specs.

            • orthoxerox 37 days ago

              The primary goal of ECS is arranging data in a more CPU-friendly way. The secondary goal is avoiding hardcoding the behavior of your model into a type hierarchy.

              As a comparison, imagine your bank's lending products. You probably don't want to hardcode them into a type hierarchy with a virtual method AccrueInterest(Date), but rather have flags that indicate which loans accrue daily, which loans accrue monthly, which loans have some secondary interest etc.

            • adamnemecek 37 days ago

              You are right, it's dynamic, however it's a different type of dynamic than in JS. The reason being that systems query components they operate on as opposed to coupling code with data.

              Like you can't have a type mismatch with systems, since you get the data you queried.

            • ComputerGuru 37 days ago

              The Orb rust GUI library developed for RedoxOS is based on ECS.

            • mamcx 37 days ago

              Now the question is how use ECS for build a query/stream?

            • aazaa 37 days ago

              I like the benchmark comparison between Box and Trait approaches. It's not easy to find these comparisons.

              Regarding overall approach, Iterators are very flexible, and I find it a little surprising that you get Iterator behavior (the Stream trait) without explicitly mentioning Iterator in the code.

              If iterator performance is a concern in, for example, file I/O, there are ways to address that:

              https://stackoverflow.com/questions/45882329/read-large-file...

              • Tempest1981 37 days ago

                As a Rust newbie, how do I learn how to read this?

                  pub struct Stream<'s, S: Seed, A: Copy> {
                      next: Box<dyn Fn(S) -> Step<S, A> + 's>,
                      seed: S
                  }
                
                Can someone summarize in a sentence or two? Or point to some docs?
                • dochtman 37 days ago

                  Stream is a struct (a type with named fields). It has three parameters, the lifetime s, the type S and the type A. S must satisfy the Seed trait, and A must satisfy the Copy trait. Stream has two fields, next and seed. The type of next is a Box (basically an owned pointer into the heap) of a type that implements the Fn(S) -> Step<S, A> interface (that is, it can be called with an S as the single argument and produces something of the type Step parametrized with S and A) and does not outlive the s lifetime. The field seed is of the type S.

                  This chapter of the book is probably a good start:

                  https://doc.rust-lang.org/book/ch10-00-generics.html

                  • partiallattice 36 days ago

                    Small typo:

                    it can be called with an S as the single argument and produces something of the type Step parametrized with S and A) and -does not- outlives the s lifetime.

                    's refers to a minimal lifetime and not a maximal limit.

                  • steveklabnik 37 days ago

                    I read this as "there is a public struct named Stream that takes one lifetime parameter and two type parameters. There are two members, a seed, and a next closure that takes a seed, and returns some iterator which gives a new seed and a value."

                    That doesn't fully describe all of it, and there's some inferences there, but that's how I hear it in my head. Some of it is experience, and some of it is context, too. Especially that "an iterator" part. I'd expect to see Iterator when talking about iterators, but the 'shape' for a lack of a better word looks like one to me.

                    Rust's syntax is pretty regular and modular. After some practice, the bits start to fit together, and the way to tackle more complex signatures is the same as how you tackle complex code: break it up into smaller bits, and then assemble your understanding of the bits.

                    • vaylian 37 days ago

                      That's a fairly complicated example. Let me deconstruct it:

                      In Rust you work a lot with types. You can write code that supports several different types but you usually have to specify which constraints these types have to fulfill. You specify these constraints between <> brackets. In this example we have 3 such constraints:

                      1) <'s, S: Seed, A: Copy>

                      2 & 3) <dyn Fn(S) -> Step<S, A> + 's>

                      with 2) <S, A>

                      with 3) <dyn Fn(S) -> Step<S, A> + 's>

                      Because these constraints are all part of the same struct definiton, the names in these 3 constraints refer to the same things. The names are: s, S, A

                      Constraints in Rust can either be lifetimes or traits (similar to interfaces in Java). Lifetimes are marked with a single quote (') followed by the name of the lifetime (in this case: 's)

                      Let's look at constraint number 1):

                      I) We declare a lifetime 's but we do not specify where it will be applied in the struct yet.

                      II) We declare a type constraint S. Every type that is used as S must implement the Seed trait.

                      III) We declare a type constraint A. Every type that is used as A must implement the Copy trait.

                      Before I go into constraint 2&3, I must explain that `Fn(S) -> Step<S, A> + 's>` describes the type signature of a closure. The syntax is basically `Fn(parameter-list) -> return-value`.

                      Constraint 2) says that the return value must be a struct `Step` with the type parameters `S` and `A`.

                      Constraint 3) says that the next field of the struct must be a `Box` that contains an object of unknown size (hence the `dyn`) which is a closure that takes one `S` parameter and returns a `Step` struct with the type parameters `S` and `A`. The lifetime of the closure must be `'s`.

                      Finally, the seed must be of type `S`.

                      • willwinger 37 days ago

                        I have been using Rust for my side project https://github.com/elasmojs/rooster

                        Rust is just great! I feel liberated from Java and Node.

                        But what intrigues me is why do programmers insist on making code look and feel difficult? Is there a secret obsession in good programmers to make their code look clever? Is variable naming not a basic etiquette?

                        • mplanchard 36 days ago

                          How would you approach the naming of this example differently?

                          I have gone back and forth on whether to align with the general convention on single-letter generic type names, but I’ve tended to come back to the single-letter approach any time I try making them more specific. The issue is of course that they are generic, and so naming them too explicitly robs them of some of their “this is any type” genericness. For lifetimes, I will definitely sometimes name them more explicitly than `’a`, but usually only when there’s more than one, as a way of being sure I don’t mix them up. However, the association of lifetimes with references tends to be straightforward enough that it often seems clearer and less messy to just name them with single characters.

                          • willwinger 36 days ago

                            I understand but in this case we have the lifetime as s, seed as S and copy as A. And you can see how it creates that extra effort for someone who is seeing it for the first time. For me any code is associated with its application and that application should be evident in the code.

                            We talk a lot about user experience and usability in our end products but we do not extend that thought to our fellow programmers at code level.

                            • SAI_Peregrinus 34 days ago

                              I don't mind single-letter names, but I dislike having the same letter used with different case for two different names. In this example I'd probably have a lifetime 'l, a seed S, and a copy C. That way there's no "s" vs "capital s" confusion when saying what the names are.

                        • andrewzah 37 days ago
                          • gameswithgo 37 days ago

                            the good news is that is a worst case scenario and pretty rare

                          • lilyball 37 days ago

                            This looks like a fun exercise, but ultimately it seems to just be reinventing Iterators except each step yields a new copy of the iterator instead of just mutating the existing iterator. The other difference is there's an explicit Skip, which I don't understand the point of because the next function could just be written as a loop until it either finishes or yields an element.

                            So this basically just seems to be "what if Iterator couldn't use &mut".

                            • twic 36 days ago

                              I think the point of the exercise is to start with streams as they are defined in Haskell and then gradually improve them, as an experiment or demonstration. The post is aimed at Haskell programmers who might be interested in Rust.

                              If the goal was to build efficient and flexible streams in Rust, then indeed, you wouldn't start like this.

                            • zetalemur 37 days ago

                              It's interesting to see that many long-standing members of the Haskell community take a look into (or even heavily use) Rust and add it to their repertoire. This is really a good sign for Rust (Haskell folks normally have high expectations from a PL).

                              It makes sense, as (currently) it's pretty much impossible to tackle some important domains (e.g. image de/encoding) in Haskell while in Rust we already have libraries for that. Also: Idiomatic Rust is already extremely fast, while in Haskell it is non-trivial to implement high-performing code.

                              Prime example: The sieve of Eratosthenes (and I think we can agree that this is a prime example). In Rust you can implement a trivial solution that performs extremely well. In Haskell you will probably reach for the ST monad and I think it's controversial if that's idiomatic or not (and maybe non-trivial to grasp - as far as I know ST is implemented via compiler magic).

                              • tome 37 days ago

                                > it's pretty much impossible to tackle some important domains (e.g. image de/encoding) in Haskell

                                What do you mean? Image de/encoding is certainly possible in Haskell.

                                • zetalemur 37 days ago

                                  Yes, I may have worded that wrong, it's certainly possible. But is it reasonable? By that, I mean can it compete in performance with C/C++/Rust?

                                  • tome 37 days ago

                                    Ah, not sure! Maybe you're right. But then it doesn't seem separate from your second point: "Also: Idiomatic Rust is already extremely fast, while in Haskell it is non-trivial to implement high-performing code.".

                                    • zetalemur 37 days ago

                                      You are technically correct. The best kind of correct. :)

                              • lachlan-sneff 37 days ago

                                This is essentially the iterator trait, right?

                                • ed25519FUUU 37 days ago

                                  I’m glad to hear that the compiler nudged you in the right direction for optimal code. That nagging feeling that there’s a “better way” or more optimal solution is really debilitating to me when writing in an unfamiliar language.

                                  This is probably one of the reasons I’ve been drawn to Go, where that feeling Is basically non-existent.

                                  • jasonjayr 37 days ago

                                    What is it about go that makes it 'non-existent' ?

                                    • chc 37 days ago

                                      To be clear, I think it's mainly nonexistent if you haven't worked in more powerful languages than Go before. All those people complaining about lack of generics and stuff probably are experiencing that feeling. Go is simple enough that the way you write something is likely to be at least very much along the right track to the the best way you could write it in Go. But that won't necessarily stop you thinking of other ways that just aren't supported by the language.

                                      • hombre_fatal 37 days ago

                                        Because Go lacks powerful tools like sum types and generics. In a language like Rust or Haskell, you're always asking yourself if you can better express the problem in the type system, or which constructs you can use that, say, make runtime errors impossible.

                                        In Go, your tools are limited to copy-and-paste and `interface{}` type-casting. There are no better options to consider, so you try to untrain your brain from its reflex of "hmm this is error prone, I should find a better way."

                                        A good example of this is when you're parsing a binary file into a sequence of record structs. In a language like Typescript or Rust, you just make one ergonomic `Record = Record1(int, int) | Record2(string) | ...` and `fn parse([]byte) -> []Record`.

                                        In Go, the best you have is `fn parse([]byte) -> []interface{}` with type-casting.

                                        • damnyou 37 days ago

                                          That sounds like it would increase the load on your brain, not decrease it.

                                          • masklinn 37 days ago

                                            I guess they mean it doesn't give you that feeling of "there must be a better way" because chances are very high there is not, and so after some time looking for better ways and never finding them, since there's no feedback loop of "yeah I found a better way!" that feeling stops occurring.

                                        • loopz 37 days ago

                                          What Gophers mean by that is that golang nudges you to write the naive, straightforward and readable solution at first. This sidesteps premature optimization and false design choices too early on. In practice go stdlib and direct code tend to be faster than clever and more complex alternatives.

                                          • zozbot234 37 days ago

                                            Rust similarly nudges you to write the highest-performing-but-still-safe solution. You can opt into "convenience" features that come with some overhead, but that always involves some extra boilerplate so you'll know where a refactor may improve performance. And you can add unsafety, but it will always be clearly marked.

                                            I actually think Rust has a better story of what it "nudges" devs to write. Go strives for a very fuzzy feeling of surface simplicity, but in practice it's way less principled.

                                            • loopz 37 days ago

                                              Go is faster to write, read, can have faster defaults/libs and less need for choice. I'd consider rust where further optimization is warranted.

                                          • ed25519FUUU 37 days ago

                                            The feeling "there's probably a much better way of doing this" because the chances are there's really not. For better or for worse, there's simply not a lot of ways to accomplish the same task in Go.

                                            • whateveracct 37 days ago

                                              There typically is a better way to do pretty much everything you write in Go..but Go doesn't support it.

                                              It's just Worse is Better The language for better or worse.

                                            • Scarbutt 37 days ago

                                              Less features.

                                          • kiaulen 36 days ago

                                            I read the article, but had to use FF's reader mode. Why do people use #666 and a super light font weight for body text?!? My eyesight is fairly good, and it's still incredibly hard to read.

                                            • garmaine 37 days ago

                                              OP, why do you have gray text on a white background? It is literally unreadable for someone that doesn't have young eyes.

                                              Thankfully reader view in firefox cleaned it up, but otherwise it would be inaccessible.