I work on the protobuf team at Google, and I'm a big fan of Rust, though I haven't written much actual Rust except a bunch of Project Euler solutions.
For protobuf in C++, we've been moving more and more in the direction of using arenas for memory allocation. When you parse a protobuf, it creates a tree of objects that are usually all deleted at the same time. Freeing an arena is much, much cheaper than traversing the tree of objects and calling free() on each one.
My dream has been that Rust protobuf could support arenas as well as C++, but use Rust's type system to make it all provably correct at compile time (in C++ the lifetime management is inherently manual and unsafe). For absolute top performance, arenas will always beat trees of unique pointers (which I think corresponds to Rust's Box<> type).
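To make the arena idea concrete, here is a minimal sketch in safe Rust using only the standard library. It is index-based (handles are positions in a `Vec`), which is an assumption of this example and not how C++ protobuf arenas or crates like bumpalo work. `Arena`, `Msg`, and `MsgId` are made-up names. What it shows is that the whole tree lives in one allocation, so dropping the arena releases every message at once without walking the tree.

```rust
/// Hypothetical handle into the arena: an index, not a pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MsgId(usize);

/// Hypothetical message node. It owns no heap memory itself, so it has
/// no destructor work to do when the arena goes away.
struct Msg {
    value: u32,
    first_child: Option<MsgId>,
    next_sibling: Option<MsgId>,
}

/// All messages live in one growable buffer.
#[derive(Default)]
struct Arena {
    msgs: Vec<Msg>,
}

impl Arena {
    /// Append a new message and return its handle.
    fn alloc(&mut self, value: u32) -> MsgId {
        let id = MsgId(self.msgs.len());
        self.msgs.push(Msg { value, first_child: None, next_sibling: None });
        id
    }

    /// Link `child` at the front of `parent`'s child list.
    fn add_child(&mut self, parent: MsgId, child: MsgId) {
        let old_first = self.msgs[parent.0].first_child;
        self.msgs[child.0].next_sibling = old_first;
        self.msgs[parent.0].first_child = Some(child);
    }

    /// Sum the values of `root` and everything below it.
    fn sum(&self, root: MsgId) -> u32 {
        let msg = &self.msgs[root.0];
        let mut total = msg.value;
        let mut next = msg.first_child;
        while let Some(id) = next {
            total += self.sum(id);
            next = self.msgs[id.0].next_sibling;
        }
        total
    }
}

/// Build a small tree, read it, and let the arena drop at the end of scope.
fn sample_tree_sum() -> u32 {
    let mut arena = Arena::default();
    let root = arena.alloc(1);
    let a = arena.alloc(2);
    let b = arena.alloc(3);
    let c = arena.alloc(4);
    arena.add_child(root, a);
    arena.add_child(root, b);
    arena.add_child(a, c);
    arena.sum(root)
    // `arena` is dropped here: one buffer is released, no per-node free.
}
```

Calling `sample_tree_sum()` builds four messages and frees them together. The trade-off of indices is that a handle is only meaningful with the arena it came from, which the type system does not check in this sketch.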
I don't know Rust's type/lifetime system well enough to know if this is possible. I was looking recently at arenas in Rust and I noticed that Rust's version of placement new seems to be stalled:
"Unfortunately the path forward for placement new in Rust does not look good right now, so I've reverted this crate to work more like a memory heap where stuff can be put, but not constructed in place."
Hey @haberman! One of the authors here.
We actually do have an arena-esque implementation built on top of pb-jelly internally, as it was needed for Magic Pocket.
It's built on top of the Blob traits exposed by pb-jelly. It's not yet open source, but it would be a good candidate to release next! To your point, it also definitely has unsafe code. We open-sourced the safe implementations that use more standard types (Bytes/Buffer/Vec) first.
There's a decent amount of cleanup needed before we can open-source that as well, as much of it was built years ago, when the Rust ecosystem was less mature (e.g. Bytes/Buffer weren't around yet).
Placement new is just an optimization to avoid the initial memcpy from the stack in cases in which LLVM can't work it out itself. I don't believe that placement new ever enables semantics that aren't possible with plain old move semantics.
Ah, in that case it sounds like heterogenous arenas are more or less a solved problem in Rust, even if they aren't necessarily 100% optimal.
Probably the more difficult piece then is just how to model arena ownership of a tree of objects that all have links between them. We want to guarantee that links to sub-messages remain valid, which we would expect to be true if they are all in the same arena. But I believe Rust allows moving/swapping objects in and out of the arena?
> heterogenous arenas are more or less a solved problem in Rust
To expand a bit on what pcwalton said: Rust never had a concept of a type you can't move (we likely can't ever introduce it) and placement was never about "in-place construction" (Rust doesn't even have a concept of "construction", but rather focuses on initialization).
There is `Pin<&mut T>`, which prevents e.g. replacing the value behind it with another `T`, but there the restriction is in the pointer itself (you can think of it as `&pin mut T`), not in the pointee. It's not that relevant here: it mainly exists to allow library code to correctly interface with the internally self-referential generators (that `async fn`s are built on).
In the scenario you describe you would use shared references (with appropriate lifetime parameterization on relevant data types), not mutable ones (which imply exclusivity), nor raw pointers (which are the only way to get the C++-like problems).
That will limit you to immutable data by default, so if you want to mutate some leaf fields, you wrap them in `Cell` (which prevents invalidation because `&Cell<T>` doesn't allow creating references inside the `T` value) or some kind of atomic/lock if you want concurrent access.
This already solves all the problems you were thinking of, and it doesn't even fundamentally require an arena (what you need an arena for is the dynamic graph structure, assuming you want to use references/pointers and not integer indices).
In essence, the more you rely on safe Rust features, the harder this is to get wrong or end up with a misusable API.
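The shared-reference approach described above can be sketched roughly like this. `Node`, `bump`, and `total_hits` are hypothetical names, and the nodes here are plain locals, not arena allocations, which matches the remark that an arena isn't fundamentally required. Links are `&'a Node<'a>`, and the one mutable leaf field is a `Cell`.

```rust
use std::cell::Cell;

/// Hypothetical message node: links are shared references, and the only
/// mutable field is wrapped in `Cell`.
struct Node<'a> {
    hits: Cell<u32>,
    children: &'a [&'a Node<'a>],
}

/// Mutate a leaf field through a shared reference.
fn bump(node: &Node<'_>) {
    node.hits.set(node.hits.get() + 1);
}

/// Walk the links; they stay valid for as long as `'a` lasts.
fn total_hits(node: &Node<'_>) -> u32 {
    node.hits.get() + node.children.iter().map(|c| total_hits(c)).sum::<u32>()
}

fn build_and_count() -> u32 {
    let leaf_a = Node { hits: Cell::new(0), children: &[] };
    let leaf_b = Node { hits: Cell::new(0), children: &[] };
    let kids = [&leaf_a, &leaf_b];
    let root = Node { hits: Cell::new(0), children: &kids };

    bump(&leaf_a);
    bump(&leaf_a);
    bump(&root);
    total_hits(&root)
}
```

While `root` holds shared references to the leaves, safe code cannot obtain a `&mut` to them, so something like `std::mem::swap(&mut leaf_a, &mut leaf_b)` is rejected by the borrow checker if `root` is used afterwards. That is what keeps the links from being invalidated.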
Oh and the Rust compiler itself has used arenas since before 1.0 (IIRC), and with every year we move more data into them, though most are 100% immutable (which helps with compiler correctness, especially incremental recompilation), or even interned (deep deduplication by caching allocation attempts).
We can even get away without dynamically tracking destructors by making most arena-allocated data destructor-less (we mostly just had to replace `Vec<T>` with arena-allocated `&[T]` to make that work).
Rust objects can always be moved to arbitrary places in memory, so I'm not sure how "in place" construction of a Rust object can even make sense unless it explicitly involves the Pin<P> feature to prevent moves. This is quite different from C/C++ where there's no default expectation that a constructed object can be bitwise-copied somewhere else in memory. Even Box<T> is really a special case since the object is meant to be accessed via an owning pointer or a (shared or mutable) reference; when accessing the underlying T by value, that always involves a move of the object so again there's no such thing as being "in place".
Without placement new, allocating an object involves:
a) Constructing the object on the stack
b) Copying it to the heap
With placement new you construct the object directly on the heap, thus avoiding a copy.
It's purely an optimization to avoid a single copy. It's a bit surprising that Rust doesn't have it, given the focus on control over such things, and that it had an experimental 'box' keyword for this at 1.0.
edit: To be clear, the extra copy is not guaranteed - compilers can elide it. In order to help the compiler do this you can look at crates like boxext, which provide extension methods on Box that help the compiler to remove the copy. But placement new guarantees this.
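As a rough illustration of the two paths, here is a sketch using only the standard library (not boxext). `boxed_via_stack` and `boxed_via_vec` are made-up names, and the buffer size is arbitrary. The first function is the "construct, then move into the heap" form, where the copy may or may not be elided by the optimizer. The second builds the buffer through `Vec`, which allocates and fills its storage on the heap, and then converts the result to a boxed array.

```rust
use std::convert::TryFrom;

/// Arbitrary buffer size for the example.
const N: usize = 64 * 1024;

/// Semantically: build the array as a value, then move it into a new heap
/// allocation. Whether the intermediate copy is elided is up to the optimizer.
fn boxed_via_stack() -> Box<[u8; N]> {
    Box::new([0u8; N])
}

/// `vec!` allocates and fills its buffer on the heap, so the array is never
/// built as a by-value temporary first. The conversion to `Box<[u8; N]>`
/// only checks the length.
fn boxed_via_vec() -> Box<[u8; N]> {
    let slice: Box<[u8]> = vec![0u8; N].into_boxed_slice();
    Box::<[u8; N]>::try_from(slice).expect("length is exactly N")
}
```

Both functions return the same thing. The difference is only in how the bytes get to the heap, which is why this is an optimization question and not a semantic one.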
Check out bumpalo. Dodrio is a real-world usage of bumpalo, so you can inspect that code to see how he deals with lifetimes correctly.
Edit: GATs are how "placement new" (Rust doesn't have new at all, hence the air quotes) would work in Rust, assuming the author here meant to say "custom allocators." With GATs, you could create a pointer (Box, Arc, ArenaBox etc.) trait and use that on your message types. "Placement new" is a whole different issue that ultimately boils down to a compiler optimization that is currently missing/not working and shouldn't functionally affect what you are trying to do at all.
Another former contributor to pb-jelly, though no longer at Dropbox.
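A pointer trait along those lines might look something like the sketch below. It assumes a compiler with stable generic associated types, and every name in it (`PointerFamily`, `BoxFamily`, `RcFamily`, `Outer`, `Inner`) is hypothetical. It covers `Box` and `Rc` only; an arena-backed pointer would also need a lifetime parameter, which this sketch leaves out.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical abstraction over "which owning pointer do I use".
trait PointerFamily {
    /// The generic associated type: one pointer type per pointee `T`.
    type Pointer<T>: Deref<Target = T>;
    fn new<T>(value: T) -> Self::Pointer<T>;
}

struct BoxFamily;
impl PointerFamily for BoxFamily {
    type Pointer<T> = Box<T>;
    fn new<T>(value: T) -> Box<T> {
        Box::new(value)
    }
}

struct RcFamily;
impl PointerFamily for RcFamily {
    type Pointer<T> = Rc<T>;
    fn new<T>(value: T) -> Rc<T> {
        Rc::new(value)
    }
}

/// Hypothetical sub-message.
struct Inner {
    id: u32,
}

/// Hypothetical message type, generic over how its sub-message is held.
struct Outer<P: PointerFamily> {
    inner: P::Pointer<Inner>,
}

fn make_outer<P: PointerFamily>(id: u32) -> Outer<P> {
    Outer { inner: P::new(Inner { id }) }
}

fn inner_id<P: PointerFamily>(outer: &Outer<P>) -> u32 {
    // Only the `Deref` bound is needed to read through the pointer.
    outer.inner.deref().id
}
```

With this shape, `make_outer::<BoxFamily>(7)` and `make_outer::<RcFamily>(7)` produce the same message type with different ownership, and the message code itself doesn't change.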
protoc plugins have an interesting bootstrapping problem as well: the protoc-gen-$LANG interface requires the ability to ser/de protobuf messages that describe the proto file's AST. If your build system builds almost everything from scratch, including the protoc plugin, this means that you need to have a variant of your protoc plugin linked to a working proto implementation...
That's not to say this is impossible or even difficult, but at the time that I last looked at it (more than a year ago at this point), it made it fairly unpalatable to move the codegen from Python to Rust.
It's 'only' really 4 (two of them are gRPC implementations).
But yeah - this is one of those things that makes me stay with Go instead of moving over to Rust for my backend SOA/microservice work. In Rust, for everything you need to do, there are at least 5 different libraries that implement it, all competing with each other. This is especially annoying when dealing with transitive dependencies. Meanwhile in Go, you generally get one choice - it might be not great, but that's fine, it doesn't have to be.
EDIT: This is not intended to be mindless bashing of Rust. I do use Rust for other things. It's a fine language.
It's the only real alternative implementation (and more precisely, a fork of upstream protobuf), _and_ there is strong cooperation between both projects to maintain a level of interoperability.
And, it's not even protobuf I have a problem with - but things like HTTP implementations. There still isn't a canonical HTTP client/server implementation for Rust, while in Go basically everyone just uses `net/http`, or something that builds on top of that. Same for cryptographic primitives, TLS, context, ...
I imagine this stems from the fact that Go needed to be a complete production-ready backend language at launch, whereas Rust has other origins. At Google any "hello, world" program is a web server, at Mozilla not so much.
Exactly this. Imagine you work at BigCorp and use protobuf to pass data around - now you have a unified data model you can share and everyone can use the same client to access it without going through the trouble of maintaining all those getters and setters. Rolling your own getters and setters is fine in a small project but you really see the advantages of the code gen approach once you are dealing with multiple different teams in an org working with the same complex data model.
There are definitely some downsides to the approach though, mostly typical problems you would expect with machine generated code. Namely that it’s verbose and if you have a super complex protobuf data model (hundreds or thousands of fields) and want to ship a fatjar or similar bundling of dependencies you can run into some size issues.
Protobuf does not provide any type safety whatsoever. The name of the type of the message is carried in a side-channel, and the interpretation of that name is completely up to the endpoint that deserializes the message.
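Here's a small sketch of what that looks like at the wire level. It hand-rolls the varint encoding for a single field (field number 1, wire type 0) and doesn't use any real protobuf library. `UserId` and `Celsius` are made-up message types. The encoded bytes contain a field number and a value, and nothing that names the message type, so the same bytes decode as either one.

```rust
/// Append `v` as a base-128 varint (7 bits per byte, high bit = "more").
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push(((v as u8) & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Decode a varint, returning the value and how many bytes it used.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if (b & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

const FIELD_NUMBER: u64 = 1;
const WIRE_TYPE_VARINT: u64 = 0;

/// Encode a message consisting of a single varint in field 1.
fn encode_field1(value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    encode_varint((FIELD_NUMBER << 3) | WIRE_TYPE_VARINT, &mut out);
    encode_varint(value, &mut out);
    out
}

/// Decode a message consisting of a single varint in field 1.
fn decode_field1(bytes: &[u8]) -> Option<u64> {
    let (tag, used) = decode_varint(bytes)?;
    if tag != ((FIELD_NUMBER << 3) | WIRE_TYPE_VARINT) {
        return None;
    }
    let (value, _) = decode_varint(&bytes[used..])?;
    Some(value)
}

/// Two unrelated, hypothetical message types with the same field layout.
struct UserId {
    id: u64,
}

struct Celsius {
    degrees: u64,
}

impl UserId {
    fn from_wire(bytes: &[u8]) -> Option<Self> {
        decode_field1(bytes).map(|id| UserId { id })
    }
}

impl Celsius {
    fn from_wire(bytes: &[u8]) -> Option<Self> {
        decode_field1(bytes).map(|degrees| Celsius { degrees })
    }
}
```

Encoding the value 150 gives the bytes `08 96 01`, and both `UserId::from_wire` and `Celsius::from_wire` accept them. Which type the receiver picks is decided outside the payload.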
I know it's common and perhaps even fashionable, but FWIW language like "We take an opinionated stance" utterly puts me off caring about this package.
It's a piece of software, it has a design that is either fit for purpose or not. When ego becomes entangled in that design process, it's a strong indicator of the kind of experience one might have trying to get fixes or enhancements merged, or even the kind of attitude you'd find when attempting to report a bug.
That's not what the word 'opinionated' means here. It's not any one person's opinion; it's that the project overall takes a stance on an issue rather than leaving everything open for everyone else to figure out. It provides clarity and direction compared to the more difficult situation where every library is completely general. No ego involved at all.