Introduction

So, you're coming from C++ and want to write Rust? Great!

You have questions? We have answers.

This book is a collection of frequently asked questions for those arriving from existing C++ codebases. It guides you on how to adapt your C++ thinking to the new facilities available in Rust. It should also help if you're coming from another object-oriented language such as Java.

Although it's structured as questions and answers, it can also be read front-to-back, to give you hints about how to adapt your C++/Java thinking to a more idiomatically Rusty approach.

It does not aim to teach you Rust - there are many better resources. It doesn't aim to talk about Rust idioms in general - there are great existing guides for that. This guide is specifically about transitioning from some other traditionally OO language. If you're coming from such a language, you'll have questions about how to achieve the same outcomes in idiomatic Rust. That's what this guide is for.

Structure

The guide starts with idioms at the small scale - answering questions about how you'd write a few lines of code - and moves towards ever larger patterns - answering questions about how you'd structure your whole codebase.

Contributors

The following awesome people helped write the answers here, and they're sometimes quoted using the abbreviations given.

Thanks to Alyssa Haroldsen (@kupiakos) (AH), Augie Fackler (AF), David Tolnay (@davidtolnay) (DT), Łukasz Anforowicz (LA), Manish Goregaokar (@ManishEarth) (MG), Mike Forster (MF), Miguel Young de la Sota (@DrawsMiguel) (MY), and Tyler Mandry (@tmandry) (TM).

Their views have been edited and collated by Adrian Taylor (@adehohum), danakj@chromium.org and Martin Brænne. Any errors or misrepresentations are ours.

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

Questions about code in function bodies

How can I avoid the performance penalty of bounds checks?

Rust array, slice, and Vec accesses are all bounds checked. You may be worried about a performance penalty. How can you avoid it?

Contort yourself a little bit to use iterators. - MY

Rust gives you choices around functional versus imperative style, but things often work better in a functional style. Specifically - if you've got something iterable, then there are probably functional methods to do what you want.

For instance, suppose you need to work out what food to get at the petshop. Here's code that does this in an imperative style:


#![allow(unused)]
fn main() {
// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_a() -> HashSet<&'static str> {
    let mut meals_needed = HashSet::new();
    for n in 0..PETS.len() { // ugh
        if PETS[n].is_hungry {
            meals_needed.insert(PETS[n].meal_needed);
        }
    }
    meals_needed
}
}

The loop index is verbose and error-prone. Let's get rid of it and loop over an iterator instead:


#![allow(unused)]
fn main() {
use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_b() -> HashSet<&'static str>  {
    let mut meals_needed = HashSet::new();
    for animal in PETS.iter() { // better...
        if animal.is_hungry {
            meals_needed.insert(animal.meal_needed);
        }
    }
    meals_needed
}
}

We're now accessing the array through an iterator, but we're still processing the elements inside a loop. It's often more idiomatic to replace the loop with a chain of iterators:


#![allow(unused)]
fn main() {
use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_c() -> HashSet<&'static str> {
    PETS.iter()
        .filter(|animal| animal.is_hungry)
        .map(|animal| animal.meal_needed)
        .collect() // best...
}
}

The obvious advantage of the third approach is that it's more concise, but less obviously:

  • The first solution may require Rust to do array bounds checks inside each iteration of the loop, making it potentially slower than the equivalent C++. In a simple example like this, the optimizer would probably elide the checks anyway - but a functional pipeline never needs them in the first place.
  • The final container (a HashSet in this case) may be able to allocate roughly the right size at the outset, using the size_hint of a Rust iterator.
  • If you use iterator-style code rather than imperative code, it's more likely the Rust compiler will be able to auto-vectorize using SIMD instructions.
  • There is no mutable state within the function. This makes it easier to verify that the code is correct and to avoid introducing bugs when changing it. In this simple example it may be obvious that calling the HashSet::insert is the only mutation to the set, but in more complex scenarios it is quite easy to lose the overview.
  • And as a new arrival from C++, you may find this hard to believe: For an experienced Rustacean it'll be more readable.

Here are some more iterator techniques to help avoid materializing a collection:

  • You can chain two iterators together to make a longer one.

  • If you need to iterate two lists, zip them together to avoid bounds checks on either.
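
    For instance, here's a minimal sketch of zipping two parallel lists (the data is invented for illustration) so that no index - and hence no bounds check - is needed:

    ```rust
    fn main() {
        let kinds = ["Dog", "Cat", "Lion"];
        let meals = ["Kibble", "Kibble", "Steak"];
        // zip pairs up elements from both slices and stops at the
        // shorter one, so neither side is ever indexed out of range.
        let menu: Vec<String> = kinds
            .iter()
            .zip(meals.iter())
            .map(|(kind, meal)| format!("{kind} eats {meal}"))
            .collect();
        assert_eq!(menu[0], "Dog eats Kibble");
        assert_eq!(menu.len(), 3);
    }
    ```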

  • If you want to feed all your animals, and also feed a nearby duck, just chain the iterator to std::iter::once:

    
    #![allow(unused)]
    fn main() {
    use std::collections::HashSet;
    struct Animal {
        kind: &'static str,
        is_hungry: bool,
        meal_needed: &'static str,
    }
    static PETS: [Animal; 0] = [];
    static NEARBY_DUCK: Animal = Animal {
        kind: "Duck",
        is_hungry: true,
        meal_needed: "pondweed",
    };
    fn make_shopping_list_d() -> HashSet<&'static str> {
        PETS.iter()
            .chain(std::iter::once(&NEARBY_DUCK))
            .filter(|animal| animal.is_hungry)
            .map(|animal| animal.meal_needed)
            .collect()
    }
    }
    

    (Similarly, if you want to add one more item to the shopping list - maybe you're hungry, as well as your menagerie? - just chain another std::iter::once after the map.)

  • Option is iterable.

    
    #![allow(unused)]
    fn main() {
    use std::collections::HashSet;
    struct Animal {
        kind: &'static str,
        is_hungry: bool,
        meal_needed: &'static str,
    }
    static PETS: [Animal; 0] = [];
    struct Pond;
    static MY_POND: Pond = Pond;
    fn pond_inhabitant(pond: &Pond) -> Option<&Animal> {
        // ...
       None
    }
    
    fn make_shopping_list_e() -> HashSet<&'static str> {
        PETS.iter()
            .chain(pond_inhabitant(&MY_POND))
            .filter(|animal| animal.is_hungry)
            .map(|animal| animal.meal_needed)
            .collect()
    }
    }
    

    Here's a diagram showing how data flows in this iterator pipeline:

    flowchart LR
      %%{ init: { 'flowchart': { 'nodeSpacing': 40, 'rankSpacing': 15 } } }%%
        Pets
        Filter([filter by hunger])
        Map([map to noms])
        Meals
        uniqueify([uniqueify])
        shopping[Shopping list]
        Pets ---> Filter
        Pond
        Pond ---> inhabitant
        inhabitant[Optional pond inhabitant]
        inhabitant ---> Map
        Filter ---> Map
        Map ---> Meals
        Meals ---> uniqueify
        uniqueify ---> shopping
    
  • Skim the other adapter methods on the standard library's Iterator trait - methods such as flat_map, take_while, and enumerate will often come in useful too.

C++20 introduced ranges, a feature that lets you pipeline operations on a collection much as Rust iterators do, so this style of programming is likely to become more common in C++ too.

To summarize: While in C++ you tend to operate on collections by performing a series of operations on each individual item, in Rust you'll typically apply a pipeline of operations to the whole collection. Make this mental switch and your code will not just become more idiomatic but more efficient, too.

Isn't it confusing to use the same variable name twice?

In Rust, it's common to reuse the same name for multiple variables in a function. For a C++ programmer, this is weird, but there are two good reasons to do it:

  • You may no longer need to change a mutable variable after a certain point, and if your code is sufficiently complex you might want the compiler to guarantee this for you:

    
    #![allow(unused)]
    fn main() {
    fn spot_ate_my_slippers() -> bool {
        false
    }
    fn feed(_: &str) {}
    let mut good_boy = "Spot";
    if spot_ate_my_slippers() {
        good_boy = "Rover";
    }
    let good_boy = good_boy; // never going to change my dog again, who's a good boy
    feed(&good_boy);
    }
    
  • Another common pattern is to retain the same variable name as you gradually unwrap things to a simpler type:

    
    #![allow(unused)]
    fn main() {
    let url = "http://foo.com:1234";
    let port_number = url.split(':').nth(2).unwrap();
        // hmm, maybe somebody else already wrote a better URL parser....? naah, probably not
    let port_number = port_number.parse::<u16>().unwrap();
    }
    

How can I avoid the performance penalty of unwrap()?

C++ has no equivalent to Rust's match, so programmers coming from C++ often underuse it.

A heuristic: if you find yourself unwrap()ing, especially in an if/else statement, you should restructure your code to use a more sophisticated match.

For example, note the unwrap() in here (implying some runtime branch):


#![allow(unused)]
fn main() {
fn test_parse() -> Result<u64,std::num::ParseIntError> {
let s = "0x64a";
if s.starts_with("0x") {
    u64::from_str_radix(s.strip_prefix("0x").unwrap(), 16)
} else {
    s.parse::<u64>()
}
}
}

and no extra unwrap() here:


#![allow(unused)]
fn main() {
fn test_parse() -> Result<u64,std::num::ParseIntError> {
let s = "0x64a";
match s.strip_prefix("0x") {
    None => s.parse::<u64>(),
    Some(remainder) => u64::from_str_radix(remainder, 16),
}
}
}

if let and matches! are just as good as match but sometimes a little more concise. cargo clippy will usually tell you if you're using a match which can be simplified to one of those other two constructions.
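
A minimal sketch of those two shortcuts (the values are invented for illustration):

```rust
fn main() {
    let port: Option<u16> = Some(8080);

    // matches! collapses a match whose only job is to produce a bool.
    let is_common_web_port = matches!(port, Some(80) | Some(443) | Some(8080));
    assert!(is_common_web_port);

    // if let suffices when only one pattern needs any handling.
    if let Some(p) = port {
        assert_eq!(p, 8080);
    }
}
```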

How do I access variables from within a spawned thread?

Use std::thread::scope, available in the standard library since Rust 1.63; on older toolchains, crossbeam_utils::thread::scope provides the same facility. A scope guarantees that every spawned thread joins before the scope ends, so the threads may borrow local variables.
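
A minimal sketch using the standard library's scoped threads (the data and arithmetic are invented for illustration):

```rust
fn main() {
    let animals = vec!["Dog", "Cat", "Lion"];
    let mut total_len = 0;

    // Scoped threads may borrow local variables, because the scope
    // guarantees every thread is joined before the borrows end.
    std::thread::scope(|s| {
        s.spawn(|| {
            // An immutable borrow of `animals` from the enclosing frame.
            println!("feeding {} animals", animals.len());
        });
        s.spawn(|| {
            // A mutable borrow works too, as long as it's the only one.
            total_len = animals.iter().map(|a| a.len()).sum();
        });
    });

    assert_eq!(total_len, 10);
}
```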

Questions about your function signatures

Should I return an iterator or a collection?

Pretty much always return an iterator. - AH

We suggested you use iterators a lot in your code. Share the love! Give iterators to your callers too.

If you know your caller will store the items you're returning in a concrete collection, such as a Vec or a HashSet, you may want to return that. In all other cases, return an iterator.

Your caller might:

  • Collect the iterator into a Vec
  • Collect it into a HashSet or some other specialized container
  • Loop over the items
  • Filter them or otherwise completely ignore some

Collecting the items into a Vec turns out to be right in only the first of these cases. In all the others, you'd be wasting memory and CPU time by building a concrete collection.

This is weird for C++ programmers, because C++ iterators don't hold robust references into the underlying data: they're silently invalidated when the collection changes. Even Java iterators are scary, throwing ConcurrentModificationException when you least expect it. Rust prevents both problems at compile time. If you can return an iterator, you should.

flowchart LR
    subgraph Caller
    it_ref[reference to iterator]
    end
    subgraph it_outer[Iterator]
    it[Iterator]
    it_ref --reference--> it
    end
    subgraph data[Underlying data]
    dat[Underlying data]
    it --reference--> dat
    end
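
As a sketch (the Animal type and function names are illustrative), a function can return impl Iterator and leave the choice of collection to the caller:

```rust
struct Animal {
    kind: &'static str,
    is_hungry: bool,
}

// Returning `impl Iterator` means nothing is materialized here: the
// caller decides whether to collect, count, or just loop.
fn hungry_animals(pets: &[Animal]) -> impl Iterator<Item = &'static str> + '_ {
    pets.iter()
        .filter(|a| a.is_hungry)
        .map(|a| a.kind)
}

fn main() {
    let pets = [
        Animal { kind: "Dog", is_hungry: true },
        Animal { kind: "Python", is_hungry: false },
    ];
    // The caller chooses the collection (or none at all):
    let hungry: Vec<_> = hungry_animals(&pets).collect();
    assert_eq!(hungry, vec!["Dog"]);
}
```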

How flexible should my parameters be?

Which of these is best?


#![allow(unused)]
fn main() {
fn a(params: &[String]) {
    // ...
}

fn b(params: &[&str]) {
    // ...
}

fn c(params: &[impl AsRef<str>]) {
    // ...
}
}

(You'll need to make an equivalent decision in other cases, e.g. Path versus PathBuf versus AsRef<Path>.)

None of the options is clearly superior; for each option, there's a case it can't handle that the others can:

fn a(params: &[String]) {
}
fn b(params: &[&str]) {
}
fn c(params: &[impl AsRef<str>]) {
}
fn main() {
    a(&[]);
    // a(&["hi"]); // doesn't work
    a(&vec![format!("hello")]);

    b(&[]);
    b(&["hi"]);
    // b(&vec![format!("hello")]); // doesn't work

    // c(&[]); // doesn't work
    c(&["hi"]);
    c(&vec![format!("hello")]);
}

So you have a variety of interesting ways to slightly annoy your callers under different circumstances. Which is best?

AsRef has some advantages: if a caller has a Vec<String>, they can use that directly, which would be impossible with the other options. But if they want to pass an empty list, they'll have to explicitly specify the type (for instance &Vec::<String>::new()).

Not a huge fan of AsRef everywhere - it's just saving the caller typing. If you have lots of AsRef then nothing is object-safe. - MG

TL;DR: choose the middle option, &[&str]. If your caller happens to have a vector of String, it's relatively little work to get a slice of &str:

fn b(params: &[&str]) {
}

fn main() {
    // Instead of b(&vec![format!("hello")]);
    let hellos = vec![format!("hello")];
    b(&hellos.iter().map(String::as_str).collect::<Vec<_>>());
}

How do I overload constructors?

You can't do this:


#![allow(unused)]
fn main() {
struct BirthdayCard {}
impl BirthdayCard {
    fn new(name: &str) -> Self {
      Self{}
        // ...
    }

    // Can't add more overloads:
    //
    // fn new(name: &str, age: i32) -> BirthdayCard { ... }
    //
    // fn new(name: &str, text: &str) -> BirthdayCard { ... }
}
}

If you have a default constructor, and a few variants for other cases, you can simply write them as different static methods. An idiomatic way to do this is to write a new() constructor and then with_foo() constructors that apply the given "foo" when constructing.


#![allow(unused)]
fn main() {
struct Racoon {}
impl Racoon {
    fn new() -> Self {
      Self{}
        // ...
    }
    fn with_age(age: usize) -> Self {
      Self{}
        // ...
    }
}
}

If you have a bunch of constructors and no default, it may make sense to instead provide a set of new_foo() constructors.


#![allow(unused)]
fn main() {
struct Animal {}
impl Animal {
    fn new_squirrel() -> Self {
      Self{}
        // ...
    }
    fn new_badger() -> Self {
      Self{}
        // ...
    }
}
}

For a more complex situation, you may use the builder pattern. The builder has a set of methods which take &mut self and return &mut Self. Then add a build() that returns the final constructed object.


#![allow(unused)]
fn main() {
struct BirthdayCard {}

struct BirthdayCardBuilder {}
impl BirthdayCardBuilder {
    fn new(name: &str) -> Self {
      Self{}
        // ...
    }

    fn age(&mut self, age: i32) -> &mut Self {
        self
        // ...
    }

    fn text(&mut self, text: &str) -> &mut Self {
        self
        // ...
    }

    fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
}
}

You can then chain these into short or long constructions, passing parameters as necessary:

struct BirthdayCard {}

struct BirthdayCardBuilder {}
impl BirthdayCardBuilder {
    fn new(name: &str) -> BirthdayCardBuilder {
      Self{}
      // ...
    }

    fn age(&mut self, age: i32) -> &mut BirthdayCardBuilder {
        self
        // ...
     }

    fn text(&mut self, text: &str) -> &mut BirthdayCardBuilder {
        self
        // ...
     }

    fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
}

fn main() {
    let card = BirthdayCardBuilder::new("Paul")
        .age(64)
        .text("Happy Valentine's Day!")
        .build();
}

Note another advantage of builders: Overloaded constructors often don't provide all possible combinations of parameters, whereas with the builder pattern, you can combine exactly the parameters you want.

When must I use #[must_use]?

Use it on Results and mutex locks. - MG

#[must_use] makes the compiler warn if the caller ignores the return value (and fail outright under #![deny(unused_must_use)]).

Rust functions are often single-purpose. They either:

  • Return a value without any side effects; or
  • Do something (i.e. have side effects) and return nothing.

In neither case do you need to think about #[must_use]. (In the first case, nobody would call your function unless they were going to use the result.)

#[must_use] is useful for those rarer functions which return a result and have side effects. In most such cases, it's wise to specify #[must_use], unless the return value is truly optional (for example in HashMap::insert).
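
For instance, a hedged sketch (the function and its message are invented for illustration) of a function whose result must not be dropped on the floor:

```rust
// Imagine this also records the transaction somewhere (a side effect);
// the caller still must not ignore the resulting balance.
#[must_use = "the new balance should be checked for overdraft"]
fn withdraw(balance: u64, amount: u64) -> u64 {
    balance.saturating_sub(amount)
}

fn main() {
    let balance = withdraw(100, 30);
    assert_eq!(balance, 70);
    // withdraw(100, 30); // would warn: unused return value
}
```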

When should I take parameters by value?

Move semantics are more common in Rust than in C++.

In C++ moves tend to be an optimization, whereas in Rust they're a key semantic part of the program. - MY

To a first approximation, you should assume similar performance when passing things by (moved) value or by reference. It's true that a move may turn out to be a memcpy, but it's often optimized away.

Express the ownership relationship in the type system, instead of trying to second-guess the compiler for efficiency. - AF

The moves are, of course, destructive - and unlike in C++, the compiler enforces that you don't reuse a variable that has been moved. Some C++ objects become toxic after they've moved; that's not a risk in Rust.

So here's the heuristic: if a caller shouldn't be able to use an object again, pass it via move semantics in order to consume it.

An extreme example: a UUID is supposed to be globally unique - it might cause a logic error for a caller to retain knowledge of a UUID after passing it to a callee.

More generally, consume data enthusiastically to avoid logical errors during future refactorings. For instance, if some command-line options are overridden by a runtime choice, consume those old options - then any future refactoring which uses them after that point will give you a compile error. This pattern is surprisingly effective at spotting errors in your assumptions.
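
A minimal sketch of this pattern (the Options type and override logic are invented for illustration):

```rust
struct Options {
    verbose: bool,
}

// Taking `Options` by value consumes it: after the override, the
// stale original can no longer be used by mistake.
fn apply_runtime_override(opts: Options, force_verbose: bool) -> Options {
    Options {
        verbose: opts.verbose || force_verbose,
    }
}

fn main() {
    let opts = Options { verbose: false };
    let opts = apply_runtime_override(opts, true);
    // The original `opts` was moved; only the new one is reachable.
    assert!(opts.verbose);
}
```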

Should I ever take self by value?

Sometimes. If you've got a member function which destroys or transforms a thing, it should take self by value. Examples:

  • Closing a file and returning a result code.
  • A builder-pattern object which spits out the thing it was building. (Example).
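
A hedged sketch of the first case (the TempFile type is invented for illustration):

```rust
struct TempFile {
    path: String,
}

impl TempFile {
    // Taking self by value consumes the object: after close(),
    // the compiler rejects any further use of this TempFile.
    fn close(self) -> Result<(), String> {
        let _path = self.path; // real code would delete the file here
        Ok(())
    }
}

fn main() {
    let f = TempFile { path: "/tmp/scratch".to_string() };
    assert!(f.close().is_ok());
    // f.close(); // compile error: use of moved value `f`
}
```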

Questions about your types

My 'class' needs mutable references to other things to do its job. Other classes need mutable references to these things too. What do I do?

It's common in C++ to have a class that contains mutable references to other objects; the class mutates those objects to do its work. Often, several classes all hold a mutable reference to the same object. Here is a diagram that illustrates this:

flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    refa[Mutable Reference]-->important
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    refb[Mutable Reference]-->important
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main-. Calls .-> methodA
    main-. Calls .-> methodB

In Rust, you can't have multiple mutable references to a shared object, so what do you do?

First of all, consider moving behavior out of your types. (See the answer about the observer pattern and especially the second option described there.)

Even in Rust, though, it's still often the best choice to make complex behavior part of the type within impl blocks. You can still do that - but don't store references. Instead, pass them into each function call.

flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main --> important
    main-. Passes reference to shared object.-> methodA
    main-. Passes reference to shared object.-> methodB

Instead of this:

struct ImportantSharedObject;
struct ObjectA<'a> {
   important_shared_object: &'a mut ImportantSharedObject,
}
impl<'a> ObjectA<'a> {
   fn new(important_shared_object: &'a mut ImportantSharedObject) -> Self {
       Self {
           important_shared_object
       }
   }
   fn do_something(&mut self) {
       // act on self.important_shared_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new(&mut shared_thingy);
    a.do_something(); // acts on shared_thingy
}

Do this:

struct ImportantSharedObject;
struct ObjectA;
impl ObjectA {
   fn new() -> Self {
       Self
   }
   fn do_something(&mut self, important_shared_object: &mut ImportantSharedObject) {
       // act on important_shared_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new();
    a.do_something(&mut shared_thingy); // acts on shared_thingy
}

(Happily this also gets rid of named lifetime parameters.)

If you have a hundred such shared objects, you probably don't want a hundred function parameters. So it's usual to bundle them up into a context structure which can be passed into each function call:

struct ImportantSharedObject;
struct AnotherImportantObject;
struct Ctx<'a> {
    important_shared_object: &'a mut ImportantSharedObject,
    another_important_object: &'a mut AnotherImportantObject,
}

struct ObjectA;
impl ObjectA {
   fn new() -> Self {
       Self
   }
   fn do_something(&mut self, ctx: &mut Ctx) {
       // act on ctx.important_shared_object and ctx.another_important_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut another_thingy = AnotherImportantObject;
    let mut ctx = Ctx {
        important_shared_object: &mut shared_thingy,
        another_important_object: &mut another_thingy,
    };
    let mut a = ObjectA::new();
    a.do_something(&mut ctx); // acts on both the shared thingies
}
flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph Context
    refa[Mutable Reference]-->important
    end
    subgraph ObjectA
    objectA[Object A]
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    objectB[Object B]
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> objectA
    main --> objectB
    main --> Context
    main-. Passes context.-> methodA
    main-. Passes context.-> methodB

Even simpler: just put all the data directly into Ctx. But the key point is that this context object is passed around into just about all function calls rather than being stored anywhere, thus negating any borrowing/lifetime concerns.

This pattern can be seen in bindgen, for example.

Split out borrowing concerns from the object concerns. - MG

To generalize this idea, try to avoid storing references to anything that might need to be changed. Instead take those things as parameters. For instance petgraph takes the entire graph as context to a Walker object, such that the graph can be changed while you're walking it.

Questions about your whole codebase

The C++ observer pattern is hard in Rust. What to do?

The C++ observer pattern usually means that there are broadcasters sending messages to consumers:

flowchart TB
    broadcaster_a[Broadcaster A]
    broadcaster_b[Broadcaster B]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    consumer_c[Consumer C]
    broadcaster_a --> consumer_a
    broadcaster_b --> consumer_a
    broadcaster_a --> consumer_b
    broadcaster_b --> consumer_b
    broadcaster_a --> consumer_c
    broadcaster_b --> consumer_c

The broadcasters maintain lists of consumers, and the consumers act in response to messages (often mutating their own state).

This doesn't work in Rust, because it requires the broadcasters to hold mutable references to the consumers.

What do you do?

Option 1: make everything runtime-checked

Each of your consumers could become an Rc<RefCell<T>> or, if you need thread-safety, an Arc<RwLock<T>>.

The Rc or Arc allows broadcasters to share ownership of a consumer. The RefCell or RwLock allows each broadcaster to acquire a mutable reference to a consumer when it needs to send a message.
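
A minimal sketch of this option (the Broadcaster and Consumer types are invented for illustration):

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Consumer {
    messages: Vec<String>,
}

struct Broadcaster {
    // Shared ownership (Rc) plus runtime-checked mutability (RefCell).
    consumers: Vec<Rc<RefCell<Consumer>>>,
}

impl Broadcaster {
    fn broadcast(&self, msg: &str) {
        for consumer in &self.consumers {
            // borrow_mut() panics if the consumer is already borrowed,
            // moving Rust's aliasing rules from compile time to runtime.
            consumer.borrow_mut().messages.push(msg.to_string());
        }
    }
}

fn main() {
    let consumer = Rc::new(RefCell::new(Consumer { messages: Vec::new() }));
    let broadcaster = Broadcaster {
        consumers: vec![Rc::clone(&consumer)],
    };
    broadcaster.broadcast("dinner time");
    assert_eq!(consumer.borrow().messages, vec!["dinner time"]);
}
```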

This pattern shows how, in Rust, reference counting and interior mutability are independent choices. In this case we need both.

Just like typical reference counting in C++, Rc and Arc offer weak pointers, so the lifetime of each consumer doesn't need to be extended unnecessarily. As an aside, it would be nice if Rust had an Rc-like type which enforces exactly one owner and multiple weak pointers; Rc could be wrapped quite easily to do this.

Reference counting is frowned-upon in C++ because it's expensive. But, in Rust, not so much:

  • Few objects are reference counted; the majority of objects are owned statically.
  • Even when objects are reference counted, those counts are rarely incremented and decremented because you can (and do) pass around &Rc<RefCell<T>> most of the time. In C++, the "copy by default" mode means it's much more common to increment and decrement reference counts.

In fact, the compile-time guarantees might cause you to do less reference counting than C++:

In Servo there is a reference count but far fewer objects are reference counted than in the rest of Firefox, because you don’t need to be paranoid - MG

However: Rust does not prevent reference cycles, although they're only possible if you're using both reference counting and interior mutability.

Option 2: drive the objects from the code, not the other way round

In C++, it's common to have all behavior within classes. Those classes are the total behavior of the system, and so they must interact with one another. The observer pattern is common.

flowchart TB
    broadcaster_a[Broadcaster A]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    broadcaster_a -- observer --> consumer_a
    broadcaster_a -- observer --> consumer_b

In Rust, it's more common to have some external function which drives overall behavior.

flowchart TB
    main(Main)
    broadcaster_a[Broadcaster A]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    main --1--> broadcaster_a
    broadcaster_a --2--> main
    main --3--> consumer_a
    main --4--> consumer_b

With this sort of design, it's relatively straightforward to take some output from one object and pass it into another object, with no need for the objects to interact at all.

In the most extreme case, this becomes the Entity-Component-System architecture used in game design.

Game developers seem to have completely solved this problem - we can learn from them. - MY

Option 3: use channels

The observer pattern is a way to decouple large, single-threaded C++ codebases. But if you're trying to decouple a codebase in Rust, perhaps you should assume multi-threading by default? Rust has built-in channels, and the crossbeam crate provides multi-producer, multi-consumer channels.
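
A minimal sketch with the standard library's mpsc channel (the messages are invented for illustration):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // The broadcaster owns only a Sender; it never holds a
    // reference to any consumer.
    let broadcaster = thread::spawn(move || {
        for msg in ["woof", "meow"] {
            tx.send(msg.to_string()).unwrap();
        }
    });

    // The consumer drains the channel; iteration ends once the
    // sender has been dropped.
    let received: Vec<String> = rx.iter().collect();
    broadcaster.join().unwrap();
    assert_eq!(received, vec!["woof", "meow"]);
}
```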

I'm a Rustacean, we assume massively parallel unless told otherwise :) - MG

That's all very well, but I have an existing C++ object broadcasting events. How exactly should I observe it?

If your Rust object is a consumer of events from some pre-existing C++ producer, all the above options remain possible.

  • You can make your object reference counted and have C++ own such a reference (potentially a weak reference)
  • C++ can deliver the message into a general message bucket. An external function reads messages from that bucket and invokes the Rust object that should handle it. This means the reference counting doesn't need to extend to the Rust objects outside that boundary layer.
  • You can have a shim object which converts the C++ callback into some message and injects it into a channel-based world.

Some of my C++ objects have shared mutable state. How can I make them safe in Rust?

You're going to have to do something with interior mutability: either RefCell<T> or its multithreaded equivalent, RwLock<T>.

You have three decisions to make:

  1. Will only Rust code access this particular instance of this object, or might C++ access it too?
  2. If both C++ and Rust may access the object, how do you avoid conflicts?
  3. How should Rust code react if the object is not available, because something else is using it?

If only Rust code can use this particular instance of shared state, then simply wrap it in RefCell<T> (single-threaded) or RwLock<T> (multi-threaded). Build a wrapper type such that callers aren't able to access the object directly, but instead only via the lock type.
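
A hedged sketch of such a wrapper for the single-threaded case (the SharedState type is invented for illustration):

```rust
use std::cell::RefCell;

// Callers can't touch the state directly; every access goes
// through RefCell's runtime borrow checking.
struct SharedState {
    inner: RefCell<Vec<String>>,
}

impl SharedState {
    fn new() -> Self {
        Self { inner: RefCell::new(Vec::new()) }
    }
    fn record(&self, event: &str) {
        // Panics if a conflicting borrow is already outstanding.
        self.inner.borrow_mut().push(event.to_string());
    }
    fn count(&self) -> usize {
        self.inner.borrow().len()
    }
}

fn main() {
    let state = SharedState::new();
    state.record("goat fed");
    assert_eq!(state.count(), 1);
}
```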

If C++ also needs to access this particular instance of the shared state, it's more complex. There are presumably some invariants regarding use of this data in C++ - otherwise it would crash all the time. Perhaps the data can be used only from one thread, or perhaps it can only be used with a given mutex held. Your goal is to translate those invariants into an idiomatic Rust API that can be checked (ideally) at compile-time, and (failing that) at runtime.

For example, imagine:

class SharedMutableGoat {
public:
    void eat_grass(); // mutates tummy state
};

std::mutex lock;
SharedMutableGoat* billy; // only access when owning lock

Your idiomatic Rust wrapper might be:


#![allow(unused)]
fn main() {
mod ffi {
  #[allow(non_camel_case_types)]
  pub struct lock_guard;
  pub fn claim_lock() -> lock_guard { lock_guard{} }
  pub fn eat_grass() {}
  pub fn release_lock(lock: &mut lock_guard) {}
}
struct SharedMutableGoatLock {
    lock: ffi::lock_guard, // owns a std::lock_guard<std::mutex> somehow
}

// Claims the lock, returns a new SharedMutableGoatLock
fn lock_shared_mutable_goat() -> SharedMutableGoatLock {
    SharedMutableGoatLock { lock: ffi::claim_lock() }
}

impl SharedMutableGoatLock {
    fn eat_grass(&mut self) {
        ffi::eat_grass(); // Acts on the global goat
    }
}

impl Drop for SharedMutableGoatLock {
    fn drop(&mut self) {
        ffi::release_lock(&mut self.lock);
    }
}
}

Obviously, lots of permutations are possible, but the goal is to ensure that it's simply compile-time impossible to act on the global state unless appropriate preconditions are met.

The final decision is how to react if the object is not available. This decision can apply with C++ mutexes or with Rust locks (for example RwLock<T>). As in C++, the two major options are:

  • Block until the object becomes available.
  • Try to lock, and if the object is not available, do something else.

There can be a third option if you're using async Rust. If the data isn't available, you may be able to return to your event loop using an async version of the lock (Tokio example, async_std example).

How do I do a singleton?

Use OnceCell. It was originally provided by the once_cell crate; equivalents have since been stabilized in the standard library as std::cell::OnceCell and std::sync::OnceLock.

What's the best way to retrofit Rust's parallelism benefits to an existing codebase?

When parallelizing an existing codebase, first check that all existing types are correctly Send and Sync. Generally, though, you should try to avoid implementing these yourself - instead use pre-existing wrapper types which enforce the correct contract (for example, RwLock).

After that:

If you can solve your problem by throwing Rayon at it, do. It’s magic - MG

If your task is CPU-bound, Rayon solves this handily. - MY

Rayon offers parallel constructs - for example parallel iterators - which can readily be retrofitted to an existing codebase. It also allows you to create and join tasks. Using Rayon can help simplify your code and eliminate lots of manual scheduling logic.

If your tasks are IO-bound, then you may need to look into async Rust, but that's hard to pull into an existing codebase.

What's the best way to architect a new codebase for parallelism?

In brief, like in other languages, you have a choice of architectures:

  • Message-passing, using event loops which listen on a channel, receive Send data and pass it on.
  • More traditional multithreading using Sync data structures such as mutexes (and perhaps Rayon).

There's probably a bias towards message-passing, and that's probably well-informed by its extensibility. - MG

I need a list of nodes which can refer to one another. How?

You can't easily do self-referential data structures in Rust. The usual workaround is to use an arena and replace references from one node to another with node IDs.

An arena is typically a Vec (or similar), and the node IDs are a newtype wrapper around a simple integer index.

Obviously, Rust doesn't check that your node IDs are valid. If you don't have proper references, what stops you from having stale IDs?

Arenas are often purely additive, which means that you can add entries but not delete them (example). If you must have an arena which deletes things, then use generational IDs; see the generational-arena crate and this RustConf keynote for more details.

If arenas still sound like a nasty workaround, consider that you might choose an arena anyway for other reasons:

  • All of the objects in the arena will be freed at the end of the arena's lifetime, instead of during their manipulation, which can give very low latency for some use-cases. Bumpalo formalizes this.
  • The rest of your program might have real Rust references into the arena. You can give the arena a named lifetime ('arena for example), making the provenance of those references very clear.

I'm having a miserable time making my data structure. Should I use unsafe?

Low-level data structures are hard in Rust. Arguably, Rust merely makes plain all the lifetime and ownership issues which you already had in other languages, but the compiler is brutal about it, and you're going to have a bad day.

Even something as simple as a doubly-linked list is notoriously hard; so much so that there is a book that teaches Rust based solely on linked lists. As that (wonderful) book makes clear, you are often faced with a choice: contort your design until the borrow checker is satisfied, or reach for unsafe.

If you're facing this decision... perhaps there's a third way.

You should almost always be using somebody else's tried-and-tested data structure.

petgraph and slotmap are great examples. Use someone else's crate by default, and resort to writing your own only if you exhaust that option.

C++ makes it hard to pull in third-party dependencies, so it's culturally normal to write new code. Rust makes it trivial to add dependencies, and so you will need to do that, even if it feels awkward for a C++ programmer.

This ease of adding dependencies co-evolved with the difficulty of making data structures. It's simply a part of programming in Rust. You just can't separate the language and the ecosystem.

You might argue that this dependency on third-party crates is concerning from a supply-chain security point of view. Your author would agree, but it's just the way you do things in Rust. Stop creating your own data structures.

Should I have a few big crates or lots of small ones?

In the past, it was recommended to have small crates to get optimal build time. Incremental builds generally make this unnecessary now. You should arrange your crates optimally for your semantic needs.

What crates should everyone know about?

| Crate | Description |
|-------|-------------|
| rayon | parallelizing |
| serde | serializing and deserializing |
| crossbeam | all sorts of parallelism tools |
| itertools | makes it slightly more pleasant to work with iterators. (For instance, if you want to join an iterator of strings, you can just go ahead and do that, without needing to collect the strings into a Vec first) |
| petgraph | graph data structures |
| slotmap | arena-like key-value map |
| nom | parsing |
| clap | command-line parsing |
| regex | err, regular expressions |
| ring | the leading crypto library |
| nalgebra | linear algebra |
| once_cell | complex static data |

How should I call C++ functions from Rust and vice versa?

Use cxx.

Oh, you want a justification? In that case, here's the history which brought us to this point.

From the beginning, Rust supported calling C functions using extern "C", #[repr(C)] and #[no_mangle]. Such callable C functions had to be declared manually in Rust:

sequenceDiagram
   Rust-->>extern: unsafe Rust function call
   extern-->>C: call from Rust to C
   participant extern as Rust unsafe extern "C" fn
   participant C as Existing C function

bindgen was invented to generate these declarations automatically from existing C/C++ header files. It has grown to understand an astonishingly wide variety of C++ constructs, but its generated bindings are still unsafe functions with lots of pointers involved.

sequenceDiagram
   Rust-->>extern: unsafe Rust function call
   extern-->>C: call from Rust to C++
   participant extern as Bindgen generated bindings
   participant C as Existing C++ function

Interacting with bindgen-generated bindings requires unsafe Rust; you will likely have to manually craft idiomatic safe Rust wrappers. This is time-consuming and error-prone.

cxx automates a lot of that process. Unlike bindgen it doesn't learn about functions from existing C++ headers. Instead, you specify cross-language interfaces in a Rust-like interface definition language (IDL) within your Rust file. cxx generates both C++ and Rust code from that IDL, marshaling data behind the scenes on both sides such that you can use standard language features in your code. For example, you'll find idiomatic Rust wrappers for std::string and std::unique_ptr and idiomatic C++ wrappers for a Rust slice.

sequenceDiagram
   Rust-->>rsbindings: safe idiomatic Rust function call
   rsbindings-->>cxxbindings: hidden C ABI call using marshaled data
   cxxbindings-->>cpp: call to standard idiomatic C++
   participant rsbindings as cxx-generated Rust code
   participant cxxbindings as cxx-generated C++ code
   participant cpp as C++ function using STL types
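For a flavor of the IDL, here is a sketch of a #[cxx::bridge] module. The header path and function names are hypothetical, and this fragment requires the cxx crate plus a matching C++ build to actually compile:

```rust
// Hypothetical cxx bridge: the interface definition lives inside a
// Rust module annotated with #[cxx::bridge]. cxx generates glue code
// for both languages from it.
#[cxx::bridge]
mod ffi {
    // Rust functions exposed to C++.
    extern "Rust" {
        fn rust_count_goats(names: &[&str]) -> usize;
    }

    // C++ functions made callable from Rust. Marking the block
    // `unsafe` asserts that these functions are safe to call.
    unsafe extern "C++" {
        include!("example/include/goats.h"); // hypothetical header
        fn feed_goat(name: &CxxString); // CxxString wraps std::string
    }
}

fn rust_count_goats(names: &[&str]) -> usize {
    names.len()
}
```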

In the bindgen case even more work goes into wrapping idiomatic C++ signatures into something bindgen compatible: unique ptrs to raw ptrs, Drop impls on the Rust side, translating string types ... etc. The typical real-world binding we've converted from bindgen to cxx in my codebase has been -500 lines (mostly unsafe code) +300 lines (mostly safe code; IDL included). - DT

The greatest benefit is that cxx sufficiently understands C++ STL object ownership norms that the generated bindings can be used from safe Rust code.

At present, there is no established solution which combines the idiomatic, safe interoperability offered by cxx with the automatic generation offered by bindgen. It's not clear whether this is even possible but several projects are aiming in this direction.