Introduction

So, you're coming from C++ and want to write Rust? Great!

You have questions? We have answers.

This book is a collection of frequently asked questions for those arriving from existing C++ codebases. It guides you on how to adapt your C++ thinking to the new facilities available in Rust. It should help you if you're coming from other object-oriented languages such as Java too.

Although it's structured as questions and answers, it can also be read front-to-back, to give you hints about how to adapt your C++/Java thinking to a more idiomatically Rusty approach.

It does not aim to teach you Rust - there are many better resources. It doesn't aim to talk about Rust idioms in general - there are great existing guides for that. This guide is specifically about transitioning from some other traditionally OO language. If you're coming from such a language, you'll have questions about how to achieve the same outcomes in idiomatic Rust. That's what this guide is for.

Structure

The guide starts with idioms at the small scale - answering questions about how you'd write a few lines of code - and moves towards ever larger patterns - answering questions about how you'd structure your whole codebase.

Contributors

The following awesome people helped write the answers here, and they're sometimes quoted using the abbreviations given.

Thanks to Adam Perry (@__anp__) (AP), Alyssa Haroldsen (@kupiakos) (AH), Augie Fackler (@durin42) (AF), David Tolnay (@davidtolnay) (DT), Łukasz Anforowicz (LA), Manish Goregaokar (@ManishEarth) (MG), Mike Forster (MF), Miguel Young de la Sota (@DrawsMiguel) (MY), and Tyler Mandry (@tmandry) (TM).

Their views have been edited and collated by Adrian Taylor (@adehohum), Chris Palmer, danakj@chromium.org and Martin Brænne. Any errors or misrepresentations are ours.

Licensed under either the Apache License, Version 2.0, or the MIT license, at your option.

Questions about code in function bodies

How can I avoid the performance penalty of bounds checks?

Rust array and list accesses are all bounds checked. You may be worried about a performance penalty. How can you avoid that?

Contort yourself a little bit to use iterators. - MY

Rust gives you choices around functional versus imperative style, but things often work better in a functional style. Specifically - if you've got something iterable, then there are probably functional methods to do what you want.

For instance, suppose you need to work out what food to get at the petshop. Here's code that does this in an imperative style:


#![allow(unused)]
fn main() {
// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//    https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_a() -> HashSet<&'static str> {
    let mut meals_needed = HashSet::new();
    for n in 0..PETS.len() { // ugh
        if PETS[n].is_hungry {
            meals_needed.insert(PETS[n].meal_needed);
        }
    }
    meals_needed
}
}

The loop index is verbose and error-prone. Let's get rid of it and loop over an iterator instead:


#![allow(unused)]
fn main() {

use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_b() -> HashSet<&'static str>  {
    let mut meals_needed = HashSet::new();
    for animal in PETS.iter() { // better...
        if animal.is_hungry {
            meals_needed.insert(animal.meal_needed);
        }
    }
    meals_needed
}
}

We're accessing the elements through an iterator, but we're still processing them inside an explicit loop. It's often more idiomatic to replace the loop with a chain of iterators:


#![allow(unused)]
fn main() {

use std::collections::HashSet;
struct Animal {
    kind: &'static str,
    is_hungry: bool,
    meal_needed: &'static str,
}

static PETS: [Animal; 4] = [
    Animal {
        kind: "Dog",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Python",
        is_hungry: false,
        meal_needed: "Cat",
    },
    Animal {
        kind: "Cat",
        is_hungry: true,
        meal_needed: "Kibble",
    },
    Animal {
        kind: "Lion",
        is_hungry: false,
        meal_needed: "Kibble",
    },
];

static NEARBY_DUCK: Animal = Animal {
    kind: "Duck",
    is_hungry: true,
    meal_needed: "pondweed",
};
fn make_shopping_list_c() -> HashSet<&'static str> {
    PETS.iter()
        .filter(|animal| animal.is_hungry)
        .map(|animal| animal.meal_needed)
        .collect() // best...
}
}

The obvious advantage of the third approach is that it's more concise, but less obviously:

  • The first solution may require Rust to perform an array bounds check on each iteration of the loop, making it potentially slower than the equivalent C++. In a simple example like this, the optimizer would likely eliminate the checks, but a functional pipeline avoids the need for bounds checks entirely.
  • The final container (a HashSet in this case) may be able to allocate roughly the right size at the outset, using the size_hint of a Rust iterator.
  • If you use iterator-style code rather than imperative code, it's more likely the Rust compiler will be able to auto-vectorize using SIMD instructions.
  • There is no mutable state within the function. This makes it easier to verify that the code is correct and to avoid introducing bugs when changing it. In this simple example it may be obvious that the call to HashSet::insert is the only mutation of the set, but in more complex scenarios it is easy to lose track.
  • And as a new arrival from C++, you may find this hard to believe: For an experienced Rustacean it'll be more readable.

Here are some more iterator techniques to help avoid materializing a collection:

  • You can chain two iterators together to make a longer one.

  • If you need to iterate two lists, zip them together to avoid bounds checks on either.

  • If you want to feed all your animals, and also feed a nearby duck, just chain the iterator to std::iter::once:

    
    #![allow(unused)]
    fn main() {
    use std::collections::HashSet;
    struct Animal {
        kind: &'static str,
        is_hungry: bool,
        meal_needed: &'static str,
    }
    static PETS: [Animal; 0] = [];
    static NEARBY_DUCK: Animal = Animal {
        kind: "Duck",
        is_hungry: true,
        meal_needed: "pondweed",
    };
    fn make_shopping_list_d() -> HashSet<&'static str> {
        PETS.iter()
            .chain(std::iter::once(&NEARBY_DUCK))
            .filter(|animal| animal.is_hungry)
            .map(|animal| animal.meal_needed)
            .collect()
    }
    }
    

    (Similarly, if you want to add one more item to the shopping list - maybe you're hungry, as well as your menagerie? - just add it after the map.)

  • Option is iterable.

    
    #![allow(unused)]
    fn main() {
    use std::collections::HashSet;
    struct Animal {
        kind: &'static str,
        is_hungry: bool,
        meal_needed: &'static str,
    }
    static PETS: [Animal; 0] = [];
    struct Pond;
    static MY_POND: Pond = Pond;
    fn pond_inhabitant(pond: &Pond) -> Option<&Animal> {
        // ...
       None
    }
    
    fn make_shopping_list_e() -> HashSet<&'static str> {
        PETS.iter()
            .chain(pond_inhabitant(&MY_POND))
            .filter(|animal| animal.is_hungry)
            .map(|animal| animal.meal_needed)
            .collect()
    }
    }
    

    Here's a diagram showing how data flows in this iterator pipeline:

flowchart LR
    %%{ init: { 'flowchart': { 'nodeSpacing': 40, 'rankSpacing': 15 } } }%%
      Pets
      Filter([filter by hunger])
      Map([map to noms])
      Meals
      uniqueify([uniqueify])
      shopping[Shopping list]
      Pets ---> Filter
      Pond
      Pond ---> inhabitant
      inhabitant[Optional pond inhabitant]
      inhabitant ---> Map
      Filter ---> Map
      Map ---> Meals
      Meals ---> uniqueify
      uniqueify ---> shopping
  • Other iterator APIs such as enumerate, zip, take_while, and flat_map will also come in useful.

C++20 recently introduced ranges, a feature that allows you to pipeline operations on a collection similar to the way Rust iterators do, so this style of programming is likely to become more common in C++ too.

To summarize: While in C++ you tend to operate on collections by performing a series of operations on each individual item, in Rust you'll typically apply a pipeline of operations to the whole collection. Make this mental switch and your code will not just become more idiomatic but more efficient, too.

Isn't it confusing to use the same variable name twice?

In Rust, it's common to reuse the same name for multiple variables in a function. For a C++ programmer, this is weird, but there are two good reasons to do it:

  • You may no longer need to change a mutable variable after a certain point, and if your code is sufficiently complex you might want the compiler to guarantee this for you:

    
    #![allow(unused)]
    fn main() {
    fn spot_ate_my_slippers() -> bool {
        false
    }
    fn feed(_: &str) {}
    let mut good_boy = "Spot";
    if spot_ate_my_slippers() {
        good_boy = "Rover";
    }
    let good_boy = good_boy; // never going to change my dog again, who's a good boy
    feed(&good_boy);
    }
    
  • Another common pattern is to retain the same variable name as you gradually unwrap things to a simpler type:

    
    #![allow(unused)]
    fn main() {
    let url = "http://foo.com:1234";
    let port_number = url.split(":").skip(2).next().unwrap();
        // hmm, maybe somebody else already wrote a better URL parser....? naah, probably not
    let port_number = port_number.parse::<u16>().unwrap();
    }
    

How can I avoid the performance penalty of unwrap()?

C++ has no equivalent to Rust's match, so programmers coming from C++ often underuse it.

A heuristic: if you find yourself unwrap()ing, especially in an if/else statement, you should restructure your code to use a more sophisticated match.

For example, note the unwrap() in here (implying some runtime branch):


#![allow(unused)]
fn main() {
fn test_parse() -> Result<u64, std::num::ParseIntError> {
    let s = "0x64a";
    if s.starts_with("0x") {
        u64::from_str_radix(s.strip_prefix("0x").unwrap(), 16)
    } else {
        s.parse::<u64>()
    }
}
}

and no extra unwrap() here:


#![allow(unused)]
fn main() {
fn test_parse() -> Result<u64, std::num::ParseIntError> {
    let s = "0x64a";
    match s.strip_prefix("0x") {
        None => s.parse::<u64>(),
        Some(remainder) => u64::from_str_radix(remainder, 16),
    }
}
}

if let and matches! are just as good as match but sometimes a little more concise. cargo clippy will usually tell you if you're using a match which can be simplified to one of those other two constructions.
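For instance, the same prefix-stripping logic can be written with if let when only one pattern matters, and matches! when all you need is a boolean. (This is a sketch; the function name parse_maybe_hex is made up for illustration.)

```rust
fn parse_maybe_hex(s: &str) -> Result<u64, std::num::ParseIntError> {
    // `if let` handles the single pattern we care about; everything
    // else falls through to the `else` arm - no unwrap() needed.
    if let Some(remainder) = s.strip_prefix("0x") {
        u64::from_str_radix(remainder, 16)
    } else {
        s.parse::<u64>()
    }
}

fn main() {
    println!("{:?}", parse_maybe_hex("0x64a"));
    // `matches!` is handy when you only need a yes/no answer.
    let looks_hex = matches!("0x64a".strip_prefix("0x"), Some(_));
    println!("{looks_hex}");
}
```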

How do I access variables from within a spawned thread?

Use std::thread::scope.
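A minimal sketch of why this works: scoped threads may borrow from the enclosing stack frame, because std::thread::scope guarantees every spawned thread is joined before it returns. (The function and data here are invented for illustration.)

```rust
use std::thread;

fn total_name_length(animals: &[&str]) -> usize {
    let mut total = 0;
    thread::scope(|s| {
        // This thread borrows `animals` immutably...
        s.spawn(|| {
            for a in animals {
                println!("feeding {a}");
            }
        });
        // ...and this one borrows `total` mutably. No Arc, no 'static
        // bound, no channels needed.
        s.spawn(|| {
            total = animals.iter().map(|a| a.len()).sum();
        });
    }); // both threads are joined here, so the borrows end.
    total
}

fn main() {
    println!("{}", total_name_length(&["Dog", "Cat", "Lion"]));
}
```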

When should I use runtime checks vs jumping through hoops to do static checks?

Everyone learns Rust a different way, but it's said that some people reach a point of "trait mania" where they try to encode too much via the type system, and get in a mess. So, in learning Rust, you will want to strike a balance between runtime checks (easy) and static compile-time checks (more efficient, but requiring deeper understanding).

It’s very personal - some people learn better if they opt out of language features, others not. - MG

Some heuristics for how to keep things simple during the beginning of your Rust journey:

  • It's OK to start with lots of .unwrap(), cloning and Arc/Rc.
  • Start to use more advanced language features when you feel annoyed with the amount of boilerplate. (As an expert, you'll switch to a different strategy which is to consider the virality of your choices through the codebase.)
  • Don't use traits until you have to. You might (for instance) need to use a trait to make some code unit testable, but overoptimizing for that too soon is a mistake. Some say that it's wise initially to avoid defining any new traits at all.
  • Try to keep types smaller.

Specifically on reference counting,

If using Rc means you can avoid a lifetime parameter which is in half the APIs in the project, that’s a very reasonable choice. If it avoids a single lifetime somewhere, probably not a good idea. But measure before deciding. - MG

If you want to bail out of the complexity of static checks, which runtime checks are OK?

  • unwrap() and Option are mostly fine.
  • Arc and Rc are also fine in most cases.
  • Extensive use of clone() is fine but will have a performance impact.
  • Cell is regarded as a code smell and suggests you don't understand your lifetimes - it should be used sparingly.
  • unsafe is definitely not OK. It's harder to write unsafe Rust than to write C or C++, because Rust has additional aliasing rules. If you're reaching for unsafe to work around the complexity of Rust's static type system, as a relative Rust beginner, please reconsider and look into the other options listed above.

Doing lifetime magic — where "magic" means annotating a function or complex type with more than 1 lifetime, or other wizardry — is often an optimization that you can defer until later. In the beginning, and when writing small programs that you only intend to use a few times ('scripts'), copying is fine.
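As an illustration of that trade-off, here is a hypothetical sketch (the Config, Greeter and Logger types are made up) where shared ownership via Rc removes the need to thread a lifetime parameter through every type that touches the shared data:

```rust
use std::rc::Rc;

struct Config {
    greeting: String,
}

// With `Rc`, neither struct needs a lifetime parameter; they share
// ownership of the same `Config`, reference-counted at runtime.
// With `&'a Config` instead, `<'a>` would spread to every API using
// these types.
struct Greeter {
    config: Rc<Config>,
}

struct Logger {
    config: Rc<Config>,
}

fn main() {
    let config = Rc::new(Config { greeting: "hello".to_string() });
    let greeter = Greeter { config: Rc::clone(&config) };
    let logger = Logger { config: Rc::clone(&config) };
    println!("{} / {}", greeter.config.greeting, logger.config.greeting);
    println!("references: {}", Rc::strong_count(&config)); // 3
}
```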

Questions about your function signatures

Should I return an iterator or a collection?

Pretty much always return an iterator. - AH

We suggested you use iterators a lot in your code. Share the love! Give iterators to your callers too.

If you know your caller will store the items you're returning in a concrete collection, such as a Vec or a HashSet, you may want to return that. In all other cases, return an iterator.

Your caller might:

  • Collect the iterator into a Vec
  • Collect it into a HashSet or some other specialized container
  • Loop over the items
  • Filter them or otherwise completely ignore some

Collecting the items into a vector will only turn out to be right in one of these cases. In the other cases, you're wasting memory and CPU time by building a concrete collection.

This is weird for C++ programmers, because C++ iterators don't usually have robust references into the underlying data - iterator invalidation is a classic source of bugs. Even Java iterators are scary, throwing ConcurrentModificationExceptions when you least expect it. Rust prevents all of that at compile time. If you can return an iterator, you should.

flowchart LR
    subgraph Caller
    it_ref[reference to iterator]
    end
    subgraph it_outer[Iterator]
    it[Iterator]
    it_ref --reference--> it
    end
    subgraph data[Underlying data]
    dat[Underlying data]
    it --reference--> dat
    end
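To make this concrete, here's a sketch of a function returning an iterator instead of a collection (the Shelter and Animal types are hypothetical):

```rust
struct Animal {
    name: &'static str,
    is_hungry: bool,
}

struct Shelter {
    animals: Vec<Animal>,
}

impl Shelter {
    // Returning `impl Iterator` lets each caller decide whether to
    // collect, count, filter further, or stop early - no intermediate
    // Vec is allocated here. The `+ '_` ties the iterator's lifetime
    // to the borrow of `self`.
    fn hungry_animals(&self) -> impl Iterator<Item = &'static str> + '_ {
        self.animals
            .iter()
            .filter(|a| a.is_hungry)
            .map(|a| a.name)
    }
}

fn main() {
    let shelter = Shelter {
        animals: vec![
            Animal { name: "Dog", is_hungry: true },
            Animal { name: "Python", is_hungry: false },
        ],
    };
    // This caller only counts; another could collect into a HashSet.
    println!("{}", shelter.hungry_animals().count());
}
```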

How flexible should my parameters be?

Which of these is best?


#![allow(unused)]
fn main() {
fn a(params: &[String]) {
    // ...
}

fn b(params: &[&str]) {
    // ...
}

fn c(params: &[impl AsRef<str>]) {
    // ...
}
}

(You'll need to make an equivalent decision in other cases, e.g. Path versus PathBuf versus AsRef<Path>.)

None of the options is clearly superior; for each option, there's a case it can't handle that the others can:

fn a(params: &[String]) {
}
fn b(params: &[&str]) {
}
fn c(params: &[impl AsRef<str>]) {
}
fn main() {
    a(&[]);
    // a(&["hi"]); // doesn't work
    a(&vec![format!("hello")]);

    b(&[]);
    b(&["hi"]);
    // b(&vec![format!("hello")]); // doesn't work

    // c(&[]); // doesn't work
    c(&["hi"]);
    c(&vec![format!("hello")]);
}

So you have a variety of interesting ways to slightly annoy your callers under different circumstances. Which is best?

AsRef has some advantages: if a caller has a Vec<String>, they can use that directly, which would be impossible with the other options. But if they want to pass an empty list, they'll have to explicitly specify the type (for instance &Vec::<String>::new()).

Not a huge fan of AsRef everywhere - it's just saving the caller typing. If you have lots of AsRef then nothing is object-safe. - MG

TL;DR: choose the middle option, &[&str]. If your caller happens to have a vector of String, it's relatively little work to get a slice of &str:

fn b(params: &[&str]) {
}

fn main() {
    // Instead of b(&vec![format!("hello")]);
    let hellos = vec![format!("hello")];
    b(&hellos.iter().map(String::as_str).collect::<Vec<_>>());
}

How do I overload constructors?

You can't do this:


#![allow(unused)]
fn main() {
struct BirthdayCard {}
impl BirthdayCard {
    fn new(name: &str) -> Self {
      Self{}
        // ...
    }

    // Can't add more overloads:
    //
    // fn new(name: &str, age: i32) -> BirthdayCard { ... }
    //
    // fn new(name: &str, text: &str) -> BirthdayCard { ... }
}
}

If you have a default constructor, and a few variants for other cases, you can simply write them as different static methods. An idiomatic way to do this is to write a new() constructor and then with_foo() constructors that apply the given "foo" when constructing.


#![allow(unused)]
fn main() {
struct Racoon {}
impl Racoon {
    fn new() -> Self {
      Self{}
        // ...
    }
    fn with_age(age: usize) -> Self {
      Self{}
        // ...
    }
}
}

If you have a bunch of constructors and no default, it may make sense to instead provide a set of new_foo() constructors.


#![allow(unused)]
fn main() {
struct Animal {}
impl Animal {
    fn new_squirrel() -> Self {
      Self{}
        // ...
    }
    fn new_badger() -> Self {
      Self{}
        // ...
    }
}
}

For a more complex situation, you may use the builder pattern. The builder has a set of methods which take &mut self and return &mut Self. Then add a build() that returns the final constructed object.


#![allow(unused)]
fn main() {
struct BirthdayCard {}

struct BirthdayCardBuilder {}
impl BirthdayCardBuilder {
    fn new(name: &str) -> Self {
      Self{}
        // ...
    }

    fn age(&mut self, age: i32) -> &mut Self {
        self
        // ...
    }

    fn text(&mut self, text: &str) -> &mut Self {
        self
        // ...
    }

    fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
}
}

You can then chain these into short or long constructions, passing parameters as necessary:

struct BirthdayCard {}

struct BirthdayCardBuilder {}
impl BirthdayCardBuilder {
    fn new(name: &str) -> BirthdayCardBuilder {
      Self{}
      // ...
    }

    fn age(&mut self, age: i32) -> &mut BirthdayCardBuilder {
        self
        // ...
     }

    fn text(&mut self, text: &str) -> &mut BirthdayCardBuilder {
        self
        // ...
     }

    fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
}

fn main() {
    let card = BirthdayCardBuilder::new("Paul")
        .age(64)
        .text("Happy Valentine's Day!")
        .build();
}

Note another advantage of builders: Overloaded constructors often don't provide all possible combinations of parameters, whereas with the builder pattern, you can combine exactly the parameters you want.

When must I use #[must_use]?

Use it on Results and mutex locks. - MG

#[must_use] causes a compile error if the caller ignores the return value.

Rust functions are often single-purpose. They either:

  • Return a value without any side effects; or
  • Do something (i.e. have side effects) and return nothing.

In neither case do you need to think about #[must_use]. (In the first case, nobody would call your function unless they were going to use the result.)

#[must_use] is useful for those rarer functions which return a result and have side effects. In most such cases, it's wise to specify #[must_use], unless the return value is truly optional (for example in HashMap::insert).
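A hypothetical sketch of such a function (the Bowl type is invented): it has a side effect and a meaningful result, so a caller silently discarding the result is almost certainly a bug.

```rust
struct Bowl {
    kibble_grams: u32,
}

impl Bowl {
    // Mutates state *and* reports whether it succeeded - exactly the
    // kind of function that warrants #[must_use].
    #[must_use = "check whether the bowl actually had enough kibble"]
    fn take(&mut self, grams: u32) -> bool {
        if self.kibble_grams >= grams {
            self.kibble_grams -= grams;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bowl = Bowl { kibble_grams: 100 };
    // bowl.take(50);        // warning: unused return value of `take`
    let fed = bowl.take(50); // OK: the result is used
    println!("{fed}, {} grams left", bowl.kibble_grams);
}
```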

When should I take parameters by value?

Move semantics are more common in Rust than in C++.

In C++ moves tend to be an optimization, whereas in Rust they're a key semantic part of the program. - MY

To a first approximation, you should assume similar performance when passing things by (moved) value or by reference. It's true that a move may turn out to be a memcpy, but it's often optimized away.

Express the ownership relationship in the type system, instead of trying to second-guess the compiler for efficiency. - AF

The moves are, of course, destructive - and unlike in C++, the compiler enforces that you don't reuse a variable that has been moved. Some C++ objects become toxic once they've been moved from; that's not a risk in Rust.

So here's the heuristic: if a caller shouldn't be able to use an object again, pass it via move semantics in order to consume it.

An extreme example: a UUID is supposed to be globally unique - it might cause a logic error for a caller to retain knowledge of a UUID after passing it to a callee.

More generally, consume data enthusiastically to avoid logical errors during future refactorings. For instance, if some command-line options are overridden by a runtime choice, consume those old options - then any future refactoring which uses them after that point will give you a compile error. This pattern is surprisingly effective at spotting errors in your assumptions.
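A sketch of that pattern, with invented names (CommandLineOptions, apply_runtime_overrides): the function takes its input by value, so stale pre-override options cannot be touched afterwards.

```rust
struct CommandLineOptions {
    verbose: bool,
}

struct EffectiveSettings {
    verbose: bool,
}

// Taking `opts` by value consumes it: after this call, any attempt to
// read the stale `CommandLineOptions` is a compile error, so a future
// refactoring can't accidentally use pre-override values.
fn apply_runtime_overrides(opts: CommandLineOptions, force_verbose: bool) -> EffectiveSettings {
    EffectiveSettings {
        verbose: opts.verbose || force_verbose,
    }
}

fn main() {
    let opts = CommandLineOptions { verbose: false };
    let settings = apply_runtime_overrides(opts, true);
    // println!("{}", opts.verbose); // error[E0382]: borrow of moved value
    println!("{}", settings.verbose);
}
```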

Should I ever take self by value?

Sometimes. If you've got a member function which destroys or transforms a thing, it should take self by value. Examples:

  • Closing a file and returning a result code.
  • A builder-pattern object which spits out the thing it was building. (Example).
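A sketch of the first case, loosely modeled on File-like APIs (the Connection type is hypothetical):

```rust
struct Connection {
    id: u32,
}

impl Connection {
    // Taking `self` by value means the connection cannot be used after
    // it has been closed - the compiler enforces it.
    fn close(self) -> Result<(), String> {
        println!("closing connection {}", self.id);
        Ok(())
    }
}

fn main() {
    let conn = Connection { id: 7 };
    conn.close().unwrap();
    // conn.close(); // error[E0382]: use of moved value: `conn`
}
```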

How do I take a thing, and a reference to something within that thing?

For example, suppose you want to give all of your dogs to your friend, yet also tell your friend which one of the dogs is the Best Boy or Girl.

struct PetInformation {
  std::vector<Dog> dogs;
  Dog& BestBoy;
  Dog& BestGirl;
};

PetInformation GetPetInformation() {
  // ...
}

Generally this is an indication that your types or functions are not split up in the right way:

This is a decomposition problem. Once you’ve found the correct decomposition, everything else just works. The code almost writes itself. - AF


#![allow(unused)]
fn main() {
struct Dog;
struct PetInformation(Vec<Dog>);

fn get_pet_information() -> PetInformation {
  // ...
PetInformation(Vec::new())
}

fn identify_best_boy(pet_information: &PetInformation) -> &Dog {
  // ...
  pet_information.0.get(0).unwrap()
}
}

One use-case is when you want to act on some data depending on its contents - but you also want to do something with the specific contents you identified along the way.


#![allow(unused)]
fn main() {
struct Key;
struct Door { locked: bool }

struct Car {
  ignition: Option<Key>,
  door: Door,
}

fn steal_car(car: Car) {
  match car {
    Car {
      ignition: Some(ref key),
      door: Door { locked: false }
    } => drive_away_normally(car /* , key */),
    _ => break_in_and_hotwire(car)
  }
}

fn drive_away_normally(car: Car /* , key: &Key */) {
  // Annoying to have to repeat this code...
  let key = match car {
    Car {
      ignition: Some(ref key),
      ..
    } => key,
    _ => unreachable!()
  };
  turn_key(key);
  // ...
}

fn turn_key(key: &Key) {}
fn break_in_and_hotwire(car: Car) {}
}

If this repeated matching gets annoying, it's relatively easy to extract it to a function.


#![allow(unused)]
fn main() {
fn turn_key(key: &Key) {}
fn break_in_and_hotwire(car: Car) {}
struct Key;
struct Door { locked: bool }
struct Car {
  ignition: Option<Key>,
  door: Door,
}

impl Car {
  fn get_usable_key(&self) -> Option<&Key> {
    match self {
      Car {
        ignition: Some(ref key),
        door: Door { locked: false }
      } => Some(key),
      _ => None,
    }
  }
}

fn steal_car(car: Car) {
  match car.get_usable_key() {
    None => break_in_and_hotwire(car),
    Some(_) => drive_away_normally(car),
  }
}

fn drive_away_normally(car: Car) {
  turn_key(car.get_usable_key().unwrap());
}
}

When should I return impl Trait?

Your main consideration should be API stability. If your caller doesn't need to know the concrete implementation type, then don't tell it. That gives you flexibility to change your implementation in future without breaking compatibility.

Note Hyrum's Law!

Using impl Trait doesn't solve all possible API stability concerns, because even impl Trait leaks auto-traits such as Send and Sync.

I miss function overloading! What do I do?

Use a trait to implement the behavior you used to have.

For example, in C++:

class Dog {
public:
  void eat(Dogfood);
  void eat(DeliveryPerson);
};

In Rust you might express this as:


#![allow(unused)]
fn main() {
trait Edible {}

struct Dog;

impl Dog {
  fn eat(&mut self, edible: impl Edible) {
    // ...
  }
}

struct Dogfood;
struct DeliveryPerson;

impl Edible for Dogfood {}
impl Edible for DeliveryPerson {}
}

This gives your caller all the convenience they want, though may increase work for you as the implementer.

I miss operator overloading! What do I do?

Implement the standard traits instead (for example PartialEq, Add). This has an equivalent effect: folks will be able to use your type in a standard Rusty way, without needing to know anything special about it.
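For example, implementing std::ops::Add is Rust's equivalent of overloading operator+ in C++ (the Point type here is just for illustration):

```rust
use std::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Implementing the `Add` trait lets callers write `a + b` on Points,
// just like overloading `operator+` in C++.
impl Add for Point {
    type Output = Point;
    fn add(self, other: Point) -> Point {
        Point {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let p = Point { x: 1, y: 2 } + Point { x: 3, y: 4 };
    println!("{p:?}");
}
```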

Should I return an error, or panic?

Panics should be used only for invariants, never for anything that you believe might happen. That's especially true for libraries:

  • panicking (or asserting) should be reserved for the 'top level' code driving the application.

Libraries which panic are super-rude and I hate them - MY

Even in your own application code, panicking might not be wise:

Panicking in application logic for recoverable errors makes it way harder to librarify some code - AP

If you really must have an API which can panic, add a try_ equivalent too.
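A sketch of that convention (Inventory and its methods are invented names): the fallible try_ version does the real work, and the panicking version is a thin wrapper over it.

```rust
struct Inventory {
    items: Vec<String>,
}

impl Inventory {
    // Fallible version: lets callers handle the missing-item case.
    fn try_take(&mut self, name: &str) -> Option<String> {
        let pos = self.items.iter().position(|i| i == name)?;
        Some(self.items.remove(pos))
    }

    // Panicking convenience wrapper, built on top of `try_take`.
    fn take(&mut self, name: &str) -> String {
        self.try_take(name)
            .unwrap_or_else(|| panic!("no item named {name:?}"))
    }
}

fn main() {
    let mut inv = Inventory {
        items: vec!["kibble".to_string()],
    };
    println!("{}", inv.take("kibble"));
    println!("{:?}", inv.try_take("kibble")); // None, no panic
}
```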

What should my error type be?

Rust's Result type is parameterized over an error type. What should you use?

For app code, consider anyhow. For library code, use your own enum of error conditions - you can use thiserror to make this more pleasant.
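A sketch of a library's own error enum, using only std; this is roughly the boilerplate that thiserror's derive macros would write for you (the PetShopError type and buy function are invented):

```rust
use std::fmt;

// One variant per failure mode the library can report.
#[derive(Debug)]
enum PetShopError {
    OutOfStock { item: String },
    ShopClosed,
}

// `thiserror` can derive this `Display` impl from `#[error("...")]`
// attributes; written by hand it looks like this.
impl fmt::Display for PetShopError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PetShopError::OutOfStock { item } => write!(f, "out of stock: {item}"),
            PetShopError::ShopClosed => write!(f, "the shop is closed"),
        }
    }
}

impl std::error::Error for PetShopError {}

fn buy(item: &str) -> Result<(), PetShopError> {
    Err(PetShopError::OutOfStock {
        item: item.to_string(),
    })
}

fn main() {
    if let Err(e) = buy("kibble") {
        println!("{e}");
    }
}
```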

When should I take or return dyn Trait?

In either C++ or Rust, you can choose between monomorphization (that is, building code multiple times for each permutation of parameter types) or dynamic dispatch (i.e. looking up the correct implementation using vtables).

In C++ the syntax is completely different - templates vs virtual functions. In Rust the syntax is almost identical - in some cases it's as simple as exchanging the impl keyword with the dyn keyword.

Given this flexibility to switch strategies, which should you start with?

In both languages, monomorphization tends to result in a quicker program (partly due to better inlining). It's arguably true that inlining is more important in Rust, due to its functional nature and pervasive use of iterators. Whether or not that's the reason, experienced Rustaceans usually start with impl:

It's best practice to start with monomorphization and move to dyn... - MG

The main cost of monomorphization is larger binaries. There are cases where large amounts of code can end up being duplicated (the marvellous serde is one).

You can choose to do things the other way round:

... it’s workable practice to start with dyn and then move to impl when you have problems. - MG

dyn can be awkward, and potentially expensive in different ways:

One thing to note about pervasive dyn is that because it unsizes the types it wraps, you need to box it if you want to store it by value. You end up with a good bit more allocator pressure if you try to have dyn field types. - AP
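To illustrate how small the syntactic difference is, here are the two dispatch strategies side by side (the Describe trait is a made-up example):

```rust
trait Describe {
    fn describe(&self) -> String;
}

struct Dog;
impl Describe for Dog {
    fn describe(&self) -> String {
        "a dog".to_string()
    }
}

// Monomorphized: a separate copy of this function is compiled for each
// concrete type it's called with, and calls can be inlined.
fn print_static(animal: &impl Describe) {
    println!("{}", animal.describe());
}

// Dynamic dispatch: one copy of the function; calls go through a
// vtable. Note that only the keyword changed.
fn print_dynamic(animal: &dyn Describe) {
    println!("{}", animal.describe());
}

fn main() {
    print_static(&Dog);
    print_dynamic(&Dog);
}
```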

I seem to have lots of named lifetimes. Am I doing something wrong?

Some say that if you have a significant number of named lifetimes, you're overcomplicating things.

There are some scenarios where multiple named lifetimes make perfect sense - for example if you're dealing with an arena, or major phases of a process (the Rust compiler has 'gcx and 'tcx lifetimes relating to the output of certain compile phases).

But otherwise, it may be that you've got lifetimes because you're trying too hard to avoid a copy. You may be better off simply switching to runtime checking (e.g. Rc, Arc) or even cloning.

Are named lifetimes even a "code smell"?

My experience has been that the extent to which they're a smell varies a good bit based on the programmer's experience level, which has led me towards increased skepticism over time. Lots of people learning Rust have experienced the pain of first not wanting to .clone() something, immediately putting lifetimes everywhere, and then feeling the pain of lifetime subtyping and variance. I don't think they're nearly as odorous as unsafe, for example, but treating them as a bit of a smell does I think lead to code that's easier to read for a newcomer and to refactor around the stack. - AP

Questions about your types

My 'class' needs mutable references to other things to do its job. Other classes need mutable references to these things too. What do I do?

It's common in C++ to have a class that contains mutable references to other objects; the class mutates those objects to do its work. Often, several classes all hold a mutable reference to the same object. Here is a diagram that illustrates this:

flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    refa[Mutable Reference]-->important
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    refb[Mutable Reference]-->important
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main-. Calls .-> methodA
    main-. Calls .-> methodB

In Rust, you can't have multiple mutable references to a shared object, so what do you do?

First of all, consider moving behavior out of your types. (See the answer about the observer pattern and especially the second option described there.)

Even in Rust, though, it's still often the best choice to make complex behavior part of the type within impl blocks. You can still do that - but don't store references. Instead, pass them into each function call.

flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main --> important
    main-. Passes reference to shared object.-> methodA
    main-. Passes reference to shared object.-> methodB

Instead of this:

struct ImportantSharedObject;
struct ObjectA<'a> {
   important_shared_object: &'a mut ImportantSharedObject,
}
impl<'a> ObjectA<'a> {
   fn new(important_shared_object: &'a mut ImportantSharedObject) -> Self {
       Self {
           important_shared_object
       }
   }
   fn do_something(&mut self) {
       // act on self.important_shared_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new(&mut shared_thingy);
    a.do_something(); // acts on shared_thingy
}

Do this:

struct ImportantSharedObject;
struct ObjectA;
impl ObjectA {
   fn new() -> Self {
       Self
   }
   fn do_something(&mut self, important_shared_object: &mut ImportantSharedObject) {
       // act on important_shared_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new();
    a.do_something(&mut shared_thingy); // acts on shared_thingy
}

(Happily this also gets rid of named lifetime parameters.)

If you have a hundred such shared objects, you probably don't want a hundred function parameters. So it's usual to bundle them up into a context structure which can be passed into each function call:

struct ImportantSharedObject;
struct AnotherImportantObject;
struct Ctx<'a> {
    important_shared_object: &'a mut ImportantSharedObject,
    another_important_object: &'a mut AnotherImportantObject,
}

struct ObjectA;
impl ObjectA {
   fn new() -> Self {
       Self
   }
   fn do_something(&mut self, ctx: &mut Ctx) {
       // act on ctx.important_shared_object and ctx.another_important_object
   }
}
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut another_thingy = AnotherImportantObject;
    let mut ctx = Ctx {
        important_shared_object: &mut shared_thingy,
        another_important_object: &mut another_thingy,
    };
    let mut a = ObjectA::new();
    a.do_something(&mut ctx); // acts on both the shared thingies
}
flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph Context
    refa[Mutable Reference]-->important
    end
    subgraph ObjectA
    objectA[Object A]
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    objectB[Object B]
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> objectA
    main --> objectB
    main --> Context
    main-. Passes context.-> methodA
    main-. Passes context.-> methodB

Even simpler: just put all the data directly into Ctx. But the key point is that this context object is passed around into just about all function calls rather than being stored anywhere, thus negating any borrowing/lifetime concerns.

This pattern can be seen in bindgen, for example.

Split out borrowing concerns from the object concerns. - MG

To generalize this idea, try to avoid storing references to anything that might need to be changed. Instead take those things as parameters. For instance petgraph takes the entire graph as context to a Walker object, such that the graph can be changed while you're walking it.

My type needs to store arbitrary user data. What do I do instead of void *?

Ideally, your type would know all possible types of user data that it could store. You'd represent this as an enum with variant data for each possibility. This would give complete compile-time type safety.

But sometimes code needs to store data for which it can't depend upon the definition: perhaps it's defined by a totally different area of the codebase, or belongs to clients. Such possibilities can't be enumerated in advance. Until recently, the only real option in C++ was to use a void * and have clients downcast to get their original type back. Modern C++ offers a much better option, std::any; if you've come across that, Rust's equivalent will seem very familiar.

In Rust, the Any type allows you to store anything and retrieve it later in a type-safe fashion:

use std::any::Any;

struct MyTypeOfUserData(u8);

fn main() {
  let any_user_data: Box<dyn Any> = Box::new(MyTypeOfUserData(42));
  let stored_value = any_user_data.downcast_ref::<MyTypeOfUserData>().unwrap().0;
  println!("{}", stored_value);
}

If you want to be more prescriptive about what can be stored, you can define a trait (let's call it UserData) and store a Box<dyn UserData>. Your trait should have a method fn as_any(&self) -> &dyn std::any::Any; each implementation can just return self.

Your caller can then do this:

trait UserData {
  fn as_any(&self) -> &dyn std::any::Any;
  // ...other trait methods which you wish to apply to any UserData...
}

struct MyTypeOfUserData(u8);

impl UserData for MyTypeOfUserData {
  fn as_any(&self) -> &dyn std::any::Any { self }
}

fn main() {
  // Store a generic Box<dyn UserData>
  let user_data: Box<dyn UserData> = Box::new(MyTypeOfUserData(42));
  // Get back to a specific type
  let stored_value = user_data.as_any().downcast_ref::<MyTypeOfUserData>().unwrap().0;
  println!("{}", stored_value);
}

Of course, enumerating all possible stored variants remains preferable, so that the compiler can help you avoid runtime panics.

When should I put my data in a Box?

In C++, you often need to box things for ownership reasons, whereas in Rust it's typically just a performance trade-off. It's arguably premature optimization to use boxes unless your profiling shows a lot of memcpy of that particular type (or, perhaps, the relevant clippy lint informs you that you have a problem.)

I never box things unless they're really big. - MG

Another heuristic: if part of your data structure is very rarely filled in, you may wish to Box it to avoid incurring its overhead for every other instance of the type.

struct Humility; struct Talent; struct Ego;
struct Popstar {
  ego: Ego,
  talent: Talent,
  humility: Option<Box<Humility>>,
}
fn main() {}

(This is one reason why people like using anyhow for their errors; it means the failure case in their Result enum is only a pointer wide.)
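You can see the effect for yourself with std::mem::size_of. A sketch, giving the rarely-present field a deliberately large payload:

```rust
// Imagine Humility is large and almost always absent.
struct Humility([u8; 1024]);

struct Boxed {
    humility: Option<Box<Humility>>,
}

struct Inline {
    humility: Option<Humility>,
}

fn main() {
    // Option<Box<T>> is a single pointer, thanks to the null-pointer
    // niche optimization: None is represented as a null pointer.
    assert_eq!(std::mem::size_of::<Boxed>(), std::mem::size_of::<usize>());
    // The inline version pays for the large field in every instance.
    assert!(std::mem::size_of::<Inline>() > 1024);
}
```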

Of course, Rust may require you to use a box:

  • if you need to Pin some data, typically for async Rust, or
  • if you otherwise have an infinitely sized data structure

but as usual, the compiler will explain very nicely.

Should I have public fields or accessor methods?

The trade-offs are similar to C++ except that Rust's pattern-matching makes it very convenient to match on fields, so within a realm of code that you own you may bias towards having more public fields than you're used to. As with C++, this can give you a future compatibility burden.

When should I use a newtype wrapper?

The newtype wrapper pattern uses Rust's type system to enforce extra behavior without necessarily changing the underlying representation.


#![allow(unused)]
fn main() {
fn get_rocket_length() -> Inches { Inches(7) }
struct Inches(u32);
struct Centimeters(u32);

fn build_mars_orbiter() {
  let rocket_length: Inches = get_rocket_length();
  // mate_to_orbiter(rocket_length); // does not compile because this takes cm
}
}

Other examples that have been used:

  • An IP address which is guaranteed not to be localhost;
  • Non-zero numbers;
  • IDs which are guaranteed to be unique

Such new types typically need a lot of boilerplate, especially to implement the traits which users of your type would expect to find. On the other hand, they allow you to use Rust's type system to statically prevent logic bugs.

A heuristic: if there are some invariants you'd be checking for at runtime, see if you can use a newtype wrapper to do it statically instead. Although it may be more code to start with, you'll save the effort of finding and fixing logic bugs later.
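A sketch of the heuristic, using a hypothetical Percentage newtype: the invariant is checked once, in the constructor, and every later use can rely on it without re-checking:

```rust
/// A value guaranteed to be between 0 and 100.
struct Percentage(u8);

impl Percentage {
    // The only way to make one; the invariant is checked exactly here.
    fn new(value: u8) -> Option<Percentage> {
        if value <= 100 {
            Some(Percentage(value))
        } else {
            None
        }
    }
    fn value(&self) -> u8 {
        self.0
    }
}

// Takes the invariant for granted: no runtime check needed here.
fn apply_discount(price: u32, discount: &Percentage) -> u32 {
    price * (100 - discount.value() as u32) / 100
}

fn main() {
    let discount = Percentage::new(25).expect("invalid percentage");
    assert_eq!(apply_discount(200, &discount), 150);
    assert!(Percentage::new(150).is_none());
}
```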

How else can I use Rust's type system to avoid high-level logic bugs?

Lots of ways:

Zero-sized types.

Also known as "ZSTs". These are types which occupy literally zero bytes, and so (generally) make no difference whatsoever to the code generated. But you can use them in the type system to enforce invariants at compile-time with no runtime check.

For example, they're often used as capability tokens - you can statically prove that code exclusively has the right to do something.

pub trait ValidationStatus {}

mod validator {
  use self::super::{Bytecode, ValidationStatus};
  /// ZST marker to show that bytecode has been validated.
  // Private field ensures this can't be created outside this mod
  // but PhantomData means this is still zero-sized.
  pub struct BytecodeValidated(std::marker::PhantomData<u8>);
  pub fn validate_bytecode<V: ValidationStatus>(code: Bytecode<V>) -> Bytecode<BytecodeValidated> {
    // Do expensive validation operation here...
    Bytecode {
      validated: BytecodeValidated(std::marker::PhantomData),
      code: code.code,
    }
  }
  impl ValidationStatus for BytecodeValidated {}
}

struct BytecodeNotValidated;

impl ValidationStatus for BytecodeNotValidated {}

pub struct Bytecode<V: ValidationStatus> {
  validated: V,
  code: Vec<u8>,
}

fn run_bytecode(bytecode: &Bytecode<validator::BytecodeValidated>) {
  // Compiler PROVES you validated it before you can run it. There are no
  // runtime branches involved.
}

fn get_unvalidated_bytecode() -> Bytecode<BytecodeNotValidated> {
  // ...
  Bytecode {
    validated: BytecodeNotValidated,
    code: Vec::new(),
  }
}

fn main() {
  let bytecode = get_unvalidated_bytecode();
  // run_bytecode(bytecode); // does not compile
  let bytecode = validator::validate_bytecode(bytecode);
  run_bytecode(&bytecode);
  run_bytecode(&bytecode);
}

ZSTs can also be used to demonstrate exclusive access to some resource.

struct RobotArmAccessToken;

fn move_arm(token: &mut RobotArmAccessToken, x: u32, y: u32, z: u32) {
  // ...
}

fn attach_car_door(token: &mut RobotArmAccessToken) {
  move_arm(token, 3, 4, 6);
  move_arm(token, 5, 3, 6);
}

fn install_windscreen(token: &mut RobotArmAccessToken) {
  move_arm(token, 7, 8, 2);
  move_arm(token, 1, 2, 3);
}

fn main() {
  let mut token = RobotArmAccessToken; // ensure only one exists
  attach_car_door(&mut token);
  install_windscreen(&mut token);
}

(The type system would prevent these operations happening in parallel.)

Marker traits

Indicate that a type meets certain invariants, so subsequent users of that type don't need to check at runtime. A common example is to indicate that a type is safe to serialize into some bytestream.
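A minimal sketch of such a marker trait (the names are hypothetical; real serialization code would more likely use serde):

```rust
/// Marker trait: implementors promise their bytes are safe to put
/// on the wire. It has no methods; it exists purely for the type system.
trait WireSafe {}

struct Handshake {
    version: u8,
}
impl WireSafe for Handshake {}

// Only marker-approved types may be sent; no runtime check involved.
fn send<T: WireSafe>(_message: &T) -> bool {
    true // imagine serialization happening here
}

fn main() {
    assert!(send(&Handshake { version: 1 }));
    // send(&String::new()); // does not compile: String is not WireSafe
}
```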

Enums as state machines.

Each enum variant is a state and stores data associated with that state. There simply is no possibility that the data can get out of sync with the state.


#![allow(unused)]
fn main() {
enum ElectionState {
  RaisingDonations { amount_raised: u32 },
  DoingTVInterviews { interviews_done: u16 },
  Voting { votes_for_me: u64, votes_for_opponent: u64 },
  Elected,
  NotElected,
}
}

A more heavyweight approach here is to define types for each state, and allow valid state transitions by taking the previous state by-value and returning the next state by-value.


#![allow(unused)]
fn main() {
struct Seed { water_available: u32 }
struct Growing { water_available: u32, sun_available: u32 }
struct Flowering;
struct Dead;

enum PlantState {
  Seed(Seed),
  Growing(Growing),
  Flowering(Flowering),
  Dead(Dead)
}

impl Seed {
  fn advance(self) -> PlantState {
    if self.water_available > 3 {
      PlantState::Growing(Growing { water_available: self.water_available, sun_available: 0 })
    } else {
      PlantState::Dead(Dead)
    }
  }
}

impl Growing {
  fn advance(self) -> PlantState {
    if self.water_available > 3 && self.sun_available > 3 {
      PlantState::Flowering(Flowering)
    } else {
      PlantState::Dead(Dead)
    }
  }
}

impl Flowering {
  fn advance(self) -> PlantState {
    PlantState::Dead(Dead)
  }
}

impl Dead {
  fn advance(self) -> PlantState {
    PlantState::Dead(Dead)
  }
}

impl PlantState {
  fn advance(self) -> Self {
    match self {
      Self::Seed(seed) => seed.advance(),
      Self::Growing(growing) => growing.advance(),
      Self::Flowering(flowering) => flowering.advance(),
      Self::Dead(dead) => dead.advance(),
    }
  }
}

// we should probably find a way to inject some sun and water into this
// state machine or things are not looking rosy
}

What should I do instead of inheritance?

Use composition. Sometimes this results in more boilerplate, but it avoids a raft of complexity.

Specifically, for example:

  • you might include the "superclass" struct as a member of the subclass struct;
  • you might use an enum with different variants for the different possible "subclasses".

Usually the answer is obvious: it's unlikely that your Rust code is structured in such a way that inheritance would be a good fit anyway.

I've only missed inheritance when actually implementing languages which themselves have inheritance. - MG

I need a list of nodes which can refer to one another. How?

You can't easily do self-referential data structures in Rust. The usual workaround is to use an arena and replace references from one node to another with node IDs.

An arena is typically a Vec (or similar), and the node IDs are a newtype wrapper around a simple integer index.

Obviously, Rust doesn't check that your node IDs are valid. If you don't have proper references, what stops you from having stale IDs?

Arenas are often purely additive, which means that you can add entries but not delete them (example). If you must have an arena which deletes things, then use generational IDs; see the generational-arena crate and this RustConf keynote for more details.
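A minimal sketch of a purely additive arena, with a newtype ID standing in for references:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct NodeId(usize);

struct Node {
    value: u32,
    next: Option<NodeId>, // "reference" to another node, by ID
}

#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    // Purely additive: nodes can be added but never removed,
    // so a NodeId handed out once stays valid forever.
    fn add(&mut self, value: u32, next: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { value, next });
        id
    }
    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

fn main() {
    let mut arena = Arena::default();
    let a = arena.add(1, None);
    let b = arena.add(2, Some(a)); // b points back at a
    assert_eq!(arena.get(arena.get(b).next.unwrap()).value, 1);
}
```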

If arenas still sound like a nasty workaround, consider that you might choose an arena anyway for other reasons:

  • All of the objects in the arena will be freed at the end of the arena's lifetime, instead of during their manipulation, which can give very low latency for some use-cases. Bumpalo formalizes this.
  • The rest of your program might have real Rust references into the arena. You can give the arena a named lifetime ('arena for example), making the provenance of those references very clear.

I'm having a miserable time making my data structure. Should I use unsafe?

Low-level data structures are hard in Rust, especially if they're self-referential. Rust will make visible all sorts of risks of ownership and shared mutable state which may not be visible in other languages, and they're hard to solve in low-level data structure code.

Even something as simple as a doubly-linked list is notoriously hard; so much so that there is a book that teaches Rust based solely on linked lists. As that (wonderful) book makes clear, you are often faced with a choice:

If you're facing this decision... perhaps there's a third way.

You should almost always be using somebody else's tried-and-tested data structure.

petgraph and slotmap are great examples. Use someone else's crate by default, and resort to writing your own only if you exhaust that option.

C++ makes it hard to pull in third-party dependencies, so it's culturally normal to write new code. Rust makes it trivial to add dependencies, and so you will need to do that, even if it feels surprising for a C++ programmer.

This ease of adding dependencies co-evolved with the difficulty of making data structures. It's simply a part of programming in Rust. You just can't separate the language and the ecosystem.

You might argue that this dependency on third-party crates is concerning from a supply-chain security point of view. Your author would agree, but it's just the way you do things in Rust. Stop creating your own data structures.

Then again:

it’s equally miserable to implement performant, low-level data structures in C++; you’ll be specializing on lots of things like is_trivially_movable etc. - MY.

I nevertheless have to write my own data structure. Should I use unsafe?

I'm sorry to hear that.

Some suggestions:

  • Use Rc, Weak, etc. until you really can't.
  • Even if you can't use a pre-existing crate for the whole data structure, perhaps you can use a crate to avoid the unsafe bits (for example rental)
  • Bear in mind that refactoring Rust is generally safer than refactoring C++ (because the compiler will point out a higher proportion of your mistakes) so a wise strategy might be to start with a fully-safe, but slow, version, establish solid tests, and then reach for unsafe.

Questions about designing APIs for others

See also the excellent Rust API guidelines. The document you're reading aims to provide extra hints which may be especially useful to folk coming from C++, but that's the canonical reference.

When should my type implement Default?

Whenever you'd provide a default constructor in C++.
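For example (types hypothetical): derive Default when the fields' own defaults are what you want, and write a manual impl when they aren't:

```rust
// Derived: each field gets its own default (false, 0, "").
#[derive(Default, Debug, PartialEq)]
struct Settings {
    verbose: bool,
    retries: u8,
    name: String,
}

// Manual: the zero value wouldn't be a sensible timeout.
struct Connection {
    timeout_ms: u32,
}

impl Default for Connection {
    fn default() -> Self {
        Connection { timeout_ms: 5000 }
    }
}

fn main() {
    assert_eq!(Settings::default().retries, 0);
    assert_eq!(Connection::default().timeout_ms, 5000);
}
```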

When should my type implement From, Into and TryFrom?

You should think of these as equivalent to implicit conversions in C++. Just as with C++, if there are multiple ways to convert from your thing to another thing, don't implement these, but if there's a single obvious conversion, do.

Usually, don't implement Into but instead implement From.
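A sketch of why: implementing From gives you Into for free via the standard library's blanket impl, but not the other way around (the unit types here are hypothetical):

```rust
struct Celsius(f64);
struct Fahrenheit(f64);

// Implement From; callers get .into() for free.
impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

fn main() {
    let f: Fahrenheit = Celsius(100.0).into(); // uses the From impl
    assert_eq!(f.0, 212.0);
    assert_eq!(Fahrenheit::from(Celsius(0.0)).0, 32.0);
}
```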

How should I expose constructors?

See the previous two answers: where it's simple and obvious, use the standard traits to make your object behavior predictable.

If you need to go beyond that, remember you've got a couple of extra toys in Rust:

  • A "constructor" could return a Result<Self>
  • Your constructors can have names, e.g. Vec::with_capacity, Box::pin
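A sketch combining both toys (the Port type is hypothetical):

```rust
struct Port(u16);

impl Port {
    // Fallible "constructor": invalid input is an Err, not an exception.
    fn new(n: u32) -> Result<Port, String> {
        u16::try_from(n)
            .map(Port)
            .map_err(|_| format!("{n} is not a valid port"))
    }

    // Named constructor for a common special case.
    fn http() -> Port {
        Port(80)
    }
}

fn main() {
    assert!(Port::new(8080).is_ok());
    assert!(Port::new(100_000).is_err());
    assert_eq!(Port::http().0, 80);
}
```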

When should my type implement AsRef?

If you have a type which contains another type, provide AsRef especially so that people can clone the inner type. It's good practice to provide explicit versions as well (for example, String implements AsRef<str> but also provides .as_str().)
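A sketch with a hypothetical UserId wrapper: implementing AsRef<str> lets the wrapper flow into any API that accepts string-like things:

```rust
struct UserId(String);

impl AsRef<str> for UserId {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// Accepts String, &str, UserId... anything viewable as a str.
fn greet(name: impl AsRef<str>) -> String {
    format!("hello, {}", name.as_ref())
}

fn main() {
    assert_eq!(greet(UserId("alice".to_string())), "hello, alice");
    assert_eq!(greet("bob"), "hello, bob");
}
```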

When should I implement Copy?

Anything that is integer-like or reference-like should be Copy; other things shouldn’t. - MY

When it's efficient and when it’s an API contract you're willing to uphold. - AH

Generally speaking, types which are plain-old-data can be Copy. Anything more nuanced with any type of state shouldn't be.
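To illustrate (types hypothetical): a plain-old-data type can derive Copy, while a type owning heap data should stay Clone-only:

```rust
// Plain-old-data: fine to be Copy.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Owns a heap allocation: Clone only, never Copy.
#[derive(Clone)]
struct Document {
    text: String,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // p is still usable afterwards: it was copied
    assert_eq!(p, q);

    let d = Document { text: "draft".into() };
    let e = d.clone(); // must be explicit; `let e = d;` would move d
    assert_eq!(e.text, "draft");
}
```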

Should I have Arc or Rc in my API?

It’s a code smell to have reference counts in your API design. You should hide it. - TM.

If you must, you will need to decide between Rc and Arc - see the next answer for some considerations. But, generally, Arc is better practice because it imposes fewer restrictions on your callers. Also, consider taking a look at the Archery crate.

Should my API be thread-safe? What does that mean?

In C++, a thread-safe API usually means that you can expect your API's consumers to use objects from multiple threads. This is difficult to make safe and therefore substantial extra engineering is required to make an API thread-safe.

In Rust, things differ:

  • it's more normal to do things across multiple threads;
  • you don't have to worry about your callers making mistakes here because the compiler won't let them;
  • you can often rely on Send rather than Sync.

You certainly shouldn't be putting a Mutex around all your types. If your caller attempts to use the type from multiple threads, the compiler will simply stop them. It is the responsibility of the caller to use things safely.

If the library has Arc or Rc in the APIs, it may be making choices about how you should instantiate stuff, and that’s rude. - AF

There's a reasonable chance that your API can be used in parallel threads by virtue of Send and Sync being automatically derived. But - you should think through the usage model for your API clients and ensure that's true.

use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

// Imagine this is your library, exposing this interface to library
// consumers...
mod pizza_api {

    use std::thread;
    use std::time::Duration;

    pub struct Pizza {
        // automatically 'Send'
        _anchovies: u32,
        _pepperoni: u32,
    }

    pub fn make_pizza() -> Pizza {
        println!("cooking...");
        thread::sleep(Duration::from_millis(10));
        Pizza {
            _anchovies: 0, // yuck
            _pepperoni: 32,
        }
    }

    pub fn eat_pizza(_pizza: Pizza) {
        println!("yum")
    }
}

// Absolutely no changes are required to the pizza library to let
// it be usable from a multithreaded context
fn main() {
    let pizza_queue = Mutex::new(VecDeque::new());
    thread::scope(|s| {
        s.spawn(|| {
            let mut pizzas_eaten = 0;
            while pizzas_eaten < 100 {
                if let Some(pizza) = pizza_queue.lock().unwrap().pop_front() {
                    pizza_api::eat_pizza(pizza);
                    pizzas_eaten += 1;
                }
            }
        });
        s.spawn(|| {
            for _ in 0..100 {
                let pizza = pizza_api::make_pizza();
                pizza_queue.lock().unwrap().push_back(pizza);
            }
        });
    });
}

What should I Derive to make my code optimally usable?

The official guidelines say to be eager.

But don't overpromise:

Equality can suddenly become expensive later - don’t make types comparable unless you intend people to be able to compare instances of the type. Allowing people to pattern match on enums is usually better. - MY

Note that syn is a rare case in that it has so many types, and is so extensively depended upon by the rest of the Rust ecosystem, that it avoids deriving the standard traits unless explicitly commanded to do so via a cargo feature. This is an unusual pattern and should not normally be followed.

How should I think about API design, differently from C++?

Make the most of the fact that everything is immutable by default. Things which are mutable should stick out. - AF

Think about things which should take self and return self. - AF

Refactoring is less expensive in Rust than C++ due to compiler safeguards, but rearchitecting is expensive in any language. Think about "one way doors" and "two way doors" in the design space: can you undo a change later?

Questions about your whole codebase

The C++ observer pattern is hard in Rust. What to do?

The C++ observer pattern usually means that there are broadcasters sending messages to consumers:

flowchart TB
    broadcaster_a[Broadcaster A]
    broadcaster_b[Broadcaster B]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    consumer_c[Consumer C]
    broadcaster_a --> consumer_a
    broadcaster_b --> consumer_a
    broadcaster_a --> consumer_b
    broadcaster_b --> consumer_b
    broadcaster_a --> consumer_c
    broadcaster_b --> consumer_c

The broadcasters maintain lists of consumers, and the consumers act in response to messages (often mutating their own state.)

This doesn't work in Rust, because it requires the broadcasters to hold mutable references to the consumers.

What do you do?

Option 1: make everything runtime-checked

Each of your consumers could become an Rc<RefCell<T>> or, if you need thread-safety, an Arc<RwLock<T>>.

The Rc or Arc allows broadcasters to share ownership of a consumer. The RefCell or RwLock allows each broadcaster to acquire a mutable reference to a consumer when it needs to send a message.

This example shows how, in Rust, you may independently choose reference counting or interior mutability. In this case we need both.

Just like typical reference counting in C++, Rc and Arc have the option to provide a weak pointer, so the lifetime of each consumer doesn't need to be extended unnecessarily. As an aside, it would be nice if Rust had an Rc-like type which enforces exactly one owner and multiple weak pointers; Rc could be wrapped quite easily to do this.
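A minimal single-threaded sketch of this option, with hypothetical Broadcaster and Consumer types:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Consumer {
    events_seen: u32,
}

struct Broadcaster {
    // Weak: the broadcaster doesn't keep its consumers alive.
    consumers: Vec<Weak<RefCell<Consumer>>>,
}

impl Broadcaster {
    fn broadcast(&self) {
        for weak in &self.consumers {
            // upgrade() fails gracefully if the consumer is gone.
            if let Some(consumer) = weak.upgrade() {
                // Runtime-checked mutable access via RefCell.
                consumer.borrow_mut().events_seen += 1;
            }
        }
    }
}

fn main() {
    let consumer = Rc::new(RefCell::new(Consumer { events_seen: 0 }));
    let broadcaster = Broadcaster {
        consumers: vec![Rc::downgrade(&consumer)],
    };
    broadcaster.broadcast();
    broadcaster.broadcast();
    assert_eq!(consumer.borrow().events_seen, 2);
}
```

For the thread-safe version, substitute Arc, std::sync::Weak and RwLock.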

Reference counting is frowned-upon in C++ because it's expensive. But, in Rust, not so much:

  • Few objects are reference counted; the majority of objects are owned statically.
  • Even when objects are reference counted, those counts are rarely incremented and decremented because you can (and do) pass around &Rc<RefCell<T>> most of the time. In C++, the "copy by default" mode means it's much more common to increment and decrement reference counts.

In fact, the compile-time guarantees might cause you to do less reference counting than C++:

In Servo there is a reference count but far fewer objects are reference counted than in the rest of Firefox, because you don’t need to be paranoid - MG

However: Rust does not prevent reference cycles, although they're only possible if you're using both reference counting and interior mutability.

Option 2: drive the objects from the code, not the other way round

In C++, it's common to have all behavior within classes. Those classes are the total behavior of the system, and so they must interact with one another. The observer pattern is common.

flowchart TB
    broadcaster_a[Broadcaster A]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    broadcaster_a -- observer --> consumer_a
    broadcaster_a -- observer --> consumer_b

In Rust, it's more common to have some external function which drives overall behavior.

flowchart TB
    main(Main)
    broadcaster_a[Broadcaster A]
    consumer_a[Consumer A]
    consumer_b[Consumer B]
    main --1--> broadcaster_a
    broadcaster_a --2--> main
    main --3--> consumer_a
    main --4--> consumer_b

With this sort of design, it's relatively straightforward to take some output from one object and pass it into another object, with no need for the objects to interact at all.

In the most extreme case, this becomes the Entity-Component-System architecture used in game design.

Game developers seem to have completely solved this problem - we can learn from them. - MY

Option 3: use channels

The observer pattern is a way to decouple large, single-threaded C++ codebases. But if you're trying to decouple a codebase in Rust, perhaps you should assume multi-threading by default? Rust has built-in channels, and the crossbeam crate provides multi-producer, multi-consumer channels.

I'm a Rustacean, we assume massively parallel unless told otherwise :) - MG
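The channel approach can be sketched with the standard library alone (the event names are hypothetical):

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

enum Event {
    Clicked,
    Quit,
}

// The "consumer": owns its own state; no shared mutation anywhere.
fn count_clicks(receiver: Receiver<Event>) -> u32 {
    let mut clicks = 0;
    for event in receiver {
        match event {
            Event::Clicked => clicks += 1,
            Event::Quit => break,
        }
    }
    clicks
}

fn main() {
    let (sender, receiver) = mpsc::channel();

    // The "broadcaster": sends events without knowing who consumes them.
    let producer = thread::spawn(move || {
        sender.send(Event::Clicked).unwrap();
        sender.send(Event::Clicked).unwrap();
        sender.send(Event::Quit).unwrap();
    });

    let clicks = count_clicks(receiver);
    producer.join().unwrap();
    assert_eq!(clicks, 2);
}
```

Broadcaster and consumer are fully decoupled: neither holds any reference to the other, only to an end of the channel.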

That's all very well, but I have an existing C++ object broadcasting events. How exactly should I observe it?

If your Rust object is a consumer of events from some pre-existing C++ producer, all the above options remain possible.

  • You can make your object reference counted and have C++ own such a reference (potentially a weak reference)
  • C++ can deliver the message into a general message bucket. An external function reads messages from that bucket and invokes the Rust object that should handle it. This means the reference counting doesn't need to extend to the Rust objects outside that boundary layer.
  • You can have a shim object which converts the C++ callback into some message and injects it into a channel-based world.

Some of my C++ objects have shared mutable state. How can I make them safe in Rust?

You're going to have to do something with interior mutability: either RefCell<T> or its multithreaded equivalent, RwLock<T>.

You have three decisions to make:

  1. Will only Rust code access this particular instance of this object, or might C++ access it too?
  2. If both C++ and Rust may access the object, how do you avoid conflicts?
  3. How should Rust code react if the object is not available, because something else is using it?

If only Rust code can use this particular instance of shared state, then simply wrap it in RefCell<T> (single-threaded) or RwLock<T> (multi-threaded). Build a wrapper type such that callers aren't able to access the object directly, but instead only via the lock type.

If C++ also needs to access this particular instance of the shared state, it's more complex. There are presumably some invariants regarding use of this data in C++ - otherwise it would crash all the time. Perhaps the data can be used only from one thread, or perhaps it can only be used with a given mutex held. Your goal is to translate those invariants into an idiomatic Rust API that can be checked (ideally) at compile-time, and (failing that) at runtime.

For example, imagine:

class SharedMutableGoat {
public:
    void eat_grass(); // mutates tummy state
};

std::mutex lock;
SharedMutableGoat* billy; // only access when owning lock

Your idiomatic Rust wrapper might be:


#![allow(unused)]
fn main() {
mod ffi {
  #[allow(non_camel_case_types)]
  pub struct lock_guard;
  pub fn claim_lock() -> lock_guard { lock_guard{} }
  pub fn eat_grass() {}
  pub fn release_lock(lock: &mut lock_guard) {}
}
struct SharedMutableGoatLock {
    lock: ffi::lock_guard, // owns a std::lock_guard<std::mutex> somehow
};

// Claims the lock, returns a new SharedMutableGoatLock
fn lock_shared_mutable_goat() -> SharedMutableGoatLock {
    SharedMutableGoatLock { lock: ffi::claim_lock() }
}

impl SharedMutableGoatLock {
    fn eat_grass(&mut self) {
        ffi::eat_grass(); // Acts on the global goat
    }
}

impl Drop for SharedMutableGoatLock {
    fn drop(&mut self) {
        ffi::release_lock(&mut self.lock);
    }
}
}

Obviously, lots of permutations are possible, but the goal is to ensure that it's simply compile-time impossible to act on the global state unless appropriate preconditions are met.

The final decision is how to react if the object is not available. This decision can apply with C++ mutexes or with Rust locks (for example RwLock<T>). As in C++, the two major options are:

  • Block until the object becomes available.
  • Try to lock, and if the object is not available, do something else.

There can be a third option if you're using async Rust. If the data isn't available, you may be able to return to your event loop using an async version of the lock (Tokio example, async_std example).

How do I do a singleton?

Use OnceCell. (Its functionality has since been adopted into the standard library as std::cell::OnceCell and std::sync::OnceLock, the latter for anything shared between threads.)
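For instance, using std::sync::OnceLock, the standard library's thread-safe equivalent, stable since Rust 1.70 (the Config type is hypothetical):

```rust
use std::sync::OnceLock;

struct Config {
    threads: usize,
}

// The singleton: initialized at most once, on first access.
static CONFIG: OnceLock<Config> = OnceLock::new();

fn config() -> &'static Config {
    CONFIG.get_or_init(|| Config { threads: 4 })
}

fn main() {
    assert_eq!(config().threads, 4);
    // Subsequent calls return the same instance; the closure runs only once.
    assert!(std::ptr::eq(config(), config()));
}
```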

What's the best way to retrofit Rust's parallelism benefits to an existing codebase?

When parallelizing an existing codebase, first check that all existing types are correctly Send and Sync. Generally, though, you should try to avoid implementing these yourself - instead use pre-existing wrapper types which enforce the correct contract (for example, RwLock).

After that:

If you can solve your problem by throwing Rayon at it, do. It’s magic - MG

If your task is CPU-bound, Rayon solves this handily. - MY

Rayon offers parallel constructs - for example parallel iterators - which can readily be retrofitted to an existing codebase. It also allows you to create and join tasks. Using Rayon can help simplify your code and eliminate lots of manual scheduling logic.

If your tasks are IO-bound, then you may need to look into async Rust, but that's hard to pull into an existing codebase.

What's the best way to architect a new codebase for parallelism?

In brief, like in other languages, you have a choice of architectures:

  • Message-passing, using event loops which listen on a channel, receive Send data and pass it on.
  • More traditional multithreading using Sync data structures such as mutexes (and perhaps Rayon).

There's probably a bias towards message-passing, and that's probably well-informed by its extensibility. - MG
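
A minimal sketch of the message-passing style using std's channels; the worker thread owns its state and receives Send data over the channel (the `Msg` enum is hypothetical):

```rust
use std::sync::mpsc;
use std::thread;

// Messages must be Send to cross the channel boundary; this enum is.
enum Msg {
    Add(i32),
    Stop,
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // The event loop: owns its state, listens on the channel.
    let worker = thread::spawn(move || {
        let mut total = 0;
        while let Ok(msg) = rx.recv() {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Stop => break,
            }
        }
        total
    });

    for n in 1..=10 {
        tx.send(Msg::Add(n)).unwrap();
    }
    tx.send(Msg::Stop).unwrap();

    // 1 + 2 + ... + 10
    assert_eq!(worker.join().unwrap(), 55);
}
```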

I need a list of nodes which can refer to one another. How?

You can't easily do self-referential data structures in Rust. The usual workaround is to use an arena and replace references from one node to another with node IDs.

An arena is typically a Vec (or similar), and the node IDs are a newtype wrapper around a simple integer index.
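
A minimal sketch, using a hypothetical tree of named nodes:

```rust
// Newtype wrapper so node IDs can't be confused with other integer indices.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct NodeId(usize);

struct Node {
    name: String,
    children: Vec<NodeId>, // IDs instead of references to other nodes
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    fn add(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { name: name.to_string(), children: Vec::new() });
        id
    }

    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = arena.add("root");
    let child = arena.add("child");
    // Nodes "refer" to one another by ID, so the borrow checker is happy.
    arena.nodes[root.0].children.push(child);

    assert_eq!(arena.get(arena.get(root).children[0]).name, "child");
}
```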

Obviously, Rust doesn't check that your node IDs are valid. If you don't have proper references, what stops you from having stale IDs?

Arenas are often purely additive, which means that you can add entries but not delete them (example). If you must have an arena which deletes things, then use generational IDs; see the generational-arena crate and this RustConf keynote for more details.
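
The idea behind generational IDs, in a minimal hypothetical form (generational-arena is more complete): each slot carries a generation counter that is bumped on deletion, so a stale ID is detected rather than silently resolving to the slot's new occupant:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct GenId {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct GenArena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> GenId {
        // For brevity, always append; a real arena would reuse freed slots.
        self.slots.push(Slot { generation: 0, value: Some(value) });
        GenId { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, id: GenId) {
        let slot = &mut self.slots[id.index];
        if slot.generation == id.generation {
            slot.value = None;
            slot.generation += 1; // invalidate any outstanding IDs
        }
    }

    fn get(&self, id: GenId) -> Option<&T> {
        let slot = &self.slots[id.index];
        if slot.generation == id.generation {
            slot.value.as_ref()
        } else {
            None // stale ID from a previous generation
        }
    }
}

fn main() {
    let mut arena = GenArena::new();
    let id = arena.insert("hello");
    assert_eq!(arena.get(id), Some(&"hello"));
    arena.remove(id);
    assert_eq!(arena.get(id), None); // stale ID detected, not a wild read
}
```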

If arenas still sound like a nasty workaround, consider that you might choose an arena anyway for other reasons:

  • All of the objects in the arena will be freed at the end of the arena's lifetime, instead of during their manipulation, which can give very low latency for some use-cases. Bumpalo formalizes this.
  • The rest of your program might have real Rust references into the arena. You can give the arena a named lifetime ('arena for example), making the provenance of those references very clear.

Should I have a few big crates or lots of small ones?

In the past, it was recommended to have small crates to get optimal build time. Incremental builds generally make this unnecessary now. You should arrange your crates optimally for your semantic needs.

What crates should everyone know about?

Crate      Description
rayon      parallelizing
serde      serializing and deserializing
crossbeam  all sorts of parallelism tools
itertools  makes it slightly more pleasant to work with iterators. (For instance, if you want to join an iterator of strings, you can just go ahead and do that, without needing to collect the strings into a Vec first)
petgraph   graph data structures
slotmap    arena-like key-value map
nom        parsing
clap       command-line parsing
regex      err, regular expressions
ring       the leading crypto library
nalgebra   linear algebra
once_cell  complex static data

How should I call C++ functions from Rust and vice versa?

Use cxx.

Oh, you want a justification? In that case, here's the history which brought us to this point.

From the beginning, Rust supported calling C functions using extern "C", #[repr(C)] and #[no_mangle]. Such callable C functions had to be declared manually in Rust:

sequenceDiagram
   Rust-->>extern: unsafe Rust function call
   extern-->>C: call from Rust to C
   participant extern as Rust unsafe extern "C" fn
   participant C as Existing C function

bindgen was invented to generate these declarations automatically from existing C/C++ header files. It has grown to understand an astonishingly wide variety of C++ constructs, but its generated bindings are still unsafe functions with lots of pointers involved.

sequenceDiagram
   Rust-->>extern: unsafe Rust function call
   extern-->>C: call from Rust to C++
   participant extern as Bindgen generated bindings
   participant C as Existing C++ function

Interacting with bindgen-generated bindings requires unsafe Rust; you will likely have to manually craft idiomatic safe Rust wrappers. This is time-consuming and error-prone.

cxx automates a lot of that process. Unlike bindgen it doesn't learn about functions from existing C++ headers. Instead, you specify cross-language interfaces in a Rust-like interface definition language (IDL) within your Rust file. cxx generates both C++ and Rust code from that IDL, marshaling data behind the scenes on both sides such that you can use standard language features in your code. For example, you'll find idiomatic Rust wrappers for std::string and std::unique_ptr and idiomatic C++ wrappers for a Rust slice.
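
To give a flavor of the IDL, here is a hypothetical bridge; the names and paths are illustrative (loosely modeled on cxx's blobstore demo), not from any real codebase:

```rust
// Hypothetical bridge: names and include paths are illustrative only.
#[cxx::bridge]
mod ffi {
    extern "Rust" {
        // Implemented in Rust, callable from C++.
        fn rust_checksum(data: &[u8]) -> u64;
    }

    unsafe extern "C++" {
        include!("demo/include/blobstore.h");

        // Implemented in C++, callable from safe Rust.
        type BlobStore;
        fn new_blobstore() -> UniquePtr<BlobStore>;
        fn put(self: Pin<&mut BlobStore>, data: &[u8]) -> u64;
    }
}
```

cxx generates the marshaling code on both sides, so the Rust caller sees a safe `UniquePtr<BlobStore>` and the C++ implementation sees ordinary functions using its own types.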

sequenceDiagram
   Rust-->>rsbindings: safe idiomatic Rust function call
   rsbindings-->>cxxbindings: hidden C ABI call using marshaled data
   cxxbindings-->>cpp: call to standard idiomatic C++
   participant rsbindings as cxx-generated Rust code
   participant cxxbindings as cxx-generated C++ code
   participant cpp as C++ function using STL types

In the bindgen case even more work goes into wrapping idiomatic C++ signatures into something bindgen compatible: unique ptrs to raw ptrs, Drop impls on the Rust side, translating string types ... etc. The typical real-world binding we've converted from bindgen to cxx in my codebase has been -500 lines (mostly unsafe code) +300 lines (mostly safe code; IDL included). - DT

The greatest benefit is that cxx sufficiently understands C++ STL object ownership norms that the generated bindings can be used from safe Rust code.

At present, there is no established solution which combines the idiomatic, safe interoperability offered by cxx with the automatic generation offered by bindgen. It's not clear whether this is even possible, but several projects are aiming in this direction.

I'm getting a lot of binary bloat.

In Rust you have a free choice between impl Trait and dyn Trait (see this answer, too). impl Trait tends to be the default, and it can result in large binaries because generic code is monomorphized: a separate copy is generated for each concrete type it's used with. If you have this problem, consider using dyn Trait instead. Other options include the 'thin template pattern': serde_json is an example, where the code to read from a string and from a slice would otherwise be duplicated entirely; instead, one delegates to the other and requests slightly different behavior.
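
To make the trade-off concrete, a minimal sketch (the function names are hypothetical): the generic function gets one copy of machine code per concrete type it's called with, while the `dyn` version is compiled once and dispatches through a vtable:

```rust
use std::fmt::Display;

// One copy of this function is generated per concrete type it's called with.
fn describe_generic(value: impl Display) -> String {
    format!("value: {value}")
}

// Only one copy of this function exists; calls go through a vtable.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value: {value}")
}

fn main() {
    // Two instantiations of describe_generic: one for i32, one for &str.
    assert_eq!(describe_generic(42), "value: 42");
    assert_eq!(describe_generic("hi"), "value: hi");

    // Both calls share the same compiled function body.
    assert_eq!(describe_dyn(&42), "value: 42");
    assert_eq!(describe_dyn(&"hi"), "value: hi");
}
```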

Questions about your development processes

How should I use tools differently from C++?

  • Use rustfmt automatically everywhere. While in C++ there are many different coding styles, the Rust community is in agreement (at least, they're in agreement that it's a good idea to be in agreement). That is codified in rustfmt. Use it, automatically, on every submission.
  • Use clippy somewhere. Its lints are useful.
  • Use IDEs more liberally. Even staunch vim adherents (your author!) prefer to use an IDE with Rust, because it's simply invaluable to see inferred types displayed inline. Types are largely elided in Rust source, so you're more reliant on tooling assistance.
  • Forbid unsafe code by default (#![forbid(unsafe_code)]).