May 16th, 2024 13:13 UTC

Rust

Understanding a bit more of Rust’s type system

In particular: &str and Sized

Take a look at this Iterator expression in Rust:

let strings = vec!["foo", "foobar", "bar"];
let filtered: Vec<_> = strings.into_iter()
    .filter(|string| string.starts_with("foo"))
    .collect();

Okay, so I got it into my head that I don’t like closure syntax in Rust. The | pipes around the argument list annoyed me yesterday. I prefer the JS/TS way:

let fnSingleArg = string => string.startsWith("foo");
let fnMultiArgs = (string, prefix) => string.startsWith(prefix);

What I’d like to see in Rust might be something similar:

let fn_single_arg = string => string.starts_with("foo");
let fn_multi_args = string, prefix => string.starts_with(prefix);

but also weirder:

// This, i.e. `(a, b)` is a _single_ arg, and it's being unpacked.
let fn_unpack_arg = (a, b) => a + b;

I don’t know exactly; I haven’t given it too much thought. What I want is to optimise for ergonomics and aesthetics[1] around single arguments, including unpacking single arguments, because most of the closures I write take a single argument, e.g. for use with Iterator.

That’s the preamble to what I really want to talk about. You see, I thought the iterator example would look nicer with a free starts_with function, which itself returns a function:

let filtered: Vec<_> = strings.into_iter()
    .filter(starts_with("foo"))
    .collect();

Maybe it does look and read better, but my real subject is what I learned by implementing starts_with and how it turned out to be more complex than I could have imagined.

First attempt

pub fn starts_with<T: AsRef<str>>(prefix: &str) -> impl Fn(T) -> bool + '_ {
    move |s| s.as_ref().starts_with(prefix)
}

The AsRef<str> bit means I can call it with &str or String or &String or indeed anything else that implements AsRef<str>.
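To make the AsRef<str> point concrete, here is a minimal sketch. The helper has_foo_prefix is a hypothetical name used only for illustration; it takes its argument by value, like the closure returned by this first attempt.

```rust
// Hypothetical helper, only to illustrate the `AsRef<str>` bound.
fn has_foo_prefix<T: AsRef<str>>(s: T) -> bool {
    // `as_ref()` yields a `&str` whatever the concrete `T` is.
    s.as_ref().starts_with("foo")
}

fn main() {
    let owned = String::from("foobar");
    assert!(has_foo_prefix("foobar")); // T = &str
    assert!(has_foo_prefix(&owned)); // T = &String
    assert!(has_foo_prefix(owned)); // T = String (moved in)
    assert!(!has_foo_prefix("bar"));
}
```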

But using starts_with in our example iterator expression doesn’t work:

error[E0599]: the method `collect` exists for struct `Filter<IntoIter<&str>, impl Fn(&&str) -> bool>`, but its trait bounds were not satisfied
  --> src/strings.rs:10:5
   |
10 |     let filtered: Vec<_> = strings.into_iter().filter(starts_with_foo).collect();
   |                                                                        ^^^^^^^ method cannot be called on `Filter<IntoIter<&str>, impl Fn(&&str) -> bool>` due to unsatisfied trait bounds
   |
  ::: …/core/src/iter/adapters/filter.rs:19:1
   |
19 | pub struct Filter<I, P> {
   | ----------------------- doesn't satisfy `_: Iterator`
   |
   = note: the following trait bounds were not satisfied:
           `<impl Fn(&&str) -> bool + '_ as FnOnce<(&&str,)>>::Output = bool`
           which is required by `std::iter::Filter<std::vec::IntoIter<&str>, impl Fn(&&str) -> bool + '_>: Iterator`
           `impl Fn(&&str) -> bool + '_: FnMut<(&&str,)>`
           which is required by `std::iter::Filter<std::vec::IntoIter<&str>, impl Fn(&&str) -> bool + '_>: Iterator`
           `std::iter::Filter<std::vec::IntoIter<&str>, impl Fn(&&str) -> bool + '_>: Iterator`
           which is required by `&mut std::iter::Filter<std::vec::IntoIter<&str>, impl Fn(&&str) -> bool + '_>: Iterator`

Forcing myself to slow down and read the error message (5 or 6 times) leads me to read the code for the Iterator::filter trait method:

fn filter<P>(self, predicate: P) -> Filter<Self, P>
where
    Self: Sized,
    P: FnMut(&Self::Item) -> bool,
{
    Filter::new(self, predicate)
}

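and it dawns on me that starts_with is returning the wrong type of function. Notice P up there: FnMut(&Self::Item) -> bool. It wants a function that takes a reference, and one that accepts that reference whatever its lifetime. My Fn(T) pins T to one concrete type, so even with T inferred as &&str it can only promise to accept a reference with one particular lifetime 🤔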
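To see the argument type that filter hands to its predicate, here is a small sketch with a plain function instead of a closure. The names is_foo and keep_foo are hypothetical; the point is only that an iterator over &str passes &&str to the predicate.

```rust
// Hypothetical predicate: `filter` on an iterator of `&str` passes `&&str`.
fn is_foo(s: &&str) -> bool {
    // Auto-deref lets us call `str::starts_with` through both references.
    s.starts_with("foo")
}

// Hypothetical wrapper around the iterator expression from the post.
fn keep_foo(strings: Vec<&str>) -> Vec<&str> {
    strings.into_iter().filter(is_foo).collect()
}

fn main() {
    let filtered = keep_foo(vec!["foo", "foobar", "bar"]);
    assert_eq!(filtered, vec!["foo", "foobar"]);
}
```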

Second go

pub fn starts_with<T: AsRef<str>>(prefix: &str) -> impl Fn(&T) -> bool + '_ {
    move |s| s.as_ref().starts_with(prefix)
}

Note the changed return type: the function expects a reference.

This works! It compiles in the iterator expression. This is not the time for celebrations though because, weirdly, there’s a hitch when calling the returned function directly rather than handing it to filter. Passing in explicit references, i.e. &&str and &String, is okay:

starts_with("foo")(&"foobar");
starts_with("foo")(&"foobar".to_string());

But passing in a plain &str, i.e. starts_with("foo")("foobar"), is not okay:

error[E0277]: the size for values of type `str` cannot be known at compilation time
  --> src/strings.rs:11:5
   |
11 |     starts_with("foo")("foobar");
   |     ^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
   |
   = help: the trait `Sized` is not implemented for `str`
note: required by an implicit `Sized` bound in `starts_with`
  --> src/strings.rs:3:20
   |
3  | pub fn starts_with<T: AsRef<str>>(prefix: &str) -> impl Fn(&T) -> bool + '_ {
   |                    ^ required by the implicit `Sized` requirement on this type parameter in `starts_with`
help: consider relaxing the implicit `Sized` restriction
   |
3  | pub fn starts_with<T: AsRef<str> + ?Sized>(prefix: &str) -> impl Fn(&T) -> bool + '_ {
   |                                  ++++++++

Relaxing the Sized constraint works – thank you, rustc 🙏 – and gives us:

pub fn starts_with<T: AsRef<str> + ?Sized>(prefix: &str) -> impl Fn(&T) -> bool + '_ {
    move |s| s.as_ref().starts_with(prefix)
}

This does everything I want. I can celebrate! 🎉🥳
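Putting the pieces together, a small program along these lines exercises the ?Sized version in both places: inside the iterator expression and called directly. The assertions are my own sketch of the calls discussed above.

```rust
pub fn starts_with<T: AsRef<str> + ?Sized>(prefix: &str) -> impl Fn(&T) -> bool + '_ {
    move |s| s.as_ref().starts_with(prefix)
}

fn main() {
    // In the iterator expression, T is inferred as `&str`.
    let strings = vec!["foo", "foobar", "bar"];
    let filtered: Vec<_> = strings.into_iter().filter(starts_with("foo")).collect();
    assert_eq!(filtered, vec!["foo", "foobar"]);

    // Called directly: T = str (this is the call that needed `?Sized`).
    assert!(starts_with("foo")("foobar"));
    // T = &str and T = String, as before.
    assert!(starts_with("foo")(&"foobar"));
    assert!(starts_with("foo")(&"foobar".to_string()));
    assert!(!starts_with("foo")("bar"));
}
```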

Yet something still bothers me: why did I need to do that + ?Sized thing?

That implicit Sized constraint

The Rust Programming Language says, in Dynamically Sized Types and the Sized Trait:

Let’s dig into the details of a dynamically sized type called str, which we’ve been using throughout the book. That’s right, not &str, but str on its own, is a DST. We can’t know how long the string is until runtime, meaning we can’t create a variable of type str, nor can we take an argument of type str.

and later on:

By default, generic functions will work only on types that have a known size at compile time. However, you can use the following special syntax to relax this restriction:

fn generic<T: ?Sized>(t: &T) {
    // --snip--
}

A trait bound on ?Sized means “T may or may not be Sized” and this notation overrides the default that generic types must have a known size at compile time. The ?Trait syntax with this meaning is only available for Sized, not any other traits.

Also note that we switched the type of the t parameter from T to &T. Because the type might not be Sized, we need to use it behind some kind of pointer. In this case, we’ve chosen a reference.

(Emphasis is mine.)
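The effect of the ?Sized bound described in that passage can be sketched with two otherwise identical functions. Both names are hypothetical, and std::mem::size_of_val is used only because it makes the run-time size visible.

```rust
use std::mem::size_of_val;

// Hypothetical helpers: identical bodies, differing only in the bound on T.
fn bytes_sized<T>(t: &T) -> usize {
    size_of_val(t)
}

fn bytes_maybe_unsized<T: ?Sized>(t: &T) -> usize {
    size_of_val(t)
}

fn main() {
    let n: u32 = 7;
    assert_eq!(bytes_sized(&n), 4);
    assert_eq!(bytes_maybe_unsized(&n), 4);

    // Here T = str: the size comes from the length carried by the reference.
    assert_eq!(bytes_maybe_unsized("foobar"), 6);

    // bytes_sized("foobar"); // error[E0277]: T = str is not `Sized`
}
```

Uncommenting the last call should reproduce the same E0277 error shown earlier, since T is inferred as str there too.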

Understanding?

I feel like I understand the reasoning now, and I could apply it again in another situation, but my underlying mental model remains hazy. These are the kinds of unresolved thoughts going round my head:

  • I’ve read that &str is a fat pointer: a pointer to the data and the length of the string slice. Does this mean that Box<str> is also a pointer plus a size?

  • Why use &str instead of Box<str>, or vice-versa? I assume it has something to do with Box always referring to heap allocations, whereas &str can point to heap, stack, compiled-in strings, etc.

  • The compiler knows the size of literal strings at compile time, so why are they &str?

  • Is &str/str treated specially by the compiler?
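One way to poke at the first of these questions is to ask the compiler how big the various pointers are. This is only a sketch, and pointer_words is a hypothetical helper, but it suggests that Box<str> carries a length alongside the address just as &str does.

```rust
use std::mem::size_of;

// Hypothetical helper: size of a pointer-like type P, in machine words.
fn pointer_words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

fn main() {
    // Thin pointers to sized types: one word.
    assert_eq!(pointer_words::<&u8>(), 1);
    assert_eq!(pointer_words::<&String>(), 1);
    assert_eq!(pointer_words::<Box<u8>>(), 1);

    // Pointers to `str`: two words (address + length), for `&` and `Box` alike.
    assert_eq!(pointer_words::<&str>(), 2);
    assert_eq!(pointer_words::<Box<str>>(), 2);

    // A `Box<str>` owns a heap copy; the literal is a `&'static str`.
    let boxed: Box<str> = "foobar".into();
    let literal: &'static str = "foobar";
    assert_eq!(&*boxed, literal);
    assert_eq!(boxed.len(), 6);
}
```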

And that is where I will leave this topic – for today.


[1] Yes, aesthetics. I want to read code easily, and something like .map(|(foo, bar)| { foo().unwrap_or(bar); ... }) is noisy and hard to parse (with my brain 🧠). Subjectively ugly too, but Rust has a smidge of airport prettiness to it anyway.