
Conversation

@RoDmitry (Collaborator) commented Nov 9, 2025

Fewer String clones -> more performance 😊
Also derive optimizations.

@RoDmitry (Collaborator, Author) commented Nov 9, 2025

@haydenhoang check it out. Does it look good?

@RoDmitry (Collaborator, Author) commented Nov 9, 2025

There is a problem in the OpenAPI generation which I have skipped:

pub struct UpdateNlSearchModelParams<'p> {
    /// The ID of the NL search model to update
    pub model_id: Cow<'p, str>,
    /// The NL search model fields to update
    pub body: models::NlSearchModelCreateSchema, // missing <'p>
    pub _phantom: PhantomData<&'p ()>,
}

Somehow the generator treats models::NlSearchModelCreateSchema as isPrimitiveType, maybe because of some mistake in openapi.yml.
And actually I don't even care about it. It's fixable by hand 😏

@haydenhoang (Contributor) commented:

Hey @RoDmitry, do you have any benchmark showing the performance gain from replacing String with Cow<'_, str>?

Doing that makes the code more complex and harder to maintain (lifetimes, phantom data), and I think users might prefer the ease of String if there is no significant performance gain.

@RoDmitry (Collaborator, Author) commented Nov 10, 2025

Benchmark of the clones? I don't think that's necessary; it's an obvious optimization. There will be a significant performance gain if you benchmark it. I don't want to, because I already know how slow memory allocations are.
The code is more complex, but not to the point that it becomes harder to maintain.

Users will not be bothered by the lifetimes much: it's either a local lifetime or 'static. The lifetimes might also enable other future optimizations, like borrowing slices.
A String can be wrapped into a Cow using Cow::from, or just with .into(), which replaces the .to_owned() in the current code, so a user will hardly notice the difference.
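For illustration, a call site looks roughly like this (the struct and field names are made up for the example):

use std::borrow::Cow;

// Hypothetical params struct, only to show the call-site ergonomics.
struct SearchParams<'p> {
    q: Option<Cow<'p, str>>,
}

fn main() {
    let dynamic_query = String::from("rust http client");
    // &'static str: borrowed, no allocation
    let _a = SearchParams { q: Some("static query".into()) };
    // String: moved into Cow::Owned, no clone
    let _b = SearchParams { q: Some(dynamic_query.into()) };
}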

Although I agree that Cow<'_, str> does not look as solid as String, especially to a user who is new to Rust.
In my codebase I clone big models::SearchParameters in loops, so it's very useful in my case.

@haydenhoang (Contributor) commented:

Wait, isn't the deserialized string owned anyway, since content does not outlive the returned model?

let content = resp.text().await?;
match content_type {
    ContentType::Json => serde_json::from_str(&content).map_err(Error::from),
    ContentType::Text => {
        return Err(Error::from(serde_json::Error::custom(
            "Received `text/plain` content type response that cannot be converted to `models::ApiKey`",
        )));
    }
    ContentType::Unsupported(unknown_type) => {
        return Err(Error::from(serde_json::Error::custom(format!(
            "Received `{unknown_type}` content type response that cannot be converted to `models::ApiKey`"
        ))));
    }
}

So Cow<'_, str> is only useful on the request/input side?

I don't think the performance benefit is worth it: we save nanoseconds by avoiding clones but add complexity to the code (especially the public API models), while a typical API request already has milliseconds of latency.

In my codebase I have clones of big models::SearchParameters in loops, so it's very useful in my case.

If the loop runs only a small number of times, which is typical, because each iteration makes an API call, the cost of cloning SearchParameters is negligible compared to network I/O. I think the optimization only matters when you clone the struct thousands of times in a tight loop, which really isn't a common pattern for an HTTP client.

@RoDmitry (Collaborator, Author) commented Nov 11, 2025

So Cow<'_, str> is only useful on the request/input side?

Yes.

a typical API request already has milliseconds of latency.

Yes, but the API request is async, while clones are synchronous and eat CPU time.

added complexity to the code (especially the public API models)

What complexity in the public API? Try it yourself, it doesn't change much. It's just an enum wrapping a borrowed/owned string. How is it hard to use? Cow has a good API for working with strings. Use .into_owned() and it will unwrap into a usual String.
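For example, a minimal sketch of working with a Cow value on the caller side:

use std::borrow::Cow;

fn main() {
    let name: Cow<'_, str> = Cow::from("collection_a"); // borrowed, no heap allocation
    let upper = name.to_uppercase(); // Deref to &str, so &str methods work as usual
    let owned: String = name.into_owned(); // unwrap into a usual String when needed
    println!("{upper} {owned}");
}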

the cost of cloning SearchParameters is negligible compared to network I/O.

Basically anything is negligible compared to network I/O. Does that mean all optimizations are useless?

the optimization only matters when you clone the struct thousands of times in a tight loop

I clone it hundreds of times in a tight loop. Does that mean I must struggle? Or just buy a better CPU? 😅

@RoDmitry (Collaborator, Author) commented Nov 11, 2025

Or maybe you want to separate the request models from the response models, since the response models contain only owned Strings? I haven't tried that, and I have no idea how to do it in mustache. Is it even possible? If you want to improve it, I'm open to that and can push your commits (because the repo is not mine).

Another idea is to generate a second model with all of the fields as owned types (Strings) and set that owned model as the response type in the API, but that would leave half of the models unused.

@RoDmitry (Collaborator, Author) commented Nov 12, 2025

I think the following logic might be possible: preprocess openapi.yml to mark the models used as input, and then, with a mark like {{#vendorExtensions.x-rust-is-used-as-input}}, conditionally generate the models.

@haydenhoang (Contributor) commented Nov 12, 2025

Optimizations are good as long as they outweigh the trade-offs. I found a post that explains this pretty well.

The trade-offs here are more code and less clarity for a performance gain that we haven't even measured yet. I don't want the public API models to carry unnecessary PhantomData and lifetimes; they are cleaner without them.

I think the following logic might be possible: preprocess openapi.yml to mark models used as input, then using that mark like {{#vendorExtensions.x-rust-is-used-as-input}} it would be possible to conditionally generate the models.

Well, this attempt at optimization really does make our codebase more complex.

Or maybe you want to separate the request models from the response models?

I think we should keep the type consistent across models.

It would be nice if you could open a smaller PR for the internal derive optimization first. This big PR is slowing us down; right now I'm working on writing the documentation for the derive.

RoDmitry marked this pull request as draft on November 12, 2025 at 21:43
@RoDmitry (Collaborator, Author) commented Nov 12, 2025

I know that PhantomData does not look good in the API input params. I will think about it a little more.

I found a post that explains this pretty well.

anything that isn't a trivially clear optimization should be avoided until it can be measured.

But it is a trivially clear optimization!

And premature optimization is the practice of trying to improve a program's performance before it's finished.
Do you think that the APIs we have now are not finished?

RoDmitry marked this pull request as ready for review on November 13, 2025 at 11:42
@RoDmitry (Collaborator, Author) commented Nov 13, 2025

I have removed all of the PhantomData and all unused lifetimes. @haydenhoang, does it look OK now?

Code generation leaves some extra lifetimes (not a lot), which I have removed manually. After the merge, I will open an issue for it; I currently don't have extra time to fix it.

RoDmitry force-pushed the optimize-cow branch 6 times, most recently from a22f31a to c272f23, on November 13, 2025 at 12:08
RoDmitry force-pushed the optimize-cow branch 3 times, most recently from 5bf1a13 to 12af80b, on November 13, 2025 at 12:23
@haydenhoang (Contributor) commented:

Will look into this later 👍

@haydenhoang (Contributor) commented Nov 16, 2025

Ok, it looks much cleaner.

I see that we are using Cow only for request models and String for response models.

I just ran a benchmark on this, and the SearchParameters that uses Cow is ~20 times faster to clone than the String version. Here is the code if you are interested:

use criterion::{Criterion, black_box, criterion_group, criterion_main};
use std::borrow::Cow;

#[derive(Clone, Debug)]
pub struct SearchParametersCow<'a> {
    pub q: Option<Cow<'a, str>>,
    pub query_by: Option<Cow<'a, str>>,
    pub prefix: Option<Cow<'a, str>>,
    pub filter_by: Option<Cow<'a, str>>,
    pub sort_by: Option<Cow<'a, str>>,
    pub facet_by: Option<Cow<'a, str>>,
    pub include_fields: Option<Cow<'a, str>>,
    pub exclude_fields: Option<Cow<'a, str>>,
    pub pinned_hits: Option<Cow<'a, str>>,
    pub hidden_hits: Option<Cow<'a, str>>,
    pub override_tags: Option<Cow<'a, str>>,
    pub highlight_fields: Option<Cow<'a, str>>,
    pub preset: Option<Cow<'a, str>>,
}

#[derive(Clone, Debug)]
pub struct SearchParametersString {
    pub q: Option<String>,
    pub query_by: Option<String>,
    pub prefix: Option<String>,
    pub filter_by: Option<String>,
    pub sort_by: Option<String>,
    pub facet_by: Option<String>,
    pub include_fields: Option<String>,
    pub exclude_fields: Option<String>,
    pub pinned_hits: Option<String>,
    pub hidden_hits: Option<String>,
    pub override_tags: Option<String>,
    pub highlight_fields: Option<String>,
    pub preset: Option<String>,
}

fn make_cow() -> SearchParametersCow<'static> {
    SearchParametersCow {
        q: Some(Cow::from("search query")),
        query_by: Some(Cow::from("field1,field2,field3")),
        prefix: Some(Cow::from("pre")),
        filter_by: Some(Cow::from("filter:val")),
        sort_by: Some(Cow::from("score:desc")),
        facet_by: Some(Cow::from("category")),
        include_fields: Some(Cow::from("id,name")),
        exclude_fields: Some(Cow::from("debug_info")),
        pinned_hits: Some(Cow::from("123:1,456:2")),
        hidden_hits: Some(Cow::from("789,101")),
        override_tags: Some(Cow::from("tag1,tag2")),
        highlight_fields: Some(Cow::from("description")),
        preset: Some(Cow::from("default")),
    }
}

fn make_string() -> SearchParametersString {
    SearchParametersString {
        q: Some("search query".to_owned()),
        query_by: Some("field1,field2,field3".to_owned()),
        prefix: Some("pre".to_owned()),
        filter_by: Some("filter:val".to_owned()),
        sort_by: Some("score:desc".to_owned()),
        facet_by: Some("category".to_owned()),
        include_fields: Some("id,name".to_owned()),
        exclude_fields: Some("debug_info".to_owned()),
        pinned_hits: Some("123:1,456:2".to_owned()),
        hidden_hits: Some("789,101".to_owned()),
        override_tags: Some("tag1,tag2".to_owned()),
        highlight_fields: Some("description".to_owned()),
        preset: Some("default".to_owned()),
    }
}

fn bench_clone(c: &mut Criterion) {
    let cow = make_cow();
    let string = make_string();

    c.bench_function("clone Cow<'_, str>", |b| b.iter(|| black_box(cow.clone())));

    c.bench_function("clone String", |b| b.iter(|| black_box(string.clone())));
}

criterion_group!(benches, bench_clone);
criterion_main!(benches);

Result

clone Cow<'_, str>      time:   [20.127 ns 20.241 ns 20.391 ns]

clone String            time:   [409.90 ns 410.61 ns 411.35 ns]

I clone it hundreds of times in a tight loop.

Could you explain a bit about your use case and why you are cloning it hundreds of times?

And are there any other use cases where the user needs to clone input models other than SearchParameters or MultisearchParameters multiple times? I couldn't think of any.

My thought is to only use Cow for SearchParameters and MultisearchParameters and leave the rest using String. This would make our preprocessing script much simpler and also avoid the issue you mentioned: "Code generation leaves some extra lifetimes (not a lot)".

@RoDmitry (Collaborator, Author) commented Nov 16, 2025

My thought is to only use Cow for SearchParameters and MultisearchParameters and leave the rest using String. This would make our preprocessing script much simpler

No, we already have the preprocessing script. Let the other structs also get the better speed; I think it's a good thing and doesn't hurt anybody.

and also avoid this issue that you mentioned "Code generation leaves some extra lifetimes (not a lot)"

It's fixable, I just don't have time for it. The script just has to check recursively whether the inner types contain at least one string type. It's not a hard script to write.

Could you explain a bit on your use case and why are you cloning it hundred of times?

It's a script for collecting facet counts, where each field is excluded in a loop. It gives me the correct counts; the built-in facet counts gave me different numbers, which did not suit me. Maybe I will open an issue in Typesense, I'm not sure, maybe somebody has already mentioned it. I wrote that script a couple of years ago and have not checked since.

@RoDmitry (Collaborator, Author) commented Nov 16, 2025

And are there any other use cases where the user needs to clone input models

I guess you don't understand that calling "sometext".to_owned() makes an actual heap allocation, and that this optimization is not only about cloning an entire struct. The optimization applies wherever you create an owned String that you don't actually modify. If it's not being modified, why allocate it on the heap? It's more optimal to use &str where you don't modify it, but that creates a lifetime dependency on the source String, so we use Cow to allow it to be owned when needed.

And no heap allocation happens when you use Cow::from("sometext"). Heap allocation is exactly what you have benchmarked above, so the difference between "sometext".to_owned() and Cow::from("sometext") should be around the same.
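A tiny sketch of that difference (my own illustration):

use std::borrow::Cow;

fn main() {
    // Allocates on the heap and copies the bytes.
    let owned: String = "sometext".to_owned();
    // Just wraps the &'static str; no heap allocation happens.
    let cow: Cow<'static, str> = Cow::from("sometext");
    assert!(matches!(cow, Cow::Borrowed(_)));
    drop(owned);
}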

@haydenhoang (Contributor) commented Nov 17, 2025

and that this optimization is not only about cloning an entire struct.

Ah yes, what I meant is that the optimization would only be significant if the struct (or its fields) is cloned multiple times, because those nanoseconds add up.

I have changed my mind. If we are gonna make SearchParameters use Cow then we should make the other input models use it too for consistency.

It's fixable, I just don't have time for it. The script just has to check recursively whether the inner types contain at least one string type. It's not a hard script to write.

I think we should fix this issue first before merging.

@RoDmitry (Collaborator, Author) commented Nov 17, 2025

I have changed my mind. If we are gonna make SearchParameters use Cow then we should make the other input models use it too for consistency.

That's why I initially made every struct use Cow instead of String, for consistency. Because if we later add a method which uses a struct previously used only as output, then there will be a breaking change inside that struct (changing the field types from String to Cow). So it would be more consistent to also have Cow in the output structs.

@haydenhoang (Contributor) commented Nov 18, 2025

Hmm, this is getting a bit complicated.

I think using String for response models is the more predictable approach for most Rust users, since it aligns with what existing HTTP libraries typically do.

The best approach might be to use Cow selectively for models like SearchParameters or GetCollectionsParameters, etc., since these are the models for URL params, which are input-only. These are likely to hold static configuration, which benefits from using Cow.
Besides, request-body input models probably already have their strings owned (they might come from an HTTP request or be dynamically constructed), so they can just take ownership.
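Roughly what I mean (struct and field names here are just for illustration, not the generated code):

use std::borrow::Cow;

// Hypothetical URL-params model, to illustrate static configuration with Cow.
#[derive(Clone)]
struct GetCollectionsParams<'p> {
    exclude_fields: Option<Cow<'p, str>>,
    limit: Option<i32>,
}

fn main() {
    // Built once from static configuration; cloning it later never allocates.
    let params = GetCollectionsParams {
        exclude_fields: Some(Cow::Borrowed("fields,default_sorting_field")),
        limit: Some(50),
    };
    for _page in 0..3 {
        let _per_request = params.clone(); // cheap: Cow::Borrowed is just copied
    }
}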

It's a script for collecting facet counts, where each field is excluded in a loop. It gives me the correct counts; the built-in facet counts gave me different numbers, which did not suit me. Maybe I will open an issue in Typesense, I'm not sure, maybe somebody has already mentioned it. I wrote that script a couple of years ago and have not checked since.

Is it this issue? It looks like recent releases may have resolved it.

I think using String is the most intuitive approach, since that's what most Rust HTTP libraries already use and what users generally expect. Cow could provide some performance wins, but the simplicity and predictability of String are probably more valuable here. To keep things simple, I'd suggest we close this PR for now.

Thanks for the discussion and I really appreciate the work you put into this @RoDmitry!

@RoDmitry (Collaborator, Author) commented Nov 18, 2025

Is it this issue?

Not that issue. That's a very old one. Overall facet counts are not suitable for me.

users generally expect

Personally, I would expect to see &str in input params, but Cow is a more advanced approach. Requiring an owned String (where no mutation happens) is definitely bad API design.

I'd suggest we close this PR for now.

That's a very strange approach. Do you propose to avoid optimizations even after benchmarking and measuring 20x better performance? And you've decided to ignore my case with cloning in a loop, even after I fixed everything you were against. I've heard you; good luck then.

RoDmitry requested a review from morenol on November 18, 2025 at 23:37
@haydenhoang (Contributor) commented:

Because if we later add a method which uses a struct previously used only as output, then there will be a breaking change inside that struct (changing the field types from String to Cow).

I just checked the openapi.yml file and found that this is very unlikely to happen. Input schemas always have *Upsert, *Update, or *Create suffixes. The API spec file is written so that the input and output for an endpoint always use different schemas (duplicate schemas are renamed).

Hi @morenol. Would love to hear your thoughts on using Cow<'_, str> for input models.

@RoDmitry (Collaborator, Author) commented Nov 19, 2025

There is absolutely no difference for a user between "sometext".to_owned() and "sometext".into()! This is such a stupid conversation.
You can also define your own extension trait to get "sometext".into_cow():

use std::borrow::Cow;

trait IntoCow<'a> {
    fn into_cow(self) -> Cow<'a, str>;
}

impl<'a> IntoCow<'a> for &'a str {
    fn into_cow(self) -> Cow<'a, str> {
        Cow::Borrowed(self)
    }
}

impl<'a> IntoCow<'a> for String {
    fn into_cow(self) -> Cow<'a, str> {
        Cow::Owned(self)
    }
}
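
Usage would then look like this (continuing the snippet above):

fn main() {
    let borrowed: Cow<'_, str> = "sometext".into_cow(); // Cow::Borrowed, no allocation
    let owned: Cow<'static, str> = String::from("dynamic").into_cow(); // Cow::Owned, takes ownership
    assert!(matches!(borrowed, Cow::Borrowed(_)));
    assert!(matches!(owned, Cow::Owned(_)));
}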

@morenol can you approve?
