-
Notifications
You must be signed in to change notification settings - Fork 15
Optimize: replace String with Cow<'_, str>
#60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@haydenhoang check it out. Does it look good? |
|
There is a problem which I have skipped, in OpenApi generation: Somehow it thinks that |
|
Hey @RoDmitry do you have any benchmark showing performance gain from replacing Doing that make the code more complex and harder to maintain (lifetimes, phantom data), and I think the user might prefer the easy use of |
|
Benchmark of the clones? I don't think that's necessary. That's an obvious optimization. There will be significant performance gain, if you benchmark it. I don't want to, because I understand how slow memory allocations are. User will not be bothered by the lifetimes much. It's either local lifetime or Although I agree that |
|
Wait, isn't the deserialized string owned anyway? Since the So I don't think the performance benefit is worth it. Saving nanoseconds by avoiding clones but added complexity to the code (especially the public API models) while a typical API request already has milliseconds of latency.
If the loop runs only a small number of times, which is typical, because each iteration makes an API call, the cost of cloning |
Yes.
Yes, but the API request is async, while clones are sync, and eat cpu time.
What complexity to the public API? Try it yourself, it doesn't change anything much. It's just an enum wrapper of borrowed/owned string. How is it hard to use?
Basically anything is negligible compared to network I/O. Does it mean that any optimizations are useless?
I clone it hundreds of times in a tight loop. Does it mean that I must struggle? Or just buy a better cpu? 😅 |
|
Or maybe you want to separate the request models from the response models? Because response models contain only owned Or another idea is to create the second model, but with all of the fields as owned types ( |
|
I think the following logic might be possible: preprocess openapi.yml to mark models used as input, then using that mark like |
b34654a to
b296ceb
Compare
b296ceb to
e236905
Compare
|
Optimizations are good as long as it overweigh the trade-offs. I found a post that explain this pretty well. The trade-offs here are adding more code/making the code less clear for some performance gain that we haven't even measured yet. I don't like the public API models to carry unnecessary phantomData and lifetimes because it's more clean that way.
Well, this attempt at optimization really does make our codebase more complex.
I think we should keep the type consistent across models. Would be nice if you can open a smaller PR for the internal derive optimization first. This big PR are slowing us down, right now I'm working on writing the documentation for the derive. |
a7d20be to
a47d6c6
Compare
bc3e2c6 to
bd3bfc9
Compare
|
I know that
But it is a trivially clear optimization! And Premature optimization is the practice of trying to improve a program's performance before it's finished. |
bd3bfc9 to
6ea7c90
Compare
6ea7c90 to
c195860
Compare
fc86156 to
351f164
Compare
351f164 to
2728a6c
Compare
|
I have removed all of the Code generation leaves some extra lifetimes (not a lot), which I have removed manually. After the merge, I will open an issue for it. Currently don't have extra time to fix it. |
a22f31a to
c272f23
Compare
5bf1a13 to
12af80b
Compare
|
Will look into this later 👍 |
12af80b to
563e202
Compare
|
Ok, it looks much cleaner. I see that we are using I just ran some benchmark on this and the use criterion::{Criterion, black_box, criterion_group, criterion_main};
use std::borrow::Cow;
#[derive(Clone, Debug)]
pub struct SearchParametersCow<'a> {
pub q: Option<Cow<'a, str>>,
pub query_by: Option<Cow<'a, str>>,
pub prefix: Option<Cow<'a, str>>,
pub filter_by: Option<Cow<'a, str>>,
pub sort_by: Option<Cow<'a, str>>,
pub facet_by: Option<Cow<'a, str>>,
pub include_fields: Option<Cow<'a, str>>,
pub exclude_fields: Option<Cow<'a, str>>,
pub pinned_hits: Option<Cow<'a, str>>,
pub hidden_hits: Option<Cow<'a, str>>,
pub override_tags: Option<Cow<'a, str>>,
pub highlight_fields: Option<Cow<'a, str>>,
pub preset: Option<Cow<'a, str>>,
}
#[derive(Clone, Debug)]
pub struct SearchParametersString {
pub q: Option<String>,
pub query_by: Option<String>,
pub prefix: Option<String>,
pub filter_by: Option<String>,
pub sort_by: Option<String>,
pub facet_by: Option<String>,
pub include_fields: Option<String>,
pub exclude_fields: Option<String>,
pub pinned_hits: Option<String>,
pub hidden_hits: Option<String>,
pub override_tags: Option<String>,
pub highlight_fields: Option<String>,
pub preset: Option<String>,
}
fn make_cow() -> SearchParametersCow<'static> {
SearchParametersCow {
q: Some(Cow::from("search query")),
query_by: Some(Cow::from("field1,field2,field3")),
prefix: Some(Cow::from("pre")),
filter_by: Some(Cow::from("filter:val")),
sort_by: Some(Cow::from("score:desc")),
facet_by: Some(Cow::from("category")),
include_fields: Some(Cow::from("id,name")),
exclude_fields: Some(Cow::from("debug_info")),
pinned_hits: Some(Cow::from("123:1,456:2")),
hidden_hits: Some(Cow::from("789,101")),
override_tags: Some(Cow::from("tag1,tag2")),
highlight_fields: Some(Cow::from("description")),
preset: Some(Cow::from("default")),
}
}
fn make_string() -> SearchParametersString {
SearchParametersString {
q: Some("search query".to_owned()),
query_by: Some("field1,field2,field3".to_owned()),
prefix: Some("pre".to_owned()),
filter_by: Some("filter:val".to_owned()),
sort_by: Some("score:desc".to_owned()),
facet_by: Some("category".to_owned()),
include_fields: Some("id,name".to_owned()),
exclude_fields: Some("debug_info".to_owned()),
pinned_hits: Some("123:1,456:2".to_owned()),
hidden_hits: Some("789,101".to_owned()),
override_tags: Some("tag1,tag2".to_owned()),
highlight_fields: Some("description".to_owned()),
preset: Some("default".to_owned()),
}
}
fn bench_clone(c: &mut Criterion) {
let cow = make_cow();
let string = make_string();
c.bench_function("clone Cow<'_, str>", |b| b.iter(|| black_box(cow.clone())));
c.bench_function("clone String", |b| b.iter(|| black_box(string.clone())));
}
criterion_group!(benches, bench_clone);
criterion_main!(benches);Result
Could you explain a bit on your use case and why are you cloning it hundred of times? And are there any other use cases where the user needs to clone input models other than My thought is to only use |
No, we already have the preprocessing script. Let the other structs also have better speeds. I think it's a good thing, don't hurt anybody.
It's fixable, I just don't have time for it. It just have to check recursively that the inner types have at least one string type. It's not a hard script to write.
It's a script for collecting facet counts, where an each field is excluded in loop. It gives me the correct counts. Internal facet counts gave me different numbers, which did not suit me. Maybe I will open an issue in Typesense. I'm not sure, maybe anybody already has mentioned it. I wrote that script a couple years ago, and have not checked since. |
I guess you don't understand that calling And no heap allocation happens when you use |
Ah yes, what I meant is the optimization would only be significant if the struct (or its fields) is cloned multiple times because those nanoseconds add up. I have changed my mind. If we are gonna make
I think we should fix this issue first before merging. |
That's why I have initially made every struct to use |
|
Is it this issue? It looks like recent releases may have resolved it. I think using Thanks for the discussion and I really appreciate the work you put into this @RoDmitry! |
Not that issue. That's a very old one. Overall facet counts are not suitable for me.
Personally I would expect to see
That's a very strange approach. Do you propose to avoid optimizations, even after benchmarking and getting 20x better performance? And you've decided to ignore my case with cloning in a loop, even after I have fixed everything you was against. I've heard you, good luck then. |
I just checked the Hi @morenol. Would love to hear your thoughts on using |
|
There is absolute no difference for a user whether to use use std::borrow::Cow;
trait IntoCow<'a> {
fn into_cow(self) -> Cow<'a, str>;
}
impl<'a> IntoCow<'a> for &'a str {
fn into_cow(self) -> Cow<'a, str> {
Cow::Borrowed(self)
}
}
impl IntoCow<'_> for String {
fn into_cow(self) -> Cow<'_, str> {
Cow::Owned(self)
}
}@morenol can you approve? |
Less
Stringclones -> more performance 😊Also Derive optimizations.