Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select::parse fails due to borrow the css query #188

Open
latot opened this issue Jul 22, 2024 · 26 comments · May be fixed by #203
Open

Select::parse fails due to borrow the css query #188

latot opened this issue Jul 22, 2024 · 26 comments · May be fixed by #203
Labels
C-feature-request Category: feature request C-help-wanted Category: help wanted

Comments

@latot
Copy link

latot commented Jul 22, 2024

Hi, I has been using this library from a while, has been great.

Here a example:

fn a(document: &Html) -> Result<(), Box<dyn Error>> {
    let query = format!("div table thead tr:nth-child({}) th", 3);
    let selector = Selector::parse(&query)?;
    Ok(())
}

This function will fails, the reason is pretty simple, the function owns the query, but the error triggered by parse seems to have borrowed the string, so it can't return an error when the function owns the data.

The workaround right now is unwrap instead use error propagation.

There is some ways to handle this, one would be a parse with String which owns the data, other one is instead use a borrowed string for the error, convert &str to String and use that instead.

Thx!

@adamreichold
Copy link
Member

You can continue to use error propagation, you just need to downgrade the to a string, e.g.

let selector = Selector::parse(&query).map_err(|err| err.to_string())?;

but note that for selectors known at compile-time, .unwrap() is somewhat reasonable as failing to parse such a selector is a logic error instead of a runtime error. (For your example of a dynamically generated selector, one would have to consider how much leeway there is, i.e. if you really only interpolate an u8 into the nth-child part, then .unwrap() might still be reasonable as calling code should be unable to produce an invalid selector by choosing the value of the u8.)

@latot
Copy link
Author

latot commented Jul 23, 2024

Hi! thx for the answer.

Well, is not ideal change the error type for the error propagation, would be good be able to return the actual one.

unwrap, I think it would depends on the type of app you are constructing, if is a one that only you will use, is more reasonable to panic, just for a user end product we want to handle the error and do not to panic.

@adamreichold
Copy link
Member

adamreichold commented Jul 23, 2024

unwrap, I think it would depends on the type of app you are constructing, if is a one that only you will use, is more reasonable to panic, just for a user end product we want to handle the error and do not to panic.

This is not the distinction I am making: A CSS selector that is fixed when the program is built is not a runtime error but a construction error, i.e. a panic (that should never happen once the program is shipped) is entirely appropriate.

(Think of it this way, we could with sufficient effort create a proc macro which parses such a CSS selector during compilation and embeds some already validated form of it into the program so that no parsing at all needs to happen when it runs. Alas, just using .unwrap() is a convenient way to get almost the same result with very much less implementation effort.)

@latot
Copy link
Author

latot commented Jul 23, 2024

I agree, this is because there is at least two types of CSS selectors.

I one hand, the ones that can be checked at compile time, here unwrap is good, and the best solution is a macro as you says, this could be a nice feature :)

The next one is more complex css selectors, where the string is created by some algorithm, which is impossible to be checked by at compile time, can only be checked at runtime, there the error must be propagated, ideally the original not, not a string for a good error handling.

@LoZack19 LoZack19 added the C-help-wanted Category: help wanted label Aug 26, 2024
@LoZack19
Copy link
Contributor

LoZack19 commented Aug 26, 2024

The idea of a macro is indeed interesting, I will leave this issue open in order to remember it. @cfvescovo

@LoZack19 LoZack19 added the C-feature-request Category: feature request label Aug 26, 2024
@cfvescovo
Copy link
Member

Take a look at #53

@cfvescovo
Copy link
Member

Since we had more requests for a macro implementation, maybe it's now time to add it to scraper

@LoZack19
Copy link
Contributor

#53 is interesting because it explores the idea of a proc-macro. I think that, if it worked, it could be a cleaner solution than what we have right now. In that discussion, though, concerns had been raised about the necessity of implementing such a macro

@cfvescovo
Copy link
Member

#53 is interesting because it explores the idea of a proc-macro. I think that, if it worked, it could be a cleaner solution than what we have right now. In that discussion, though, concerns had been raised about the necessity of implementing such a macro

At the time, I chose to close #53 because another crate was actively being developed to address the issue. Now the repo of that crate is inactive.

I think we should provide an "official" implementation so that we can maintain and improve it over time. Moreover, it could be useful for beginners who don't want to roll out their own macro

@LoZack19
Copy link
Contributor

LoZack19 commented Aug 28, 2024

This claude-generated solution looks very clean and idiomatic to me, even much better than the one provided in the crate scraper-proc-macro

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, LitStr};

#[proc_macro]
pub fn selector(input: TokenStream) -> TokenStream {
    let selector = parse_macro_input!(input as LitStr).value();
    
    match scraper::Selector::parse(&selector) {
        Ok(_) => quote!(
            ::scraper::Selector::parse(#selector).unwrap()
        ).into(),
        Err(e) => syn::Error::new(
            proc_macro2::Span::call_site(),
            format!("Failed to parse CSS selector: {}", e)
        ).to_compile_error().into(),
    }
}

@LoZack19
Copy link
Contributor

LoZack19 commented Aug 28, 2024

If integrated, would you put it inside the selector module or in a dedicated macro module? It could also be integrated as an additional macro feature

@teymour-aldridge
Copy link
Collaborator

If integrated, would you put it inside the selector module or in a dedicated macro module? It could also be integrated as an additional macro feature

I imagine it would sit behind a feature flag in a module? The implementation would probably be in a separate crate.

@LoZack19
Copy link
Contributor

I was reading that a proc_macro needs to live necessarely in a separate crate, so yes, this is true, it would be quite invasive as an addition though, but yet useful

@LoZack19
Copy link
Contributor

In order to avoid cyclic crate dependencies it would be better to ship scraper_proc_macros separately. Otherwise scraper would have to depend on it, which is not possible. Correct me if I'm wrong.

@teymour-aldridge
Copy link
Collaborator

In order to avoid cyclic crate dependencies it would be better to ship scraper_proc_macros separately. Otherwise scraper would have to depend on it, which is not possible. Correct me if I'm wrong.

It is possible, it will just require scraper_proc_macros to be published before scraper when doing releases in future?

@adamreichold
Copy link
Member

This claude-generated solution looks very clean and idiomatic to me, even much better than the one provided in the crate scraper-proc-macro

I am not sure if I find that proc macro particularly helpful as while it does verify at compile time, it will just compile the selector again at runtime. If the dependency tree is sufficiently messy, it could even be that the verification at runtime uses a different version of the scraper crate and compilation at runtime still fails (admittedly a somewhat unlikely scenario).

I think what we should aim for is a proc macro that actually outputs a representation (maybe via serialization) of the already parsed selector so that work is not repeated at runtime.

I also think the span in the error case should tightened to indicate only the literal, not the whole macro invocation though.

@cfvescovo
Copy link
Member

The macro should generate a selector representation directly, without requiring parsing at run time.
Moreover, while it's true that the basic approach would require cyclic dependencies - which, as you now, are not allowed - the issue can be solved by writing three crates: selectors_core, selectors_macros (which will depend on core) and selectors, which will depend on both. Obviously, as this sounds a bit overcomplicated, feel free to suggest more straightforward approaches

@cfvescovo
Copy link
Member

cfvescovo commented Aug 28, 2024

A simpler solution would be writing our macros in a different crate and let our users depend on this new crate in order to use them. However, I would prefer exposing our new macros directly from scraper behind a feature flag

@LoZack19
Copy link
Contributor

LoZack19 commented Aug 28, 2024

The macro should generate a selector representation directly, without requiring parsing at run time.

I confirm that the macro provided by Claude executes the parser at runtime, but it then re-executes it at compile time, so the concern that @adamreichold is raising is real, even though I personally thing that such a scenario would be highly unlikely, and it would still be an improvement over not having the macro at all.

@LoZack19
Copy link
Contributor

LoZack19 commented Aug 28, 2024

The selectors_core, selectors_macros solution would eliminate the version coherence problem between the compile time and the runtime versions of Selector::parse(). Therefore I would prefer this solution over the one adopting indipendent crates.

@latot
Copy link
Author

latot commented Aug 28, 2024

I still does not know much about macros, but it should only accept strings, not variables as inputs, or will cause the error posted above (first post).

Maybe after this, create a issue for css selectors from variables.

@LoZack19
Copy link
Contributor

Unfortunately, since SelectorsList does not implement ToTokens, we cannot reuse the result computed by parse at complite time. We would need to submit a PR to the maintainers of the selectors crate, for this feature to be implemented. As of now, I would still like to have the proc macro which recomputes the selector at runtime, but if we choose not to do that, I would close this issue.

We could also probably implement the parse logic inside a normal macro, but that is beyong the scope of this crate.

@adamreichold
Copy link
Member

I think setting up the crate structure and using the runtime-based implementation would be fine until we can modify selectors to support an embeddable representation.

@teymour-aldridge
Copy link
Collaborator

To be honest I'm not 100% sure what benefit this macro confers? The syntax of CSS selectors is relatively straightforward and I think most people have to test their selectors anyway (to ensure they're picking thet right elements), something which this doesn't help.

Perhaps a builder might work better here?

@adamreichold
Copy link
Member

The syntax of CSS selectors is relatively straightforward and I think most people have to test their selectors anyway (to ensure they're picking thet right elements), something which this doesn't help.

I agree that most problems we see during actual usage are not due to incorrect selector syntax but rather selectors not working as intended against the target DOM. The feedback for invalid syntax from a compile-time error is slightly faster though.

The macro (in its final form, not the intermediate check-at-compile-time-then-parse-again-at-runtime variant) would additionally provide efficiency gains insofar selectors would not have to parsed at runtime at all.

@latot
Copy link
Author

latot commented Aug 29, 2024

There is some advantages.

  1. The actual Selector works internally with &str which blocks the error propagation, if a macro just uses a static &str would helps with that.
  2. We need to unwrap the selector to know is constructed well (without dynamic ones), this could be a lot simpler with a macro where at compile time we know it is right or not which is great.
  3. The macro could helps to split selectors in two, static ones and dynamic ones, well still if the selector uses String instead of &str for errors could be used one instead of two.

While more clear, better :3 at least is what I think.

@LoZack19 LoZack19 linked a pull request Aug 30, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: feature request C-help-wanted Category: help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants