Handle decoding of input in html5ever #590

Draft: wants to merge 2 commits into base: main

simonwuelker (Contributor) commented Mar 31, 2025

These changes are an attempt to allow users of html5ever to respect the encodings specified with <meta charset="..."> tags in a spec-compliant way.

The major change is that the input stream (https://html.spec.whatwg.org/#input-stream) now lives in the html5ever crates. As a result, the new API surface exposes a "pull" interface instead of the existing "push" interface.

The entry point to the new API is a DecodingParser, which wraps either an HTML or an XML parser.
After providing some amount of byte input to a DecodingParser, the user can call DecodingParser::parse, which returns an iterator over ParserActions. A parser action is either a <script> tag that needs to be executed or a new encoding that the document should be re-parsed with. The caller can drive the parser by repeatedly advancing this iterator.
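
To make the pull model concrete, here is a rough sketch of how a caller might drive such a parser. Everything in it is a hypothetical stand-in: the `ParserAction` variants, the `feed` method, the `parse` signature, and the `drive` helper are assumptions made for illustration, and the "parser" is a mock that only scans for two marker strings rather than tokenizing anything. It is not the API proposed in this PR.

```rust
use std::collections::VecDeque;

/// Hypothetical action type; the variant names and payloads are assumptions.
#[derive(Debug, PartialEq)]
enum ParserAction {
    /// A <script> element the caller should execute before continuing.
    RunScript(String),
    /// The document declared an encoding it should be re-parsed with.
    ChangeEncoding(String),
}

/// Mock stand-in for the DecodingParser described above.
struct DecodingParser {
    pending: VecDeque<ParserAction>,
}

impl DecodingParser {
    fn new() -> Self {
        DecodingParser { pending: VecDeque::new() }
    }

    /// Hypothetical input method: accepts raw bytes and queues any actions
    /// found. The mock only recognizes one <meta charset="..."> tag and one
    /// <script>...</script> pair per call.
    fn feed(&mut self, bytes: &[u8]) {
        const META: &str = "<meta charset=\"";
        const OPEN: &str = "<script>";
        const CLOSE: &str = "</script>";

        let owned = String::from_utf8_lossy(bytes);
        let text: &str = &owned;

        if let Some(start) = text.find(META) {
            let rest = &text[start + META.len()..];
            if let Some(end) = rest.find('"') {
                self.pending
                    .push_back(ParserAction::ChangeEncoding(rest[..end].to_string()));
            }
        }
        if let Some(start) = text.find(OPEN) {
            let rest = &text[start + OPEN.len()..];
            if let Some(end) = rest.find(CLOSE) {
                self.pending
                    .push_back(ParserAction::RunScript(rest[..end].to_string()));
            }
        }
    }

    /// Pull interface: the caller advances the returned iterator and reacts
    /// to each action, instead of the parser calling back into the caller.
    fn parse(&mut self) -> impl Iterator<Item = ParserAction> + '_ {
        self.pending.drain(..)
    }
}

/// Drives the mock parser and records what a caller would do for each action.
fn drive(input: &[u8]) -> Vec<String> {
    let mut parser = DecodingParser::new();
    parser.feed(input);

    let mut log = Vec::new();
    for action in parser.parse() {
        match action {
            ParserAction::RunScript(src) => log.push(format!("run script: {src}")),
            ParserAction::ChangeEncoding(enc) => log.push(format!("re-parse as {enc}")),
        }
    }
    log
}
```

The point of the sketch is the shape of the loop in `drive`: the caller owns control flow and decides what to do with each action, which is what distinguishes a pull interface from the existing push interface.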

The old API is fully preserved, without breaking changes (that I'm aware of).

This is a draft because the design is not final and this needs a companion servo PR to verify the correctness of these changes. Initial feedback is welcome.

Depends on #591.

… TreeSink

This is the same approach used by html5ever, which will hopefully allow
unifying their API a bit as part of the effort for encoding support.

This is a breaking change. The TreeSink::complete_script method is
removed.

Signed-off-by: Simon Wülker <[email protected]>