
Fix charset encoding of framed documents #51

@Treora

Description


Like issue #29, but for subdocuments inside frames. As remarked here:

        get blob() { return new Blob([this.string], { type: 'text/html' }) },
        get string() {
            // TODO Add <meta charset> if absent? Or html-encode characters as needed?
            return documentOuterHTML(clonedDoc)
        },

The same applies to crawl-subresources for frames whose inner document we cannot access directly.

It seems new Blob() always UTF-8-encodes the strings it is given (MDN). So I suppose we should either add <meta charset="utf-8"> to the DOM before running documentOuterHTML, or change the blob's MIME type to text/html;charset=utf-8. The latter is something we could not do for the top-level document; might it be ‘cleaner’?
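A minimal sketch of both options, under the assumption that the frame's document is still a DOM we can modify before calling documentOuterHTML. The names setMetaCharsetUtf8 and htmlBlobUtf8 are hypothetical, not existing helpers in this codebase. The first lines only illustrate the failure mode: UTF-8 bytes read back as windows-1252.

```javascript
// Sketch only; setMetaCharsetUtf8 and htmlBlobUtf8 are hypothetical names.

// The problem: new Blob() stores strings as UTF-8 bytes. If those bytes are
// later decoded with a legacy encoding (e.g. because the page still carries
// its original <meta charset="windows-1252">), characters get garbled.
const utf8Bytes = new TextEncoder().encode('é') // two bytes: 0xC3 0xA9
const misread = new TextDecoder('windows-1252').decode(utf8Bytes) // 'Ã©'

// Option 1: declare UTF-8 inside the document itself, before serializing it
// with documentOuterHTML. This survives even if the MIME type is lost.
function setMetaCharsetUtf8(doc) {
  // Existing declarations describe the original bytes, not the UTF-8 output,
  // so remove them instead of leaving two conflicting ones.
  const declarations = doc.querySelectorAll(
    'meta[charset], meta[http-equiv="content-type" i]'
  )
  for (const el of declarations) el.remove()

  const meta = doc.createElement('meta')
  meta.setAttribute('charset', 'utf-8')
  // Put it first: browsers only scan the first 1024 bytes for it.
  const head = doc.head || doc.documentElement
  head.insertBefore(meta, head.firstChild)
}

// Option 2: leave the document alone and state the encoding in the MIME type.
// This only helps as long as the type travels along with the bytes.
function htmlBlobUtf8(html) {
  return new Blob([html], { type: 'text/html;charset=utf-8' })
}
```

For example, htmlBlobUtf8('<p>é</p>').size is 9, because é takes two bytes in UTF-8. Under the HTML encoding-sniffing rules, a charset given in the MIME type should take precedence over any <meta> declaration, so option 2 would also override a stale <meta charset> left in the page. The two options are not mutually exclusive; applying both would keep the output correct even where the MIME type gets dropped.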

Problem observed in the wild.

Metadata


Labels: snapshot quality (Improving fidelity/size/durability/etc of the output)
