| 1 | +# Proposal: `downloads.getFileHash()` |
| 2 | + |
| 3 | +**Summary** |
| 4 | + |
| 5 | +Allow extensions to programmatically query hashes of the contents of |
| 6 | +downloaded files. |
| 7 | + |
| 8 | +**Document Metadata** |
| 9 | + |
| 10 | +**Author:** @bershanskyi |
| 11 | + |
| 12 | +**Sponsoring Browser:** Chromium |
| 13 | + |
| 14 | +**Contributors:** @Rob--W |
| 15 | + |
| 16 | +**Created:** 2024-04-08 |
| 17 | + |
| 18 | +**Related Issues:** [WECG GitHub issue 401](https://github.com/w3c/webextensions/issues/401) |
| 19 | + |
| 20 | +## Motivation |
| 21 | + |
| 22 | +### Objective |
| 23 | + |
| 24 | +This API will enable extensions to privately and efficiently scan user |
| 25 | +downloads for "Known Unsafe" contents and validate the integrity of "Known Safe" |
| 26 | +files without exposing the raw content of the files. |
| 27 | + |
| 28 | +#### Use Cases |
| 29 | + |
| 30 | +##### Filtering for unwanted downloads |
| 31 | + |
| 32 | +Many antivirus products detect previously seen "safe" and "unsafe" files by |
| 33 | +computing a file content hash and checking it against a set of known hashes. |
| 34 | +These checks require minimal effort on the client side, yet are powerful |
| 35 | +enough to limit the spread of known malware downloads since effectively every |
| 36 | +client gets "inoculated" against a particular download as soon as a single |
| 37 | +antivirus detects a malicious file. |
| 38 | +User Agents may store file hashes even after the original file is deleted to |
| 39 | +facilitate tracing of malware exposure when a file is deemed dangerous only |
| 40 | +after the user has downloaded it. |
| 41 | + |
| 42 | +##### Download integrity assurance |
| 43 | + |
| 44 | +Files can get damaged while they are shared on a file hosting server or NAS. |
| 45 | +While some file formats explicitly support integrity checks like embedded |
| 46 | +checksums or even cryptographic signatures, other files do not. This API |
| 47 | +would allow file managers or extensions to programmatically check the |
| 48 | +integrity of downloaded files if the file source publishes expected hashes. |
| 49 | + |
| 50 | +### Known Consumers |
| 51 | + |
| 52 | +None at the moment. |
| 53 | + |
| 54 | +## Specification |
| 55 | + |
| 56 | +### Schema |
| 57 | + |
| 58 | + |
| 59 | +``` idl |
| 60 | +namespace downloads { |
| 61 | + callback GetFileHashCallback = void(DOMString hash); |
| 62 | + |
| 63 | + [supportsPromises, permissions="downloads.getFileHash"] |
| 64 | + static void getFileHash( |
| 65 | + long downloadId, |
| 66 | + [legalValues=("SHA-256")] DOMString algorithm, |
| 67 | + GetFileHashCallback callback); |
| 68 | +} |
| 69 | +``` |
| 70 | + |
| 71 | +``` idl |
| 72 | +namespace downloads { |
| 73 | + enum FILE_HASH_ALGO { |
| 74 | + "SHA-256" |
| 75 | + } |
| 76 | + |
| 77 | + getFileHash: (downloadId: number, algorithm?: downloads.FILE_HASH_ALGO) |
| 78 | + => Promise<string>; |
| 79 | +} |
| 80 | +``` |
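
A minimal sketch of how an extension might consume the proposed method, assuming the promise-based signature from the schema above. `DownloadsHashApi`, `classifyHash`, `classifyDownload`, and the two hash sets are hypothetical names introduced only for illustration; they are not part of the proposal or of any shipped browser API.

```typescript
// Hypothetical shape of the proposed method; not a shipped browser API.
interface DownloadsHashApi {
  getFileHash(downloadId: number, algorithm: "SHA-256"): Promise<string>;
}

type Verdict = "known-unsafe" | "known-safe" | "unknown";

// Pure lookup: classify a lower-case hex hash against two local sets.
function classifyHash(
  hash: string,
  knownUnsafe: Set<string>,
  knownSafe: Set<string>,
): Verdict {
  if (knownUnsafe.has(hash)) return "known-unsafe";
  if (knownSafe.has(hash)) return "known-safe";
  return "unknown";
}

// Query the hash of a completed download and classify it.
// A rejected promise (for example, hash not available) propagates to the caller.
async function classifyDownload(
  api: DownloadsHashApi,
  downloadId: number,
  knownUnsafe: Set<string>,
  knownSafe: Set<string>,
): Promise<Verdict> {
  const hash = await api.getFileHash(downloadId, "SHA-256");
  return classifyHash(hash, knownUnsafe, knownSafe);
}
```

Keeping the lookup separate from the API call means the classification logic can be exercised without a browser; in an extension, `api` would presumably be the `downloads` namespace once the method exists.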
| 81 | + |
| 82 | +### New Permissions |
| 83 | + |
| 84 | +| Permission Added | Suggested Warning | |
| 85 | +| ------------------------- | ----------------- | |
| 86 | +| `"downloads.getFileHash"` | No warning | |
| 87 | + |
| 88 | +The new `"downloads.getFileHash"` permission is optional and is honored only if |
| 89 | +the extension also has the `"downloads"` permission and host access for |
| 90 | +`DownloadItem.finalUrl`. |
| 91 | + |
| 92 | +### Manifest File Changes |
| 93 | + |
| 94 | +No changes besides the new optional permission `"downloads.getFileHash"`. |
| 95 | + |
| 96 | +## Security and Privacy |
| 97 | + |
| 98 | +### Exposed Sensitive Data |
| 99 | + |
| 100 | +Cryptographic hashes are considered one-way functions, but the original input |
| 101 | +can be recovered by brute force when the number of possible inputs is |
| 102 | +sufficiently small. An extension may therefore be able to infer the raw contents |
| 103 | +of the file if the space of possible file contents is small. |
| 104 | +A malicious extension could: |
| 105 | + - Observe hashes of downloaded files and send them to a home server |
| 106 | + - Initiate a download with an arbitrary URL and obtain a hash of the response |
| 107 | + - Initiate a download for a ranged HTTP request and obtain a hash of the |
| 108 | + corresponding portion of the response |
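
To illustrate why a small input space defeats the one-way property, here is a rough sketch of an exhaustive search. It assumes a hypothetical file whose entire content is a four-digit code; `recoverInput`, `fourDigitPins`, and the `hashFn` parameter (a stand-in for whatever digest function the attacker uses, such as SHA-256) are illustrative names, not part of the proposal.

```typescript
// Enumerate every possible content of the hypothetical file: "0000" to "9999".
function fourDigitPins(): string[] {
  const pins: string[] = [];
  for (let i = 0; i < 10000; i++) {
    pins.push(("000" + i).slice(-4)); // left-pad to four digits
  }
  return pins;
}

// Hash each candidate and return the first one whose digest matches the target.
// Returns undefined when no candidate matches.
function recoverInput(
  targetHash: string,
  candidates: string[],
  hashFn: (input: string) => string,
): string | undefined {
  for (const candidate of candidates) {
    if (hashFn(candidate) === targetHash) return candidate;
  }
  return undefined;
}
```

With only 10,000 candidates the search finishes almost instantly regardless of the hash algorithm, which is the situation the paragraph above warns about.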
| 109 | + |
| 110 | +### Abuse Mitigations |
| 111 | + |
| 112 | +The API method will require a silent permission and host access to the |
| 113 | +`DownloadItem.finalUrl`. |
| 114 | + |
| 115 | +### Additional Security Considerations |
| 116 | + |
| 117 | +User Agents must not provide obsolete hash algorithms (like MD5 and SHA-1). |
| 118 | +User Agents must expose only hashes of complete files and never provide |
| 119 | +partial hash results while the file is being downloaded. This is necessary to |
| 120 | +prevent malicious or compromised extensions from calculating incremental hashes |
| 121 | +and constructing a "hash oracle" to infer file contents one byte (or just a few |
| 122 | +bytes) at a time. |
| 123 | + |
| 124 | +## Alternatives |
| 125 | + |
| 126 | +### Existing Workarounds |
| 127 | + |
| 128 | +As of writing, extensions can calculate the digest of a download by |
| 129 | +implementing a native host component that reads the desired file from the |
| 130 | +disk directly. This approach is a significant elevation of privileges (requires |
| 131 | +an unsandboxed application), greatly increases the complexity of implementing |
| 132 | +such an extension (since it has to listen for download completion, notify |
| 133 | +the native component, and ensure that the file does not change or move before |
| 134 | +the hash is calculated), and raises friction for users (due to the complexity of |
| 135 | +installation). Additionally, calculating the file content hash separately is |
| 136 | +slightly less efficient since it requires reading the file back into memory. |
| 137 | + |
| 138 | +### Open Web API |
| 139 | + |
| 140 | +There are no equivalent APIs on the Open Web. The Open Web has the concept of |
| 141 | +Subresource Integrity, but it applies only to page subresources such as scripts |
| 142 | +and stylesheets and does not apply to downloaded files. |
| 143 | + |
| 144 | +## Implementation Notes |
| 145 | + |
| 146 | + 1. Hash algorithm descriptor matching must be case-sensitive (which differs |
| 147 | + from some other Open Web APIs). |
| 148 | + 2. The User Agent may throw an error upon a call to `downloads.getFileHash()` |
| 149 | + if the requested hash is not available. The User Agent may compute the hash on |
| 150 | + the fly if the original file is still available. |
| 151 | + 3. User Agents must not support obsolete hash algorithms, see "Additional |
| 152 | + Security Considerations" for more details. |
| 153 | + 4. User Agents must encode the binary hash digest as a hex string, using |
| 154 | + lower-case letters. |
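
The encoding described in note 4 could look roughly like the following sketch. `toLowerHex` and `isLowerHexSha256` are hypothetical helper names, and the 64-character length check assumes SHA-256 (32 bytes) as the only supported algorithm.

```typescript
// Encode raw digest bytes as a lower-case hex string, two characters per byte.
function toLowerHex(digest: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < digest.length; i++) {
    const byte = digest[i];
    // Number.prototype.toString(16) emits lower-case digits; pad single digits.
    hex += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return hex;
}

// Check that a string has the form of a lower-case hex SHA-256 digest.
function isLowerHexSha256(value: string): boolean {
  return /^[0-9a-f]{64}$/.test(value);
}
```

An extension comparing against externally published hashes may want to lower-case those before comparison, since the returned string never contains upper-case letters.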
| 155 | + |
| 156 | +## Future Work |
| 157 | + |
| 158 | +In theory, some use cases could be served via a declarative API. For example, |
| 159 | +an antivirus extension could provide a set of hashes of known bad content for |
| 160 | +the browser to use directly. However, the development of such a schema is |
| 161 | +outside of the scope of this proposal. |