Commit ace49a9

Proposal: downloads.getFileHash()
1 parent 3e866a2 commit ace49a9

1 file changed: +161 -0 lines changed

proposals/downloads_get_file_hash.md

# Proposal: `downloads.getFileHash()`

**Summary**

Allow extensions to programmatically query hashes of the contents of
downloaded files.

**Document Metadata**

**Author:** @bershanskyi

**Sponsoring Browser:** Chromium

**Contributors:** @Rob--W

**Created:** 2024-04-08

**Related Issues:** [WECG GitHub issue 401](https://github.com/w3c/webextensions/issues/401)

## Motivation

### Objective

This API will enable extensions to privately and efficiently scan user
downloads for "Known Unsafe" contents and validate the integrity of "Known Safe"
files without exposing the raw content of the files.

#### Use Cases

##### Filtering for unwanted downloads

Many antivirus products detect previously seen "safe" and "unsafe" files by
computing a file content hash and checking it against a set of known hashes.
These checks require minimal effort on the client side, yet are powerful
enough to limit the spread of known malware downloads, since effectively every
client gets "inoculated" against a particular download as soon as a single
antivirus product flags the file as malicious.
User Agents may store file hashes even after the original file is deleted,
which makes it possible to trace malware exposure when a file is deemed
dangerous only after the user downloaded it.

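The lookup described above can be sketched in TypeScript as follows. This is only an illustration: `classifyByHash`, `Verdict`, and the two hash lists are hypothetical names, and the list entries are the well-known SHA-256 digests of the ASCII string `abc` and of the empty input, used purely as stand-ins for real threat data.

```typescript
type Verdict = "known-unsafe" | "known-safe" | "unknown";

// Hypothetical hash lists; a real product would sync these from its own backend.
// Placeholder entry: SHA-256 of the ASCII string "abc".
const KNOWN_UNSAFE: ReadonlySet<string> = new Set<string>([
  "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
]);
// Placeholder entry: SHA-256 of the empty input.
const KNOWN_SAFE: ReadonlySet<string> = new Set<string>([
  "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
]);

// Classify a hex-encoded digest against the two lists.
// The unsafe list is consulted first so that a hash present in both
// lists is still treated as unsafe.
function classifyByHash(
  hexDigest: string,
  unsafe: ReadonlySet<string> = KNOWN_UNSAFE,
  safe: ReadonlySet<string> = KNOWN_SAFE,
): Verdict {
  const key = hexDigest.toLowerCase();
  if (unsafe.has(key)) return "known-unsafe";
  if (safe.has(key)) return "known-safe";
  return "unknown";
}
```

The client-side cost is a single set lookup per download, which is what makes this kind of check cheap compared with scanning file contents.
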
##### Download integrity assurance

Files can get damaged while they are shared on a file hosting server or NAS.
While some file formats explicitly support integrity checks like embedded
checksums or even cryptographic signatures, other files do not. This API
would allow file managers or extensions to programmatically check the
integrity of downloaded files if the file source elects to publish the
expected hashes.

### Known Consumers

None at the moment.

## Specification

### Schema

``` idl
namespace downloads {
  callback GetFileHashCallback = void(DOMString hash);

  [supportsPromises, permissions="downloads.getFileHash"]
  static void getFileHash(
    long downloadId,
    [legalValues=("SHA-256")] DOMString algorithm,
    GetFileHashCallback callback);
}
```

``` idl
namespace downloads {
  enum FILE_HASH_ALGO {
    "SHA-256"
  }

  getFileHash: (downloadId: number, algorithm?: downloads.FILE_HASH_ALGO)
      => Promise<string>;
}
```

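As a rough illustration of how an extension might consume the promise-based form of this schema, the TypeScript sketch below checks a download against a digest published by the file source. The method is only proposed here, so the sketch takes the API as a parameter instead of assuming any browser exposes it; `DownloadsHashApi` and `matchesExpectedHash` are hypothetical names.

```typescript
// Shape of the proposed method, copied from the schema above.
// Assumption: no browser is presumed to implement it; callers inject an object.
interface DownloadsHashApi {
  getFileHash(downloadId: number, algorithm?: "SHA-256"): Promise<string>;
}

// Resolves to true when the digest reported for the download equals the
// digest published by the file source. The published value is lower-cased
// because the proposal requires lower-case hex output from the User Agent.
async function matchesExpectedHash(
  api: DownloadsHashApi,
  downloadId: number,
  expectedHex: string,
): Promise<boolean> {
  const actual = await api.getFileHash(downloadId, "SHA-256");
  return actual === expectedHex.toLowerCase();
}
```

In an extension, the injected object would presumably be the `downloads` namespace itself once the method exists, and `downloadId` would come from a completed `DownloadItem`.
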
### New Permissions

| Permission Added          | Suggested Warning |
| ------------------------- | ----------------- |
| `"downloads.getFileHash"` | No warning        |

The new `"downloads.getFileHash"` permission is optional and is honored only if
the extension also has the `"downloads"` permission and host access for
`DownloadItem.finalUrl`.

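The three conditions above can be read as a single predicate. The TypeScript sketch below is one hypothetical way to express that rule; `ExtensionGrants`, `hasHostAccess`, and `mayReadFileHash` are illustrative names, not part of the proposal or of any browser's implementation.

```typescript
interface ExtensionGrants {
  // Permission strings currently granted to the extension.
  permissions: ReadonlySet<string>;
  // Hypothetical stand-in for the browser's host-permission check.
  hasHostAccess(url: string): boolean;
}

// True only when all three conditions from the text hold:
// the "downloads" permission, the "downloads.getFileHash" permission,
// and host access for the download's final URL.
function mayReadFileHash(grants: ExtensionGrants, finalUrl: string): boolean {
  return (
    grants.permissions.has("downloads") &&
    grants.permissions.has("downloads.getFileHash") &&
    grants.hasHostAccess(finalUrl)
  );
}
```
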
### Manifest File Changes

No changes besides the new optional permission `"downloads.getFileHash"`.

## Security and Privacy

### Exposed Sensitive Data

Cryptographic hashes are considered one-way functions, but a hash can easily
be reversed to its original input when the number of possible inputs is
sufficiently small, because an attacker can hash every candidate and compare.
So an extension may be able to infer the raw contents of a file if the space
of possible file contents is small.

A malicious extension could:
- Observe hashes of downloaded files and send them to a home server
- Initiate a download with an arbitrary URL and obtain a hash of the response
- Initiate a download for a ranged HTTP request and obtain a hash of the
  corresponding portion of the response

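To make the small-input-space concern concrete, here is a minimal TypeScript sketch of the enumeration attack. The hash function is passed in as `digest` so the sketch stays independent of any particular crypto library; `recoverFromSmallSpace` and `fourDigitPins` are hypothetical names, and the four-digit PIN is just an example of a small candidate space.

```typescript
// Any deterministic hash of a string, e.g. SHA-256 rendered as hex.
type Digest = (input: string) => string;

// Hash every candidate and return the first one whose digest equals the
// observed target, or undefined when no candidate matches.
function recoverFromSmallSpace(
  target: string,
  candidates: readonly string[],
  digest: Digest,
): string | undefined {
  for (let i = 0; i < candidates.length; i++) {
    if (digest(candidates[i]) === target) return candidates[i];
  }
  return undefined;
}

// Example candidate space: every four-digit PIN, "0000" through "9999".
function fourDigitPins(): string[] {
  const pins: string[] = [];
  for (let i = 0; i < 10000; i++) pins.push(("0000" + i).slice(-4));
  return pins;
}
```

With this candidate space, recovering the input takes at most 10,000 digest evaluations, regardless of how strong the hash function is.
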
### Abuse Mitigations

The API method will require a silent permission (one that shows no warning)
and host access to the `DownloadItem.finalUrl`.

### Additional Security Considerations

User Agents must not provide obsolete hash algorithms (like MD5 and SHA-1).
User Agents must expose only hashes of complete files and never provide
partial hash results while the file is being downloaded. This is necessary to
prevent malicious or compromised extensions from calculating incremental hashes
and constructing a "hash oracle" to infer file contents one byte (or just a few
bytes) at a time.

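The "hash oracle" risk can be illustrated with a short TypeScript sketch. It assumes, hypothetically, that an attacker could observe the digest of every prefix of a file, which is exactly what the requirement above forbids. `recoverFromPartialHashes` and `BytesDigest` are illustrative names, and the hash function is injected so no specific algorithm is presumed.

```typescript
// Any deterministic hash over a byte sequence.
type BytesDigest = (data: readonly number[]) => string;

// partialHashes[n] is assumed to be the digest of the first n bytes of the
// file. Each byte is recovered by trying all 256 values and comparing the
// resulting digest with the next observed prefix digest.
function recoverFromPartialHashes(
  partialHashes: readonly string[],
  digest: BytesDigest,
): number[] {
  const recovered: number[] = [];
  for (let n = 1; n < partialHashes.length; n++) {
    let found = -1;
    for (let b = 0; b < 256; b++) {
      if (digest(recovered.concat(b)) === partialHashes[n]) {
        found = b;
        break;
      }
    }
    if (found < 0) throw new Error("no byte matches prefix of length " + n);
    recovered.push(found);
  }
  return recovered;
}
```

The attack needs at most 256 digest evaluations per byte, so its cost grows linearly with file size instead of exponentially. Exposing only the hash of the complete file removes the per-prefix observations this loop depends on.
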
## Alternatives

### Existing Workarounds

At the time of writing, extensions can calculate the digest of a download by
implementing a native host component that reads the desired file from the
disk directly. This approach is a significant elevation of privileges (it
requires an unsandboxed application), greatly increases the complexity of
implementing such an extension (since the extension has to listen for download
completion, notify the native component, and ensure that the file does not
change or move before the hash is calculated), and raises friction for users
(due to the complexity of installation). Additionally, calculating the file
content hash separately is slightly less efficient since it requires reading
the file back into memory.

### Open Web API

There are no equivalent APIs on the Open Web. The Open Web has the concept of
subresource integrity, but it applies only to page subresources such as
scripts and stylesheets and does not apply to downloaded files.

## Implementation Notes

1. Hash algorithm descriptor matching must be case-sensitive (which differs
   from some other Open Web APIs).
2. The User Agent may throw an error upon a call to `downloads.getFileHash()`
   if the requested hash is not available. The User Agent may compute the hash
   on the fly if the original file is still available.
3. User Agents must not support obsolete hash algorithms; see "Additional
   Security Considerations" for more details.
4. User Agents must encode the (binary) hash digest as a hex string, using
   lower-case letters.

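Notes 1, 3, and 4 above can be sketched in TypeScript as two small helpers. This is an illustrative reading of the notes, not a reference implementation; `SUPPORTED_ALGORITHMS`, `isSupportedAlgorithm`, and `toLowerHex` are hypothetical names.

```typescript
// Only the algorithm listed in the schema; obsolete ones are simply absent.
const SUPPORTED_ALGORITHMS: readonly string[] = ["SHA-256"];

// Exact, case-sensitive comparison: "sha-256" is rejected, as are
// obsolete algorithms such as "MD5" and "SHA-1".
function isSupportedAlgorithm(name: string): boolean {
  return SUPPORTED_ALGORITHMS.indexOf(name) !== -1;
}

// Encode a binary digest as a lower-case hex string, two characters per byte.
function toLowerHex(digest: Uint8Array): string {
  let out = "";
  for (let i = 0; i < digest.length; i++) {
    const byte = digest[i];
    out += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return out;
}
```
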
## Future Work

In theory, some use cases could be served via a declarative API. For example,
an antivirus extension could provide a set of hashes of known bad content for
the browser to use directly. However, the development of such a schema is
outside the scope of this proposal.
