Commit cb3269b (parent 5f01948) by Alp Toker, Jul 14, 2017: 1 changed file, 25 additions, 0 deletions.

# block-crawler: discovery tool for legally restricted HTTP 451 resources

## Synopsis

The _block-crawler_ module scans web resources to discover content that has been withheld for legal reasons and signalled with the HTTP 451 status code specified in [RFC7725](https://tools.ietf.org/html/rfc7725).

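The core check can be sketched with Python's standard library. This is a minimal illustration, not block-crawler's own code: `probe` and `is_legally_blocked` are hypothetical names, and the `opener` parameter is an assumption added only so the function can be exercised without network access. Note that `urllib.request.urlopen` raises `HTTPError` for error responses, so a 451 arrives as an exception rather than a normal return value.

```python
import urllib.error
import urllib.request

# Status code defined by RFC 7725: 451 Unavailable For Legal Reasons.
HTTP_451 = 451


def probe(url, opener=urllib.request.urlopen, timeout=10.0):
    """Fetch `url` once and return (status_code, status_text).

    Hypothetical helper: `opener` defaults to urlopen but can be swapped
    for a stub in tests.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.reason
    except urllib.error.HTTPError as err:
        # urlopen signals 4xx/5xx responses (including 451) by raising.
        return err.code, err.reason


def is_legally_blocked(status_code):
    """True if the status code marks the resource as legally restricted."""
    return status_code == HTTP_451
```

A real crawler would also record the `blocked-by` Link header and the time and origin of each request; this sketch returns only the status line.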
## Purpose and scope

Unlike other kinds of internet censorship implemented by service providers and governments, resources marked with HTTP 451 are typically blocked _at source_: the publisher has voluntarily complied with demands to restrict the content, either regionally or globally.

_block-crawler_ aims to be a reference implementation of RFC7725, covering all of the features and provisions it specifies. The tool includes specialised support for the _blocked-by_ link relation carried in the Link HTTP header field ([RFC5988](https://tools.ietf.org/html/rfc5988)), whose target is a URI reference that optionally identifies the entity implementing the blockage.

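Extracting the blocking entity means finding Link header entries whose `rel` includes `blocked-by`. The function below is a simplified, hypothetical parser written for illustration, not the module's implementation: it handles quoted and unquoted `rel` values and multiple comma-separated links, but assumes no commas appear inside quoted parameter values.

```python
import re

# One link-value: "<" URI-Reference ">" followed by ";"-separated params.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# A rel parameter, quoted (group 1) or bare token (group 2).
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def blocked_by(link_header):
    """Return the target URIs of all links whose rel includes 'blocked-by'."""
    if not link_header:
        return []
    targets = []
    for link in _LINK_VALUE.finditer(link_header):
        target, params = link.group(1), link.group(2)
        for rel in _REL_PARAM.finditer(params):
            value = rel.group(1) if rel.group(1) is not None else rel.group(2)
            # rel may hold several space-separated relation types;
            # compare them case-insensitively.
            if "blocked-by" in value.lower().split():
                targets.append(target)
    return targets
```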
## Modes of operation

This module provides a standalone command-line utility as well as developer interfaces and an HTTP REST API for integration into third-party measurement frameworks.

Because HTTP 451 is typically used to 'geoblock' content, results are expected to vary between geographic vantage points. The output of this tool is suitable for aggregation into a larger international dataset, which can reveal the global extent of corporate compliance with legal censorship orders and other localised restrictions on the flow of information online.

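One way such aggregation might look, as a sketch under assumptions: each report is taken to be a dict with hypothetical `url`, `status` and `vantage` keys, where `vantage` labels the location a request was made from (for example a country code). The function groups reports by URL and lists the vantage points that observed a 451.

```python
from collections import defaultdict


def summarise(reports):
    """Per URL, list vantage points that saw HTTP 451 and count all vantage points.

    Field names ("url", "status", "vantage") are hypothetical.
    """
    seen = defaultdict(set)     # url -> every vantage point that reported it
    blocked = defaultdict(set)  # url -> vantage points that observed a 451
    for report in reports:
        seen[report["url"]].add(report["vantage"])
        if report["status"] == 451:
            blocked[report["url"]].add(report["vantage"])
    return {
        url: {"blocked_from": sorted(blocked[url]), "vantage_points": len(points)}
        for url, points in seen.items()
    }
```

A URL blocked from some vantage points but reachable from others is the geoblocking pattern described above.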
### Data formats

Results are produced in a simple streaming JSON annotation format that records the affected URL, the observed status code and status text, and the blocking entity, if one is identified. Each report entry describes one HTTP request made at a specific point in time from a single IP address.

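A streaming format of this kind can be sketched as one JSON object per line. The key names below (`url`, `status`, `status_text`, `source_ip`, `observed_at`, `blocked_by`) are hypothetical stand-ins for the fields listed above, not the module's actual schema; `blocked_by` is emitted only when a blocking entity was identified.

```python
import json
from datetime import datetime, timezone


def report_line(url, status, status_text, source_ip, blocked_by=None, observed_at=None):
    """Serialise one observation as a single-line JSON object (hypothetical schema)."""
    if observed_at is None:
        observed_at = datetime.now(timezone.utc)
    record = {
        "url": url,
        "status": status,
        "status_text": status_text,
        "source_ip": source_ip,
        "observed_at": observed_at.isoformat(),
    }
    if blocked_by is not None:
        record["blocked_by"] = blocked_by
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return json.dumps(record, sort_keys=True)


def stream_reports(records):
    """Yield newline-terminated JSON lines, one per observation dict."""
    for record in records:
        yield report_line(**record) + "\n"
```

Emitting one self-contained line per request lets consumers process results incrementally and concatenate output from several vantage points without re-parsing.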
## Status and contributor guidelines

This tool is under development and is not yet recommended for use in production or as a reporting tool for transparency work.