HAPI robots identity and behavior #148
Comments
A similar problem is identifying requests made by a testing system, rather than a human. These requests ought not to be included in usage studies, either. Have you ever seen a way of identifying these, Bernie?
We could add optional keywords (as a recommendation, not to the HAPI standard 3.x) that most servers will ignore, such as 'mode=testing'. That would show up in web logs. If we suggest a standard practice, that should suffice. If we go with Bernie's request to alter the user agent for testing as with bots, we'll have to modify the Python client (at the very least) to allow for that in **opts, as I don't see that it is currently an option. Plus, I am often testing by cut-and-paste into a web browser (such as when a script shows an error and I want to investigate deeper). Adding 'mode=testing' is easy to comply with.
If your test server has a fixed IP address, we can filter that out. I think that is what we do now for Jeremy's tests. But that isn't ideal, and these days more requests come from dynamically assigned AWS or Azure addresses. So a unique UA or mode parameter would be better.
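As a rough illustration of how a unique UA or a mode parameter could be used when building usage reports, here is a minimal Python sketch. It assumes the web server writes Combined Log Format lines; the function name `is_automated` and the sample values are hypothetical, and the `mode=testing` check reflects only the suggestion in this thread, not anything in the HAPI specification.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed log layout (Combined Log Format):
#   host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def is_automated(log_line):
    """Return True if the request looks like a bot or a test request.

    A request is flagged when its User-Agent contains "bot" (case-insensitive)
    or when its query string carries the suggested mode=testing parameter.
    """
    match = LOG_RE.search(log_line)
    if match is None:
        return False  # unparseable line: leave it in the report
    if "bot" in match.group("ua").lower():
        return True
    query = parse_qs(urlsplit(match.group("path")).query)
    return query.get("mode") == ["testing"]
```

A plain substring test on "bot" can also catch unrelated user agents that happen to contain those letters, so a real report would probably combine it with a list of known robot names.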
My hourly scan of all servers has the User-Agent:
Just needs a doc update in the spec document, and it's not about server changes, just client convention. Statement like this in the appendix:
See https://github.com/hapi-server/data-specification/blob/hapi-robots-statement-bug-148/hapi-dev/HAPI-data-access-spec-dev.md#85-robot-clients-should-identify-themselves which is a draft describing this in the appendix.
See also https://github.com/hapi-server/data-specification/wiki/hapi-bots.md which is a list of robots.
If we want a whole new request parameter for Also, since current servers must reject any unrecognized request parameters, this would cause current implementations to throw an error. The discussion about whether we need a change to the request interface to support this (some think yes, others no) can happen on that ticket. For now, having updated docs that describe the User-Agent approach is good and almost done (it needs a few more tweaks, but is mostly ready).
Jeremy's branch has near-final text. Need to mention "bot" and not reference tickets.
I will make a new branch starting with this ticket, and move my changes over. The old branch is trivial and will be deleted.
HAPI robots should be easily identifiable and well behaved. That is, the HTTP User-Agent value should have "bot" in the name and a URL to a page containing more information about the bot (including a contact). For example,
For this issue, the significant parts of the above examples are
The identity helps HAPI server hosts exclude bot requests from usage reports.
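A minimal Python sketch of a client that identifies itself this way follows. The `name/version (+URL)` layout is an assumption borrowed from common crawler practice, not a format defined in this issue, and the bot name, URLs, and helper names (`bot_user_agent`, `build_request`) are hypothetical.

```python
import urllib.request


def bot_user_agent(name, version, info_url):
    """Build a User-Agent with "bot" in the name and a URL to an info page."""
    if "bot" not in name.lower():
        raise ValueError("robot name should contain 'bot'")
    # Assumed layout: "<name>/<version> (+<info page URL>)"
    return f"{name}/{version} (+{info_url})"


def build_request(url, user_agent):
    """Prepare (but do not send) a request carrying the identifying header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


ua = bot_user_agent("example-hapi-bot", "1.0", "https://example.org/bot-info")
request = build_request("https://example.org/hapi/catalog", ua)
# Passing `request` to urllib.request.urlopen would send it with this header.
```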
The bot should also respect robots.txt and all standard HTTP rate-limiting mechanisms (e.g., 429 response, Retry-After header, etc.).
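The two checks above could look roughly like this in Python, using only the standard library. This is a sketch under stated assumptions: the helper names (`allowed_by_robots`, `retry_after_seconds`) are hypothetical, the robots.txt content is passed in as text rather than fetched, and how long a bot actually waits after a 429 is left to the operator.

```python
import urllib.robotparser
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as a string."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def retry_after_seconds(header_value, now=None):
    """Convert a Retry-After value to a non-negative wait in seconds.

    Retry-After holds either a number of seconds or an HTTP date.
    """
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        # Treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

On a 429 response, a bot would read the Retry-After header, sleep for at least `retry_after_seconds(...)`, and only then retry.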
This may be related to issue #135 if the pinging is repetitive.
Currently, @rweigel, @jbfaden, and @sandyfreelance are known to be operating HAPI bots.