HAPI robots identity and behavior #148

Closed
berniegsfc opened this issue Aug 26, 2022 · 10 comments · Fixed by #174

Comments

@berniegsfc
Contributor

HAPI robots should be easily identifiable and well behaved. That is, the HTTP User-Agent value should have "bot" in the name and a URL to a page containing more information about the bot (including a contact). For example,

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)

For this issue, the significant parts of the above examples are

(compatible; Googlebot/2.1; +http://www.google.com/bot.html)
(Applebot/0.1; +http://www.apple.com/go/applebot)

The identity helps HAPI server hosts exclude bot requests from usage reports.
The bot should also respect robots.txt and all standard HTTP rate-limiting mechanisms (e.g., 429 response, Retry-After header, etc.).
This may be related to issue #135 if the pinging is repetitive.
Currently, @rweigel, @jbfaden, and @sandyfreelance are known to be operating HAPI bots.
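
As an illustration of the behavior requested above, here is a minimal Python sketch of a client that identifies itself as a bot, checks robots.txt, and backs off on a 429 response. It uses only the standard library. The bot name, version, info URL, and the helper names (`build_user_agent`, `parse_retry_after`, `allowed_by_robots`, `polite_get`) are hypothetical and not part of any HAPI client; the User-Agent layout is one possible convention, not a required format.

```python
import time
import urllib.error
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical bot identity; replace with your own name and info page.
BOT_NAME = "examplehapibot"
BOT_VERSION = "1.0"
BOT_INFO_URL = "https://example.org/examplehapibot.html"


def build_user_agent(name=BOT_NAME, version=BOT_VERSION, info_url=BOT_INFO_URL):
    """Return a User-Agent containing 'bot' and a URL with contact details."""
    return f"{name}/{version} (+{info_url})"


def parse_retry_after(value, default=60.0, cap=3600.0):
    """Interpret a Retry-After header given as a number of seconds.

    The HTTP-date form is not handled here; it falls back to `default`.
    """
    try:
        return min(max(0.0, float(value)), cap)
    except (TypeError, ValueError):
        return default


def allowed_by_robots(url, user_agent):
    """Fetch the server's robots.txt and check whether `url` may be requested."""
    parts = urlsplit(url)
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()
    return parser.can_fetch(user_agent, url)


def polite_get(url, max_attempts=3):
    """Fetch `url`, identifying as a bot and waiting when the server sends 429."""
    user_agent = build_user_agent()
    if not allowed_by_robots(url, user_agent):
        raise PermissionError(f"robots.txt disallows {url}")
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for _ in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(parse_retry_after(err.headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

A scheduled scan would call `polite_get` once per server URL. The retry count and the 60-second fallback delay are arbitrary choices for the sketch.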

@jbfaden
Contributor

jbfaden commented Aug 26, 2022

A similar problem is identifying requests made by a testing system, rather than a human. These requests ought not be included in usage studies, either. Have you ever seen a way of identifying these, Bernie?

@sandyfreelance
Contributor

sandyfreelance commented Aug 26, 2022

We could add optional keywords (as a recommendation, not to the HAPI standard 3.x), that most servers will ignore, such as 'mode=testing'. That would show up in web logs. If we suggest a standard practice, that should suffice.

If we go with Bernie's request to alter the user agent for testing as with bots, we'll have to modify the Python client (at the very least) to allow for that in **opts, as I don't see it currently exposed as an option. Also, I often test by cutting and pasting a URL into a web browser (such as when a script shows an error and I want to investigate further). Adding 'mode=testing' is easy to comply with.

@berniegsfc
Contributor Author

If your test server has a fixed IP address, we can filter that out. I think that is what we do now for Jeremy's tests. But that isn't ideal, and these days more requests come from dynamically assigned AWS or Azure addresses. So a unique UA or mode parameter would be better.
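
To make the User-Agent option concrete, here is a rough Python sketch of how a server host might drop bot requests from a usage report. It assumes access logs in Combined Log Format, where the User-Agent is the last double-quoted field; the function names are hypothetical, and a real report would likely need a more careful parser.

```python
import re

# Assumption: Combined Log Format, where the User-Agent is the last quoted field.
_LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def extract_user_agent(log_line):
    """Return the last double-quoted field of a log line, or '' if none."""
    match = _LAST_QUOTED.search(log_line)
    return match.group(1) if match else ""


def is_bot(user_agent):
    """Treat any User-Agent containing 'bot' (case-insensitive) as a robot."""
    return "bot" in user_agent.lower()


def human_requests(log_lines):
    """Keep only the log lines whose User-Agent does not look like a bot."""
    return [line for line in log_lines if not is_bot(extract_user_agent(line))]
```

This works for requests from dynamically assigned addresses, since it keys on the declared identity rather than the IP address. It only catches clients that follow the naming convention.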

@jbfaden
Contributor

jbfaden commented Oct 4, 2022

My hourly scan of all servers has the User-Agent:
hapibot-a/1.0; https://github.com/hapi-server/data-specification/wiki/hapi-bots.md#hapibot-a

@jvandegriff
Collaborator

Just needs a doc update in the spec document, and it's not about server changes, just client convention.

Statement like this in the appendix:
"If you have a regularly running HAPI client that collects info from HAPI servers, then the client should identify itself as an automated agent."

@jbfaden
Contributor

jbfaden commented Jan 23, 2023

See also https://github.com/hapi-server/data-specification/wiki/hapi-bots.md which is a list of robots.

@jvandegriff
Collaborator

If we want a whole new request parameter for mode=testing, that needs to be a separate ticket for a 4.0 milestone, since we are changing the main request interface!

Also, since current servers must reject any non-recognized request parameters, this would cause current implementations to throw an error.

The discussion about whether we need a change to the request interface to support this (some think yes, others no) can happen on that ticket.

For now, having updated docs to describe the USER-AGENT approach is good and almost done (it needs a few more tweaks, but is mostly ready).
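
To show why an extra 'mode=testing' parameter would trip up strict servers, here is a small Python sketch of the kind of check a server might run on incoming query strings. The allow-list and the function name are illustrative assumptions, not taken from any particular server implementation; consult the HAPI specification for the actual parameter set of each endpoint.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative allow-list for a data request; the spec defines the real set.
ALLOWED_DATA_PARAMS = {"dataset", "start", "stop", "parameters", "include", "format"}


def unknown_parameters(url, allowed=ALLOWED_DATA_PARAMS):
    """Return the sorted query parameter names that are not in `allowed`."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return sorted(set(query) - set(allowed))
```

A server that rejects any request for which this list is non-empty would return an error for a URL carrying 'mode=testing', whereas a custom User-Agent header leaves the query string untouched.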

@jvandegriff
Collaborator

Jeremy's branch has near-final text. Need to mention "bot" and not reference tickets.

@jbfaden
Contributor

jbfaden commented Apr 10, 2023

I will make a new branch starting with this ticket, and move my changes over. The old branch is trivial and will be deleted.

@jbfaden jbfaden linked a pull request Apr 28, 2023 that will close this issue