Commit 4324cd0

Merge branch 'main' into cel-remoteAddrInList
Signed-off-by: Jason Cameron <[email protected]>
2 parents 20b187b + 6e2eeb9 commit 4324cd0

71 files changed (+1350, -368 lines)

.github/actions/spelling/expect.txt

Lines changed: 11 additions & 1 deletion
@@ -9,9 +9,10 @@ anubistest
 Applebot
 archlinux
 badregexes
+bdba
 berr
 bingbot
-Bitcoin
+bitcoin
 blogging
 Bluesky
 blueskybot
@@ -27,6 +28,7 @@ caninetools
 Cardyb
 celchecker
 CELPHASE
+cerr
 certresolver
 CGNAT
 cgr
@@ -100,6 +102,7 @@ Hashcash
 hashrate
 headermap
 healthcheck
+hebis
 hec
 hmc
 hostable
@@ -146,6 +149,7 @@ maintainership
 malware
 mcr
 memes
+metarefresh
 metrix
 mimi
 minica
@@ -154,6 +158,7 @@ Mojeek
 mojeekbot
 mozilla
 nbf
+netsurf
 nginx
 nobots
 NONINFRINGEMENT
@@ -166,6 +171,7 @@ onionservice
 openai
 openrc
 pag
+palemoon
 Pangu
 parseable
 passthrough
@@ -182,6 +188,7 @@ prebaked
 privkey
 promauto
 promhttp
+proofofwork
 pwcmd
 pwuser
 qualys
@@ -236,9 +243,11 @@ Tik
 Timpibot
 torproject
 traefik
+uberspace
 unixhttpd
 unmarshal
 uvx
+UXP
 Varis
 Velen
 vendored
@@ -251,6 +260,7 @@ webpage
 websecure
 websites
 Webzio
+wildbase
 wordpress
 Workaround
 workdir

.github/workflows/zizmor.yml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ jobs:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

       - name: Upload SARIF file
-        uses: github/codeql-action/upload-sarif@ff0a06e83cb2de871e5a09832bc6a81e7276941f # v3.28.18
+        uses: github/codeql-action/upload-sarif@fca7ace96b7d713c7035871441bd52efbe39e27e # v3.28.19
         with:
           sarif_file: results.sarif
           category: zizmor

README.md

Lines changed: 27 additions & 6 deletions
@@ -14,15 +14,36 @@

 Anubis is brought to you by sponsors and donors like:

-[![Distrust](./docs/static/img/sponsors/distrust-logo.webp)](https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis)
-[![Terminal Trove](./docs/static/img/sponsors/terminal-trove.webp)](https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh)
-[![canine.tools](./docs/static/img/sponsors/caninetools-logo.webp)](https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis)
-[![Weblate](./docs/static/img/sponsors/weblate-logo.webp)](https://weblate.org/)
-[![Uberspace](./docs/static/img/sponsors/uberspace-logo.webp)](https://uberspace.de/)
+### Diamond Tier
+
+<a href="https://www.raptorcs.com/content/base/products.html">
+  <img src="./docs/static/img/sponsors/raptor-computing-logo.webp" alt="Raptor Computing Systems" height=64 />
+</a>
+
+### Gold Tier
+
+<a href="https://distrust.co?utm_campaign=github&utm_medium=referral&utm_content=anubis">
+  <img src="./docs/static/img/sponsors/distrust-logo.webp" alt="Distrust" height="64">
+</a>
+<a href="https://terminaltrove.com/?utm_campaign=github&utm_medium=referral&utm_content=anubis&utm_source=abgh">
+  <img src="./docs/static/img/sponsors/terminal-trove.webp" alt="Terminal Trove" height="64">
+</a>
+<a href="https://canine.tools?utm_campaign=github&utm_medium=referral&utm_content=anubis">
+  <img src="./docs/static/img/sponsors/caninetools-logo.webp" alt="canine.tools" height="64">
+</a>
+<a href="https://weblate.org/">
+  <img src="./docs/static/img/sponsors/weblate-logo.webp" alt="Weblate" height="64">
+</a>
+<a href="https://uberspace.de/">
+  <img src="./docs/static/img/sponsors/uberspace-logo.webp" alt="Uberspace" height="64">
+</a>
+<a href="https://wildbase.xyz/">
+  <img src="./docs/static/img/sponsors/wildbase-logo.webp" alt="Wildbase" height="64">
+</a>

 ## Overview

-Anubis [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using a proof-of-work challenge in order to protect upstream resources from scraper bots.
+Anubis is a Web AI Firewall Utility that [weighs the soul of your connection](https://en.wikipedia.org/wiki/Weighing_of_souls) using one or more challenges in order to protect upstream resources from scraper bots.

 This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.
cmd/anubis/main.go

Lines changed: 4 additions & 3 deletions
@@ -68,6 +68,7 @@ var (
 	extractResources = flag.String("extract-resources", "", "if set, extract the static resources to the specified folder")
 	webmasterEmail = flag.String("webmaster-email", "", "if set, displays webmaster's email on the reject page for appeals")
 	versionFlag = flag.Bool("version", false, "print Anubis version")
+	xffStripPrivate = flag.Bool("xff-strip-private", true, "if set, strip private addresses from X-Forwarded-For")
 )

 func keyFromHex(value string) (ed25519.PrivateKey, error) {
@@ -336,7 +337,7 @@ func main() {
 	h = s
 	h = internal.RemoteXRealIP(*useRemoteAddress, *bindNetwork, h)
 	h = internal.XForwardedForToXRealIP(h)
-	h = internal.XForwardedForUpdate(h)
+	h = internal.XForwardedForUpdate(*xffStripPrivate, h)

 	srv := http.Server{Handler: h, ErrorLog: internal.GetFilteredHTTPLogger()}
 	listener, listenerUrl := setupListener(*bindNetwork, *bind)
@@ -420,11 +421,11 @@ func extractEmbedFS(fsys embed.FS, root string, destDir string) error {
 			return os.MkdirAll(destPath, 0o700)
 		}

-		data, err := fs.ReadFile(fsys, path)
+		embeddedData, err := fs.ReadFile(fsys, path)
 		if err != nil {
 			return err
 		}

-		return os.WriteFile(destPath, data, 0o644)
+		return os.WriteFile(destPath, embeddedData, 0o644)
 	})
 }
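
Reviewer note: the body of `internal.XForwardedForUpdate` is not part of the visible diff, so what `-xff-strip-private` does in detail can only be guessed from the flag description. The following is a minimal sketch of one plausible reading, assuming the flag drops private, loopback, and link-local entries from the header value. `filterXFF` is a hypothetical helper for illustration, not a function from the Anubis code base, and it does not treat CGNAT space (100.64.0.0/10) as private.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// filterXFF is a hypothetical helper (not Anubis's implementation).
// When stripPrivate is true it removes private, loopback, and link-local
// addresses from a comma-separated X-Forwarded-For value. Entries that do
// not parse as IP addresses are dropped as well. When stripPrivate is
// false the header is returned unchanged.
func filterXFF(header string, stripPrivate bool) string {
	if !stripPrivate {
		return header
	}
	var kept []string
	for _, part := range strings.Split(header, ",") {
		addr, err := netip.ParseAddr(strings.TrimSpace(part))
		if err != nil {
			continue // not a valid IP address
		}
		if addr.IsPrivate() || addr.IsLoopback() || addr.IsLinkLocalUnicast() {
			continue // RFC 1918 / fc00::/7, loopback, or link-local
		}
		kept = append(kept, addr.String())
	}
	return strings.Join(kept, ", ")
}

func main() {
	fmt.Println(filterXFF("203.0.113.7, 10.0.0.1, 192.168.1.5", true)) // prints 203.0.113.7
	fmt.Println(filterXFF("203.0.113.7, 10.0.0.1", false))             // prints 203.0.113.7, 10.0.0.1
}
```

Because the flag defaults to `true` in the diff, a deployment that relies on private hops appearing in `X-Forwarded-For` would presumably need to pass `-xff-strip-private=false`.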

data/botPolicies.yaml

Lines changed: 3 additions & 1 deletion
@@ -55,7 +55,9 @@ bots:
   - name: generic-browser
     user_agent_regex: >-
       Mozilla|Opera
-    action: CHALLENGE
+    action: WEIGH
+    weight:
+      adjust: 10

 dnsbl: false


Lines changed: 23 additions & 25 deletions
@@ -1,28 +1,26 @@
 - name: deny-aggressive-brazilian-scrapers
-  action: DENY
+  action: WEIGH
+  weight:
+    adjust: 20
   expression:
     any:
-    # Internet Explorer should be out of support
-    - userAgent.contains("MSIE")
-    # Trident is the Internet Explorer browser engine
-    - userAgent.contains("Trident")
-    # Opera is a fork of chrome now
-    - userAgent.contains("Presto")
-    # Windows CE is discontinued
-    - userAgent.contains("Windows CE")
-    # Windows 95 is discontinued
-    - userAgent.contains("Windows 95")
-    # Windows 98 is discontinued
-    - userAgent.contains("Windows 98")
-    # Windows 9.x is discontinued
-    - userAgent.contains("Win 9x")
-    # Amazon does not have an Alexa Toolbar.
-    - userAgent.contains("Alexa Toolbar")
-- name: challenge-aggressive-brazilian-scrapers
-  action: CHALLENGE
-  expression:
-    any:
-    # This is not released, even Windows 11 calls itself Windows 10
-    - userAgent.contains("Windows NT 11.0")
-    # iPods are not in common use
-    - userAgent.contains("iPod")
+      # Internet Explorer should be out of support
+      - userAgent.contains("MSIE")
+      # Trident is the Internet Explorer browser engine
+      - userAgent.contains("Trident")
+      # Opera is a fork of chrome now
+      - userAgent.contains("Presto")
+      # Windows CE is discontinued
+      - userAgent.contains("Windows CE")
+      # Windows 95 is discontinued
+      - userAgent.contains("Windows 95")
+      # Windows 98 is discontinued
+      - userAgent.contains("Windows 98")
+      # Windows 9.x is discontinued
+      - userAgent.contains("Win 9x")
+      # Amazon does not have an Alexa Toolbar.
+      - userAgent.contains("Alexa Toolbar")
+      # This is not released, even Windows 11 calls itself Windows 10
+      - userAgent.contains("Windows NT 11.0")
+      # iPods are not in common use
+      - userAgent.contains("iPod")

data/bots/ai-robots-txt.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
 # Note: Blocks human-directed/non-training user agents
 - name: "ai-robots-txt"
   user_agent_regex: >-
-    AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|anthropic-ai|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity-User|PerplexityBot|PetalBot|QualifiedBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YouBot
+    AI2Bot|Ai2Bot-Dolma|aiHitBot|Amazonbot|Andibot|anthropic-ai|Applebot|Applebot-Extended|bedrockbot|Brightbot 1.0|Bytespider|CCBot|ChatGPT-User|Claude-SearchBot|Claude-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Cotoyogi|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Google-CloudVertexBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo Bot|meta-externalagent|Meta-ExternalAgent|meta-externalfetcher|Meta-ExternalFetcher|MistralAI-User/1.0|NovaAct|OAI-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient.com|Perplexity-User|PerplexityBot|PetalBot|PhindBot|QualifiedBot|QuillBot|quillbot.com|SBIntuitionsBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade indexer bot|TikTokSpider|Timpibot|VelenPublicWebCrawler|Webzio-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot
   action: DENY

data/bots/cloudflare-workers.yaml

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,6 @@
 - name: cloudflare-workers
   headers_regex:
     CF-Worker: .*
-  action: DENY
+  action: WEIGH
+  weight:
+    adjust: 15

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+- import: (data)/clients/small-internet-browsers/netsurf.yaml
+- import: (data)/clients/small-internet-browsers/palemoon.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+- name: "reduce-weight-netsurf"
+  user_agent_regex: "NetSurf"
+  action: WEIGH
+  weight:
+    adjust: -5
