author: user@node5.net <user@node5.net> 2025-08-03 03:32:52 +0200
committer: user@node5.net <user@node5.net> 2025-08-03 03:32:52 +0200
commit: bb36d3195fe811fb20c25fc8184ed778df821eb0
tree: 17636980ac5f7af8c0e91c566d9cae1ac9da02e1
parent: 93c10851318eebf547f4a77f31a3029331520b14
New article - Stop Scraping my Cgit!
-rw-r--r-- Stop Scraping my Cgit!.md | 99
1 files changed, 99 insertions, 0 deletions
diff --git a/Stop Scraping my Cgit!.md b/Stop Scraping my Cgit!.md
new file mode 100644
index 0000000..5df584b
--- /dev/null
+++ b/Stop Scraping my Cgit!.md
@@ -0,0 +1,99 @@
+---
+description: Blocking scrapers in nftables firewall
+created: 2025-08-03
+---
+
+When you self-host and expose a git server to the internet, you'll find your access log filled with scrapers.
+Hence I've had the following in my git nginx config to ask the bots to kindly fuck off.
+This stops a lot of bots, at least the ones that respect it.
+
+```nginx
+        location /robots.txt {
+                return 200 "User-agent: * # match all bots
+Disallow: / # keep them out";
+        }
+```
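As a quick sanity check, the robots.txt body above can be fed to Python's standard `urllib.robotparser` to confirm it really disallows everything, inline comments included. This is only a sketch: the helper name `is_allowed` and the bot name `ExampleBot` are made up for illustration, and it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same body the nginx `return 200` serves, inline comments included.
ROBOTS_TXT = "User-agent: * # match all bots\nDisallow: / # keep them out"


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Hypothetical helper: may `user_agent` fetch `path` under `robots_txt`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # the parser drops '#' comments itself
    return parser.can_fetch(user_agent, path)


# A well-behaved crawler would get "not allowed" for any path.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "/firmware/qmk/commit/"))
```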
+
+However, after reading the blog post
+[Stop Scraping my Git Forge! - notashelf.dev](https://notashelf.dev/posts/stop-scraping-my-forge)
+I thought I'd take another look at my logs, and would you look at that: lots of entries like the following:
+
+```
+47.79.213.166 - - [03/Aug/2025:02:12:22 +0200] "GET /firmware/sonix-qmk/diff/keyboards/qwertyydox/keymaps/default?id=e7cc5a35c2b80d081207db940777b7537d30a5cd&id2=9808bfaf2616afbe837873d962bc214be3705f90 HTTP/1.1" 403 186 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Mobile Safari/537.36"
+101.44.71.209 - - [03/Aug/2025:02:12:26 +0200] "GET /firmware/qmk/commit/keyboards/handwired/k_numpad17/config.h?id=1eb70be4579e3888ea665fec5706b03eac3d2b3e HTTP/2.0" 403 175 "https://git.node5.net/firmware/qmk/commit/keyboards/handwired/k_numpad17/config.h?id=1eb70be4579e3888ea665fec5706b03eac3d2b3e" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
+
+```
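To see which addresses are responsible for entries like these, one option is to tally requests per client IP. The sketch below assumes the log uses nginx's default combined format, as the two lines above appear to; `top_clients` and its parameters are illustrative names, not part of any tool.

```python
import re
from collections import Counter

# Combined log format: ip - user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def top_clients(lines, status=None, limit=10):
    """Count requests per client IP, optionally only those with `status`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and (status is None or match.group("status") == status):
            hits[match.group("ip")] += 1
    return hits.most_common(limit)
```

Passing an open log file as `lines` and `status="403"` would list the clients that were already being rejected most often.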
+
+<pre>
+$ whois <span style="font-weight:bold;color:lightblue;">101.44.71.209</span> | grep <span style="font-weight:bold;color:lightblue;">netname</span>
+<span style="font-weight:bold;color:darkred;">netname</span>:        Huawei-Cloud-HK
+</pre>
+
+<pre>
+$ whois <span style="font-weight:bold;color:lightblue;">47.79.213.166</span> | grep <span style="font-weight:bold;color:lightblue;">Organization</span>
+<span style="font-weight:bold;color:darkred;">Organization</span>:   Alibaba Cloud LLC (AL-3)
+</pre>
+
+If you look up the IP(s) on [bgp.he.net](https://bgp.he.net/), you can find all associated IP prefixes.
+If you copy the text of that page into a text file and grep it with this pattern ([source](https://www.shellhacks.com/regex-find-ip-addresses-file-grep/)):
+
+```sh
+grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}.*$"
+```
+
+You can get all the IPv4 ranges.
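The same extraction can be sketched in Python, which also makes it easy to validate the matches and merge adjacent ranges. Note the assumptions: unlike the grep pattern above, this regex only keeps `address/length` pairs and drops the rest of the line, and `extract_prefixes` is a made-up helper name. Merging means the output may contain fewer, larger prefixes than the page listed.

```python
import ipaddress
import re

# An IPv4 address followed by a prefix length, e.g. 1.178.32.0/20.
CIDR = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}\b")


def extract_prefixes(text):
    """Return the valid IPv4 networks found in `text`, merged and sorted."""
    networks = set()
    for candidate in CIDR.findall(text):
        try:
            networks.add(ipaddress.IPv4Network(candidate, strict=False))
        except ValueError:
            continue  # looked like a prefix but is not a valid one
    if not networks:
        return []
    # collapse_addresses merges overlapping and adjacent networks.
    return list(ipaddress.collapse_addresses(networks))
```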
+
+---
+
+## Blocking
+
+### Nginx
+
+You can include a file, e.g. inside the server block, with: `include /etc/nginx/sites-available/blocklist.conf;`
+
+blocklist.conf:
+```nginx
+# AS136907 HUAWEI CLOUDS
+deny 1.178.32.0/20;
+deny 1.178.48.0/20;
+...
+```
+This, however, will still fill your access logs, since nginx logs every denied request as a 403.
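Writing the `deny` lines by hand gets tedious for a long prefix list, so here is a minimal sketch of generating them. `render_nginx_blocklist` is a hypothetical helper, and the label is just whatever comment you want at the top of the file.

```python
import ipaddress


def render_nginx_blocklist(prefixes, label=""):
    """Render one `deny <prefix>;` line per IPv4 prefix, validating each."""
    lines = [f"# {label}"] if label else []
    for prefix in prefixes:
        network = ipaddress.IPv4Network(prefix)  # raises ValueError on typos
        lines.append(f"deny {network};")
    return "\n".join(lines) + "\n"
```

The returned string can be written straight to the included blocklist file; validating through `ipaddress` catches typos before nginx has to reject the config.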
+
+
+### Nftables
+
+Even better, you can block these IPs entirely with nftables, so their packets never reach nginx.
+
+In `/etc/nftables.conf` add the following: [source](https://unix.stackexchange.com/questions/329971/nftables-ip-set-multiple-tables)
+
+```nft
+include "nftables_blocklist.conf"
+
+table inet filter {
+
+        set blocklist {
+                type ipv4_addr; flags interval;
+                auto-merge
+                elements = $blocklist
+        }
+
+        chain input_world {
+                ip saddr @blocklist counter drop
+...
+```
+
+nftables_blocklist.conf:
+```nft
+define blocklist = {
+        1.178.32.0/20, # AS136907 HUAWEI CLOUDS
+        1.178.48.0/20,
+...
+
+```
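The `define` file can be generated from a list of prefixes in much the same way. This sketch assumes IPv4 only, matching `type ipv4_addr` in the set, and leaves out the per-line comments; `render_nft_define` is an illustrative name.

```python
import ipaddress


def render_nft_define(prefixes, name="blocklist"):
    """Render `define <name> = { ... }` from IPv4 prefixes, deduplicated and sorted."""
    networks = sorted({ipaddress.IPv4Network(p) for p in prefixes})
    if not networks:
        raise ValueError("refusing to render an empty define")
    body = ",\n".join(f"        {net}" for net in networks)
    return f"define {name} = {{\n{body}\n}}\n"
```

Sorting and deduplicating keeps regenerated files stable, so a diff of the blocklist only shows prefixes that were actually added or removed.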
+
+To check that the rule is matching, look at its counter:
+
+```commandline
+sudo nft list ruleset | grep '@blocklist'
+		ip saddr @blocklist counter packets 29 bytes 1732 drop
+```
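If you want to track those counters over time, the text output can be parsed with a small regex. This is a sketch that assumes the rule is printed in the form shown above (a named set followed directly by `counter packets N bytes M`); `read_counters` is a made-up helper, and it keeps only the last rule seen per set.

```python
import re

# Matches e.g. "@blocklist counter packets 29 bytes 1732"
COUNTER = re.compile(
    r"@(?P<set>\w+) counter packets (?P<packets>\d+) bytes (?P<bytes>\d+)"
)


def read_counters(ruleset_text):
    """Map each set name to the (packets, bytes) counter of a rule using it."""
    return {
        m["set"]: (int(m["packets"]), int(m["bytes"]))
        for m in COUNTER.finditer(ruleset_text)
    }
```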
+