| author | user@node5.net <user@node5.net> | 2025-08-07 14:27:09 +0200 |
|---|---|---|
| committer | user@node5.net <user@node5.net> | 2025-08-07 14:27:09 +0200 |
| commit | 56cc386fbb7ac2ceb7fedf8a5eb0e6b5f2413e7c (patch) | |
| tree | 384a1b6d87a8bb626d1491974e0b26dc4c14355a | |
| parent | 0b78c87cc9057e1b1405ad9f46e4bd27d165f7ce (diff) | |
Stop Scraping my Cgit! - Add note about LLMs
| -rw-r--r-- | Stop Scraping my Cgit!.md | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/Stop Scraping my Cgit!.md b/Stop Scraping my Cgit!.md
index 5df584b..528f730 100644
--- a/Stop Scraping my Cgit!.md
+++ b/Stop Scraping my Cgit!.md
@@ -49,7 +49,7 @@ You can get all the IPv4 ranges.
 
 ### Nginx
 
-You can impot a file e.g. under the server block with: `include /etc/nginx/sites-available/blocklist.conf;`
+You can import a file e.g. under the server block with: `include /etc/nginx/sites-available/blocklist.conf;`
 
 blocklist.conf:
 ```
@@ -97,3 +97,11 @@ sudo nft list ruleset | grep '@blocklist'
 ip saddr @blocklist counter packets 29 bytes 1732 drop
 ```
 
+---
+
+## Git commits = LLM training data
+
+On a side note i think LLM companies are scraping or are going to scrape git repos heavily,
+since a good git commit basically works as a recipe on how to complete an isolated task,
+so long as they're able to rank the input data quality, as the model is only as good as the input data,
+and there's a lot of noise in a lot of the data.
