| -rw-r--r-- | Stop Scraping my Cgit!.md | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/Stop Scraping my Cgit!.md b/Stop Scraping my Cgit!.md
index 5df584b..528f730 100644
--- a/Stop Scraping my Cgit!.md
+++ b/Stop Scraping my Cgit!.md
@@ -49,7 +49,7 @@ You can get all the IPv4 ranges.
 ### Nginx
-You can impot a file e.g. under the server block with: `include /etc/nginx/sites-available/blocklist.conf;`
+You can import a file e.g. under the server block with: `include /etc/nginx/sites-available/blocklist.conf;`
 blocklist.conf:
 ```
@@ -97,3 +97,11 @@ sudo nft list ruleset | grep '@blocklist'
 ip saddr @blocklist counter packets 29 bytes 1732 drop
 ```
+---
+
+## Git commits = LLM training data
+
+On a side note i think LLM companies are scraping or are going to scrape git repos heavily,
+since a good git commit basically works as a recipe on how to complete an isolated task,
+so long as they're able to rank the input data quality, as the model is only as good as the input data,
+and there's a lot of noise in a lot of the data.
