Today's sad thought: I wonder if there's any easy, request-efficient (and bandwidth-efficient) way to offer git repos for cloning that won't be excessively ransacked by LLM crawlers.
(... because if there is, maybe the modern way to publish a git repo is 'here is the readme, here is how to clone, if you want anything more, clone and look; you can thank LLM crawlers for this'. Maybe plus a (static) text page with a few of the recent commits.)
@cks@mastodon.social Any git repo that a human can easily clone, an LLM crawler eventually will too. I think the trick to avoiding them would be to make the instructions niche enough that they won't bother. For example, if there are only a dozen or so git repos on the whole internet being served via, say, Plan 9's 9P or some such, then it is never worth the effort for them to automate that. Of course, that also adds another layer of difficulty for a legitimate human user trying to get to your code. But I think there is no way around it. If it is easy for an unprepared human user to clone your repo, it will be used for LLM training.