Conversation

Like the Emacs-in-Rust thing¹, it’s another pipe dream project with scarce humanpower.

¹ https://toot.aquilenet.fr/@civodul/116099711903317535

I think these two factors—lack of humanpower and a “big” vision—coupled with the passion for technicalities typical of such projects make them particularly vulnerable to genAI.

Because yes, “we” want SMP support in Mach and it’s not been happening until this contributor achieved something with the help of genAI.

It’s probably easier for a big project like Gentoo to say “no” to genAI—they have enough contributors anyway, they don’t need it.

So what do we do?

I think we need solidarity. We need to recognize the harms of genAI, including from a free software standpoint.

And we need bigger projects to show the way: to clearly state their rejection, and not on the grounds of quality assurance—a concern bound to become irrelevant—but really on the grounds of ethics, refusing to be part of the harm this does to society.

@civodul this is crazy and sudden. Why is this happening at once everywhere???

So yes, maybe we’ll have to give up on some dreams—like the year of the Hurd on the desktop.

But in exchange, we’ll get something more valuable: human beings sharing their passion, helping each other, and building things together. The real asset of free software.

@tusharhero Because it’s so tempting? And “everyone does it”, and “look how it could benefit our project”.

@civodul there are more of us every day, and i take comfort in that amidst the deluge coming in from every angle. this isn't over yet, but we need some way to rally and push back en masse, rather than the current ad hoc approach.

genai is fractally immoral, unrecoverably so, and that needs to be front-and-center. everything else is shifting sands, easily ceded.

@civodul everyone also distributes nonfree software. Doesn't mean we should also do it...

@civodul Please Ludo, GNU Guix and GNU Hurd need to reject LLMs. I am going to request this to other FSF/GNU projects as well. I am in the process of writing a campaign where we actively pledge against this.

I know it's difficult for small scale free software projects to find contributions and support, but we cannot lose our ethics. I will personally donate and try to contribute more to projects who take the ethical stance. The hackers shall win in the long run! We cannot pollute our codebases with such code, we might inherit a lot of tech debt from this.

@civodul ...so HURD is no longer fully GPL? Because if they're using AI generated code, that's the result. Can't copyright it, so you can't GPL it.

@civodul $748 in less than a week!!!!! I get that they only paid $100 because of a temporary subscription deal, but holy shit… That’s a lot of compute. How many guix substitutes do you think could be built with $748 of compute?

@civodul from Baccula’s paper:

When Brent started this project on February 16, he purchased a Claude Max subscription for $100/month. This subscription provides a fixed allocation of usage—not per-token billing—for both interactive sessions and the claude --print API calls that the task runner uses. The actual cost of this project is $100/month, not the per-token amounts shown in the task runner’s cost tracking.
The per-token costs reported by the API represent what the usage would cost at retail API rates: approximately $297 across 169 task runs with billing data (plus ∼$111 estimated for 31 runs without billing), and ∼$338 for 11 interactive sessions—roughly $746 total at retail rates. They are useful for understanding relative expense between tasks, but they are not what was actually paid. At retail API rates, the project would have cost over seven times the subscription price—the subscription is a much better deal for heavy usage.

@amy Anthropic claims they spent $20k on the Claude C Compiler (probably underestimated, but that gives an idea).

@admin The LLM output is public domain. If it’s “legally significant” (10 lines of code or more), and if these LLM-produced contributions are not clearly identified, then one could consider the whole as public domain, AIUI.

@civodul @admin It varies by jurisdiction. In the US, LLM output cannot be copyrighted and is public domain, but in the UK it can be copyrighted and the copyright holder is whoever prompted the LLM (assuming the LLM is not plagiarizing anything, which is questionable).

If it’s “legally significant” (10 lines of code or more), and if these LLM-produced contributions are not clearly identified, then one could consider the whole as public domain, AIUI.

Does that mean that you can make any program (or even any copyrighted work) public domain by adding LLM output to it and not clearly marking it? That can’t be right…

Wow, I had not heard of that project. I did hear about a browser that doesn’t compile costing a lot, though.

@noisytoot @civodul If you can get that code merged, yes, that seems to be the case.

Of course, if the person submitting the code fraudulently claims to hold the copyright -- which I think most open source projects would require before accepting the submission -- then things get more complicated legally. No idea how that would work out. But if they know it's generated and they accept the code and don't disclose and disclaim it then yes, at some point they lose copyright.

@admin @civodul

If you can get that code merged, yes, that seems to be the case.

Unless there’s a CLA or copyright assignment, contributors retain copyright and the project maintainers have no special status or rights other than those granted to everyone by the license. It doesn’t really make sense that a project maintainer’s decision to merge a contributor’s LLM-generated code can relicense code written by other people.

Otherwise what would prevent me from, say, forking Linux, merging partially LLM-generated code into my fork, then declaring that all of Linux (or my fork of it, which is almost identical) is now public domain?

@noisytoot @civodul Hmm, does Hurd not have a CLA? I kinda assumed all the GNU projects required assigning copyright, the FSF is pretty big on that, but looks like maybe not.

So in that case you have to disclose/disclaim anyway in order to retain the individual copyrights, so yes that shouldn't risk the copyright of the entire project. But you still can't really license any AI code under the GPL.

@admin @noisytoot The Hurd has copyright assignment, but not Mach (the microkernel), which has its roots outside GNU.

@noisytoot @admin This is one interpretation I’ve read, but I guess it’s a grey area.

@civodul In this case, the person is fully transparent. How many would not be?

Well, the fact that patches using GenAI are sent means we’ve already lost on the ethical side.

Moreover, sadly, it’s impossible to draw a line on the grounds of ethics for rejecting genAI.

Why? Because (1) it’s impossible to clearly state what “use of genAI” means and (2) it’s impossible to know whether a given contribution actually avoided genAI.

A project could ask contributors to pledge not to use genAI, but (1) makes such a pledge weak and (2) makes it empty: the pledge only binds those who believe in it. 🙃

On this topic of rejecting genAI contributions, it’s doomed, IMHO.

From a project point of view, the only thing actionable is to communicate about the harms. And communicate again.

@zimoun @civodul I feel this argument is taking several hard things we have done for decades — asking for honesty from your collaborators, defining criteria for provenance and copyrightability, and forming voluntary associations with stated ethical and legal boundaries; and calling them now impossible or lost causes because … err, they are not universally popular and we are at risk of eventual betrayal by bad actors and would not be pure after that.

I can relate to that feeling of overwhelm or resignation, but I can also see hundreds of projects that do these things and succeed.

@civodul
It shouldn't be that hard to reject. The code isn't copyrightable, so applying a copyleft license to it would be problematic at least.

@jens @civodul Absence of copyright, in the *US* at least, means public domain, so that might not be a problem. We accept public domain contributions to many projects (notably though, there are challenges there; CC0, while not useful for software, exists because the public domain *doesn't* exist everywhere)

However, I am worried that we're still early and the courts could turn around on a ruling for something like this, and suddenly a bunch of generated code which has been integrated is in a dangerous legal situation.

@cwebber Even if perhaps not set in stone, the US Copyright Office views LLM output as public domain:
https://copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Copyrightability-Report.pdf

Also, companies like Microsoft committing to protect customers from LLM-related copyright infringement lawsuits are ensuring everything works as if LLM output was public domain:
https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/

But I do agree with your message that including LLM-generated code in a free software project remains too risky at this point.

@jens

@civodul @cwebber @jens Wait, so if I can come up with a prompt that reliably causes an LLM to produce some copyrighted work, I can share that prompt with whomever I want and not be violating any law right?

@octorine If one infringes on copyrighted material, I think it doesn’t matter whether it came verbatim from an LLM, a Git checkout, a tarball, or snail mail.

(But apparently Copilot and friends have made it increasingly unlikely that you’ll get inputs copied verbatim in the output.)

@jens @cwebber

@tusharhero @civodul Can we please just ban this? Let us not pollute GNU mailing lists with AI.

@divyaranjan @tusharhero @civodul
Oh my, that's...terrible.

I guess that without a policy, whatever Brent's assistant does, whether an intern or an LLM (please, do not call "AI" something that's clearly devoid of any understanding or intelligence), is at their discretion?

@janneke @tusharhero @civodul Indeed, but can we make sure that the code does not get merged? If not for quality issues, then simply because it's unclear what happens to the copyright of an LLM's code output, and we can't let GNU projects risk software freedom because of this.

@divyaranjan @tusharhero @civodul
AIUI, Damien Zammit wrote the SMP Code, and "only" took inspiration from this.

I'm not a core Hurd developer, I just do packaging for Guix.

@tusharhero In the end it has little to do with code: these things are quickly sabotaging the social fabric before our eyes, be it in free software projects or in other areas.

@civodul This is actually a nice example. It is a funny point in time because the providers are unethical, but the tools are interesting. We may go off the cliff, but I don't think it will take very long for free software to catch up with their own decent and improved models. AI is software too. Also we (as free software developers) clearly are invested in our code, so I am not too worried about slop. It may explode, but we will always keep cleaning up.

@octorine @civodul @cwebber @jens "... a prompt that reliably ..."
That doesn't exist, by design.
