i know some people oppose the widespread use of CI on ideological grounds, so i think it's worth thinking about why we value it
for me, aside from the obvious reliability aspect (that could probably be mostly achieved by every contributor having a pre-commit hook that runs tests in a nix shell or something), the key utility provided by CI is legibility:
this is almost more important than the added reliability:
this is enough of a benefit that the risks from GitHub Actions design issues are worth mitigating in order to use the workflow
@whitequark why would someone oppose CI on ideological grounds?
@wwahammy i've heard a few reasons but i'd rather let someone who actually holds this position argue for it
@wwahammy @whitequark that’s what I would wanna know. I’m suspicious of projects that DON’T have CI/CD after the various misfortunes and misadventures I’ve had trying to build stuff from source
@whitequark @wwahammy I think I'm one of them so I'll go over a few:
1. Resource usage externalities when this is done at scale, especially for large projects times large numbers of PR authors. This manifests as energy waste, hammering the servers/infrastructure of software you depend on and pull dynamically in standard "destroy the world and re-run everything from scratch" CI recipes, etc.
2. Dependency on subsidized compute resources from a capitalist platform with motivation to lock you in and enshittify.
3. Reducing or eliminating the mandate for your software to be independently buildable by people on their own systems without your CI infrastructure.
right, so (for context) my responses to this would be:
I value reducing human misery a lot more than I value conserving energy, so I don't consider CI energy use "waste" unless there are specific ways in which it can be optimized but isn't.
I'm actively working on community run CI infrastructure so my position here should be obvious
"destroy the world" recipes are actively necessary to tackle this problem. GHA-style workflows, for all their faults, significantly limit how clever you can be in setting up your CI infrastructure, so the chances that you can quickly turn a GHA workflow into a usable series of steps for your OS are quite high, more so than if it were a heavily customized Buildbot workflow, for example
@dalias @wwahammy one thing I'm unsure about is downloads. the tradeoffs here aren't obvious; depending on what you're doing, caching stuff you download (even repeatedly) locally can end up using scarcer resources more aggressively (bandwidth can be a much more available resource).
the best case scenario here is something like Nix flakes which are intrinsically cacheable, but if you ever let external contributors run workflows, you run the risk of poisoning this cache (Nix isn't hardened enough against a malicious builder).
one day i'll have the answers to this
@whitequark @wwahammy My view is that any CI system that wants to be non-abusive to third parties' network resources needs to limit all fetching to content-addressed storage with caching in the CI host layer. No direct URL fetching/network access.
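To make the content-addressed idea concrete, here is a minimal Python sketch of the kind of host-layer cache described above, assuming every dependency is pinned by a SHA-256 digest. `ContentAddressedCache` and the `fetch_upstream` callback are hypothetical names; in a real CI host the callback would ask a mirror for the blob, and the job itself would have no direct network access.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ContentAddressedCache:
    """Hypothetical CI-host cache: blobs are looked up by SHA-256 digest, never by URL."""

    def __init__(self, root: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch_upstream = fetch_upstream  # assumed: fetches a blob by digest from a mirror
        self.upstream_fetches = 0

    def _path(self, digest: str) -> Path:
        # Shard by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest

    def get(self, digest: str) -> bytes:
        path = self._path(digest)
        if path.exists():
            return path.read_bytes()  # cache hit: no upstream traffic at all
        self.upstream_fetches += 1
        data = self.fetch_upstream(digest)
        actual = hashlib.sha256(data).hexdigest()
        if actual != digest:
            # Refuse to store anything that doesn't match its address.
            raise ValueError(f"hash mismatch: expected {digest}, got {actual}")
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # atomic rename, so readers never see a partial blob
        return data
```

Because the key is the digest rather than the URL, any number of CI runs needing the same tarball cost the upstream server a single download, and a tampered mirror response is rejected instead of cached.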
@whitequark @wwahammy (1) also includes hammering other folks' servers. Even if you don't care about the energy externality, downloading the same thing hundreds or thousands of times is antisocial and shoves your cost onto someone else who likely has no funding and who's then forced off of affordable self-hosting and onto some centralized platform that subsidizes the costs in exchange for control.
I know about your work on (2). I'm not blaming you, just outlining the ideological opposition to CI and particularly GitHub-based CI, which is something I believe you're working to mitigate.
I think (3) is a lot more complicated than I can adequately address here.
@dalias @wwahammy I'm aware (1) does include that, but I don't see it in such a binary way—back in the day for projects that couldn't serve the bandwidth needed we had mirrors, no reason we can't do that again. (and in fact Debian still does, and Zig actively encourages this approach for the same reason, etc.)
if you don't have enough bandwidth to serve downloads (and even the cheapest VPS is often capable of doing 100 Mbps up for days), you should be redirecting people to the mirrors and returning 429 to those who still hammer the origin, so that their scripts get updated. but first we need tooling that makes secure downloads with distributed mirroring easy; I think Zig had to build their own
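A minimal Python sketch of the origin-side policy described above: redirect well-behaved clients to a mirror, and answer 429 once a single client exceeds a request budget. The mirror URLs, the per-client budget, and keying on an opaque client identifier (an IP address, or an ASN) are all assumptions; a real deployment would more likely express this in web server or CDN configuration than in application code.

```python
import time
import zlib
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple

# Hypothetical mirror list; a real project would publish and verify its own.
MIRRORS = ["https://mirror-a.example.org", "https://mirror-b.example.org"]


@dataclass
class OriginPolicy:
    max_requests: int = 5     # assumed budget per client per window
    window_s: float = 3600.0  # sliding window length in seconds
    history: Dict[str, Deque[float]] = field(default_factory=lambda: defaultdict(deque))

    def respond(self, client: str, path: str,
                now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
        now = time.monotonic() if now is None else now
        hits = self.history[client]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        hits.append(now)  # rejected requests still count, so hammering keeps you throttled
        if len(hits) > self.max_requests:
            return 429, {"Retry-After": str(int(self.window_s))}
        # Deterministically spread clients across mirrors.
        mirror = MIRRORS[zlib.crc32(client.encode()) % len(MIRRORS)]
        return 302, {"Location": mirror + path}
```

The origin never serves the file body here at all; it only hands out redirects, which is cheap enough to survive a misconfigured CI fleet until its scripts are fixed.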
and yeah, I didn't get the impression that you're blaming me
re (3) I don't think having CI absolves people of providing independently buildable code. I'm not sure I've encountered even a single case of such... maybe some vibecoded stuff but who cares. I rather see bothering with setting up CI as downstream of wanting to make sure code remains buildable by other people. if someone doesn't care about that as a goal they just provide Docker (or in the old days they would just provide some ghastly shell archive that unpacks into /opt), and no technology is going to make someone care who previously didn't
@whitequark @wwahammy I really really also don't like the "destroy the world and start over" that makes it take minutes to get CI results and know if your change needs revision to have a chance at being acceptable. It could and should be an incremental make that finishes in milliseconds when you've only made a localized change. By tracking cached CI results & artifacts linked to commits & configurations, a virtual overlay of the artifacts for the parent commit could always be used as the starting point for building with a proposed new commit, yielding near-instant results (assuming the build system is decent and handles incremental builds).
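The scheme described above (cache build results per commit and configuration, then start a new commit's build from its parent's artifacts) can be sketched in a few lines of Python. This is a toy under stated assumptions: `ArtifactStore`, `config_key`, and `build` are hypothetical names, the "compile" step is a stand-in, and a real system would overlay files on disk rather than pass dicts around.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Artifacts = Dict[str, Tuple[str, str]]  # source name -> (source hash, output)


def config_key(config: dict) -> str:
    """Stable key for a build configuration (compiler, flags, target, ...)."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:16]


class ArtifactStore:
    def __init__(self) -> None:
        self._builds: Dict[Tuple[str, str], Artifacts] = {}

    def record(self, commit: str, config: dict, artifacts: Artifacts) -> None:
        self._builds[(commit, config_key(config))] = dict(artifacts)

    def seed_for(self, parent: str, config: dict) -> Artifacts:
        # Parent's artifacts under the same configuration, or empty (a cold build).
        return dict(self._builds.get((parent, config_key(config)), {}))


def build(sources: Dict[str, str], seed: Artifacts) -> Tuple[Artifacts, List[str]]:
    """Rebuild only the sources whose content hash differs from the seed."""
    artifacts: Artifacts = {}
    rebuilt: List[str] = []
    for name, text in sorted(sources.items()):
        digest = hashlib.sha256(text.encode()).hexdigest()
        cached = seed.get(name)
        if cached is not None and cached[0] == digest:
            artifacts[name] = cached  # unchanged: reuse the parent's output
        else:
            artifacts[name] = (digest, text.upper())  # stand-in for a real compile step
            rebuilt.append(name)
    return artifacts, rebuilt
```

Note that this inherits the incremental build's correctness: if the build system misses a dependency (a header, a flag), reusing the parent's outputs silently reuses a stale artifact.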
@whitequark No experience with Github Actions specifically, but even in a small (<10 person) company, having automatic builds was definitely worth the hassle, even if someone had to manually upload them to make the release available to customers.
At the very least, being able to trust that a given version/build number maps to a specified revision is huge.
@whitequark @wwahammy Well it's going to have to run on the CI side anyway when the PR is updated.
@whitequark @dalias @wwahammy Wait, you value reducing human misery? In this society, this economy and this industry?
(I mostly joke: I actually agree entirely with that value, but it does feel like a pretty marginal position these days)
@dalias @wwahammy I know this is a real problem (PyPI and Rubygems have both considered measures against excessive bandwidth use, mostly by CI services) but I don't think this is the solution; if someone says I should use a CI system where git clone and pip install don't work I would simply consider it defective and pick a different one. and as stated, this seems like it would entirely prevent anything that uses HTTPS to talk to the network (so, basically everything) from working unless every individual tool is going to be upgraded with this system in mind which seems unlikely
@iris_meredith @dalias @wwahammy my entire motivation for building OSS (in the particular way that I do it) comes down to "the industry / the incumbents are making this miserable as fuck so I'll fix it"
see: Vivado, Verilog, etc
@noisytoot @dalias @wwahammy I was thinking about "substituters". as far as I'm aware nothing stops you from editing the stuff in the Nix store if you have the right privileges (directly or via a service) and it's pretty hard to detect if it's ever done, therefore I wouldn't rely just on Nix to prevent cache poisoning (especially in light of regularly dropping Linux LPEs)
@whitequark @wwahammy Why would you ever do a git clone of third-party repos as part of CI? You just need the version you're building with, in which case you can request the archive of that, which can then be content-addressed by its hash. You don't need the entire history which is probably a few orders of magnitude larger.
@dalias @whitequark @wwahammy ok so you're not against CI/CD, you're just against GitHub Actions specifically.
What would you recommend instead?
@dalias @wwahammy most of the time? because it's a submodule. sometimes a recursive submodule.
github's default actions/checkout does a shallow clone (which is just as efficient), but some packages do actually look at their own history in order to give accurate git-describe results or turn git distance numbers into version numbers. your workflow isn't my workflow
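As an illustration of why some packages need history, here is a Python sketch that turns `git describe --tags --long --dirty` output into a version string. The mapping to a PEP 440 `.postN+g<sha>` form is an assumption (tools such as setuptools-scm make their own choices), as is the tag format (an optional `v` followed by dotted numbers). On a depth-1 shallow clone the tag is usually not reachable, so `git describe` fails and there is no distance to convert; with `actions/checkout` that typically means setting `fetch-depth: 0`.

```python
import re

# Matches e.g. "v1.2.0", "v1.2.0-3-gabc1234", "v1.2.0-3-gabc1234-dirty".
DESCRIBE_RE = re.compile(
    r"^v?(?P<tag>\d+(?:\.\d+)*)"
    r"(?:-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+))?"
    r"(?P<dirty>-dirty)?$"
)


def version_from_describe(describe: str) -> str:
    m = DESCRIBE_RE.match(describe.strip())
    if m is None:
        raise ValueError(f"unrecognized git-describe output: {describe!r}")
    version = m["tag"]
    local = []
    if m["distance"] is not None and int(m["distance"]) > 0:
        # N commits past the tag: a post-release, labeled with the commit hash.
        version += f".post{m['distance']}"
        local.append(f"g{m['sha']}")
    if m["dirty"]:
        local.append("dirty")  # uncommitted changes in the working tree
    if local:
        version += "+" + ".".join(local)
    return version
```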
@dalias @wwahammy also I'm pretty sure that at least with Forgejo, it takes less resources to do a git shallow clone than it takes to download an archive of a commit (because the archive needs to be generated and then stored, and all of them are fully denormalized, while git does some sort of optimization with pack files I think?)
@valorzard @whitequark @wwahammy Well I'm against a number of standard CI/CD practices that are harmful to parties not even involved in the project using the CI/CD.
I don't have a specific recommendation for something I haven't wanted to use. I don't think the whole purpose of CI/CD is that important because I don't think we should be expecting non-developers to be using a continuous rolling main branch rather than discrete releases the maintainers have confidence in. If other people want to do that, fine, but finding the right tooling to do it without externalities impacting others is on them not me.
@whitequark @wwahammy OK, but that's the fault of the CI system doing a shallow clone rather than a fully recursive checkout from already-cloned-and-cached repositories. It's the fault of poor abstraction layers that behave as "just do whatever you want to script in this throwaway container" rather than something more structured.
@dalias @valorzard @wwahammy I think if you have significantly varying amounts of confidence in your main branch there's something wrong with your approach to development, even if non-developers only ever use releases. releases are useful to indicate evolution of the support contract, sure; but if your main branch is sometimes especially wonky because you landed a poorly tested change you should probably test your changes better
@whitequark @wwahammy I don't see why the archive would need to be stored. Tarballs are fully streamable and the git-archive command emits them as a stream not with temporary storage.
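To illustrate the streaming point: a tar archive can be produced and hashed chunk by chunk without ever being written to disk. This Python sketch uses the standard library's `tarfile` stream mode (`"w|"`) with a write-only sink that keeps only a running SHA-256; `HashingSink` and `stream_tar_digest` are illustrative names, and the in-memory file dict stands in for `git archive` output. It says nothing about why a particular forge chooses to cache archives, and the digest is only usable as a content address if the archiver's output is byte-for-byte reproducible.

```python
import hashlib
import io
import tarfile
from typing import Dict, Tuple


class HashingSink:
    """Write-only file-like object: hashes bytes as they stream past, stores none of them."""

    def __init__(self) -> None:
        self.sha = hashlib.sha256()
        self.size = 0

    def write(self, data: bytes) -> int:
        self.sha.update(data)
        self.size += len(data)
        return len(data)


def stream_tar_digest(files: Dict[str, bytes]) -> Tuple[str, int]:
    sink = HashingSink()
    # "w|" is tarfile's non-seekable stream mode: output goes straight to sink.write().
    with tarfile.open(fileobj=sink, mode="w|") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed metadata keeps the digest reproducible
            tar.addfile(info, io.BytesIO(data))
    return sink.sha.hexdigest(), sink.size
```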
@noisytoot @dalias @wwahammy nope. but if you're actively trying to cache intermediate products, you'd have to either allow persistent writes to /nix or allow writes to substituters, both of which seem like they'd allow for cache poisoning (or at least, they don't seem robust enough that I can guarantee absence of it)
@whitequark @wwahammy Gotta love how much of a regression all the fancy forges are versus plain cgi-bin cgit... 😫
@dalias @whitequark @wwahammy Do you include SourceHut in that analysis? In some ways it's even more minimalist than cgit.
@dalias @wwahammy so I've been responsible for the operation of something more structured for a few years—in my case, a complex Buildbot CI workflow that was updating and building an LLVM/Clang/ARTIQ on a 10 Mbps link (not a typo). I actually did set up the caching system you're talking about here, which used nginx in a forward proxy mode to intercept and store Conda package requests, and it was one of my most nightmarish technical assignments. if I never have to do that again in my life it will be too soon. the correct amount of state in a CI system is zero, because this actually makes it knowable, instead of a bundle of surprises you never know will work from one build to the next because of changes you couldn't predict or track
this doesn't mean that redownloading the same static files over and over is necessary, but the basic principle of "preserve nothing from run to run" is the only way to stay sane
@dalias see, I don't really like talking to you because of your tendency to arrogantly jump to conclusions without ever doing a bare minimum of research
@dalias not "Huh, I wonder why it is that Forgejo does that?" (I don't know, but I suspect it has something to do with IO load from repeatedly requested archives), but directly to "It's a regression compared to [favorite project]!". it's insufferable
@whitequark @dalias @wwahammy “10 Mbps link” That's a nice fast UART you've got there!
@whitequark If this is a conversation you'd rather I not continue I'm fine with dropping it.
@whitequark @wwahammy TBH if you can't trust your incremental builds to be incremental, that's something I'd want a good CI to test too. 🤪
Like, both preserving artifacts from parent commit, *and* running a new build from scratch, and asserting that the results are byte-for-byte identical.
No, that doesn't sound fun to implement.
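The check proposed above (build incrementally from the parent's artifacts, build again from scratch, and require identical outputs) comes down to comparing two output trees. A minimal Python sketch, assuming both builds write into ordinary directories; `tree_digests` and `diff_trees` are hypothetical helper names. In practice this also requires the build itself to be reproducible, since embedded timestamps or absolute paths show up as differences even when the incremental build is correct.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*")
        if path.is_file()
    }


def diff_trees(incremental: Path, clean: Path) -> List[str]:
    """Paths that differ or exist in only one tree; empty means byte-for-byte identical."""
    a, b = tree_digests(incremental), tree_digests(clean)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```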
@dalias @wwahammy practically speaking, since most of the traffic is coming from npm/pip/cargo/etc., I think you should be able to reduce load on external services not by intercepting every network request but by providing local on-demand caches of popular (thus expensive to run) repositories. this is unlikely to make much of a difference because the supermajority of the load will continue to come from GitHub, but in a hypothetical world where GitHub implemented this, it would improve things a lot
of course GitHub doesn't care too much because npm traffic should be free for them and I guess they just don't think too much about the rest? gross behavior
@dalias no, I would rather like to see you question your assumptions (that other people just don't know how to build software) more often. which I know is a lot more work, but still
@dalias @wwahammy the unfortunate part about being a comparative drop in the bucket is that you could reduce your traffic by 99.9% and nobody on the other end would even notice. in general it doesn't look like a problem that will be solved unless e.g. PyPI starts responding with 429 to requests from Azure's ASN, in which case it would probably be solved quickly
from memory, the latest plan on this was to start charging the biggest bandwidth users, but I'm not sure where that's at. maybe @glyph knows?
@whitequark I mean I feel like it's less of an "assumption" and more of a long history of unpleasant experiences.
@MrDOS @dalias @wwahammy I don't think I have words to adequately describe waiting for Conda to download a build of LLVM you just uploaded there minutes ago... for 90 minutes... then deciding to discard everything it's done and download it again, for some inscrutable dependency solver reasons I could never nail down
I think it may have improved since but it's why I still have a visceral reaction to Conda. it's basically like this
@whitequark @MrDOS @wwahammy A similar visceral reaction is probably a large part of my rage at this kind of stuff.
@whitequark @dalias @glyph it really seems like it comes down to "GitHub doesn't want to fix it"
@dalias @whitequark @wwahammy I hope I didn't lose the right end of this thread; there are so many side replies it needs its own representational format. Only meta communication from my side:
Thanks for the civil discussion, a rarity on the public Internet when different opinions clash. I know everyone brings their own experiences, assumptions, and opinions, some pointing in a comparable direction and others diametrically opposed.
Anyway, I enjoyed this discussion wholeheartedly, not because it could also be a panel discussion but because it highlights the reasons behind the actions being taken (with different conclusions, but that is okay in my book)
I'd like to see more of this in the future 🙏🏾
@dalias @whitequark @wwahammy these can be solved by hosting your own GitLab, Forgejo, or Gitea instance, using an artifact storage (either built-in or something like Nexus) and not overcomplicating your CI setup (e.g. just calling the script/build system/test rather than having entire scripts in the CI)
@whitequark i don't hate ci in general, but i hate how github has created a ci system that is near impossible to replicate locally. good luck testing it when you're making changes to it, and forcing people to push to see if their changes pass tests because they cannot replicate ci locally leads to awful github use
@whitequark how many "fix ci (for real this time)" commits could have been avoided if github actions could be run locally
@whitequark and i don't mean via act, which isn't even a perfect port because it's a different container image, i mean first party support from github
to me, for any ci system to be useful, it MUST be runnable locally
@SRAZKVT so i suppose i'm in a better bucket re: act now that with forgejo it does literally use the same image as ci :p
@whitequark yea forgejo is afaik the only all-in-one forge that actually has reasonable ci. maybe sourcehut's ci as well? but i've never looked at it
@SRAZKVT for sure i have a lot of criticism of it, but i've been more happy with it than github in many ways
@whitequark for nearly everything i've looked at, forgejo has been handling proper git practices much much better than github ever did. did a force push? fuck you, full diff only, you see nothing else. forgejo at least is more helpful
also merge strategies, forgejo allows more options, including fast forward merge iirc, which github just ignores
@whitequark today i'd say forgejo is a better github than github
@dalias @whitequark @wwahammy problem with trust in this context is that it's not sufficient to trust without some approach to detecting and remediating failures, and then when you have infrastructure for detecting and remediating incremental build failures it often looks a lot like just clearing the cache and running the whole build again. tension between "i should be able to rely on this working" and "it's not physically possible to be sure this always works, so i need to manage the fallout"
@whitequark @dalias @wwahammy plus, if you're worried about CI cpu-hours you could also Just Make Your Cold Build Faster. in C and C++ particularly there's a lot of approaches there with varying degrees of flakiness (e.g. unity files, pch, fixing your code structure), where the incremental build is only one flaky and often ineffective option of many.
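Of the approaches listed, unity builds are the easiest to show in isolation: several `.cpp` files are `#include`d into one translation unit, so shared headers are parsed once per batch instead of once per file. A minimal Python sketch of a generator for such files; `unity_files` and the batch size are illustrative names, not any particular tool's API. The flakiness mentioned above is real: file-local names (`static` functions, anonymous namespaces, macros) from different files now share one translation unit and can collide. CMake also offers this natively through its `UNITY_BUILD` target property.

```python
from typing import List


def unity_files(sources: List[str], batch_size: int) -> List[str]:
    """Group C++ sources into unity translation units of at most batch_size files each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(sources)  # stable order so the generated files don't churn
    units = []
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        lines = ["// Generated unity file: do not edit."]
        lines += [f'#include "{src}"' for src in batch]
        units.append("\n".join(lines) + "\n")
    return units
```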
@whitequark @wwahammy if i'm understanding it correctly, the system most similar to what @dalias seems to want is... google's internal infrastructure (at least, according to very-outdated memories of it)
this of course comes with the usual "being incompatible with everybody else" and picking up "just hyperscaler problems"
but the worst part is that it makes "buildable on your own systems" *even harder*, because of all the extra "stuff" (caching, storage space, compute, and bandwidth all count as "stuff") that you don't have on your computer
this starts getting into discussions that aren't really about CI anymore though, touching on "should we be building 'smaller' software?" and "should we prioritize closing the gap between 'developers' and 'ordinary users' or at least not make it worse?"
@r @wwahammy @dalias and i think the biggest tension here, which i don't know how to solve, is that the gap between "I can make the software reliable if it is built and run in a predictable environment" and "there is a lot of social pressure on OSS developers to support their software in environments they cannot reliably reproduce or sometimes even access" is ever-widening and burnout-inducing
personally, I think that while the implementation of Bazel shipping a hermetic toolchain is horrendous to interact with and actively hostile to any sort of debugging, in concept this is sound. I should not be on the hook for supporting environments I can't reproduce (a dual of: I should be on the hook for making sure the software runs reliably in environments I do reproduce)
@whitequark @wwahammy @dalias yup, that conversation is *also* currently actively making the rounds (discussions about Rust+cargo build times)
@r @wwahammy @dalias I think the distro behavior of "we're going to replace random components of your software with random things we think are good according to a top-down bureaucratic policy and then implicitly leave you as the point of contact for the result" is unjustifiable and is driving much of said burnout, but the less extreme version of the expectations is a burden too, just not so big that it warrants a conflict over it most of the time
in particular, I think any software of mine that I regard as "secure" must no longer be regarded as "secure" once the distro has replaced some of its components in an uncoordinated way. or they should at least rename it
Firefox got a lot of flak for their trademark policy but they were right
@whitequark @wwahammy @dalias don't have enough personal experience on the maintainer end to comment too much on that, but even as just a user i agree that the distro social contract/purpose/value hasn't been properly articulated and updated in many years
not just firefox, the xscreensaver debacle comes to mind
this is all *extra frustrating* because one of the worst "offenders" seems to be debian, which is also one of the distros most attempting to hold back the tide of YOLO
don't have any answers here, but imo the opaqueness of packaging and "distro-scale ops" really doesn't help
@r @wwahammy @dalias so Debian's stated policy is that if you encounter an issue with a package in Debian you should file a bug with Debian and only with Debian (which they will then triage). this is, actually, mostly a fair policy. it has two issues:
I think the collaboration model where in a routine workflow information flows one direction (upstream to downstream) is incompatible with a desire to patch things whenever. while a distro is fully within its legal rights to do this, I view this as a consent issue: I should be able to opt out of having modifications I don't approve of be attached to the reputation of my project by an entity with infinitely more recognition without invoking the tools of state violence
@whitequark @wwahammy @dalias hah i actually knew about that rule. dunno how many people do though. debian bts is *also* not the most intuitive software in the current day
fwiw i'm vaguely okay with the "trademarks" part of intellectual property, compared to the brokenness of copyrights and patents. (i mean, within the context of being not a fan of state violence.) not sure what the general "f/oss culture" opinion is, since it doesn't seem to come up as often
the information flow model is *definitely* broken. i wonder how much of this is *still* downstream of the "unix wars" era of vendors you couldn't do anything about? (maybe related: the hell that is terminfo. maybe "compiler horrible glue" (ABIs, driver) too?)
@r @wwahammy @dalias I'm also basically okay with trademarks but I don't think we should be building norms in open source software around trademark enforcement; that is incredibly inequitable, even more so than with copyright enforcement
I don't know what causes this. maybe it just organically developed like that since back in the day distros were less of a "the entire world depends on this" and you wouldn't email a guy whose software you're using whenever you're using it
otoh, hearing that my software is used generally feels good, so even from that perspective it seems flawed