Conversation

We are struggling to keep Codebreg.org available for unauthenticated users due to massive abuse of expensive endpoints.

Our current priority is keeping Codeberg.org responsive for authenticated users.


@codebergstatus
How about using nginx or Anubis CDN to prevent abuse while respecting users' privacy?

Always love Codeberg's mission.


@codebergstatus I can't seem to log in... neither on codeberg nor on git.disroot.org
is the service down for account holders as well?


@Pouakai @codebergstatus they lock out lynx users. Plus, Anubis is also slop…


@mirabilos @Pouakai @codebergstatus And Anubis is easy for someone with slop-engine-scale GPU resources to compute. It puts more burden on ordinary desktop and mobile browser users than on the abusers.


@codebergstatus Thanks for the heads-up and good luck keeping those bots under control :) I was able to log in just now, so I don't complain :)

genai, slop, codeberg
@lumi@snug.moe
codeberg is based on forgejo, forgejo is based on gitea, gitea is kinda based on gogs, and gogs has claude commits
ive not checked if any fork cleans ai slop so i assume there is certain amount of slop in codeberg codebase itself potentially

@codebergstatus i had a lot of issues when hosting forgejo as well, i managed to mitigate them with Anubis + rate limiting via nginx + fail2ban

i mean we're still cranking like ~10 gb a day on average but better than like 150 or however much it was before idr :D

best of luck fighting them bots
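
Editor's note: the post above mentions rate limiting in front of the forge. As a rough illustration of the general idea only, here is a minimal per-client token bucket in Python. It is not nginx's implementation and not the poster's configuration; the class name, the `rate` and `burst` parameters, and the client key are all assumptions made for this sketch.

```python
import time


class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate      # hypothetical refill rate, in requests per second
        self.burst = burst    # hypothetical maximum number of stored tokens
        self.clock = clock    # injectable clock so the logic can be tested
        self.state = {}       # client key -> (tokens, time of last update)

    def allow(self, client):
        """Return True if `client` may make a request now, spending one token."""
        now = self.clock()
        tokens, last = self.state.get(client, (self.burst, now))
        # Refill for the elapsed time, never exceeding the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[client] = (tokens - 1, now)
            return True
        self.state[client] = (tokens, now)
        return False
```

A caller would key the bucket on something like the client address and answer with HTTP 429 when `allow` returns False. A real deployment also has to expire idle entries, which this sketch leaves out.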


@codebergstatus keep fighting the good fight, we all appreciate the time and hard work you put into this public good.

genai, slop, codeberg

@fozunja @lumi I think the timeline of the forks does not line up with your assumptions.

genai, slop, codeberg

@fozunja @lumi If those based-on relationships are forks, they happened ages before LLM codeslop was a thing. What gogs is doing now is irrelevant.

genai, slop, codeberg

@fozunja @lumi gitea hard-forked gogs long ago

@codebergstatus Shouldn't have drama forked Gitea, otherwise you would already have some of the tools for that.

@Pouakai @codebergstatus they're struggling to keep codeberg available to users, not to restrict even more users from accessing it.


@phnt @codebergstatus congrats, you missed the point big time


@phnt @codebergstatus also, this solution sucks

genai, slop, codeberg
@dalias@hachyderm.io
i just checked and forgejo had kinda long thread with ai discussion
and result seems fine:
https://codeberg.org/forgejo/governance/commit/f7fac77037e7646e32982b99750e260fe13e8dd4

(gitea has ai slop too though, seems like forgejo is the cleanest of this trio)
genai, slop, codeberg
im sorry for making hasty conclusions, now forgejo completely banned ai generated stuff
Forgejo does not accept works of authorship (code, documentation, etc.) either partially or completely generated by AI due to legal uncertainties

@codebergstatus

Codebreg.org was so expensive, you had to park it?!

@tragivictoria @codebergstatus I know the point, I've been trying to keep my forge running on the same box as this instance in a usable state as well for 2 years without being too intrusive about it. I know the issues and problems associated with it.

My point is that if Codeberg didn't fuel dumb drama and fears years ago, they would be in a better position to fix their current issues than what they currently have.

>also, this solution sucks

I know that it isn't the greatest implementation. It's more of a band-aid solution than anything else. Still, it's a better start than a full on/off switch.
genai, slop
@mkljczk@fediverse.pl forgejo and codeberg are clean

gitea isnt though
@phnt @tragivictoria @codebergstatus i did something similar for git.pleroma.social with caddy setting and checking cookies, works alright. traffic was indeed unusably bad before.
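
Editor's note: the post above describes a gate that sets and then checks a cookie. The poster did this in Caddy; the Python below is only a sketch of the underlying idea, using an HMAC-signed cookie that carries its own expiry. `SECRET`, the function names, and the `expires.signature` cookie layout are assumptions made for illustration, not anything taken from the poster's setup.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to clients


def _sign(expires_text):
    """Hex HMAC-SHA256 of the expiry timestamp under the server secret."""
    return hmac.new(SECRET, expires_text.encode(), hashlib.sha256).hexdigest()


def issue_cookie(now=None, ttl=3600):
    """Build a cookie value of the form '<expiry>.<signature>'."""
    now = time.time() if now is None else now
    expires_text = str(int(now + ttl))
    return f"{expires_text}.{_sign(expires_text)}"


def cookie_is_valid(cookie, now=None):
    """Accept only cookies we signed ourselves that have not yet expired."""
    now = time.time() if now is None else now
    try:
        expires_text, signature = cookie.split(".", 1)
        expires = int(expires_text)
    except (AttributeError, ValueError):
        return False  # missing or malformed cookie
    expected = _sign(expires_text)
    # Compare as bytes in constant time to avoid leaking the signature.
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires
```

A request without a valid cookie would get a page that sets one and reloads; a request with a valid cookie goes straight through. Clients that never store cookies are turned away, which is the whole effect, and also why such a gate stops working once a scraper starts handling cookies.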
genai, slop, codeberg

@fozunja @lumi godammit


@dalias @mirabilos @Pouakai @codebergstatus
Anubis uses bog standard SHA256 as PoW method. It’s the equivalent of a wet cheeto for protection, you can generate a difficulty 6 solution in <1ms on any CPU made past 2015 with a decent implementation.

It’s only really doing anything when not being handled at all
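
Editor's note: for readers unfamiliar with hash-based proof of work, here is a minimal Python sketch of the kind of SHA-256 puzzle the post above describes. It assumes that "difficulty" means the number of leading zero hex digits required in the digest, and that the hashed input is the challenge string followed by a decimal nonce. Both are assumptions made for illustration, not a statement of Anubis's exact scheme.

```python
import hashlib
from itertools import count


def _digest(challenge, nonce):
    """Hex SHA-256 of the challenge string followed by the decimal nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def verify(challenge, nonce, difficulty):
    """Server side: a single hash checks a claimed solution."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


def solve(challenge, difficulty):
    """Client side: brute-force the smallest nonce that satisfies the check."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

Under that assumption, each attempt succeeds with probability 1/16^d, so a solver needs about 16^d hashes on average (roughly 16.7 million for d = 6), while verification always costs one hash. How long the solving takes depends entirely on how fast the solver can hash, which is the asymmetry between native code and in-browser JavaScript being argued about in this thread.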


@privateger @dalias @Pouakai @codebergstatus my Linux laptop is from 2007.

Trying to visit the Linux kernel website overheated it for several minutes on end.


@privateger @dalias @Pouakai @codebergstatus and this proves that it’s just as bad environmental pollution as the whole blockchain racket was and needs to be forbidden.


@mirabilos @dalias @Pouakai @codebergstatus
Note how I said decent implementation, meaning native code. JS is not that. It’s a PoW algo implemented in a way that makes actual client users slower by >100x.

The most resistance it poses to scraping is a mild inconvenience while adding support for handling it


@dalias @mirabilos@toot.mirbsd.org @Pouakai @codebergstatus i don't understand this. anubis is very easy for me to sit through because i only do it once. someone spamming at scale has to pay a higher cost to do so. for resources that are meant to be accessed by humans and not programmatically, it seems appropriate


@hipsterelectron @Pouakai @codebergstatus It's not just once though. Every time I visit the sites using it I get challenged again. No idea why they don't make the cookies persist forever. And # links don't work because it loses the # part.


@hipsterelectron @Pouakai @codebergstatus I think they suspect humans are going to be giving/selling their cookies to abusive scraper bots and thus make them short lived to limit the value.


@dalias @Pouakai @codebergstatus i'm glad their focus in general is on authenticated users who have a reputation in the system to dissuade them from abuse


@hipsterelectron @Pouakai @codebergstatus I don't understand why they impose Anubis on authenticated users at all. If you catch an authenticated user abusively scraping you just ban their account.


@dalias @hipsterelectron @Pouakai @codebergstatus I don’t see any indication they would be using Anubis for authenticated users, even for example go-away inspects the auth cookie for Forgejo probably in the style of forward_auth and doesn’t show the challenge for authenticated users


@natty @Pouakai @codebergstatus @dalias @hipsterelectron It is possible to do that in Anubis too, users just don't. I probably need to spend the energy to write a tutorial or something. I've been kinda burnt out.

genai, slop, codeberg

@fozunja note the wording "[...] due to legal uncertainties [...]":

This means doesn't oppose it for , or reasons...

re: genai, slop, codeberg

@kkarhan
Maybe, but that is something the community can fix with discussions and the right arguments. Until then it keeps the slop out.
@fozunja

re: genai, slop, codeberg

@momo @fozunja agreed.

Already people had to enough stuff due to

genai, slop, codeberg

@lumi there is a meeting about this later this month! If you're a member, please attend!

@mirabilos @dalias @Pouakai @codebergstatus

genai, slop, codeberg

@lumi @mirabilos @jessebot @Pouakai @codebergstatus Sounds like a great reason to become a member!

re: genai, slop, codeberg
@lumi @mirabilos @dalias @Pouakai @codebergstatus I don't think that's a good idea. using AI and lying about it is worse, because it makes it harder to avoid
re: genai, slop, codeberg

@noisytoot @Pouakai @lumi @codebergstatus @mirabilos If we only have to deal with it coming from the most awful people who are willing to lie and violate other people's boundaries, rather than coming from all sorts, then

(1) there's a lot less of it to deal with, and

(2) we already have plenty of other red flags to spot these people and plenty of reason to ban them from our projects.

genai, slop, codeberg

@lumi @mirabilos @dalias @Pouakai @codebergstatus hi, a proposal will be put up to an asynchronous vote among the 1700+ e.V. members as part of this year's annual assembly, which takes place next sunday


@phnt @codebergstatus I have that setting on my Forgejo instance, pretty sure Codeberg has it in their toolbox, but the goal of Codeberg is to be a public forge, where stuff can be published !

@Sobex @phnt @codebergstatus

> but the goal of Codeberg is to be a public forge, where stuff can be published !

Then they need to put on their big boy pants and buy some more compute and bandwidth and also build a CDN. They're trying to be something on the public internet that they clearly cannot afford to be and they don't have the skills and money to build the infra they need.

(I'm still laughing about their Postgres corruption fiasco)
@feld @codebergstatus @Sobex I mean it's a nonprofit run almost exclusively by volunteers. Can't really blame them at that point. The community around them is the bigger issue I think. Building a foss equivalent of GitHub simply isn't possible now, so instead of most foss people saying "migrate to Codeberg from Git{lab,hub}", it should be "run your own if you can" instead.
@phnt @codebergstatus @Sobex they should treat it like a co-op and require membership and dues.

@codebergstatus Having a similar issue on my instance, even with anubis. Facebook loves scraping everything over and over.

@moanos @codebergstatus Not really, at least in Forgejo specifically. Forgejo doesn't have the "expensive" option recently introduced in Gitea, which requires sign-in only on some of the more expensive routes and leaves the rest public. Last time I checked they had some cookie setter/checker that triggered on most things outside of the main repo page and I guess that still isn't enough.
@phnt @moanos @codebergstatus

> Forgejo doesn't have the "expensive" option recently introduced in Gitea

lol wat why wouldn't they just cherry-pick that?

Gitea continues to be the superior fork for having less nonsense and continued reasonable developer behavior. Like when I pinged about this:

https://github.com/go-gitea/gitea/issues/34521

now Forgejo devs saw this come through and made the same change on their end, but look at the code style differences here. Gitea's is far superior IMHO

https://github.com/go-gitea/gitea/pull/37985/changes

https://codeberg.org/forgejo/forgejo/pulls/12905/files

@codebergstatus At the moment, this causes some "downstream" projects to fail, e.g. building librewolf-bin on AUR (ArchLinux) fails. https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=librewolf-bin#n73

> fatal: unable to access 'https://codeberg.org/librewolf/source.git/': TLS connect error: error:0A000126:SSL routines::unexpected eof while reading

I think you have a great mission and I hope you'll find a solution! ❤️

@Sobex @codebergstatus Gitea has a third option in the same setting that requires authentication only for some of the more expensive routes that usually cause issues. That's the difference between the two.

>but the goal of Codeberg is to be a public forge, where stuff can be published !

If they are getting flooded, which they have been for at least two years now, it's better to hide certain things from anonymous users than to let scraping bring the whole service down altogether. Things like blame, which are expensive and rarely used legitimately.
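
Editor's note: the idea in the post above, requiring sign-in only on the expensive routes, can be sketched in Python as a simple path check. The URL patterns below are hypothetical and chosen only to match the kinds of pages named in this thread (blame, commits, diffs, graph); they are not Gitea's actual route list or setting.

```python
import re

# Hypothetical patterns for expensive pages under /<owner>/<repo>/...
EXPENSIVE_ROUTES = [
    re.compile(pattern)
    for pattern in (
        r"^/[^/]+/[^/]+/blame/",
        r"^/[^/]+/[^/]+/commits?/",
        r"^/[^/]+/[^/]+/compare/",
        r"^/[^/]+/[^/]+/graph$",
    )
]


def requires_sign_in(path, signed_in):
    """Return True when an anonymous request targets an expensive route."""
    if signed_in:
        return False
    return any(pattern.search(path) for pattern in EXPENSIVE_ROUTES)
```

Anonymous visitors can still reach the repository page and file views, while the routes that trigger heavy git operations redirect to the login page. Which routes belong on the list is exactly the disagreement that follows in this thread.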

@phnt @codebergstatus I replied earlier, misunderstanding that the setting had a difference between gitea and forgejo; my bad.

Still, the gitea feature as-is is not sufficient for what codeberg wants, as it also blocks unauthenticated users from looking at any code, fetching any raw files, looking at issues or pull requests, and looking at the wiki. I get blocking commits, diffs, blames, graph, and some other things, but this prevents unauthenticated users from doing anything except for pretty much looking at the README.

At least looking at code on the main branch should be accessible without logging in, and certainly getting an overview of issues.

@taylor @phnt @codebergstatus

> I get blocking commits, diffs, blames, graph, and some other things, but this prevents unauthenticated users from doing anything except for pretty much looking at the README.

> At least looking at code on the main branch should be accessible without logging in, and certainly getting an overview of issues.

I disagree with all of this except about the wiki. That should be open still.

You want to look at the code? Download a source tarball.

You want to look at diffs/blames/graphs? Make an account or git clone it and do it locally with your own compute. The developers you want aren't using their browser as an IDE anyway.

Can't make an account because it's locked down? Well that's for good reason

@codebergstatus sorry for the stupid question guys. But why would you want an unauthenticated endpoint for a repo? 🫡🥰

I'm an authenticated user btw...


@handi @codebergstatus git clones?? And what about release files that would be downloaded?
