Conversation

The fuck is this.

WHERE ARE MY BOTS?!

I am not used to 100req/sec total. With ai.robots.txt being the top one, asn and faked-browser barely even registering?

Huh.

@algernon
they're gathering up forces for a larger strike /s

@wolf480pl tbh, that is exactly what happened last time I had a lull.

@algernon now heading into part 2 of the Baba and the 'Wei saga, where they realize their chromes are bringing back the same garbage

@zaire

Baba: are you thinking what I'm thinking, 'Wei?
'Wei: that we should scrape the whole wide web? Yes, Baba!
Baba: No, you idiot. We only get garbage. We'll go back to the cage and Vibe!

@Ganneff A little!

For the past year, I never saw my incoming request rate below 100req/sec for longer than half a day (except when I firewalled half the internet off). It's at ~60req/sec for ~23 hours now.

I have not seen anything like this in a year. Never, actually, never since I started monitoring the bots.

I had lulls, yes, but... never this long, and never with Alibaba & Huawei almost completely disappearing. Never with the faked browsers barely registering.

Well, except when I firewalled them all off, but... that doesn't really count. They would've been here, if they could be. Nothing is stopping them now! They just... don't come.

Now if this stayed like this forever, that would be grand. If any AI company wants me to stop working on iocaine: this is how you do it. You stop visiting.

Preferably you take one hard look at my /robots.txt and fuck off forever. Or better yet, just close shop and do something useful, but that's probably too much to ask.

@algernon Conspiracy theory: Alibaba and Huawei actually operate from inside Iran.

@liw The worst part is that I can't immediately refute that.

(I will keep an eye on my metrics and their correlation with the internet situation in Iran...)

@algernon do you have metrics for the total number of requests that aren't blocked? what if they just aren't crawling the iocaine pages at all bc they're detecting iocaine & iocaine can't detect them, so they're still crawling your normal pages

@solonovamax I do, that's the green "default" line. They ain't coming through. Not in large numbers anyway.

@algernon the funniest explanation would be that we all first heard about the big crash/onset of the third winter from watching iocaine logs

@technomancy ROFL.

But... how do I sell the rights to Hollywood so they can make a film of my dashboards... hmm....

In half an hour, the "150" (request/sec) range will disappear off of the charts for the past 24 hours.

Feels so weird.

Is this what success looks like?

What if they just want to deny me data, so I can't do crawler research?

Well, if that's the case, they're in for a surprise. I have way more sources of data than my own sites.

If any of you scrapers are reading this:

  1. Fuck you.
  2. This doesn't mean I want you back.
We're soon entering "80 is the highest number on the Y axis" territory, and this is just unbelievable.

No, they're not getting past my defenses. The green line is the "default" ruleset that lets things through. It's no different from normal.

My firewall doesn't block them. The load on my servers is noticeably lower.

I forgot this feeling of calm.

Yes, ai.robots.txt is still blocking ~60 requests / sec, but that's such a tiny amount compared to what I normally get hit with.

I wouldn't mind that going away either, mind you.

@algernon I doubt it is, but it would be cool if this was some early warning to interesting news. (wistfully hoping to see they just shut it down) (not gonna happen, but i can wish)

...and I have the core idea for the next Baba and the 'Wei episode. Coming to a blog near you.... proooooobably tonight.

@algernon Maybe someone has finished gathering data for a training run?

@datarama they pretty much all disappeared. Not just a single distributed crawler, but... like, all of the disguising ones are gone.

@algernon The other obvious explanation, namely that someone put your site on an ignore-list because they realized they were getting served garbage, also doesn't make sense then.

@datarama i'm seeing the same thing on a canary domain that doesn't serve garbage, so... probs not an ignore list either.

But: others are still seeing regular crawler assault. So this retreat feels staggered at best.

@algernon Is your canary domain on the same IP as the wonderful garbage spout?

@datarama No. Different IP, different hoster, different country, different domain, and neither my name nor any reference to me appears anywhere near it. Entirely different software stack too (OpenBSD + httpd, no iocaine).

(It's not even registered in my name, I just control it.)

Oops! I just had a minor ASN blip! Baba and the 'Wei came back for half an hour.

MY PETS ARE ALIVE!

So... about that new Baba and the 'Wei story. Its title's gonna be: "Baba and the 'Wei do Hollywood".

There might be spicy scenes.

@algernon I still get a lot of requests (almost 700 reqs/m, if I calculate correctly). Maybe they specifically excluded your sites?

@jak2k Dunno. I'm seeing the same drop on a canary domain (different ip, different hoster, country, stack, no iocaine, etc), and I've heard others report a drop in activity too.

But I've also heard - and seen - other places still seeing the "normal" amount of absurdly high request rates.

...and it is probably not coming today, because I lost inspiration halfway through.

We are now in the 80s territory. If things stay this way, we'll be in the 60s territory by two in the morning.

Still feels weird... if I firewalled off about a hundred IP ranges, iocaine would be out of a job.

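That kind of range-based blocking can be sketched with Python's standard `ipaddress` module. This is only an illustration of the idea; the networks below are documentation placeholders, not the actual scraper ranges:

```python
import ipaddress

# Hypothetical example ranges -- stand-ins, not the real scraper networks.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3, placeholder
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2, placeholder
]

def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any blocked range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_RANGES)

print(is_blocked("203.0.113.42"))  # True
print(is_blocked("192.0.2.1"))     # False
```

A real firewall would do this in nftables/pf rather than userspace, but the membership test is the same idea.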
@algernon Can't live with them, can't live without them.

@Ganneff Oh I could definitely live without them. But they'd have to take the rest of the crap with them, that faint yellow area on my stats is still a nuisance!

@algernon I have seen this happen previously. They came back within a few months.

@aaron Yeah, I fully expect them to come back, but... this is the first big lull I'm seeing on my own infra. I've had 8-12 hour "outages" where the total request/sec fell below 100. But it's over 24 hours now.

Wouldn't mind if it stayed that way a little bit longer.

@algernon Ok, honestly, I would have been really surprised if the answer had been no.

@Ganneff That would have been half correct too. Originally, it was piss yellow purely by accident. I wanted to preserve that accident, so made it explicit.

I have enlisted the Bestest Detective I know to aid me in finding my bots. She's keenly observing the scene.

#DogsOfMastodon

@algernon Btw, how load-bearing is the ASN ruleset? Are there many scrapers that match exclusively on it? I've been massively procrastinating on setting up the maxmind db; wonder how important it is.

@KFears So-so. In most cases, the other rules catch them too, but I ordered ASN early, so I could highlight Alibaba & Huawei in stats easier. Usually, there are very few matches. But there are waves sometimes where they piggy-back on real Chromes (it's always Chrome), and even avoid poisoned URLs. Then the ASN ruleset catches them.

That happens rarely, though, so in the grand scheme of things, I would not consider it load bearing. But it is useful.

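The ordering trick described above - putting the ASN rule early so its matches get attributed to it in the stats, even when a later rule would also have caught the request - can be sketched as first-match classification. This is a minimal illustration, not iocaine's actual configuration or API; the rule names and predicates are made up for the example:

```python
# First-match rule chain: the ASN rule is placed early so matches get
# attributed to it in the stats, even though a later rule (e.g. a
# user-agent check) would often catch the same request.
RULES = [
    ("asn",           lambda req: req.get("asn") in {"alibaba", "huawei"}),
    ("ai.robots.txt", lambda req: "GPTBot" in req.get("ua", "")),
    ("faked-browser", lambda req: req.get("ua", "").startswith("Mozilla/")
                                  and not req.get("real_chrome", False)),
]

def classify(req: dict) -> str:
    """Return the name of the first matching rule, or 'default'."""
    for name, matches in RULES:
        if matches(req):
            return name
    return "default"

print(classify({"asn": "alibaba", "ua": "Mozilla/5.0"}))  # "asn"
print(classify({"ua": "GPTBot/1.0"}))                      # "ai.robots.txt"
print(classify({"ua": "curl/8.0"}))                        # "default"
```

Reordering the list changes only which rule gets credited, not what gets blocked - which is why the ASN rule isn't load-bearing but is still useful for the charts.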
A day later, and my bots are still gone. Baba and the 'Wei had a 40 minute spike, but they have barely crossed the 60 request/sec line.

Not going to lie, I'm enjoying this calmness.

@algernon I'm weirded out by the bots being gone.

I wonder what's up.

@datarama Me too, because they're only gone from some sites. Other people, and other sites of mine still see them.

Maybe a particular new model is done collecting, and now that they're filtering, they realized I only serve them garbage and they're not getting through, so they put me on a blocklist. We'll see if that's the case if they stay away. I fully expect them to come back, though.

This is something I will never know for sure, and even if I had a chance of figuring out, I wouldn't spend effort on it. Not worth it. My time is better spent enjoying the quiet!

@algernon @datarama it’s possible they thought the return on investment was not there

I suspect large codebases/forges will still get hammered until the bubble pops, because it’s worth it to try and bypass defenses like iocaine with real browsers

@Byte @datarama If that's the case, if they're really gone, that'd be fantastic. Then the method works!

(Luckily, one of the larger GitHub alternatives is also behind iocaine + Nam-Shub of Enki, so... hopefully this'll pan out similarly for them too in the longer run!)

@Byte @algernon Yes, but *all of them*? These goons don't strike me as the most cooperatively-inclined bunch.

But our gracious host understandably prefers to enjoy the peace and quiet, so I won't speculate further here.

@datarama @Byte Well, the Usual Suspects¹ that don't try to hide are still here. "Only" the worst ones that try to hide are gone - and they're not completely gone either, just fell from ~200 req/sec to ~3-4 req/sec.


  1. Anthropic, OpenAI, Meta, Google, etc ↩︎

@datarama @algernon do the usual suspects who don’t try to hide still ignore robots.txt?

@Byte @datarama Of course they do.

There are some, like Google, who do a bit of performance art to try and prove they respect it, but that's just that, performance art. For all intents and purposes they still ignore it.

(With that said, my data is half a year old. I serve garbage to them at /robots.txt too, because why bother telling them to fuck off when they're gonna ignore it anyway.)

@Byte "We'll respect robots.txt when directly crawling, but we'll crawl your site if anything links to it. Including our index, or your own site."

I mean, Google's various bots hit my sites with ~9k requests a day, even though I have x-robots-tag: noindex, nofollow, nosnippet, noimageindex, noarchive, nocache, notranslate in every response. They can't access my /robots.txt¹, but even if they could, they still hit me with the same amount of requests.

Now, 9k hits isn't much, but it's about 8.9k more than it should be. They also load resources found in the HTML, despite the x-robots-tag header, so they barely even respect that.


  1. This is my robots.txt ↩︎

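For reference, this is roughly how a well-behaved client would interpret that x-robots-tag header - a minimal sketch of directive parsing, not any real crawler's logic:

```python
def robots_directives(header_value: str) -> set:
    """Split an X-Robots-Tag header value into individual directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}

def may_index(header_value: str) -> bool:
    """A polite crawler must not index when 'noindex' (or 'none') is present."""
    return not ({"noindex", "none"} & robots_directives(header_value))

hdr = "noindex, nofollow, nosnippet, noimageindex, noarchive, nocache, notranslate"
print(may_index(hdr))   # False
print(may_index("all")) # True
```

The point of the complaint above is that this check takes a handful of lines, and the bots still don't do it.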
Baba and the 'Wei are showing signs of life. They had a 6-hour scraping episode between ~20:00 and 02:00 my time, roughly 03:00-09:00 China Standard Time.

I wonder what they will do over the weekend. Will I have my playthings back?

I have some surprises waiting for them.

Also, for some fun facts! You see those green spikes?

That's when y'all boost and star my toots. ~75% of the green line, the requests that pass through, are from fedi software.

A couple of hours later, Baba and the 'Wei ain't back yet. But the weekend wave usually starts around 17 my time, so there's a few hours to go.

I wonder what will happen!

66% Baba and the 'Wei do what they do every weekend: try to scrape the whole wide world
33% This weekend's episode is cancelled.
1.5 hours, and we'll see! I'm camp "try to scrape the whole wide world".

Huh. This is not what I expected.

Baba and the 'Wei are back, but... they're at the rate they crawled at some 12 hours ago, not nearly at the level they crawled last weekend.

Now, I've seen them do this pattern before, coming at me every ~11.5-12.5 hours. It's not unusual.

But it's not their weekend pattern of late.

So it looks like this weekend's episode is cancelled. We're not going to be without any Baba and the 'Wei scraping, but instead of a weekend episode, we're gonna have some old re-runs.

I'm seeing a very tiny uptick from Baba, but it's staying below the ai.robots.txt line. As if it was reading my charts.

Maybe that could be another story in the "algernon presents Baba and the 'Wei" series?

Baba's gone. It did its ~6-hour scraping, like ~16 hours earlier, and then it left. Its crawling speed never exceeded that of ai.robots.txt.
