Conversation

Christine Lemmer-Webber

This is interesting. In my blogposts analyzing ATproto I had compared the shared heap vs message passing from a CS perspective

https://dustycloud.org/blog/how-decentralized-is-bluesky/
https://dustycloud.org/blog/re-re-bluesky-decentralization/

Ian Preston of Peergos did the actual formal mathematical proof of the incentive structure: https://peergos.net/secret/z59vuwzfFDp45jmsA6Wj2jc9hemCjB4JJHB81iosJsA9GAVRtkbrqBs/1024927538#%7B%22app%22:%22markup%22%2c%22path%22:%22ianopolous/docs%22%2c%22args%22:%7B%22filename%22:%22social-scaling.note%22%7D%2c%22writable%22:false%2c%22secretLink%22:true%2c%22linkpassword%22:%22UfAQURKSTTmM%22%2c%22open%22:true%7D

> It is interesting that this is independent of N. Let's say you have 1000 servers, and 1000 followers per user. Then the shared heap model uses about the same network bandwidth. With a small number of servers SH can be better, with many servers AP is better.

> The conclusion is that the shared-heap model builds in a structural incentive to keep M small, and thus has a natural centralizing force. Conversely there is an incentive in AP to keep F small.

I.e., there is a mathematical incentive in ATproto to only have a few large players.
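
The quoted conclusion can be illustrated with a toy per-post cost model. This is only a sketch under stated assumptions, not a reproduction of Ian Preston's proof: it assumes M is the number of servers and F the followers per user, that shared heap (SH) relays every post to all M servers, and that message passing (AP) makes at most one delivery per follower with no shared-inbox batching. The function names are hypothetical.

```python
def per_post_deliveries(num_servers: int, followers: int) -> dict:
    """Toy model (assumption): deliveries needed to propagate one post."""
    return {
        # Shared heap: every server must see every post, so cost grows with M.
        "shared_heap": num_servers,
        # Message passing: worst case one delivery per follower, so cost grows with F.
        "message_passing": followers,
    }


def cheaper_model(num_servers: int, followers: int) -> str:
    """Return which model needs fewer deliveries per post in this toy model."""
    costs = per_post_deliveries(num_servers, followers)
    if costs["shared_heap"] == costs["message_passing"]:
        return "tie"
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for m, f in [(10, 1000), (1000, 1000), (100_000, 1000)]:
        print(f"M={m}, F={f}: {cheaper_model(m, f)}")
```

In this model neither cost depends on the total number of users, the two models break even when M equals F, SH wins when servers are few, and AP wins when servers are many, which matches the shape of the quoted result.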

@cwebber Would it be wrong to say that the vibes for both felt mostly correct? Where ActivityPub trends towards lots of smaller instances passing messages back and forth, and ATProto is the big, shiny, corporate-backed thing that looks centralized?

@cwebber

May I ask, does that apply if you only store what you need from the network?

@gabboman @cwebber

I don't think that matters - the bottleneck is that with SH you need to read through all messages to determine what *is* relevant to you, while in AP you're only forwarded relevant messages to begin with based on your server's federation and following tree. Same bandwidth and computational costs, just less storage.

[That's not considering making a 'feed' on Bluesky, where maybe you want to look through messages retroactively to find posts or bios containing keywords - that certainly requires storing currently irrelevant posts for future use]

@cwebber @illegaldaydream

It's not such a big deal to do that. I did it and I'm not a genius.

@illegaldaydream

In memory: a set of the DIDs of everyone followed by users on the instance, a set of the DIDs of local users, and a Bloom filter of posts any user has ever replied to.

If an event is by a followed user, or the Bloom filter says a user may have replied to the same post, or it contains a mention of a local user or a hashtag followed by a user, force-store the post.

If it's a like, check whether the post is from a local user (post URLs in bsky are userid/postid, so this is very easy).

If it's a follow, only check for local users being followed.

Very easy to rebuild at fedi scale.
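
The filtering rules above could be sketched roughly as follows. This is a minimal illustration, not Wafrn's actual code: the event shape (`kind`, `author`, `reply_to`, `mentions`, `hashtags`, `subject`), the `InstanceState` and `should_store` names, and the small hand-rolled Bloom filter are all assumptions. It also reads "replied to the same post" as "this event replies to a post a local user has replied to", which is one possible interpretation.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


@dataclass
class InstanceState:
    followed_dids: set = field(default_factory=set)      # DIDs followed by local users
    local_dids: set = field(default_factory=set)         # DIDs of local users
    followed_hashtags: set = field(default_factory=set)  # hashtags followed by local users
    replied_posts: BloomFilter = field(default_factory=BloomFilter)


def should_store(event: dict, state: InstanceState) -> bool:
    """Decide whether a firehose event is relevant to this instance."""
    kind = event["kind"]
    if kind == "post":
        reply_to = event.get("reply_to")
        return (
            event["author"] in state.followed_dids
            or (reply_to is not None and reply_to in state.replied_posts)
            or any(did in state.local_dids for did in event.get("mentions", []))
            or any(tag in state.followed_hashtags for tag in event.get("hashtags", []))
        )
    if kind == "like":
        # Assumed subject format "userid/postid": the owner is the first segment.
        return event["subject"].split("/", 1)[0] in state.local_dids
    if kind == "follow":
        return event["subject"] in state.local_dids
    return False
```

Every check is a set lookup or a handful of hash computations, so the per-event work stays constant, but the instance still has to receive and inspect every event in the stream.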

@cwebber for some reason that link doesn't have an obvious way to actually view the file and I needed to click on "edit" then close the editor to actually view it

@cwebber This is exactly the kind of information I've been looking for lately! Thank you so much.

@gabboman @cwebber @illegaldaydream *implements bubblesort* what do you mean "computational complexity"? Look I successfully sorted a ten-element list, I'm not a genius but my code works

@cwebber @monokeros @illegaldaydream

hosting tech lgbt costs 800 euros a month. it hosts 15k users

app.wafrn.net has 5k users. Hosting it is 35€ a month. And it does both atproto and activitypub

that was uncalled for.

@gabboman @monokeros @illegaldaydream I'm guessing it's an atproto PDS though, which is not the same as hosting the full atproto infrastructure (relay with sufficient long term storage / backfill, etc)

@cwebber @illegaldaydream

We listen to the jetstream and decide what to store. If we decide that an event is worth keeping, we ask the PDS directly for the event and do something that resembles what AP expects you to do more than what AT expects.

Wafrn does not use the bsky appview, and each wafrn server is its own appview.

We don't need the whole network history for stuff. Why would we?

@cwebber @illegaldaydream

correction: not the jetstream, A jetstream.

@gabboman @illegaldaydream Am I right in my impression that this is effectively a filter over all events in the network coming from one relay that is presumed to know all potentially relevant events?

@cwebber @illegaldaydream

One of the pieces is filtering events, which I do. The filtering that the jetstream offers is not good enough.

The other is fetching data from the PDS.

And a third, optional one, to force-fetch some missing replies, is to use Constellation, the same way Red Dwarf does it.

I understand that holding the whole network is an expensive thing. But then again, why work hard when you can work smart-ish?

Then of course there is showing the posts and building timelines.

@cwebber That’s fascinating! But the probability of any user following any other is far from constant. I wonder how that changes the analysis

In privately operated social networks, the celebrities / trending topics are treated differently. Different distribution, guarantees, and internal protocols. Only in the frontend do they appear to be the same

It might be ideologically hard to swallow but maybe distributed social networks should embrace this fact to scale better

@cwebber It is hard to view the proof without using the editor and probably impossible with a screen reader, so here are screenshots. I did the best I could with the equations and added parentheses where things might be ambiguous

@cwebber That can come from original design decisions, too. I routinely look at my intended use cases and pick algorithms and data structures to fit the expected needs. So, if Bluesky were looking at a centralized solution they'd pick shared-heap because it serves that better, whereas the Fediverse would be looking for a distributed solution and would pick message-passing as the better fit.

@cwebber I'm not sure it matters which came first, because either way you end up in a cycle where the initial choice reinforces the design and the design reinforces the choice. But honestly I tend to lean towards the algorithm choices being determined by the intended use cases. Personal experience says that's more common.
