@lyse@lyse.isobeef.org Now increase the indexes on the Twt Subject from 7 bytes to 64 bytes.
@lyse@lyse.isobeef.org Congrats!
Hmm, this question has "Yes" in the lead so far, with 13 votes:
Should we formally support edit and deletion requests?
Thanks y'all for voting (it's all anonymous, so I have no idea who's voted for what!)
If you haven't already had your say, please do so here: http://polljunkie.com/poll/xdgjib/twtxt-v2 - This is my feeble attempt at ascertaining the voice of the greater community on ideas for a Twtxt v2 specification (which I'm hoping will just be an improved specification of what we have largely already built to date, with some small but important improvements).
Starting a couple of new projects (geez where do I find the time?!):
HomeTunnel:
HomeTunnel is a self-hosted solution that combines secure tunneling, proxying, and automation to create your own private cloud. Utilizing Wireguard for VPN, Caddy for reverse proxying, and Traefik for service routing, HomeTunnel allows you to securely expose your home network services (such as Gitea, Poste.io, etc.) to the Internet. With seamless automation and on-demand TLS, HomeTunnel gives you the power to manage your own cloud-like environment with the control and privacy of self-hosting.
CraneOps:
CraneOps is an open-source operator framework, written in Go, that allows self-hosters to automate the deployment and management of infrastructure and applications. Inspired by Kubernetes operators, CraneOps uses declarative YAML Custom Resource Definitions (CRDs) to manage Docker Swarm deployments on Proxmox VE clusters.
@aelaraji@aelaraji.com I think all replies are missing the fact that your auto-completion isn't working. LOL. Or did I misunderstand?
I think that's one of the worst aspects of the proposed idea of location-based addressing or identity: Alice and Bob can both read Twt A at the same location, yet could in fact have read entirely different content. It is no longer possible to have consistency in a decentralised way that works properly.
One could argue this is fine, because we're so small and nothing matters, but it's a property I rely on fairly heavily in yarnd, a property that, if lost, would have a significant impact on how yarnd works, I think.
Unless I'm missing something here, a <url> <timestamp> does not, for me, identify an individual Twt. It only identifies its location, which may or may not have changed since I last saw a version of it. Hmmm.
Also, I'm not even sure I can validly cache, let alone index, feeds anymore if we do this. If the structure of a Twt is such that I can no longer trust that an individual Twt's content hasn't been changed at the source, what's the point of caching or indexing individual twts at all? This makes the implementations of yarnd and yarns (the search engine, crawlers and indexer) kind of hard to reason about.
Also you're right, I guess. But that still requires the author not to change the timestamp too. Hmmm
@movq@www.uninformativ.de I don't think there's any misunderstanding at all. I just treat every line in a feed as an individual entity. These are stored on their own.
@movq@www.uninformativ.de So I obviously happen to agree with you as well. However, in so saying, one of my goals was also to bring the simplicity of Twtxt to the Web and to the general "lay person" (of sorts). So I eventually found myself building yarnd. Has it been successful? Well, sort of, somewhat (but that doesn't matter, I like that it's small and niche anyway).
I agree that simplicity is a good goal to strive for, which is why I'm actually suggesting we change the Twt identifiers to be a simple SHA256 hash, something that everyone understands and has readily available tools for. I really don't think we should be doing any of this by hand, to be honest. But part of the beauty of Twt Subject and Twt Hash(es) in the first place is that replying by hand is much, much easier, because you only have a short 7 or 11 character thing to copy/paste into your reply. Switching to something like <url> <timestamp>, with a space in it, is going to be a lot harder to copy/paste, because you can't "double click" (or is it triple click for some?) to copy it to your clipboard/buffer now 🤣
Anyway, I digress… On the whole edit thing, I'm actually fine if we don't support it at all and don't build a protocol around it. I have zero issues with dropping that as an idea. Why? Because I actually think that clients should be auto-detecting edits anyway. They already can; I've PoC'd this myself, and I think it can be done. I haven't (yet), and one of the reasons I've not spent much effort on it is that it isn't something that comes up frequently anyway.
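As a rough illustration of what client-side edit detection could look like (this is not the PoC mentioned above, just a minimal sketch): compare the cached copy of a feed with a freshly fetched one and flag any twt whose timestamp is unchanged but whose content differs. The function names `parse_feed` and `detect_edits` are hypothetical, and the sketch assumes timestamps are unique within a feed and that the author did not also change the timestamp.

```python
from typing import Dict, List, Tuple


def parse_feed(text: str) -> Dict[str, str]:
    """Map each twt's timestamp to its content, skipping comments and blank lines."""
    twts: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        timestamp, _, content = line.partition("\t")
        twts[timestamp] = content
    return twts


def detect_edits(cached: str, fetched: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_content, new_content) for twts whose content changed."""
    old, new = parse_feed(cached), parse_feed(fetched)
    return [(ts, old[ts], new[ts]) for ts in old if ts in new and old[ts] != new[ts]]


# Hypothetical feed snapshots: the first twt was edited between fetches.
cached = "2024-09-22T07:51:16Z\tHello wrold\n2024-09-22T08:00:00Z\tSecond twt\n"
fetched = "2024-09-22T07:51:16Z\tHello world\n2024-09-22T08:00:00Z\tSecond twt\n"

for timestamp, before, after in detect_edits(cached, fetched):
    print(f"{timestamp}: {before!r} -> {after!r}")
```

A client could then re-point any cached replies from the old identifier to the new one, which is where the threading model under discussion starts to matter.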
Who cares if a thread breaks every now 'n again anyway?
@doesnm@doesnm.p.psf.lt Like maybe you need to check something, debug a client, or whatever.
Sorry, but I don't understand b. New feed author? But why?
Don't forget about the upcoming Yarn.social online meetup this Saturday! See #jjbnvgq for details! Hope to see y'all there.
Don't forget to take the Twtxt v2 poll if you haven't done so already (sorry about the confusing question at the end!)
@doesnm@doesnm.p.psf.lt I don't even advocate for reading Twtxt in its raw form in the first place, which is why I'm in favor of continuing to use content-based addressing (hashes) and incrementally improving what we already have. IMO the only reasons to read a Twtxt file in its raw form are if you're a) a developer, b) a new feed author or c) debugging a client issue.
Agreed. But reading twtxt in raw form sounds… I can't do this
And finally, the legibility of feeds when viewed in their raw form is worsened as you go from a Twt Subject of (#abcdefg12345) to something like (https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z).
There is also a ~5x increase in memory utilization for any implementation that uses, or wishes to use, in-memory storage (yarnd does, for example) and equally a ~5x increase in on-disk storage. This is based on the Twt Hash going from 13 bytes (content-based addressing) to 63 bytes (on average, for location-based addressing). There is also roughly a 20-150% increase in the size of individual feeds that needs to be taken into consideration (in the average case).
With location-based addressing there is no way to verify that a single Twt actually came from that feed without fetching the feed and checking. That has the effect of always having to rely on fetching the feed and storing a copy of the feeds you fetch (which is okay), but you're forced to do this. You cannot really share individual Twts anymore like yarnd does (as peering), because there is no "integrity" to a Twt identified by its <url> <timestamp>. The identity is meaningless and is only valid as long as you can trust the location and trust that the location hasn't changed its content since.
Location-based addressing is vulnerable to the content changing. If the content changes, the "location" is no longer valid. This is a problem if you build systems that rely on this.
So really your argument is just that switching to location-based addressing "just makes sense". Why? Without concrete pros/cons of each approach this isn't really a strong argument, I'm afraid. In fact, I probably need to just sit down and detail the properties of both approaches and the pros/cons of both.
I also don't really buy the argument of simplicity either, personally, because I don't see it as technically much more difficult to take an echo -e "<url>\t<timestamp>\t<content>" | sha256sum | base64 as the Twt Subject than to concatenate the <url> <timestamp>. The "effort" is the same. If we're going to argue that SHA256 or cryptographic hashes are "too complicated" then I'm not really sure how to support that argument.
@sorenpeter@darch.dk Points 2 & 3 aren't really applicable to the discussion of the threading model, I'm afraid. WebMentions is completely orthogonal to the discussion. Further, no-one that uses Twtxt really uses WebMentions. Whilst yarnd supports WebMentions, it's very rarely used in practice (if ever). In fact, I should just drop the feature entirely.
The use of WebSub, OTOH, is far more useful and is used by every single yarnd pod everywhere (not that there are that many around these days) to subscribe to feed updates in near real-time without having to poll constantly.
Some more arguments for a location-based threading model over a content-based one:
- The format (#<DATE URL>) or (@<DATE URL>): both make sense. # as a prefix is for a hashtag, like we already have with the (#twthash), and @ as a prefix denotes that this is a mention of a specific post in a feed, and not just the feed in general. Using either can make implementation easier, since most clients already have this kind of filtering.
- Having something like (#<DATE URL>) will also make mentions via webmentions for twtxt easier to implement, since there is no need to look up the #twthash. This will also make it possible to build third-party twt-mention services.
- Supporting twt/webmentions will also increase discoverability, as a way to learn about both replies and feed mentions from feeds that you don't follow.
@doesnm@doesnm.p.psf.lt Welcome back!
Finally pubnix is alive! What am I missing? I'm only reading the twtxt.net timeline because twtxt-v2.sh is slow at displaying the timeline…
@aelaraji@aelaraji.com Rsync has a ton of options and I probably still haven't scratched the surface, but I was able to memorize the options I actually need for day-to-day work in a relatively short time. I guess I'm the opposite of you, because I don't know any scp(1) options.
x86 Embedded Controller with PC/104 Compatibility for Legacy Systems
The VDX3-6757 PC/104 family of low-power x86 embedded controllers meets PC/104 specifications, offering backward compatibility for projects facing end-of-life x86-based controllers. It is suited for applications like data acquisition, industrial automation, process control, and automotive control. Powered by a DM&P Vortex86DX3 1GHz dual-core CPU with 32KB L1 cache and 512KB L2 cache, the VDX3-6757 supports …
Been trying to get acquainted with rsync(1), but whenever I hit Tab for completion and get this:
λ ~/ rsync
zsh: do you wish to see all 484 possibilities (162 lines)?
I'm like: Nope! An scp -rpCq ... or whatever option salad will do just fine.
[Insert: "Ain't nobody got time fo' that!" meme.]
Syncthing is also as good as everyone says it is.
@movq@www.uninformativ.de Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.
@eapl.me@eapl.me Sad to see you go, disappointed in your choice of X, but I respect your decision and choice. I will never cave in myself, even if it means my "circle of friends" remains small. I guess we call 'em internet friends, right?
Although I really like the decentralized concept of "twtxt", I haven't used it much this year. I couldn't get my close circle on it, the people with whom the conversations I enjoy arise and who create a meaningful network effect.
I'm also pursuing digital minimalism, using services that bring joy, value and a reasonable use of time.
Although it's a controversial topic, why not have a community of people with whom we feel that the world (the digital one at least) is a better place?
Perhaps the point is a little idealistic, but the intention is that the time we spend "online" helps us grow as people, enjoy our time, and live this digital life with meaning.
For all these reasons, what little time I spend on microblogging I'll devote to the two platforms that generate the most meaningful conversations for me: X, for everything professional, and Mastodon, for the hipster, indie, idealistic, etc.
If anything I've shared over twtxt has been important to you, or you'd like us to keep chatting, you can find me on either of these other platforms:
https://text.eapl.mx/microblogging
#fzf is the new emacs: a tool with a simple purpose that has evolved to include an #email client. https://sr.ht/~rakoo/omail/
I'm being a little silly, of course. fzf doesn't actually check your email, but it appears to be basically the whole user interface for that mail program, with #mblaze wrangling the emails.
I've been thinking about how I handle my email, and am tempted to make something similar. (When I originally saw this linked, the author was presenting it as an example tweaked to their own needs, encouraging people to make their own.)
This approach could surely also be combined with #jenny, taking the place of (neo)mutt. For example, mblaze's mthread tool presents a threaded discussion with indentation.
@lyse@lyse.isobeef.org How violent is the thunderstorm?
@aelaraji@aelaraji.com LOL
LMAO 🤣 … I've been scrolling through the mutt(1) man page and found this:
BUGS
None. Mutts have fleas, not bugs.
A new thing LLM(s) can't do well: write patches 🤣
@lyse@lyse.isobeef.org Yeah, I think it's one of the reasons why yarnd's cache became so complicated really. I mean, it's a bunch of maps and lists that are recalculated every ~5m. I don't know of any better way to do this right now, but maybe one day I'll figure out a better way to represent the same information that is displayed today that works reasonably well.
My point is, this is not a small trade-off to make for the sake of simplicity.
@movq@www.uninformativ.de Maybe I misspoke. It's a factor of 5 in the size of the keyspace required. The impact is significantly less for on-disk storage of raw feeds and such, around ~1-1.5x depending on how many replies there are, I suppose.
I wasn't very clear; my apologies. Suppose we update the current hash truncation length from 7 to 11, but then still decide to go down this location-based twt identity and threading model anyway. Then yes, we're talking about twt subjects having a ~5x increase in size on average: going from 14 characters (11 for the hash, 2 for the parens, 1 for the #) to ~63 bytes (the average I've worked out for the length of URL + timestamp) + 3 bytes of overhead for the parens and space.
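The arithmetic above can be checked with a few lines, using the example subjects that appear elsewhere in this thread. This is only a sanity check for a single feed URL, not a measurement; the 11-character hash value is a made-up placeholder.

```python
# A content-based subject with an 11-character hash (placeholder value).
hash_subject = "(#abcdefg1234)"

# A location-based subject for one concrete feed URL and timestamp.
feed_url = "https://twtxt.net/user/prologic/twtxt.txt"
timestamp = "2024-09-22T07:51:16Z"
location_subject_text = f"({feed_url} {timestamp})"

hash_len = len(hash_subject)               # 11 hash + 2 parens + 1 '#'
location_len = len(location_subject_text)  # URL + timestamp + 2 parens + 1 space

print(f"hash subject:     {hash_len} bytes")
print(f"location subject: {location_len} bytes")
print(f"ratio:            {location_len / hash_len:.1f}x")
```

For this particular feed the ratio comes out a little under 5x; longer feed URLs push it higher.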
Don't forget about the upcoming Yarn.social meetup this Saturday! See #jjbnvgq for details! Hope to see some/all of y'all there.
@lyse@lyse.isobeef.org And your query to construct a tree? Can you share the full query (the screenshot looks scary 🤣)? On another note, SQL and relational databases aren't really that conducive to tree-like structures, are they? 🤣
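For readers wondering what such a query can look like: one common way to walk a reply tree in SQL is a recursive common table expression. The sketch below uses SQLite with an entirely hypothetical `twts` table (columns `hash`, `reply_to`, `content`); it is not the schema or the query being asked about here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: each twt optionally points at the twt it replies to.
    CREATE TABLE twts (hash TEXT PRIMARY KEY, reply_to TEXT, content TEXT);
    INSERT INTO twts VALUES
      ('root001', NULL,      'Root twt'),
      ('reply01', 'root001', 'First reply'),
      ('reply02', 'reply01', 'Nested reply'),
      ('other01', NULL,      'Unrelated twt');
    """
)

THREAD_QUERY = """
WITH RECURSIVE thread(hash, content, depth) AS (
    -- Anchor: the twt that starts the thread.
    SELECT hash, content, 0 FROM twts WHERE hash = ?
    UNION ALL
    -- Recursive step: every twt replying to something already in the thread.
    SELECT t.hash, t.content, thread.depth + 1
    FROM twts AS t JOIN thread ON t.reply_to = thread.hash
)
SELECT hash, depth FROM thread ORDER BY depth;
"""

rows = conn.execute(THREAD_QUERY, ("root001",)).fetchall()
for twt_hash, depth in rows:
    print("  " * depth + twt_hash)
```

It works, but the recursion has to be spelled out by hand, which is arguably the point being made about relational databases and trees.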
In fact it depends on how many Twts there are that form part of a thread, if you take a much larger sample size of my own feed for example, it starts to approximate ~1.5x increase in size:
$ ./compare.sh https://twtxt.net/user/prologic/twtxt.txt 500
Original file size: 126842 bytes
Modified file size: 317029 bytes
Percentage increase in file size: 149.94%
...
In fact, @falsifian@www.falsifian.org, you had quite a lot of good feedback. Do you mind collecting it in a task list on the doc somewhere so I can get to it?
Can someone make the edit?
@movq@www.uninformativ.de This was just a representative sample. The real, concrete cost here is a ~5x increase in memory consumption for yarnd and/or a ~5x increase in disk storage.
@lyse@lyse.isobeef.org Mind sharing your schema?
@lyse@lyse.isobeef.org Not sure, I'll check
@lyse@lyse.isobeef.org My proposal is three steps:
- increase the hash length from 7 to 11
Then:
- Add support for changing your feed's location without breaking threads
Then much later:
- Add formal support for edits
@lyse@lyse.isobeef.org No, I don't either, just sayin'
@movq@www.uninformativ.de That's what I want to know 🤣
What gossip, gopherspace?!
(From ZERO)
Greenhouse Gas Emissions: The Situation Remains Worrying
One year on, ZERO has again analysed the greenhouse gas (GHG) emissions from road fuels, and the results remain alarming. Although there was a slight reduction of 1.8%, we are still far from reaching the necessary targets. Transport emissions need to fall by about 5.3% per year for us to meet the objectives of the National Energy and Climate Plan (PNEC) for 2030.
2025 Has to Be an Exemplary Year!
Diesel, with 14 Mt of CO2 emitted, remains the biggest villain, and petrol consumption has increased significantly. To avoid missing the climate targets, it is crucial that 2025 marks a turning point.
Mobility Barometer: We have also launched a survey to build a database on mobility in Portugal, contributing to better-informed decisions.
Change is urgent and depends on all of us, especially our political leaders. Together, let's build a more sustainable future!
More detail at https://zero.ong/noticias/emissoes-dos-transportes-continuam-a-ameacar-metas-climaticas-do-pais/
#mundoZERO #MobilidadeSustentável #Clima
So just to be clear, it's not as bad as the OP in this thread; that is just a worst-case scenario. With some additional analysis I did today, it's closer to around ~5x the memory requirements of my pod, which would go from roughly ~22MB to ~120MB or so, probably a bit more in practice. But this is still a significant increase in memory. The on-disk requirements would also increase by around ~5x on average, going from ~12GB to about ~60GB at the current archive size.
Just out of curiosity, I inspected the yarns database (the search engine/crawler) to find the average length of a Twtxt URI:
$ inspect-db yarns.db | jq -r '.Value.URL' | awk '{ total += length; count++ } END { if (count > 0) print total / count }'
40.3387
Given that an RFC3339 UTC timestamp has a length of 20 characters with seconds precision, we're talking about a Twt Subject taking up ~63 characters/bytes on average.
Comparing a few feeds:
- @xuu@txt.sour.is would see an increase of ~20%
- @falsifian@www.falsifian.org would see an increase of ~8%
- @bender@twtxt.net would see an increase of ~20%
- @lyse@lyse.isobeef.org would see an increase of ~15%
- @aelaraji@aelaraji.com would see an increase of ~13%
- @sorenpeter@darch.dk would see an increase of ~8%
- @movq@www.uninformativ.de would see an increase of ~9%
Just from a scalability standpoint alone, I'm not seeing a switch to location-based Twt ids to support threading as a good idea here. This is what I meant when I said to @david@collantes.us in a recent call that we open up a new can of worms (or new set of problems) by drastically changing the approach, rather than incrementally improving the existing approach we have today (_which has served us well for the past 4 years already_).
Reminder to take the Twtxt (anonymous) Poll: http://polljunkie.com/poll/xdgjib/twtxt-v2
Apologies, I can't edit the poll once it's live, so the suggestion in the feedback about supporting Markdown will have to be discussed at another time.
@xuu@txt.sour.is 🤣🤣🤣
I demand full 9 digit nano second timestamps and the full TZ identifier as documented in the tz 2024b database! I need to know if there was a change in daylight savings as per the locality in question as of the provided date.
@falsifian@www.falsifian.org I believe the preserve means to include the original subject hash in the start of the twt such as (#somehash)
So I whipped up a quick shell script to demonstrate what I mean by the increase in feed size on average as well as the expected increase in storage and retrieval requirements.
$ ./compare.sh
Original file size: 28145 bytes
Modified file size: 70672 bytes
Percentage increase in file size: 151.10%
...
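The contents of compare.sh are not shown in this thread, so the following is only a guess at the kind of measurement it performs: rewrite every `(#hash)` subject in a feed into a location-style subject and compare byte sizes. The regular expression, the function name `size_increase` and the sample feed are all assumptions made for illustration, and the figures it prints will not match the output above.

```python
import re

# Assumed shape of a content-based subject: "(#" + 7 or more lowercase alphanumerics + ")".
SUBJECT = re.compile(r"\(#[a-z0-9]{7,}\)")


def size_increase(feed_text: str, replacement: str) -> float:
    """Percentage growth in bytes when every (#hash) subject becomes `replacement`."""
    # A function replacement avoids any backslash-escape handling in `replacement`.
    modified = SUBJECT.sub(lambda _match: replacement, feed_text)
    original_size = len(feed_text.encode("utf-8"))
    modified_size = len(modified.encode("utf-8"))
    return (modified_size - original_size) / original_size * 100


# Hypothetical two-line feed: one reply with a subject, one twt without.
sample_feed = (
    "2024-09-22T07:51:16Z\t(#abcdefg) A short reply\n"
    "2024-09-22T08:00:00Z\tA twt that starts a new thread\n"
)
location = "(https://twtxt.net/user/prologic/twtxt.txt 2024-09-22T07:51:16Z)"

print(f"Percentage increase in file size: {size_increase(sample_feed, location):.2f}%")
```

The result depends heavily on how many twts in the feed are replies and how long they are, which is consistent with the wide range of per-feed figures quoted in this thread.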
Thank goodness we relaxed that limit and I've stopped being so Puritan about it, but my overall point is that we would be significantly increasing the human size as well as the machine size of the identity of threads as well as twts.
With the original specification's recommended Twt length of 140 characters, that only leaves you with about 78 characters worth of anything remotely useful to say in response.
Let's say the overhead is always three bytes: two parentheses and a space.
So for example, if we use @movq@www.uninformativ.de's feed as an example thread ID here, his feed with a particular timestamp, we're already looking at a subject length of 59 bytes, +/- a couple of bytes to denote the subject in the Twt itself.
One of the reasons we originally wanted to use content-based addressing and short hashes as our threading model was to keep individual Twts short, so that they were still readable if you viewed them manually by hand.
With the proposal to switch to location-based addressing, using a pointer to a feed and a timestamp in that feed, you're looking at roughly 2025 characters long, because the HTTP, HTML and even URI specifications do not specify a maximum length for URI(s), AFAIK; there are only recommendations.
@bender@twtxt.net I can't see myself personally increasing the infrastructure and costs to run this pod to support this as we potentially switch over and as things continue to grow in scale. You would never get the infinite search and infinite timeline features that you've always wanted, for example, and I would have to drastically reduce what is visible or even searchable at any given point in time to much less than what it is today.
Another interesting side effect of changing from content-based addressing to location-based addressing is that switching from 7-byte keys to 2025-character keys for 3.5 million entries would expand the database size from 24.5 MB to about 7.09 GB, an increase of roughly 7.06 GB!
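The keyspace figures above follow from a simple multiplication, sketched here along with two intermediate key sizes mentioned elsewhere in this thread (11-character hashes and the ~63-byte average location-based subject). It assumes one byte per character and decimal megabytes, and counts only the keys, not any index or storage overhead.

```python
ENTRIES = 3_500_000  # number of entries quoted above


def keyspace_mb(key_bytes: int, entries: int = ENTRIES) -> float:
    """Raw size of all keys in decimal megabytes (keys only, no overhead)."""
    return key_bytes * entries / 1_000_000


scenarios = [
    ("7-byte hash (today)", 7),
    ("11-character hash (proposed)", 11),
    ("~63-byte location subject (average)", 63),
    ("2025-character URI (worst case)", 2025),
]

for label, key_bytes in scenarios:
    print(f"{label:<38} {keyspace_mb(key_bytes):>8.1f} MB")
```

The worst case reproduces the 24.5 MB to ~7.09 GB jump, while the average case lands at a few hundred megabytes, which is closer to the ~5x estimates given for real pods.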
@bender@twtxt.net Ha! Maybe I should get on the Markdown train. You're taking away my excuses.
@falsifian@www.falsifian.org No worries! Feel free to contribute to the doc directly if you wish.
@falsifian@www.falsifian.org Hmmm, not sure, sorry.
Sorry, you're right, I should have used numbers!
I don't understand what "preserve the original hash" could mean other than "make sure there's still a twt in the feed with that hash". Maybe the text could be clarified somehow.
I'm also not sure what you mean by markdown already being part of it. Of course people can already use Markdown, just like presumably nothing stopped people from using (twt subjects) before they were formally described. But it's not universal; e.g. as a jenny user I just see the plain text.
@xuu@txt.sour.is Good to know! So as long as we remain decentralized and non-commercial (I assume non-profit works too?) we're good?
MS-CF16 Fanless Low-Power Pico-ITX SBC with Alder Lake-N and Amston Lake Processors
The MS-CF16 is a compact Pico-ITX single-board computer designed for fanless, low-power, high-performance applications in harsh environments. Powered by Intel Alder Lake-N or Amston Lake Series SoCs, the board features a 2.5GbE LAN port, a GbE LAN port, and SATA 3.0 for storage. Unlike the previously covered MS-CF17, this model offers configurable Intel processors, each
@falsifian@www.falsifian.org The GDPR does not apply to the processing of data for a purely personal or household activity that is not connected to a professional or commercial activity.
@prologic@twtxt.net Do you feel the same about published vs. privately stored data?
For me there's a distinction. I feel very strongly that I should be able to retain whatever private information I like. On the other hand, I do have some sympathy for requests not to publish or propagate (though I personally feel it's still morally acceptable to ignore such requests).
@lyse@lyse.isobeef.org I'd suggest making the whole content-type thing a SHOULD, to accommodate people just using some hosting service they don't have much control over. (The same situation could make detecting followers hard, but IMO "please email me if you follow me" is still legit twtxt, even if inconvenient.)
@prologic@twtxt.net Thanks for writing that up!
I hope it can remain a living document (or sequence of draft revisions) for a good long time while we figure out how this stuff works in practice.
I am not sure how I feel about all this being done at once, vs. letting conventions arise.
For example, even today I could reply to twt abc1234 with "(#abc1234) Edit: …" and I think all you humans would understand it as an edit to (#abc1234). Maybe eventually it would become a common enough convention that clients would start to support it explicitly.
Similarly, we could just start using 11-digit hashes. We should iron out whether it's sha256 or whatever, but there's no need to get all the other stuff right at the same time.
I have similar thoughts about how some users could try out location-based replies in a backward-compatible way (append the replyto: stuff after the legacy (#hash) style).
However, I recognize that I'm not the one implementing this stuff, and it's less work to just have everything determined up front.
Misc comments (I haven't read the whole thing):
Did you mean to make hashes hexadecimal? You lose 11 bits that way compared to base32. I'd suggest gaining 11 bits with base64 instead.
"Clients MUST preserve the original hash": do you mean they MUST preserve the original twt?
Thanks for phrasing the bit about deletions so neutrally.
I don't like the MUST in "Clients MUST follow the chain of reply-to references…". If someone writes a client as a 40-line shell script that requires the user to piece together the threading themselves, IMO we shouldn't declare the client non-conforming just because they didn't get to all the bells and whistles.
Similarly, I don't like the MUST for user agents. For one thing, you might want to fetch a feed without revealing your identity. Also, it raises the bar for a minimal implementation (I'm again thinking of the 40-line shell script).
For "who follows" lists: why must the long, random tokens be only valid for a limited time? Do you have a scenario in mind where they could leak?
Why can't feeds be served over HTTP/1.0? Again, thinking about simple software. I recently tried implementing HTTP/1.1 and it wasn't too bad, but 1.0 would have been slightly simpler.
Why get into the nitty-gritty about caching headers? This seems like generic advice for HTTP servers and clients.
I'm a little sad about other protocols being not recommended.
I don't know how I feel about including markdown. I don't mind too much that yarn users emit twts full of markdown, but I'm more of a plain text kind of person. Also, it adds to the length. I wonder if putting it in a separate document would make more sense; that would also help with the length.
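On the encoding comment above (hexadecimal vs. base32 vs. base64), the "11 bits" figure is what you get for an 11-character hash, since each character carries 4, 5 or 6 bits respectively. A quick check, assuming the 11-character length discussed elsewhere in this thread:

```python
import math

HASH_CHARS = 11  # proposed truncated hash length

# Alphabet size for each encoding under discussion.
ALPHABETS = {"hex": 16, "base32": 32, "base64": 64}


def hash_bits(chars: int, alphabet_size: int) -> int:
    """Bits of the digest retained by `chars` characters of the given alphabet."""
    return chars * int(math.log2(alphabet_size))


bits = {name: hash_bits(HASH_CHARS, size) for name, size in ALPHABETS.items()}

for name, value in bits.items():
    delta = value - bits["base32"]
    print(f"{name:<7} {value} bits ({delta:+d} vs base32)")
```

Each extra bit halves the chance of two twts colliding on the same truncated hash, so the choice of alphabet matters about as much as the choice of length.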
Apparently, participation in this year's @hacktoberfest@hacktoberfest won't even grant you a "plant a tree" prize… how disappointing.
#MaradoWeekly #WeeklyRecord Week 38
@lyse@lyse.isobeef.org Nice!
@doesnm@doesnm.p.psf.lt Hello!
Sassy pererê
Hello!
@lyse@lyse.isobeef.org Yes, let's make UTF-8 mandatory.