I love simple, lightweight, small, minimal tools that just do the bare minimum. Based on that, does anyone have any good recommendations for a key-value store that is:
- lightweight
- clustered
- sharded (so if I have 5 instances and 100 keys, each node will roughly have 20 keys on it).
- easy to join nodes: as in
kv-server --join somehost:1111
For reference, I think Consul is too heavy (and not sharded I believe).
It would be great to have a small Go executable that I can run on 10 servers, all connected up, that exposes a Redis-like API. Simple GET, PUT and STREAM would be great.
@prologic@twtxt.net anyone else here I can ping?
@markwylde@twtxt.net Bitraft is close 👌 but not sharded 😢
Thanks guys. Bitraft is awesome @prologic@twtxt.net but yeah, not sharded :( I did try etcd before @abucci@anthony.buc.ci but I found it trickier to set up than Bitraft. But again, it’s not sharded :(
@prologic@twtxt.net ever done any stress testing on Bitraft? In a cluster, do you know what the throughput would be? Like, PUTs per second and GETs per second?
@markwylde@twtxt.net No but I could do some testing and publish the results 👌
As for the sharding though… Let’s discuss this?
@prologic@twtxt.net I’m happy to do it. Might try now actually. It was just in case you knew. I’ll post in the README if I get it working. I’m hoping redis-benchmark will work since it’s got the same API as Redis.
I wonder if sharding could be implemented by:
Presumptions:
- redis can broadcast to all nodes in the cluster
- REPLICA_COUNT is 3
PUT workflow:
- a PUT gets forwarded to REPLICA_COUNT random nodes in the cluster
GET workflow:
- a broadcast is made to the cluster saying “I NEED A VALUE FOR KEY ‘TEST’”
- all nodes that contain that value reply to the server
- the first response gets forwarded to the client
- the other responses are discarded
I’m sure there would be some edge cases, like syncing.
- What if one of the random nodes is full and therefore only REPLICA_COUNT-1 nodes receive the document?
- This could mean 2 nodes have the new value, but the 3rd has the old value
Maybe it could be solved by only committing once REPLICA_COUNT nodes successfully receive the message.
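The PUT and GET workflows above can be sketched as a small in-memory simulation in Go. Everything here is hypothetical (the node and cluster types, put, get, holders): there is no networking, and the "broadcast" is just a loop that asks each node in turn and returns the first hit.

```go
package main

import (
	"fmt"
	"math/rand"
)

const replicaCount = 3 // REPLICA_COUNT from the presumptions above

// node is a hypothetical in-memory stand-in for one server.
type node struct {
	data map[string]string
}

// cluster holds every node; a real system would talk over the network.
type cluster struct {
	nodes []*node
}

func newCluster(n int) *cluster {
	c := &cluster{}
	for i := 0; i < n; i++ {
		c.nodes = append(c.nodes, &node{data: map[string]string{}})
	}
	return c
}

// put writes the value to replicaCount distinct, randomly chosen nodes.
func (c *cluster) put(key, value string) error {
	if len(c.nodes) < replicaCount {
		return fmt.Errorf("need at least %d nodes, have %d", replicaCount, len(c.nodes))
	}
	for _, i := range rand.Perm(len(c.nodes))[:replicaCount] {
		c.nodes[i].data[key] = value
	}
	return nil
}

// get asks every node in turn (a sequential stand-in for a broadcast)
// and returns the first value found; later replies are never collected.
func (c *cluster) get(key string) (string, bool) {
	for _, n := range c.nodes {
		if v, ok := n.data[key]; ok {
			return v, true
		}
	}
	return "", false
}

// holders reports how many nodes currently store the key.
func (c *cluster) holders(key string) int {
	count := 0
	for _, n := range c.nodes {
		if _, ok := n.data[key]; ok {
			count++
		}
	}
	return count
}

func main() {
	c := newCluster(10)
	if err := c.put("TEST", "hello"); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	v, ok := c.get("TEST")
	fmt.Println(v, ok, "stored on", c.holders("TEST"), "nodes")
}
```

One thing the sketch makes visible: a second PUT for the same key picks a fresh set of random nodes, so old copies can stay behind on the previous ones. That is the same kind of stale-value problem as the edge cases listed above, and this sketch does nothing to solve it.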
@markwylde@twtxt.net If you could benchmark this that would be wonderful! 👌 – Also reading your thoughts on “Sharding”, I think you might be slightly confused, because what you just described is essentially “High Availability”, and not Sharding.
In fact Bitraft already has this anyway. It fully supports forming a High Availability Cluster.
But in Bitraft every node contains every key + value, right? I probably wasn’t clear above, but in my idea REPLICA_COUNT would be 3 but the NODE_COUNT may be 10. So a put would go to 3 of the 10 nodes.
Did a quick benchmark:
https://git.mills.io/prologic/bitraft/issues/58
Seems the summary benchmark of a 5-node cluster on my laptop is:
GET: 1165.64 requests per second
SET: 1061.80 requests per second
And why not use Redis then?
Or memcached with mcrouter.