list of posts - <antirez> (http://antirez.com)

page = list of posts -

url = http://antirez.com analysed @ 2021-09-28 08:05 UTC

An update about Redis developments in 2019

Yesterday a concerned Redis user wrote the following on Hacker News: — https://news.ycombinator.com/item?id=19204436 — I love Redis, but I'm a bit skeptical of some of the changes that are currently in development. The respv3 protocol has some features that, while they sound neat, also could significantly complicate client library code. There's also a lot of work going into a granular acl. I can't imagine why this would be necessary, or a higher priority than other changes like multi-thread support, better persistence model, data-types, etc.
[EDIT! I'm reconsidering all this because Marc Gravell from Stack Overflow suggested that we could just switch protocol for backward compatibility per-connection, sending a command to enable RESP3. That means no longer need for a global configuration that switches the behavior of the server. Put in that way it is a lot more acceptable for me, and I'm reconsidering the essence of the blog post] A few weeks after the release of Redis 5, I’m here starting to implement RESP3, and after a few days of work it feels very well to see this finally happening. RESP3 is the new client-server protocol that Redis will use starting from Redis 6. The specification at https://github.com/antirez/resp3 should explain in clear terms how this evolution of our old protocol, RESP2, should improve the Redis ecosystem. But let’s say that the most important thing is that RESP3 is more “semantic” than RESP2. For instance it has the concept of maps, sets (unordered lists of elements), attributes of the returned data, that may augment the reply with auxiliary information, and so forth. The final goal is to make new Redis clients have less work to do for us, that is, just deciding a set of fixed rules in order to convert every reply type from RESP3 to a given appropriate type of the client library programming language.
[This blog post is also experimentally available on Medium: https://medium.com/antirez/a-short-tale-of-a-read-overflow-b9210d339cff ] When a long running process crashes, it is pretty uncool. More so if the process happens to take a lot of state in memory. This is why I love web programming frameworks that are able, without major performance overhead, to create a new interpreter and a new state for each page view, and deallocate every resource used at the end of the page generation. It is an inherently more reliable programming paradigm, where memory leaks, descriptor leaks, and even random crashes from time to time do not constitute a serious issue. However system software like Redis is at the other side of the spectrum, a side populated by things that should never crash.
Writing an editor in less than 1000 lines of code, just for fun

WARNING: Long pretty useless blog post. TLDR is that I wrote, just for fun, a text editor in less than 1000 lines of code that does not depend on ncurses and has support for syntax highlight and search feature. The code is here: http://github.com/antirez/kilo . Screencast here: https://asciinema.org/a/90r2i9bq8po03nazhqtsifksb For the sentimentalists, keep reading… A couple weeks ago there was this news about the Nano editor no longer being part of the GNU project. My first reaction was, wow people still really care about an old editor which is a clone of an editor originally part of a terminal based EMAIL CLIENT. Let’s say this again, “email client”. The notion of email client itself is gone at this point, everything changed. And yet I read, on Hacker News, a number of people writing how they were often saved by the availability of nano on random systems, doing system administrator tasks, for example. Nano is also how my son wrote his first program in C. It’s an acceptable experience that does not require past experience editing files.
Yesterday night I was re-reading Redlock analysis Martin Kleppmann wrote ( http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html ). At some point Martin wonders if there is some good way to generate monotonically increasing IDs with Redis. This apparently simple problem can be more complex than it looks at a first glance, considering that it must ensure that, in all the conditions, there is a safety property which is always guaranteed: the ID generated is always greater than all the past IDs generated, and the same ID cannot be generated multiple times. This must hold during network partitions and other failures. The system may just become unavailable if there are less than the majority of nodes that can be reached, but never provide the wrong answer (note: as we'll see this algorithm has another liveness issue that happens during high load of requests).
Martin Kleppmann, a distributed systems researcher, yesterday published an analysis of Redlock ( http://redis.io/topics/distlock ), that you can find here: http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html Redlock is a client side distributed locking algorithm I designed to be used with Redis, but the algorithm orchestrates, client side, a set of nodes that implement a data store with certain capabilities, in order to create a multi-master fault tolerant, and hopefully safe, distributed lock with auto release capabilities.
Lua scripting is probably the most successful Redis feature, among the ones introduced when Redis was already pretty popular: no surprise that a few of the things users really want are about scripting. The following two features were suggested multiple times over the last two years, and many people tried to focus my attention into one or the other during the Redis developers meeting, a few weeks ago. 1. A proper debugger for Redis Lua scripts. 2. Replication, and storage on the AOF, of Lua scripts as a set of write commands materializing the *effects* of the script, instead of replicating the script itself as we normally do.
I’m just back from the Redis Dev meeting 2015. We spent two incredible days talking about Redis internals in many different ways. However while I’m waiting to receive private notes from other attenders, in order to summarize in a blog post what happened and what were the most important ideas exposed during the meetings, I’m going to touch a different topic here. I took the non trivial decision to move the Redis mailing list, consisting of 6700 members, to Reddit. This looks like a crazy ideas probably in some way, and “to move” is probably not the right verb, since the ML will still exist. However it will only be used in order to receive announcements of new releases, critical informations like security related ones, and from time to time, links to very important discussions that are happening on Reddit.
If you know me, you know I’m not the kind of guy that considers competing products a bad thing. I actually love the users to have choices, so I rarely do anything like comparing Redis with other technologies. However it is also true that in order to pick the right solution users must be correctly informed. This post was triggered by reading a blog post published by Mike Perham, that you may know as the author of a popular library called Sidekiq, that happens to use Redis as backend. So I would not consider Mike a person which is “against” Redis at all. Yet in his blog post that you can find at the URL http://www.mikeperham.com/2015/09/24/storing-data-with-redis/ he states that, for caching, “you should probably use Memcached instead [of Redis]”. So Mike simply really believes Redis is not good for caching, and he arguments his thesis in this way:
Everybody knows Redis is single threaded. The best informed ones will tell you that, actually, Redis is *kinda* single threaded, since there are threads in order to perform certain slow operations on disk. So far threaded operations were so focused on I/O that our small library to perform asynchronous tasks on a different thread was called bio.c: Background I/O, basically. However some time ago I opened an issue where I promised a new Redis feature that many wanted, me included, called “lazy free”. The original issue is here: https://github.com/antirez/redis/issues/1748 .
Yesterday Amplitude published an article about scaling analytics, in the context of using the Set data type. The blog post is here: https://amplitude.com/blog/2015/08/25/scaling-analytics-at-amplitude/ On Hacker News people asked why not using Redis instead: https://news.ycombinator.com/item?id=10118413 Amplitude developers have their set of reasons for not using Redis, and in general if you have a very specific problem and want to scale it in the best possible way, it makes sense to implement your vertical solution. I’m not adverse to reinventing the wheel, you want your very specific wheel sometimes, that a general purpose system may not be able to provide. Moreover creating your solution gives you control on what you did, boosts your creativity and your confidence in what you, as a developer can do, makes you able to debug whatever bug may arise in the future without external help.
I’m back from Paris, DotScale 2015 was a very interesting conference. Before leaving I was working on Sentinel in the context of the unstable branch: the work was mainly about connection sharing. In short, it is the ability of a few Sentinels to scale, monitoring many masters. Before to leave, and now that I’m back, I tried to “secure” a set of features that will be the basis for Redis 3.2. In the next weeks I’ll be focusing developing these features, so I thought it’s worth to share the list with you ASAP.
Redis Conference 2015

I’m back home, after a non easy trip, since to travel from San Francisco to Sicily is kinda NP complete: there are no solutions involving less than three flights. However it was definitely worth it, because the Redis Conference 2015 was very good, SF was wonderful as usually and I was able to meet with many interesting people. Here I’ll limit myself to writing a short account of the conference, but the trip was also an incredible experience because I discovered old and new friends, that are not just smart programmers, but also people I could imagine being my friends here in Sicily. I never felt alone while I was 10k kilometers away from my home.
Today I was testing Redis latency using m3.medium EC2 instances. I was able to replicate the usual latency spikes during BGSAVE, when the process forks, and the child starts saving the dataset on disk. However something was not as expected. The spike did not happened because of disk I/O, nor during the fork() call itself. The test was performed with a 1GB of data in memory, with 150k writes per second originating from a different EC2 instance, targeting 5 million keys (evenly distributed). The pipeline was set to 4 commands. This translates to the following command line of redis-benchmark:
Almost a month ago a number of people interested in Redis development met in London for the first Redis developers meeting. We identified together a number of features that are urgent (and are now listed in a Github issue here: https://github.com/antirez/redis/issues/2045 ), and among the identified issues, there was one that was mentioned multiple times in the course of the day: diskless replication. The feature is not exactly a new idea, it was proposed several times, especially by EC2 users that know that sometimes it is not trivial for a master to provide good performances during slaves synchronization. However there are a number of use cases where you don’t want to touch disks, even running on physical servers, and especially when Redis is used as a cache. Redis replication was, in short, forcing users to use disk even when they don’t need or want disk durability.
The first commit I can find in my git history about Redis Cluster is dated March 29 2011, but it is a “copy and commit” merge: the history of the cluster branch was destroyed since it was a total mess of work-in-progress commits, just to shape the initial idea of API and interactions with the rest of the system. Basically it is a roughly 4 years old project. This is about two thirds the whole history of the Redis project. Yet, it is only today, that I’m releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.
Yesterday I lost all my blog data in a rather funny way. When I installed this new blog engine, that is basically a Lamer News slightly modified to serve as a blog, I spinned a Redis instance manually with persistence *disabled* just to see if it was working and test it a bit. I just started a screen instance, and run something like ./redis-server --port 10000. Since this is equivalent to an empty config file with just "port 10000" inside I was running no disk backed at all. Since Redis very rarely crashes, guess what, after more than one year it was still running inside the screen session, and I totally forgot that it was running like that, happily writing controversial posts in my blog. Yesterday my server was under attack. This caused an higher then normal load, and Linode rebooted the instance. As a result my blog was gone.
Today Joyent wrote a blog post in the company blog about an issue that started with this pull request in the libuv project: https://github.com/joyent/libuv/pull/1015#issuecomment-29538615 Basically the developer Ben Noordhuis rejected a pull request involving a change in the documentation to use gender-neutral form instead of “him”. Joyent replied with this incredible post: http://www.joyent.com/blog/the-power-of-a-pronoun . In the blog post you can read: “But while Isaac is a Joyent employee, Ben is not—and if he had been, he wouldn't be as of this morning: to reject a pull request that eliminates a gendered pronoun on the principle that pronouns should in fact be gendered would constitute a fireable offense for me and for Joyent.”
This blog post describes the new algorithm used in Redis Cluster in order to propagate and update metadata, that is hopefully significantly safer than the previous algorithm used. The Redis Cluster specification was not yet updated, as I'm rewriting it from scratch, so this blog post serves as a first way to share the algorithm with the community. Let's start with the problem to solve. Redis Cluster uses a master - slave design in order to recover from nodes failures. The key space is partitioned across the different masters in the cluster, using a concept that we call "hash slots". Basically every key is hashed into a number between 0 and 16383. If a given key hashes to 15, it means it is in the hash slot number 15. These 16k hash slots are split among the different masters.
Twilio just released a post mortem about an incident that caused issues with the billing system: http://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html The problem was about a Redis server, since Twilio is using Redis to store the in-flight account balances, in a master-slaves setup, with multiple slaves in different data centers for obvious availability and data safety concerns. This is a short analysis of the incident, what Twilio can do and what Redis can do to avoid this kind of issues.
Lately I'm trying to push forward Redis 2.8 enough to reach the feature freeze and release it as a stable release as soon as possible. Redis 2.8 will not contain Redis Cluster, and its implementation of Redis Sentinel is the same as 2.6 and unstable branches, (Sentinel is taken mostly in sync in all the branches being fundamentally a different project using Redis just as framework). However there are many new interesting features in Redis 2.8 that are back ported from the unstable branch. Basically 2.8 it's our usual "in the middle" release, like 2.4 was: waiting for Redis 3.0 that will feature Redis Cluster (we have great progresses about it! See https://vimeo.com/63672368 ), we'll have a 2.8 release with everything that is ready to be released into the unstable branch. The goal is of course to put more things in the hands of users ASAP.
Dear Redis users, in the final part of 2012 I repeated many time that the focus, for 2013, is all about Redis Cluster and Redis Sentinel. This is exactly what I'm going to do from the point of view of the big picture, however there are many smaller features that make a big difference from the point of view of the Redis user day to day operations. Such features can't be ignored as well. They are less shiny in a feature list, and they are not good to generate buzz and interest in new users, and sometimes boring to code, but they are very important from a practical point of view.
This is one of this trivial changes in Redis that can make a big difference for users. Basically in the unstable branch I added some code that has the following effect, when running Redis on Linux systems: [32741] 19 Nov 12:00:55.019 * Background saving started by pid 391 [391] 19 Nov 12:01:00.663 * DB saved on disk [391] 19 Nov 12:01:00.673 * RDB: 462 MB of memory used by copy-on-write As you can see now the amount of additional memory used by the saving child is reported (it is also reported for AOF rewrite operations).
On Twitter: @War3zRub1 "Hahaha it's silly how people use Redis when they need a reverse proxy" @C4ntc0de "ZOMG! Use a real message queue, Redis is not a queue!" @L4m3tr00l "My face when Redis BLABLABLA..." Meanwhile *at* Twitter: OP1: "Hey guys, there is a spike in the number of lame messages today, load is increasing..." OP2: "Yep noticed, it's the usual troll fiesta trowing shit at Redis, 59482 messages per second right now." OP1: "Ok, no prob, let's spawn two additional Redis nodes to serve their timelines as smooth as usually".
3245 days ago,