This article is based on the talk I gave at the ViennaDB meetup on Sep 22nd, 2014.
Memcached was traditionally used for all our web projects at KURIER.at. While it worked without a hitch, I found it too simple for certain use cases. That is why, as tech lead on the events.at relaunch team, I decided to use Redis in our new stack instead.
It has been described as “Memcached on steroids” and, while that’s certainly true, it’s only part of the story. Redis offers functionality far beyond what would be expected from a typical key-value cache. For instance, it’s also a message broker right out of the box.
Redis is used by big services like Twitter, Pinterest, Tumblr, GitHub, and Stack Overflow.
Redis stands for “REmote DIctionary Server”, is written in C, and was initially released in 2009. The most recent version is 2.8.17. Clients to access Redis are available for over 30 programming languages.
The official project description states:
Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.
There are two interesting observations here: Redis describes itself not only as a cache but also as a store. Additionally, it supports multiple data types.
Redis keeps its entire data set in memory and additionally persists it to disk. Snapshots are written periodically according to configurable rules, e.g. after a minute if at least X writes have occurred. The idea is not to use it as the primary store for sensitive data, but rather to avoid losing all of your (cached) data if the system needs to restart.
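As a rough sketch of how such a snapshot rule could be changed at runtime, the helper below calls `config_set` and `config_get` as offered by the redis-py client. The choice of redis-py is an assumption (the article does not name a client library), and the function name and thresholds are purely illustrative.

```python
def set_snapshot_rule(client, seconds, min_changes):
    """Snapshot to disk after `seconds` if at least `min_changes` writes occurred.

    `client` is assumed to be a redis-py `Redis` instance; this is an
    assumption, the article does not name a client library.
    """
    # Same effect as the redis.conf line: save <seconds> <min_changes>
    client.config_set("save", f"{seconds} {min_changes}")
    # CONFIG GET returns a mapping of parameter name to current value
    return client.config_get("save")["save"]
```

Calling `set_snapshot_rule(client, 60, 1000)` would request a snapshot once 60 seconds have passed and at least 1000 writes have occurred.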
1 million small key-value string pairs use about 100 MB of memory. Redis is single-threaded, but its developers claim that the CPU should never be the bottleneck. The documentation says that an average Linux system is able to deliver 500,000 requests per second. However, in our own benchmarks on our very capable servers we found the limit to be around 150,000 req/s, which is still excellent.
As a side note: Redis comes with its own benchmarking tool which is very handy and easy to use. Here is a run from my MacBook Pro (Retina, 13-inch, Mid 2014):
$ redis-benchmark -q -n 100000
PING_INLINE: 80321.28 requests per second
PING_BULK: 74183.98 requests per second
SET: 74404.77 requests per second
GET: 79681.27 requests per second
INCR: 80256.82 requests per second
LPUSH: 81234.77 requests per second
LPOP: 81566.07 requests per second
SADD: 81234.77 requests per second
SPOP: 83263.95 requests per second
LPUSH (needed to benchmark LRANGE): 76045.62 requests per second
LRANGE_100 (first 100 elements): 19801.98 requests per second
LRANGE_300 (first 300 elements): 9391.44 requests per second
LRANGE_500 (first 450 elements): 6667.56 requests per second
LRANGE_600 (first 600 elements): 5291.01 requests per second
MSET (10 keys): 37091.99 requests per second
Memcached is multi-threaded but many people claim that they found both databases to be roughly equally fast. So performance should not be a deciding factor when choosing between one or the other.
The core commands to write, read, and delete data are the following:
SET key value [EXPIRES]
GET key
DEL key
Those are basically the same commands that Memcached supports.
In Redis, keys can expire but don’t have to. Memcached is an LRU cache, so once available memory gets tight it automatically deletes the keys that have gone unaccessed the longest. With Redis’ default configuration, the developer is responsible for making sure the server does not run out of memory. This may sound like a disadvantage, but it guarantees that no data is lost unless it expires or is explicitly deleted. However, Redis can also be configured to act as an LRU cache.
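To make the three core commands and key expiry concrete, here is a minimal cache-aside sketch. It assumes a redis-py style client, which the article does not name. The helper names, the JSON encoding and the default TTL are illustrative choices, not something the setup described here is known to use.

```python
import json


def get_or_load(client, key, loader, ttl_seconds=300):
    """Return the cached value for `key`; on a miss, load it and cache it."""
    cached = client.get(key)              # GET key, None on a miss
    if cached is not None:
        return json.loads(cached)
    value = loader()                      # e.g. a call to a REST API
    # SET key value with an expiry: Redis drops the key after ttl_seconds
    client.set(key, json.dumps(value), ex=ttl_seconds)
    return value


def invalidate(client, key):
    """Remove a key explicitly (DEL key); returns the number of keys removed."""
    return client.delete(key)
```

Leaving out `ex` would store the key without an expiry, in which case it stays until it is deleted explicitly.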
One command that Redis offers that I found extremely useful is KEYS. It enables the retrieval of all keys that match a given pattern (e.g. all keys that start with “event:”). It has to be used with care though, since it requires O(N) time, where N is the number of keys in the database. 1 million keys can be scanned in roughly 40 ms, but no other command is able to execute during that time since Redis is single-threaded.
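The linear cost is easy to see in a toy model. The sketch below is not how Redis implements KEYS; it only imitates the behaviour with a plain Python dict and the standard library's glob matcher, and the function name is made up for illustration.

```python
from fnmatch import fnmatchcase


def keys_matching(store, pattern):
    """Toy model of KEYS: test every key against a glob pattern, O(N) in keys."""
    # Every key is visited once, whether it matches or not.
    # Results are sorted only to make this sketch deterministic.
    return sorted(key for key in store if fnmatchcase(key, pattern))
```

With a store holding `event:1`, `event:2` and `venue:9`, the pattern `event:*` selects the two event keys, but all three keys are inspected to get there.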
Memcached does not distinguish between different data types; everything is a string. The only commands it offers to modify values directly are
Redis supports multiple data types and offers operations tailored to each type. The possible data types are:
- Strings
- Hashes
- Lists
- Sets
- Sorted Sets
- Bitmaps
- HyperLogLogs
Possible use cases for hashes are sessions, lists could be used for task queuing or last modified items (with paging), sets to index items by category, sorted sets to manage scoreboards or to track time-based events, and HyperLogLogs to track unique visitors.
The documentation of the commands is outstanding so there’s no need to explain everything here.
As an example, let’s take a look at some of the commands for sets. Sets are, as usual, unordered collections of values in which no value may appear more than once.
The basic CRUD operations are:
SADD key member [member …]
SMEMBERS key
SREM key member
DEL key
What I found very useful is to select random members via SRANDMEMBER. To look up whether a value is stored in a set, SISMEMBER can be used. The total number of elements can be retrieved via SCARD, and values can be moved from one set to another via SMOVE. Operations on multiple sets are also possible via
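A few of the single-set commands named above can be sketched as thin helpers. The sketch assumes a redis-py style client (an assumption, the article does not name a client library). The `category:<name>` key scheme and all function names are hypothetical.

```python
def index_event(client, category, event_key):
    """SADD: add an event key to its category set; returns how many were new."""
    return client.sadd(f"category:{category}", event_key)


def is_in_category(client, category, event_key):
    """SISMEMBER: check whether the event key is stored in the category set."""
    return bool(client.sismember(f"category:{category}", event_key))


def category_size(client, category):
    """SCARD: number of members currently in the category set."""
    return client.scard(f"category:{category}")


def recategorize(client, old, new, event_key):
    """SMOVE: move a member between two sets; False if it was not in `old`."""
    return bool(client.smove(f"category:{old}", f"category:{new}", event_key))
```

Because sets ignore duplicates, calling `index_event` twice with the same key leaves the set unchanged the second time.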
Redis does support transactions but not in the traditional sense. Commands can be queued via MULTI and the only guarantee Redis gives you is that those commands are executed sequentially with no other commands in-between once EXEC is called.
To make sure no important values are modified since they were last read, changes can be detected via WATCH. If the value of a monitored key is modified, EXEC will fail.
Rollbacks are not supported.
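The interplay of WATCH, MULTI and EXEC might look as follows with redis-py's `transaction` helper. Both the client library and the counter example are assumptions made for illustration. A plain counter would normally just use INCR; the read-modify-write here only stands in for logic that has to look at a value before changing it.

```python
def increment_counter(client, key):
    """Optimistically increment `key` using WATCH, MULTI and EXEC."""

    def _body(pipe):
        # Called after WATCH key: reads are executed immediately here.
        current = int(pipe.get(key) or 0)
        pipe.multi()                  # MULTI: commands are queued from now on
        pipe.set(key, current + 1)    # queued, only sent with EXEC

    # transaction() issues WATCH, runs _body and then EXEC. If the watched
    # key was modified in the meantime, EXEC is aborted and _body is retried.
    return client.transaction(_body, key)
```

Nothing is rolled back on a conflict because nothing has been applied yet: the queued commands are simply discarded and the function body runs again against the fresh value.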
While all of this is very powerful already, Redis is also a simple message broker. It supports the publish-subscribe pattern through the following commands:
PUBLISH channel message
SUBSCRIBE channel [channel …]
UNSUBSCRIBE channel [channel …]
Additionally, as stated earlier, lists can also be used as queues to store messages/commands for other parts of your system.
So if your needs for message passing are very simple, Redis can directly be used instead of maintaining a more complex system like RabbitMQ or ActiveMQ.
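A small publish-subscribe sketch, assuming a redis-py style client whose `pubsub()` object offers `subscribe`, `listen` and `unsubscribe`. The client choice is an assumption, and the channel name `events:updated` and both function names are invented for this example.

```python
def announce_update(client, event_id, channel="events:updated"):
    """PUBLISH channel message; returns the number of subscribers reached."""
    return client.publish(channel, str(event_id))


def handle_updates(client, handler, channel="events:updated", limit=None):
    """SUBSCRIBE to `channel` and pass each message payload to `handler`."""
    pubsub = client.pubsub()
    pubsub.subscribe(channel)
    handled = 0
    for message in pubsub.listen():       # blocks until something arrives
        if message["type"] != "message":
            continue                      # skip subscribe confirmations
        handler(message["data"])
        handled += 1
        if limit is not None and handled >= limit:
            break
    pubsub.unsubscribe(channel)
    return handled
```

Messages are only delivered to clients that are subscribed at the moment of publishing, which is why the list-based queue mentioned above is the better fit when no message may be missed.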
While I’m not a big fan of storing logic in the database, it can be useful or more efficient at times. Redis enables server-side scripting via Lua. For more information about this, take a look at the documentation of the
One of my favorite features is how easy monitoring is. INFO provides important statistics about your server such as memory consumption, active connections, and replication information.
The most powerful command is MONITOR though. It shows all commands that are processed in real time. This is very useful during development as well as in production.
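For a quick health check, the statistics mentioned above can be pulled out of INFO. The sketch assumes a redis-py style client whose `info()` method returns the INFO fields as a dictionary. The client choice and the function name are assumptions.

```python
def server_summary(client):
    """Condense INFO into the three figures mentioned in the text."""
    info = client.info()
    return {
        "memory": info["used_memory_human"],    # memory consumption
        "clients": info["connected_clients"],   # active connections
        "role": info["role"],                   # replication role
    }
```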
Redis at events.at
We run Redis in a master-slave setup. Support for clustering will be added with version 3 in a few months. Neither setup is possible with Memcached.
At the moment we only use it to cache data from our REST API on our web frontend. We make heavy use of sets. For example, we maintain a set for each event category, e.g. “Theater” or “Jazz”, and store the keys of the 50 most interesting upcoming events. On the detail page of an event we select five random keys from the set of the event’s category to display similar events.
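The lookup for similar events could be as small as the following sketch. It assumes a redis-py style client, and the `category:<name>` key scheme is hypothetical; the actual key names are not given in this article.

```python
def similar_events(client, category, count=5):
    """SRANDMEMBER key count: up to `count` distinct random event keys."""
    return client.srandmember(f"category:{category}", count)
```

With a positive count, Redis returns distinct members, and fewer than `count` if the set is smaller.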
In the near future we plan to make use of the message brokering features in our REST API. Since we’re constrained to a single-threaded programming language, we can’t spawn another thread to start a long-running task (like an import) and end the incoming HTTP request immediately. With Redis we’re able to enqueue a job and run it later. That is basically what GitHub is doing with Resque.
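A list-backed job queue of this kind can be sketched in a few lines. This is a generic illustration that assumes a redis-py style client. It is neither our implementation nor how Resque works internally, and the queue key, job format and function names are invented.

```python
import json

QUEUE_KEY = "jobs:pending"  # hypothetical key name


def enqueue_job(client, job_type, payload, queue=QUEUE_KEY):
    """LPUSH a JSON-encoded job; returns the new length of the queue."""
    job = {"type": job_type, "payload": payload}
    return client.lpush(queue, json.dumps(job))


def run_next_job(client, handlers, queue=QUEUE_KEY, timeout=5):
    """BRPOP one job, waiting up to `timeout` seconds, and dispatch it."""
    item = client.brpop(queue, timeout=timeout)
    if item is None:
        return None                   # nothing arrived before the timeout
    _key, raw = item                  # BRPOP returns (queue name, value)
    job = json.loads(raw)
    return handlers[job["type"]](job["payload"])
```

The HTTP request handler would only call `enqueue_job` and return, while a separate worker process loops over `run_next_job`. Pushing on the left and popping on the right gives first-in, first-out order.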
We are very happy with Redis. It was very easy to integrate into our application and to deploy. It is a joy to develop with, so far it has not failed us, and it just works™.
As it stands today, I don’t see a reason for anyone to pick Memcached over Redis.