Okay, this one is pretty specific to our setup, but it’s a neat little trick I came up with, so I want to share it.
First a little background regarding our caching strategy with Varnish. Ideally, we would like to serve fresh content every time. This isn’t a perfect world, though, and for various reasons, we can’t get away with that. Sometimes we have to serve cached objects to protect our Apache backends from getting blown out of the water by traffic spikes or particularly popular URLs. So our strategy is as follows:
- Always serve static objects out of cache. Static objects are image files, .js, .css, etc. Most of this is taken care of by serving off of CDN edge servers, but there are still a few static objects served directly off of our cluster, and invoking an entire Apache/PHP process just to serve a 10kB .gif is dumb.
- If the client is a logged-in user, always return a fresh object from the Apache backend. In the future we plan to use Edge Side Includes to deliver cached versions of all the non-user-specific objects on a page to logged-in users, which will bring the Apache load down. For now, requests from logged-in users make up a small enough percentage of the total that it’s okay to simply pass auth’d requests back to Apache every time.
- This is the more complex bit: we have Varnish configured to insert every object it retrieves from the backend into the cache, except for objects generated for logged-in users. This is for fault tolerance (see below), and it ensures that most every request from an anonymous user will be a cache hit. But we still want to serve fresh content whenever possible, so there’s an override block in vcl_hit() which says “if the backend is alive and the URL doesn’t match some specific regex, return(pass);”, which is Varnish-speak for “don’t serve a cached object, get a fresh copy from the backend instead.”
That third point is a simple method of fault tolerance that I designed, and it allows Varnish to easily do four things. One, take advantage of real-world workload and get a nice full cache that’s also current - that is, it’s continually updated with new items that get published and clicked on.
Two, the short-circuit in vcl_hit() bypasses cache hits for objects that are deemed ‘cool’ (as in not hot) enough that we can be sure we won’t blast our Apache backends to death by getting fresh copies of them, thus ensuring that most objects are delivered fresh.
Three, the fact that we have this nice full cache means that we can take advantage of Varnish’s grace mode in case our Apache backend cluster dies. If the backend dies, then Varnish can serve the client stale versions of the objects they request instead of spewing an error. This really comes in handy when something stupid happens like our database backend gets overloaded due to someone performing an ill-advised cache bust on the memcache side, or a bug in our code causes Apache to become unresponsive.
Four, for objects that are hot, we give them a medium-length TTL of 4 minutes and serve them out of cache. In effect, that means that a hot page only invokes an Apache/PHP page generation once every four minutes, while Varnish takes care of the rest of the requests for that object by serving a cached version. This drastically reduces the load on the Apache cluster, which makes everyone happy.
That last bit brings me to the point of this post, which is to show a couple of little helper scripts I wrote. They make it fast and easy to change the Varnish configuration to push in URLs that need to start being served out of cache, and to pop out URLs that no longer need to be. The relevant bit in the Varnish configuration looks something like this:
sub vcl_hit {
    if (req.backend.healthy && req.url !~ "(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$" && req.url !~ "6747655" && req.url !~ "6749862") { ### sedbait
        return (pass);
    } else {
        return (deliver);
    }
}
This is the conditional that says “if the backend is healthy and req.url doesn’t match any of the regexps in this list, grab a fresh copy from the backend; otherwise serve it out of cache.” The regexps we test for are just numbers: they correspond to the unique node_ids that identify specific pieces of content on our site, and each node_id appears in that content’s URL. The first two pieces of the conditional never change: req.backend.healthy and a regexp matching static objects. After that, it gets dynamic. The rest of the list represents the node_ids that are currently “hot”, as a result of a promoted link on another site, for example. These change on a sometimes-hourly (but mostly daily or weekly) basis. Initially I would edit the config by hand any time we needed to add a node_id to the hot list or remove one from it, but it became apparent pretty quickly that we needed a better way to make these changes and push them out quickly, so that we could respond to unforeseen hot objects that were beginning to have noticeable negative effects on our Apache cluster. Here are the scripts I wrote to automate the task:
#!/bin/bash
# vadd.sh by Adam Staudt <adam.staudt#connectedventures.com>
# takes two arguments - first the source file, then a pattern.
# appends pattern to vcl_hit() block in order to
# force-cache URLs containing the supplied pattern.
if [ $# -ne 2 ]; then
    echo "Usage: $0 <file> <pattern>"
    exit 1
fi
FILE=$1
PATTERN=$2
# On the line tagged "sedbait", splice a new req.url test in before the closing ") {".
sed -e "/sedbait/ s/) {/ \&\& req.url !~ \"$PATTERN\") {/" "$FILE" > "$FILE.new" && mv "$FILE.new" "$FILE"
You can see why there’s a comment at the end of the if() block that says “sedbait”. The script to perform the inverse operation is similar:
FILE=$1
PATTERN=$2
# On the line tagged "sedbait", delete the req.url test for this pattern.
sed -e "/sedbait/ s/ && req.url !~ \"$PATTERN\"//g" "$FILE" > "$FILE.new" && mv "$FILE.new" "$FILE"
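A quick way to convince yourself that the two sed expressions really are inverses is to run them against a throwaway copy of the sedbait line. The sketch below is not one of the original scripts: the function names vadd and vdel, the temp file, and the node_id 1234567 are made up for illustration, and it assumes patterns are plain node_ids with no characters that are special to sed.

```shell
#!/bin/bash
# Hypothetical round-trip check; vadd/vdel restate the add and remove
# sed commands as functions so they can be exercised on a scratch file.

vadd() {  # vadd <file> <pattern>: append a pattern to the sedbait conditional
  sed -e "/sedbait/ s/) {/ \&\& req.url !~ \"$2\") {/" "$1" > "$1.new" && mv "$1.new" "$1"
}

vdel() {  # vdel <file> <pattern>: remove a pattern from the sedbait conditional
  sed -e "/sedbait/ s/ && req.url !~ \"$2\"//g" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Scratch file holding a cut-down version of the conditional.
TMP=$(mktemp)
echo 'if (req.backend.healthy && req.url !~ "6747655") { ### sedbait' > "$TMP"
cp "$TMP" "$TMP.orig"

vadd "$TMP" 1234567
AFTER_ADD=$(cat "$TMP")
vdel "$TMP" 1234567

# After add-then-delete, the file should be byte-identical to the original.
if cmp -s "$TMP" "$TMP.orig"; then ROUNDTRIP=ok; else ROUNDTRIP=mismatch; fi
echo "$AFTER_ADD"
echo "round trip: $ROUNDTRIP"
rm -f "$TMP" "$TMP.orig"
```

If the round trip ever reports a mismatch, the pattern probably contained something sed treats specially (a slash or an ampersand, say), which is a good reason to stick to bare node_ids.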
So now if our Apache servers are spiking and we can pinpoint specific hot objects (either with the help of varnishtop or our ad ops team), it’s quick and easy to append the hot object’s node_id to the varnish config and push it live, using methods discussed in my previous post.
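Before adding or removing anything, it helps to see what is already on the hot list. Here is a rough sketch of a companion helper that prints the patterns currently in the sedbait line. The name hot_patterns and the sample file are hypothetical, it relies on grep -o (available in GNU and BSD grep), and it assumes the static-file regex is always the first req.url test on that line.

```shell
#!/bin/bash
# Hypothetical helper: list the patterns currently forced into cache.

hot_patterns() {  # hot_patterns <file>
  # Keep only the sedbait line, pull out each `req.url !~ "..."` test,
  # skip the first one (the static-file regex), then strip the wrapper.
  grep 'sedbait' "$1" \
    | grep -o 'req\.url !~ "[^"]*"' \
    | tail -n +2 \
    | sed -e 's/^req\.url !~ "//' -e 's/"$//'
}

# Demo against a scratch copy of the vcl_hit block from this post.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
sub vcl_hit {
    if (req.backend.healthy && req.url !~ "(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$" && req.url !~ "6747655" && req.url !~ "6749862") { ### sedbait
        return (pass);
    } else {
        return (deliver);
    }
}
EOF
HOT=$(hot_patterns "$SAMPLE")
echo "$HOT"
rm -f "$SAMPLE"
```

Checking the output first makes it easy to avoid appending the same node_id twice.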