read mysqldump output from mysql 5.1+ into a mysql 5.0 server

For dumb reasons, we sometimes need to load a mysqldump from a MySQL 5.1 or 5.5 machine into a MySQL 5.0 server. Since we use indexes, and the format for setting an index changed from 5.0 to 5.1, you’ll be greeted with one of these when you try this:

ERROR 1064 (42000) at line 2451: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=utf8' at line 10

Annoying. How to fix:

$ zcat 20120621.db.sql.gz | sed '/USING BTREE/ s/\(.*\)\((.*\)\(USING BTREE\)/\1 \3 \2/' | mysql db

What does it do? All it does is move the “USING BTREE” string out in front of the list of columns on which you’re defining the index, which is where MySQL 5.0 expects it; putting it after the column list is the syntax change introduced in MySQL 5.1.
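
To see exactly what that rewrite does, here’s the same sed expression run over a made-up index definition line (the key and column names are placeholders):

$ echo 'KEY `idx_user` (`user_id`,`created`) USING BTREE,' | sed '/USING BTREE/ s/\(.*\)\((.*\)\(USING BTREE\)/\1 \3 \2/'
KEY `idx_user`  USING BTREE (`user_id`,`created`) ,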

be careful about attaching truss to memcached process!

Here’s one that bit me in the balls today:

On FreeBSD 8.2-RELEASE, using memcached-1.4.5 (built out of ports), I attached truss to the PID of the running memcached process, and after about 20 seconds of logging system calls to the terminal, truss segfaulted and took memcached with it. When your live site makes such heavy use of memcache, this is bad for business. Fortunately traffic at the time wasn’t too high, and a quick restart of the daemon was sufficient to protect the MySQL backends from being crushed entirely to death before the cache could warm back up.
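
For the record, there was nothing exotic about how I attached - just something like this (the PID is hypothetical; grab the real one with pgrep memcached):

$ truss -p 12345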

Since the truss binary on this machine doesn’t have any debugging symbols built in, a gdb backtrace is not ultra helpful:

(gdb) bt full

#0  0x0000000800748b27 in strlen () from /lib/libc.so.7

No symbol table info available.

#1  0x00000000004030ab in ?? ()

No symbol table info available.

#2  0x0000000000409f84 in ?? ()

No symbol table info available.

--- on and on, “no symbol table info available” and unknown syscalls, for many pages ---

Cannot access memory at address 0x800000000000

So maybe think twice about using truss to peek into the guts of your running memcached process in production.


collegehumor:

OMG THE ANNUAL COLLEGEHUMOR ALL-NIGHTER IS HERE

Featuring:
- 12 brand new videos filmed, edited, and released in 12 hours

- A LIVE office video stream throughout the night

- Back by demand: The Chris Gethard Insult-O-Thon

- Awesome surprise guests that we’re not allowed to mention yet (but past guests have included Pete Holmes, Whitest Kids U Know, Ben Schwartz, MC Chris, Allison Williams (of HBO’s Girls), Derrick Comedy, Freestyle Love Supreme, BriTaNick, and more)

- A LIVE Dinosaur Office

- Lots of free giveaways and contests for you to win

- All-new Jake & Amir

- Appearances from David Young and Sarah Schneider 

- Even more awesome things that we just don’t remember right now
Set every alarm in your phone. You do not wanna miss what we have in store for this year.

MAY 24 8pm (EST) - May 25 8am (EST)

brutally ironic that you’re instructed to set all the alarms on your phone, immediately followed by the wrong time for the event. NYC is currently on EDT (UTC-4), not EST.

what’s eating up all the disk space?

This one has been kicking around for a long time, but I find it pretty damn useful so I’ll share:

$ cd /

$ du -cks * | sort -rn | head

7001595 total
3733310 usr
1345750 tmp
747090 var
634500 staging
488456 boot
29532 root
7848 lib
4736 sbin
4640 rescue

Be aware this can take some time, especially if you have a large disk or big filesystems mounted below the point where you told du to start, because it searches recursively. You’ll definitely want to use the --exclude switch to du (-I on BSD) to ignore patterns if you have other filesystems mounted that you don’t want to waste time recursively searching.

This gives you the top 10 biggest directories in /, sorted from biggest to smallest. The number in front is the size in kB (provided by the -k switch to du). From here you can drill down into individual directories and use the same command to uncover exactly what’s eating up all your disk space.
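
From there, drilling down is just the same command repeated one level deeper, and skipping a mounted filesystem looks like this (the directory names are from the listing above; the exclude pattern is only an example):

$ cd /usr ; du -cks * | sort -rn | head

# same idea from /, skipping a filesystem we do not care about
# (GNU du; on BSD use du -I proc instead)
$ cd / ; du -cks --exclude=proc * | sort -rn | head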

bit counting

This one is fun for nerds but not really too useful in real life, probably.

Say you want to count all the ‘on’ bits (the 1’s) in some arbitrary base 10 integer. ‘bc’ and some shell program friends can help!

$ echo "ibase=10;obase=2; 4095" | bc | tr -dc '1' | wc -c

      12

Slick. What if you have a hexadecimal number?

$ echo "ibase=16;obase=2; DEADBEEF" | bc | tr -dc '1' | wc -c

      24
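
If you do this often, it’s easy enough to wrap in a little shell function. A minimal sketch (the function name is mine; note that bc wants hex digits in uppercase, hence the tr):

# count the 1-bits in a decimal or hex (0x-prefixed) integer
countbits() {
  local n=$1 ib=10
  case $n in 0x*) ib=16 ; n=${n#0x} ;; esac
  echo "ibase=$ib; obase=2; $(echo $n | tr a-f A-F)" | bc | tr -dc '1' | wc -c
}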

Alternatively you can count bits in a stream of ASCII characters, read from a text file, for example. Here’s one way, in C:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

FILE *infile;

unsigned long num_bits(int num)
{
  unsigned long z = 0;          // number of '1' bits
  int n;                        // loop iterator
  for (n = 7; n >= 0; n--)
  {
    if (num >= pow(2, n))
    {
      z++;
      num -= pow(2, n);
    }
  }
  return z;
}

int main(int argc, char *argv[])
{
  unsigned long z = 0; // number of 1 bits
  int c;               // char read in from file
  if (argc != 2)
  {
    printf("Usage: %s <filename>\n", argv[0]);
    return 0;
  }
  if (!(infile = fopen(argv[1], "r")))
  {
    perror(argv[1]);
    return -1;
  }
  while ((c = fgetc(infile)) != EOF)
  {
    z += num_bits(c);
  }
  printf("Number of ON bits in the stream is %lu\n", z);
  return 0;
}
EDIT! A pal of mine who is an actual professional programmer points out that using bit-shift and bitwise comparisons in the num_bits() function is extra nerdy, extra correct, and best of all, extra fast:
unsigned long num_bits(int num)
{
  unsigned long z = 0;          // number of '1' bits
  for ( ; num > 0 ; num = num >> 1)
  {
    z += (num & 1);
  }
  return z;
}
Nice. Thanks ln!
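
And since we’re being nerdy anyway: the same trick ports to shell arithmetic, too. Here’s a sketch of the classic n & (n - 1) approach (Kernighan’s trick), which clears the lowest set bit on each pass, so the loop only runs once per 1-bit (the function name is mine):

# count 1-bits with n & (n-1), which clears the lowest set bit each pass
popcount() {
  local n=$(( $1 )) z=0
  while (( n > 0 )); do
    (( n &= n - 1, z++ ))
  done
  echo $z
}

popcount 4095 prints 12, and since bash arithmetic understands 0x literals, popcount 0xDEADBEEF prints 24, matching the bc pipelines above.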

Varnish Cache config changes made fast and easy

Okay this one is pretty specific to our setup, but it’s a neat little trick I came up with, so I want to share it.

First a little background regarding our caching strategy with Varnish. Ideally, we would like to serve fresh content every time. This isn’t a perfect world, though, and for various reasons, we can’t get away with that. Sometimes we have to serve cached objects to protect our Apache backends from getting blown out of the water by traffic spikes or particularly popular URLs. So our strategy is as follows:

  1. Always serve static objects out of cache. Static objects are image files, .js, .css, etc. Most of this is taken care of by serving off of CDN edge servers, but there are still a few static objects served directly off of our cluster, and invoking an entire Apache/PHP process just to serve a 10kB .gif is dumb.
  2. If the client is a logged-in user, always return a fresh object from the Apache backend. In the future we plan to use Edge Side Includes to deliver cached versions of all the non-user-specific objects on a page to logged-in users, which will bring the Apache load down, but for now requests from logged-in users make up a small enough percentage of the total that it’s okay to simply pass auth’d requests back to Apache every time.
  3. This is the more complex bit: we have Varnish configured to insert every object it retrieves from the backend into the cache, except for objects generated for logged-in users (for fault-tolerance, see below). This ensures that almost every request from an anonymous user will be a cache hit. But we still want to serve fresh content whenever possible, so there’s an override block in vcl_hit() which says “if the backend is alive and the object doesn’t match some specific regex, return(pass);” - Varnish-speak for “don’t serve a cached object, get a fresh copy from the backend instead.”

Point 3 is a simple method of fault-tolerance I designed, and it allows Varnish to do four things. One, take advantage of the real-world workload to keep a nice full cache that’s also current - that is, continually updated with new items as they get published and clicked on.

Two, the short-circuit in vcl_hit() bypasses cache hits for objects that are deemed ‘cool’ (as in not hot) enough that we can be sure we won’t blast our Apache backends to death by getting fresh copies of them, thus ensuring that most objects are delivered fresh.

Three, the fact that we have this nice full cache means that we can take advantage of Varnish’s grace mode in case our Apache backend cluster dies. If the backend dies, then Varnish can serve the client stale versions of the objects they request instead of spewing an error. This really comes in handy when something stupid happens like our database backend gets overloaded due to someone performing an ill-advised cache bust on the memcache side, or a bug in our code causes Apache to become unresponsive.

Four, for objects that are hot, we give them a medium-length TTL of 4 minutes and serve them out of cache. In effect, that means that a hot page only invokes an Apache/PHP page generation once every four minutes, while Varnish takes care of the rest of the requests for that object by serving a cached version. This drastically reduces the load on the Apache cluster, which makes everyone happy.
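
Incidentally, a quick way to sanity-check which path a given URL takes is to peek at the response headers Varnish adds - on a cache hit you’ll typically see a nonzero Age and two transaction IDs in X-Varnish. A rough sketch (the URL is hypothetical, and the exact headers depend on your Varnish version and VCL):

$ curl -sI http://www.collegehumor.com/article/6747655 | egrep -i 'x-varnish|age|via'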

That fourth point brings me to the point of this post: a couple of little helper scripts I wrote to make it fast and easy to change the Varnish configuration - popping out URLs that no longer need to be served out of cache, and pushing in ones that do. The relevant bit in the Varnish configuration looks something like this:

sub vcl_hit {
  if (req.backend.healthy && req.url !~ "(png|gif|jpeg|jpg|ico|swf|css|js|html|htm)(\?[a-z0-9]+)?$" && req.url !~ "6747655" && req.url !~ "6749862") {   ### sedbait
    return (pass);
  } else {
    return (deliver);
  }
}

This is the conditional that says “if the backend is healthy and req.url doesn’t match any of the regexps in this list, grab a fresh copy from the backend; otherwise serve it out of cache.” The regexps we test for are just numbers: they correspond to unique node_ids representing specific pieces of content on our site, and the node_id is contained in the URL. The first two pieces of the conditional never change: req.backend.healthy and a regexp matching static objects. After that, it gets dynamic. That list represents the node_ids that are currently “hot”, as a result of a promoted link on another site, for example. These change on a sometimes-hourly (but mostly daily or weekly) basis. Initially I would edit the config by hand any time we needed to add or remove a node_id to or from the hot list, but it became apparent pretty quickly that we needed a better way to make these changes and get them pushed out quickly, in order to respond to unforeseen hot objects that were beginning to have noticeable negative effects on our Apache cluster. Here are the scripts I wrote to automate the task:

#!/bin/bash
# vadd.sh by Adam Staudt <adam.staudt#connectedventures.com>
# takes two arguments - first the source file, then a pattern.
# appends pattern to vcl_hit() block in order to
# force-cache URLs containing the supplied pattern.

if [ $# -ne 2 ]; then
  echo "Usage: $0 <file> <pattern>";
  exit 1;
fi

FILE=$1
PATTERN=$2

cat $FILE | sed -e "/sedbait/ s/) {/ \&\& req.url !~ \"$PATTERN\") {/" > $FILE.new ; mv $FILE.new $FILE

You can see why there’s a comment at the end of the if() block that says “sedbait”. The script to perform the inverse operation is similar:

FILE=$1
PATTERN=$2

cat $FILE | sed -e "/sedbait/ s/ && req.url !~ \"$PATTERN\"//g" > $FILE.new ; mv $FILE.new $FILE

So now if our Apache servers are spiking and we can pinpoint specific hot objects (either with the help of varnishtop or our ad ops team), it’s quick and easy to append the hot object’s node_id to the varnish config and push it live, using methods discussed in my previous post.
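
So a typical hot-object response, end to end, looks something like this (the node_id is made up, and I’ll call the inverse script vdel.sh since I didn’t name it above):

# node 6751234 is suddenly hot - start serving it from cache
$ ./vadd.sh varnish.conf 6751234

# traffic dies down - go back to serving it fresh
$ ./vdel.sh varnish.conf 6751234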

a couple of useful one-liners

Like any sysadmin, I spend about 60-70% of my working time on the command line. I want to share a couple of the most useful command-line tricks I use often.

We move files between machines A LOT. Code pushes, configuration updates, database migrations, the list of reasons to move files around goes on and on. Here are a few of my favourite ways to do it:

Code pushes: When we’re ready to push code from the staging environment to live, it involves rsyncing the directory tree from one machine to multiple targets. In the past we had a shell script that simply ran the same rsync operation against each of the live frontends in series. This is really slow, and obviously does not scale at all. Another problem it presents is that there’s a window during the push when part of your cluster has old code and the rest has new code. It’s not ideal. This is where a really cool piece of software called GNU parallel comes in. (Note that parallel can do a LOT more than just what I illustrate here. Read its documentation; it’s a really versatile program.) We go from this:

$ for node in [list of frontend nodes] ; do \
  rsync -v -a -e ssh --exclude=".svn" $STAGING_DIR/ $node:$LIVE_DIR ; \
done

to this:

$ for node in [list of frontend nodes] ; do echo $node ; done | parallel --tag --group -k -j 0 rsync -v -a -e ssh --exclude=".svn" $STAGING_DIR/ {}:$LIVE_DIR

Parallel allows us to rsync to all the servers at once, each transfer independent of the others. This also solves the problem of a slow or down machine blocking the whole push: done in parallel, it’s not a problem for the rest of the machines, and the push goes out immediately. This scales a lot better, but you’re still at the mercy of your network connection - if you have too many machines, you’ll saturate the link and things will slow down. The guys over at Tumblr have a nice solution to this which involves setting up a chain of fifos, described on their engineering blog. It’s really clever, but involves having to set up the pipe chain with a command on every machine individually. They’ve wrapped it up nicely in a ruby program, but we have not been forced to go that far just yet, as the number of live frontends in our cluster is still manageable.

Another thing we do fairly often is pull in snapshots of the live database onto the dev database server, so the devs can develop against the most current DB schema. We have in place already a system of cron scripts to dump and compress daily snapshots of the production databases, and scp them down to our dev RAID here in the office. When dev asks for new data to be loaded into the dev DB, it’s as simple as this:

On the target machine:

$ nc -l 2012 | mysql <db_name>

On the source machine:

$ zcat db_backup.gz | nc <target_machine_ip> 2012

That’s it. Note that on some older versions of nc you need to specify both -l and -p <port> on the listening side.
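
On those older netcats, the listening side looks like this instead; the sending side doesn’t change:

$ nc -l -p 2012 | mysql <db_name>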

One last one for now: say I want to make an update to the Varnish Cache configuration on multiple caches at once. GNU Parallel, in tandem with Varnish’s excellent admin console, makes the job easy:

$ for server in <list of servers> ; do echo $server ; done | parallel -j 0 scp varnish.conf root@{}:/etc/varnish.conf

$ for server in <list of servers> ; do echo $server ; done | parallel -j 0 --tag -k "echo 'vcl.load <config_id> /etc/varnish.conf' | nc {} 81"

$ for server in <list of servers> ; do echo $server ; done | parallel -j 0 --tag -k "echo 'vcl.use <config_id>' | nc {} 81"

If your list of servers doesn’t change much, you can further automate the part where you prepare the list of targets for parallel to expand with the “{}” construction. Personally I use a sequence of numbers and a “seq” script I wrote, which simply takes two numerical arguments and prints the sequence of numbers between them. This works for us because our hostnames use a <function>-<site>-<number> scheme; for example, “vc-jest-01” and “vc-jest-02” are Varnish Cache nodes for Jest.
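
For illustration, generating the target list for that pair of Jest caches and feeding it to parallel might look something like this (the hostnames follow the scheme above; the zero-padding via printf is my own convenience):

$ seq 1 2 | while read n ; do printf "vc-jest-%02d\n" $n ; done | parallel -j 0 scp varnish.conf root@{}:/etc/varnish.conf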

CollegeHumor infrastructure overview

A quick high-level overview of the infrastructure on which the CollegeHumor site runs:

  • Database Layer: 8 CentOS5 machines running MySQL 5.0. This is a very simple replicated setup having one write-only master and seven read-only slaves. Numerous improvements to this cluster are on the agenda, including upgrading all of the machines to MySQL 5.5 and improving the method by which one of the slaves can fail over and become the master in case the master fails. Our data set isn’t all that enormous, weighing in at 7.2GB, so we have not had to do any extensive sharding of our data. The only sharding we have done is to break the `sessions` table out into its own database running on its own server (Percona 5.5, InnoDB), as it is much more write-heavy than any of our other tables. Rounding out the database layer is a small Redis cluster which serves as the backend for the UPick feature, which was recently launched live.
  • Memcache: One of the key ways in which our application tries to protect the MySQL cluster is through a pretty aggressive caching scheme using memcache. From a systems perspective, this part is really simple: we have four dedicated FreeBSD VMs running memcached with a cache size of a few GB. I almost never have to worry about this layer of the stack. Munin alerts are in place to warn us in case things go haywire, but fortunately this has not been an issue for us.
  • Apache: Our Apache nodes run on FreeBSD 8.2 machines. I like FreeBSD for web servers because it’s ultra-solid and delivers excellent performance with the kernel tweaks (via sysctl) we’ve made. There’s nothing particularly special about the Apache configuration here, it is built out of the FreeBSD ports tree and configured with MaxClients and MaxServers values that work for us. We use the standard prefork MPM.
  • PHP: Our framework is written in PHP, so PHP is one of the core components in the stack. PHP is built out of the FreeBSD ports tree and linked into Apache as a shared library. Again there’s nothing super exotic about the configuration here, aside from the fact that we use a dev version of igbinary in place of the standard PHP serialiser. This is due to a bug we found, the gory details of which are described in Aleksey’s blog post.
  • Varnish Cache: This is definitely the workhorse in our stack, and is responsible for preventing site problems and outages on a more-or-less daily basis. Our use of Varnish Cache deserves its own blog post, and will get one later. For now, I’ll just say that I love Varnish Cache, it is one of the most useful pieces of software I know of for a high-traffic website. It will save your ass, and look good doing it.
  • Other pieces: The above items describe the core stack that runs the CollegeHumor site, but we have recently launched a feature called UPick, which allows users to submit content as well as up/down-vote other users’ submissions. This application is also written in PHP, but it doesn’t run on the main CH cluster. UPick has its own pair of load-balanced frontend boxes, and a dedicated Redis backend. The frontend application itself is proxied by nginx to php-fpm, but the really clever bit that makes it massively scalable is that the application caches its output into memcache, and we’re able to cache the entire output of the frontend PHP app for some amount of time (4 minutes, currently). With that in place, we configure nginx to simply proxy all requests to the app directly to memcache, and if it’s a cache hit, then we’ve served the entire request right out of RAM, which is awesome. If it’s a cache miss or the TTL expires, then nginx will proxy the request to php-fpm, the app will run and stash its result into memcache, and nginx will deliver the results to the client. Currently UPick gets an average of 50 req/sec, about 20% of which actually result in php-fpm having to run. This allows us to run the app on a very modestly endowed VM and get great performance. UPick also deserves its own blog post, which will be forthcoming.

New Direction!

So… I started this blog a little over a year ago, and as you can see, it hasn’t really gone anywhere. The reasons why are explained pretty well in my previous post, but I found something to do with it. I’m going to write about the engineering stuff I do at my job working for CollegeHumor.

I’ve been here since October of 2010, and since then the technical landscape has changed quite a bit. I’ve been fortunate to be a member of a very small team of system engineers (read: two) who have been tasked with designing, building, launching, and maintaining (babysitting?) the stacks on which our group of websites is built.

Since I started here, our largest site - that’s CollegeHumor - underwent a complete rewrite and relaunch on a new codebase, running on a new cluster in a new datacenter. We also successfully launched a brand new site - Jest.

With this blog I want to share some strategies, insights, failures, solutions, and lessons we’ve learned - and continue to learn - along the way as we continue to develop and expand.

Warum gibt’s so wenig hier? (Why is there so little here?)

So I just logged into tumblr for the first time in months, and saw that nothing much is developing out of this tumblr that I started earlier in the year. Why is that? I thought about it a little bit, and I’ve decided that I don’t really know how to use the internet. I’ve identified three principal reasons for the lack of content here:

  • I don’t overshare. I don’t feel the need or the desire to publish onto the internet every thought, action, and interaction I have in real life. I don’t feel the need to post a link to every song or video or article I like just so that people will know I liked it. I understand why other people do this, and I appreciate that they do - I read your blogs, Facebook walls, and Twitter timelines, too - but that’s not me. It’s not that I don’t have a lot to say, it’s just that I don’t feel the need to say it into the ether. I’d rather talk to you.
  • There are too many services designed for spewing your thoughts onto the internet. I have a tumblr, a Google+ account, a Twitter feed, and a Facebook wall. None of them get used very often - though again I do read what my friends post - but it is a case of having too many tools for the same job. If I do have something I want to say, should I just copy/paste the same thing onto all four services? I know people do that, but it seems both pretentious and severely inefficient.
  • A lot of the things I am up to on a daily basis are either irrelevant to the general public - for example work stuff or polo stuff - or things I am better off not talking about in such a public forum (draw your own conclusions from that). Does the general public really care about the specifics and day-in/day-out tasks around designing, building, and maintaining the web stacks for the websites at the company I work for? I think it would be relevant if I were developing new things or significantly pushing certain software packages or practices forward, but to be honest, I’m not. I use the hardware I have available to me and put together the software components to make it work. Sometimes creatively, sometimes to a recipe that’s already proven, but either way the specifics are not groundbreaking, so it doesn’t seem like something worth filling up blog space with. (EDIT: Screw that. Since I first wrote this, I realise that everything is meaningful when it comes to solving engineering problems, and it’s always worth it to share what you know, even if you’re just tweaking others’ ideas.) Is it really relevant to spew to the entire world my thoughts on specific issues and developments having to do with the fake sport I play? Only a (very) small percentage of the population is going to care about that, and there are other well-established ways to communicate directly with the people who do. Beyond that I could talk about my opinions on culture, current events, politics, history, mathematics, or any of the other things I’m interested in, but this falls back to point number 1: I don’t feel the need to shout out everything I think, see, feel, and do. This is my failure at using the modern internet.

TL;DR version: I’m not an attention-seeker or a willing broadcaster of my unsolicited opinion. Therefore I do not know how to use the internet.