Building Lanyrd

coderholic · on Aug 11, 2011

There's a comment on slide 43 about finding a way to replay access logs. Siege is a great tool for this, as it takes a file of URLs as an argument.

You can get the list of requested paths from your Apache access log with

    cut -d ' ' -f7 /var/log/apache2/access.log > urls.txt

And then hammer your test server with

    siege -c <concurrent users> -f urls.txt
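The `cut` approach emits bare request paths (field 7 of Apache's log line), and it keeps POSTs as well as GETs. Below is a sketch of a slightly more careful extraction. It assumes Apache's common or combined log format, keeps only GET requests, and prefixes each path with a placeholder test-server address so that every line is a complete URL.

```python
import re

# Matches the quoted request in an Apache common/combined log line,
# e.g. "GET /events/2011/ HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def log_to_urls(log_lines, host):
    """Turn access-log lines into full URLs for a load-testing tool.

    Only GET requests are kept, since replaying writes against a test
    server is rarely what you want. `host` is a placeholder: point it at
    your test server, never at production.
    """
    urls = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == "GET":
            urls.append(host.rstrip("/") + match.group("path"))
    return urls
```

For example, you could write `"\n".join(log_to_urls(open("/var/log/apache2/access.log"), "http://test.example.com"))` to `urls.txt` and pass that file to `siege -f`.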

sciurus · on Aug 11, 2011

Siege is great. httperf is also worth trying. There's a good guide at http://www.comlore.com/redist_files/httperf-quickstart-guide...

po · on Aug 11, 2011

Just went through the slides... watching now. Very interesting. One thing that is a bit surprising is the diversity of key-value/caching technologies. I felt like there was a lot of overlap in capability between Varnish, Redis, memcached, MongoDB, etc.

simonw · on Aug 11, 2011

I'm a big fan of polyglot persistence - we're using MySQL, memcached, redis, Solr, Varnish and MongoDB now.

MySQL is where all of our key data lives. I trust it, it's backed up, and the entire site can be recreated just from the MySQL dump.

Redis and Solr are both used for denormalisation. Solr provides search and our core calendar view, and is updated every 60-90 seconds by a cron job. It's replicated, which means that our calendar view (the most expensive page on the site) scales horizontally with the number of replicas.

Redis powers a few features, most notably pages that show which of your Twitter contacts are attending an event (a simple Redis set intersection, which Redis will happily perform 100,000 times a second). It's also used for our message queue, which means I don't have to run RabbitMQ as well.
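The "which of your Twitter contacts are attending" feature comes down to intersecting two sets, which Redis does with a single SINTER command. The sketch below models that with plain Python sets so it runs without a Redis server. The key naming scheme (`user:<id>:following`, `event:<id>:attendees`) is hypothetical.

```python
# In-memory stand-in for a Redis keyspace of sets. In production these
# would be Redis sets and the intersection a single SINTER call.
# The key names are hypothetical.
store = {}


def sadd(key, *members):
    """Mimic Redis SADD: add members to the set stored at key."""
    store.setdefault(key, set()).update(members)


def sinter(*keys):
    """Mimic Redis SINTER: intersect the sets stored at the given keys.

    A missing key behaves like an empty set, as it does in Redis.
    """
    sets = [store.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()


def contacts_attending(user_id, event_id):
    """Twitter contacts of user_id who are attending event_id."""
    return sinter(f"user:{user_id}:following", f"event:{event_id}:attendees")
```

With the redis-py client, the real call would be along the lines of `r.sinter("user:42:following", "event:7:attendees")`.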

memcached is used for caching. I could use Redis for this, but the nice thing about memcached is that it has a hard memory limit and will throw away keys without any fuss when it hits that limit. It's also a good idea to keep resources set aside for caching separate from resources being used for other purposes, in my opinion.
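The property relied on here is that a size-bounded cache quietly evicts old entries when it is full. memcached's real eviction is more involved: it limits memory in bytes and keeps LRU lists per slab class. The simplified sketch below uses a single LRU with an item-count limit, which is an assumption standing in for the memory limit.

```python
from collections import OrderedDict


class BoundedLRUCache:
    """Simplified model of memcached under memory pressure.

    Limits the number of items rather than bytes, and keeps one LRU list
    instead of one per slab class. Enough to show "evict without fuss".
    """

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            # Drop the least recently used entry silently, no error raised.
            self._data.popitem(last=False)
```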

Varnish is currently just used as a layer in front of our JavaScript badges ( http://lanyrd.com/services/badges/ ), purely to protect us against a super-high-traffic site deploying our badges (all badge requests are cached for 10 minutes, and Varnish handles dogpiling for us). I'd like to use Varnish in front of the main site as well just for logged out users, but I haven't had time to deploy that yet. Our badges are also designed to not block the loading of your site if we're down for some reason - Varnish helps a bit there as well. See our badge performance notes here: http://lanyrd.com/services/badges/docs/#performance
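"Dogpiling" is when many requests for the same expired object all hit the backend at once. Varnish avoids it by putting requests for an object that is already being fetched on a waiting list, then serving them all from that single backend fetch. The thread-based sketch below illustrates the same request-coalescing idea with the 10-minute TTL mentioned above. It is not how Varnish is implemented internally, and the class name is made up.

```python
import threading
import time


class CoalescingTTLCache:
    """Cache where concurrent misses for one key share a single fetch."""

    def __init__(self, fetch, ttl_seconds=600):  # 600 s = 10 minutes
        self.fetch = fetch
        self.ttl = ttl_seconds
        self._entries = {}   # key -> (value, expires_at)
        self._inflight = {}  # key -> Event set when the fetch finishes
        self._lock = threading.Lock()

    def get(self, key):
        while True:
            with self._lock:
                entry = self._entries.get(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]  # fresh cache hit
                event = self._inflight.get(key)
                if event is None:
                    # First miss: this thread performs the backend fetch.
                    event = threading.Event()
                    self._inflight[key] = event
                    break
            event.wait()  # another thread is fetching; wait, then re-check
        try:
            value = self.fetch(key)
            with self._lock:
                self._entries[key] = (value, time.monotonic() + self.ttl)
            return value
        finally:
            # Wake the waiters whether the fetch succeeded or raised.
            with self._lock:
                del self._inflight[key]
            event.set()
```

If the fetch raises, the waiting threads wake, find no cached entry, and one of them retries.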

We recently started storing application logs in MongoDB, mainly as an experiment. MongoDB is very fast at writes, and lets us easily run structured queries across our logs. I don't care too much about persistence here, since the data isn't as valuable as e.g. our core database of conferences.

Swannie · on Aug 12, 2011

So is this a question of:

Learning new tools and understanding their idiosyncrasies vs. building new functionality on top of existing tools?

Where learning new tools is more interesting, so it wins? :P

mccutchen · on Aug 11, 2011

I like this approach to static assets, where each file's name is changed to include the hash of the file's contents on deployment. But I've never implemented it myself, because I can't figure out a good way to refer to those static assets in my templates.

simonw, if you're listening, how do you solve this problem?

How do you go from, e.g., a template reference to the original filename to a reference to the hashed filename?

simonw · on Aug 11, 2011

I'm using a Django template tag, {% static "css/example.css" %}

In development, the above tag would output "/static/css/example.css?0.234234" - the random number at the end cache busts so e.g. IE will always load the latest version of the file.

In production, the tag looks up the transformed filename in a dictionary, which looks something like this:

    STATIC_ASSETS = {
        "css/core.css": "css/core.b1b09227.min.css"
    }
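Without the Django template-tag registration, the behaviour described above reduces to a small function. This is a sketch: `static_url`, the `debug` flag and the `/static/` prefix are assumptions. In Django you would wrap something like it with `register.simple_tag`.

```python
import random

# Generated at deploy time (static_assets.py in the comment below).
STATIC_ASSETS = {
    "css/core.css": "css/core.b1b09227.min.css",
}

STATIC_PREFIX = "/static/"  # assumed URL prefix


def static_url(path, debug=False):
    """Return the URL for a static file.

    In development, append a random query string so that browsers (e.g.
    IE) never reuse a stale copy. In production, look up the hashed
    filename written by the deploy script; because the name changes
    whenever the contents change, the file can safely be cached for a
    long time. A missing entry raises KeyError (a deliberate choice in
    this sketch, so a skipped build step fails loudly).
    """
    if debug:
        return "%s%s?%s" % (STATIC_PREFIX, path, random.random())
    return STATIC_PREFIX + STATIC_ASSETS[path]
```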

The deployment script includes a bit of code that goes through every file in the static directory, figures out the hash, renames it and then writes out that dictionary in a generated static_assets.py file ready to be deployed to the servers. There's a separate management script that pushes the renamed files to S3 - I run that before doing a deploy.
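The deploy step is the mirror image of that lookup. Here is a rough sketch that assumes the first 8 hex characters of an MD5 digest (matching the length in the example filename; the hash actually used isn't stated) and uses hypothetical function names. It copies rather than renames, so the originals stay usable in development, and it leaves out minification (which is why it produces `core.<hash>.css` rather than `core.<hash>.min.css`) and the S3 upload.

```python
import hashlib
import os
import pprint
import shutil


def hashed_name(relpath, contents):
    """css/core.css -> css/core.<first 8 hex chars of MD5>.css

    MD5 is fine here: this is cache-busting, not security.
    """
    digest = hashlib.md5(contents).hexdigest()[:8]
    root, ext = os.path.splitext(relpath)
    return "%s.%s%s" % (root, digest, ext)


def build_manifest(static_dir, out_dir):
    """Copy every file under static_dir to out_dir under a content-hashed
    name and return the {original: hashed} mapping.

    out_dir should live outside static_dir, or the copies get re-hashed.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(static_dir):
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            # Forward slashes so keys match template paths on any OS.
            relpath = os.path.relpath(src, static_dir).replace(os.sep, "/")
            with open(src, "rb") as f:
                contents = f.read()
            new_rel = hashed_name(relpath, contents)
            dest = os.path.join(out_dir, *new_rel.split("/"))
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copyfile(src, dest)
            manifest[relpath] = new_rel
    return manifest


def write_manifest_module(manifest, path):
    """Write the mapping out as an importable static_assets.py."""
    with open(path, "w") as f:
        f.write("STATIC_ASSETS = %s\n" % pprint.pformat(manifest))
```

Because the name depends only on the file's bytes, every machine that runs the build produces the same filenames for the same contents.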

The only really fiddly bit is that the script needs to rewrite all of the CSS files to include the updated filename of any referenced images. I'm using a dumb regex to do this:

    import re
    css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')

Since we control the coding standards for our own CSS, there's no need to do anything more robust than that.
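Combined with `re.sub` and a callback, that regex is enough to do the rewriting. This sketch assumes that URLs in the CSS are written relative to the static root (e.g. `url(img/logo.png)`), so they can be looked up directly in a mapping like STATIC_ASSETS. Resolving paths relative to the CSS file's own directory is left out, and the function name is hypothetical.

```python
import re

css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')


def rewrite_css_urls(css, assets):
    """Replace url(...) references with their hashed names from `assets`.

    References not in `assets` (external URLs, data: URIs, typos) are
    left untouched. The original quoting style is preserved.
    """
    def replace(match):
        quote, path = match.group(1), match.group(2)
        new_path = assets.get(path, path)
        return "url(%s%s%s)" % (quote, new_path, quote)

    return css_url_re.sub(replace, css)
```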

arthur_debert · on Aug 11, 2011

On universalsubtitles.org, we're doing something pretty similar to that.

1) Move all static media to a unique path keyed by a hash (we take it from git's commit id, which makes it trivial to correlate code and static files): /[static-media]/[static-cache]/[git commit uid]/...path to file

2) Set the MEDIA url accordingly
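A sketch of that scheme follows. It assumes the commit id is read with `git rev-parse HEAD` at deploy time and treats the "[static-media]" and "[static-cache]" parts as configurable prefixes; the default values and function names are placeholders, not the ones used in the linked repository.

```python
import subprocess


def current_commit(repo_dir="."):
    """Full SHA-1 of the checked-out commit, via `git rev-parse HEAD`."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode("ascii").strip()


def versioned_media_url(commit_id, media_root="/static-media", cache_dir="static-cache"):
    """Build a MEDIA URL such as /static-media/static-cache/<commit>/.

    Every file deployed from that commit lives under this prefix, so the
    URL changes (and browser caches miss) on every deploy, whether or not
    an individual file changed.
    """
    return "%s/%s/%s/" % (media_root.rstrip("/"), cache_dir.strip("/"), commit_id)
```

In a settings file this would be used as something like `MEDIA_URL = versioned_media_url(current_commit())`.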

The nice thing about this is that the generated file names are very readable and you always know what changeset generated it just by looking. It's open source if someone finds it useful https://github.com/8planes/mirosubs/tree/master/apps/unisubs...

mccutchen · on Aug 11, 2011

That's a pretty good approach. I guess the main drawback is that the git commit hash will change far more often than the contents of most of the static files, though, right?

arthur_debert · on Aug 11, 2011

Of course, but in development we use a template tag that inserts the original URL (no commit-hash mangling on the MEDIA URL). So yes, we nuke the static files on every deployment, but with our development cycle that would happen anyway (it's very rare for a release not to touch static files), so it's not an issue in practice.

simonw · on Aug 12, 2011

We've written our CSS and JavaScript to have a bunch of reusable components, so often we can deploy new features without changing our static assets at all (we reuse classes for components that are already in use elsewhere on the site). I can see how for heavier JS sites this wouldn't be worthwhile though.

simonw · on Aug 11, 2011

That's exactly why we don't use git commit hashes - we only want our users to download new versions when the files have definitely changed.

mccutchen · on Aug 11, 2011

Ahhh, thanks for the details. The only solution I could think of involved writing out a dict like you're describing and using a template helper to look up the modified asset filenames. I was wondering if there was a more elegant solution that was escaping me.

And it probably would've taken me a little while to realize that I'd need to rewrite CSS files as well, so thanks for that, too!

lamby · on Aug 11, 2011

Have you considered rendering your CSS using the templating system? That way {% static .. %} Just Works, and you can also roll extra template tags for repetitive stuff like rounded corners. You can pre-render for the live site, of course.

simonw · on Aug 12, 2011

We thought about it briefly, but it doesn't really fit Nat's preferred way of working with CSS. Might try something like this in the future though.

bmelton · on Aug 11, 2011

If you're using Django, you might want to look into Django-Compressor (https://github.com/mintchaos/django_compressor#readme).

It doesn't QUITE do what you're looking for in terms of the naming pattern, but it does automatically take your .css files, compress them into one (removing unnecessary whitespace and the like), and then replace the original stylesheet tags with a single tag pointing at the compressed file.

I'm currently working a bunch with Flask, which has something similar (http://pypi.python.org/pypi/Flask-Assets) that I'm not yet using, but I've gotten the basics down with SASS tacked into a Fabric script that compiles my CSS on each deploy.

Thus far, this has meant a cache purge whenever I deploy new assets, as I don't have a hash (or even a datestamp, for that matter) in the filenames. To fix that I've looked into using Flask-Assets, but I'm not there yet.

jsdalton · on Aug 11, 2011

The feature you are describing was just committed to Django trunk this a.m. The main docs have not been updated with it yet, but you can view the docs in this changeset to see how it will work: https://code.djangoproject.com/changeset/16594

The author of the new feature is the guy behind django-compressor.

jonasvp · on Aug 11, 2011

Just use django-compressor (https://github.com/jezdez/django_compressor). It does this as well as minifying and compressing static assets.

simonw · on Aug 11, 2011

django-compressor uses the file modification time to assign a different filename to the file - unfortunately, this doesn't work so well if you are deploying from multiple machines. I much prefer using a hash of the file contents, since that is guaranteed to change if and only if the file itself has changed.

In fact, I'll file a bug suggesting this on django_compressor right now.

arthur_debert · on Aug 11, 2011

Django compressor has another issue in load-balanced environments: the cache key includes the hostname of the machine, which really made it impractical for us. I promised Jannis a clean patch for this, but haven't found the time to do it...

In our case we have a utility machine that compiles media and deploys it, but that machine doesn't even run a webserver at all...

https://github.com/jezdez/django_compressor/blob/develop/com...

mikeyagley · on Aug 11, 2011

django-compress, while a little older than django-compressor, supports versioning based on a hash of the file contents. I've been pretty happy with it. https://github.com/pelme/django-compress

jpb0104 · on Aug 11, 2011

https://github.com/paulirish/html5-boilerplate/blob/master/....

danielknell · on Aug 11, 2011

I've noticed some multivariate testing on some of the design elements, and you've rolled things out to beta testers first in the past. Is there anything juicy behind the scenes powering these features?

simonw · on Aug 11, 2011

Nothing fancy at the moment - just some if blocks in our templates. We'll probably start using redis set membership for feature flags in the future though.
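Feature flags based on set membership could mean one set per feature, holding the IDs of users who should see it, checked with SISMEMBER. Here is a Redis-free sketch in which a dict of sets stands in for Redis and the flag names are hypothetical. With redis-py the check would be `r.sismember(key, user_id)`.

```python
# Stand-in for Redis: one set of user IDs per feature flag.
# Key names such as "flag:new-calendar" are hypothetical.
flags = {}


def enable_for(flag, *user_ids):
    """Like SADD flag:<name> <user ids>."""
    flags.setdefault("flag:" + flag, set()).update(user_ids)


def disable_for(flag, *user_ids):
    """Like SREM flag:<name> <user ids>."""
    flags.get("flag:" + flag, set()).difference_update(user_ids)


def is_enabled(flag, user_id):
    """Like SISMEMBER flag:<name> <user id>: a constant-time membership check."""
    return user_id in flags.get("flag:" + flag, ())
```

A template's `if` block would then ask `is_enabled("new-calendar", user.id)` instead of hard-coding who sees what.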

sarp · on Aug 12, 2011

Could you elaborate on your read-only mode?

Is it only hiding functionality that would cause writes or is there more to it? Is it built into the application logic?

simonw · on Aug 14, 2011

Yup, it's hiding functionality that causes writes. Lots of messy template logic and a few bits of app logic as well. It isn't very neatly abstracted at the moment.