Building Lanyrd (lanyrd.com)
82 points by mace on Aug 11, 2011 | 26 comments


There's a comment on slide 43 about finding a way to replay access logs. Siege is a great tool for this, as it takes a URLs file as an argument.

You can get a list of URLs from your apache access log with

    cut -d ' ' -f7 /var/log/apache2/access.log > urls.txt

And then hammer your test server with

    siege -c<concurrency rate> -f urls.txt
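A rough Python take on the `cut` step, as a sketch only. It assumes the Apache common/combined log format (request path in the seventh space-separated field), keeps only GET requests on the guess that replaying logged POST paths as plain GETs is not very meaningful, and prefixes a base URL. The name `request_urls` and the host `http://test.example.com` are made up for illustration.

```python
def request_urls(log_lines, base_url):
    """Yield full URLs for the GET requests found in access log lines."""
    for line in log_lines:
        fields = line.split(" ")
        # Assumed layout: field 6 is '"GET' and field 7 is the request path.
        if len(fields) < 7 or fields[5] != '"GET':
            continue
        yield base_url.rstrip("/") + fields[6]


# Two hand-written sample lines standing in for a real access.log.
sample = [
    '127.0.0.1 - - [11/Aug/2011:10:00:00 +0000] "GET /calendar/ HTTP/1.1" 200 512',
    '127.0.0.1 - - [11/Aug/2011:10:00:01 +0000] "POST /login/ HTTP/1.1" 302 0',
]
urls = list(request_urls(sample, "http://test.example.com"))
```

Writing `urls` out one per line would give a file in the same shape as `urls.txt` above.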


Siege is great. httperf is also worth trying. There's a good guide at http://www.comlore.com/redist_files/httperf-quickstart-guide...


Just went through the slides... watching now. Very interesting. One thing that is a bit surprising is the diversity of key-value/caching technologies. I felt like there was a lot of overlap in capability between Varnish, redis, memcached, mongodb, etc...


I'm a big fan of polyglot persistence - we're using MySQL, memcached, redis, Solr, Varnish and MongoDB now.

MySQL is where all of our key data lives. I trust it, it's backed up, and the entire site can be recreated just from the MySQL dump.

Redis and Solr are both used for denormalisation. Solr provides search and our core calendar view, and is updated every 60-90 seconds by a cron job. It's replicated, which means that our calendar view (the most expensive page on the site) scales horizontally with the number of replicas.

Redis powers a few features, most notably pages that show which of your Twitter contacts are attending an event (a simple Redis set intersection, which Redis will happily perform 100,000 times a second). It's also used for our message queue, which means I don't have to run RabbitMQ as well.
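To make the set intersection concrete, here is a sketch using plain Python sets rather than a Redis client. In Redis itself the equivalent operation is the SINTER command over two set keys. The function name and the sample user names are invented for illustration.

```python
def contacts_attending(twitter_contacts, event_attendees):
    """Return the users who appear in both collections."""
    return set(twitter_contacts) & set(event_attendees)


# Hypothetical data: who I follow, and who is going to an event.
my_contacts = {"alice", "bob", "carol"}
attendees = {"bob", "carol", "dave"}
shared = contacts_attending(my_contacts, attendees)
```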

memcached is used for caching. I could use Redis for this, but the nice thing about memcached is that it has a hard memory limit and will throw away keys without any fuss when it hits that limit. It's also a good idea to keep resources set aside for caching separate from resources being used for other purposes, in my opinion.
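The "hard limit, throw away keys without fuss" behaviour can be sketched as a small least-recently-used cache. This is an illustration of the idea only, not how memcached is implemented: it caps the number of items rather than bytes of memory, and the class name `BoundedCache` is made up.

```python
from collections import OrderedDict


class BoundedCache:
    """A cache that silently evicts the least recently used key when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Reading a key marks it as recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Over the limit: drop the oldest entries, no error raised.
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)


cache = BoundedCache(max_items=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" is now more recently used than "b"
cache.set("c", 3)   # evicts "b"
```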

Varnish is currently just used as a layer in front of our JavaScript badges ( http://lanyrd.com/services/badges/ ), purely to protect us against a super high traffic site deploying our badges (all badge requests are cached for 10 minutes, and Varnish handles dogpiling for us). I'd like to use Varnish in front of the main site as well, just for logged-out users, but I haven't had time to deploy that yet. Our badges are also designed to not block the loading of your site if we're down for some reason - Varnish helps a bit there as well. See our badge performance notes here: http://lanyrd.com/services/badges/docs/#performance

We recently started storing application logs in MongoDB, mainly as an experiment. MongoDB is very fast at writes, and lets us easily run structured queries across our logs. I don't care too much about persistence here, since the data isn't as valuable as e.g. our core database of conferences.


So is this a question of:

Learning new tools and understanding their idiosyncrasies vs. building new functionality on top of existing tools?

Where learning new tools is more interesting, so it wins? :P


I like this approach to static assets, where each file's name is changed to include the hash of the file's contents on deployment. But I've never implemented it myself, because I can't figure out a good way to refer to those static assets in my templates.

simonw, if you're listening, how do you solve this problem?

How do you go from, e.g.,

<link rel="stylesheet" href="/style.css">

to

<link rel="stylesheet" href="/style.{current-hash}.css">

?


I'm using a Django template tag, {% static "css/example.css" %}

In development, the above tag would output "/static/css/example.css?0.234234" - the random number at the end cache busts so e.g. IE will always load the latest version of the file.

In production, the tag looks up the transformed filename in a dictionary, which looks something like this:

    STATIC_ASSETS = {
        "css/core.css": "css/core.b1b09227.min.css"
    }

The deployment script includes a bit of code that goes through every file in the static directory, figures out the hash, renames it and then writes out that dictionary in a generated static_assets.py file ready to be deployed to the servers. There's a separate management script that pushes the renamed files to S3 - I run that before doing a deploy.
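A minimal sketch of what such a step could look like, not the actual Lanyrd script. It only computes the mapping: it does not rename, minify or upload anything. The hash algorithm (MD5, first 8 hex characters), the `/static/` prefix and the names `hashed_name`, `build_manifest` and `static_url` are all assumptions.

```python
import hashlib
import os


def hashed_name(relative_path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, ext = os.path.splitext(relative_path)
    return "%s.%s%s" % (stem, digest, ext)


def build_manifest(static_root):
    """Map every file under static_root to its hashed filename."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(static_root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, static_root)
            relative = relative.replace(os.sep, "/")
            with open(full_path, "rb") as f:
                manifest[relative] = hashed_name(relative, f.read())
    return manifest


def static_url(path, manifest, prefix="/static/"):
    """What a production template tag might return for a given path."""
    # Fall back to the original path if the file is not in the manifest.
    return prefix + manifest.get(path, path)
```

The generated file could then be as simple as `"STATIC_ASSETS = %r\n" % build_manifest(root)` written to static_assets.py.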

The only really fiddly bit is that the script needs to rewrite all of the CSS files to include the updated filename of any referenced images. I'm using a dumb regex to do this:

    css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')

Since we control the coding standards for our own CSS, there's no need to do anything more robust than that.
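Here is one way that regex could be applied, as a sketch. It assumes the text inside `url(...)` matches a key in the filename dictionary exactly; real CSS with paths relative to the stylesheet would need extra resolution. The function name `rewrite_css` and the sample filenames are invented.

```python
import re

css_url_re = re.compile(r'url\((["\']?)([^)]+?)\1\)')


def rewrite_css(css_text, manifest):
    """Replace each url(...) target with its hashed filename, if known."""
    def replace(match):
        quote, target = match.group(1), match.group(2)
        # Unknown targets are left untouched.
        return "url(%s%s%s)" % (quote, manifest.get(target, target), quote)
    return css_url_re.sub(replace, css_text)


manifest = {"img/logo.png": "img/logo.1a2b3c4d.png"}
rewritten = rewrite_css('a { background: url("img/logo.png"); }', manifest)
```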


On universalsubtitles.org, we're doing something pretty similar to that.

1) Move all static media under a path that includes a unique hash (we use git's commit id, which makes it trivial to correlate code & static): /[static-media]/[static-cache]/[git commit uid]/...path to file

2) Set the MEDIA url accordingly

The nice thing about this is that the generated file names are very readable and you always know what changeset generated them just by looking. It's open source if someone finds it useful: https://github.com/8planes/mirosubs/tree/master/apps/unisubs...


That's a pretty good approach. I guess the main drawback is that the git commit hash will change far more often than the contents of most of the static files, though, right?


Of course, but in development we use a template tag that inserts the original URL (no commit hash mangling on the MEDIA URL). So yes, each deployment nukes the statics, but on our dev cycle that would happen anyway (it's very rare to have a release that does not touch static files), so it's not an issue in practice.


We've written our CSS and JavaScript to have a bunch of reusable components, so often we can deploy new features without changing our static assets at all (we reuse classes for components that are already in use elsewhere on the site). I can see how for heavier JS sites this wouldn't be worthwhile though.


That's exactly why we don't use git commit hashes - we only want our users to download new versions when the files have definitely changed.


Ahhh, thanks for the details. The only solution I could think of involved writing out a dict like you're describing and using a template helper to look up the modified asset filenames. I was wondering if there was a more elegant solution that was escaping me.

And it probably would've taken me a little while to realize that I'd need to rewrite CSS files as well, so thanks for that, too!


Have you considered rendering your CSS using the templating system? That way {% static .. %} Just Works, and you can also roll extra template tags for repetitive stuff like rounded corners. You can pre-render for the live site, of course.


We thought about it briefly, but it doesn't really fit Nat's preferred way of working with CSS. Might try something like this in the future though.


If you're using Django, you might want to look into Django-Compressor (https://github.com/mintchaos/django_compressor#readme).

It doesn't QUITE give you the naming pattern you're looking for, but it does automatically take your .css files, compress them into one (removing unnecessary whitespace and the like), and then replace:

<link rel="stylesheet" href="/style.css" />
<link rel="stylesheet" href="/style2.css" />
<link rel="stylesheet" href="/style3.css" />

with something like:

<link rel="stylesheet" href="/ax502b7.css" />

I'm currently working a bunch with Flask, which has something similar (http://pypi.python.org/pypi/Flask-Assets) that I'm not yet using, but I've gotten the basics down with SASS tacked into a Fabric script that compiles my CSS on each deploy.

Thus far, this has meant a cache purge when deploying new assets, as I don't have the hash in the name (or even a datestamp, for that matter). I've looked into using Flask-Assets for that, but I'm not there yet.


The feature you are describing was just committed to Django trunk this a.m. The main docs have not been updated with it yet, but you can view the docs in this changeset to see how it will work: https://code.djangoproject.com/changeset/16594

The author of the new feature is the guy behind django-compressor.


Just use django-compressor (https://github.com/jezdez/django_compressor). It does this as well as minifying and compressing static assets.


django-compressor uses the file modification time to assign a different filename to the file - unfortunately, this doesn't work so well if you are deploying from multiple machines. I much prefer using a hash of the file contents, since that is guaranteed to change if and only if the file itself has changed.

In fact, I'll file a bug suggesting this on django_compressor right now.
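A small demonstration of the point about modification times versus content hashes, as a sketch. Two temporary files stand in for the same stylesheet checked out on two deploy machines; the mtimes are set by hand, and SHA-1 truncated to 8 characters is an arbitrary choice. The function names are made up.

```python
import hashlib
import os
import tempfile


def content_version(path):
    """Version string derived from the file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()[:8]


def mtime_version(path):
    """Version string derived from the file's modification time."""
    return str(int(os.path.getmtime(path)))


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "machine_a.css")
    b = os.path.join(tmp, "machine_b.css")
    # Same contents, checked out ten minutes apart.
    for path, mtime in ((a, 1000000000), (b, 1000000600)):
        with open(path, "w") as f:
            f.write("body { margin: 0 }")
        os.utime(path, (mtime, mtime))
    same_hash = content_version(a) == content_version(b)
    same_mtime = mtime_version(a) == mtime_version(b)
```

Here the content-based version agrees across both copies while the mtime-based one does not.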


Django compressor has another issue in load-balanced environments: the cache key uses the hostname of the machine, which really made it impractical for us. I promised Jannis a clean patch for this, but haven't found the time to do so...

In our case we have a utility machine that compiles media and deploys it, but that machine doesn't even run a webserver at all...

https://github.com/jezdez/django_compressor/blob/develop/com...


django-compress, while a little older than django-compressor, supports versioning based on a hash of the file contents. I've been pretty happy with it. https://github.com/pelme/django-compress



I've noticed some multi-variant testing on some of the design elements, and you have rolled things out to beta testers first in the past. Is there anything juicy behind the scenes powering these features?


Nothing fancy at the moment - just some if blocks in our templates. We'll probably start using redis set membership for feature flags in the future though.
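A sketch of how set membership could drive feature flags, using in-memory Python sets in place of Redis. With Redis the same idea maps onto SADD to enable a user and SISMEMBER to check. The class name `FeatureFlags`, the flag name and the user IDs are hypothetical.

```python
class FeatureFlags:
    """One set of user IDs per feature; membership means 'enabled'."""

    def __init__(self):
        self._members = {}

    def enable(self, feature, user_id):
        self._members.setdefault(feature, set()).add(user_id)

    def is_enabled(self, feature, user_id):
        return user_id in self._members.get(feature, set())


flags = FeatureFlags()
flags.enable("new-calendar", 42)  # user 42 is a beta tester
```

A template `if` block could then branch on `flags.is_enabled("new-calendar", user_id)` instead of a hard-coded condition.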


Could you elaborate on your read-only mode?

Is it only hiding functionality that would cause writes or is there more to it? Is it built into the application logic?


Yup, it's hiding functionality that causes writes. Lots of messy template logic and a few bits of app logic as well. It isn't very neatly abstracted at the moment.
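For the app-logic side, one possible shape for a tidier abstraction is a decorator that refuses writes while a read-only switch is on. This is purely a sketch and not how Lanyrd does it: `SITE_STATE`, `requires_write`, `ReadOnlyModeError` and `save_attendance` are all invented names.

```python
import functools


class ReadOnlyModeError(Exception):
    """Raised when a write is attempted while the site is read-only."""


# Hypothetical switch; a real site would read this from its settings.
SITE_STATE = {"read_only": False}


def requires_write(func):
    """Block the wrapped function whenever read-only mode is enabled."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if SITE_STATE["read_only"]:
            raise ReadOnlyModeError(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@requires_write
def save_attendance(user, event):
    # Stand-in for a view or model method that writes to the database.
    return "%s is attending %s" % (user, event)
```

Templates would still need to hide the corresponding buttons and forms; this only guards the code paths behind them.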



