The following write-up leans on an article I read years ago, but its essence should still hold up today. It is targeted at the PHP landscape and is not something you need for your personal blog. Use a static website generator if you care about speed there; your WordPress is going to be hacked some day anyway.
If you already have two servers, one for your webserver and one for your database, this is more likely for you. If you already have a loadbalancer in front of two webservers, this is definitely for you.
But even big webshops usually don't put such measures in place, unless they really care about their response times and thus about their Google ranking.
Mostly this is guidance on how to tackle the customer's favourite complaint, 'It is so slow!', along with some background.
When trying to fix a 'slow' website, there are several approaches.
- fix the website code
- throw hardware at the problem
- change the underlying infrastructure (software-wise)
The first usually is not going to happen, as good web developers, especially in the PHP universe, are rare. They fight their codebase and are happy when things work correctly. Performance comes second, and many of them have never profiled their application or even heard of doing so.
The second used to be a nice solution, but GHz numbers no longer improve as drastically as they did in the past. And once you hit the memory wall, this ceases to be a viable approach, no matter how fast you make your webserver connect to your database. SSDs do help, but only so much.
Which leaves us with option three, or the following measures in particular:
- SSDs (noted here for the sake of completeness)
- handle sessions via redis
- separate static from dynamic content and serve each via different webservers
- browser caching
- accelerators, like squid or varnish in front
- opcode caches
- database caching via memcached to relieve the main database
- CDNs like Akamai
SSDs
If you are about to migrate your website onto new hardware anytime soon (a new single server is what we are talking about), think about getting SSDs. They come in capacities of 250GB upwards, so four of them in a RAID10 setup give you about 500GB of usable, redundantly persisted space. No matter how big your web presence is, after subtracting 20GB for a Linux operating system this usually leaves you with plenty of disk space for whatever the future may hold.
For budgeting reasons, a RAID1 setup of two SSDs, still providing ~230GB of space, is usually sufficient, unless you plan on storing literally shitloads of FTP data or useless backups.
Backups are to be done off-site on another server anyway.
You don't need version control on your production server either (except for /etc maybe), unless you think you are a true DevOp, fight others to the bone about the agile kool-aid and know jack-shit anyway.
But, no hard feelings. The point is: just get SSDs if you can afford them.
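The capacity numbers above can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the RAID10 case assumes four 250GB drives, and the ~230GB figure is read here as the same 250GB expressed in binary units, which is my interpretation and not something the numbers above spell out.

```python
def usable_gb(drive_gb, drives, level):
    """Usable capacity in GB for the two RAID levels discussed here."""
    if level == "raid1":
        # All drives hold the same data, so you get one drive's worth.
        return drive_gb
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least four")
        # Striped mirrors: half of the raw capacity is usable.
        return drive_gb * drives // 2
    raise ValueError("unknown RAID level: " + level)


def gb_to_gib(gb):
    """Convert decimal gigabytes (as printed on the box) to binary gibibytes."""
    return gb * 10**9 / 2**30


print(usable_gb(250, 4, "raid10"))                  # 500
print(round(gb_to_gib(usable_gb(250, 2, "raid1"))))  # 233
```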
sessions via redis
This is sort of a prerequisite, depending on your overall approach; see the conclusion at the bottom for what this means.
If you happen to have a lot of sessions, this improves things a bit. Reading sessions from an in-memory database is just plain faster than letting the webserver fetch them from the hard disk every time. This only holds if redis runs on the same machine as your webserver, as the network round trip to a remote redis can easily eat up what you gain over local disk I/O. If you have a direct crosslink with 10G NICs to your dedicated session server, this does not apply, but if you have that in place you sure as hell do not need this whole article.
If, however, you split your load across several webservers behind a loadbalancer and want a real 'shared-nothing' architecture, things are different. In that case your loadbalancer is not configured to use sticky sessions, so you need a central place where your sessions are managed. 'Sticky sessions' simply means that each user is served by the same webserver every time they visit your website.
Unless you really think you need a shared-nothing installation or have REALLY many sessions, you don't need this.
Use redis instead of memcached for this, as the former can persist the in-memory data to disk.
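To illustrate the 'central place for sessions' idea, here is a minimal Python sketch. Everything in it is an assumption for illustration: `SessionStore` and `InMemoryStore` are hypothetical names, and `InMemoryStore` is only a stand-in that mimics the `setex`/`get` calls of a redis client so the example runs without a server. In a real setup you would hand in an actual redis client instead.

```python
import json
import secrets
import time


class InMemoryStore:
    """Stand-in for a redis client (hypothetical, for this sketch only)."""

    def __init__(self):
        self._data = {}

    def setex(self, name, ttl_seconds, value):
        # Store the value together with its expiry time.
        self._data[name] = (value, time.monotonic() + ttl_seconds)

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[name]
            return None
        return value


class SessionStore:
    """Keeps sessions in a shared backend instead of on the local disk."""

    def __init__(self, backend, ttl_seconds=1800):
        self.backend = backend
        self.ttl = ttl_seconds

    def create(self, data):
        session_id = secrets.token_hex(16)
        self.save(session_id, data)
        return session_id

    def save(self, session_id, data):
        self.backend.setex("session:" + session_id, self.ttl, json.dumps(data))

    def load(self, session_id):
        raw = self.backend.get("session:" + session_id)
        return json.loads(raw) if raw is not None else None


# Two "webservers" sharing one backend: no sticky sessions needed.
backend = InMemoryStore()
web1 = SessionStore(backend)
web2 = SessionStore(backend)
sid = web1.create({"user": "alice"})
print(web2.load(sid))  # {'user': 'alice'}
```

The point of the design is that no webserver owns a session: whichever machine the loadbalancer picks can load it by its id.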
static vs. dynamic content
Classify all your content into one of these two groups. Put all static content on a separate webserver that only handles a subdomain of your site. This frees up resources on your 'dynamic' webserver. Of course this is of no use if both webservers run on the same hardware and the dynamic one eats all the resources; in that case you need another server.
In case you read this and have zero clue what static vs. dynamic means:
- static content = files that are served as they are, like the html files (and images, CSS, JavaScript) on your website
- dynamic content = html code generated by your php code, which is then inserted into the already existing html code referred to as 'static'
browser caching
Classify your content by how often it changes, maybe like this:
- never changes (6 months caching time)
- seldom changes (1 week)
- often changes (1 day)
- always changes (1 minute)
This is just rough guidance off the top of my head, adjust to your needs.
Set the caching headers of your HTTP responses accordingly, and let the user's browser help you reduce your servers' load.
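As a rough sketch of what 'setting the caching headers accordingly' could look like, the following Python snippet maps the four classes to a `Cache-Control` header. The class names, the function name and the use of `public` are assumptions for illustration (it presumes the content is not user-specific), and '6 months' is approximated as 180 days.

```python
# Cache lifetimes in seconds for the four content classes from the list above.
CACHE_LIFETIMES = {
    "never": 180 * 24 * 3600,   # roughly 6 months
    "seldom": 7 * 24 * 3600,    # 1 week
    "often": 24 * 3600,         # 1 day
    "always": 60,               # 1 minute
}


def cache_headers(content_class):
    """Return the caching header for a response of the given class."""
    seconds = CACHE_LIFETIMES[content_class]
    return {"Cache-Control": "public, max-age=%d" % seconds}


print(cache_headers("seldom"))  # {'Cache-Control': 'public, max-age=604800'}
```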
To still be able to replace old content with new content, add hashes to the URLs of your 'never changes' content. That way new content gets served as soon as things change, no matter what cache expiration times you use. These hashes have to be created automatically during your deployment process and inserted into your application code, also automatically.
This is actually rather sophisticated, but otherwise you have the same problem as with 301 redirects: caching times and permanent redirects don't forgive fuck-ups on your part.
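One way such hashes could be generated during deployment is sketched here in Python. The function name, the SHA-256 choice and the 8-character digest length are all assumptions for illustration; any stable content hash would do.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_path(path, content, digest_length=8):
    """Insert a short content hash into the file name of a static asset.

    The same content always yields the same URL, changed content yields a
    new URL, so long cache lifetimes never serve stale files.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    p = PurePosixPath(path)
    return str(p.with_name("%s.%s%s" % (p.stem, digest, p.suffix)))


old = fingerprinted_path("/css/site.css", b"body { color: black; }")
new = fingerprinted_path("/css/site.css", b"body { color: navy; }")
print(old)
print(new)
print(old != new)  # True
```

The deployment step would then write the fingerprinted names into the templates, so the application never references the plain file name.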
reverse-proxies for accelerating things
If you have already separated dynamic from static content, what sense does it make to also put an accelerator in the form of a reverse-proxy like squid or varnish up front?
An accelerator creates something like 'static snapshots' of your combined static-dynamic content and serves them directly. That is, it takes the already existing static html parts together with the html generated by the interpreted php code, and serves the result as static html.
Some made-up numbers for a big website, as requests that can possibly be served per second:
- dynamic content webserver: 100
- static content webserver: 5k
- accelerator: 250k
It is important to differentiate between HTTP GET and all other requests. GETs don't change things; POSTs, PUTs and the like do. GETs are served by the accelerator, but the others must pass through all your caching layers.
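The GET-versus-everything-else rule can be sketched as a toy accelerator in Python. This is not how squid or varnish are implemented; `Accelerator` and its `backend` callable are hypothetical names, there is no expiry time, and dropping the cached copy on a write is an extra assumption of this sketch.

```python
class Accelerator:
    """Toy reverse-proxy cache: GETs are cached, everything else passes through."""

    def __init__(self, backend):
        # backend is any callable(method, url, body) that returns a response.
        self.backend = backend
        self.cache = {}

    def handle(self, method, url, body=None):
        if method != "GET":
            # Writes must reach the application. Dropping the cached copy
            # here is an assumption of this sketch, so the next GET is fresh.
            self.cache.pop(url, None)
            return self.backend(method, url, body)
        if url not in self.cache:
            # Cache miss: ask the application once and keep the snapshot.
            self.cache[url] = self.backend(method, url, body)
        return self.cache[url]


hits = []

def slow_php_app(method, url, body):
    hits.append((method, url))
    return "response #%d" % len(hits)

proxy = Accelerator(slow_php_app)
print(proxy.handle("GET", "/shop"))   # response #1
print(proxy.handle("GET", "/shop"))   # response #1 (served from the cache)
print(proxy.handle("POST", "/shop"))  # response #2 (passed through)
print(proxy.handle("GET", "/shop"))   # response #3 (cache was dropped)
```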
opcode caches
PHP instructions are parsed and translated into operation codes (bytecode for the PHP virtual machine, not machine language) every time a script is executed by the php interpreter. Opcode caches like APC (or OPcache, which ships with PHP since version 5.5) keep the compiled opcodes in memory, so the parsing and compiling step is skipped on subsequent requests.
Your website can become up to three times faster just through the opcode cache.
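The principle can be shown by analogy in Python, which also compiles source code to bytecode before running it. This is only an analogy and not how a PHP opcode cache works internally: `BytecodeCache` is a made-up name, and real opcode caches key on the script file and watch for changes instead of keying on the source text.

```python
class BytecodeCache:
    """Compile each script once and reuse the compiled code afterwards."""

    def __init__(self):
        self._compiled = {}
        self.compilations = 0

    def run(self, name, source, variables):
        key = (name, source)
        code = self._compiled.get(key)
        if code is None:
            # The expensive part: parse and compile the source text.
            code = compile(source, name, "exec")
            self.compilations += 1
            self._compiled[key] = code
        # The cheap part: execute the already compiled code.
        namespace = dict(variables)
        exec(code, namespace)
        return namespace


cache = BytecodeCache()
script = "total = price * quantity"
for quantity in (1, 2, 3):
    result = cache.run("cart.py", script, {"price": 5, "quantity": quantity})
    print(result["total"])   # 5, then 10, then 15
print(cache.compilations)    # 1
```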
database caching via memcached
memcached is an in-memory key-value store that caches often-used data from the database. Any questions why this might be beneficial? ;)
Sidenote: there are two PHP extensions, memcache and memcached, which are separate clients for the memcached server; don't get confused by that.
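The usual pattern here is 'look in the cache first, ask the database only on a miss'. A minimal Python sketch, with a plain dict standing in for memcached so it runs without a server; `CachedQueries`, `run_query` and the 60 second lifetime are assumptions for illustration.

```python
import time


class CachedQueries:
    """Ask the cache first and fall back to the database on a miss."""

    def __init__(self, run_query, ttl_seconds=60):
        self.run_query = run_query   # callable(sql) that hits the real database
        self.ttl = ttl_seconds
        self._cache = {}             # stand-in for memcached

    def fetch(self, sql):
        now = time.monotonic()
        entry = self._cache.get(sql)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit, the database is not touched
        rows = self.run_query(sql)   # cache miss, ask the database
        self._cache[sql] = (rows, now + self.ttl)
        return rows


db_calls = []

def fake_database(sql):
    db_calls.append(sql)
    return [("widget", 9.99)]

queries = CachedQueries(fake_database)
queries.fetch("SELECT name, price FROM products")
queries.fetch("SELECT name, price FROM products")
print(len(db_calls))  # 1
```

The trade-off is staleness: for up to the chosen lifetime, readers may see data that has already changed in the database.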
content delivery networks
Unless you have seriously big traffic spikes (read: if you wonder whether this happens to you, it doesn't), you don't need a CDN.
CDNs are put in place by swapping the subdomain that points to the data served by your static webservers for another subdomain that points to the CDN. This is helpful if you have single occasions, like special days, where you know your load will be ridiculously high and you need a lot more serving power than you usually do.
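On the application side, the subdomain swap boils down to rewriting the host of your static URLs. A small Python sketch; both host names are placeholders and `to_cdn` is a hypothetical helper. In practice the swap might just as well happen in DNS or in a single config value.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn(url, static_host="static.example.com", cdn_host="cdn.example.net"):
    """Point URLs of the static subdomain at the CDN, leave the rest alone."""
    parts = urlsplit(url)
    if parts.hostname != static_host:
        return url
    # Only the host changes; path and query stay the same. Note that this
    # sketch ignores any explicit port on the original host.
    return urlunsplit(parts._replace(netloc=cdn_host))


print(to_cdn("https://static.example.com/img/logo.png"))
# https://cdn.example.net/img/logo.png
print(to_cdn("https://www.example.com/index.php?id=1"))
# https://www.example.com/index.php?id=1
```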
If you need your CDN to serve not just static content but complete sites, you can't just use your own loadbalancer. The CDN's loadbalancer must be configured and put to work.
So instead of getting more machines yourself, set things up accordingly and employ a CDN of your choice. Akamai is rather good.
Otherwise your machines will idle around for 99.99% of the time you have them in place, and you would have a hard time making a profit out of them. Ever wondered why Amazon is such a huge cloud provider? Those are just their machines that would otherwise be doing nothing because it isn't Christmas yet.
conclusion
If you have a single server running both a webserver and a database server, what are the easiest steps for speeding things up?
- opcode cache
SSDs help as well, but you usually get them up front, not after your installation is already running, as there are migration fees to be paid if you want your provider to reinstall your hosting.
Next, categorize your content. Once that is done, implement caching.
The next step would be more hardware and distributing the load onto several webservers.
So what if you already have more than one server?
Do the first three points mentioned above and caching.
Then set up dedicated session handling, so that your load gets distributed more evenly across your servers when using a loadbalancer.
For setting up everything else, you should know what you are doing and not just be reading this.