Nginx and why you should be running it instead of, or at least in front of Apache

April 14th, 2011 3 comments

After 9 years of development, Nginx hit a milestone this week when version 1.0.0 was released (on 12th April 2011). Despite only now reaching a 1.0 release, it is already in widespread use, powering many high-traffic websites and CDNs, and is particularly popular with developers. With such a milestone release, I thought it a good opportunity to get motivated and write some posts on it here.

Nginx (pronounced “engine-x”) is a free, open-source, high-performance HTTP server (aka web server) and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It was built specifically to be able to handle more than 10,000 simultaneous connections (the so-called C10K problem) while using minimal server resources. It does this by using a non-blocking, event-based model.

In this article, I'm going to look at the problems with Apache and explain why you would want to use Nginx. In a subsequent article, I'll explain how to install and configure Nginx.

The most popular web server, Apache powers around 60% of the world's websites. I've been using Apache for around 10 years but more recently have been using Nginx. Due to its widespread use, Apache is well tested, well understood and reliable. However, it does have some problems when dealing with high-traffic websites. A lot of these problems centre around the fact that it uses a blocking, process-based architecture.

The typical setup for serving PHP-based websites in a LAMP (Linux, Apache, MySQL and PHP) environment uses the prefork MPM and mod_php. This works by embedding the PHP binary (and any other active Apache modules) directly into each Apache process. There is very little overhead and Apache can talk to PHP very fast, but it also means each Apache process consumes between 20MB and 50MB of RAM. The problem is that once a process is dealing with a request, it cannot be used to serve another. To handle multiple simultaneous requests (and remember that even a single visitor to a web page will generate multiple requests, because the page will almost certainly contain images, stylesheets and JavaScript files which all need to be downloaded before it can render), Apache spawns a new child process for each simultaneous request it is handling. Because the PHP binary is always embedded (to keep the cost of spawning processes to a minimum), each of these processes takes the full 20MB-50MB of RAM even if it is only serving static files, so you can see how a server can quickly run out of memory.

To compound the problem, if a PHP script takes a while to execute (due to processing load or waiting on an external process like MySQL), or the client is on a slow or intermittent connection such as a mobile device, then the Apache process is tied up until execution and transmission have completed, which could be a while. Combine these factors with a lot of traffic and Apache can end up with hundreds of concurrent processes loaded; it can easily hit the configured maximum number of processes, or completely exhaust the available RAM in the system (at which point it starts using virtual memory on the hard disk, everything gets massively slower, and the problem compounds further). If a web page has, say, 10 additional assets (CSS, JavaScript and images), that's 11 requests per user. If 100 users hit the page at the same time, that's 1,100 requests and up to around 50GB of RAM required (although in reality the limit on the number of Apache processes would be much lower than this, so requests would be queued and blocked until a process became free, and browsers generally only open a few simultaneous connections to a server at a time). Hopefully you are starting to see the problem.
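To make the arithmetic concrete, this is the kind of prefork tuning that maths forces on you. The numbers below are purely illustrative (a hypothetical box with roughly 2GB of RAM budgeted for Apache and ~40MB per process), not a recommendation:

```apache
# Sketch: capping prefork so that ~50 x 40MB processes cannot
# exceed ~2GB of RAM. All values are illustrative only.
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50     # hard cap: requests beyond this are queued
    MaxRequestsPerChild 1000    # recycle processes to limit memory growth
</IfModule>
```

Raise MaxClients above what your RAM can hold and the box starts swapping; keep it low and requests queue. Either way, memory is the bottleneck.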

With Nginx's event-based processing model, each request triggers events to a process, and the process can handle multiple events in parallel. This means Nginx can handle many simultaneous requests and deal with execution delays and slow clients without spawning processes. If you look at the two graphs from WebFaction, you can clearly see that Nginx handles far more simultaneous requests while using a significantly lower, and near-constant, amount of RAM.

Nginx excels at serving static files, and it can do so very fast. What we can't do is embed something like PHP into the binary, because PHP is not asynchronous and would block requests, rendering Nginx's event-based approach useless. Instead, we have either PHP over FastCGI or Apache+mod_php in the background handle all the PHP requests. This way, Nginx serves all static files (CSS, JavaScript, images, PDFs etc.) and handles slow clients itself, but passes PHP requests to one of these backend processes, receives the response back and delivers it to the client, leaving the backend process free to handle other requests. Nginx doesn't block while waiting for FastCGI or Apache; it just carries on handling events as they happen.
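A minimal sketch of what that split looks like in an Nginx server block. The domain, paths and backend port here are assumptions for illustration only:

```nginx
# Sketch: nginx serves static assets itself and hands .php requests
# to a FastCGI backend. Names, paths and ports are illustrative.
server {
    listen      80;
    server_name example.com;
    root        /var/www/example;

    # Static assets are served directly by nginx, with far-future caching
    location ~* \.(css|js|png|jpg|gif|pdf)$ {
        expires 30d;
    }

    # PHP requests are passed to a FastCGI backend (e.g. PHP-FPM)
    location ~ \.php$ {
        include       fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass  127.0.0.1:9000;
    }
}
```

Nginx holds the (possibly slow) client connection itself; the backend only sees the request for the duration of the PHP execution.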

The other advantage of this “reverse proxy” mode is that Nginx can act as a load balancer and distribute requests to not just one but multiple backend servers over a network. Nginx can also act as a reverse caching proxy to reduce the amount of dynamic requests needing to be processed by the backend PHP server. Both of these functions allow even more simultaneous dynamic requests.
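Pointing Nginx at more than one backend is just an upstream block. Again, the hostnames and ports below are made up for the sake of the example:

```nginx
# Sketch: nginx as a load-balancing reverse proxy in front of two
# backend (e.g. Apache+mod_php) servers. Addresses are illustrative.
upstream php_backends {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;

    location / {
        proxy_pass       http://php_backends;
        proxy_set_header Host      $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

By default requests are distributed round-robin between the listed servers; adding response caching on top (via proxy_cache) reduces the number of dynamic requests that reach the backends at all.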

What this means is that if your application requires a specific Apache configuration or module, you can still gain the advantages of Nginx handling simultaneous requests and serving static files, while using Apache to handle only the requests that need it.

If there is no requirement for Apache, then Nginx also supports backend communication protocols like FastCGI, SCGI and uwsgi. PHP also happens to support FastCGI, so we can have Nginx talk to PHP over FastCGI without needing the whole of Apache around.

In the past, you either had to use a script called spawn-fcgi to spawn FastCGI processes, or handle FastCGI manually, and then use monitoring software to ensure the processes kept running. However, as of PHP 5.3.3, PHP-FPM (which distributions often package as php5-fpm) is part of the PHP core code and handles all of this for you in a way similar to Apache: you can set the minimum and maximum number of processes and how many to spawn and keep around waiting. The other advantage is that PHP-FPM is an entirely separate process from Nginx, so you can change configurations and restart each of them independently (and Nginx actually supports reloading its configuration and upgrading its binary on the fly, so it doesn't even require a restart).
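The process-management settings live in a PHP-FPM pool definition (on Debian/Ubuntu, typically under /etc/php5/fpm/pool.d/). A rough sketch, with illustrative values only:

```ini
; Sketch of a PHP-FPM pool. All numbers are illustrative, not tuned.
[www]
listen = 127.0.0.1:9000
user = www-data
group = www-data

; Process management, analogous to Apache's prefork settings
pm = dynamic
pm.max_children = 20
pm.start_servers = 5
pm.min_spare_servers = 3
pm.max_spare_servers = 10
```

The listen address here matches what Nginx's fastcgi_pass would point at; because the pool is separate from Nginx, either side can be reconfigured and restarted on its own.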

In the next post in this series, I'll explain how to install and configure Nginx for serving both static and dynamic content.

One of the disadvantages of Nginx is that it doesn't support .htaccess files to dynamically modify the server configuration – all configuration must be stored in the Nginx config files and cannot be changed at runtime. This is a positive for performance and security but makes it less suitable for running "shared hosting" platforms.

Categories: Web Tags: , , , ,

Stop Windows Restarting Automatically After Scheduled Updates

April 13th, 2011 No comments

If I had to name the most annoying thing about Windows, it would be that it automatically restarts after automatically installing Windows updates (assuming for a minute that IE isn't part of Windows, of course!).

I always have a lot open on a system – browsers and tabs, terminal sessions and so on – and numerous times I've come back to a system in the morning, logged in and stared at a blank taskbar in disbelief. It's also a big problem if you connect in to a machine remotely, particularly if you use drive encryption and it requires a password on boot.

Even when the results are not so catastrophic and I'm actually at the system, I'm forever clicking 'Remind me later' each time it pops up saying it's going to restart.

I've always got round this by disabling automatic Windows updates and installing them manually, periodically.

This very problem happened to me this morning and a friend pointed out a great tip – you can actually disable the automatic reboot. Like disabling automatic updates, you still have to remember to do this on each machine, but at least updates still get applied.

You do this as follows:

Start -> Run -> gpedit.msc
Computer Configuration -> Administrative Templates -> Windows Components -> Windows Update
Double click on: “No auto-restart for scheduled Automatic Updates installations”
Change to Enabled
Click Ok
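On editions of Windows without gpedit.msc, the same policy can be set directly in the registry. To the best of my knowledge the value is NoAutoRebootWithLoggedOnUsers under the WindowsUpdate\AU policy key, but do verify this before relying on it:

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU]
"NoAutoRebootWithLoggedOnUsers"=dword:00000001
```

Save that as a .reg file and double-click it (or set the value with regedit). As with the Group Policy route, it suppresses the automatic reboot only while a user is logged on.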

Categories: Software Tags:

Threading / Blocking vs Event Driven Servers (and Node.js)

April 9th, 2011 No comments

I was just reading an old (Nov 2009) article by Simon Willison (of Django, The Guardian and Lanyrd fame) discussing the emergence of Node.js.

Two of the things in particular I found interesting about the article:

Firstly, he cleverly predicted the importance and future popularity of Node.js – and boy, was he right. A year and a bit later, Node.js is everywhere. I don't think a day goes by when I don't see at least one mention of or article about it.

Secondly, he has a brilliant (and simple) description of threading/blocking servers vs event-driven servers (just like Apache vs Nginx).

Event driven servers are a powerful alternative to the threading / blocking mechanism used by most popular server-side programming frameworks. Typical frameworks can only handle a small number of requests simultaneously, dictated by the number of server threads or processes available. Long-running operations can tie up one of those threads—enough long running operations at once and the server runs out of available threads and becomes unresponsive. For large amounts of traffic, each request must be handled as quickly as possible to free the thread up to deal with the next in line.

This makes certain functionality extremely difficult to support. Examples include handling large file uploads, combining resources from multiple backend web APIs (which themselves can take an unpredictable amount of time to respond) or providing comet functionality by holding open the connection until a new event becomes available.

Event driven programming takes advantage of the fact that network servers spend most of their time waiting for I/O operations to complete. Operations against in-memory data are incredibly fast, but anything that involves talking to the filesystem or over a network inevitably involves waiting around for a response.

With Twisted, EventMachine and Node, the solution lies in specifying I/O operations in conjunction with callbacks. A single event loop rapidly switches between a list of tasks, firing off I/O operations and then moving on to service the next request. When the I/O returns, execution of that particular request is picked up again.
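The callback style Willison describes maps onto just a few lines of Node-style JavaScript. This is purely an illustrative sketch (fakeAsyncRead is a made-up stand-in for any real I/O operation), not code from his article:

```javascript
// Sketch of the non-blocking, callback-based model. The caller fires
// off an "I/O" operation and immediately moves on to other work.
const events = [];

function fakeAsyncRead(callback) {
  // setImmediate queues the callback on the event loop, so the
  // caller is not blocked waiting for the "I/O" to complete
  setImmediate(() => callback('file contents'));
}

events.push('request received');
fakeAsyncRead((data) => {
  // execution of this request is picked up again when the I/O returns
  events.push('io done: ' + data);
});
events.push('handling next request'); // runs before the callback fires
```

Run under Node.js, 'handling next request' is recorded before the callback fires: the single event loop never sits idle waiting on the I/O.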

You can read the full article here: Node.js is genuinely exciting.

If you are not familiar with Node.js (have you been living under a rock for the past year?? :) ), there is a great video by the author, Ryan Dahl here:

Categories: Interesting Tags:

Why Can’t Developers Estimate Time?

April 9th, 2011 1 comment

Ashley Moran just wrote an interesting article about why developers can’t estimate time.

Something most developers find hard is estimating the time something will take.

If it's something you have done before – a repeatable task – then it's easy. But most of the time, things take longer than you expect when you think them through in your head: quite often the task turns out to be more complicated than you thought once you get into it, or you hit a bug of some kind that takes a long time to debug and compounds the issue.

The problem of bad estimates is sometimes made worse when estimates (which are, at the end of the day, educated guesses) are taken as promises, so if something takes longer, there is a feeling that it should have been done quicker.

I particularly found these quotes interesting and amusing:

We can’t estimate the time for any individual task in software development because the nature of the work is creating new knowledge.

Rule of thumb: take the estimates of a developer, double it and add a bit

The double-and-add-a-bit rule is interesting. When managers do this, how often are tasks completed early? We generally pay much more attention to overruns than underruns

Categories: development Tags:

Faster Broadband – BT Infinity (Fibre to the Cabinet) Coming to Ingleby Barwick – How does it work?

April 7th, 2011 No comments

I first learned about BT Infinity last year, when a friend I used to work with at BT pointed out that my exchange, Ingleby Barwick, had been scheduled to be enabled in June 2011. At the time that sounded a long way off, but now it's getting closer, I decided to do a bit of digging and find out how it works technically.

BT Infinity is BT’s new Fibre To The Cabinet (FTTC) broadband service which is slowly being rolled out across the country and promises speeds of up to 40Mbps downstream and 10Mbps upstream.

Pretty much all of the country can now get broadband in the form of ADSL (Asymmetric Digital Subscriber Line). ADSL works by utilising the existing copper telephone wiring already present in the majority of homes. Unused frequencies are used to send data over the lines, and a splitter is placed on the customer's phone sockets to separate off the broadband signal and allow simultaneous use of the telephone and broadband.

The original ADSL standard gives a theoretical maximum downstream speed of 8Mbps and upstream speed of 1Mbps; the newer ADSL2+ standard gives a theoretical maximum downstream speed of 24Mbps and upstream speed of 3.3Mbps. I say theoretical because it's practically impossible to obtain those speeds unless you are literally next door to the exchange – the majority of people only obtain a fraction of them. I am reasonably lucky to be able to get 5Mbps downstream and just under 1Mbps upstream; most people I know get even less. (My modem actually syncs at around 6000kbps and 1000kbps but I only get around 5Mbps and just under 1Mbps in real-world speed tests.) The reason is that when signals are transmitted over long distances of copper wire, noise on the line degrades them and the maximum speed drops. As the exchange can be miles away from the premises, and cable ducts do not necessarily run as the crow flies, this loss can be great. It's also very sensitive to dodgy wiring – for this reason it's recommended to plug the modem/router into the master socket and use a filtered faceplate to split off the ADSL signal before the extension wiring, minimising the risk of interference (which is exactly what I do).

The only other serious option for fast home broadband in the UK is if you are in the coverage area of Virgin Media's cable internet service. I had cable internet for several years; however, when we moved house – even though it was less than a 5-minute walk round the corner – our new street turned out not to be wired for cable. There is a Virgin Media cabinet opposite our road end, but they have not cabled down the street. If you can get cable, you can obtain speeds of up to 100Mbps downstream and 10Mbps upstream through their latest packages. The good thing about cable is that when you sign up for a package – be it 10Mbps, 20Mbps, 30Mbps, 50Mbps or 100Mbps – you actually get a connection at that speed. Of course, with contention on the network and the quite harsh traffic management (throttling) applied at peak periods, you won't necessarily see those real-life download speeds all of the time, but at least you are connected at the speed you are paying for. The way this is achieved: Comcast (as they were known when they first started laying cables; they were later sold to NTL and later still to Virgin), similar to BT, distributed cabinets around their coverage area to interconnect users. Unlike BT, however, Comcast ran fibre-optic connections to their cabinets rather than huge quantities of copper wires (one pair per line). They then lay low-loss coaxial cable (coax) between the nearest cabinet and the premises; a single length of coax can provide the subscriber with fast broadband and cable television.

With ADSL, a modem at the customer's premises connects through the ADSL splitter and over the copper wire directly to DSLAM equipment in the telephone exchange. (ISPs usually supply a combined ADSL modem and router, so the single connection can be shared between several machines connected via Ethernet or 802.11 wireless, using NAT (Network Address Translation) to let multiple computers on a Local Area Network (LAN) share a single external IP address.)

With BT's new FTTC network (BT Infinity), as its name suggests, the local cabinets are connected via optical fibre back to the telephone exchange (which I assume is in turn connected by fibre to BT's core network). Fibre suffers negligible signal loss over distance – which is why it's also used to connect countries together around the world.

When I first heard about it, I didn't really think about it and assumed they would do something similar to Virgin's cable service and lay new cables of some kind between the cabinet and the customer's premises. When you actually think about it, though, that would be expensive and slow to roll out, and would end up like Virgin's cable network – severely limited to certain areas. In other words, it wouldn't really be practical.

So, how does it work? Well, they are using a technology called VDSL, which is similar to the ADSL technology already in use.

What this means in reality is the following:

Once the exchange has been enabled for FTTC, BT will distribute new, slightly bigger cabinets housing DSLAM equipment similar to (but newer than) that currently housed in the telephone exchange, along with the fibre backbone, patch panels and a cross-connect to the existing BT cabinet.

When you order BT Infinity, an engineer will come out to install the product. They will replace your current master socket with a new NTE5 master socket with a built-in filter (so the modem/router will need to go into the master socket, as the broadband frequencies are split off before the extensions). They will then hook up a VDSL modem and a separate "BT Home Hub" router.

The modem connects using VDSL over the existing copper wires from your home to the VDSL cabinet, on to the DSLAM, back to the exchange (over the fibre) and on to BT's core network.

Your phone line is still terminated in the original cabinet for telephony (using the cross-connect between the old and new cabinets mentioned above) and runs back to the exchange over the original multi-pair copper cable, as it always has.

What this means is that noise-induced loss is now only an issue between your premises and the cabinet, rather than between your premises and the exchange. This is how BT can quote maximum speeds of 40Mbps downstream and 10Mbps upstream, and why you are much more likely to get somewhere near those speeds, depending on the distance to your cabinet and the quality of the lines and wiring to it.

Another area worth mentioning: currently, any ISP (Internet Service Provider) can sell you ADSL broadband. They do this either by renting capacity on BT's network from its wholesale division (I believe the product is called IPStream) or by renting space in the exchanges and installing their own equipment (known as LLU). I believe this will still be possible with the new FTTC network, but there doesn't yet seem to be great uptake – probably due to the costs involved.

Disclaimer: This is only my own knowledge mixed with snippets i’ve read about FTTC rather than any inside information so please do feel free to comment if you know anything here to be incorrect.

Categories: Phones Tags: , , ,

Book Review: Sphinx Search Beginner’s Guide

April 5th, 2011 No comments

Packtpub were kind enough to send me a copy of their new book, Sphinx Search Beginner’s Guide to review.

The book is written by Abbas Ali who is currently working as Chief Operating Officer and Technical Manager at SANIsoft Technologies Private Limited, Nagpur, India. The company specializes in development of large, high performance, and scalable PHP applications.

Sphinx is well described by its website as follows:

Sphinx is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It’s written in C++ and works on Linux (RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few other systems.

Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as with a database server.

The book covers everything from the installation and setup of Sphinx to simple and advanced use in PHP.

Here is a full outline of what’s covered:

  • Chapter 1, Setting Up Sphinx is an introduction to Sphinx. It guides the reader through the installation process for Sphinx on all major operating systems.
  • Chapter 2, Getting Started demonstrates some basic usage of Sphinx in order to test its installation. It also discusses full-text search and gives the reader an overview of Sphinx.
  • Chapter 3, Indexing teaches the reader how to create indexes. It introduces and explains the different types of datasources, and also discusses different types of attributes that can comprise an index.
  • Chapter 4, Searching teaches the reader how to use the Sphinx Client API to search indexes from within PHP applications. It shows the reader how to use the PHP implementation of the Sphinx Client API.
  • Chapter 5, Feed Search creates an application that fetches feed items and creates a Sphinx index. This index is then searched from a PHP application. It also introduces delta indexes and live index merging.
  • Chapter 6, Property Search creates a real world real estate portal where the user can add a property listing and specify different attributes for it so that you can search for properties based on specific criteria. Some advanced search techniques using a client API are discussed in this chapter.
  • Chapter 7, Sphinx Configuration discusses all commonly used configuration settings for
    Sphinx. It teaches the reader how to configure Sphinx in a distributed environment where
    indexes are kept on multiple machines.
  • Chapter 8, What Next? discusses some new features introduced in the recent Sphinx release.
    It also shows the reader how a Sphinx index can be searched using a MySQL client library.
  • Lastly, it discusses the scenarios where Sphinx can be used and mentions some of the
    popular Web applications that are powered by a Sphinx search engine.

At first, the style of the book seemed a bit strange to me – it’s split up into small chunks which are often followed by a “What just happened” section which gives a summary or broken down explanation of the concept just explained. Once I got used to it though, this actually improved the clarity and aided understanding.

The book is a very informative read for both newcomers to search or Sphinx and existing users, and I'd highly recommend it to anyone interested in either search in general or the Sphinx product.

Anyone wanting to find out more about the book, or to purchase it, can do so on the Packtpub website.

Categories: Reviews Tags: , , , ,

HTML5 Boilerplate Reaches v1.0

March 21st, 2011 No comments

This afternoon sees the HTML5 Boilerplate project (H5BP), led by Paul Irish, reach its much-anticipated v1.0 release.

If you have not heard of the HTML5 Boilerplate (or H5BP as they started calling it recently) before, it’s an excellent and essential resource for web developers. It can either be used as a complete starting point for a new web project or as a large collection of cross browser recipes and best practices to incorporate into your existing projects.

To quote from the HTML5 Boilerplate Website:

HTML5 Boilerplate is the professional badass’s base HTML/CSS/JS template for a fast, robust and future-proof site.

After more than three years in iterative development, you get the best of the best practices baked in: cross-browser normalization, performance optimizations, even optional features like cross-domain Ajax and Flash.

A starter apache .htaccess config file hooks you the eff up with caching rules and preps your site to serve HTML5 video, use @font-face, and get your gzip zipple on.

Boilerplate is not a framework, nor does it prescribe any philosophy of development, it’s just got some tricks to get your project off the ground quickly and right-footed.

Today’s release of v1.0 isn’t just a bump from the Release Candidate to the final version, it also brings with it some exciting new material:

  • A Boilerplate Custom Builder – providing a custom package with only the features you require (similar to Paul’s previous Modernizr builder).
  • Improved Documentation
  • Improved Build Script
  • New video guides
  • Support for lighttpd, Google App Engine and Node.js
  • Reduced size (50% smaller!)

You can find out more about the Boilerplate by watching the following videos. The latter is particularly entertaining and Paul is a great presenter.

Categories: development Tags: ,

DOM Monster

March 17th, 2011 No comments

There is an interesting new JavaScript tool circulating today – DOM Monster.

DOM Monster is our answer to JavaScript performance tools that just don’t give you the full picture.

DOM Monster is a cross-platform, cross-browser bookmarklet that will analyze the DOM & other features of the page you’re on, and give you its bill of health.

If there are problems, DOM Monster will point them out—and even make suggestions on how to fix ‘em.

You can read more and get it from here.

Categories: development Tags: , ,

Logging to Syslog with Kohana 3

March 16th, 2011 No comments

By default, the Kohana PHP Framework logs errors to files in the application/logs directory, separated into directories and files for each year, month and day (eg: application/logs/2011/03/16.php).

To change this behaviour and use the system log daemon instead, change the following line in bootstrap.php:
Kohana::$log->attach(new Kohana_Log_File(APPPATH.'logs'));
to:
Kohana::$log->attach(new Kohana_Log_Syslog('site_identifier', LOG_LOCAL1));

By default, these will probably appear in your /var/log/syslog file (on Debian/Ubuntu anyway – some other distributions use /var/log/messages).

You can configure syslog to put these in their own file by adding this to your syslog configuration file (on Debian/Ubuntu it’s done by adding a file into /etc/rsyslog.d):
local1.* /var/log/local1.log

If something else is already logging to local1, you can change that and LOG_LOCAL1 above to local2 and LOG_LOCAL2.

If you would like to write your own messages to the log, you can do so with:
Kohana::$log->add(Log::ERROR, 'error text');


If you would like to pull out different types of errors into different files, you can use this in your syslog configuration:
local1.=alert /var/log/local1.strace.log

Note that the errors are not actually written until the request has completed. You can force them to be written at any point with:
Kohana::$log->write();

You can also force writing on every add by setting Log::$write_on_add = TRUE; in bootstrap.php, but be aware there is an overhead to doing that.

If you would like to ensure that logging is set up before writing to the log, you can do the following:

if (is_object(Kohana::$log))
{
    // Add this exception to the log:
    Kohana::$log->add(Log::ERROR, $error);

    // Make sure the logs are written:
    Kohana::$log->write();
}

Categories: development Tags: ,

– how fast does your website load?

March 15th, 2011 1 comment

This is a very nice tool –

You enter a URL and it shows you how long your site took to load, what was shown at certain intervals and a waterfall chart – from a certain location.

You can then choose another location from a whole list of continents and cities and repeat the test again from that location.

Categories: Uncategorized, Web Tags: