

Archive for the ‘Web’ Category

Amazon Web Services, Hosting in the Cloud and Configuration Management

April 23rd, 2011 No comments

Amazon is probably the biggest cloud provider in the industry – they certainly have the most features and are adding more at an amazing rate.

Amongst the long list of services provided under the AWS (Amazon Web Services) banner are:

  • Elastic Compute Cloud (EC2) – scalable virtual servers based on the Xen Hypervisor.
  • Simple Storage Service (S3) – scalable cloud storage.
  • Elastic Load Balancing (ELB) – high availability load balancing and traffic distribution.
  • Elastic IP Addresses – re-assignable static IP addresses for EC2 instances.
  • Elastic Block Store (EBS) – persistent storage volumes for EC2.
  • Relational Database Service (RDS) – scalable, MySQL-compatible database services.
  • CloudFront – a Content Delivery Network (CDN) for serving content from S3.
  • Simple Email Service (SES) – for sending bulk e-mail.
  • Route 53 – high availability and scalable Domain Name System (DNS).
  • CloudWatch – monitoring of resources such as EC2 instances.

Amazon provides these services in 5 different regions:

  • US East (Northern Virginia)
  • US West (Northern California)
  • Europe (Ireland)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Singapore)

Each region has its own pricing and feature availability.

Within each region, Amazon provides multiple “Availability Zones”. These different zones are completely isolated from each other – probably in separate data centers, as Amazon describes them as follows:

Q: How isolated are Availability Zones from one another?
Each availability zone runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. Common points of failures like generators and cooling equipment are not shared across Availability Zones. Additionally, they are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone.

However, unless you have been offline for the past few days, you will no doubt have heard about the extended outage Amazon has been having in their US East region. The outage started on Thursday, 21st April 2011, taking down some big-name sites such as Reddit, Quora, Foursquare & Heroku, and the problems are still ongoing now, nearly 2 days later – with Reddit and Quora still running in an impaired state.

I have to confess, my first reaction was one of surprise that such big names didn’t have more redundancy in place – however, once more information came to light, it became apparent that the outage was affecting multiple availability zones – something Amazon seems to imply above shouldn’t happen.

You may well ask why such sites are not split across regions to give more isolation against such outages. The answer lies in how AWS implements zones and regions. Although isolated, the zones within a single region are close enough together that low-cost, low-latency links can be provided between them. Once you start trying to run services across regions, all inter-region communication goes over the normal internet and is therefore comparatively slow, expensive and unreliable, so it becomes much more difficult and expensive to keep data reliably synchronised. This, coupled with Amazon’s claims above about the isolation between zones and its recommended best practices, has led to the common setup being to split services over multiple availability zones within the same region. What makes this outage worse is that US East is the most popular region, due to it being a convenient location for sites targeting both the US and Europe.

On the back of this, many people are giving both Amazon and cloud hosting a good bashing, in blog posts and on Twitter.

Where Amazon has let everyone down in this instance is that they let a problem (which in this case is largely centered around EBS) affect multiple availability zones, thus screwing over everyone who had either not implemented redundancy or had followed Amazon’s own guidelines and assurances of isolation. I also believe that their communication has been poor: had customers been aware it would take so long to recover, they may have been in a position to take measures to get back online much sooner.

In reality though, both Amazon and cloud computing have less to do with this problem – and the blame associated with it – than the backlash suggests. At the end of the day, we work in an industry that is susceptible to failure. Whether you are hosting on bare metal or in the cloud, you will experience failure sooner or later, and the design of any infrastructure needs to take that into account. Failure will happen – it’s all about mitigating the risk of that failure through measures like backups and redundancy. There is a trade-off between the cost, time and complexity of implementing multiple levels of redundancy versus the risk of failure and downtime. On each project or infrastructure setup, you need to work out where on this sliding scale you are.

In my opinion, cloud computing provides us with an easy way out of such problems. It gives us the ability to quickly spin up new services and server instances within minutes, pay by the hour for them and destroy them when they are no longer required. Gone are the days of having to order servers or upgrades and wait in a queue for a data center technician to deal with hardware, and of incurring large setup costs and/or getting locked into contracts. In the cloud, instances can be resized, provisioned or destroyed in minutes, often without human intervention, as most cloud computing providers also provide an API so users can manage their services programmatically. Under load, instances can be upgraded or additional instances brought online; in quiet periods, instances can be downgraded or destroyed, yielding a significant cost saving. Another huge bonus is that instances can be spun up for development, testing or to perform an intensive task and thrown away afterwards.

Being able to spin new instances up in minutes is, however, less effective if you have to spend hours installing and configuring each instance before it can perform its task. This is especially true if more time is wasted chasing and debugging problems because something was set up differently or missed during the setup procedure. This is where configuration management tools and the ‘infrastructure as code’ principle come in. Tools such as Puppet and Chef were created to allow you to describe your infrastructure and configuration in code and have machines or instances provisioned or updated automatically.

Sure, with virtual machines and cloud computing, things have become a little easier thanks to re-usable machine images. You can set up a certain type of system once and re-use the image for any subsequent systems of the same type. This is, however, quite limiting: it’s very time-consuming to later update that image with small changes or to cope with small variations between systems, and it’s almost impossible to keep track of which changes have been made to which instances.

Configuration Management tools like Puppet and Chef manage system configuration centrally and can:

  • Be used to provision new machines automatically.
  • Roll out a configuration change across a number of servers.
  • Deal with small variations between systems or different types of systems (web, database, app, dns, mail, development etc).
  • Ensure all systems are in a consistent state.
  • Ensure consistency and repeatability.
  • Easily allow the use of source code control (version control) systems to keep a history of changes.
  • Easily allow the provisioning of development and staging environments which mimic production.

As time permits, I’ll publish some follow-up posts which go into Puppet and Chef in more detail and look at how they can be used. I’ll also be publishing a review of James Turnbull’s new book, Pro Puppet, which is due to go to print at the end of the month.

Categories: Web

Nginx and why you should be running it instead of, or at least in front of, Apache

April 14th, 2011 3 comments

Nginx hit a milestone this week when, after 9 years of development, version 1.0.0 was released (on 12th April 2011). Despite only now reaching a 1.0 release, it is already in widespread use, powering a lot of high-traffic websites and CDNs, and is very popular with developers in particular. With such a milestone release, I thought it a good opportunity to get motivated and do some posts on it here.

Nginx (pronounced “engine-x”) is a free, open-source, high-performance HTTP server (aka web server) and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It is built specifically to be able to handle more than 10,000 requests per second using minimal server resources, which it achieves by using a non-blocking, event-based model.

In this article, I’m going to look at the problems with Apache and explain why you would want to use Nginx. In a subsequent article, I’m going to explain how to install and configure Nginx.

Apache, the most popular web server, powers around 60% of the world’s web sites. I’ve been using Apache for around 10 years but have more recently been using Nginx. Due to its widespread use, Apache is well used, well understood and reliable. However, it does have some problems when dealing with high-traffic websites. A lot of these problems center around the fact that it uses a blocking, process-based architecture.

The typical setup for serving PHP-based websites in a LAMP (Linux, Apache, MySQL and PHP) environment uses the Prefork MPM and mod_php. The way this works is to have the PHP binary (and any other active Apache modules) embedded directly into each Apache process. This gives very little overhead and means Apache can talk to PHP very fast, but it also results in each Apache process consuming between 20MB and 50MB of RAM. The problem is that once a process is dealing with a request, it cannot be used to serve another request. To handle multiple simultaneous requests (and remember that even a single person visiting a web page will generate multiple requests, because the page will almost certainly contain images, stylesheets and JavaScript files which all need to be downloaded before the page can render), Apache spawns a new child process for each simultaneous request it is handling. Because the PHP binary is always embedded (to keep the cost of spawning processes to a minimum), each of these processes takes the full 20MB-50MB of RAM, even if it is only serving static files, so you can see how a server can quickly run out of memory.

To compound the problem, if a PHP script takes a while to execute (due to either processing load or waiting on an external process like MySQL), or the client is on a slow or intermittent connection like a mobile device, then the Apache process is tied up until the execution and transmission have completed, which could be a while. These factors, combined with a lot of traffic, can often mean that Apache has hundreds of concurrent processes loaded, and it can easily hit the configured maximum number of processes or completely exhaust the available RAM in the system (at which point it will start using the virtual memory on the hard disk, everything will get massively slower and the problem will compound further). If a web page has, say, 10 additional assets (CSS, JavaScript and images), that’s 11 requests per user. If 100 users hit the page at the same time, that’s 1,100 requests and, at up to 50MB per process, potentially over 50GB of RAM required (although in reality you would have a limit on the number of Apache processes much lower than this, so requests would actually be queued and blocked until a process became free, and browsers will generally only open a few simultaneous connections to a server at a time). Hopefully you are starting to see the problem.

With Nginx’s event-based processing model, each request triggers events to a process and the process can handle multiple events in parallel. This means Nginx can handle many simultaneous requests and deal with execution delays and slow clients without spawning additional processes. If you look at the two graphs from WebFaction, you can quite clearly see that Nginx can handle a lot more simultaneous requests while using a significantly lower, and nearly constant, amount of RAM.

Nginx excels at serving static files and can do so very fast. What we can’t do is embed something like PHP into the Nginx binary, as PHP is not asynchronous and would block requests, rendering the event-based approach of Nginx useless. What we do instead is have either PHP over FastCGI or Apache+mod_php in the background handle all the PHP requests. This way, Nginx can be used to serve all static files (CSS, JavaScript, images, PDFs etc) and handle slow clients, but pass PHP requests over to one of these backend processes, receive the response back and handle delivering it to the client, leaving the backend process free to handle other requests. Nginx doesn’t block while waiting for FastCGI or Apache – it just carries on handling events as they happen.
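
As a rough illustration, a minimal Nginx server block for this kind of split might look something like the sketch below. This is only a sketch under assumptions – the server name, document root and the FastCGI backend address are placeholders, not taken from a real setup:

server {
    listen 80;
    server_name example.com;           # assumed hostname
    root /var/www/example.com/public;  # assumed document root

    # Serve static assets straight from disk.
    location ~ \.(css|js|png|jpg|gif|ico|pdf)$ {
        expires 30d;
    }

    # Hand PHP requests to a backend process (PHP over FastCGI in this sketch).
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;   # assumed FastCGI backend address
    }

    # Everything else falls through to the application's front controller.
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
}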

The other advantage of this “reverse proxy” mode is that Nginx can act as a load balancer and distribute requests not just to one but to multiple backend servers over a network. Nginx can also act as a reverse caching proxy to reduce the number of dynamic requests that need to be processed by the backend PHP server. Both of these functions allow even more simultaneous dynamic requests to be handled.
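
For the load-balancing case, the idea can be sketched as below – the upstream name and backend addresses are purely illustrative assumptions. Adding proxy_cache_path and proxy_cache directives would turn this into the reverse caching proxy described above:

# Pool of backend Apache/PHP servers (addresses are assumptions).
upstream php_backends {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    server_name example.com;  # assumed hostname

    location / {
        # Distribute requests across the pool, preserving the original host header.
        proxy_pass http://php_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}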

What this means is that if your application requires a specific Apache configuration or module, you can still gain the advantages of Nginx handling simultaneous requests and serving static files, while using Apache to handle the requests that need it.

If there is no requirement for Apache, then Nginx also supports backend communication protocols like FastCGI, SCGI and uWSGI. PHP also happens to support FastCGI, so we can have Nginx talk to PHP over FastCGI without needing the whole of Apache around.

In the past, you either had to use a script called spawn-fcgi to spawn FastCGI processes, or handle FastCGI manually, and then use some monitoring software to ensure they kept running. However, as of PHP 5.3.3, something called PHP-FPM (which distributions often package up as php5-fpm) is part of the PHP core code and handles all of this for you in a way similar to Apache – you can set the minimum and maximum number of processes and how many you would like to spawn and keep around waiting. The other advantage is that PHP-FPM is an entirely separate process to Nginx, so you can change configurations and restart each of them independently of the other (and Nginx actually supports reloading its configuration and upgrading its binary on-the-fly, so it doesn’t even require a restart).
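
The process-management settings mentioned above live in the PHP-FPM pool configuration. As a rough sketch, a pool might look like the following – the values are illustrative assumptions, not recommendations:

; Example PHP-FPM pool (values are assumptions for illustration only).
[www]
listen = 127.0.0.1:9000        ; the address Nginx's fastcgi_pass points at

pm = dynamic                   ; spawn and reap workers on demand
pm.max_children = 20           ; hard upper limit on worker processes
pm.start_servers = 4           ; workers created at startup
pm.min_spare_servers = 2       ; keep at least this many idle workers
pm.max_spare_servers = 6       ; kill idle workers above this number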

In the next post in this series, I’ll explain how to install and configure Nginx for serving both static and dynamic content.

One of the disadvantages of Nginx is that it doesn’t support .htaccess files to dynamically modify the server configuration – all configuration must be stored in the Nginx config files and can not be changed at runtime. This is a positive for performance and security but makes it less suitable for running “shared hosting” platforms.

Categories: Web

loads.in – how fast does your website load?

March 15th, 2011 1 comment

This is a very nice tool – loads.in.

You enter a URL and it shows you how long your site took to load from a given location, what was visible at certain intervals, and a waterfall chart.

You can then choose another location from a whole list of continents and cities and repeat the test again from that location.

Categories: Uncategorized, Web

Twitter – lost touch with the users?

March 12th, 2011 No comments

It’s been an interesting few weeks for the social networking site, Twitter.

A few weeks ago, Twitter published a new version (3.3) of their iOS application. Amongst other changes, they introduced a very obtrusive and badly implemented ‘quick bar’ feature which showed trending topics (including paid-for ones) in a grey overlaid bar.

The feature seems to be universally hated and has caused quite a backlash. Straight after the release, Twitter streams and blog feeds were awash with complaints about the so-called “Dickbar” – named after Chicagoan Dick Costolo, the (relatively) new CEO of Twitter.

Twitter reacted quite quickly and released an update which makes the feature slightly less annoying, but doesn’t get rid of it. Amongst the flurry of complaints about the bar, I’ve seen a number of people who have changed to third-party applications or have resolved to stick with the older version of the application to avoid getting the bar.

In relation to this, I recently read an interesting article by Oliver Cameron entitled R.I.P Tweetie. Tweetie was originally a third-party Twitter client written by developer Loren Brichter and his company, Atebits. It became a vastly popular Twitter client, especially after the release of its second major version, Tweetie 2. I remember purchasing Tweetie 2 a few hours after its release and have not really used another Twitter client since. Some of his UI elements, for example the ‘pull down to refresh’ gesture, have become almost standard user interface elements across a lot of different iOS applications. In April 2010, Twitter announced that it had purchased Tweetie and that Loren would be employed by them to work on the app. Oliver explains in his article that Loren was a widely respected and competent developer, and that some of the traits of the latest updates to Twitter for iOS do not exhibit the same quality that Loren would usually be known for.

As if the whole ‘Dickbar’ saga was not enough bad publicity, Twitter yesterday announced that, despite their own application originating as a third-party client, they are now clamping down on third-party applications. Fred Oliveira has some interesting thoughts on this in his post, “Dear Twitter”. I quote:

PS: for a company that cares about user experience as your roadmap email mentioned, you have certainly weirded a few people out (me included) with #Dickbar. Have you lost touch with what people really want?

EDIT: a few more interesting posts have come out recently:

  • Twitter angers third-party developers with ‘no more timelines’ urging
  • Here’s Why Developers Are Scaring Twitter
  • Marco Arment has also come out with an interesting post.

EDIT – April 2011: Twitter have released a new version of the app with the Quick Bar removed!

Categories: Web

Kohana 3 (KO3) PHP Framework with Nginx (and PHP-FPM)

March 7th, 2011 No comments

The Kohana manual (or for 3.0) gives the following configuration for doing clean URL rewriting with Nginx:

location / {
    index index.php index.html index.htm;
    try_files $uri index.php;
}

location = index.php {
    include fastcgi.conf;
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
}

I hit two problems with this.

Firstly, I had to change it to location = /index.php to get it to work.

Once I had done that, it seemed to work fine.

However, I then realised that query string parameters were not working correctly. For example, with a URL of http://mydomain.com/controller/param1/param2?myvar=123, Apache would work fine and $_GET['myvar'] would be set to 123. Under Nginx, however, $_GET['myvar'] wouldn’t be set.

I fixed this by using: try_files $uri /index.php?$query_string;

My final, working config is therefore as follows:

location / {
    try_files $uri /index.php?$query_string;
}

location = /index.php {
    fastcgi_pass 127.0.0.1:9000;
    fastcgi_index index.php;
    include fastcgi_params;
}

This seems to be working fine on an Ubuntu 10.10 box with: nginx-full 0.8.54-4ppa14~maverick and php5-fpm 5.3.3-1ubuntu9.3.

It’s worth noting that the above won’t let you put any other .php files in your web root. If you want to be able to do that, you’ll want to use:
location ~ \.php$ {
instead of:
location = /index.php {

Note though that there is a bit of a security problem to be aware of when matching \.php$ – I’ll blog about this soon.

Categories: Web

Network Path Reference (Protocol Relative URL)

March 5th, 2011 No comments

Anyone who has done web development on a site that uses an https:// (SSL) secure connection will be familiar with the annoying “This Page Contains Both Secure and Non-Secure Items” error (or exclamation mark in other browsers) that appears if you include an asset over http:// on the page.

There is a neat trick for this, which Paul Irish calls the “Protocol Relative URL”.

If you use // at the beginning of the URL, it will automatically use whichever protocol the page is currently using.

eg:

<img src="//domain.com/images/logo.png">
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js"></script>
.myclass { background: url(//domain.com/images/background.gif); }

The only caveat is that you should avoid using it with <link> or @import to include stylesheets, as IE7 and IE8 have a bug where they will load the stylesheet twice, which is inefficient (especially since downloading a stylesheet is a blocking action, so it will halt rendering of the page while it downloads).

Categories: Web

IE6 Countdown

March 4th, 2011 No comments

You won’t find many web developers around who don’t hate IE (and in particular, IE6). It’s buggy, doesn’t support newer technologies and exhibits all kinds of problems which don’t appear in other browsers.

Despite cross-browser testing, it’s a regular occurrence for users to report problems that you can’t reproduce in your own browser.

IE6 in particular is old and a pain to maintain, and a lot of sites are now not supporting it – often serving up a plain stylesheet which gives the user the content but without the site styling.

It comes as a pleasant surprise, then, that Microsoft actively wants to kill off IE6 and has launched the IE6 Countdown site.

Steve Webster makes an interesting point though – IE9 isn’t supported on Windows XP, which is still very much in use. Therefore the latest version of IE that XP users can have is IE8, which is already a two-year-old browser.

Oh well, at least it’s a glimmer of hope – the day IE6 dies will be a joyous day all round.

What I don’t quite understand though is where they are getting the data from…

Categories: Web

Remote JavaScript Debugging

March 3rd, 2011 No comments

Remy Sharp has produced a really useful tool called jsconsole.com.

To quote:

jsconsole.com is a simple JavaScript command line tool. However, it also provides the ability to bridge across to other browser windows to remotely control and debug that window – be it in another browser or another device altogether.

He has put together a great video demonstrating how it works.

You can read more about how it works here.

Other good tools to check out for JavaScript development are:

  • jsbin.com
  • jsfiddle.net

Categories: Web

IETester – Different Internet Explorer Versions in a Single Browser!

November 11th, 2009 No comments

I came across a very nifty utility for web developers – it’s called IETester.

It’s a web browser which allows you to open up different tabs for different versions of Internet Explorer, all at the same time.

You can view IE5, IE6, IE7 and IE8.

It seems to fall down with some complicated JavaScript, but it’s great for testing how a site renders in the different versions or, as usual, for sorting out IE6 problems!

It’s freeware and you can download it here.

Categories: Web