

Archive for April, 2011

Optimizing for Happiness – why you want to go work at Github!

April 28th, 2011

If you are a manager or high up in any company then I highly recommend you watch this video of a recent talk by Tom Preston-Werner, co-founder of Github. It’s around an hour in length but I urge you to take the time to watch it – it’s packed full of great advice all the way through.

The way traditional businesses approach the management and organization of creative, intellectual workers is wrong. By throwing away everything that blocks productivity (meetings, deadlines, managers, titles, strict vacation policies, etc) and treating your employees as the responsible adults that they are, huge amounts of potential can be unlocked and employee happiness and retention can be at unprecedented highs. At GitHub we’ve embraced a philosophy that gets things done and strips away policy and procedure in favor of smart decision making and personal responsibility. Come see how we make it work and how you can reap the same benefits in your own company.

The video goes into both how they recruit and how they run a profitable and productive company.

At GitHub we don’t have meetings. We don’t have set work hours or even work days. We don’t keep track of vacation or sick days. We don’t have managers or an org chart. We don’t have a dress code. We don’t have expense account audits or an HR department.

We pay our employees well and give them the tools they need to do their jobs as efficiently as possible. We let them decide what they want to work on and what features are best for the customers. We pay for them to attend any conference at which they’ve gotten a speaking slot. If it’s in a foreign country, we pay for another employee to accompany them because traveling alone sucks. We show them the profit and loss statements every month. We expect them to be responsible.

We make decisions based on the merits of the arguments, not on who is making them. We strive every day to be better than we were the day before.

We hold our board meetings in bars.

We do all this because we’re optimizing for happiness, and because there’s nobody to tell us that we can’t.

You can watch the video here.

Tell me now that you don’t want to work at Github?

Categories: Interesting

Amazon Web Services, Hosting in the Cloud and Configuration Management

April 23rd, 2011

Amazon is probably the biggest cloud provider in the industry – they certainly have the most features and are adding more at an amazing rate.

Amongst the long list of services provided under the AWS (Amazon Web Services) banner are:

  • Elastic Compute Cloud (EC2) – scalable virtual servers based on the Xen hypervisor.
  • Simple Storage Service (S3) – scalable cloud storage.
  • Elastic Load Balancing (ELB) – high-availability load balancing and traffic distribution.
  • Elastic IP Addresses – static IP addresses that can be re-assigned between EC2 instances.
  • Elastic Block Store (EBS) – persistent storage volumes for EC2.
  • Relational Database Service (RDS) – scalable MySQL-compatible database services.
  • CloudFront – a Content Delivery Network (CDN) for serving content from S3.
  • Simple Email Service (SES) – for sending bulk e-mail.
  • Route 53 – a highly available and scalable Domain Name System (DNS) service.
  • CloudWatch – monitoring of resources such as EC2 instances.

Amazon provides these services in 5 different regions:

  • US East (Northern Virginia)
  • US West (Northern California)
  • Europe (Ireland)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Singapore)

Each region has its own pricing and feature availability.

Within each region, Amazon provides multiple “Availability Zones”. These zones are completely isolated from each other – probably in separate data centers – as Amazon describes them:

Q: How isolated are Availability Zones from one another?
Each availability zone runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. Common points of failures like generators and cooling equipment are not shared across Availability Zones. Additionally, they are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone.

However, unless you have been offline for the past few days, you will no doubt have heard about the extended outage Amazon has been having in their US East region. The outage started on Thursday, 21st April 2011, taking down some big-name sites such as Reddit, Quora, Foursquare and Heroku, and the problems are still ongoing now, nearly 2 days later – with Reddit and Quora still running in an impaired state.

I have to confess, my first reaction was surprise that such big names didn’t have more redundancy in place. However, once more information came to light, it became apparent that the outage was affecting multiple availability zones – something Amazon’s description above implies shouldn’t happen.

You may well ask why such sites are not split across regions to give more isolation against such outages. The answer lies in how AWS implements zones and regions. Although isolated, the zones within a single region are close enough together that low-cost, low-latency links can be provided between them. Once you start trying to run services across regions, all inter-region communication goes over the public internet and is therefore comparatively slow, expensive and unreliable, so it becomes much more difficult and expensive to keep data reliably synchronised. This, coupled with Amazon’s claims above about the isolation between zones, has led to the common setup being to split services over multiple availability zones within the same region. What makes this outage worse is that US East is the most popular region, being a convenient location for sites targeting both the US and Europe.

On the back of this, many people are giving both Amazon and cloud hosting a good bashing in blog posts and on Twitter.

Where Amazon has let everyone down in this instance is that they allowed a problem (which in this case is largely centered around EBS) to affect multiple availability zones, thus screwing everyone who either had not implemented redundancy or had followed Amazon’s own guidelines and assurances of isolation. I also believe that their communication has been poor: had customers been aware it would take so long to get back online, they might have been in a position to take measures to recover much sooner.

In reality though, Amazon and cloud computing have less to do with this problem than the blame being thrown around suggests. At the end of the day, we work in an industry that is susceptible to failure. Whether you are hosting on bare metal or in the cloud, you will experience failure sooner or later, and the design of any infrastructure needs to take that into account. Failure will happen – it’s all about mitigating the risk through measures like backups and redundancy. There is a trade-off between the cost, time and complexity of implementing multiple levels of redundancy versus the risk of failure and downtime. On each project or infrastructure setup, you need to work out where on this sliding scale you are.

In my opinion, cloud computing gives us an easy way to mitigate such problems. It gives us the ability to spin up new services and server instances within minutes, pay by the hour for them and destroy them when they are no longer required. Gone are the days of having to order servers or upgrades and wait in a queue for a data center technician to deal with hardware, incurring large setup costs and/or getting locked into contracts. In the cloud, instances can be resized, provisioned or destroyed in minutes, often without human intervention, as most cloud providers also offer an API so users can manage their services programmatically. Under load, instances can be upgraded or additional instances brought online; in quiet periods, instances can be downgraded or destroyed, yielding a significant cost saving. Another huge bonus is that instances can be spun up for development, testing or an intensive one-off task and thrown away afterwards.
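The scale-up-under-load, scale-down-when-quiet logic described above can be sketched in a few lines. This is purely illustrative – `desired_instances` and its thresholds are invented for this post, not part of any cloud provider's API; in practice you would act on the result via the provider's API:

```python
# Illustrative only: a toy autoscaling decision rule, not a real AWS API call.

def desired_instances(current, cpu_percent, min_instances=1, max_instances=10):
    """Scale up when average CPU is high, down when it is low."""
    if cpu_percent > 75:
        target = current + 1      # bring another instance online under load
    elif cpu_percent < 25:
        target = current - 1      # destroy an instance in quiet periods
    else:
        target = current          # steady state: leave the fleet alone
    return max(min_instances, min(max_instances, target))

print(desired_instances(3, 90))  # heavy load -> 4
print(desired_instances(3, 10))  # quiet -> 2
print(desired_instances(1, 10))  # never below the minimum -> 1
```

A real setup would feed this from monitoring data (CloudWatch, say) on a timer, but the pay-by-the-hour economics are exactly this simple loop.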

Being able to spin new instances up in minutes is, however, less effective if you have to spend hours installing and configuring each instance before it can perform its task. This is especially true if more time is wasted chasing and debugging problems because something was set up differently or missed during the setup procedure. This is where configuration management tools and the ‘infrastructure as code’ principle come in. Tools such as Puppet and Chef were created to let you describe your infrastructure and configuration in code and have machines or instances provisioned or updated automatically.

Sure, with virtual machines and cloud computing, things have become a little easier thanks to re-usable machine images: you can set up a certain type of system once and re-use the image for any subsequent systems of the same type. This is greatly limiting, however, in that it’s very time consuming to later update that image with small changes or to cope with small variations between systems, and it’s almost impossible to keep track of which changes have been made to which instances.

Configuration Management tools like Puppet and Chef manage system configuration centrally and can:

  • Be used to provision new machines automatically.
  • Roll out a configuration change across a number of servers.
  • Deal with small variations between systems or different types of systems (web, database, app, dns, mail, development etc).
  • Ensure all systems are in a consistent state.
  • Ensure consistency and repeatability.
  • Easily allow the use of source code control (version control) systems to keep a history of changes.
  • Easily allow the provisioning of development and staging environments which mimic production.
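At the heart of the list above is an idempotent “converge to the desired state” loop: declare what a resource should look like, and only act when reality differs. Here is a toy sketch of that idea – `ensure_file` and the dict-based “servers” are made-up stand-ins for illustration, not Puppet or Chef code:

```python
# A sketch of the idempotent "desired state" model behind tools like Puppet
# and Chef. File paths and contents here are purely illustrative.

def ensure_file(state, path, contents):
    """Converge one 'file' resource; return True if a change was made."""
    if state.get(path) == contents:
        return False              # already in the desired state: do nothing
    state[path] = contents        # otherwise converge it
    return True

servers = {"web1": {}, "web2": {"/etc/motd": "old"}}
for name, fs in servers.items():
    changed = ensure_file(fs, "/etc/motd", "managed by config management")
    print(name, "changed" if changed else "in sync")

# Running the same "manifest" again is a no-op: every server is now in sync.
for fs in servers.values():
    assert ensure_file(fs, "/etc/motd", "managed by config management") is False
```

The important property is that applying the same description twice does nothing the second time – which is what makes it safe to run continuously across a whole fleet.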

As time permits, I’ll publish some follow-up posts which go into Puppet and Chef in more detail and look at how they can be used. I’ll also be publishing a review of James Turnbull’s new book, Pro Puppet, which is due to go to print at the end of the month.

Categories: Web

Nginx and why you should be running it instead of, or at least in front of, Apache

April 14th, 2011

After 9 years of development, Nginx hit a milestone this week when version 1.0.0 was released (on 12th April 2011). Despite only now reaching a 1.0 release, it is already in widespread use, powering a lot of high-traffic websites and CDNs, and is very popular with developers in particular. With such a milestone release, I thought it a good opportunity to get motivated and do some posts on it here.

Nginx (pronounced “engine-x”) is a free, open-source, high-performance HTTP server (aka web server) and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It is built specifically to be able to handle more than 10,000 requests/sec and to do so using minimal server resources. It does this by using a non-blocking, event-based model.

In this article, I’m going to look at the problems with Apache and explain why you would want to use Nginx. In a subsequent article, I’ll explain how to install and configure Nginx.

Apache, the most popular web server, powers around 60% of the world’s web sites. I’ve been using Apache for around 10 years but more recently have been using Nginx. Due to its widespread use, Apache is well tested, well understood and reliable. However, it does have some problems when dealing with high-traffic websites. A lot of these problems center around the fact that it uses a blocking, process-based architecture.

The typical setup for serving PHP-based websites in a LAMP (Linux, Apache, MySQL and PHP) environment uses the Prefork MPM and mod_php. This works by embedding the PHP binary (and any other active Apache modules) directly into each Apache process. This gives very little overhead and means Apache can talk to PHP very fast, but it also results in each Apache process consuming between 20MB and 50MB of RAM. The problem is that once a process is dealing with a request, it cannot be used to serve another one. To handle multiple simultaneous requests – and remember that even a single visitor to a web page will generate multiple requests, because the page will almost certainly contain images, stylesheets and JavaScript files which all need to be downloaded before the page can render – Apache spawns a new child process for each simultaneous request it is handling. Because the PHP binary is always embedded (to keep the cost of spawning processes to a minimum), each of these processes takes the full 20MB–50MB of RAM even if it is only serving static files, so you can see how a server can quickly run out of memory.

To compound the problem, if a PHP script takes a while to execute (due to either processing load or waiting on an external process like MySQL), or the client is on a slow or intermittent connection like a mobile device, then the Apache process is tied up until execution and transmission have completed – which could be a while. These factors plus a lot of traffic can often mean that Apache has hundreds of concurrent processes loaded, and it can easily hit the configured maximum number of processes or completely exhaust the available RAM (at which point it will start using virtual memory on the hard disk, everything will get massively slower and the problem is compounded further). If a web page has, say, 10 additional assets (CSS, JavaScript and images), that’s 11 requests per user. If 100 users hit the page at the same time, that’s 1,100 requests and up to around 50GB of RAM required (although in reality you would cap the number of Apache processes much lower than this, so requests would actually be queued and blocked until a process became free, and browsers will generally only open a few simultaneous connections to a server at a time). Hopefully you are starting to see the problem.
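The back-of-an-envelope arithmetic behind those figures looks like this (all numbers are the illustrative worst-case ones from above):

```python
# Rough arithmetic behind the memory figures above (all numbers illustrative).
process_mb = 50           # RAM per Apache prefork process with mod_php embedded
assets_per_page = 10      # css/js/images in addition to the page itself
users = 100

requests = users * (assets_per_page + 1)
ram_gb = requests * process_mb / 1024

print(requests)           # 1100 simultaneous requests
print(round(ram_gb, 1))   # ~53.7 GB if every request got its own process
```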

With Nginx’s event-based processing model, each request triggers events to a process, and the process can handle multiple events in parallel. This means Nginx can handle many simultaneous requests and deal with execution delays and slow clients without spawning processes. If you look at the two graphs from WebFaction, you can quite clearly see that Nginx can handle a lot more simultaneous requests while using significantly less RAM – and a fairly constant amount of it.
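To make the contrast concrete, here is a tiny sketch of the event-based idea using Python's standard selectors module: one process watches several connections and services whichever is ready, instead of dedicating a process to each. It illustrates the model only – it is not how Nginx is implemented:

```python
# A minimal illustration of non-blocking, event-based I/O: one process
# multiplexes several "connections" with select instead of one process each.
import selectors
import socket

sel = selectors.DefaultSelector()
a1, b1 = socket.socketpair()   # two fake "client" connections
a2, b2 = socket.socketpair()

for conn in (b1, b2):
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ)

a1.send(b"request-1")
a2.send(b"request-2")

received = []
while len(received) < 2:
    for key, _ in sel.select():        # wait for whichever socket is ready
        received.append(key.fileobj.recv(1024))

print(sorted(received))  # [b'request-1', b'request-2']
```

The single loop never blocks on any one connection, which is exactly why memory use stays flat as concurrency grows.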

Nginx excels at serving static files and can do so very fast. What we can’t do is embed something like PHP into the binary, because PHP is not asynchronous and would block requests, rendering Nginx’s event-based approach useless. What we do instead is have either PHP over FastCGI or Apache+mod_php in the background handle all the PHP requests. This way, Nginx can serve all static files (CSS, JavaScript, images, PDFs etc) and handle slow clients, but pass PHP requests over to one of these backend processes, receive the response back and handle delivering it to the client, leaving the backend process free to handle other requests. Nginx doesn’t block while waiting for FastCGI or Apache – it just carries on handling events as they happen.

The other advantage of this “reverse proxy” mode is that Nginx can act as a load balancer and distribute requests not just to one but to multiple backend servers over a network. Nginx can also act as a reverse caching proxy to reduce the number of dynamic requests the backend PHP server needs to process. Both of these functions allow even more simultaneous dynamic requests.

What this means is that if your application requires a specific Apache configuration or module, you can gain the advantages of Nginx handling simultaneous requests and serving static files, but still use Apache to handle the requests that need it.

If there is no requirement for Apache, then Nginx also supports communication protocols like FastCGI, SCGI and uWSGI. PHP happens to support FastCGI, so we can have Nginx interact with PHP over FastCGI without needing the whole of Apache around.

In the past, you either had to use a script called spawn-fcgi to spawn FastCGI processes, or handle FastCGI manually, and then use some monitoring software to ensure they kept running. However, as of PHP 5.3.3, something called PHP-FPM (which distributions often package as php5-fpm) is part of the PHP core code and handles all of this for you, in a way similar to Apache – you can set the minimum and maximum number of processes and how many you would like to spawn and keep waiting around. The other advantage is that PHP-FPM is an entirely separate process from Nginx, so you can change configurations and restart each of them independently (and Nginx actually supports reloading its configuration and upgrading its binary on the fly, so it doesn’t even require a restart).
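As a taster, the split described above can be expressed in a few lines of Nginx configuration. This is only an illustrative sketch – the root path and the PHP-FPM port are placeholders, not a recommended production config:

```nginx
server {
    listen 80;
    root /var/www/example;             # placeholder document root

    location / {
        try_files $uri $uri/ =404;     # static files served by Nginx directly
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;   # hand PHP requests to the PHP-FPM pool
    }
}
```

Nginx serves anything on disk itself and only hands requests ending in .php to the backend, which is exactly the division of labour described above.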

In the next post in this series, I’ll explain how to install and configure Nginx for serving both static and dynamic content.

One of the disadvantages of Nginx is that it doesn’t support .htaccess files to dynamically modify the server configuration – all configuration must be stored in the Nginx config files and cannot be changed at runtime. This is a positive for performance and security, but it makes Nginx less suitable for running “shared hosting” platforms.

Categories: Web

Stop Windows Restarting Automatically After Scheduled Updates

April 13th, 2011

If I had to name the most annoying thing about Windows, it would be that it automatically restarts after automatically installing Windows updates (assuming for a minute that IE isn’t part of Windows, of course!).

I always have a lot open on a system – browsers and tabs, terminal sessions and so on – and numerous times I’ve come back to a system in the morning, logged in and stared at a blank taskbar in disbelief. It’s also a big problem if you connect in to use a machine from an outside location, particularly if you use drive encryption and it requires a password on boot.

Even when the results are not so catastrophic because I’m actually at the system, I’m forever clicking ‘Remind me later’ each time it pops up saying it’s going to restart.

I’ve always got round this by disabling automatic Windows updates and just installing them manually, periodically.

This very problem happened to me this morning and a friend pointed out a great tip – you can actually disable the automatic reboot. Like disabling automatic updates, you still have to remember to do this on each machine, but at least updates still get applied.

You do this as follows:

Start -> Run -> gpedit.msc
Computer Configuration -> Administrative Templates -> Windows Components -> Windows Update
Double click on: “No auto-restart for scheduled Automatic Updates installations”
Change to Enabled
Click Ok

Categories: Software

Threading / Blocking vs Event Driven Servers (and Node.js)

April 9th, 2011

I was just reading an old (Nov 2009) article by Simon Willison (of Django, The Guardian and Lanyrd fame) discussing the emergence of Node.js.

Two things in particular I found interesting about the article:

Firstly, he cleverly predicted the importance and future popularity of Node.js – and boy, was he right. A year and a bit later, Node.js is everywhere. I don’t think a day goes by when I don’t see at least one mention of or article about it.

Secondly, he gives a brilliant (and simple) description of threading/blocking servers vs event-driven servers (just like Apache vs Nginx):

Event driven servers are a powerful alternative to the threading / blocking mechanism used by most popular server-side programming frameworks. Typical frameworks can only handle a small number of requests simultaneously, dictated by the number of server threads or processes available. Long-running operations can tie up one of those threads—enough long running operations at once and the server runs out of available threads and becomes unresponsive. For large amounts of traffic, each request must be handled as quickly as possible to free the thread up to deal with the next in line.

This makes certain functionality extremely difficult to support. Examples include handling large file uploads, combining resources from multiple backend web APIs (which themselves can take an unpredictable amount of time to respond) or providing comet functionality by holding open the connection until a new event becomes available.

Event driven programming takes advantage of the fact that network servers spend most of their time waiting for I/O operations to complete. Operations against in-memory data are incredibly fast, but anything that involves talking to the filesystem or over a network inevitably involves waiting around for a response.

With Twisted, EventMachine and Node, the solution lies in specifying I/O operations in conjunction with callbacks. A single event loop rapidly switches between a list of tasks, firing off I/O operations and then moving on to service the next request. When the I/O returns, execution of that particular request is picked up again.
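The callback-driven loop Willison describes can be boiled down to a toy sketch. The Loop class below is invented purely for illustration – real event loops like Node's multiplex actual I/O rather than simulated timers:

```python
# A toy single-threaded event loop in the spirit of the description above:
# "I/O operations" are started with a callback attached, and the loop picks
# each request back up when its result is ready.
import heapq

class Loop:
    def __init__(self):
        self.timers = []   # heap of (ready_at, result, callback)
        self.now = 0

    def start_io(self, duration, callback, result):
        heapq.heappush(self.timers, (self.now + duration, result, callback))

    def run(self):
        while self.timers:
            ready_at, result, callback = heapq.heappop(self.timers)
            self.now = ready_at          # jump straight to the next event
            callback(result)

loop = Loop()
order = []
# The slow "database call" is started first but finishes last; the loop
# services the fast request in the meantime instead of blocking on it.
loop.start_io(5, order.append, "slow query")
loop.start_io(1, order.append, "fast static file")
loop.run()
print(order)  # ['fast static file', 'slow query']
```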

You can read the full article here: Node.js is genuinely exciting.

If you are not familiar with Node.js (have you been living under a rock for the past year?? :)), there is a great video by the author, Ryan Dahl here:

Categories: Interesting

Why Can’t Developers Estimate Time?

April 9th, 2011

Ashley Moran just wrote an interesting article about why developers can’t estimate time.

Something most developers find hard is estimating the time something will take.

If it’s something you have done before – a repeatable task – then it’s easy. Most of the time, though, things take longer than you expect when you think them through in your head: quite often the task turns out to be more complicated than you thought once you get into it, or you hit a bug of some kind that you end up spending a long time debugging, which compounds the issue.

Sometimes the problem of bad estimates is made worse when estimates (which are, at the end of the day, an educated guess) are taken as a promise – so if the work takes longer, there is a feeling that it should have been done quicker.

I particularly found these quotes interesting and amusing:

We can’t estimate the time for any individual task in software development because the nature of the work is creating new knowledge.

Rule of thumb: take the estimates of a developer, double it and add a bit

The double-and-add-a-bit rule is interesting. When managers do this, how often are tasks completed early? We generally pay much more attention to overruns than underruns
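For what it’s worth, the rule of thumb reduces to a one-liner (the size of “a bit” is anyone’s guess – the 25% below is purely illustrative):

```python
# The "double it and add a bit" rule of thumb as a one-liner.
def manager_estimate(dev_estimate_days, bit=0.25):
    return dev_estimate_days * 2 * (1 + bit)

print(manager_estimate(4))  # a 4-day developer estimate becomes 10 days
```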

Categories: development

Faster Broadband – BT Infinity (Fibre to the Cabinet) Coming to Ingleby Barwick – How does it work?

April 7th, 2011

I first learned about BT Infinity last year, when a friend I used to work with at BT pointed out that my exchange, Ingleby Barwick, had been scheduled to be enabled in June 2011. At the time that sounded a long way off, but now it’s getting closer I decided to do a bit of digging and find out technically how it works.

BT Infinity is BT’s new Fibre To The Cabinet (FTTC) broadband service which is slowly being rolled out across the country and promises speeds of up to 40Mbps downstream and 10Mbps upstream.

Pretty much all of the country can now get broadband in the form of ADSL (Asymmetric Digital Subscriber Line). ADSL works by utilising the existing copper telephone wiring already in the majority of homes. Unused frequencies are used to send data over the lines, and a splitter is placed on the customer’s phone sockets to split off the broadband signal and allow simultaneous use of the telephone and broadband.

The original ADSL standard gives a theoretical maximum downstream speed of 8Mbps and upstream speed of 1Mbps; the newer ADSL2+ standard gives a theoretical maximum downstream speed of 24Mbps and upstream speed of 3.3Mbps. I say theoretical because it’s practically impossible to obtain those speeds unless you are literally next door to the exchange; the majority of people only obtain a fraction of them. I am reasonably lucky to be able to get 5Mbps downstream and just under 1Mbps upstream – most people I know get even less than that. (My modem actually syncs at around 6000kbps and 1000kbps, but I only get around 5Mbps and just under 1Mbps in real-world speed tests.) The reason for this is that when transmitting signals over long distances of copper wire, noise on the line degrades the signal and the maximum speed drops. As the exchange can be miles away from the premises, and cable ducts do not necessarily run directly as the crow flies, this loss can be great. It’s also very sensitive to dodgy wiring – for this reason it’s recommended to plug the modem/router into the master socket and use a filtered faceplate to split off the ADSL signal before the extension wiring, to minimise the risk of interference (which is exactly what I do).

The only other serious option for fast home broadband in the UK is if you are in the coverage area for Virgin Media’s cable internet service. I had cable internet for several years; however, when we moved house – even though it was less than 5 minutes’ walk round the corner – our new street was not wired for cable. There is a Virgin Media cabinet opposite our road end, but they have not cabled down the street. If you can get cable, you can obtain speeds of up to 100Mbps downstream and 10Mbps upstream through their latest packages. The good thing about cable internet is that when you sign up for a certain package – be it 10Mbps, 20Mbps, 30Mbps, 50Mbps or 100Mbps – you actually get a connection at that speed. Of course, with contention on the network and the quite harsh traffic management (throttling) they apply at peak periods, you won’t necessarily see those download speeds in real life all of the time, but at least you are connected at the speed you are paying for. The way this is achieved is as follows. Comcast, as they were known when they first started laying cables (they were later sold to NTL and later to Virgin), distributed cabinets around their coverage area to interconnect users, similar to BT. However, unlike BT, Comcast ran fibre-optic connections to their cabinets rather than huge quantities of copper wires (one pair per line). They then lay low-loss coaxial cable (coax) between the nearest cabinet and the premises. A single length of coax can provide the subscriber with fast broadband and cable television.

With ADSL, a modem at the customer’s premises connects through the ADSL splitter and over the copper wire directly to DSLAM equipment in the telephone exchange. (ISPs usually supply a combined ADSL modem and router, so that the single connection can be shared between several machines connected via ethernet or 802.11 wireless, using NAT (Network Address Translation) to let multiple computers on a Local Area Network (LAN) communicate with the internet through a single external IP address.)

With BT’s new FTTC network (BT Infinity), as its name suggests, the local cabinets are connected via fibre optics back to the telephone exchange (which I assume will in turn be connected by fibre to BT’s core network). Fibre suffers negligible signal loss over distance – hence why it’s also used to connect different countries together round the world.

When I first heard about it, I didn’t really think it through, and assumed that they would do something similar to Virgin’s cable service and lay new cables of some kind between the cabinet and the customer’s premises. When you actually think about it, though, that would be expensive and slow to roll out, and would end up like Virgin’s cable network – severely limited to certain areas. In other words, it wouldn’t really be practical.

So, how does it work? What they are doing is using a technology called VDSL, which is similar to the ADSL technology already in use.

What this means in reality is the following:

Once the exchange has been enabled for FTTC, they will distribute new, slightly bigger cabinets. These house DSLAM equipment similar to (but newer than) that currently housed in the telephone exchange, the fibre backbone, patch panels, and a cross-connect to the existing BT cabinet.

When you order BT Infinity, an engineer will come out to install the product. He will replace your current master socket with a new NTE5 master socket with a built-in filter (so the modem/router will need to go into the master socket, as the broadband frequencies are split off before the extensions). He will then hook up a VDSL modem and a separate “BT Home Hub” router.

This modem connects using VDSL over the existing copper wires from your home to the VDSL cabinet, then on to the DSLAM, back to the exchange (over the fibre) and on to BT’s core network.

Your phone line is then still terminated in the original cabinet for telephony (using the cross-connect between the old and new cabinets mentioned above) and runs back to the exchange over the original multi-pair copper cable, as it always has done.

What this means is that noise-induced loss is now only an issue between your premises and the cabinet, rather than between your premises and the exchange. This is how they are able to provide the quoted maximum speeds of 40Mbps downstream and 10Mbps upstream, and you are much more likely to get somewhere near these speeds, depending on the distance to your cabinet and the quality of the lines and wiring to it.

Another area worth mentioning: currently, any ISP (Internet Service Provider) can sell you ADSL broadband. They do this either by using BT’s network and renting from their wholesale division (I believe the product is called IPStream), or by renting space in BT’s exchanges and installing their own equipment (known as LLU). I believe this will still be possible with the new FTTC network, but there doesn’t yet seem to be great uptake – probably due to the costs involved.

Disclaimer: this is only my own knowledge mixed with snippets I’ve read about FTTC, rather than any inside information, so please do feel free to comment if you know anything here to be incorrect.

Categories: Phones

Book Review: Sphinx Search Beginner’s Guide

April 5th, 2011

Packtpub were kind enough to send me a copy of their new book, Sphinx Search Beginner’s Guide, to review.

The book is written by Abbas Ali who is currently working as Chief Operating Officer and Technical Manager at SANIsoft Technologies Private Limited, Nagpur, India. The company specializes in development of large, high performance, and scalable PHP applications.

Sphinx is well described by its website as follows:

Sphinx is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It’s written in C++ and works on Linux (RedHat, Ubuntu, etc), Windows, MacOS, Solaris, FreeBSD, and a few other systems.

Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as with a database server.

The book covers everything from the installation and setup of Sphinx to simple and advanced use in PHP.

Here is a full outline of what’s covered:

  • Chapter 1, Setting Up Sphinx is an introduction to Sphinx. It guides the reader through the installation process for Sphinx on all major operating systems.
  • Chapter 2, Getting Started demonstrates some basic usage of Sphinx in order to test its installation. It also discusses full-text search and gives the reader an overview of Sphinx.
  • Chapter 3, Indexing teaches the reader how to create indexes. It introduces and explains the different types of datasources, and also discusses different types of attributes that can comprise an index.
  • Chapter 4, Searching teaches the reader how to use the Sphinx Client API to search indexes from within PHP applications. It shows the reader how to use the PHP implementation of the Sphinx Client API.
  • Chapter 5, Feed Search creates an application that fetches feed items and creates a Sphinx index. This index is then searched from a PHP application. It also introduces delta indexes and live index merging.
  • Chapter 6, Property Search creates a real world real estate portal where the user can add a property listing and specify different attributes for it so that you can search for properties based on specific criteria. Some advanced search techniques using a client API are discussed in this chapter.
  • Chapter 7, Sphinx Configuration discusses all commonly used configuration settings for Sphinx. It teaches the reader how to configure Sphinx in a distributed environment where indexes are kept on multiple machines.
  • Chapter 8, What Next? discusses some new features introduced in the recent Sphinx release. It also shows the reader how a Sphinx index can be searched using a MySQL client library.
  • Lastly, it discusses the scenarios where Sphinx can be used and mentions some of the popular Web applications that are powered by a Sphinx search engine.
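To give a flavour of what the configuration chapters cover, here is a minimal, illustrative sphinx.conf of the general shape Sphinx uses – a data source, an index built from it, and the searchd daemon. The credentials, query and paths are placeholders I’ve invented, not examples taken from the book:

```
source blog
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = blog
    sql_query = SELECT id, title, content FROM posts
}

index blog
{
    source = blog
    path   = /var/lib/sphinx/blog
}

searchd
{
    listen   = 9312
    log      = /var/log/sphinx/searchd.log
    pid_file = /var/run/sphinx/searchd.pid
}
```

The indexer builds the index from the SQL query in batch, and searchd serves queries against it – the two workflows the book walks through in Chapters 3 and 4.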

At first, the style of the book seemed a bit strange to me – it’s split into small chunks, often followed by a “What just happened?” section giving a summary or broken-down explanation of the concept just covered. Once I got used to it, though, this actually improved the clarity and aided understanding.

The book is a very informative read both for beginners to search or Sphinx and for existing users, and I’d highly recommend it to anyone interested in either search in general or the Sphinx product.

Anyone wanting to find out more about the book, or to purchase it, can do so on the Packtpub website.

Categories: Reviews