Super Crunchers by Ian Ayres (Shane's book 39, 2011)

This is one of those books that feels like a good, long magazine article that has been stretched beyond what the material can support. Other examples include The Long Tail, Freakonomics and anything by Malcolm Gladwell. Indeed, Gladwell is probably the apotheosis of the form: his books feel like over-extended articles; his articles feel like over-extended anecdotes.

Ayres at least has an interesting story to tell. The rise in the practice of analysing large data sets is changing the way many areas of our lives work, from finance to medicine, shopping to wine criticism. These changes are profound and although they will help us to make better decisions, they will also make a lot of people uncomfortable, not least those who consider themselves experts. We meet a man who created a formula for predicting the quality of wine years before it became drinkable and a man who has developed a computer programme that takes a person's symptoms and generates a comprehensive list of possible illnesses.

What Ayres calls 'super crunching' works by taking a set of criteria - a list of symptoms, for example - and checking it against a massive data set, such as a list of known medical conditions, to generate results that would have been almost impossible to produce manually. Various statistical techniques, such as regression analysis, are used to determine which criteria are relevant to the required outcome and these can then be assembled into a formula.
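To make the idea of turning data into a formula a little more concrete, here is a minimal sketch of the simplest form of regression analysis: fitting a straight line through a handful of points by least squares. The numbers are invented for illustration and are not taken from the book, and the names `observations`, `fit_line` and `predict` are my own. Real work of the kind Ayres describes would involve far more data and many more criteria.

```python
# Hypothetical data, made up for illustration only (not from the book).
# Each pair is (growing-season temperature in degrees C, quality score).
observations = [
    (15.0, 2.1),
    (16.0, 3.9),
    (17.0, 6.2),
    (18.0, 7.8),
]


def fit_line(points):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(observations)


def predict(temperature):
    """Apply the fitted formula to a new, unseen temperature."""
    return intercept + slope * temperature


print(f"quality = {intercept:.2f} + {slope:.2f} * temperature")
print(f"predicted quality at 19 degrees: {predict(19.0):.2f}")
```

The fitted slope indicates how strongly the criterion is associated with the outcome in this sample. With several criteria, the same approach (multiple regression) helps show which ones carry weight and which can be dropped from the formula.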

Obviously, this is the same role performed by a doctor, who uses training and experience to assess symptoms and make a diagnosis. However, human beings are not perfect reasoning machines. We tend to overestimate the significance of coincidences, for example, and to assume that patterns we have seen before will repeat themselves.

Computers don't do that. They deliver results based purely on the data. Of course, that means they are only as good as the data they are given and the criteria by which they assess it. Ayres makes clear that determining the factors to measure is still a job for a skilled human, as is deciding how to act on the results.

For example, it's possible to determine the likelihood that a convicted criminal will re-offend. Does that mean it is reasonable not to release those who have a high likelihood of re-offending? Most people would say no. Since all we can determine is a likelihood, we would be keeping locked up some people who would not have re-offended and that would be unfair.

Nevertheless, Ayres shows how some are using the results of data analysis in ways that most of us would consider to be unfair. Retailers are increasingly realising that they can determine how much a shopper would be willing to pay. That means instead of offering everyone the same price, they will charge each customer as much as they can get away with. If you demonstrate that you don't mind paying high prices then you can expect to be charged accordingly. The only answer, Ayres says, is for consumers to educate themselves.

Ayres has lots of examples but over 272 pages his material wears thin and he ends up repeating himself. Once you understand the concepts at work here, you don't really need an entire chapter to see how they apply to each new field.

Furthermore, Ayres's central concept is a little fuzzy. There is no precise definition of 'super crunching'. When does mere 'crunching' become 'super'? When the data set is of a certain size? When it's done by a computer? Ayres doesn't give a clear answer because there isn't one. The form of this kind of non-fiction book requires Ayres to act as if we have just passed a pivotal moment in history, when in fact these techniques have been progressing over many decades and will continue to do so.

Still, Ayres is very readable and the subject is fascinating. Those who enjoyed Freakonomics or who are intrigued by the idea that statistical analysis can uncover 'hidden truths' should give this a read.