Showing posts with label scaling.

12 October 2009

Windows Does Not Scale

Who's afraid of the data deluge?


Researchers and workers in fields as diverse as bio-technology, astronomy and computer science will soon find themselves overwhelmed with information. Better telescopes and genome sequencers are as much to blame for this data glut as are faster computers and bigger hard drives.

While consumers are just starting to comprehend the idea of buying external hard drives for the home capable of storing a terabyte of data, computer scientists need to grapple with data sets thousands of times as large and growing ever larger. (A single terabyte equals 1,000 gigabytes and could store about 1,000 copies of the Encyclopedia Britannica.)

The next generation of computer scientists has to think in terms of what could be described as Internet scale. Facebook, for example, uses more than 1 petabyte of storage space to manage its users’ 40 billion photos. (A petabyte is about 1,000 times as large as a terabyte, and could store about 500 billion pages of text.)
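
As a quick sanity check, the per-photo arithmetic falls straight out of the two figures quoted above; the few lines of Python below simply divide one by the other (decimal units assumed throughout, as in the text):

# Back-of-envelope arithmetic using only the figures quoted above.
# Decimal units assumed: 1 TB = 1,000 GB, 1 PB = 1,000 TB.

GB = 1_000 ** 3                   # bytes in a gigabyte
TB = 1_000 * GB                   # "a single terabyte equals 1,000 gigabytes"
PB = 1_000 * TB                   # a petabyte is about 1,000 terabytes

facebook_storage = 1 * PB         # "more than 1 petabyte", so treat this as a lower bound
facebook_photos = 40_000_000_000  # 40 billion photos

avg_bytes_per_photo = facebook_storage / facebook_photos
print(f"Average storage per photo: at least {avg_bytes_per_photo / 1_000:.0f} KB")
# -> roughly 25 KB per photo, which suggests heavy compression and thumbnailing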

Who's afraid of the data deluge, then? Certainly not GNU/Linux: the latest Top500 supercomputer rankings give the GNU/Linux family an 88.6% share of the world's fastest systems. Windows? Glad you asked: 1%.

So, forget about whether there will ever be a Year of the GNU/Linux Desktop: the future is about massive data-crunchers, and there GNU/Linux already reigns supreme, and has done for years. It's Windows that's got problems....

Follow me @glynmoody on Twitter or identi.ca.

02 January 2008

Vista's Problem: Microsoft Does Not Scale

It is deeply ironic that once upon a time Linux - and Linus - was taxed with an inability to scale. Today, though, when Linux is running everything from most of the world's supercomputers to the new class of sub-laptops like the Asus Eee PC and increasing numbers of mobile phones, it is Microsoft that finds itself unable to scale its development methodology to handle this range. Indeed, it can't even produce a decent desktop system, as the whole Vista fiasco demonstrates.

But the issue of scaling goes much deeper, as this short but insightful post indicates:

The world has been scaling radically since the Web first came on the scene. But the success of large, open-ended collaborations -- a robust operating system, a comprehensive encyclopedia, some "crowd-sourced" investigative journalism projects -- now is not only undeniable, but is beginning to shape expectations. This year, managers are going to have to pay attention.

Moreover, it points out exactly why scaling is important - and it turns out to be precisely the same reason that open source works so well (surprise, surprise):

The scaling is due to the basic elements in the Web equation: Lots of people, bazillions of pieces of information, and gigabazillions of links among them all. As more of the market, more of the supply chain, and more of the employees spend more of their time online, the scaled world of the Web begins to set the agenda for the little ol' real world.
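
That "gigabazillions of links" is not just rhetoric: the number of possible links among items grows roughly with the square of the number of items, so the connections quickly dwarf the things being connected. A tiny illustration in Python (the item counts are arbitrary examples, not measurements):

def potential_links(n: int) -> int:
    # Each pair of items can be linked, so potential links grow as n*(n-1)/2.
    return n * (n - 1) // 2

for items in (10, 1_000, 1_000_000):
    print(f"{items:>9,} items -> {potential_links(items):>17,} potential links")
# 10 -> 45; 1,000 -> 499,500; 1,000,000 -> 499,999,500,000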

23 February 2007

Going Beyond: Ultra VioLet Composer

One of the common criticisms of the open source development methodology is that it only works for a limited class of software, namely those with big constituencies. According to this view, there are unlikely to be successful open source projects for niche sectors.

That may have been true in the early days of free software, but now that the pool of developers prepared to get involved has grown so much, that overall scaling means far more such niches can be addressed.

A good example is VioLet Composer:


A modular multispace desktop music composer for Win32 (source available for porting). Features realtime wavelet processing, flexible arrangement, extensible sample types, autosaving and much, much more. Now with support for simple Buzz effects.

Great fun.

19 February 2007

Everyone Loves Second Life

Well, not quite, but that's the impression you get reading the comments on this post, an unprecedented outpouring of gratitude. It's not hard to see why:

Since September concurrency rates have tripled, to a peak last week of over 34,000. While we love that so many people are enjoying Second Life, there have been some challenging moments in keeping up with the growth, resulting in the now somewhat infamous message “heavy load on the database”. When this happens it usually means that the demand for transmission of data between servers is outstripping the ability of the network to support it.

When the Grid is under stress, resulting in content loss and a generally poor experience, we would like to have an option less disruptive than bringing the whole Grid down. So we’ve developed a contingency plan to manage log-ins to the Grid when, in our judgment, the risk of content loss begins to outweigh the value of higher concurrency. Looking at the concurrency levels, it’s clear heaviest use is on the weekends.

When you open your log-in screen and see in the upper right hand corner Grid Status: Restricted, you’ll know that only those Second Life Residents who have transacted with Linden Lab either by being a premium account holder, owning land, or purchasing currency on the LindeX, will be able to log-in. Residents who are in Second Life when this occurs will only be affected if they log-out and want to return before the grid returns to normal status.
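
Stripped of the marketing language, the policy quoted above is a simple gating rule applied at log-in time. A minimal sketch of that logic, purely as an illustration (the names and structure here are my own guesses, not Linden Lab's actual code), might look like this:

from dataclasses import dataclass

@dataclass
class Resident:
    is_premium: bool = False
    owns_land: bool = False
    bought_lindex_currency: bool = False
    currently_logged_in: bool = False

def may_log_in(resident: Resident, grid_status: str) -> bool:
    # Under normal grid status, everyone may log in.
    if grid_status != "Restricted":
        return True
    # Residents already in-world are unaffected until they log out.
    if resident.currently_logged_in:
        return True
    # Under "Restricted" status, only residents who have transacted
    # with Linden Lab may start a new session.
    return resident.is_premium or resident.owns_land or resident.bought_lindex_currency

print(may_log_in(Resident(is_premium=True), "Restricted"))   # True
print(may_log_in(Resident(), "Restricted"))                  # False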

This is precisely what many SL residents have been calling for - some preferential treatment for those who pay.

Of course, it's in part an admission that SL isn't scaling too well, but equally I doubt if anybody ever expected the kind of growth that has been seen in the last few months. Unlike some, I don't see this as the end of the SL dream; the open sourcing of the viewer, and the confirmation that the server code would also be released, were signs that Linden Lab knows that drastic measures are required to move into the next phase. Philip Rosedale and Cory Ondrejka, the two main brains behind the world and its code, are clever chaps, and I don't think they underestimate the magnitude of the task facing them. It will be interesting to see how these occasional lock-outs affect the influx of newbies and the general perception of SL.

05 February 2007

Some Things Do Scale, It Seems

Great quote here:

The most concerning issue is the growth of bandwidth as piracy has shifted from stealing an individual song on Napster to stealing albums on Kazaa to now using BitTorrent to steal entire discographies.

Maybe there's a bit of a lesson to be learned from this. If music companies had sorted this out a few years back, and gone straight to DRM-less downloads, they wouldn't be facing this massively greater problem today. Moreover, once entire discographies are being passed around, the game's over, because the record companies have nothing left to offer as an incentive to choose them over underground sources.

13 December 2005

Driving Hard

Hard discs are the real engines of the computer revolution. More than rising processing speeds, it is constantly expanding hard disc capacity that has made most of the exciting recent developments possible.

This is most obvious in the case of Google, which now not only searches most of the Web, and stores its (presumably vast) index on cheap hard discs, but also offers a couple of Gbytes of storage to everyone who uses - or will use - its Gmail. Greatly increased storage has also driven the MP3 revolution. The cheap availability of Gigabytes of storage means that people can - and do - store thousands of songs, and now routinely expect to have every song they want on tap, instantly.

Yet another milestone was reached recently, when even the Terabyte (=1,000 Gbytes) became a relatively cheap option. For most of us mere mortals, it is hard to grasp what this kind of storage will mean in practice. One person who has spent a lot of time thinking hard about such large-scale storage and what it means is Jim Gray, whom I had the pleasure of interviewing last year.

On his Web site (at Microsoft Research), he links to a fascinating paper by Michael Lesk that asks the question "How much information is there in the world?" (There is also a more up-to-date version available.) It is clear from the general estimates that we are fast approaching the day when it will be possible to have just about every piece of data (text, audio, video) that relates to us throughout our lives and to our immediate (and maybe not-so-immediate) world, all stored, indexed and cross-referenced on a hard disc somewhere.

Google and the other search engines already give us a glimpse of this "Information At Your Fingertips" (now where did I hear that phrase before?), but such all-encompassing Exabytes (1,000,000 Terabytes) go well beyond this.
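
To get a feel for those orders of magnitude, here is a rough back-of-envelope calculation; the per-song and per-page sizes are my own ballpark assumptions, not figures from Gray or Lesk:

# Rough back-of-envelope figures; the per-song and per-page sizes below
# are ballpark assumptions, not numbers from the sources mentioned above.

GB = 1_000 ** 3
TB = 1_000 * GB                 # "Terabyte (=1,000 Gbytes)"
EB = 1_000_000 * TB             # an Exabyte is a million Terabytes

MP3_SIZE = 5 * 1_000_000        # assume roughly 5 MB per encoded song
TEXT_PAGE = 2 * 1_000           # assume roughly 2 KB per plain-text page

print(f"Songs per Terabyte: {TB // MP3_SIZE:,}")        # ~200,000
print(f"Text pages per Terabyte: {TB // TEXT_PAGE:,}")  # ~500,000,000
print(f"Terabyte discs per Exabyte: {EB // TB:,}")      # 1,000,000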

What is interesting is how intimately this scaling process is related to the opening up of data. In fact, this kind of super-scaling, which takes us to realms several orders of magnitude beyond even the largest proprietary holdings of information, only makes sense if data is freely available for cross-referencing (something that cannot happen if there are isolated bastions of information, each with its own gatekeeper).

Once again, technological developments that have been in train for decades are pushing us inexorably towards an open future - whatever the current information monopolists might want or do.