19 March 2006

How Do I Blog Thee?

Let me count the ways.

List blog

The original: lots and lots of links to things with no theme but their sum.

Diary blog

The other original - but don't try this at home unless you are really interesting.

Shard blog

Not quite a list blog, not quite a diary blog: instead, small fragments of a life refracted through the links encountered each day.

News blog

Lots of useful links on a well-defined subject area, plus quotes and the odd dash of intelligent comment.

Essay blog

Longer, more thoughtful postings, typically one per day: mental meat to chew on.

Photo blog

A picture is worth a thousand blog postings.

Video blog

Done well, this is the ultimate magic casement in the middle of your screen, a window on another world.

18 March 2006

Economistical with the Truth

The Economist is a strange beast. It has a unique writing style, born of the motto "simplify, then exaggerate"; and it has an unusual editorial structure, whereby senior editors read every word written by those reporting to them - which means the editor reads every word in the magazine (at least, that's the way it used to work). Partly for this reason, nearly all the articles are anonymous: the idea is that they are in some sense a group effort.

One consequence of this anonymity is that I can't actually prove I've written for the title (which I have, although it was a long time ago). But on the basis of a recent showing, I don't think I want to write for it anymore.

The article in question, which is entitled "Open, but not as usual", is about open source, and about some of the other "opens" that are radiating out from it. Superficially, it is well written - as a feature that has had multiple layers of editing should be. But on closer examination, it is full of rather tired criticisms of the open world.

One of these in particular gets my goat:

...open source might already have reached a self-limiting state, says Steven Weber, a political scientist at the University of California at Berkeley, and author of “The Success of Open Source” (Harvard University Press, 2004). “Linux is good at doing what other things already have done, but more cheaply—but can it do anything new? Wikipedia is an assembly of already-known knowledge,” he says.

Well, hardly. After all, the same GNU/Linux can run globe-spanning grids and supercomputers; it can power back office servers (a market where it bids fair to overtake Microsoft soon); it can run on desktops without a single file being installed on your system; and it is increasingly appearing in embedded devices - mp3 players, mobile phones etc. No other operating system has ever achieved this portability or scalability. And then there are the more technical aspects: GNU/Linux is simply the most stable, most versatile and most powerful operating system out there. If that isn't innovative, I don't know what is.

But let's leave GNU/Linux aside, and consider what open source has achieved elsewhere. Well, how about the Web for a start, whose protocols and underlying software have been developed in a classic open source fashion? Or what about programs like BIND (which runs the Internet's name system), or Sendmail, the most popular email server software, or maybe Apache, which is used by two-thirds of the Internet's public Web sites?

And then there's MediaWiki, the software that powers Wikipedia (and a few other wikis): even if Wikipedia were merely "an assembly of already-known knowledge", MediaWiki (built on the open source technologies PHP and MySQL) supports an unprecedentedly large assembly, unmatched by any proprietary system. Enough innovation for you, Mr Weber?

But the saddest thing about this article is not so much these manifest inaccuracies as the reason why they are there. Groklaw's Pamela Jones (PJ) has a typically thorough commentary on the Economist piece. From corresponding with its author, she says "I noticed that he was laboring under some wrong ideas, and looking at the finished article, I notice that he never wavered from his theory, so I don't know why I even bothered to do the interview." In other words, the feature is not just wrong, but wilfully wrong, since others, like PJ, had carefully pointed out the truth. (There's an old saying among journalists that you should never let the facts get in the way of a good story, and it seems that The Economist has decided to adopt this as its latest motto.)

But there is a deeper irony in this sad tale, one carefully picked out by PJ:

There is a shocking lack of accuracy in the media. I'm not at all kidding. Wikipedia has its issues too, I've no doubt. But that is the point. It has no greater issues than mainstream articles, in my experience. And you don't have to write articles like this one either, to try to straighten out the facts. Just go to Wikipedia and input accurate information, with proof of its accuracy.

If you would like to learn about Open Source, here's Wikipedia's article. Read it and then compare it to the Economist article. I think then you'll have to agree that Wikipedia's is far more accurate. And it isn't pushing someone's quirky point of view, held despite overwhelming evidence to the contrary.

When Wikipedia gets something wrong, you can correct it by pointing to the facts; when The Economist gets it wrong - as in the piece under discussion - you are stuck with an article that is, at best, Economistical with the truth.

17 March 2006

Google's Grief, Open Source's Gain?

The news that a judge has ordered Google to turn over all emails from a Gmail account, including deleted messages, has predictably sent a shiver of fear down the collective spine of the wired community, all of whom by now have Gmail accounts. Everybody can imagine themselves in a similar situation, with all their most private online thoughts suddenly revealed in this way.

The really surprising thing about this development is not that it's happened, but that anyone considers it surprising. Lawyers were bound to be tempted by all the unguarded comments lying in emails, and judges were bound to be convinced that since they existed it was legitimate to look at them for evidence of wrong-doing. And Google, ultimately, is bound to comply: after all, it's in the business of making money, not of martyrdom.

So the question is not so much What can we do to stop such court orders being made and executed? but What can we do to mitigate them?

Moving to another email provider like Yahoo or Hotmail certainly won't help. And even setting up your own SMTP server to send email won't do much good, since your ISP probably has copies of bits of your data lying around on its own servers that sooner or later will be demanded by somebody with a court order.

The only real solution seems to be to use strong encryption to make each email message unreadable except by the intended recipient (and even this has an obvious weakness).

It would, presumably, be relatively simple for Google to add this to Gmail. But even if it doesn't, there is a fine open source project called Enigmail, which is an extension to the Mozilla family of email readers - Thunderbird et al. - currently nearing version 1.0. The problem is that installation is fairly involved, since you must first set up GnuPG, which provides the cryptographic engine. If the free software world could make this process easier - a click, a passphrase and you're done - Google's present grief could easily be turned into open source's opportunity.
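For the curious, here is a minimal sketch of what Enigmail does under the bonnet - the public-key machinery GnuPG provides. It assumes GnuPG is installed and uses the third-party python-gnupg wrapper; the address and passphrase are, of course, invented:

    import os
    import gnupg

    # Use a throwaway keyring directory so the demo doesn't touch your real keys.
    os.makedirs('/tmp/demo-keyring', exist_ok=True)
    gpg = gnupg.GPG(gnupghome='/tmp/demo-keyring')

    # Generate a key pair for the recipient. In real use you would import the
    # recipient's existing public key instead of generating one.
    key = gpg.gen_key(gpg.gen_key_input(name_email='recipient@example.com',
                                        passphrase='demo-passphrase'))

    # Anyone holding the public key can encrypt...
    encrypted = gpg.encrypt('Meet me at noon.', key.fingerprint, always_trust=True)
    print(str(encrypted))   # ASCII-armoured ciphertext: all Gmail's servers would see

    # ...but only the private-key holder can decrypt.
    decrypted = gpg.decrypt(str(encrypted), passphrase='demo-passphrase')
    print(decrypted.data)

Even this toy shows why adoption is hard: key generation, key exchange and passphrase management all land on the user - exactly the friction that needs hiding behind that one click and passphrase.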

16 March 2006

The Power of Open Genomics

The National Human Genome Research Institute (NHGRI), one of the National Institutes of Health (NIH), has announced the latest round of mega genome sequencing projects - effectively the follow-ons to the Human Genome Project. These are designed to provide a sense of genomic context, and to allow all the interesting hidden structures within the human genome to be teased out bioinformatically by comparing it with other genomes whose lineages diverged from ours at various times in the distant past.

Three more primates are getting the NHGRI treatment: the rhesus macaque, the marmoset and the orangutan. But alongside these fairly obvious choices, eight more mammals will be sequenced too. As the press release explains:

The eight new mammals to be sequenced will be chosen from the following 10 species: dolphin (Tursiops truncates), elephant shrew (Elephantulus species), flying lemur (Dermoptera species), mouse lemur (Microcebus murinus), horse (Equus caballus), llama (Llama species), mole (Cryptomys species), pika (Ochotona species), a cousin of the rabbit, kangaroo rat (Dipodomys species) and tarsier (Tarsier species), an early primate and evolutionary cousin to monkeys, apes, and humans.

If you are not quite sure whom to vote for, you might want to peruse a great page listing all the genomes currently being sequenced for the NHGRI, which provides links to a document (.doc, alas, but you can open it in OpenOffice.org) explaining why each is important (there are pix, too).

More seriously, it is worth noting that this growing list makes ever more plain the power of open genomics. Since all of the genomes will be available in public databases as soon as they are completed (and often before), this means that bioinformaticians can start crunching away with them, comparing species with species in various ways. Already, people have done the obvious things like comparing humans with chimpanzees, or mice with rats, but the possibilities are rapidly becoming extremely intriguing (tenrec and elephant, anyone?).
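To get a feel for how fast those possibilities multiply, note that n genomes permit n*(n-1)/2 distinct pairwise comparisons - a number that grows with the square of n. A few lines of Python make the point (the species list is purely illustrative):

    from itertools import combinations

    genomes = ['human', 'chimpanzee', 'mouse', 'rat', 'tenrec', 'elephant',
               'macaque', 'marmoset', 'orangutan', 'dog']

    pairs = list(combinations(genomes, 2))
    print(len(pairs))   # 45 pairwise comparisons from just 10 genomes
    print(pairs[0])     # ('human', 'chimpanzee')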

And beyond the simple pairing of genomes, which yields a standard square-law richness, there are even more inventive combinations involving the comparison of multiple genomes that may reveal particular aspects of the Great Digital Tree of Life, since everything may be compared with everything, without restriction. Now imagine trying to do this if genomes had been patented, and groups of them belonged to different companies, all squabbling over their "IP". The case for open genomics is proved, I think.

15 March 2006

Microsoft Goes (a Bit More) Open Source

Many people were amazed back in 2004 when Microsoft released its first open source software, Windows Installer XML (WiX). But this was only the first step in a long journey towards openness that Microsoft is making - and must make - for some time to come.

It must make it because the traditional way of writing software simply doesn't work for the ever-more complex, ever-more delayed projects that Microsoft is engaged upon: Brooks' Law, which states that "Adding manpower to a late software project makes it later," will see to this if nothing else does.

Microsoft itself has finally recognised this. According to another fine story from Mary Jo Foley, who frequently seems to know more about what's happening in the company than Bill Gates does:

Beta testing has been the cornerstone of the software development process for Microsoft and most other commercial software makers for as long as they've been writing software. But if certain powers-that-be in Redmond have their way, betas may soon be a thing of the past for Microsoft, its partners and its customers.

The alternative is to adopt a more fluid approach that is a commonplace in the open source world:

Open source turned the traditional software development paradigm on its head. In the open source world, testers receive frequent builds of products under development. Their recommendations and suggestions typically find their way more quickly into developing products. And the developer community is considered as important to writing quality code as are the "experts" shepherding the process.

One approach to mitigating the effects of Brooks' Law is to change the fashion in which the program is tested. Instead of doing this in a formal way with a few official betas - which tend to slow down the development process - the open source method allows users to make comments earlier and more frequently on multiple builds as they are created, and without hindering the day-to-day working of developers, who are no longer held hostage by artificial beta deadlines that become ends in themselves rather than means.

E-commerce 2.0

It is striking how everybody is talking about Web 2.0, and yet nobody seems to mention e-commerce 2.0. In part, this is probably because few have managed to work out how to apply Web 2.0 technologies to e-commerce sites that are not directly based on selling those technologies (as most Web 2.0 start-ups are).

For a good example of what an e-commerce 2.0 site looks like, you could do worse than try Chinesepod.com (via Juliette White), a site that helps you learn Mandarin Chinese over the Net.

The Web 2.0-ness is evident in the name - though I do wish people would come up with a different word for what is, after all, just an mp3 file. It has a viral business model - make the audio files of the lessons freely available under a Creative Commons licence so that they can be passed on, and charge for extra features like transcripts and exercises. The site even has a wiki (which has some useful links).

But in many ways the most telling feature is the fact that as well as a standalone blog, the entire opening page is organised like one, with the lessons arranged in reverse chronological order, complete with some very healthy levels of comments. Moreover, the Chinesepod people (Chinese podpeople?) are very sensibly drawing on the suggestions of their users to improve and extend their service. Now that's what I call e-commerce 2.0.

14 March 2006

Will Data Hoarding Cost 150 Million Lives?

The only thing separating mankind from a pandemic that could kill 150 million people is a few changes in the RNA of the H5N1 avian 'flu virus. Those changes would make it easier for the virus to infect and pass between humans, rather than birds. Research into the causes of the high death-rate among those infected by the Spanish 'flu - which killed between 50 and 100 million people in 1918 and 1919, even though the world population was far lower then than now - shows that it was similar changes in a virus otherwise harmless to humans that made the Spanish 'flu so lethal.

The good news is that with modern sequencing technologies it is possible to track those changes as they happen, and to use this information to start preparing vaccines that are most likely to be effective against any eventual pandemic virus. As one recent paper on the subject put it:

monitoring of the sequences of viruses isolated in instances of bird-to-human transmission for genetic changes in key regions may enable us to track viruses years before they develop the capacity to replicate with high efficiency in humans.
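In computational terms, the monitoring idea is simple: compare each newly isolated sequence against a reference strain and flag the positions that have changed. A toy sketch follows - the sequences are invented, and real pipelines align sequences first to cope with insertions and deletions:

    reference = 'AUGGCACGUAAUCCG'   # previously seen strain (invented)
    isolate   = 'AUGGCACGCAAUCUG'   # newly sequenced isolate (invented)

    # Report every position where the new isolate differs from the reference.
    mutations = [(i + 1, ref, new)
                 for i, (ref, new) in enumerate(zip(reference, isolate))
                 if ref != new]
    for position, ref, new in mutations:
        print('position %d: %s -> %s' % (position, ref, new))
    # position 9: U -> C
    # position 14: C -> U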

The bad news is that most of those vital sequences are being kept hidden away by the various national laboratories that produce them. As a result, thousands of scientists outside those organisations do not have the full picture of how the H5N1 virus is evolving, medical communities cannot plan properly for a pandemic, and drug companies are hamstrung in their efforts to develop effective vaccines.

The apparent reason for the hoarding - that some scientists want to be able to publish their results in slow-moving printed journals first, so as to be sure that they are accorded full credit by their peers - beggars belief against a background of growing pandemic peril. Open access to data never looked more imperative.

Although the calls to release this vital data are gradually becoming more insistent, they still seem to be falling on deaf ears. One scientist who has been pointing out the folly of the current situation for longer than most is the respected researcher Harry Niman. He has had a distinguished career in the field of viral genomics, and is the founder of the company Recombinomics.

The news section of the Recombinomics site has long been the best place to find out about the latest developments in the field of avian 'flu. This is for three reasons: Niman's deep knowledge of the subject, his meticulous scouring of otherwise-neglected sources to find out the real story behind the news, and - perhaps just as important - his refusal meekly to toe the line that everything is under control. For example, he has emphasised that the increasing number of infection clusters indicates that human-to-human transmission is now happening routinely, in flat contradiction to the official analysis of the situation.

More recently, he has pointed out that the US decision to base its vaccine on a strain of avian 'flu found in Indonesia is likely to be a waste of time, since the most probable pandemic candidate has evolved away from this.

The US Government's choice is particularly worrying because human cases of avian 'flu in North America may be imminent. In another of Niman's characteristically forthright analyses, he suggests that there is strong evidence that H5N1 is already present in North America:

Recombinomics is issuing a warning based on the identification of American sequences in the Qinghai strain of H5N1 isolated in Astrakhan, Russia. The presence of the America sequences in recent isolates in Astrakhan indicates H5N1 has already migrated to North America. The levels of H5N1 in indigenous species will be supplemented by new sequences migrating into North America in the upcoming months.

Niman arrived at this conclusion by tracking the genomic changes in the virus as it travelled around the globe with migrating birds, using some of the few viral sequences that have been released.

Let's hope for the sake of everyone that WHO and the other relevant organisations see the light and start making all the genomic data available. This would allow Niman and his many able colleagues to monitor even the tiniest changes, so that the world can be alerted at the earliest possible moment to the start of a pandemic that may be closer than many think.

Update: In an editorial, Nature is now calling for open access to all this genomic data. Unfortunately, the editorial is not open access....

13 March 2006

OU on UK ID DBs

Talking of the Open University, here's an interesting research report from them on the UK Government's plans to introduce ID cards. The study looks at things from a slightly novel angle: people's attitudes to the scheme, and how they vary according to the details.

The most interesting result was that even those moderately in favour of the idea became markedly less enthusiastic when the card was compulsory and a centralised rather than distributed database was used to store the information. Since this is precisely what the government is planning to do, the research rather blows a hole in their story that the British population is simply begging them to introduce ID cards. John Lettice has provided more of his usual clear-headed analysis on the subject.

What is also fascinating is how the British public - or at least the sample interviewed - demonstrated an innate sense of how unwise such a centralised database would be. I think this argues a considerable understanding of what is on the face of it quite an abstract technical issue. There's hope yet - for the UK people, if not for the UK Government....

12 March 2006

Mozart the Blogger

To celebrate the 250th anniversary of Mozart's birth, I've been reading some of his letters, described by Einstein (Alfred, not his cousin Albert) as "the most lively, the most unvarnished, the most truthful ever written by a musician". It is extraordinary to think that these consist of the actual words that ran through Mozart's head, probably at the same time when he was composing some masterpiece or other as a background task. To read them is to eavesdrop on genius.

The other striking thing about them is their volume and detail. Mozart was an obsessive letter-writer, frequently knocking out more than one a day to his wide range of regular correspondents. And these are no quick "having a lovely time, wish you were here" scribbles on the back of a postcard: they often run to many pages, and consist of extended, complex sentences full of dazzling wordplay, describing equally rich ideas and complicated situations, or responding in thoughtful detail to points made in the letters he received.

Because they are so long, the letters have a strong sense of internal time: that is, you feel that the end of the letter is situated later than the beginning. As a result, his letters often function as a kind of diary entry, a log of the day's events and impressions - a kind of weblog without the reverse chronology (and without the Web).

Mozart was a blogger.

If this intense letter-writing activity can be considered a proto-blog, the corollary is that blogs are a modern version of an older epistolary art. This is an important point, because it addresses two contemporary concerns in one fell swoop: that the art of the letter is dead, and that there is a dearth of any real substance in blogs.

We are frequently told that modern communications like the telephone and email have made the carefully-weighed arrangement of words on the page, the seductive ebb and flow of argument and counter-argument, redundant in favour of the more immediate, pithier forms. One of the striking things about blogs is that some - not all, certainly - are extremely well written. And even those that are not so honed still represent considerable effort on the part of their authors - effort that 250 years ago was channelled into letters.

This means that far from being the digital equivalent of dandruff - stuff that scurfs off the soul on a daily basis - the growing body of blog posts represents a renaissance of the art of letter-writing. In fact, I would go further: no matter how badly written a blog might be, it has the inarguable virtue of being something that is written, and then - bravely - made public. As such, it is another laudable attempt to initiate or continue a written dialogue of a kind that Mozart would have understood and engaged with immediately. It is another brick - however humble - in the great edifice of literacy.

For this reason, the current fashion to decry blogs as mere navel-gazing, or vacuous chat, is misguided. Blogs are actually proof that more and more people - 30,000,000 of them if you believe Technorati - are rediscovering the joy of words in a way that is unparalleled in recent times. We may not all be Mozarts of the blog, but it's better than silence.

11 March 2006

Open University Meets Open Courseware

Great news (via Open Access News and the Guardian): the Open University is turning a selection of its learning materials into open courseware. To appreciate the importance of this announcement, a little background may be in order.

As its fascinating history shows, the Open University was born out of Britain's optimistic "swinging London" culture of the late 1960s. The idea was to create a university open to all - one on a totally new scale of hundreds of thousands of students (currently there are 210,000 enrolled). It was evident quite early on that this meant using technology as much as possible (indeed, as the history explains, many of the ideas behind the Open University grew out of an earlier "University of the Air" idea, based around radio transmissions.)

One example of this is a close working relationship with the BBC, which broadcasts hundreds of Open University programmes each week. Naturally, these are open to all, and designed to be recorded for later use - an early kind of multimedia open access. The rise of the Web as a mass medium offered further opportunities to make materials available. By contrast, the holdings of the Open University Library require a username and password (although there are some useful resources available to all if you are prepared to dig around).

Against this background of a slight ambivalence to open access, the announcement that the Open University is embracing open content for at least some of its courseware is an extremely important move, especially in terms of setting a precedent within the UK.

In the US, there is already the trail-blazing MIT OpenCourseWare project. Currently, there are materials from around 1250 MIT courses, expected to rise to 1800 by 2007. Another well-known example of open courseware is the Connexions project, which has some 2900 modules. This was instituted by Rice University, but now seems to be spreading ever wider. In this it is helped by an extremely liberal Creative Commons licence that allows anyone to use Connexions material to create new courseware. MIT uses a similar Creative Commons licence, except that it forbids commercial use.

At the moment, there's not much to see at the Open University's Open Content Initiative site. There is, however, an interesting link to information from the project's main sponsor, the William and Flora Hewlett Foundation, about its pioneering support for open content. This has some useful links at the foot of the page to related projects and resources.

One thing the Open University announcement shows is that open courseware is starting to pick up steam - maybe a little behind the related area of open access, but coming through fast. As with all open endeavours, the more there are, the more evident the advantages of making materials freely available become, and the more others follow suit. This virtuous circle of openness begetting openness is perhaps one of the biggest advantages that it has over the closed, proprietary alternatives, which by their very nature take an adversarial rather than co-operative approach to those sharing their philosophy.

09 March 2006

RIAA Fights to the Death for DRM - Your Death

The ever-perceptive Ed Felten has an amazing story about the Recording Industry Association of America (RIAA) and its friends-in-copyright fighting to keep DRM on people's systems in all circumstances - even those that might be life-threatening. From his post:

In order to protect their ability to deploy this dangerous DRM, they want the Copyright Office to withhold from users permission to uninstall DRM software that actually does threaten critical infrastructure and endanger lives.

In fact, it's enough to gaze (not too long, mind) at the RIAA's home page: it is a cacophony of "lawsuits", "penalties", "pirates", "theft" and "parental advisories" - a truly sorry example of narrow-minded negativity. Whatever happened to music as one of the loftiest expressions of the human spirit?

Savonarola, St. Francis - or St. IGNUcius?

There's a well-written commentary on C|Net that makes what looks like a neat historical parallel between Savonarola and Richard Stallman; in particular, it wants us to consider the GPL 3 as some modern-day equivalent of a Bonfire of the Vanities, in which precious objects were consigned to the flames at the behest of the dangerous and deranged Savonarola.

It's a clever comparison, but it suffers from a problem common to all clever comparisons: they are just metaphors, not cast-iron mathematical isomorphisms.

For example, I could just as easily set up a parallel between Stallman and St. Francis of Assisi: both renounced worldly goods, both devoted themselves to the poor, both clashed with the authorities on numerous occasions, and both produced several iterations of their basic tenets. And St. Francis never destroyed, as Savonarola did: rather, he is remembered for restoring ruined churches - just as Stallman has restored the ruined churches of software.

In fact, Stallman is neither Savonarola nor St. Francis, but his own, very special kind of holy man: St. IGNUcius of the Church of Emacs.

The Dream of Open Data

Today's Guardian has a fine piece by Charles Arthur and Michael Cross about making data paid for by the UK public freely accessible to them. But it goes beyond merely detailing the problem, and represents the launch of a campaign called "Free Our Data". It's particularly good news that the unnecessary hoarding of data is being addressed by a high-profile title like the Guardian, since a few people in the UK Government might actually read it.

It is rather ironic that at a time when nobody outside Redmond disputes the power of open source, and when open access is almost at the tipping point, open data remains something of a distant dream. Indeed, it is striking how advanced the genomics community is in this respect. As I discovered when I wrote Digital Code of Life, most scientists in this field have been routinely making their data freely available since 1996, when the Bermuda Principles were drawn up. The first of these stated:

It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.

The same should really be true for all kinds of large-scale data that require governmental-scale gathering operations. Since they cannot feasibly be gathered by private companies, such data ends up as a government monopoly. But trying to exploit that monopoly by crudely over-charging for the data is counter-productive, as the Guardian article quantifies. Let's hope the campaign gathers some momentum - I'll certainly be doing my bit.

Update: There is now a Web site devoted to this campaign, including a blog.

Enter the Splogfighter

Talking of splogs, I came across (via SEO Data) the valiant Splogfighter's Blogger-based anti-splog blog. All power to whatever part of the virtual anatomy he/she/it uses in this laudable effort.

08 March 2006

Splog in a Box?

A long time ago, in a galaxy far away - well, in California, about 1994 - O'Reilly came out with something called "Internet in a Box". This wasn't quite the entire global interconnect of all networks in a handy cardboard container, but rather a kind of starter kit for Web newbies - and bear in mind that in those days, the only person who was not a newbie was Tim (not O'Reilly, the other one).

Two components of O'Reilly's Internet in a Box were particularly innovative. One was Spry Mosaic, a commercial version of the early graphical Web browser Mosaic that arguably began the process of turning the Web into a mass medium. Mosaic had two important offspring: Netscape Navigator, created by some of the original Mosaic team, and its nemesis, Internet Explorer. In fact, if you choose the "About Internet Explorer" option on the Help menu of any version of Microsoft's browser, you will see to this day the surprising words:

Based on NCSA Mosaic. NCSA Mosaic(TM); was developed at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.
Distributed under a licensing agreement with Spyglass, Inc.

So much for Bill Gates inventing the Internet....

The other novel component of "Internet in a Box" was the Global Network Navigator. This was practically the first commercial Web site, and certainly the first portal: it was actually launched before Mosaic 1.0, in August 1993. Unfortunately, this pioneering site was later sold to AOL, where it sank without trace (as most pioneers do when they are sold to AOL: anybody remember the amazing Internet search company WAIS? No, I thought not.)

Given this weight of history, it seems rather fitting that something called Boxxet should be announced at the O’Reilly Emerging Technology Conference, currently running in San Diego. New Scientist has the details:

A new tool offers to create websites on any subject, allowing web surfers to sit back, relax and watch a virtual space automatically fill up with relevant news stories, blog posts, maps and photos.

The website asks its users to come up with any subject they are interested in, such as a TV show, sports team or news topic, and to submit links to their five favourite news articles, blogs or photos on that subject. Working only from this data, the site then automatically creates a webpage on that topic, known as a Boxxet. The name derives from "box set", which refers to a complete set of CDs or DVDs from the same band or TV show.

As this indicates, Boxxet is a kind of instant blog - just add favourite links and water. It seems the perfect solution for a world where people are so crushed by ennui that most bloggers can't even be bothered posting for more than a few weeks. Luckily, that's what we have technology for: to spare us all those tiresome activities like posting to blogs, walking to the shops or changing television channels by getting up and doing it manually.

It's certainly a clever idea. But I just can't see myself going for this Blog in a Box approach. Perhaps I over-rate the specialness of my merely human blogging powers; perhaps I just need to wait until the Singularity arrives in a few years' time, and computers are able to produce trans-humanly perfect blogs.

What I can see - alas - are several million spammers rubbing their hands with glee at the thought of a completely automatic way of generating spurious, self-updating blogs. Not so much Blog in a Box as Splog in a Box.

07 March 2006

The Other Grid God: Open Source

As I was browsing through Lxer.com, my eye caught this rather wonderful headline: "Grid god to head up Chicago computing institute". The story explains that Ian Foster, one of the pioneers in the area of grid computing (and the grid god in question), is moving to the Computation Institute (great name - horrible Web site).

Grid computing refers to the seamless linking together across the Internet of physically separate computers to form a huge, virtual computer. It's an idea that I've been following for some time, not least because it's yet another area where free software trounces proprietary solutions.

The most popular toolkit for building grids comes from the Globus Alliance, and this is by far the best place to turn to find out about the subject. For example, there's a particularly good introduction to grid computing's background and the latest developments.

The section dealing with grid architecture notes that there is currently a convergence between grid computing and the whole idea of Web services. This is only logical, since one of the benefits of having a grid is that you can access Web services across it in a completely transparent way to create powerful virtual applications running on massive virtual hardware.
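The underlying pattern is easy to caricature in a few lines: carve a job into independent work units, farm them out to whatever processors are available, and gather the results. The toy below uses only the Python standard library and local cores; a real grid, such as one built with the Globus Toolkit, adds the hard parts - security, resource discovery and scheduling across organisational boundaries:

    from concurrent.futures import ProcessPoolExecutor

    def work_unit(n):
        # Stand-in for a real computation (sequence alignment, rendering, etc.).
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        jobs = [10**6, 2 * 10**6, 3 * 10**6, 4 * 10**6]
        with ProcessPoolExecutor() as pool:   # one 'node' per local core
            results = list(pool.map(work_unit, jobs))
        print(results)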

The Globus Alliance site is packed with other resources, including a FAQ, a huge list of research papers on grids and related topics, information about the Globus Toolkit, which lets you create grids, and the software itself.

Open source's leading position in the grid computing world complements a similar success in the related field of supercomputing. As this chart shows, over 50% of the top 500 supercomputers in the world run GNU/Linux; significantly, Microsoft Windows does not even appear on the chart.

This total domination of top-end computing - be it grids or supercomputers - by open source is one of the facts that Microsoft somehow omits to tell us in its "Get The Facts" campaign.

06 March 2006

Blogging Newspapers

One of the interesting questions raised by the ascent of blogs is: What will the newspapers do? Even though traditional printed titles are unlikely to disappear, they are bound to change. This post, from the mysteriously-named "Blue Plate Special" blog (via C|Net's Esoteric blog) may not answer that question, but it does provide some nutritious food for thought.

It offers its views on which of the major US dailies blog best, quantified through a voting system. Although interesting - and rich fodder for those in need of a new displacement activity - the results probably aren't so important as the criteria used for obtaining them. They were as follows:

Ease-of-use and clear navigation
Currency
Quality of writing, thinking and linking
Voice
Comments and reader participation
Range and originality
Explain what blogging is on your blogs page
Show commitment

The blog posting gives more details on each, but what's worth noting is that most of these could be applied to any blog - not just those in newspapers. Having recently put together my own preliminary thoughts on the Art of the Blog, I find that these form a fascinating alternative view, with several areas of commonality. I strongly recommend that all bloggers read the full article - whether or not you care about blogging newspapers.

05 March 2006

Google Googlied by Spaiku Adages

Today was a black day in the annals of my Gmail account: I received my first piece of spam. You might think I should be rejoicing that I've only ever received one piece of spam, but bear in mind that this is a relatively new account, and one that I've not used much. Moreover, Gmail comes with spam filtering as standard: you might hope that Google's vast computing engines would be able consistently to spot spam.

So far they have: the spam bucket of my account lists some 42 spam messages that Google caught. The question is: why did Google get googlied by this one? It's not particularly cunning: it has the usual obfuscated product names (it's one of those), with some random characters and the usual poetic signoff.

Actually, now that I come to check, this turns out to be slightly special:

Work first and then rest.
Actions speak louder than words.
Old head and young hand.

Maybe this is Gmail's Achilles' heel: it is defenceless in the face of spam haiku (spaiku?) adages.
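My guess at why: token-based filters score a message by how spammy its individual words have looked in the past, and innocent proverbs contain no historically spammy words. A sketch in the spirit of naive Bayes filtering - the counts are invented for illustration:

    spam_counts = {'viagra': 50, 'pills': 40, 'cheap': 30}    # seen in spam
    ham_counts  = {'meeting': 40, 'lunch': 30, 'report': 25}  # seen in good mail

    def spamminess(token):
        s = spam_counts.get(token, 0) + 1    # add-one smoothing
        h = ham_counts.get(token, 0) + 1
        return s / float(s + h)              # 0.5 means 'no evidence either way'

    for token in 'actions speak louder than words'.split():
        print(token, spamminess(token))      # every token scores a neutral 0.5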

04 March 2006

The European Digital Library: Dream, but Don't Touch

With all the brouhaha over the Google Book Search Library Project, it is easy to overlook other efforts directed along similar lines. I'm certainly guilty of this sin of omission when it comes to The European Library, about which I knew nothing until very recently.

The European Library is currently most useful for carrying out integrated searches across many European national libraries (I was disappointed to discover that neither Serbia nor Latvia has any of my books in their central libraries). Its holdings seem to be mainly bibliographic, rather than links to the actual text of books (though there are some exceptions).

However, a recent press release from the European Commission seems to indicate that The European Library could well be transmogrified into something altogether grander: The European Digital Library. According to the release:

At least six million books, documents and other cultural works will be made available to anyone with a Web connection through the European Digital Library over the next five years. In order to boost European digitisation efforts, the Commission will co-fund the creation of a Europe-wide network of digitisation centres.

Great, but it adds:

The Commission will also address, in a series of policy documents, the issue of the appropriate framework for intellectual property rights protection in the context of digital libraries.

Even more ominously, the press release concludes:

A High Level Group on the European Digital Library will meet for the first time on 27 March 2006 and will be chaired by Commissioner Reding. It will bring together major stakeholders from industry and cultural institutions. The group will address issues such as public-private collaboration for digitisation and copyrights.

"Stakeholders from industry and cultural institutions": but, as usual, nobody representing the poor mugs who (a) will actually use this stuff and (b) foot the bill. So will our great European Digital Library be open access? I don't think so.

The Amazing Amazon Mechanical Turk

OK, so I may be well behind the times, but I still found this rather amazing when I came across it. Not so much for what it is - a version of Google Answers - but for the fact that Amazon is doing it.

Google I can understand: its Answers service is reaching the parts its other searches cannot - a complement to the main engine (albeit a tacit admission of defeat on Google's part: resorting to wetware, whatever next?). But Amazon? What has a people-generated answer service got to do with selling things? Come on Jeff, focus.

Cool name, though.

Digg This, It's Groovy

Digg.com is a quintessentially Web 2.0 phenomenon: a by-the-people, for-the-people version of Slashdot (itself a key Web 1.0 site). So Digg's evolution is of some interest as an example of part of the Net's future inventing itself.

A case in point is the latest iteration, which adds a souped-up comment system (interestingly, this comes from the official Digg blog, which is on Blogger, rather than self-hosted). Effectively, this lets you digg the comments.

An example is this story: New Digg Comment System Released!, which is the posting by Kevin Rose (Digg's founder) about the new features. Appropriately enough, this has a massive set of comments (nearly 700 at the time of writing).

The new system's not perfect - for example, there doesn't seem to be any quick way to roll up comments which are initially hidden (because they have been moderated away), but that can easily be fixed. What's most interesting is perhaps the Digg sociology - watching which comments get stomped on vigorously, versus those that get the thumbs up.

Tying the Kangaroo Down

If any proof were needed that some people still don't really get the Internet, this article is surely it. Apparently Australia's copyright collection agency wants schools to pay a "browsing fee" every time a teacher tells students to browse a Web site.

Right.

So, don't tell me: the idea is to ensure that students don't use the Web, and that they grow up less skilled in the key enabling technology of the early twenty-first century, that they learn less, etc. etc. etc.?

Of course, the fact that more and more content is freely available under Creative Commons licences, or is simply in the public domain, doesn't enter into the so-called "minds" of those at the copyright collection agency. Nor does the fact that by making this call they not only demonstrate their extraordinary obtuseness, but also handily underline why copyright collection agencies are actually rather irrelevant these days. Indeed, rather than waste schools' time and money on "browsing fees", might Australia not do better to close down said irrelevant, clueless agency, and save some money instead?

03 March 2006

Beyond Parallel Universes

One of the themes of this blog is the commonality between the various opens. In a piece I wrote for the excellent online magazine LWN.net, I've tried to make some of the parallels between open source and open access explicit - to the point where I set up something of a mapping between key individuals and key moments (Peter Suber at Open Access News even drew a little diagram to make this clearer).

My article tries to look at the big picture, largely because I was trying to show those in the open source world why they should care about open access. At the end I talk a little about specific open source software that can be used for open access. Another piece on the Outgoing blog (subtitle: "Library metadata techniques and trends"), takes a closer look at a particular kind of such software, that for repositories (where you can stick your open access materials).

This called forth a typically spirited commentary from Stevan Harnad, which contains a link to yet more interesting words from Richard Poynder, a pioneering journalist in the open access field, with a blog - called "Open and Shut" (could there be a theme, here?) - that is always worth taking a look at. For example, he has a fascinating interview on the subject of the role of open access in the humanities.

Poynder rightly points out that there is something of a contradiction in much journalistic writing about open access, in that it is often not accessible itself (even my LWN.net piece was subscribers-only for a week). And so he's bravely decided to conduct a little experiment by providing the first section of a long essay, and then asking anyone who reads it - it is freely accessible - and finds it useful to make a modest donation. I wish him well, though I fear it may not bring him quite the income he is hoping for.

01 March 2006

There's No INSTEDD without Open Access

An interesting story in eWeek.com. Larry Brilliant, newly-appointed head of the Google.org philanthropic foundation, wants to set up a dedicated search engine that will spot incipient disease outbreaks.

The planned name is INSTEDD: International Networked System for Total Early Disease Detection - a reference to the fact that it represents an alternative option to just waiting for cataclysmic infections - like pandemics - to happen. According to the article:

Brilliant wants to expand an existing web crawler run by the Canadian government. The Global Public Health Intelligence Network monitors about 20,000 Web sites in seven languages, searching for terms that could warn of an outbreak.

What's interesting about this - apart from the novel idea of spotting outbreaks around the physical world by scanning the information shadow they leave in the digital cyberworld - is that to work it depends critically on having free access to as much information and as many scientific and medical reports as possible.
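Stripped to its essentials, the detection idea is just a crawl-and-scan loop. A toy version follows - the URL and term list are invented for illustration; the real GPHIN system watches some 20,000 sites in seven languages:

    import urllib.request

    OUTBREAK_TERMS = ['h5n1', 'avian influenza', 'unexplained deaths',
                      'poultry cull', 'haemorrhagic fever']

    def scan(url):
        # Fetch a page and return any outbreak-related terms it mentions.
        text = urllib.request.urlopen(url).read().decode('utf-8', 'replace').lower()
        return [term for term in OUTBREAK_TERMS if term in text]

    hits = scan('http://news.example.com/health')   # hypothetical news source
    if hits:
        print('possible signal:', hits)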

Indeed, this seems a clear case where it could be claimed that not providing open access in relevant areas - and the range of subjects that are relevant is vast - is actually endangering the lives of millions of people. Something for publishers and their lawyers to think about, perhaps.

Higgins: Social Web, Social Commerce

Identity is a slippery thing at the best of times. On the Internet it's even worse (as the New Yorker cartoon famously encapsulated). But identity still matters, and sorting it out is going to be crucial if the Internet is to continue moving into the heart of our lives.

Of course, defining local solutions is easy: that's why you have to remember 33 different passwords for 33 different user accounts (you do change the password for each account, don't you?) at Amazon.com and the rest. The hard part is creating a unitary system.

The obvious way to do this is for somebody to step forward - hello Microsoft Passport - and to offer to handle everything. There are problems with this approach - including the tasty target that the central identity stores represent for ne'er-do-wells (one reason why the UK Government's proposed ID card scheme is utterly idiotic), and the concentration of power it creates (and Microsoft really needs more power, right?).

Ideally, then, you would want a completely modular, decentralised approach, based on open source software. Why open source? Well, if it's closed source, you never really know what it's doing with your identity - in the same way that you never really know what closed software in general is doing with your system (spyware, anyone?).

Enter Higgins, which not only meets those requirements, but is even an Eclipse project to boot. As the goals page explains:

The Higgins Trust Framework intends to address four challenges: the lack of common interfaces to identity/networking systems, the need for interoperability, the need to manage multiple contexts, and the need to respond to regulatory, public or customer pressure to implement solutions based on trusted infrastructure that offers security and privacy.

Perhaps the most interesting of these is the "multiple contexts" one:

The existence of common identity/networking framework also makes possible new kinds of applications. Applications that manage identities, relationships, reputation and trust across multiple contexts. Of particular interest are applications that work on behalf of a user to manage their own profiles, relationships, and reputation across their various personal and professional groups, teams, and other organizational affiliations while preserving their privacy. These applications could provide users with the ability to: discover new groups through shared affinities; find new team members based on reputation and background; sort, filter and visualize their social networks. Applications could be used by organizations to build and manage their networks of networks.

The idea here seems to be a kind of super-identity - a swirling bundle of different cuts of your identity that can operate according to the context. Although this might lead to fragmentation, it would also enable a richer kind of identity to emerge.
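What might that look like in practice? Here is a sketch of the "multiple contexts" idea - one identity exposing a different facet to each context. It illustrates the concept only; it is not the Higgins API:

    class Identity:
        def __init__(self, attributes):
            self.attributes = attributes   # the full 'swirling bundle'
            self.contexts = {}             # context name -> attribute keys visible there

        def grant(self, context, keys):
            self.contexts[context] = keys

        def facet(self, context):
            # Return only the slice of the identity this context may see.
            return {k: self.attributes[k]
                    for k in self.contexts.get(context, [])}

    me = Identity({'name': 'Alice', 'employer': 'Example Corp',
                   'email': 'alice@example.com', 'reputation': 4.8})
    me.grant('professional', ['name', 'employer', 'reputation'])
    me.grant('shopping', ['name', 'email'])

    print(me.facet('professional'))   # the email address stays private here
    print(me.facet('shopping'))       # no employer or reputation revealed here

The point of the design is that the user, not the service, decides which slice each context sees.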

As well as cool ideas, Higgins also has going for it the backing of some big names: according to this press release, those involved include IBM, Novell, the startup Parity Communications (Dyson Alert: Esther's in on this one, too) and the Berkman Center for Internet & Society at Harvard Law School.

The latter is also involved in SocialPhysics.org, whose aim is

to help create a new commons, the "social web". The social web is a layer built on top of the Internet to provide a trusted way to link people, organizations, and concepts. It will provide people more control over their digital identities, the ability to more easily find other people and groups, and more control over how they are seen by others across diverse contexts.

There is also a blog, called Social Commerce, defined as "e-commerce + social networking + user-centric identity". There are lots of links here, as well as on the SocialPhysics site. Clearly there's much going on in this area, and I'm sure I'll be returning to it in the future.