
Archive for February, 2008

Tim Bray on the history of XML

Tim Bray has a terrific piece on the development of XML, now in its tenth year as an official standard. He focuses on the people, not on the technicalities of the standard.

It’s worth it just to re-read Tim’s words about Yuri Rubinsky, an SGML advocate of enormous energy and passion, without a mean bone in his body.

It’s also worth it to learn the off-the-mainstage history of XML, of course.

Reuters Semantic Web Web service

Let me disambiguate that title: Reuters is offering a Web service, called Calais, that will parse text and return it in a form (RDF) that can be utilized by Semantic Web applications. It uses natural language processing (from ClearForest) to find structures of meaning such as places, jobs, facts, events, etc. It apparently has its own metadata schema, but it allows users to extend it. It’s an open API, and Reuters is being quite generous in how much they’ll let you submit during this beta period. It’s English only for now, although they plan to support other languages, opening the exciting prospect of being able to find items of interest in languages you don’t understand via a unified metadata framework.

I’m going by the site’s FAQ. I haven’t tried it and can’t tell how well it works, how accurate it is, how comprehensive or detailed its metadata are, or how much post-processing cleanup users will want to provide (which of course depends on the application). There are some points I just don’t understand, such as the claim “Calais carries your own metadata anywhere in the content universe.” But if it works within some reasonable definition of “works,” and if it gets widely adopted, Calais could make a lot more information a lot easier to find, and to process for further meaning. [Tags: semantic_web semweb reuters calais nlp ]

“Everybody is miscellaneous”

Doc has a nice post about the fact that everybody is miscellaneous (to use his phrase), and why being lumped with others gives him aggregaphobia (another nice turn of phrase).

Cool visualizations

Bestiario is a Spanish group that does some insanely watchable visualizations of networks of information. For example, poke around at their way of mapping links.

I’m not very good at interpreting visual data so I can’t tell if it’s helpful, but it sure is cool. [Tags: visualization social_networks ]

Virtual business

Here’s an article about businesses glomming on to the virtual worlds thang…

Twitter + Maps + News

Of course, this is a little past tense this morning (with an emphasis on the tense), but here’s a very cool mashup of election results, Google Maps and Twitter. It’d be more useful to me if it would only show me tweets from the people I follow, but, well, maybe next election… [Tags: twitter election politics mashup ]

Keen vs. Me: The Word clouds

David Warlick has created word clouds based on the words Andrew Keen and I use in our debate in the WSJ. Pretty interesting. For me, the standout result is that I use the word “amateur” far more than Andrew “Cult of the Amateur” Keen does.

The accidental and intentional

“It starts to multiply, the grading of tones, until it becomes thousands of tones,” he [John Currin] reflected. “Some are accidental and some are intentional. It’s great when the accidental becomes indistinguishable from the intentional. That’s when it begins to seem like a living thing.”

Calvin Tomkins, Profiles, “Lifting the Veil,” The New Yorker, January 28, 2008, p. 58

I’ve been playing with Instapaper a little, and liking it a lot.

It’s a free site built by Marco Arment, who works at Tumblr (if I’m reading this right). You put the Instapaper “Read Later” button in your button bar, and click it if you’re on a site you want to read later. Go to the Instapaper site and you’ll see a list of what you’ve clicked. Simplicity itself.

There seems to be just one more feature: Any text you’ve selected on the page you’re instapapering is taken as that page’s description.

That takes care of my temporary bookmarking needs, a feature I’ve wanted for a while. But I wonder what would happen if my instapaper page were public and pointable. Could we start to use instapaper to build a collaborative newspaper that pulls together the recommended reading of people you respect?
