Archive for August, 2007

Tagging like it was 2002

Matt Mower writes:

I have been surprised, disappointed, and excited that, despite the widespread adoption of tagging across many applications, the state of the art in tagging seems firmly wedged in 2003. Surprised because there seemed, despite the expectations of many that nobody would tag things, to be a momentum building in the use of tagging. Disappointed because I expected to be using applications that really used tagging to do some interesting things. Excited because it means the field is still open.

Paolo Valdemarin continues the thread.

Matt and Paolo are behind K-Collector and Nova100, tag-based systems, so this is stuff they think a lot about.

The irony of bookstores

While searching for a copy of EiM at Borders, Scott Karp was struck by the irony of searching for a book about the problems with traditional organization in a store organized traditionally. Scott goes on to predict the demise not just of the bookstore but of business books in general…and the continued rise of linked writing on the Web.

Preserve the record by manifesting the context

This morning if you search Google for “Enron,” the top hit is the creditors’ recovery page and the second is the Wikipedia article on Enron. The first listing from the Times is about 45th, and it’s a TimesSelect (= pay) page that doesn’t even actually reference Enron. That’s an example of what’s on the mind of the Times’ ombudsman (um, “public editor”) Clark Hoyt when he begins his column. He finds the Times’ “business strategy” of getting “its articles to pop up first in Internet searches” (well, at least not at #45) responsible for the quandary the Times finds itself in when it comes to the errors in its archive. I don’t quite see it that way.

Hoyt takes as his example an article about Allen Kraus, who “once led a welfare office praised for its efforts to uncover fraud.” The Times first reported that he resigned under pressure after a bribery investigation, without including Kraus’ side of the story, and later published a more balanced follow-up. Kraus says his boss eventually publicly sided with Kraus’ version. The details don’t matter much, although I must say it’s a relief for a change not to be talking about John Seigenthaler. The point is that Kraus is understandably upset that searches on his name turn up the Times’ faulty story. If that’s all you read, you’d think he’s a crook.

Hoyt then considers several solutions to this problem, seeming to favor the suggestion that the Times expunge faulty articles from its archive.


In fact, the solution is already in place. If you google “allen kraus” (in quotes), the #1 hit is a Times topic page about him that lists first the corrective article and then the faulty one. Perfect! We get the context we need while preserving the record. Topic pages are in fact the Times’ attempt to move its content up the Google results page. They give us a single, persistent URL that aggregates everything the Times knows about a topic…including what it got wrong.

Jeez, if the Times expunged from its archive every article about Iraq Judith Miller wrote, we’d think the Times slept through the whole run-up to the war. And future researchers would never understand how culpable the Times was for getting us into that mess. Bloggers get this right-er than Hoyt when we use strikethrough font to indicate an error we’ve corrected. We need the full archive.

Topic pages are a great solution to the problem of providing context, as well as advancing the Times’ search engine optimization desires. Removing articles from the record destroys the value of the record. You shouldn’t write history by rewriting the record.

So, rather than setting “time-outs” for articles based on how important the Times judges them, which is Hoyt’s suggestion, do more topic pages. And harvest the power of the crowd to create more topic pages and more context. [Tags: nytimes wikipedia newspapers journalism history archives everything_is_miscellaneous ]

Victorian scholarship and the miscellaneous

Patrick Leary had a terrific article in Journal of Victorian Culture in 2005 that Alexander Macgillivray just pointed out to me. It’s called “Googling the Victorians,” and the premise is: “Fortuitous electronic connections, and the information that circulates through them, are emerging as hallmarks of humanities scholarship in the digital age.” He’s got some great examples (such as tracking down the meaning of an 1858 cartoon’s “Remember the grotto!” caption) to make the point that “What is most striking, and often quite useful, about this sort of fishing expedition is how often the sources in which one finds a ‘hit’ are utterly unexpected.” Here’s another:

…when searching for additional instances, beyond those I had found in print sources, in which the Saturday Review had been referred to by its critics’ nickname, the Saturday Reviler. Google instantly located the phrase in the following: a biographical account of Charles Haddon Spurgeon, as a favourite epithet of his associates; the short-lived 1872 periodical, The Ladies; an 1864 book about the contemporary stage magicians the Brothers Davenport; an appendix, by Richard Burton, to his 1885 edition of Arabian Nights; and a magazine account of a conversation with Frank Harris about his tenure as editor in the 1890s.

Leary goes on:

Such experiences reinforce the conviction that the very randomness with which much online material has been placed there, and the undiscriminating quality of the search procedure itself, gives it an advantage denied to more focused research. It has been often and rather piously proclaimed (by myself, among others) that googling around the internet cannot possibly substitute for good old-fashioned library research, and this is certainly true. But we are perhaps reaching a point in our relationship to the online world at which it is important to recognize that the reverse is equally true. No amount of time spent in the library stacks would have suggested to me that any of those sources would be an especially good place to look for instances of that particular phrase, and if it had, the likelihood of actually discovering the phrase in a printed edition of any of them would have been virtually nil.

This is an excellent argument for reversing the current momentum of copyright law. Our culture benefits from having as much of this stuff searchable and available as possible. Since 19th-century stuff is generally out of copyright, the Victorian scholars are in good shape, as Leary notes. But why should our ability to research, learn and understand suddenly come to a galloping halt towards the beginning of the 20th century?

I don’t want to miss another of Leary’s points: “…the vast reach of online searching is connecting people, not merely with information, but with one another, often in the most unexpected and fruitful ways.” [Tags: copyright scholarship google everything_is_miscellaneous ]

Miscellaneous FrontPorch

FrontPorch presents itself as a positive example of the power of the miscellaneous…

Taste and quirks

Pandora is really in a groove this morning. One of its channels is playing song after song that I like. Usually, I have to thumbs down about every fourth song. But all that training has paid off. Pandora seems to know my tastes. At least this morning.

I’ve been told (but haven’t checked) that Pandora works because it hires people to tag the gazillions of songs in its library with lots and lots of metadata: The style, tempo, key, gender of the singers … on and on. That enables it to find other songs you might like by looking at the attributes of the ones you do like. It works.
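Here is a minimal sketch of what matching on attributes could look like. It is an illustration only, not Pandora's actual method: the attribute names, the tiny catalog, and the use of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
def jaccard(a, b):
    """Share of attributes two songs have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, catalog, top_n=3):
    """Rank songs not yet liked by their average similarity to the liked ones."""
    if not liked:
        return []
    scores = {}
    for title, attrs in catalog.items():
        if title in liked:
            continue
        total = sum(jaccard(attrs, catalog[name]) for name in liked)
        scores[title] = total / len(liked)
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical hand-tagged catalog: song title -> set of attributes.
catalog = {
    "song_a": {"swing", "big_band", "instrumental", "medium_tempo"},
    "song_b": {"swing", "big_band", "female_vocal", "medium_tempo"},
    "song_c": {"rock", "electric_guitar", "male_vocal", "fast_tempo"},
    "song_d": {"swing", "instrumental", "slow_tempo"},
}

print(recommend(["song_a"], catalog, top_n=2))
```

A thumbs-down could be folded in by subtracting similarity to disliked songs, but that is a further assumption beyond what the description above says.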

And yet, what do I really like about a particular track? That the singer is female or the chorus brings in backup singers? Nah. Within the range of songs I like — I have one channel crystallized around Duke Ellington and another around the Stones and the Beatles — I like this song because the singer’s voice cracks and that one because the bass thumps really loudly in the bridge. I like this one because of the absurdity of the lyrics and that one because the melody is broken across several instruments. And then there’s that other one that I like because I used to listen to it with my high school friends.

Except for that last characteristic, with a large enough sample and sufficiently fine-grained analytic tools, could a site figure out what I like about this song and that? Or is this a case where the difference between signal and noise is just too much in the ears of the beholder?

Andrew Keen’s best case

I’ve posted a long piece at Huffington Post that tries to put together the strongest, most coherent version of Andrew “Cult of the Amateur” Keen’s argument against the Web…and then critiques it. [Tags: andrew_keen web_2.0]

Making sense of RSS

From SnarkMarket:

AideRSS is a Godsend. It analyzes the activity around each item in an RSS feed — Technorati hits, comments, links, traffic reports, etc. — and calculates a score for the item. It then creates four feeds from the original feed, each set to a higher activity threshold.

Yet another way to sort through the miscellaneous…
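The scoring-and-threshold idea in that description can be sketched in a few lines. This is a rough illustration under stated assumptions, not AideRSS's real formula: the signal names, the weights, the four cutoffs, and the feed names are all hypothetical.

```python
# Hypothetical weights for each activity signal around a feed item.
WEIGHTS = {"comments": 1.0, "links": 2.0, "blog_mentions": 3.0}

# Hypothetical cutoffs: four derived feeds, each with a higher threshold.
THRESHOLDS = {"all": 0, "good": 10, "great": 25, "best": 50}


def score(item):
    """Weighted sum of the activity counts recorded for one feed item."""
    return sum(weight * item.get(signal, 0) for signal, weight in WEIGHTS.items())


def split_feed(items):
    """Build one list of titles per threshold from the original feed."""
    return {
        name: [item["title"] for item in items if score(item) >= cutoff]
        for name, cutoff in THRESHOLDS.items()
    }


items = [
    {"title": "quiet post", "comments": 2, "links": 1},
    {"title": "solid post", "comments": 5, "links": 3, "blog_mentions": 1},
    {"title": "big post", "comments": 10, "links": 10, "blog_mentions": 10},
]

for name, titles in split_feed(items).items():
    print(name, titles)
```

Because each cutoff is higher than the last, every derived feed is a subset of the one before it, so a reader can pick how much of the original feed to see.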

Forbes’ review

Andy Greenberg at Forbes takes the book seriously, seems to think it’s on a topic worth writing about, and thinks it has interesting things to say, although he thinks I say most of them in the first few chapters (I disagree – I think the book develops a thesis, but, well, I would think that, wouldn’t I). Even so, my “story-laden writing keeps readers from straying.” Here’s his final paragraph: “As the author of a book about the virtues of chaos, Weinberger may be putting the last brick in the tomb of that “ancient and beautiful Greek idea.” But Everything Is Miscellaneous isn’t just about the promises of a messy Internet. It’s also a thoughtful obituary of history’s librarians, an elegy for the last order of order.”

This, and every other review I’ve found, is listed on the reviews page.


I spoke with Lowell Anderson, VP of Marketing of SchemaLogic today. He called me because my book talks about a couple of their clients. Here’s what I learned:

SL helps companies figure out how their various knowledge silos connect by building ontologies that express the relationships among the terms they use. Thesauruses identify synonyms so people can continue using their accustomed vocabularies. SL thinks in RDF but ends up exporting to non-RDF XML frequently in order to support applications such as SharePoint.
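To make the thesaurus point concrete, here is a small sketch of how synonyms can let two groups keep their own vocabularies while still landing on one shared term. It is not SchemaLogic's API or data model; the terms and the `preferred_term` helper are made up for illustration.

```python
# Hypothetical thesaurus: each accustomed term -> the shared preferred term.
THESAURUS = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "movie": "film",
    "motion picture": "film",
    "film": "film",
}


def preferred_term(term):
    """Map a term to its preferred form; unknown terms pass through unchanged."""
    key = term.strip().lower()
    return THESAURUS.get(key, key)


# Two silos tagging the same story with their own words.
newsroom_tags = ["Car", "movie"]
archive_tags = ["automobile", "Motion Picture"]

shared = {preferred_term(t) for t in newsroom_tags} & {
    preferred_term(t) for t in archive_tags
}
print(sorted(shared))
```

The design choice worth noticing is that nobody has to change the words they type; the mapping does the reconciling behind the scenes.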

They like to start with publicly-available ontologies and then enable the client to customize. Or, they’ll start with any existing taxonomies. For example, the Associated Press had 40-50K words in their standard vocabulary. SL sucked it into their system and then provided the tools by which “subject-matter experts” (e.g., editors) could identify weaknesses (using a graphical view) and make changes. Changes that affect another expert’s domain, even by linking to it, require permission from the other expert. The permission management system is configurable to each client’s needs and is one of the key advantages of the SL system.

SL provides no tagging tools for users and readers. It is a top-down system. Compared to the systems or lack of systems it replaces, however, it looks wild and loose. For example, the International Press Telecommunications Council had a complex taxonomy of topics (which I discuss less than enthusiastically in my book) that it stored in an Excel spreadsheet. The new ontology includes many more relationships. And at the AP, although editors have to fill in change request forms and get permission from other editors, the old process had a central committee making all decisions. From my point of view, ontologies capture lots of information. They of course don’t capture all information. Bottom-up tagging adds information well. Fortunately, there’s plenty of room for it all in the gigantic miscellaneous pile.
