The Google Book deal

I just heard Robert Darnton on On The Media talking about the Google Book settlement. (Sorry, but I don’t yet see a link specifically to that interview.) Brilliant. The two things I’d recommend reading about this massive and massively important deal are Darnton’s piece in the NY Review of Books, and an article by James Grimmelmann.

The book settlement is hugely complex, hugely important, and overall a big step forward. But, the ur-cause of the issues many of us have with it is that it’s a settlement among authors, publishers and Google, which leaves readers, scholars, teachers — AKA the rest of us — out.

Long-tail museum

Jeff Gates posts about how the Smithsonian American Art Museum is facing the fact that it’s a long-tail phenomenon:

Our Web statistics showed that the number of visitors to our top ten sections paled when compared with the total number of visitors for all other pages, even though only a few people viewed each page. The challenge: how could we make it easier for our online visitors to find things of interest even if that information is buried deep in our site?

He continues:

Museums are changing. Like many other organizations, our hierarchical structure has historically disseminated information from our experts to our visitors. The envisioned twenty-first-century model, however, is more level. Instead of a one-way presentation, our online visitors are often interested in having a conversation with our curators and content providers. In response, many of us at American Art have been looking for ways to engage our public by designing applications that promote dialogue. By encouraging user-generated content and by distributing our assets beyond our own Web site and out across the Internet, we hope to make our content easier to find. In doing so, we are trying to fulfill our long tail strategy. In order to succeed we will need to approach our jobs differently.

And that’s just the introduction.

Meanwhile, the Library of Congress has expanded on its successful Flickr experiment (15.7M views) and is now posting material at iTunes and YouTube.

Among the items Web surfers can expect on iTunes and YouTube are 100-year-old films from Thomas Edison’s studio, book talks with contemporary authors, early industrial films from Westinghouse factories, first-person audio accounts of life in slavery, and inside looks into the library’s holdings, including the rough draft of the Declaration of Independence and the contents of President Abraham Lincoln’s pockets on the night of his assassination.

This is all getting just too cool. Time to put the toys back on shelves behind glass.


Data in its untamed abundance gives rise to meaning

Seb Schmoller points to a terrific article by Google’s Alon Halevy, Peter Norvig, and Fernando Pereira about two ways to get meaning out of information. Their example is machine translation of natural language where there is so much translated material available for computers to learn from, which (they argue) works better than trying to learn from attempts that go up a level of abstraction and try to categorize and conceptualize the language. Scale wins. Or, as the article says, “But invariably, simple models and a lot of data trump more elaborate models based on less data.”

They then use this to distinguish the Semantic Web from “Semantic Interpretation.” The latter “deals with imprecise, ambiguous natural languages,” as opposed to aiming at data and application interoperability. “The problem of semantic interpretation remains: using a Semantic Web formalism just means that semantic interpretation must be done on shorter strings that fall between angle brackets.” Oh snap! “What we need are methods to infer relationships between column headers or mentions of entities in the world.” “Web-scale data” to the rescue! This is basically the same problem as translating from one language to another, given a large enough corpus of translations: We have a Web-scale collection of tables with column headers and content, so we should be able to algorithmically recognize clustering concordances of meaning.
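
To make the column-header idea a bit more concrete, here is a toy sketch in Python. It is my own illustration, not the method from the paper: it assumes that two headers probably mean the same thing if the values underneath them overlap a lot, and it measures that overlap with a plain Jaccard score. The function names, the 0.5 threshold, and the two tiny sample tables are all made up for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of distinct values that two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_headers(tables, threshold=0.5):
    """Pair up headers from different tables whose column values overlap.

    `tables` maps a table name to a dict of {header: list of cell values}.
    The threshold is an arbitrary assumption, not a recommended setting.
    """
    columns = [
        (table_name, header, values)
        for table_name, table in tables.items()
        for header, values in table.items()
    ]
    pairs = []
    for (t1, h1, v1), (t2, h2, v2) in combinations(columns, 2):
        if t1 == t2:
            continue  # only compare columns that come from different tables
        score = jaccard(v1, v2)
        if score >= threshold:
            pairs.append((h1, h2, score))
    return pairs


# Hypothetical sample data: two tables that never use the same header.
tables = {
    "a": {
        "Country": ["France", "Japan", "Brazil"],
        "Capital": ["Paris", "Tokyo", "Brasilia"],
    },
    "b": {
        "Nation": ["Japan", "France", "Brazil", "Kenya"],
        "Pop": ["126M", "67M", "212M", "53M"],
    },
}

print(related_headers(tables))
```

With two tables this proves nothing, of course. The article's point is that the signal only gets reliable when you have a Web's worth of tables to compare.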

I'm not doing the paper justice because I can't, although it's written quite clearly. But I find it fascinating.

Andrew Lih on Wikipedia

Vincent Rossmeier has a solid interview at Salon with Andrew Lih, author of The Wikipedia Revolution.

I’m going to interview Andrew at a Berkman event on Wednesday night, 6pm at Griswold Hall, room 110, at Harvard Law. Andrew is certainly a partisan, but he’s also an insider whose book is quite candid and direct about troubling episodes in Wikipedia’s history. I enjoyed his book and look forward to talking with him. (He and I will probably talk for 30 mins, and then we’ll open it up.)

4.5 things Twitter teaches us

You can tell that Twitter has added something important to the ecosystem by the volume of the snickering. If you dismiss it by asking “Why do I care what you had for breakfast?”, there are only two choices. First, you’re saying everyone on Twitter is an idiot. Second, you don’t understand what you’re talking about. As a Twitterer (dweinberger), I’m going to go with Option #2.

Twitter’s success tells us a lot…including the following 4.5 points:

1. Twitter in its native form assumes we’re ok with not keeping up with the abundance. Tweets are going to scroll by when you’re not looking, and you’re never going to see them. Twitter assumes you will let them go, in a way that most of us cannot let go of the unread messages in our inbox.

2. Social asymmetry addresses the scaling problem. At Twitter, the people you follow are not necessarily the people who are following you. That’s exactly not how mailing lists and weekly status meetings work, and Twitter’s approach impedes the back-and-forth development of ideas. But, maybe that’s not what Twitter is primarily about. And the asymmetry means that some people can have lots of followers but still participate as listeners.

2.5. (Maybe in an age of abundance, the back and forth development of ideas isn’t the only process. Sure, having a small group kick around an idea often works. But maybe in some instances it also works for an idea to be lobbed like a beach ball from one group to another, each putting their own spin on it.)

3. Twitter is an app that scales as a platform. That is, it comes with a set of features that makes it usable and popular. But it’s open enough to enable users and third parties to add capabilities that make it useful for what it wasn’t designed for. For example, a convention has arisen among users that “RT” will stand for “re-tweet” when you want to publish someone else’s tweet to your own followers.

4. We'll complicate simple things as much as we have to. We'll invent "hashtags" (tags that begin with #, embedded within a tweet) to let people find tweets on a particular topic, getting past the "it already scrolled past" issue. We'll invent layers upon layers of aggregators of tweets. We'll just bang away on it as hard as we have to in order to accrete significance. We truly are meaning monkeys.
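
For the curious, the hashtag trick in point 4 is simple enough to sketch in a few lines of Python. This is only an illustration under my own assumptions, not how Twitter or any particular aggregator actually does it: a tag is taken to be a # followed by letters, digits, or underscores, tags are matched case-insensitively, and the sample tweets and the handle in them are invented.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is "#" followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")


def index_by_hashtag(tweets):
    """Map each lowercased hashtag to the tweets that mention it, in order."""
    index = defaultdict(list)
    for tweet in tweets:
        # Use a set so a tweet that repeats a tag is only listed once under it.
        for tag in {t.lower() for t in HASHTAG.findall(tweet)}:
            index[tag].append(tweet)
    return dict(index)


# Made-up tweets for illustration only.
tweets = [
    "Live from the #Berkman lunch talk",
    "RT @jeff: crowds everywhere! #berkman #crowdsourcing",
    "Just had breakfast",
]

index = index_by_hashtag(tweets)
print(index["berkman"])
```

Once the tweets are indexed this way, a topic can still be found after it has scrolled past, which is the whole reason users invented the convention.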

[berkman] Jeff Howe on crowd sourcing

Jeff Howe of Wired is giving a Berkman lunchtime talk on his book Crowd Sourcing. (He coined the term in 2006.) [Note: I’m live blogging, making mistakes, missing stuff, paraphrasing inappropriately, etc.]

From the beginning, he says, he’s been ambivalent about crowd sourcing. His book is a series of stories showing crowdsourcing’s promise and perils. The book is short on quantitative data, he says. As he was finishing up the edits, he came across a survey of 650 photo contributors. iStock was one of Jeff’s main examples, a stock photo agency that undercut competitors by 99%. They were able to do this because amateur photographers were willing to upload entire libraries of their photos. iStock culled them. iStock runs its corporate decisions past the community. The survey showed that contributors had a rich mix of motivations. He’d like to revisit this question.

Jeff gives his 45 minute book talk in 20 mins: He got interested in crowdsourcing by watching Myspace. “User generated content” doesn’t begin to tap the change that’s taking place. (Plus, he adds, he hates the phrase.) He spent a night searching for user-generated anything to show that it was about more than teenagers making “content.” E.g., John Fluevog Open Source Shoeware names shoes after designs contributed by users. He wrote an article for Wired in June 2006. The term took off.

As an example, he tells the story of the Two Jakes who created a crowdsourced t-shirt company, Threadless. It created a community of designers and people who like to vote on designs. Revenues in 2007 topped $30M. The community provides the designs, does the marketing, and Threadless has a mechanism that lets them gauge how much they need quite accurately.

iStockPhoto was bought by Getty, and revenues have continued to climb…over $100M in 2008, with 50% profit margin.

Another example: The way amateur ornithologists have transformed the way ornithology works, the Elements restaurant in DC…

Why did crowdsourcing happen? Lots of amateurs, open source, tools, online communities. The cardinal rule of crowdsourcing: “Ask not what your community can do for you, etc.”

Jeff ends by asking about the study of iStock contributors’ motivations. 80% of iStockers religiously visit the site. The study shows the primacy of the financial motivation. Only 4% of the contributors make their primary living off of photography. The forum gets 37 posts per minute. 80% consider their work profitable, and 20% consider it extremely profitable. iStockers are largely not out to make friends or to network with others. iStockers are unsure that other iStockers can be trusted. This runs counter to how the company portrays them.

Q: I just had a logo made for $250 through LogoTournament. 30-40 designers worked on it from all over the world. The contestants all see one another’s designs.
A: Anecdotally, people seem to love it. There’s also CrowdSpring and 99Designs.

Q: I used worth1000 for cover design. The Berkman folk loved it, but when I posted about it, I got flamed.
A: I understand that crowdsourcing is disruptive. It’s an emotional subject. Creatives can shape the transformation by embracing it.

Q: Your examples largely focused on highly creative forms of work. People do these things on their own as hobbies. How about crowdsourcing that has people transcribing podcasts via MechanicalTurk. Are these two types of crowdsourcing the same phenomenon?
A: MechanicalTurk is for repetitive, boring tasks. I don’t know how to encompass this. This makes the motivation for crowdsourcing more complex. That doesn’t dismay me.

Q: Is the difference about passion?
A: My catchphrase is that passion is the currency of the 21st century.

Q: [me] You position this as a contradiction. But it’s not if you define crowdsourcing as the action of a crowd, etc., and stir in economics: Those with leisure will do it for passion, while the rest will do more boring tasks for money. Unless what matters to you, and to the media that took it up, is that it’s a statement about human motivation.

Q: [eszter] You’re putting too much faith in the study. It’s only 1% of users and the methodology isn’t necessarily rock solid.
A: I called iStock’s founder and he has the same problems with the study.

Q: When I got the book, what was exciting was the possibility of solving altruistic problems. Do you have any examples?
A: GlobalVoices. Transcription services from a mobile phone for nonprofits.

Q: ReCaptcha is a great example. Also,

Some of the crowdsourced stock photo sites are scams.

Q: Is crowdsourcing exploitative?
A: Sure could be. Professional stock photographers certainly think so.

Open Congress Wiki

Congresspedia has become the Open Congress Wiki, where we can build transparency and knowledge together.


Steven Johnson on the future of news

On the heels of Clay’s splash o’ cold water — to paraphrase: “Revolutions aren’t pretty” — comes Steven Johnson’s eloquent pointing to the “old growth forests” of online news as indicators of what might be. As brilliant as ever.


Front page flash cards

At the Newseum site, mousing over a map pops up the front page of the local newspaper. Cool!

(And won’t the site please start taking ads so we can all run the headline: “Ad Newseum!” Please?)


