Archive for October, 2009

How embarrassing

All the tagging and categorization info on this site seems to be gone. Poof! These are the categories and tags that would help you browse the site by topic.

Very embarrassing for a site about the power of tagging and categorization. The lesson: Metadata needs to be backed up as much as content does.

I didn’t. That’s what’s embarrassing.

The FCC has put up a site — — where anyone (after registering with a valid email address) can post an idea, or vote existing ideas up or down. I love the idea of the feds opening discussions up, although, I am not convinced that this particular implementation achieves its presumed aims. But, what the heck! Try-fail-try is the right rhythm for the Net.

The site defaults to listing the ideas reverse chronologically, which adds some serendipity, or you can choose to view them listed in order of popularity, which encourages piling on. You can also browse by category/tag.

Anyone who registers can post a comment. The comments are unthreaded, discouraging much development of ideas but also discouraging flaming. You can report a comment as being “abusive,” but otherwise cannot rate them.

At the moment, the most popular posting is from Tim Karr, who, according to his biography at, a site sponsored by, “oversees all Free Press campaigns and online outreach efforts, including” Tim — who I know a bit and like — is an activist. He has the most popular post at the FCC’s site presumably because sent out a mailing urging supporters to vote it up.

There’s absolutely nothing wrong with that. It’s how politics is played in this country. If an anti-NN group sponsored by, say, AT&T wanted to play the same game, it’s perfectly entitled to. It’s not hard to imagine a well-funded group swamping FreePress’s shoestring efforts and getting orders of magnitude more people to thumbs-up an anti-NN comment.

Which is to say that an open discussion board like the one the FCC has posted can serve either of two purposes. It can be a place where people come for rational discussions across political positions, or it can serve as an informal poll of citizens’ sentiments about an issue. But combining the two means that neither works very well. It becomes simply an opportunity for gaming the system.

It seems to me that sites such as these cannot serve as a poll that has any value at all. Besides, we have lots of other ways of gauging public opinion, including scientific polling and elections. If, on the other hand, the FCC wants to sponsor a forum for useful discussion or to generate new ideas, it could modify the current implementation. For example — and these are just ideas that may turn out to be gigantic belly flops — comments could be divided into two tracks, pro and con, with most-popular listings for each. Readers could be allowed to vote up but not down. Comments could be threaded. The comments could be rated. Postings could have buttons for “agree/disagree” and “interesting,” so that the site could highlight articles that people disagree with but find interesting.
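To make the last of those suggestions concrete, here is a minimal sketch in Python of how a site might pick out postings that people disagree with but find interesting. Everything in it is an assumption for illustration: the `Posting` fields, the `min_votes` threshold, and the ranking rule are hypothetical, and none of it reflects how the FCC's site actually works.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical vote counters; the real site exposes nothing like this.
    title: str
    agree: int = 0
    disagree: int = 0
    interesting: int = 0


def contested_but_interesting(postings, min_votes=10):
    """Return postings that more voters disagree with than agree with,
    yet that at least some readers flagged as interesting."""
    picks = []
    for p in postings:
        total = p.agree + p.disagree
        if total < min_votes:
            continue  # too few agree/disagree votes to say anything
        if p.disagree > p.agree and p.interesting > 0:
            picks.append(p)
    # Rank by "interesting" flags relative to agree/disagree votes.
    return sorted(
        picks,
        key=lambda p: p.interesting / (p.agree + p.disagree),
        reverse=True,
    )


postings = [
    Posting("a", agree=2, disagree=20, interesting=11),
    Posting("b", agree=30, disagree=5, interesting=20),
    Posting("c", agree=1, disagree=2, interesting=3),
    Posting("d", agree=5, disagree=15, interesting=2),
]
highlighted = [p.title for p in contested_but_interesting(postings)]
```

Like every other technique here, this one could be gamed, since a campaign can click "interesting" as easily as it can click thumbs-up.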

All of these techniques could be gamed because everything can be gamed. Some discussion boards do work, though. I don’t know what the magic keys are, but I’m pretty confident that a political discussion board that includes an overall popularity contest will so encourage gaming that its results will necessarily be unreliable. At the very least, the popularity contest should be confined to determining the best arguments for each side.

But I don’t want to close on a negative note, for the FCC is to be congratulated on its efforts to open its processes up not only to lobbyists and geeks who know how to walk and talk like an FCC commenter, but to the general public. And it’s doing so in the proper Webby way of taking small steps and not being afraid to fail in public. That takes guts.

Radio Berkman on Forgetting, and Remembering the Media

There are two new-ish Radio Berkman interviews up: Me talking with Viktor Mayer-Schönberger about his book that argues that we are in danger of forgetting how to forget, and Russell Neuman on learning from the past of the media.

Harry Lewis has a terrific post about a $300 do-it-yourself book scanner he saw at the D is for Digitize conference on the Google Book settlement. The plans are available at, from Daniel Reetz, the inventor.

There are lots of personal uses for home-digitized books, so — I am definitely not a lawyer — I assume it’s legal to scan in your own books. But doesn’t that just seem silly if your friend or classmate has gone to the trouble of scanning in a book that you already own? Shouldn’t there be a site where we can note which books we’ve scanned in? Then, if we can prove that we’ve bought a book, why shouldn’t we be able to scarf up a copy another legitimate book owner has scanned in, instead of wasting all the time and pixels scanning in our own copy?

Isn’t Amazon among the places that: (a) knows for sure that we’ve bought a book, (b) has the facility to let users upload material such as scans, and (c) could let users get an as-is scan from a DIY-er if there is one available for the books they just bought?

Net uncovers new type of cloud

There are reports of a new type of cloud, one that is not currently in the official International Cloud Atlas. Or, possibly, it is a formation that’s been around forever, but the scattered reports are only now coalescing thanks to the Net.

According to Amazon’s review of Richard Hamblyn’s The Invention of Clouds, we only began thinking clouds could be categorized in 1802 when Luke Howard started giving public lectures. The very idea that clouds — the paradigm of uncatchable — could be divided into groups was (apparently) fascinating and thrilling. (Lamarck had also categorized clouds, but it didn’t catch on.)

A quick googly scan makes it seem that the cloud taxonomy is pretty messy. For example, the University of Illinois’ “cloud types” page lists four broad categories, and a list of miscellaneous clouds, each of which is categorized under one of the four basic types, evoking a “Huh?” reaction from at least one of us. The cloud taxonomy page at Univ. Missouri-Columbia lists eight types. Do you categorize by what they look like, how high they are, what they do (rain or not?), which celebrity profiles they resemble …? Categorizing clouds is truly a Borgesian task.

And, dammit, wouldn’t you know? Here’s a poem by Jorge Luis Borges called: “Clouds (II)” (with the line-endings probably removed):

Placid mountains meander through the air, or tragic cordilleras cast a pall, overshadowing the day. They are what we call clouds. And their shapes are often strange and rare. Shakespeare observed one once. It seemed to be a dragon. That one cloud of an afternoon still kindles in his words and blazes down, so that we go on seeing it today. What are the clouds? An architecture of chance? Perhaps they are the necessary things from which God weaves his vast imaginings, threads of a web of infinite expanse. Maybe the cloud is emptiness returning, just like the man who watches it this morning.

(translated by Richard Barnes. B; Robert Mezey; Richard Barnes. “Clouds (II). (poem).” The American Poetry Review. World Poetry, Inc. 1996. HighBeam Research. 11 Oct. 2009)

More Borges poems

Viktor Mayer-Schönberger is giving a talk at the Berkman Center (well, actually at Pound Hall) on his book Delete: The Virtue of Forgetting in the Digital Age. Viktor teaches at Singapore University, and was at the Kennedy School for ten years.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins with a story of a person studying to become a teacher who was kicked out of school because the school noticed a photo of her drinking on Facebook. She tried deleting it, but the Internet remembered it. He gives another example: A person who noted in an article that he had taken LSD in the 1960s. When trying to cross into the US, an immigration officer refused him admittance because he hadn’t offered up that information, and the officer uncovered it by googling him. What’s put on the Web is never forgotten. In another example, the information was not put up by the individual but by someone else: a bar/club in Europe records all the people, all the drinks, etc., and hasn’t ever deleted any information. Likewise, Google knows more about us than we can remember.

For millennia, forgetting was easy, and remembering was hard, says Viktor. So, we’ve come up with ways to pass on our memories. The oral tradition. Painting. Writing. “But these tools have not altered the fundamental fact that for us humans, forgetting is easy, and remembering is time-consuming and expensive.” The book and the photo also haven’t altered this fact. What is long past fades in our mind. We depreciate what is no longer relevant. But because forgetting is biological, we never had to develop explicit strategies to forget. Now we’ve moved from biologically forgetting to permanent remembering. [Hmm. I haven’t. We still don’t remember much. But we have more records, and thus are able to retrieve more. That seems different to me.]

This has happened because storage is cheap in the digital world. Google has server farms with a capacity of 100,000 terabytes perhaps. And we’ve gotten much better at retrieving information. And we have global access. Remembering has become the default.

There are, of course, benefits to this, Viktor says. But undoing forgetting has deep consequences, far beyond the information efficiencies. He points to power and time.

Power: If others have info about us and can keep that info accessible for a very long time, the informational power increases, and can affect how we transact and interact. It’s Bentham’s Panopticon: behavioral compliance through the permanent threat of constant surveillance.

Time: Imagine Jane is about to catch up with her old friend John, but when reviewing their history of email, discovers msgs from a time when he was nasty to her. She had forgotten that time. Now it comes back. Her current relationship with John now is ruined. [Or, she discovers msgs that remind her she once loved him. Isn’t Viktor’s example actually an argument for more remembering, so she can see how she got over the bad time?] “In analog times, the dangers were limited” because our biology would have brought us to forget.

Viktor talks about AJ, a non-fictional woman who has difficulty forgetting. It is a weird and unhappy condition. [This is why the conflation of human remembering and the presence of a fairly complete digital record matters. The presence of digital info and the tools for retrieving it does not turn us into AJ.]

Without forgetting, we have trouble changing. We have trouble forgiving. We may turn into an unforgiving society. “This is the real danger of shifting the default from forgetting to remembering.” Worse, suppose we stop relying on our own memories and rely instead on the digital memories. “Does that give those who control digital memory the power over history?”

What to do? Perhaps give privacy rights to individuals. But there are weaknesses: It’s not politically feasible in the US. Europeans have those rights, but people have not used them.

Or perhaps we could create an information ecology, a regulatory construction of what can be remembered. E.g., it might require the deletion of info after a particular time. This does not require individuals to go to court for enforcement, and it protects against an unforeseen future as when the benign Dutch social services registry was repurposed by the Nazis to identify Jews. “It may be better to store less than more.” But, after 9/11, we’re seeing requirements for increasing data retention, Viktor notes.

So, maybe we need to augment these approaches. “Digital abstinence,” for example. Don’t put everything on Facebook. But abstinence isn’t all that reasonable, he says. By the end of 2007, two out of three young Americans had put their info online.

The opposite approach is “full contextualization.” E.g., Jane can’t find the context of her bad treatment by John. Full contextualization would restore that. But will that ever be technically feasible? And if it were, would it really address the challenge of digital remembering? Do we have time to relive our past again and again?

Another approach: Hope for a cognitive adjustment. That is, over time we’ll learn to devalue older info and learn to live with an omnipresent past. “That would solve our problem. But is it likely?” How long would it take us to change how we assess information? “Cognitive psychologists are very critical of our ability to change our decision making in the short run.” [But a change in norms can happen much faster than that, and we govern what we’re allowed to notice and remember through norms. Statements like “That’s water under the bridge” and “Youthful indiscretions” are expressions of norms that enforce social forgetting without requiring actual brain evolution.]

Or, we could change our technology, rather than changing ourselves. E.g., a global DRM system to protect privacy. Viktor is not recommending this: “Wouldn’t this be a perfect surveillance system?” And we’d have to make sure that privacy is built deep into the infrastructure.

None of these six solutions are sufficient, although all offer something.

“I advocate a revival of forgetting…to establish a mechanism that makes forgetting easy, and makes remembering just a bit more strenuous.” Just enough to shift the incentives back to what we humans are used to. Viktor suggests an expiry date for information. Whenever we save info, we should be prompted to put in a date when we want it deleted. We should be able to change those dates.

The core of this proposal isn’t the automatic deletion, he says. Rather, the prompting for the date will remind us humans that most information is not of permanent value.

E.g., search engines could offer us an easy way to say how long we should remember searches. Or people could carry a device on their keyring to set expiration dates, perhaps tagging the expiration dates for the images of the people in digital photos.

Any expiry date system must have only two characteristics. First, it must aim at changing the default from remembering back to forgetting. Second, it must remind us of information’s temporal nature.
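A minimal sketch in Python may make the expiry-date idea easier to picture. It is an illustration under stated assumptions, not Viktor's design: the `ForgetfulStore` class and its method names are hypothetical, and refusing to save anything without a date is a simplified stand-in for "being prompted" for one.

```python
from datetime import datetime, timedelta


class ForgetfulStore:
    """Hypothetical store in which forgetting is the default:
    nothing can be saved without an expiry date."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at)

    def save(self, key, value, expires_at):
        # Requiring a date stands in for prompting the user for one.
        if expires_at is None:
            raise ValueError("an expiry date is required")
        self._items[key] = (value, expires_at)

    def change_expiry(self, key, expires_at):
        # Dates can be revised later, as the proposal allows.
        value, _ = self._items[key]
        self._items[key] = (value, expires_at)

    def get(self, key, now):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:
            del self._items[key]  # expired, so forget it for good
            return None
        return value


t0 = datetime(2009, 10, 1)
store = ForgetfulStore()
store.save("search", "cloud atlas", t0 + timedelta(days=30))
before = store.get("search", t0)
after = store.get("search", t0 + timedelta(days=31))
```

Here `before` is the saved query and `after` is `None`. The deletion is the crude part; the point of the proposal is the moment in `save` when a person has to decide how long the information matters.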

Expiry dates are also no silver bullet, and don’t solve digital privacy problems, Viktor says. But they could be useful when used with some of the other proposed solutions.

“Forgetting is often forgotten…Let us remember to forget.”

Q: You don’t mention the propensity of all media to fade over time. Digital memory is not perfect. Also, data is growing so quickly that it gets too expensive to digitally remember everything. The amount of data is growing faster than Moore’s Law.
A: You don’t need much space to remember a billion queries a day. A couple of hundred dollars worth of data storage. And Google’s way of saving data is relatively future-proof.

Q: [me] If we take memory to mean only the human capacity, and digital “memory” to be more like what we usually call storage, then what has actually happened to human memory in the digital age?
A: I chose the term “digital memory” carefully. If I can’t access my VCR tapes easily, they’re pretty much useless to me. Digital stuff is so easily accessible. How has digital remembering changed human remembering? I don’t know. But my argument isn’t that it’s changed human remembering, but that it has changed the external stimuli affecting our memory.

Q: One of the ways a culture forgets is that it lets books go out of print, get moved out of libraries, etc. Now we have Google Books, which will make all books ever printed available (pretty much). Do you see negative effects of this project?
A: I haven’t given it enough thought because authors would like to set their books’ expiry dates very far in the future. Some preliminary research we’re doing on court decisions is showing an interesting effect on memory.
Q: The author of the book isn’t the only one concerned with the info in it. There may be people written about who would want a say…
A: Yes, and the author’s rights aren’t always fully owned by them.

Q: Digital memory has value as cultural memory. The things we’d put expiration dates on have value even if against the interests of the people at the time, because it has social and historic meaning…
A: That’s just conjecture…
Q: No it’s not. We’re leaving traces now all the time. How we put that info to use is a different question.
A: Suppose you’re an author. Shouldn’t you be able to put bad early stories into the trash bin? Why should society have the right to take it from you and preserve it and make it public?
Q: Great point, but we still do struggle with this. Nonetheless, I would recommend we give thought to how these things might sensibly be balanced. E.g., the Iran election twitter stream. Enormous amt of fascinating info has been lost.
A: The solution is built in. For certain contexts, we may be required to mandate a very long expiry date. We do that all the time. I’m arguing for keeping that as the exception to the rule.

Q: I’m a cultural historian, trained as a Medievalist. There’s data scarcity in that field. Who decides about inclusion, preservation, etc.? Institutions have performed the filtering role. Google keeps some types of info and not others. Others are interested in your social security number, etc. So, who are the gatekeepers? There’s power to the Internet Archive’s approach of capturing everything. The stuff that the institutions of memory don’t preserve may turn out to be the most interesting for historians. (I basically buy your core argument, although I’m a believer in the cognitive adjustment.)
A: Brewster Kahle (of the Internet Archive) and I are in general agreement. The Archive sets expiry dates. [Not sure I got that right. Sorry.] My core argument is to give back the choice to the individuals.

Q: I too believe in the cognitive adjustment because I see myself and others already doing that. Sure, you find old emails reminding of something you wanted to forget, but when you accidentally delete some years’ worth, you feel an intense sense of loss.
A: When I lost all my email at the end of 1998, I was completely horrified. But then I discovered it doesn’t really matter. I started out believing the cog adjustment argument, but after I read cog science books, I changed my mind. I want to plug The Seven Sins of Memory, which shows how hard it is to readjust.

Q: Suppose two of us in a shared record have different expiry preferences…
A: I talk about that a lot in the book.

Q: There’s a big diff between what I want to preserve and what others do. The European privacy laws require data deletion. Google and others are now negotiating with the European Commission about this …
A: We need to differentiate between privacy rights and norms.

[missed a couple of questions. sorry.]

Viktor says that he recognizes that expiry dates are a crude instrument. Too binary. “I’d prefer rusting or something like that.” :)

The Dewey Belushi system

Here’s the Onion on the Dewey Decimal Classification system meeting its nemesis, Jim Belushi. (Thanks to Jay Hurvitz for the pointer.)

Libraries sans Dewey

Barbara Fister has a terrific article in Library Journal about libraries that have moved away from the Dewey Decimal Classification (DDC) system, many in favor of some version of the BISAC system that arranges books alphabetically by topic. This is a more bookstore-like approach. The article presents the multiple sides of this discussion, with lots of examples.

The disagreement among librarians is, to my mind, itself evidence that there is no one right way to organize physical objects. Classification is pragmatic. You classify in a way that works, but what works depends upon what you’re trying to do. Libraries serve multiple purposes, so librarians have to make hard decisions. If the DDC isn’t the safe and obvious choice, then libraries have to confront the question of their mission. The classification question quickly becomes existential in the JP Sartre sense.

At the end, she quotes from Everything Is Miscellaneous where I say that the Dewey system “can’t be fixed.” I still think that’s right in its context: No single classification system can work for everyone or for every purpose, although they can be better or worse at what they’re trying to do. In that sense, the DDC can be improved, and the OCLC has continuously improved it. But because it’s premised on assigning a single main category to each book, it is repeating the limitations of the physical world that require physical books each to go on a single shelf. Any single classification is going to be inapt for some purposes, and is going to embody biases constitutive of its culture. It’s the job of a library and of a book store to decide which single way of classifying works best for its patrons, with the obvious recognition that no single way works best for all. Books are miscellaneous. Libraries, bookstores, and the shelves over your desk are not.

Anyway, Barbara’s article is a fascinating look at how libraries are trying to do the best for their patrons, working within the constraints of the physical.