Archive for September, 2009

Herkko Hietanen: Network Recorders and Social Enrichment of Television

Herkko Hietanen, a Berkman Fellow, is giving a talk about TV. “Television is really broken.” It’s not providing what consumers want: programs when we want them, where we want them. It lacks interaction with other viewers and with broadcasters. It has ads. It’s geographically limited. If you had to pitch TV to a venture capitalist, it would have a hard time getting funding.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Herkko gives a brief history of the highlights. VCRs were an early attempt to fix TV. This frightened the broadcasters, who took it to court, where — in the Sony Betamax case — they lost. The court said the manufacturers were not responsible for infringing uses because the devices had substantial non-infringing uses, and time-shifting for personal use was declared a fair use. Satellites extend over-the-air (OTA) broadcast. Community antennas were first set up by stores selling TV sets. Now cable is dominant. But contracts limit innovation in the core. “If you’re afraid you’ll piss off your content provider, you’re not going to do something that’s good for the consumer.”

There has been some innovation in the core. On-demand video. Time-Warner “LookBack” lets you view any show on the day it’s broadcast at any time during that day. Cable also provides a whole lot of channels. But, “Intelligence in the middle stops innovation at the edge.” The industry has litigated against just about everything innovative. E.g., Cablevision wanted to launch a service that would centralize storage rather than putting it in the set-top boxes. Just about everyone sued Cablevision for copyright infringement. The court saw that every user would have their own copy of a saved show and decided that it doesn’t matter where the copies are stored. Herkko says it’s too bad it didn’t go to the Supreme Court so we’d have a definitive decision.

The problem with MythTV, Herkko says, is that it’s not user-friendly. [I spent 1.5 yrs trying to get MythTV to work, and failed :( Wendy Seltzer, seated across the table, has been using MythTV for years.] TiVo is easy but not all that easily hackable. You can’t share TiVo’ed shows, and you can’t modify the code in the box. ReplayTV got sued for having a commercial-skipping feature, and went bankrupt.

Herkko points to living room clutter as another problem with TV today.

Herkko looks forward to PVRs getting connected to the Internet, because connected users create social networks, and they start to innovate. “We want stupid networked recorders and intelligent open client-players.” We want connected and tagged shows. We’ll have interactive TV for real, including gambling. Social groups could recommend what to watch.

This all creates privacy problems. E.g., an MIT study found that researchers could identify gay users with a high degree of accuracy just by analyzing their social networks.

At some point, users will probably start sharing their resources, clustering their recorders. Why should everyone record the same show over and over? Why get it from a central recorder when your neighbors have a copy? Of course, this is what got ReplayTV into trouble, Herkko notes. He thinks that the social interaction around shows will happen before and after the show, because people won’t sit with a keyboard in their laps. [Since I’m on the backchannel as I listen to him, I guess I disagree.]

What about ads? Adding social networks would mean that people could watch ads they actually want to watch.

Overall: TV can be fixed. Social networks. Socially-oriented recorders.

Q: This is a compelling vision of the opposite of the Net. The Net is smart at the edges and dumb in the middle. TV has been the opposite. You seem to hope that the future will invert so consumers can get what they want. But consumers have never gotten what they wanted. What will change it?
A: We need brave entrepreneurs to test it in the courts. Having network recorders isn’t that different from having a VCR.

Q: When you were talking about the keyboard in your lap, I think that’s wrong generationally.
A: Voice works while watching TV. But typing and sharing the screen doesn’t.

Q: You’re talking about what the cable companies will do. But then there’s the stuff in the IP world: mythTV, Boxee, etc. That’s where the exciting stuff is.
A: Innovation at the core is very slow, while innovation at the edge happens very fast.

Q: If the Internet arises to bypass the core, will the quality decline? Will it be more YouTube-style?
A: That’s a real concern. If everyone skips the ads, then there won’t be profit in producing high quality shows. Although there are also premium channels. And in Finland we pay an annual fee and get 4 channels.

Q: There are a lot of forces driving the centralization of TV. With that comes control against innovation at the edges. Is TV going to change or be changed by people sharing content from the edges?
A: If we force a change on TV, the broadcast flag will be re-introduced. Big audiences still demand the lean-back experience.
Q: The sitting back phenomenon has persisted for 50 yrs. Why will it continue?

Q: What is your main research question?
A: When recorders get connected, what sort of innovation are we going to get?

Q: Don’t we need non-Net neutrality to ensure that the video experience over the Net is good enough to inspire innovation in that space?
A: It can be done in other ways. You don’t need immediate delivery of all packets if you’re downloading for viewing later. E.g., in Finland I have a box that records 2 weeks of all 10 channels.

Q: The picture you’re painting is not very TV-like. It’s not broadcast, not one-directional, the business model doesn’t work, we’ll be using our computers…So, it seems like you’re dissolving what TV is. Rather than talking about the “social enrichment of TV” [the title of Herkko’s talk], we should be talking about the visual enrichment of the Internet. E.g., how do you see Hulu, which has some community features?
A: I defined TV at the outset: It’s geographically bounded, it’s broadcast, it’s scheduled, etc. And Hulu takes some of the edge approach, but it’s very much a core app. We’re going to see a big shift of control from the rights owners to consumers.

Sidewiki: Google at the center

I agree with Jeff Jarvis’ critique of Google’s Sidewiki.

Sidewiki is ThirdVoice yet again. Both let you write and read comments on a site — actually on the site — so long as you have the proprietary client. ThirdVoice failed mainly because it couldn’t get enough people to install its client. (Of course, one could ask why enough people weren’t interested in this.) Sidewiki might succeed because it’s part of the vastly popular Google Toolbar. And, as Jeff says, that means it might succeed because Google is using its near ubiquity as a center of the Net. Which is troubling. For example, again as Jeff reports, insofar as the commentary on his site about his Sidewiki post occurs in Sidewiki, Google now owns the comments on his post. Troubling.

I think there are reasons to doubt Sidewiki’s success. As more people add comments, we need good ways to sort through them, to eliminate spam, to decide which types of comments are useful to us. Google is promising us algorithms. But algorithms won’t know that I don’t particularly want to read comments about my friend Jeff’s character, but I am particularly interested in what technologists are saying, or about Net politics, or what my friends are saying, or about how to hack Sidewiki.

Sidewiki has its uses. I’d rather see it connected to social networks, and I’d rather see it provided as an open source browser add-in. But I don’t know who should own the comments and what the control mechanisms should be. This is one of the edges of the Web that defies easy answers because it’s so hard to tell what is the center and what are the sides.

News is a river is a blog…

WLEX-TV in Lexington, Kentucky, an NBC affiliate, has turned its news site into a blog. It actually contains news produced independently of what goes out on broadcast. Very very interesting. It’s a different way of slicing the news, with much debt to Dave Winer’s river of news idea, and it’ll be fascinating to see how and in what ways it’s useful and how it changes our idea of what news should be.

The temptation of stories

Journalism at its best is a way to uncover and communicate the truth, subject to all the usual human limitations. But journalism’s fundamental form, the story itself, brings a special temptation to manipulate the truth for economic or aesthetic reasons. The temptation is resistible to varying degrees, depending on the type of story (the temptations are greater for feature stories than for hard-core reportage of the day’s events), the nature of the journal, and the standing of the journalist. Nevertheless, the temptation is there, built into the form itself.

The very idea that there’s a story is itself a temptation. Maybe the story is on Facebook addiction or the rise in incivility. A journalist who goes back to her editor and says, “Nope, no story there” has disappointed the editor who now has to find another story to fill the hole in the paper newspaper or to feed the maw of the online publication. Not a big deal; it happens all the time. But if it’s the fifth consecutive time that the reporter says there was no story there, it’s getting to be a problem. If it’s the reporter who has suggested the stories in the first place, as is often the case at many publications, she will be judged a failure because she’s wasted her time and gummed up the editor’s planning.

It’s not like it’s supposed to be in science, where a failed hypothesis is as valuable as a proved one, even though of course every scientist would rather discover that a new compound cures cancer than that it doesn’t. A failed hypothesis in the world of journalism is a story that won’t run, that won’t bring in readers, that won’t give businesses a page on which to place an ad. There are real costs when stories fail to pan out. Reporters are thus tempted to make the story work.

Even when the hypothesis of a story is true, journalists almost always reach a place in the story where they know what they want their interviewees to say. An interview is requested of a particular person to provide the “some experts disagree” statement or the “the implications of this are vast” verbiage. If that person doesn’t provide it, someone else will. Depending on the stage of the story, the interviewee may spark interest in a side issue or an approach the reporter hadn’t considered…resulting in someone else being called to provide the other side or the amplification.

This happens at some stage of the story even when the topic is interesting no matter what storyline it takes. For example, the death of Pat Tillman is interesting because it is instantly symbolic: Football star turns down a life of fame and wealth in order to defend his country, and dies a soldier’s death in Afghanistan. Beyond the basic reportage the day that it happened, it was bound to inspire journalistic stories. A reporter could enter with an open mind. Even so, she’ll enter with an open mind looking for an angle, which is to say, looking for a story. Is it a relatively simple narrative of an inspiring patriot who gave his life to support his ideals? Or was there “more” to it? That search for the “more” isn’t simply a hunt for unknown truths. It’s a search for a narrative that reveals the simple surface to be a veneer from which we will learn something unexpected. The reporter may have no idea what the more is, but once she gets a hint of it, she’ll be on it, and the narrative itself — if not personal ambition — will carry her forward. Maybe Tillman wasn’t as virtuous as we thought. Maybe his death wasn’t as straightforward as we were told. Maybe his story was of a life fulfilled or of a life wasted or of a life more complex than we’d thought. Maybe it’s about the government’s cynical use of him, or of the media’s own eagerness to find a hero. But something will emerge. And as it emerges, it gathers its story around it, and the reporter is off looking for the voices who will play certain roles in the story. Why? Because the story demands it.

At the very least, the temptation of journalistic stories is that of all story-telling, the basic way we humans make sense of our world. Stories, not just in journalism, are about the gradual revealing of truth. The surface wasn’t as it seemed. The ending was contained, hidden, in the beginning. What looked continuous was in fact disruptive. Stories have a shape, and story-tellers fit the pieces into that shape. There’s nothing wrong with that, except in an environment where there’s economic and social pressure to produce a story. Then the temptation is to get the pieces to fit. And that can corrode the truth.

So can the simple fact that stories tend towards closure. They end. They’re done. Some circle of understanding has been drawn and closed, tip to tip. The story says, simply by ending, “This is what you needed to know.” There can often be truth in that, but there is always falsity in it. The world, its events, and its people escape even the best of stories.

Stories are not going away from journalism, just as they’re not going away from history, biography, or how we talk about our day over dinner. They’re fundamental. Stories are how we understand, but they also inevitably are constructions, incomplete, and organized around a point of view. All stories are temptations. Journalistic stories have their own special and strong temptations because of their economics and because of the nature of the medium in which they’ve been embodied. Now those economics and that medium are changing, diminishing the old temptations but creating new ones:

::: Because we are increasingly turning to publications that explicitly take a stand, the temptation to include false views for “balance” is diminished. But, the preference for partisan media creates a new temptation: To over-state, in order to attract attention. [Guilty as charged!]

::: The old medium limited the length of stories, forcing unnecessary trimming except in very special circumstances. The new medium has infinite space so that stories can be right-sized. But it turns out that prolixity discourages on-line readers, so the new temptation is toward brevity. It’s not clear if that’s an expression of an impatience that’s always been with us or if the new medium constitutes a new temptation.

::: The old medium’s inability to embed links encouraged journalists to try to encapsulate the world in a single column of text. The new hyperlinked medium can tempt authors to gloss over points and contradictions because they’ve put in some links, putting the burden on readers who are (usually) lazier than the writers.

::: The economics of the old medium tempted publications to appear valuable by being a reliable source of the single truth. While they of course have encouraged discourse on controversial topics, their bread and butter has been stories that “get it right” and thus serve as a stopping point for belief. Stories are the bulwark of authority, and authority is the currency of the old journalistic economics. The new medium now can include as many stories as we want, from as many different points of view, connected by curators above the stories and by hyperlinks within the stories. The story no longer has to tell the whole truth. It’s just one of the stories. But, while that’s true of the ecosystem as a whole, the old temptation to be a single-source truth shop exists for individual online publications, whether they’re commercial or personal.

Now, the form I’ve adopted for this essay, which is itself a type of story-telling, is one of balance: Old temptations matched by new temptations. It’s a form that aims at inspiring trust: “See, I’m presenting both sides!” And that itself can be corrosive. Indeed, in this case it is. While the old temptations are being replaced by new ones, the locus of truth is moving decisively from individual stories and publications to the network of stories and publications. The balancing of temptations misses this most important change. The hyperlinked context of stories creates not only new temptations to go wrong, but a greater possibility for going right.

[berkman] Transforming Scholarly Communication

Lee Dirks [site], Director of Education and Scholarly Communication at Microsoft External Research, is giving a Berkman-sponsored talk on “Transforming Scholarly Communications.” His group works with various research groups “to develop functionality that we think would benefit the community overall,” with Microsoft possibly as a facilitator. (Alex Wade from his group is also here.)

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

He begins by noting the “data deluge.” But computing is stepping up to the problem: massive data sets, the evolution of multicore, and the power of the cloud. We’ll need all that (Lee says) because the workflow for processing all the new info we’re gathering hasn’t kept up with the amount we’re taking in via sensor networks, global databases, laboratory instruments, desktops, etc. He points to the Life Under Your Feet project at Johns Hopkins as an example. They have 200 wireless computers, each with 10 sensors, monitoring air and soil temperature and moisture, and much more. (Microsoft funds it.) Lee recommends Joe Hellerstein’s blog if you’re interested in “the commoditization of massive data analysis.” We’re at the very early stages of this, Lee says. For e-scientists and e-researchers, there’s just too much: too much data, too much workflow, too much “opportunity.”

We need to move upstream in the research lifecycle: 1. collect data and do research, 2. author it, 3. publish, and then 4. store and archive it. That store then feeds future research and analysis. Lee says this four-step lifecycle needs collaboration and discovery. Libraries and archives spend most of their time in stage 4, but they ought to address the problems much earlier on. The most advanced thinkers are working on these earlier stages.

“The trick there is integration.” Some domains are quite proprietary about their data, which makes it hard to establish data and curation standards so that the data can move from system to system. From Microsoft’s perspective, the question is how they can move from static summaries to much richer information vehicles. Why can’t research reports be containers that facilitate reproducible science? A report should let you run its methodology against its data set. Alter the data and see the results, and then share it. Collaborate in real time with other researchers. Capture reputation and influence. Dynamic documents. [cf. Interleaf Active Documents, circa 1990. The dream still lives!]
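
[An aside from me, not from Lee: here is a minimal sketch of the “report as container” idea. The file and column names are invented; the point is just that the report’s numbers are regenerated from its data and methodology, so altering either changes the published result.]

```python
# Hypothetical sketch of a self-recomputing report; results.csv and the
# column name are invented for illustration.
import csv
import statistics

def load_measurements(path="results.csv"):
    with open(path, newline="") as f:
        return [float(row["soil_temp_c"]) for row in csv.DictReader(f)]

def build_summary(temps):
    return {
        "n": len(temps),
        "mean": round(statistics.mean(temps), 2),
        "stdev": round(statistics.stdev(temps), 2),
    }

if __name__ == "__main__":
    summary = build_summary(load_measurements())
    # The prose of the report pulls its figures from here, so changing the
    # data (or the methodology above) changes the published numbers too.
    print("Mean soil temperature: {mean} C (n={n}, sd={stdev})".format(**summary))
```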

On the commercial side, Elsevier has been running an “Article of the Future” competition. Other examples: PLoS Currents: Influenza. Nature Precedings. Google Wave. Mendeley (“iTunes for academic papers”). These are “chinks in the armor of the peer review system.”

Big changes, Lee says. We’ll see more open access and new economic models, particularly adding services on top of content. We’ll see a world in which data is increasingly easily sharable. E.g., the Sloan Digital Sky Survey is a prototype in data publishing: 350M web hits in 6 years, 930K distinct users, 10K astronomers, and 100B rows of data delivered. Likewise GalaxyZoo.org, at which the public can classify galaxies and occasionally discover a new object or two.

Lee points to challenges with data sharing: integrating it, annotating, maintaining provenance and quality, exporting in agreed formats, security. These issues have stopped some from sharing data, and have forced some communities to remain proprietary. “The people who can address these problems in creative ways” will be market leaders moving forward.

Lee points to some existing sharing and analysis services. Swivel, IBM’s Many Eyes, Google’s Gapminder, Freebase, CSA’s Illustra…

The business models are shifting. Publishers are now thinking about data sharing services. IBM and Red Hat provide an interesting model: giving the code away but selling services. Repositories will contain not only the full-text versions of research papers, but also “gray” literature “such as technical reports and theses,” real-time streaming data, images, and software. We need enhanced interoperability protocols.

E.g., Data.gov provides a searchable data catalog that gives access to the raw data and to various tools. Lee also likes WorldWideScience.org, “a global science gateway” to international scientific databases. Sixty to seventy countries are pooling their scientific data and providing federated search.
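
[Another aside from me: one concrete example of the interoperability plumbing this kind of federation needs is OAI-PMH, the standard protocol for harvesting repository metadata over plain HTTP. A minimal sketch follows; the endpoint URL is a placeholder, and resumption-token paging and error handling are omitted.]

```python
# Minimal OAI-PMH harvesting sketch. The endpoint is a placeholder; a real
# harvester would also follow resumption tokens to page through all records.
import urllib.request
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace

def list_titles(base_url="https://repository.example.edu/oai"):
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)
    # Each record carries a Dublin Core description of one deposited item.
    return [el.text for el in tree.iter(DC + "title")]

if __name__ == "__main__":
    for title in list_titles():
        print(title)
```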

Lee believes that semantic computing will provide fantastic results, although it may take a while. He points to Cameron Neylon’s discussion of the need to generate lab report feeds. (Lee says the Semantic Web is just one of the tools that could be used for semantics-based computing.) So, how do we take advantage of this? Recommender systems, as at Last.fm and Amazon. Connotea and BioMedCentral’s Faculty of 1000 are early examples of this [LATER: Steve Pog’s comment below says Faculty of 1000 is not owned by BioMedCentral]. Lee looks forward to the automatic correlation of scientific data and the “smart composition of services and functionality,” in which the computers do the connecting. And we’re going to need the cloud to do this sort of thing, both for the computing power and for the range of services that can be brought to bear on the distributed collection of data.
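
[My aside again: for a sense of what the recommender piece looks like at its simplest, here is a toy “people who read this also read that” sketch. The reading lists are invented; real systems at Last.fm or Amazon scale this up with better similarity measures.]

```python
# Toy co-occurrence recommender; the reading lists are invented.
from collections import Counter
from itertools import combinations

reading_lists = [
    {"paper-a", "paper-b", "paper-c"},
    {"paper-a", "paper-b"},
    {"paper-b", "paper-d"},
]

def co_occurrence(lists):
    counts = Counter()
    for papers in lists:
        for x, y in combinations(sorted(papers), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def recommend(paper, lists, n=3):
    counts = co_occurrence(lists)
    scored = [(other, c) for (p, other), c in counts.items() if p == paper]
    return [other for other, _ in sorted(scored, key=lambda t: -t[1])[:n]]

print(recommend("paper-a", reading_lists))  # ['paper-b', 'paper-c']
```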

Lee spends some time talking about the cloud. Among other points, he points to SciVee and Viddler as interesting examples. Also, SmugMug as a photo aggregator that owns none of its own infrastructure. Also Slideshare and Google Docs. But these aren’t quite what researchers need, which is an opportunity. Also interesting: NSF DataNet grants.

When talking about preservation and provenance, Lee cites DuraSpace and its project, DuraCloud. It’s a cross-repository space with services added. Institutions pay for the service.

Lee ends by pointing to John Wilbanks’ concern about the need for a legal and policy infrastructure that enables and encourages sharing. Lee says that at the end of the day, it’s not software, but providing incentives and rewards to get people to participate.

Q: How soon will this happen?
A: We can’t predict which domains will arise and which ones people will take to.

Q: What might bubble up from the consumer sector?
A: It’s an amazing space to watch. There are lots of good examples already.

Q: [me] It’s great to have you proselytizing outside. But as an internal advocate inside Microsoft, what does Microsoft still have to do, and what’s the pushback?
A: We’ve built 6-8 add-ins for Word for semantic markup, scholarly writing, and consumption of ontologies. A repository platform. An open source foundation separate from Microsoft, contributing to the Linux kernel, etc.

Q: You’d be interested in Dataverse.org.
A: Yes, it sounds like it.

Q: Data is agnostic, but articles aren’t…
A: We’re trying to figure out how to embed and link. But we’re also thinking about how you do it without the old containers, on the Web, in Google Wave, etc.
Q: Are you providing a way to ID relationships?
A: In part. For people using their ordinary tools (e.g., Word), we’re providing ways to import ontologies, share them with the repository or publisher, etc.

Q: How’s auto-tagging coming? The automatic creation of semantically correct output?
A: We’re working on this. A group at Oxford doing cancer research allows researchers to semantically annotate within Excel, so that the spreadsheet points to an ontology that specifies the units, etc. Fluxnet.org is an example of collaborative curation within a single framework.

Q: Things are blurring. Traditionally, libraries collect, select, and preserve scholarly info. What do you think the role of the library will be?
A: I was an academic librarian. In my opinion, the safe world of collecting library journals has been done. We know how to do it. The problem these days is data curation, providing services, working with publishers.
Q: It still takes a lot of money…
A: Definitely. But the improvements are incremental. The bigger advances come further up the stream.

Q: Some cultures will resist sharing…
A: Yes. It’ll vary from domain to domain, and within domains. In some cases we’ll have to wait a generation.

Q: What skills would you give a young librarian?
A: I don’t have a pat answer for you. But, a service orientation would help, building services on top of the data, for example. Multi-disciplinary partnerships.

Q: You’re putting more info online. Are you seeing the benefit of that?
A: Most researchers already have Microsoft software, so we’re not putting the info up in order to sell more. We’re trying to make sure researchers know what’s there for them.

Reuse metadata, don’t reinvent it

Jon Udell has a lovely post talking about an interview with Ian Forrester of the BBC, who cites Tom Scott using a phrase from Michael Smethurst: “The simple joy of webscale identifiers.” The point is that if someone has invented an identifier for an object and you want to point to it, use the existing identifier. That enables a namespace conglomerating that keeps information all huddled and cozy, rather than drifting apart on ice floes.
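
A rough sketch of the pattern, with made-up identifiers (the real version might point at, say, a BBC programme URI): instead of minting your own private ID for something, let your record carry the identifier its owner already published, so everyone else’s data about the same thing lines up with yours for free.

```python
# Sketch only: the URLs and IDs are placeholders, not real records.

# Anti-pattern: invent a local key for something that already has a public identifier.
my_show = {"id": "local-4711", "title": "Some Radio Programme"}

# Pattern: reuse the canonical identifier the owner already minted.
my_show = {
    "id": "https://www.example.org/programmes/abc123",  # the publisher's own URI
    "title": "Some Radio Programme",
    "my_notes": "worth a listen",
}
```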

From Technorati to WordPress tag namespace

The excessively sharp-eyed among you may have noticed that I have recently switched from listing Technorati-linked tags at the end of posts to using WordPress tags. Here’s why. Not that you should care.

When tagging first took off, there weren’t a lot of good places to link your tags to. So, I chose to have them link to Technorati because Technorati was then the leading search engine for blogs. Plus, Technorati had taken the lead in making itself tag-worthy. Plus, Technorati was founded by a friend of mine — David Sifry — whom I trusted (and still do trust) to do the Right Thing. Also, I was on the Technorati board of advisers (uncompensated), so I had some basic familiarity with the site and the people. As a result, when you click on one of my old-style tags, it does a search for tags at Technorati and shows you the results. For example, here’s a tag to try: [Tags: ].

A couple of years ago, WordPress — the blogging software I use — introduced its own tagging capability. Instead of my having to hand-create links to the tags I want to use (actually, I wrote a little javascript to do it for me), I can enter tags and WordPress will turn them into links that aggregate all of my own postings that I’ve tagged that way. At the bottom of this post, you can try out the taxonomy link.
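
For the curious, the old helper’s job was nothing fancy. My script was JavaScript; here’s the same idea sketched in Python purely as an illustration, with the Technorati tag URL pattern recalled from memory rather than checked:

```python
# Illustrative sketch of a tag-link helper; not my actual script, and the
# Technorati URL pattern is from memory.
from urllib.parse import quote

def tag_links(tags):
    """Turn a list of tag names into the HTML for a [Tags: ...] line."""
    links = [
        '<a href="http://technorati.com/tag/{0}" rel="tag">{1}</a>'.format(quote(t), t)
        for t in tags
    ]
    return "[Tags: " + " ".join(links) + "]"

print(tag_links(["e-gov", "tagging"]))
```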

This is a further step into narcissism, for rather than seeing what the rest of the world has tagged “e-gov” (or whatever), you now see only my posts tagged that way. But I suspect that is probably what most users expect and want when they click on a tag at the bottom of a post. If you want to search all posts by everyone that have a certain tag, Technorati and other sites will do it for you.

(By the way, many thanks to Brad Sucks for writing the scripts that extracted my old tags and auto-inserted them as WordPress tags. He says the scripts are too focused to be of general use, so don’t ask. But do buy his music.)

Making the most of government data

The Sunlight Foundation has picked two winning mashups in its contest:

Washington, DC – The Sunlight Foundation awarded Datamasher.org with the grand prize of $10,000 for Sunlight’s Apps for America 2: The Data.gov Challenge. Datamasher.org is a Web application designed by Forum One Communications that lets anyone–no programming background required–choose different government data sets and mash them up to create visualizations and compare results on a state by state basis. Clay Johnson, director of Sunlight Labs, announced the winners and distributed over $25,000 in awards late yesterday at the Gov 2.0 Expo hosted by O’Reilly Media and TechWeb.

Sunlight created the Apps for America 2: The Data.gov Challenge to solicit creative Web applications based on the information available at Data.gov, the new central depository for government data created by Federal Chief Information Officer Vivek Kundra. It was inspired by Sunlight’s commitment to use new tools to make the work of the federal government more transparent.
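
To give a concrete flavor of the kind of state-by-state mashup Datamasher supports, here is a hedged sketch. The file and column names are invented, not Data.gov’s actual formats; the core move is just joining two per-state data sets and deriving a new column:

```python
# Invented file and column names; the point is the join-and-derive pattern,
# not Data.gov's actual formats.
import csv

def by_state(path, value_col):
    with open(path, newline="") as f:
        return {row["state"]: float(row[value_col]) for row in csv.DictReader(f)}

def mash(numerator, denominator):
    """Combine two per-state series into a ratio, state by state."""
    return {s: numerator[s] / denominator[s]
            for s in numerator if s in denominator and denominator[s]}

if __name__ == "__main__":
    spending = by_state("education_spending.csv", "dollars")
    population = by_state("population.csv", "people")
    for state, per_capita in sorted(mash(spending, population).items()):
        print(f"{state}: ${per_capita:,.0f} per person")
```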

[Tags: ]

Google Books metadata: Google responds

There’s a terrific colloquy between Google and Geoff Nunberg in response to Geoff’s critique of Google’s handling of the metadata attached to the books Google is digitizing (which I blogged about here). It’s fascinating for its content, but also very cool as a conversation between a company and its market. Of course, it would have been even better if Google had initiated this conversation when it started its digitization project.

[Tags: ]

Data and metadata: Together again

Terry Jones has an excellent post that lists the problems introduced by maintaining a hard distinction between metadata and data.

Terry cites Everything Is Miscellaneous (thanks, Terry), which argues that the distinction, which is hard-coded in the Age of Databases, becomes a merely functional difference in the Age of Messy Links: Metadata is what you know and data is what you’re looking for. For example, the year of a CD is metadata about the CD if you know the year a Bob Dylan CD came out but you don’t remember the title, and the title can be metadata if you know the title but want to find the year. And in both cases, it could all be metadata in your search for lyrics.
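
Here’s a tiny sketch of that functional (rather than fixed) distinction, using invented records: whichever field you already know plays the metadata role for that query, and whichever field you want back is, for the moment, the data.

```python
# Invented records; the point is that any field can play the "metadata" role.
albums = [
    {"artist": "Bob Dylan", "title": "Blood on the Tracks", "year": 1975},
    {"artist": "Bob Dylan", "title": "Time Out of Mind", "year": 1997},
]

def lookup(records, known, wanted):
    """Search by what you know (metadata) to find what you're after (data)."""
    return [r[wanted] for r in records if all(r[k] == v for k, v in known.items())]

# Know the year, want the title: the year is acting as metadata.
print(lookup(albums, {"year": 1997}, "title"))                   # ['Time Out of Mind']
# Know the title, want the year: now the title is the metadata.
print(lookup(albums, {"title": "Blood on the Tracks"}, "year"))  # [1975]
```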

This is all very squishy and messy because the distinction is, as Terry says, artificial. It comes from thinking about experience as content that gets processed, as if we worked the way computers do. More exactly, it comes from thinking about experience as a set of Experience Atoms that then have to be assembled; metadata are the labels that tell you that Atom A goes into Atom Z. But experience is far more like language than like particle physics or Ikea assembly instructions. And that’s for a very good reason: linguistic creatures’ experience cannot be understood apart from language. Language doesn’t neatly separate into content and meta-content. It all comes together and it’s all intertwingled. Language is so very non-atomic that it makes atoms realize how lonely they’ve been.

That doesn’t mean that computer software that separates metadata from data is useless. Lord knows I love a good database. But it also means that computer software that can treat anything as metadata depending on what we’re trying to do opens up some interesting possibilities…

[Tags: ]
