
Et tu, U2?

A few days ago, when Apple pushed the latest from U2 into everyone’s iTunes library, you could hear the Internet pause as it suddenly realized that Apple is its parents’ age.

Now in the ad-promotion succubus occupying the body of what used to be Time Magazine, you can see U2 desperate to do exactly the wrong thing: insisting that it wasn’t a gift at all. You can learn more about this in the hilariously titled cover article of Time: “The veteran rock band faces the future.” This is a future in which tracks we don’t like are bundled with tracks we do (the return of the CD format) and people who share with their fans are ruining it for U2, boohoo.

Or, as Bono recently said, “We were paid” for the Apple downloads, adding, “I don’t believe in free music. Music is a sacrament.” And as everyone knows, sacraments need to be purchased at a fair market value, the results of which Bono, as a deeply spiritual artist, secures in sacred off-shore accounts.

In my head I hear Bono, enraged by the increasingly bad publicity, composing a message that he posts without first running it through his phalanx of PR folks:

Dear fans:

You have recently received a copy of our latest album, Songs of Innocence, in your iTunes library. U2 understands you may be confused or even upset by this. So, let me clarify once and for all the most important point about this — if I may humbly say so — eternal masterpiece. It was not our intention to cause you stress or to make you wonder whether you have the musical sensitivity to fully grasp (if I may humbly say) the greatness of our work. But most important, it is essential above all that you understand that it was not our intention to give you a gift. No freaking way.

We understand your mistake. You are, after all, just fans, and you don’t play in the Jetstream world of global music. As I said to my dear friend Nelson Mandela (friend is too weak a word; I was his mentor) shortly before he passed, music is a sacrament, just like tickets to movies, especially ones with major stars working for scale, or like the bill at a restaurant where you and any two of the Clintons (Chelsea, you are a star! Give yourself that!) are plotting goodness.

To tell you the truth, I’m disappointed in you. No, worse. I’m hurt. Personally hurt. How dare you think this was a gift! After all these years, is that all U2 is worth to you? Nothing? Our music has all the value of a CrackerJacks trinket or a lower-end Rolex in an awards show gift bag? Do you not understand that Apple paid us for every copy they distributed? We were paid for it, sheeple! Massive numbers of dollars were transferred into our bank accounts! More dollars than you could count, you whiny little “Ooh look at me I’m sharing” wankers! We’re U2 dammit! We don’t need you! You need us! MONEY IS LOVE! EXTRA-ORDINARY LOVE!!!!!!

Have a beautiful day.

Meanwhile, as always, Amanda Palmer expresses the open-hearted truth about this issue. It almost makes me regret making fun of Bono. Almost.

This is one of the most amazing examples I’ve seen of the complexity of even simple organizational schemes. “Unicode Collation Algorithm (Unicode Technical Standard #10)” spells out in precise detail how to sort strings in what we might colloquially call “alphabetical order.” But it’s way, way, way more complex than that.

Unicode is an international standard for how strings of characters get represented within computing systems. For example, in the familiar ASCII encoding, the letter “A” is represented in computers by the number 65. But ASCII is too limited to encode the world’s alphabets. Unicode does the job.
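The mapping mentioned above can be checked directly from any Python prompt (a trivial illustration, not part of the standard itself):

```python
# The ASCII/Unicode mapping described above, verified directly:
assert ord("A") == 65      # "A" is codepoint 65 in both ASCII and Unicode
assert chr(248) == "ø"     # ø is U+00F8 (decimal 248), beyond ASCII's 0-127 range
```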

As the paper says, “Collation is the general term for the process and function of determining the sorting order of strings of characters” so that, for example, users can look them up on a list. Alphabetical order is a simple form of collation.

Sorting inconsistent alphabets is, well, a problem. But let Technical Standard #10 explain the problem:

It is important to ensure that collation meets user expectations as fully as possible. For example, in the majority of Latin languages, ø sorts as an accented variant of o, meaning that most users would expect ø alongside o. However, a few languages, such as Norwegian and Danish, sort ø as a unique element after z. Sorting “Søren” after “Sylt” in a long list, as would be expected in Norwegian or Danish, will cause problems if the user expects ø as a variant of o. A user will look for “Søren” between “Sorem” and “Soret”, not see it in the selection, and assume the string is missing, confused because it was sorted in a completely different location.
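You can watch this collision happen in a few lines of Python. The example below is a crude sketch, not the real Unicode Collation Algorithm: the two-level key that folds ø into o is my own invented tailoring, just to show how a collation can treat ø as a variant of o rather than sorting by raw codepoint.

```python
# A minimal sketch of why collation is harder than codepoint order.
words = ["Sorem", "Soret", "Sylt", "Søren"]

# Naive codepoint sort: ø (U+00F8, 248) is greater than z (122), so "Søren"
# lands after "Sylt" -- what a Danish or Norwegian reader expects, but a
# surprise to readers who expect ø filed alongside o.
assert sorted(words) == ["Sorem", "Soret", "Sylt", "Søren"]

# A crude "ø is a variant of o" tailoring (hypothetical two-level sort key,
# not the real UCA): the primary key folds ø to o; the original string
# breaks ties at a secondary level.
def latin_key(s):
    return (s.replace("ø", "o"), s)

assert sorted(words, key=latin_key) == ["Sorem", "Søren", "Soret", "Sylt"]
```

The same four names come out in two different orders, and both orders are “alphabetical” — that’s the whole problem the standard is wrestling with.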

Heck, some French dictionaries even sort their accents in reverse order. (See Section 1.3.)

But that’s nothing. Here’s a fairly random paragraph from further into this magnificent document (section 7.2):

In the DUCET, characters are given tertiary weights according to Table 17. The Decomposition Type is from the Unicode Character Database [UAX44]. The Case or Kana Subtype entry refers either to a case distinction or to a specific list of characters. The weights are from MIN = 2 to MAX = 1F₁₆, excluding 7, which is not used for historical reasons.

Or from section 8.2:

Users often find asymmetric searching to be a useful option. When doing an asymmetric search, a character (or grapheme cluster) in the query that is unmarked at the secondary and/or tertiary levels will match a character in the target that is either marked or unmarked at the same levels, but a character in the query that is marked at the secondary and/or tertiary levels will only match a character in the target that is marked in the same way.
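In plainer terms: a query typed without accents should match both accented and unaccented text, but a query typed with accents should match only accented text. Here’s a toy sketch of that behavior (my own simplified character-by-character version, using Unicode decomposition to detect “marked” characters — not the standard’s actual weight-based algorithm):

```python
import unicodedata

def base(s):
    """Strip combining marks: 'é' -> 'e'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

def asymmetric_match(query, target):
    # Sketch of asymmetric search: an unmarked query character matches a
    # marked or unmarked target character; a marked query character matches
    # only the identically marked target character.
    q = unicodedata.normalize("NFC", query)
    t = unicodedata.normalize("NFC", target)
    if len(q) != len(t):
        return False
    for qc, tc in zip(q, t):
        if qc == tc:
            continue                      # exact match
        if qc == base(qc) and base(tc) == qc:
            continue                      # unmarked query char matches marked target
        return False
    return True

assert asymmetric_match("resume", "résumé")      # unmarked query matches both
assert not asymmetric_match("résumé", "resume")  # marked query demands marks
```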

You may think I’m being snarky. I’m not at all. This document dives resolutely into the brambles and does not give up. It incidentally exposes just how complicated even the simplest sorting task is when looked at in its full context, where that context is history, language, culture, and the ambiguity in which they thrive.

Two percent of Harvard’s library collection circulates every year. A high percentage of the works that are checked out are the same as the books that were checked out last year. This fact can cause reflexive tsk-tsking among librarians. But — with some heavy qualifications to come — this is as it should be. The existence of a Long Tail is not a sign of failure or waste. To see this, consider what it would be like if there were no Long Tail.

Harvard’s 73 libraries have 16 million items [source]. There are 21,000 students and 2,400 faculty [source]. If we guess that half of the library items are available for check-out, which seems conservative, that would mean that 160,000 different items are checked out every year. If there were no Long Tail, then no book would be checked out more than any other. In that case, it would take the Harvard community an even fifty years before anyone would have read the same book as anyone else. And a university community in which across two generations no one has read the same book as anyone else is not a university community.
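The arithmetic above is easy to check (using only the post’s own rough assumptions — the 2% figure and the half-available guess):

```python
# Back-of-the-envelope check of the no-Long-Tail scenario, using the post's
# own assumptions: 16M items, half available for checkout, 2% circulating.
total_items = 16_000_000
available = total_items // 2                    # guess: half can be checked out
checked_out_per_year = int(available * 0.02)    # 2% circulates annually

# With no Long Tail, no item is checked out twice until every item has
# been checked out once:
years_until_overlap = available // checked_out_per_year

assert checked_out_per_year == 160_000
assert years_until_overlap == 50
```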

I know my assumptions are off. For example, I’m not counting books that are read in the library and not checked out. But my point remains: we want our libraries to have nice long tails. Library long tails are where culture is preserved and discovery occurs.

And, having said that, it is perfectly reasonable to work to lower the difference between the Fat Head and the Long Tail, and it is always desirable to help people find the treasures in the Long Tail. Which means this post is arguing against a straw man: no one actually wants to get rid of the Long Tail. But I prefer to put it that this post argues against a reflex of thought I find within myself and have encountered in others. The Long Tail is a requirement for the development of culture and ideas, and at the same time, we should always help users to bring riches out of the Long Tail.

There’s a terrific article by Helen Vendler in the March 24, 2014 New Republic about what we can learn about Emily Dickinson by exploring her handwritten drafts. Helen is a Dickinson scholar of serious repute, and she finds revelatory significance in the words that were crossed out, replaced, or listed as alternatives, in the physical arrangement of the words on the page, etc. For example, Prof. Vendler points to the change of a line in “The Spirit”: “What customs hath the Air?” became “What function hath the Air?” She says that this change points to a more “abstract, unrevealing, even algebraic” understanding of “the future habitation of the spirit.”

Prof. Vendler’s source for many of the poems she points to is Emily Dickinson: The Gorgeous Nothings, by Marta Werner and Jen Bervin, the book she is reviewing. But she also points to the new online Dickinson collection from Amherst and Harvard. (The site was developed by the Berkman Center’s Geek Cave.)


Unfortunately, the New Republic article is not available online. I very much hope that it will be, since it provides such a useful way of reading the materials in the online Dickinson collection, which are themselves available under a Creative Commons license that enables non-commercial use without asking permission.

CityCodesAndOrdinances.xml

A friend is looking into the best way for a city to publish its codes and ordinances to make them searchable and reusable. What are the best schemas or ontologies to use?

I work in a law school library so you might think I’d know. Nope. So I asked a well-informed mailing list. Here’s what they have suggested, more or less in their own words:


Any other suggestions?

Schema.org…now for datasets!

I had a chance to talk with Dan Brickley today, a semanticizer of the Web whom I greatly admire. He’s often referred to as a co-creator of FOAF, but these days he’s at Google working on Schema.org. He pointed me to the work Schema has been doing with online datasets, which I hadn’t been aware of. Very interesting.

Schema.org, as you probably know, provides a set of terms you can hide inside the HTML of your page that annotate what the visible contents are about. The major search engines — Google, Bing, Yahoo, Yandex — notice this markup and use it to provide more precise search results, and also to display results in ways that present the information more usefully. For example, if a recipe on a page is marked up with Schema.org terms, the search engine can identify the list of ingredients and let you search on them (“Please find all recipes that use butter but not garlic”) and display them in a more readable way. And of course it’s not just the search engines that can do this; any app that is looking at the HTML of a page can also read the Schema markup. There are Schema.org schemas for an ever-expanding list of types of information…and now datasets.

If you go to Schema.org/Dataset and scroll to the bottom where it says “Properties from Dataset,” you’ll see the terms you can insert into a page that talk specifically about the dataset referenced. It’s quite simple at this point, which is an advantage of Schema.org overall. But you can see some of the power of even this minimal set of terms over at Google’s experimental Schema Labs page where there are two examples.
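To make that concrete, here’s a sketch of what such markup might look like on a city’s open-data page. The property names (name, description, distribution) come from Schema.org/Dataset; the page content and URL are invented for illustration:

```html
<!-- Hypothetical page fragment using Schema.org/Dataset microdata terms;
     the dataset and URL are invented, the property names are Schema.org's. -->
<div itemscope itemtype="http://schema.org/Dataset">
  <h1 itemprop="name">City Parking Meter Locations</h1>
  <p itemprop="description">Locations and rates for all metered parking spaces.</p>
  <a itemprop="distribution" href="http://example.org/parking.csv">Download as CSV</a>
</div>
```

A crawler that understands Schema.org can now tell this page describes a dataset, what it’s called, and where the downloadable file lives — without guessing from the visible prose.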

The first example (click on the “view” button) does a specialized Google search looking for pages that have been marked up with Schema’s Dataset terms. In the search box, try “parking,” or perhaps “military.” Clicking on a return takes you to the original page that provides access to the dataset.

The second demo lets you search for databases related to education via the work done by LRMI (Learning Resource Metadata Initiative); the LRMI work has been accepted (except for the term useRightsUrl) as part of Schema.org. Click on the “view” button and you’ll be taken to a page with a search box, and a menu that lets you search the entire Web or a curated list. Choose “entire Web” and type in a search term such as “calculus.”

This is such a nice extension of Schema.org. Schema was designed initially to let computers parse information on human-readable pages (“Aha! ‘Butter’ on this page is being used as a recipe ingredient and on that page as a movie title”), but now it can be used to enable computers to pull together human-readable lists of available datasets.

I continue to be a fan of Schema because of its simplicity and pragmatism, and, because the major search engines look for Schema markup, people have a compelling reason to add markup to their pages. Obviously Schema is far from the only metadata scheme we need, nor does it pretend to be. But for fans of loose, messy, imperfect projects that actually get stuff done, Schema is a real step forward that keeps taking more steps forward.

Here’s a recipe for a Manhattan cocktail that I like. The idea of adding Kahlua came from a bartender in Philadelphia. I call it a Bogotá Manhattan because of the coffee.

You can’t tell by looking at this post that it’s marked up with Schema.org codes, unless you View Source. These codes let the search engines (and any other computer program that cares to look) recognize the meaning of the various elements. For example, the line “a splash of Kahlua” actually reads:

<span itemprop="ingredients">a splash of Kahlua</span>

“itemprop=ingredients” says that the visible content is an ingredient. This does not help you as a reader at all, but it means that a search engine can confidently include this recipe when someone searches for recipes that contain Kahlua. Markup makes the Web smarter, and Schema.org is a lightweight, practical way of adding markup, with the huge incentive that the major search engines recognize Schema.

So, here goes:

Bogotá Manhattan

A variation on the classic Manhattan — a bit less bitter, and a bit more complex.

Prep Time: 3 minutes
Yield: 1 drink

Ingredients:

  • 1 shot bourbon

  • 1 shot sweet Vermouth

  • A few shakes of Angostura bitters

  • A splash of Kahlua

  • A smaller splash of grenadine or maraschino cherry juice

  • 1 maraschino cherry and/or small slice of orange as garnish. Delicious garnish.

Instructions:

Shake together with ice. Strain and serve in a martini glass, or (my preference) violate all norms by serving in a small glass with ice.

Here’s the Schema.org markup for recipes.
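For the curious, here’s roughly what the markup behind a recipe post like this one looks like. This is an abbreviated sketch, not this post’s actual source; the property names (name, prepTime, recipeYield, ingredients, recipeInstructions) are from Schema.org’s Recipe type:

```html
<!-- Abbreviated sketch of Schema.org Recipe microdata for the drink above -->
<div itemscope itemtype="http://schema.org/Recipe">
  <h2 itemprop="name">Bogotá Manhattan</h2>
  <meta itemprop="prepTime" content="PT3M">
  <span itemprop="recipeYield">1 drink</span>
  <ul>
    <li itemprop="ingredients">1 shot bourbon</li>
    <li itemprop="ingredients">1 shot sweet Vermouth</li>
    <li itemprop="ingredients">a splash of Kahlua</li>
  </ul>
  <span itemprop="recipeInstructions">Shake with ice; strain into a martini glass.</span>
</div>
```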

Are tags over-rated?

Jeff Atwood [twitter:codinghorror], a founder of Stackoverflow and Discourse.org — two of my favorite sites — is on a tear about tags. Here are his two tweets that started the discussion:

I am deeply ambivalent about tags as a panacea based on my experience with them at Stack Overflow/Exchange. Example: pic.twitter.com/AA3Y1NNCV9

Here’s a detweetified version of the four-part tweet I posted in reply:

Jeff’s right that tags are not a panacea, but who said they were? They’re a tool (frequently most useful when combined with an old-fashioned taxonomy), and if a tool’s not doing the job, then drop it. Or, better, fix it. Because tags are an abstract idea that exists only in particular implementations.

After all, one could with some plausibility claim that online discussions are the most overrated concept in the social media world. But still they have value. That indicates an opportunity to build a better discussion service. … which is exactly what Jeff did by building Discourse.org.

Finally, I do think it’s important — even while trying to put tags into a less over-heated perspective [do perspectives overheat??] — to remember that when first introduced in the early 2000s, tags represented an important break with an old and long tradition that used the authority to classify as a form of power. Even if tagging isn’t always useful and isn’t as widely applicable as some of us thought it would be, tagging has done the important work of telling us that we as individuals and as a loose collective now have a share of that power in our hands. That’s no small thing.

A few times in the course of Derek Attig’s really interesting talk on the history of bookmobiles yesterday, he pointed out how the route map of the early bookmobiles (and later ones, too) resembles a network map. He did this to stress that the history of bookmobiles is not simply a history of vehicles, but rather should be understood in terms of those vehicles’ social effect: creating and connecting communities.

I like this point, and I don’t mean to suggest that Derek carried the analogy too far. Not at all. But, it is an excellent example of how we are reinterpreting everything in terms of networks, just as we had previously interpreted everything in terms of computers and programs and information, and before that in terms of telephone networks, and before that…and before that…and before that….

Cultural paradigm shift? Underway!

I’m at a Berkman lunchtime talk on crowdsourcing curation. Jeffrey Schnapp, Matthew Battles [twitter:matthewBattles], and Pablo Barria Urenda are leading the discussion. They’re from the Harvard metaLab.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Matthew Battles begins by inviting us all to visit the Harvard center for Renaissance studies in Florence, Italy. [Don't toy with us, Matthew!] There’s a collection there, curated by Bernard Berenson, of 16,000 photos documenting art that can’t be located, which Berenson called “Homeless Paintings of the Italian Renaissance.” A few years ago, Mellon sponsored the digitization of this collection, to be made openly available. One young man, Chris Daley [sp?] has since found about 120 of the works. [This is blogged at the metaLab site.]

These 16,000 images are available at Harvard’s VIA image manager [I think]. VIA is showing its age. It doesn’t support annotation, etc. There are some cultural crowdsourcing projects already underway, e.g., Zooniverse’s Ancient Lives project for transcribing ancient manuscripts. metaLab is building a different platform: Curarium.com.

Matthew hands off to Jeffrey Schnapp. He says Curarium will allow a diverse set of communities (archivist, librarian, educator, the public, etc.) to animate digital collections by providing tools for doing a multiplicity of things with those collections. We’re good at making collections, he says, but not as good at making those collections matter. Curarium should help take advantage of the expertise of distributed communities.

What sort of things will Curarium allow us to do? (A beta should be up in about a month.) Add metadata, add meaning to items…but also work with collections as aggregates. VIA doesn’t show relations among items. Curarium wants to make collections visible and usable at the macro and micro levels, and to tell stories (“spotlights”).

Jeffrey hands off to Pablo, who walks us through the wireframes. Curarium will ingest records and make them interoperable. They take in records in JSON format, and extract the metadata they want. (They save the originals.) They’re working on how to give an overview of the collection; “When you have 11,000 records, thumbnails don’t help.” So, you’ll see a description and visualizations of the cloud of topic tags and items. (The “Homeless” collection has 2,000 tags.)
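[The ingest-and-extract step Pablo describes might look something like this in miniature. The field names and record are my invention, not Curarium’s actual schema:]

```python
import json

# Hypothetical sketch of the ingestion step: take a record in JSON,
# keep the original intact, and extract just the metadata the platform uses.
# Field names are invented for illustration, not Curarium's real schema.
record_json = """{"id": "berenson-0042",
                  "title": "Madonna and Child",
                  "tags": ["panel", "tuscany"],
                  "extra": {"scan_dpi": 600}}"""

def ingest(raw):
    original = json.loads(raw)        # the full record is saved as-is
    metadata = {                      # pull out only the shared, queryable fields
        "id": original["id"],
        "title": original["title"],
        "tags": original.get("tags", []),
    }
    return original, metadata

original, metadata = ingest(record_json)
assert metadata["tags"] == ["panel", "tuscany"]
assert "extra" not in metadata        # extras survive only in the saved original
```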

At the item level, you can annotate, create displays of selected content (“‘Spotlights’ are selections of records organized as thematized content”) in various formats (e.g., slideshow, more academic style, etc.). There will be a rich way of navigating and visualizing. There will be tools for the public, researchers, and teachers.

Q&A

Q: [me] How will you make the enhanced value available outside of Curarium? And, have you considered using Linked Data?

A: We’re looking into access. The data we have is coming from other places that have their own APIs, but we’re interested in this.

Q: You could take the Amazon route by having your own system use API’s, and then make those API’s open.

Q: How important is the community building? E.g., Zooniverse succeeds because people have incentives to participate.

A: Community-building is hugely important to us. We’ll be focusing on that over the next few months as we talk with people about what they want from this.

A: We want to expand the scope of conversation around cultural history. We’re just beginning. We’d love teachers in various areas — everything from art history to history of materials — to start experimenting with it as a teaching tool.

Q: The spotlight concept is powerful. Can it be used to tell the story of an individual object. E.g., suppose an object has been used in 200 different spotlights, and there might be a story in this fact.

A: Great question. Some of the richness of the prospect is perhaps addressed by expectations we have for managing spotlights in the context of classrooms or networked teaching.

Q: To what extent are you thinking differently than a standard visual library?

A: On the design side, what’s crucial about our approach is the provision for a wide variety of activities, within the platform itself: curate, annotate, tell a story, present it… It’s a CMS or blogging platform as well. The annotation process includes bringing in content from outside of the environment. It’s a porous platform.

Q: To what extent can users suggest changes to the data model. E.g., Europeana has a very rigid data model.

A: We’d like a significant user contribution to metadata. [Linked Data!]

Q: Are we headed for a bifurcation of knowledge? Dedicated experts and episodic amateurs. Will there be a curator of curation? Am I unduly pessimistic?

A: I don’t know. If we can develop a system, maybe with Linked Data, we can have a more self-organizing space that is somewhere in between harmony and chaos. E.g., Wikimedia Loves Monuments is a wonderful crowd curatorial project.

Q: Is there anything this won’t do? What’s out of scope?

A: We’re not providing tools for creating animated gifs. We don’t want to become a platform for high-level presentations. [metaLab's Zeega project does that.] And there’s a spectrum of media we’ll leave alone (e.g., audio) because integrating them with other media is difficult.

Q: How about shared search, i.e., searching other collections?

A: Great idea. We haven’t pursued this yet.

Q: Custodianship is not the same as meta-curation. Chris Daly could become a meta-curator. Also, there’s a lot of great art curation at Pinterest. Maybe you should be doing this on top of Pinterest? Maybe build spotlight tools for Pinteresters?

A: Great idea. We already do some work along those lines. This project happens to emerge from contact with a particular collection, one that doesn’t have an API.

Q: The fact that people are re-uploading the same images to Pinterest is due to the lack of standards.

Q: Are you going to be working on the vocabulary, or let someone else worry about that?

A: So far, we’re avoiding those questions…although it’s already a problem with the tags in this collection.

[Looks really interesting. I'd love to see it integrate with the work the Harvard Library Interoperability Initiative is doing.]
