
It’s good to have Hoder — Hossein Derakhshan — back. After spending six years in an Iranian jail, his voice is stronger than ever. The changes he sees in the Web he loves are distressingly real.

Hoder was in the cohort of early bloggers who believed that blogs were how people were going to find their voices and themselves on the Web. (I tried to capture some of that feeling in a post a year and a half ago.) Instead, in his great piece on Medium he describes what the Web looks like to someone extremely off-line for six years: endless streams of commercial content.

Some of the decline of blogging was inevitable. This was made apparent by Clay Shirky’s seminal post that showed that the scaling of blogs was causing them to follow a power law distribution: a small head followed by a very long tail.

Blogs could never do what I, and others, hoped they would. When the Web started to become a thing, it was generally assumed that everyone would have a home page that would be their virtual presence on the Internet. But home pages were hard to create back then: you had to know HTML, you had to find a host, you had to be so comfortable with FTP that you’d use it as a verb. Blogs, on the other hand, were incredibly easy. You went to one of the blogging platforms, got yourself a free blog site, and typed into a box. In fact, blogging was so easy that you were expected to do it every day.

And there’s the rub. The early blogging enthusiasts were people who had the time, skill, and desire to write every day. For most people, that hurdle is higher than learning how to FTP. So, blogging did not become everyone’s virtual presence on the Web. Facebook did. Facebook isn’t for writers. Facebook is for people who have friends. That was a better idea.

But bloggers still exist. Some of the early cohort have stopped, or blog infrequently, or have moved to other platforms. Many blogs now exist as part of broader sites. The term itself is frequently applied to professionals writing what we used to call “columns,” which is a shame since part of the importance of blogging was that it was a way for amateurs to have a voice.

That last value is worth preserving. It’d be good to boost the presence of local, individual, independent bloggers.

So, support your local independent blogger! Read what she writes! Link to it! Blog in response to it!

But I wonder if a little social tech might also help. What follows is a half-baked idea. I think of it as BOAB: Blogger of a Blogger.

Yeah, it’s a dumb name, and I’m not seriously proposing it. It’s an homage to Libby Miller [twitter:LibbyMiller] and Dan Brickley’s [twitter:danbri] FOAF — Friend of a Friend — idea, which was both brilliant and well-named. While social networking sites like Facebook maintain a centralized, closed network of people, FOAF enables open, decentralized social networks to emerge. Anyone who wants to participate creates a FOAF file and hosts it on her site. Your FOAF file lists who you consider to be in your social network — your friends, family, colleagues, acquaintances, etc. It can also contain other information, such as your interests. Because FOAF files are typically open, they can be read by any application that wants to provide social networking services. For example, an app could see that Libby’s FOAF file lists Dan as a friend, and that Dan’s lists Libby, Carla and Pete. And now we’re off and running in building a social network in which each person owns her own information in a literal and straightforward sense. (I know I haven’t done justice to FOAF, but I hope I haven’t been inaccurate in describing it.)
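Stripped of the RDF details, the FOAF idea can be sketched in a few lines. This is a toy model, not real FOAF — actual FOAF files are RDF documents using terms like foaf:knows — but it shows how an app could walk openly published friend lists, each hosted by its own person, and assemble a network that nobody centrally owns:

```python
# A toy, FOAF-flavored social graph. Each entry stands in for a file a
# person publishes on her own site; any app can read them all.
foaf_files = {
    "Libby": {"knows": ["Dan"], "interests": ["semantic web"]},
    "Dan": {"knows": ["Libby", "Carla", "Pete"], "interests": ["RDF"]},
}

def network_of(person, files):
    """Walk the published friend lists to assemble a decentralized network."""
    seen, frontier = set(), [person]
    while frontier:
        who = frontier.pop()
        if who in seen:
            continue
        seen.add(who)
        # People without a published file (Carla, Pete) are still nodes.
        frontier.extend(files.get(who, {}).get("knows", []))
    return seen

print(sorted(network_of("Libby", foaf_files)))
# ['Carla', 'Dan', 'Libby', 'Pete']
```

No central service had to exist for that traversal to work, which is the whole point.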

BOAB would do the same, except it would declare which bloggers I read and recommend, just as the old “blogrolls” did. This would make it easier for blogging aggregators to gather and present networks of bloggers. Add in some tags and now we can browse networks based on topics.

In the modern age, we’d probably want to embed BOAB information in the HTML of a blog rather than in a separate file hidden from human view, although I don’t know what the best practice would be. Maybe both. Anyway, I presume that the information embedded in HTML would be similar to what Schema.org does: information about what a page talks about is inserted into the HTML tags using a specified vocabulary. The great advantage of Schema.org is that the major search engines recognize and understand its markup, which means the search engines would be in a position to discover the initial blog networks.
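For what it’s worth, here’s a guess at what embedded BOAB markup and a consumer of it might look like. To be clear, none of this is real Schema.org vocabulary — the “blogroll” property and the data-topics attribute are my inventions for the sake of the sketch:

```python
from html.parser import HTMLParser

# Ordinary blogroll links, annotated with a *hypothetical* Schema.org-style
# property plus optional topic tags. "blogroll" is not a real Schema.org term.
page = """
<ul>
  <li><a href="http://example.com/blogA" itemprop="blogroll" data-topics="libraries">Blog A</a></li>
  <li><a href="http://example.com/blogB" itemprop="blogroll" data-topics="music,web">Blog B</a></li>
  <li><a href="http://example.com/about">About me</a></li>
</ul>
"""

class BlogrollParser(HTMLParser):
    """Collect (url, topics) pairs from links marked as blogroll entries."""
    def __init__(self):
        super().__init__()
        self.blogroll = []
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("itemprop") == "blogroll":
            self.blogroll.append((a["href"], a.get("data-topics", "").split(",")))

p = BlogrollParser()
p.feed(page)
print(p.blogroll)
# [('http://example.com/blogA', ['libraries']), ('http://example.com/blogB', ['music', 'web'])]
```

An aggregator crawling pages marked up like this could follow the links, read each pal’s own blogroll markup, and build the network outward, the same way the FOAF traversal works.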

In fact, Schema.org has a blog specification already. I don’t see anything like markup for a blogroll, but I’m not very good at reading specifications. In any case, how hard could it be to extend that specification? Mark a link as being to a blogroll pal, and optionally supply some topics? (Dan Brickley works on Schema.org.)

So, imagine a BOAB widget that any blogger can easily populate with links to her favorite blog sites. The widget can then be easily inserted into her blog. Hidden from the users in this widget is the appropriate Schema.org markup. Not only could the search engines then see the blogger network, so could anyone who wanted to write an app or a service.

I have 0.02 confidence that I’m getting the tech right here. But enhancing blogrolls so that they are programmatically accessible seems to me to be a good idea. So good that I have 0.98 confidence that it’s already been done, probably 10+ years ago, and probably by Dave Winer :)


Ironically, I cannot find Hoder’s personal site; www.hoder.com is down, at least at the moment.

More shamefully than ironically, I haven’t updated this blog’s blogroll in many years.


My recent piece in The Atlantic about whether the Web has been irremediably paved touches on some of the same issues as Hoder’s piece.

The post Restoring the Network of Bloggers appeared first on Joho the Blog.

I’m in Oslo for Kunnskapsorganisasjonsdagene, which my dear friend Google Translate tells me is Knowledge Organization Days. I have been in Oslo a few times before — yes, once in winter, which was as cold as Boston but far more usable — and am always re-delighted by it.

Alex Wright is keynoting this morning. The last time I saw him was … in Oslo. So apparently Fate has chosen this city as our Kismet. Also coincidence. Nevertheless, I always enjoy talking with Alex, as we did last night, because he is always thinking about, and doing, interesting things. He’s currently at Etsy, which is a fascinating and inspiring place to work, and is a professor of interaction design. He continues to think about the possibilities for design and organization that led him to write about Paul Otlet, who created what Alex has called an “analog search engine”: a catalog of facts expressed in millions of index cards.

NOTE: Live-blogging. Getting things wrong. Missing points. Omitting key information. Introducing artificial choppiness. Over-emphasizing small matters. Paraphrasing badly. Not running a spellpchecker. Mangling other people’s ideas and words. You are warned, people.

Alex begins by telling us that he began as a librarian, working as a cataloguer for six years. He has a library degree. As he works in the Net, he finds himself always drawn back to libraries. The Net’s fascination with the new brings technologists to look into the future rather than to history. Alex asks, “How do we understand the evolution of the Web and the Net in an historical context?” We tend to think of the history of the Net in terms of computer science. But that’s only part of the story.

A big part of the story takes us into the history of libraries, especially in Europe. He begins his history of hypertext with the 16th century Swiss naturalist Conrad Gessner who created a “universal bibliography” by writing each entry on a slip of paper. Leibniz used the same technique, writing notes on slips of paper and putting them in an index cabinet he had built to order.

In the 18th century, the French started using playing cards to record information. At the beginning of the 19th, the Jacquard loom used cards to guide weaving patterns, inspiring Charles Babbage to create what many [but not me] consider to be the first computer.

In 1836, Isaac Adams created the steam powered printing press. This, along with economic and social changes, enabled the mass production of books, newspapers, and magazines. “This is when the information explosion truly started.”

To make sense of this, cataloging systems were invented. They were viewed as regimented systems that could bring efficiencies … a very industrial concept, Alex says.

“The mid-19th century was also a period of networking”: telegraph systems, telephones, internationally integrated postal systems. “Goods, people, and ideas were flowing across national borders in a way they never had before.” International journals. International political movements, such as Marxism. International congresses (conferences). People were optimistic about new political structures emerging.

Alex lists tech from the time that spread information: a daily reading of the news over copper wires, pneumatic tubes under cities (he references Molly Wright Steenson’s great work on this), etc.

Alex now tells us about Paul Otlet, a Belgian who at the age of 15 started designing his own cataloging system. He and a partner, Henri La Fontaine, started creating bibliographies of disciplines, starting with the law. Then they began a project to create a universal bibliography.

Otlet thought libraries were focused on the wrong problem. Getting readers to the right book isn’t enough. People also need access to the information in the books. At the 1900 [?] world’s fair in Paris, Otlet and La Fontaine demonstrated their new system. They wanted to provide a universal language for expressing the connections among topics. It was not a top-down system like Dewey’s.

Within a few years, with a small staff (mainly women) they had 15 million cards in their catalog. You could buy a copy of the catalog. You could send a query by telegraphy, and get a response telegraphed back to you, for a fee.

Otlet saw this in a bigger context. He and La Fontaine created the Union of International Associations, an association of associations, as the governing body for the universal classification system. The various associations would be responsible for their discipline’s information.

Otlet met a Scotsman named Patrick Geddes who worked against specialization and the fracturing of academic disciplines. He created a camera obscura in Edinburgh so that people could see all of the city, from the royal areas and the slums, all at once. He wanted to stitch all this information together in a way that would have a social effect. [I’ve been there as a tourist and had no idea!] He also used visual forms to show the connections between topics.

Geddes created a museum, the Palais Mondial, that was organized like hypertext, bringing together topics in visually rich, engaging displays. The displays are forerunners of today’s tablet-based displays.

Another collaborator, Hendrik Christian Andersen, wanted to create a world city. He went deep into designing it. He and Otlet looked into getting land in Belgium for this. World War I put a crimp in the idea of the world joining in peace. Otlet and Andersen were early supporters of the idea of a League of Nations.

After the War, Otlet became a progressive activist, including for women’s rights. As his real world projects lost momentum, in the 1930s he turned inward, thinking about the future. How could the new technologies of radio, television, telephone, etc., come together? (Alex shows a minute from the documentary, “The Man Who Wanted to Classify the World.”) Otlet imagines a screen and television instead of books. All the books and info are in a separate facility, feeding the screen. “The radiated library and the televised book.” 1934.

So, why has no one ever heard of Otlet? In part because he worked in Belgium in the 1930s. In the 1940s, the Nazis destroyed his work. They replaced his building, destroying 70 tons of materials, with an exhibit of Nazi art.

Although there are similarities to the Web, how Otlet’s system worked was very different. His system was a much more controlled environment, with a classification system, subject experts, etc. … much more a publishing system than a bottom-up system. Linked Data and the Semantic Web are very Otlet-ish ideas. RDF triples and Otlet’s “auxiliary tables” are very similar.

Alex now talks about post-Otlet hypertext pioneers.

H.G. Wells’ “World Brain” essay from 1938. “The whole human memory can be, and probably in a short time will be, made accessible to every individual.” He foresaw a complete and freely available encyclopedia. He and Otlet met at a conference.

Emanuel Goldberg wanted to encode punchcard-style information on microfilm for rapid searching.

Then there’s Vannevar Bush’s Memex that would let users create public trails between documents.

And Licklider’s idea that different types of computers should be able to share information. And Engelbart, who in 1968’s “Mother of All Demos” had a functioning hypertext system.

Ted Nelson thought computer scientists were focused on data computation rather than seeing computers as tools of connection. He invented the term “hypertext,” the Xanadu web, and “transclusion” (embedding a doc in another doc). Nelson thought that links should always be two-way. Xanadu had “intellectual property” controls built into it.

The Internet is very flat, with no central point of control. It’s self-organizing. Private corporations are much bigger on the Net than Otlet, Engelbart, and Nelson envisioned. “Our access to information is very mediated.” We don’t see the classification system. But at sites like Facebook you see transclusion, two-way linking, identity management — needs that Otlet and others identified. The Semantic Web takes an Otlet-like approach to classification, albeit perhaps by algorithms rather than experts. Likewise, the Google “knowledge vaults” project tries to raise the ranking of results that come from expert sources.

It’s good to look back at ideas that were left by the wayside, he concludes, having just decisively demonstrated the truth of that conclusion :)

Q: Henry James?

A: James had something of a crush on Anderson, but when he saw the plan for the World City told him that it was a crazy idea.

[Wonderful talk. Read his book.]

The post [misc][liveblog] Alex Wright: The secret history of hypertext appeared first on Joho the Blog.

A dumb idea, but its dumbness is its virtue.

The idea is that libraries that want to make available data about how relevant items are to their communities could algorithmically assign a number between 1 and 100 to those items. This number would present a very low risk of re-identification, would be easily compared across libraries, and would give local libraries control over how they interpret relevance.
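Here’s one hedged sketch of how such a score might be computed: percentile-rank each item’s aggregate usage and bucket it into 1 through 100. The choice of percentile ranking is my assumption, not part of the idea as proposed; the point is that only aggregate counts go in, so no patron data comes out, and every library ends up on the same comparable 1–100 scale:

```python
# Map raw per-item usage counts (checkouts, holds, whatever the library
# chooses to count) to integers 1..100 by percentile rank. Only aggregate
# counts are used, so the re-identification risk stays low.
def relevance_scores(usage_counts):
    ranked = sorted(usage_counts)
    n = len(ranked)
    def score(count):
        # fraction of items whose usage is <= this item's usage
        below_or_equal = sum(1 for c in ranked if c <= count)
        return max(1, round(100 * below_or_equal / n))
    return {i: score(c) for i, c in enumerate(usage_counts)}

print(relevance_scores([0, 3, 3, 12, 50]))
# {0: 20, 1: 60, 2: 60, 3: 80, 4: 100}
```

Each library decides what goes into the count, which is where the local control over “relevance” comes in.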

I explain this idea in a post at The Chronicle of Higher Ed

The post A dumb idea for opening up library usage data appeared first on Joho the Blog.

BoogyWoogy library browser

Just for fun, over the weekend I wrote a way of visually browsing the almost 13M items in the Harvard Library collection. It’s called the “BoogyWoogy Browser” in honor of Mondrian. Also, it’s silly. (The idea for something like this came out of a conversation with Jeff Goldenson several years ago. In fact, it’s probably his idea.)

[screen capture]

You enter a search term. It returns 5-10 of the first results of a search on the Library’s catalog, and lays them out in a line of squares. You click on any of the squares and it gets another 5-10 items that are “like” the one you clicked on … but you get to choose one of five different ways items can be alike. At the strictest end, they are other items classified under the same first subject. At the loosest end, the browser takes the first real word of the title and does a simple keyword search on it, so clicking on Fifty Shades of Grey will fetch items that have the word “fifty” in their titles or metadata.
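The loosest mode’s “first real word” trick might look something like this — the stopword list is my guess at what counts as a “real” word, not what the actual BoogyWoogy code does:

```python
# Take the first word of a title that isn't an article or other filler,
# and use it as a simple keyword query. The stopword list is a guess.
STOPWORDS = {"a", "an", "the", "of"}

def first_real_word(title):
    for word in title.lower().split():
        w = word.strip(".,:;!?")
        if w not in STOPWORDS:
            return w
    return ""

print(first_real_word("Fifty Shades of Grey"))   # fifty
print(first_real_word("The Name of the Rose"))   # name
```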

It’s fragile, lousy code (see for yourself at Github), but that’s actually sort of the point. BoogyWoogy is a demo of the sort of thing even a hobbyist like me can write using the Harvard LibraryCloud API. LibraryCloud is an open library platform that makes library metadata available to developers. Although I’ve left the Harvard Library Innovation Lab that spawned this project, I’m still working on it through November, while a small but talented and knowledgeable team of developers at the Lab and Harvard Library Technical Services gets ready to launch a beta in a few months. I’ll tell you more about it as the time approaches. For example, we’re hoping to hold a hackathon in November.

Anyway, feel free to give BoogyWoogy a try. And when it breaks, you have no one to blame but me.

The post BoogyWoogy library browser appeared first on Joho the Blog.

Et tu, U2?

A few days ago, when Apple pushed the latest from U2 into everyone’s iTunes library, you could hear the Internet pause as it suddenly realized that Apple is its parents’ age.

Now in the ad-promotion succubus occupying the body of what used to be Time Magazine, you can see U2 desperate to do exactly the wrong thing: insisting that it wasn’t a gift at all. You can learn more about this in the hilariously titled cover article of Time: “The veteran rock band faces the future.” This is a future in which tracks we don’t like are bundled with tracks we do (the return of the CD format) and people who share with their fans are ruining it for U2, boohoo.

Or, as Bono recently said, “We were paid” for the Apple downloads, adding, “I don’t believe in free music. Music is a sacrament.” And as everyone knows, sacraments need to be purchased at a fair market value, the results of which Bono, as a deeply spiritual artist, secures in sacred off-shore accounts.

In my head I hear Bono, enraged by the increasingly bad publicity, composing a message that he posts without first running it through his phalanx of PR folks:

Dear fans:

You have recently received a copy of our latest album, Songs of Innocence, in your iTunes library. U2 understands you may be confused or even upset by this. So, let me clarify once and for all the most important point about this — if I may humbly say so — eternal masterpiece. It was not our intention to cause you stress or to wonder if you have the musical sensitivity to fully grasp (if I may humbly say) the greatness of our work. But most important, it is essential above all that you understand that it was not our intention to give you a gift. No freaking way.

We understand your mistake. You are, after all, just fans, and you don’t play in the Jetstream world of global music. As I said to my dear friend Nelson Mandela (friend is too weak a word; I was his mentor) shortly before he passed, music is a sacrament, just like tickets to movies, especially ones with major stars working for scale, or like the bill at a restaurant where you and any two of the Clintons (Chelsea, you are a star! Give yourself that!) are plotting goodness.

To tell you the truth, I’m disappointed in you. No, worse. I’m hurt. Personally hurt. How dare you think this was a gift! After all these years, is that all U2 is worth to you? Nothing? Our music has all the value of a CrackerJacks trinket or a lower-end Rolex in an awards show gift bag? Do you not understand that Apple paid us for every copy they distributed? We were paid for it, sheeple! Massive numbers of dollars were transferred into our bank accounts! More dollars than you could count, you whiny little “Ooh look at me I’m sharing” wankers! We’re U2 dammit! We don’t need you! You need us! MONEY IS LOVE! EXTRA-ORDINARY LOVE!!!!!!

Have a beautiful day.

Meanwhile, as always, Amanda Palmer expresses the open-hearted truth about this issue. It almost makes me regret making fun of Bono. Almost.


The post Et tu, U2? appeared first on Joho the Blog.

This is one of the most amazing examples I’ve seen of the complexity of even simple organizational schemes. “Unicode Collation Algorithm (Unicode Technical Standard #10)” spells out in precise detail how to sort strings in what we might colloquially call “alphabetical order.” But it’s way, way, way more complex than that.

Unicode is an international standard for how strings of characters get represented within computing systems. For example, in the familiar ASCII encoding, the letter “A” is represented in computers by the number 65. But ASCII is too limited to encode the world’s alphabets. Unicode does the job.
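A quick illustration of the difference (the code points shown are the standard Unicode assignments):

```python
# The familiar ASCII case, plus characters ASCII can't represent at all;
# Unicode assigns every character a numeric code point.
print(ord("A"))    # 65, just as in ASCII
print(ord("ø"))    # 248 (U+00F8), beyond ASCII's 0-127 range
print(ord("字"))   # 23383 (U+5B57), a CJK character Unicode also covers
```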

As the paper says, “Collation is the general term for the process and function of determining the sorting order of strings of characters” so that, for example, users can look them up on a list. Alphabetical order is a simple form of collation.

Sorting inconsistent alphabets is, well, a problem. But let Technical Standard #10 explain the problem:

It is important to ensure that collation meets user expectations as fully as possible. For example, in the majority of Latin languages, ø sorts as an accented variant of o, meaning that most users would expect ø alongside o. However, a few languages, such as Norwegian and Danish, sort ø as a unique element after z. Sorting “Søren” after “Sylt” in a long list, as would be expected in Norwegian or Danish, will cause problems if the user expects ø as a variant of o. A user will look for “Søren” between “Sorem” and “Soret”, not see it in the selection, and assume the string is missing, confused because it was sorted in a completely different location.

Heck, some French dictionaries even sort their accents in reverse order. (See Section 1.3.)
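You can watch the Søren problem happen in a few lines of Python. Sorting by raw code points happens to put ø after z, which matches the Danish/Norwegian expectation by accident; treating ø as a variant of o takes an explicit step, since ø has no Unicode decomposition to a plain o. The little translation table below is my stand-in for a real collation tailoring, not how a proper UCA implementation does it:

```python
names = ["Sylt", "Søren", "Sorem", "Soret"]

# Default sort compares raw code points: ø is U+00F8, so it lands after z.
print(sorted(names))
# ['Sorem', 'Soret', 'Sylt', 'Søren']  -- the Danish/Norwegian expectation

# Fold ø to o explicitly to get the "accented variant of o" ordering.
fold = str.maketrans({"ø": "o", "Ø": "O"})
print(sorted(names, key=lambda s: s.translate(fold)))
# ['Sorem', 'Søren', 'Soret', 'Sylt']  -- what most Latin-language users expect
```

Same four strings, two defensible orders, and a user expecting the wrong one concludes that Søren is missing.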

But that’s nothing. Here’s a fairly random paragraph from further into this magnificent document (section 7.2):

In the DUCET, characters are given tertiary weights according to Table 17. The Decomposition Type is from the Unicode Character Database [UAX44]. The Case or Kana Subtype entry refers either to a case distinction or to a specific list of characters. The weights are from MIN = 2 to MAX = 1F₁₆, excluding 7, which is not used for historical reasons.

Or from section 8.2:

Users often find asymmetric searching to be a useful option. When doing an asymmetric search, a character (or grapheme cluster) in the query that is unmarked at the secondary and/or tertiary levels will match a character in the target that is either marked or unmarked at the same levels, but a character in the query that is marked at the secondary and/or tertiary levels will only match a character in the target that is marked in the same way.

You may think I’m being snarky. I’m not at all. This document dives resolutely into the brambles and does not give up. It incidentally exposes just how complicated even the simplest of sorting tasks is when looked at in their full context, where that context is history, language, culture, and the ambiguity in which they thrive.

The post [eim] Alphabetical order explained in a mere 27,817 words appeared first on Joho the Blog.

Two percent of Harvard’s library collection circulates every year. A high percentage of the works that are checked out are the same as the books that were checked out last year. This fact can cause reflexive tsk-tsking among librarians. But — with some heavy qualifications to come — this is as it should be. The existence of a Long Tail is not a sign of failure or waste. To see this, consider what it would be like if there were no Long Tail.

Harvard’s 73 libraries have 16 million items [source]. There are 21,000 students and 2,400 faculty [source]. If we guess that half of the library items are available for check-out, which seems conservative, that would mean that 160,000 different items are checked out every year. If there were no Long Tail, then no book would be checked out more than any other. In that case, it would take the Harvard community an even fifty years before anyone would have read the same book as anyone else. And a university community in which across two generations no one has read the same book as anyone else is not a university community.
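The back-of-the-envelope arithmetic, spelled out (the one-half figure is the guess stated above, not a known number):

```python
# Numbers from the paragraph above.
items = 16_000_000
circulating_share = 0.5   # guess: half the collection can be checked out
checkout_rate = 0.02      # two percent circulates per year

available = items * circulating_share              # 8,000,000 items
checked_out_per_year = available * checkout_rate   # 160,000 items/year
years_to_exhaust = available / checked_out_per_year

print(int(checked_out_per_year), int(years_to_exhaust))  # 160000 50
```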

I know my assumptions are off. For example, I’m not counting books that are read in the library and not checked out. But my point remains: we want our libraries to have nice long tails. Library long tails are where culture is preserved and discovery occurs.

And, having said that, it is perfectly reasonable to work to lower the difference between the Fat Head and the Long Tail, and it is always desirable to help people to find the treasures in the Long Tail. Which means this post is arguing against a straw man: no one actually wants to get rid of the Long Tail. But I prefer to put it that this post argues against a reflex of thought I find within myself and have encountered in others. The Long Tail is a requirement for the development of culture and ideas, and at the same time, we should always help users to bring riches out of the Long Tail.

The post [2b2k] In defense of the library Long Tail appeared first on Joho the Blog.

There’s a terrific article by Helen Vendler in the March 24, 2014 New Republic about what we can learn about Emily Dickinson by exploring her handwritten drafts. Helen is a Dickinson scholar of serious repute, and she finds revelatory significance in the words that were crossed out, replaced, or listed as alternatives, in the physical arrangement of the words on the page, etc. For example, Prof. Vendler points to the change of the line in “The Spirit”: “What customs hath the Air?” became “What function hath the Air?” She says that this change points to a more “abstract, unrevealing, even algebraic” understanding of “the future habitation of the spirit.”

Prof. Vendler’s source for many of the poems she points to is Emily Dickinson: The Gorgeous Nothings, by Marta Werner and Jen Bervin, the book she is reviewing. But she also points to the new online Dickinson collection from Amherst and Harvard. (The site was developed by the Berkman Center’s Geek Cave.)


Unfortunately, the New Republic article is not available online. I very much hope that it will be, since it provides such a useful way of reading the materials in the online Dickinson collection, which are themselves available under a Creative Commons license that enables non-commercial use without asking permission.

The post Reading Emily Dickinson’s metadata appeared first on Joho the Blog.

CityCodesAndOrdinances.xml

A friend is looking into the best way for a city to publish its codes and ordinances to make them searchable and reusable. What are the best schemas or ontologies to use?

I work in a law school library so you might think I’d know. Nope. So I asked a well-informed mailing list. Here’s what they have suggested, more or less in their own words:


Any other suggestions?

The post CityCodesAndOrdinances.xml appeared first on Joho the Blog.

Schema.org…now for datasets!

I had a chance to talk with Dan Brickley today, a semanticizer of the Web whom I greatly admire. He’s often referred to as a co-creator of FOAF, but these days he’s at Google working on Schema.org. He pointed me to the work Schema has been doing with online datasets, which I hadn’t been aware of. Very interesting.

Schema.org, as you probably know, provides a set of terms you can hide inside the HTML of your page that annotate what the visible contents are about. The major search engines — Google, Bing, Yahoo, Yandex — notice this markup and use it to provide more precise search results, and also to display results in ways that present the information more usefully. For example, if a recipe on a page is marked up with Schema.org terms, the search engine can identify the list of ingredients and let you search on them (“Please find all recipes that use butter but not garlic”) and display them in a more readable way. And of course it’s not just the search engines that can do this; any app that is looking at the HTML of a page can also read the Schema markup. There are Schema.org schemas for an ever-expanding list of types of information…and now datasets.
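To make the recipe example concrete, here’s a minimal recipe snippet in Schema.org microdata (recipeIngredient is a real Schema.org property; the recipe itself is invented) and a toy extractor of the kind any app, not just a search engine, could run over a page’s HTML:

```python
from html.parser import HTMLParser

page = """
<div itemscope itemtype="https://schema.org/Recipe">
  <h1 itemprop="name">Garlic Butter Pasta</h1>
  <span itemprop="recipeIngredient">butter</span>
  <span itemprop="recipeIngredient">garlic</span>
  <span itemprop="recipeIngredient">spaghetti</span>
</div>
"""

class IngredientParser(HTMLParser):
    """Pull out the text of elements marked itemprop="recipeIngredient"."""
    def __init__(self):
        super().__init__()
        self.ingredients, self._grab = [], False
    def handle_starttag(self, tag, attrs):
        self._grab = dict(attrs).get("itemprop") == "recipeIngredient"
    def handle_data(self, data):
        if self._grab and data.strip():
            self.ingredients.append(data.strip())
            self._grab = False

p = IngredientParser()
p.feed(page)
print(p.ingredients)
# ['butter', 'garlic', 'spaghetti']
```

With that list in hand, “butter but not garlic” is just a membership test.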

If you go to Schema.org/Dataset and scroll to the bottom where it says “Properties from Dataset,” you’ll see the terms you can insert into a page that talk specifically about the dataset referenced. It’s quite simple at this point, which is an advantage of Schema.org overall. But you can see some of the power of even this minimal set of terms over at Google’s experimental Schema Labs page where there are two examples.
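And here’s a guess at what Dataset markup might look like when expressed as JSON-LD. name, description, keywords, and distribution are actual Schema.org/Dataset properties; the dataset, values, and URL are all made up:

```python
import json

# A hypothetical city dataset described with real Schema.org/Dataset
# properties. The values are invented for illustration.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "City Parking Meter Transactions",
    "description": "Hourly parking meter transactions, 2012-2014.",
    "keywords": ["parking", "transportation"],
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://example.org/parking.csv",
        "encodingFormat": "text/csv",
    },
}

# The markup a page would carry, ready for search engines (or anyone) to read.
script_tag = '<script type="application/ld+json">%s</script>' % json.dumps(dataset)
print(script_tag[:60])
```

A search engine crawling a page carrying that tag knows it has found a dataset about parking, with a CSV download, without any human having to tell it so.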

The first example (click on the “view” button) does a specialized Google search looking for pages that have been marked up with Schema’s Dataset terms. In the search box, try “parking,” or perhaps “military.” Clicking on a return takes you to the original page that provides access to the dataset.

The second demo lets you search for databases related to education via the work done by LRMI (Learning Resource Metadata Initiative); the LRMI work has been accepted (except for the term useRightsUrl) as part of Schema.org. Click on the “view” button and you’ll be taken to a page with a search box, and a menu that lets you search the entire Web or a curated list. Choose “entire Web” and type in a search term such as “calculus.”

This is such a nice extension of Schema.org. Schema was designed initially to let computers parse information on human-readable pages (“Aha! ‘Butter’ on this page is being used as a recipe ingredient and on that page as a movie title”), but now it can be used to enable computers to pull together human-readable lists of available datasets.

I continue to be a fan of Schema because of its simplicity and pragmatism, and, because the major search engines look for Schema markup, people have a compelling reason to add markup to their pages. Obviously Schema is far from the only metadata scheme we need, nor does it pretend to be. But for fans of loose, messy, imperfect projects that actually get stuff done, Schema is a real step forward that keeps taking more steps forward.

The post Schema.org…now for datasets! appeared first on Joho the Blog.
