
Archive for the 'facets' Category

Andy Carvin (in a tweet) points to the Wikipedia entry on the phrase “Viewers like you.” All part of the Web’s dismantling (and reassembling) of the traditional notion of topics.


Facets of reference

Stephen Francoeur has a very interesting post that tries to miscellanize the desk vs. digital reference dichotomy by suggesting lots of other facets one could use to slice and dice the reference trade. (Self-aggrandizement warning: He begins his post by saying something nice about EiM.)

Thomas Mann (no, not that one) has a fascinating and important article about why tagging, folksonomies, and the rest of the hip Web 2.0 stuff is inadequate to meet the needs of scholars looking for information. It is, at least informally, a response to the Calhoun Report.

His example of trying to find information about “tribute payments in the Peloponnesian War” is classic and convincing: Finding what the scholar needs requires smart human guides and the smart guides that humans have created for scholars.

But, of course, that doesn't scale:

I would be the first to agree that the inexpensive indexing methods of term weighting, tagging, and folksonomy referrals–none of which requires expensive professional input–are entirely appropriate for dealing with most of the Internet’s Web offerings. With billions of sites to be indexed, it is out of the question to think that traditional cataloging can be applied to all of them. No one in his right mind would say otherwise.

But there is a crucial distinction that is being swept under the rug: the difference between quick information seeking and scholarship.

And, he says, scholarship requires books. Thus, the labor- and intelligence-intensive scholarly information clustering techniques will continue to work because the flow of books will continue to be relatively slow:

The universe of books published every year is much smaller, and much more manageable, than the universe of Web sites; this is the “niche” of sources to which professional cataloging should be primarily devoted. … Most of the billions of Web sites do not merit this level of attention to begin with; they are too inconsequential and too ephemeral. If we are going to promote scholarship, it is not enough to simply digitize the books for immediate retrieval if term weighting of keywords, tagging, and folksonomy referrals are the only mechanisms we provide for finding them. It is not at all unrealistic to propose that research libraries fill the niche of providing the best, most systematic, access to books…

He later says that systematic cataloging should not exclude all non-books.

As an argument for maintaining human expertise in manually assembling information into meaningful relationships, this paper is convincing. But it rests on the supposition that books will continue to be the locus of worthwhile scholarly information. Suppose more and more scholars move onto the Web and do their thinking in public, in conversation with other scholars? Suppose the Web enables scholarship to outstrip the librarians? Manual assemblages of knowledge would retain their value, but they would no longer provide the authoritative guide. Then we will face one of two outcomes: either we will have to rely on “‘lowest common denominator’ and ‘one search box/one size fits all’ searching that positively undermines the requirements of scholarly research”… or we will have to innovate to address the distinct needs of scholars.

My money is on the latter.

He concludes:

We need to make the best possible use of our principles, our experience, our tested practices, and our technologies, and not yield to the temptations to let either the technologies themselves or transient fashions constrict our vision of what needs to be done to promote scholarship of the highest possible quality–and that is a goal very different from striving to provide ‘something quickly.’

Amen.

(Thanks to Bradley Allen for the link.)


TagAndFacet lets you tag Web sites, Outlook messages, and Windows files for easy re-finding. It also lets you declare “facets” — metadata categories of continuing use — so you can do faceted, tree-like browsing.  A version is available for free with a limit on how many items you can tag; a for-pay version should be available soon. (I haven’t yet tried it.)

Scott Rosenberg, co-founder of Salon and the author of Dreaming in Code, has posted at Salon an interview with me about Everything is Miscellaneous.

At his blog, Scott adds some “out-takes” from the interview, and recommends the book. Thanks, Scott.

Google has posted the video of my talk there.

1 billion facets

Well, not exactly. Siderean has announced that a pilot deployment for Elsevier has over one billion RDF triples (the press release says “relations,” but I assume that’s what that means) in what Siderean calls a “relational navigation” system, i.e., a faceted system that allows for looser links across and among the resources.
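For readers who haven't met RDF: each triple is a single relationship, stated as subject, predicate, object, so “a billion triples” means a billion such statements. The sketch below is a minimal in-memory illustration of that data structure with wildcard pattern matching. It assumes plain Python tuples rather than any RDF library; the identifiers are hypothetical, and it says nothing about how Siderean actually stores or queries its data.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object): one relationship

# Hypothetical triples; these identifiers are invented, not Elsevier data.
TRIPLES: List[Triple] = [
    ("article:1", "hasAuthor", "person:smith"),
    ("article:1", "hasTopic", "topic:gravity"),
    ("article:2", "hasAuthor", "person:smith"),
    ("article:2", "hasTopic", "topic:optics"),
    ("article:3", "hasAuthor", "person:jones"),
    ("article:3", "hasTopic", "topic:gravity"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Every relationship that touches article:1 (its author and its topic).
print(match(s="article:1"))
# Every article linked to the gravity topic, regardless of who wrote it.
print([t[0] for t in match(p="hasTopic", o="topic:gravity")])
```

Because any slot can be left open, the same store answers “what is this article related to?” and “what is related to this topic?” without a fixed hierarchy deciding which question comes first.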

I’m working off a press release, so I’m probably getting some or all of this wrong. But, it’s still a heck of a lot of relationships.

Ranganathan’s fantasy

From Ranganathan, the founder of library science:

"Since multiplicity of helpful order among specific subjects is a fact independent of library classification - a fact to be reckoned with in arrangement - how are we to provide for it? It is a case of arranging concrete materials - books and other kindred materials - in such a way that one kind of arrangement presents itself to one person and another kind to another person. To secure this by pressing a button is obviously possible only in the world of fancy; it is not possible in the world of reality."

Ranganathan, Philosophy of Library Classification (1951)

Via Tim Spalding via Jacob Glenn

EngineeringVillage.org has about 32 million records available, including 10.7 million from Compendex (the Computerized Engineering Index), with data going back to 1884; 9.5 million records from the Inspec Archives, which go back to 1896; 2.2 million government technical records in the NTIS collection; and 9.5 million patent abstracts.

How can you possibly navigate 32 million records? Searching requires second-guessing authors, and with that many records, it’s bound to miss more than it finds. So, EV uses a combination of full text searching and faceted navigation.

For example, if you’re looking for anti-gravity devices, begin by doing a text search on “gravity.” You’ll get 202,162 results. In the right-hand frame, you are shown eight areas (facets) — source, author, affiliation, country, document type, year, etc. — each with a list of the occupants of that particular branch. So, under Affiliation, you can see that the Jet Propulsion Lab has 326 records that contain the word “gravity,” while NASA’s Goddard Center only has 155; this by itself is valuable information. Check the NASA box, and now you can further refine the 234 results by deciding only to see those articles published in the US, and then the ones on solid state physics. We’re now down to 11 articles. But we can always go back and remove the restriction to only articles published by NASA. It’s tree browsing where we get to construct the tree.
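The drill-down above can be sketched in a few lines: a full-text filter, then facet checkboxes that narrow the results, with per-value counts recomputed at each step, and unchecking a box widens the set again. This is a toy model assuming records are flat dictionaries; the records, field names, and helper functions (`search`, `refine`, `facet_counts`) are hypothetical, and it is not how EngineeringVillage or the FAST engine works internally.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical toy records; the texts and facet values are invented and do
# not come from EngineeringVillage.
RECORDS: List[Record] = [
    {"text": "gravity wave detector design", "affiliation": "JPL", "topic": "optics"},
    {"text": "low gravity fluid handling", "affiliation": "NASA Goddard", "topic": "solid state physics"},
    {"text": "gravity gradient sensing in crystals", "affiliation": "NASA Goddard", "topic": "solid state physics"},
    {"text": "gravity assisted crystal growth", "affiliation": "ESA", "topic": "solid state physics"},
    {"text": "thermal management for satellites", "affiliation": "JPL", "topic": "thermal"},
]
FACETS = ["affiliation", "topic"]


def search(records: List[Record], word: str) -> List[Record]:
    """Full-text step: keep records whose text contains the word as a token."""
    return [r for r in records if word.lower() in r["text"].lower().split()]


def refine(records: List[Record], selections: Dict[str, str]) -> List[Record]:
    """Facet step: keep records that match every checked facet value."""
    return [r for r in records if all(r[f] == v for f, v in selections.items())]


def facet_counts(records: List[Record]) -> Dict[str, Counter]:
    """Count how many current results fall under each value of each facet."""
    return {f: Counter(r[f] for r in records) for f in FACETS}


hits = search(RECORDS, "gravity")
print(facet_counts(hits)["affiliation"])
# Check two boxes to narrow the results...
narrowed = refine(hits, {"affiliation": "NASA Goddard", "topic": "solid state physics"})
# ...then uncheck one to widen them again.
widened = refine(hits, {"topic": "solid state physics"})
print(len(hits), len(narrowed), len(widened))  # 4 2 3
```

The counts are what make the facets informative on their own: before clicking anything, the user already learns which affiliations dominate the results.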

Now EngineeringVillage has added user-created tags. Tags can be declared as public, institutional, or belonging to a user-defined group. Very cool. (It would be especially helpful if, say, the US Patent Office were to suck in the tags applied to patents.)

The tag cloud shows that the top tags at the moment — early days for the tagging feature — are “Thermal management,” “sathya,” “Unsaturated soils,” “Wireless sensor networks,” “Photonic crystals,” and “Room temperature,” which suggests that users are working on growing photonic crystals at room temperature for use in wireless sensor networks, to enable the Sathya Sai Organization at long last to achieve world domination.

In an email, Rafael Sidi, VP of product engineering at Elsevier Engineering Information, says that the faceted system was built in-house using the FAST search engine.

BTW, I think Rafael makes the right response to Steve Rubel’s idea that “It’s very difficult to survive as a paid service in a Long Tail environment. One reason is that it’s now easier to discover free, open source alternatives.” Rafael replies that services like EngineeringVillage add “value to the content that we publish (indexing, writing abstracts), creating better searching features and providing analytical tools (intelligence).” The Long Tail enables the creation of such deep value that only some of that value can be addressed by Open Source solutions (long may they wave).

(Disclosure: Steve Rubel works for Edelman PR, to whom I consult, and I recently did some videoblogging for FastSearch.)


Siderean, a faceted classification company, has announced a patent for what it calls "relational navigation."

Faceted classification lets a user browse a field in typical hierarchical fashion—like navigating through the nested folders on your desktop—except the hierarchy is created dynamically as the user decides which property matters to her now. So, instead of having a fixed taxonomy that first divides all books into fiction and non-fiction, and then subdivides them by language and then by year, with a faceted classification, a user might decide first to find all the works written in the 19th century, then drill down to the non-fiction, etc. It has taxonomy's virtue of guiding navigation without its vice of having to present the user with one and only one path through the taxonomy.
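To make the contrast concrete, here is a minimal sketch in which the same handful of books yields different hierarchies depending on which facet the user picks first. It assumes each book is a flat dictionary of properties (genre, language, century); `build_tree` is a hypothetical helper, not any vendor's API.

```python
from typing import Dict, List, Union

Book = Dict[str, str]
Tree = Union[Dict[str, "Tree"], List[str]]

# A few real books, described by the properties mentioned above.
BOOKS: List[Book] = [
    {"title": "Middlemarch", "genre": "fiction", "language": "English", "century": "19th"},
    {"title": "Madame Bovary", "genre": "fiction", "language": "French", "century": "19th"},
    {"title": "On the Origin of Species", "genre": "non-fiction", "language": "English", "century": "19th"},
    {"title": "Ulysses", "genre": "fiction", "language": "English", "century": "20th"},
    {"title": "The Second Sex", "genre": "non-fiction", "language": "French", "century": "20th"},
]


def build_tree(items: List[Book], facet_order: List[str]) -> Tree:
    """Nest items one level per facet, in whatever order the user picks."""
    if not facet_order:
        # Leaf level: just the sorted titles.
        return sorted(item["title"] for item in items)
    facet, rest = facet_order[0], facet_order[1:]
    groups: Dict[str, List[Book]] = {}
    for item in items:
        groups.setdefault(item[facet], []).append(item)
    return {value: build_tree(members, rest) for value, members in sorted(groups.items())}


# A fixed taxonomy always uses one order...
fixed = build_tree(BOOKS, ["genre", "language", "century"])
# ...while a faceted browser lets this user start with the century instead.
by_century = build_tree(BOOKS, ["century", "genre"])
print(by_century["19th"]["non-fiction"])  # ['On the Origin of Species']
```

Nothing about the books changes between the two calls; only the order of facets does, which is the sense in which the hierarchy is built on the fly rather than fixed in advance.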

Faceted classification and taxonomies both work by showing the user narrower and narrower results. That's often what we want, but in this crazy world, we may also want to leap off the branch we've walked onto. Siderean's relational nav shows context from branches outside of the path you've walked. Siderean refers to this as the ability to "pivot," as in a database pivot.
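Here is one loose way to picture leaping off the branch, sketched under stated assumptions rather than taken from Siderean's patent: narrow to a topic as usual, then pivot on a related property (here, author) to surface items that share that property but sit outside the current path. The records and the `narrow` and `pivot` helpers are hypothetical.

```python
from typing import Dict, List, Set

Item = Dict[str, str]

# Hypothetical records; titles, authors, and topics are invented.
ARTICLES: List[Item] = [
    {"title": "Gravity sensors", "author": "Smith", "topic": "gravity"},
    {"title": "Tidal gravity models", "author": "Jones", "topic": "gravity"},
    {"title": "Photonic crystal lasers", "author": "Smith", "topic": "photonics"},
    {"title": "Soil moisture probes", "author": "Lee", "topic": "soils"},
]


def narrow(items: List[Item], facet: str, value: str) -> List[Item]:
    """Ordinary faceted step: stay on the branch."""
    return [i for i in items if i[facet] == value]


def pivot(all_items: List[Item], current: List[Item], via: str) -> List[Item]:
    """Collect the values of `via` in the current results, then return every
    item sharing one of those values, whether or not it is on the path."""
    keys: Set[str] = {i[via] for i in current}
    return [i for i in all_items if i[via] in keys]


branch = narrow(ARTICLES, "topic", "gravity")
related = pivot(ARTICLES, branch, "author")
off_path = [i["title"] for i in related if i not in branch]
print(off_path)  # ['Photonic crystal lasers']
```

The narrowing step alone would never show the photonics article; the pivot reaches it because it shares an author with something on the gravity branch.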

Techniques that let us play with the dialectic between narrowing our focus and expanding it—searching and discovering—are all to the good. The faceted classification industry overall is up to important and exciting stuff.