Hanan Cohen points me to a blog post by an MLIS student at Haifa U., named Shir, in which she discourses on the term “paradata.” Shir cites Mark Sample, who in 2011 posted a talk he had given at an academic conference. Mark notes the term’s original meaning:

In the social sciences, paradata refers to data about the data collection process itself—say the date or time of a survey, or other information about how a survey was conducted.

Mark intends to give it another meaning, without claiming to have worked it out fully:

…paradata is metadata at a threshold, or paraphrasing Genette, data that exists in a zone between metadata and not metadata. At the same time, in many cases it’s data that’s so flawed, so imperfect that it actually tells us more than compliant, well-structured metadata does.

His example is We Feel Fine, a collection of tens of thousands (or more … I can’t open the site because Amtrak blocks access to what it intuits might be intensive multimedia) of sentences that begin “I feel” from many, many blogs. We Feel Fine then displays the stats in interesting visualizations. Mark writes:

…clicking the Age visualizations tells us that 1,223 (of the most recent 1,500) feelings have no age information attached to them. Similarly, the Location visualization draws attention to the large number of blog posts that lack any metadata regarding their location.

Unlike many other massive datamining projects, say, Google’s Ngram Viewer, We Feel Fine turns its missing metadata into a new source of information. In a kind of playful return of the repressed, the missing metadata is colorfully highlighted—it becomes paradata. The null set finds representation in We Feel Fine.
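As a rough illustration of that move, here is a sketch in Python. It is only a sketch: the record layout, the sample data, and the `tally_field` helper are all made up for the example, and nothing here is known to reflect how We Feel Fine actually works. The point is simply that missing values get counted as their own bucket rather than silently dropped.

```python
from collections import Counter

MISSING = "(missing)"  # hypothetical label for records that lack the field


def tally_field(records, field):
    """Count the values of `field`, keeping absent or None values as their own bucket."""
    counts = Counter()
    for record in records:
        value = record.get(field)  # None if the key is absent
        counts[MISSING if value is None else value] += 1
    return counts


# Made-up sample records, for illustration only.
feelings = [
    {"text": "I feel fine", "age": 27},
    {"text": "I feel tired"},               # no age key at all
    {"text": "I feel lucky", "age": None},  # age present but empty
]

print(tally_field(feelings, "age"))
```

A tally like this would report the two age-less records alongside the one with an age, which is the kind of thing a visualization could then highlight.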

So, that’s one sense of paradata. But later Mark makes it clear (I think) that We Feel Fine presents paradata in a broader sense: it is sloppy in its data collection. It strips out HTML formatting, which can contain information about the intensity or quality of the statements of feeling the project records. It’s lazy in deciding which images from a target site it captures as relevant to the statement of feeling. Yet, Mark finds great value in We Feel Fine.

His first example, where the null set is itself metadata, seems unquestionably useful. It applies to any unbounded data set. For example, that no one chose answer A on a multiple choice test is not paradata, just as the fact that no one has checked out a particular item from a library is not paradata. But that no one used the word “maybe” in an essay test is paradata, as would be the fact that no one has checked out books in Aramaic and Klingon in one bundle. Getting a zero in a metadata category is not paradata; getting a null in a category that had not been anticipated is paradata. Paradata should therefore include which metadata categories are missing from a schema. E.g., that Dublin Core does not have a field devoted to reincarnation says something about the fact that it was not developed by Tibetans.

But I don’t think that’s at the heart of what Mark means by paradata. Rather, the appearance of the null set is just one benefit of considering paradata. Indeed, I think I’d call this “implicit metadata” or “derived metadata,” not “paradata.”

The fuller sense of paradata Mark suggests (“data that exists in a zone between metadata and not metadata”) is both useful and, as he cheerfully acknowledges, “a big mess.” It immediately raises questions about the differences between paradata and pseudodata: if We Feel Fine were being sloppy without intending to be, and if it were presenting its “findings” as rigorously refined data at, say, the biennial meeting of the Society for Textual Analysis, I don’t think Mark would be happy to call it paradata.

Mark concludes his talk by pointing at four positive characteristics of the We Feel Fine site: It’s inviting, paradata, open, and juicy. (“Juicy” means that there’s lots going on and lots to engage you.) It seems to me that the site’s only an example of paradata because of the other three. If it were a jargon-filled, pompous site making claims to academic rigor, the paradata would be pseudodata.

This isn’t an objection or a criticism. In fact, it’s the opposite. Mark’s post, which is based on a talk that he gave at the Society for Textual Analysis, is a plea for research that is inviting, open, juicy, and willing to acknowledge that its ideas are unfinished. Mark’s post is, of course, paradata.

The post Paradata appeared first on Joho the Blog.
