Blog by Sumana Harihareswara, Changeset founder

15 Nov 2013, 2:44 p.m.

Code4Lib, Open Data, Open Access, and Fighting Systemic Bias

Hi, reader. I wrote this in 2013 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.

"Missing from Wikipedia" (code) makes me happy. I presented about it yesterday at Hacker School, asked a fellow HSer to discuss his critique of my code, and - live! on stage! - merged his pull request. Yay for code review and collaboration! (I also showed off a much sillier toy I made, which grabs some sentence from an English Wikipedia page if you give it a topic. Sample for "Chairs": "Some are decorative.")

I am grateful and proud that I can, with "Missing from Wikipedia," make a small contribution to the ecology of openly licensed code and content that I draw from. I could make "Missing from Wikipedia" because:

  1. the data for all Wikimedia projects is available under an open content license
  2. and queryable via an open-to-all API
  3. that lets you get information about 50 pages at a time (and with not-too-terrible rate limiting)
  4. that I could access using a good open source library with great docs
  5. available for an excellent and well-documented open source programming language
  6. that already Just Works with my source control system, text editor, operating system, and laptop
And so on. I fork from the repos of giants.
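To make the "50 pages at a time" point concrete, here is a minimal sketch of that kind of batched existence check against the MediaWiki API. It is not the actual "Missing from Wikipedia" code: the function names (`batches`, `missing_from_response`, `fetch_batch`, `find_missing`) and the User-Agent string are made up for illustration, it assumes Python with only the standard library, and it assumes the API's default JSON layout, in which a nonexistent page carries a `missing` key.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # English Wikipedia's API endpoint
BATCH_SIZE = 50  # the per-request title limit mentioned in the list above


def batches(names, size=BATCH_SIZE):
    """Yield successive lists of at most `size` names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def missing_from_response(data):
    """Return the titles that an API response marks as missing.

    Assumes the default JSON layout, where "pages" maps page ids to
    page records and a nonexistent page has a "missing" key.
    """
    pages = data.get("query", {}).get("pages", {})
    return sorted(page["title"] for page in pages.values() if "missing" in page)


def fetch_batch(titles):
    """Ask the API about one batch of titles (this makes a network request)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "titles": "|".join(titles),
        "format": "json",
    })
    request = urllib.request.Request(
        API_URL + "?" + params,
        # Hypothetical identifier; a real tool should describe itself here.
        headers={"User-Agent": "missing-names-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_missing(names):
    """Check every name, one batch at a time, and collect the missing ones."""
    missing = []
    for batch in batches(list(names)):
        missing.extend(missing_from_response(fetch_batch(batch)))
    return missing
```

Calling `find_missing(["Ada Lovelace", "Some Person Nobody Has Written About"])` would send a single request, since both names fit in one batch. A longer list is split into as many requests as it needs, which is where the rate limiting comes in.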

But we can only use a tool like "Missing from Wikipedia" if we have data to feed into it: a list of names. This is another way open data and open access to research are important. If we can get digital copies of things like the tables of contents of other encyclopedias and dictionaries, that makes it easier for us to systematically check for missing coverage on Wikipedia. But if those lists and tables are behind paywalls, then we can't see them.

And we need access to research papers, to help us figure out what tools to write. Let's say you'd like to fight systemic bias on Wikipedia and you want to write the most effective tool you can. What proportion of these citations on the effect of sexist language can you read & assess yourself? What proportion of the research that would help you do your job better is behind a paywall, and therefore not just hard to find, but essentially undiscoverable? Papers you can't link to are like missing Wikipedia articles -- out of sight, out of mind, out of the group discourse.

At this point I wave my hands excitedly and go off in some direction expounding on the intersection of open stuff (especially Wikimedia), social justice, comedy, and transformation. I presume I will cover similar topics in March 2014 when I keynote the Code4Lib conference, speaking to people who make things for/with cultural institutions. (Such an honor to be asked to keynote Code4Lib! And with Val Aurora of The Ada Initiative giving the other keynote!)

I've benefited so much from the ecology of open stuff. I aim to reciprocate, and to help make it even better.