My name is danah boyd and I'm a Principal Researcher at Microsoft Research and the founder/president of Data & Society. Buzzwords in my world include: privacy, context, youth culture, social media, big data. I use this blog to express random thoughts about whatever I'm thinking.

random ontology thoughts

Clay finally posted a piece based on his recent talks entitled Ontology is Overrated: Categories, Links and Tags (discussed on many-to-many). It’s a must-read although i suspect it’ll make some of the librarians squirm. The essay is structured in a narrative style, making it super accessible and offering anecdotes to frame very logical arguments. Yet, somehow, i still cannot resist the temptation to respond, albeit in a rambly way since i’m focused on finals. By and large, i agree with the essay but i think that Clay is missing a few things:

– issues of one-to-one and many-to-one
– cognitive overload
– problems of retro-activity
– category splits
– exponential tag growth
– user interfaces from hell


First, it’s important to note that there’s extensive slippage in this essay between four different kinds of classification schemas:
– one to many (periodic table, card catalogs)
– many to many (group del.icio.us)
– one to one (individual del.icio.us, personal bookshelves)
– many to one (also present in tagging)

You cannot discuss whether or not ontological classification works well without contextualizing the source and recipient participants as separate entities. Clay does an excellent job of comparing the one-to-many classification with the many-to-many, citing all of the reasons and places where many-to-many is much more effective. Embedded in this is a set of assumptions about why many-to-many is much more effective than one-to-one. Collective del.icio.us has far greater value than individual del.icio.us. With this same logic, on an individual selfish level, many-to-one is the most advantageous. The individual has to do little work beyond finding others who categorize in ways that feel right. This is the lazy tagger who relies on others to do the work.

I am that lazy tagger. I upload pictures onto Flickr and let others sort them out. I never bother to put more than one tag in my del.icio.us posts or my blog entries. Actually, i’m not lazy so much as cognitively overwhelmed by the process. Herein lies another important missing piece in Clay’s puzzle. Yes, i know that most things can be placed in multiple categories but my brain goes into spazz mode dealing with this. I’m really good at finding one keyword, but when i can present multiple, i go into an infinite loop. Which ones are best? What about synonyms? Do i use first name or last name or both or nickname or …? How do i mark which are the most salient tags? What if i forget a tag? Ack! ::implosion:: Herein lies the reason why i’m a dreadful tagger – i cannot cognitively deal with the task.

This makes me think – what cognitive shifts are required when people are asked to tag as flexibly as they wish? While we certainly run around the world building mental models of things, we are rarely asked to explicitly categorize and when we are, our task is usually to organize physical things. Physical things do not have flexible, infinite possibilities. Thus, what Clay is arguing for is not something that we currently do. Such a shift is not necessarily accessible if not everyone is comfortable with the task. This becomes increasingly problematic when entire cultures have difficulty with the task.

Finally, not all classification efforts can take advantage of many-to-X. There are also still tasks where bookshelf real estate is a contextual factor and needs to be optimized. I would absolutely love to have collective action descend on my bookshelves or file system and organize that mess. Like Blockbuster, i have two schemes operating: time and topic. The bookshelves may be organized by topic, but the floor is organized by time – things that need to be dealt with now. There are file structures but there is also my Desktop. And as much as i’d be ecstatic to get rid of the folder metaphor, the idea of retro-actively tagging this is haunting.

Retro-activity is another issue that Clay fails to address. While tagging solutions are often useful going forward, going backwards is haunting. I’ve already spent 10 hours trying to get my books and papers into citeulike, but i’m probably only 1/10 of the way there. Retro-activity is also not simply a matter of once and done. After loading 20 books, i realized i missed a crucial tag and had to go back and start over. The problem of retro-activity is disturbingly recursive and thus exponential in time.

Another issue in tagging concerns splits in categories. In hierarchical systems, if i have a category “Asia” and the bookshelf gets too large, i can split it into two, go through the items in that category and split them. With tagging, i am faced with a far greater number of potential items and tags. Rather than splitting Asia, i add a new tag so that now i have Asia and Japan. Now with each new insert, i need to remember to add Asia and Japan instead of just Japan which would include Asia in a hierarchical system. If i fail to do this, i cannot look at the Asia tag and see what is also Japan. Flat structures can be advantageous, but they can also be completely troubling when not all tags are actually of equal weight.
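The Asia/Japan problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PARENTS` table, the `expand` and `find` helpers, and the sample bookmarks are all hypothetical and do not come from del.icio.us or any other real tagging system. It compares a flat lookup, where "Japan" says nothing about "Asia", with a lookup that follows parent links the way a hierarchy would.

```python
# Hypothetical sketch: every name and data item here is illustrative.

# Assumed parent links: each tag maps to the broader tag it implies.
PARENTS = {"Japan": "Asia", "Tokyo": "Japan"}


def expand(tags, parents=PARENTS):
    """Return the given tags plus every broader tag they imply."""
    expanded = set(tags)
    for tag in tags:
        current = tag
        while current in parents:
            current = parents[current]
            if current in expanded:
                break  # already covered; also guards against cycles
            expanded.add(current)
    return expanded


def find(tag, bookmarks, use_hierarchy):
    """List bookmark names carrying the tag, optionally via implied tags."""
    return sorted(
        name
        for name, tags in bookmarks.items()
        if tag in (expand(tags) if use_hierarchy else tags)
    )


bookmarks = {
    "kyoto-temples": {"Japan"},  # tagged Japan only; Asia was forgotten
    "silk-road": {"Asia"},
    "paris-cafes": {"France"},
}

flat_hits = find("Asia", bookmarks, use_hierarchy=False)
tree_hits = find("Asia", bookmarks, use_hierarchy=True)
print(flat_hits)  # ['silk-road']
print(tree_hits)  # ['kyoto-temples', 'silk-road']
```

In the flat lookup the Japan-only entry disappears from the Asia view, which is exactly the bookkeeping burden described above. The parent table is one possible way of layering a local hierarchy on top of flat tags, not a claim about how any existing system works.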

Watching people tag things, i’m also painfully aware that they are still playing the real estate game depending on their own values and conceptualizations. While i have broken the category of theory into tons of separate categories and have one technology category, others have done the reverse. Certainly, many-to-X solves this but it is crucial to remember that people, not just librarians, think in this way.

All this said, i do agree that tagging is a great solution, but i would not go so far as to say *the* solution: there are still issues that haven’t been worked out, and tagging does not solve all ontology issues. But i do think that the arguments that Clay makes are crucial.

Of course, my biggest beef with tagging is not actually at a structural level, but at an implementation level. Not a single tagging system is built to deal with the retro-activity problem. Editing tags requires editing each entry; systems aren’t built to let me do group operations. Let me search through my data, circle relevant entries and add a new tag or subtract one. Group operations. Quickly and without little checkboxes. Give me a visual aid to see how my tags are overlapping so that i can quickly assess if i failed to tag something appropriately. Let me build local hierarchies as i see fit. Let me auto-associate tags so that i can add synonyms or abbreviations easily because i’m tired of writing LiveJournal and LJ because i want to tag it both ways so that i can see what others have done.

Let me easily combine tags when i screw up and write both fanfic and fan_fiction and realize later that i want them to be the same. While context is important, my own inconsistencies are not a context that i want to incorporate into a system.
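As a rough sketch of what these wished-for operations might look like, here is a small Python example. Everything in it is an assumption made for illustration: the `ALIASES` table, the `canonical`, `merge_tags`, and `add_tag` helpers, and the sample entries are hypothetical, not features of del.icio.us, citeulike, or any other existing system.

```python
# Hypothetical sketch: the alias table, helpers, and entries are illustrative.

# Assumed synonym table: each alternate spelling maps to one canonical tag.
ALIASES = {"lj": "livejournal", "fan_fiction": "fanfic"}


def canonical(tag, aliases=ALIASES):
    """Normalize a tag and resolve it through the alias table."""
    key = tag.strip().lower()
    return aliases.get(key, key)


def merge_tags(entries, old, new):
    """Replace tag `old` with `new` everywhere; return how many entries changed."""
    changed = 0
    for tags in entries.values():
        if old in tags:
            tags.discard(old)
            tags.add(new)
            changed += 1
    return changed


def add_tag(entries, selected, tag):
    """Group operation: add one tag to every selected entry at once."""
    for name in selected:
        entries[name].add(canonical(tag))


entries = {
    "post-1": {"fanfic", "livejournal"},
    "post-2": {"fan_fiction"},
    "post-3": {"privacy"},
}

changed = merge_tags(entries, "fan_fiction", "fanfic")  # fix the inconsistency
add_tag(entries, ["post-2", "post-3"], "LJ")  # "LJ" resolves to "livejournal"
print(changed)
print(sorted(entries["post-2"]))
```

The merge and the group add each touch many entries in a single call, which is the part that editing entries one at a time does not offer. Resolving synonyms at write time is just one design choice; a system could equally keep both spellings and resolve them at search time.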

Another issue is that tags grow exponentially and they’re really hard to manage. I forget about tags that i even had and figuring out how to make certain that i give all appropriate tags to a new entry is an interface nightmare. I look at the long list of tags in del.icio.us or citeulike and cringe because i can’t keep them all in my head. The interface for managing tags is a wreck. At least with personally created hierarchical tags, it’s easy to find the category. Alphabetical is not the answer. Here i run into the Elizabeth/Liz problem that i have on my phone (whereby folks like Liz Lawley are inconveniently entered twice because i forgot that i put her in originally as Elizabeth). The more tags i create, the less usable the system is because managing tags is a process in itself.

Given this, i’m not certain that tagging is the solution to a large corpus – i think it gets just as messy as ontology does. I also think that the problems of unstable entities are still present in tagging, only exacerbated by internal confusion.

3 comments to random ontology thoughts

  • danah — thanks for this review. Your last sentence sums it up. The ontology/folksonomy discussions in Corante are getting rather tiresome.

  • Thanks to Google, people are becoming more and more used to “just typing stuff” and getting more or less what they want in return. The burden of classification has been shifting, not from hierarchies to flat namespaces, but from people to software. If a system has enough entities and enough tags, the software will get smart enough to figure out the synonyms and hierarchies. It will, however, take a little time.

  • I always try to leave a comment. Sometimes I might go to a blog and it’s all advertising and a product I’m not interested in, so I don’t say anything, but otherwise I try to leave a comment. If you don’t leave comments, how are you ever going to meet people? On my other blog, I’ve made some wonderful friends because of visits and commenting.