the biases of links

I have a hard time respecting anyone who believes that science or technology is neutral. Unfortunately, even when people consciously know that they are not, they give credence to the biased outputs without questioning the underlying assumptions. This is why i’m an academic – nothing gives me greater joy than to think about what biases go into the creation of a particular system.

After reminding folks at Blogher that there are gender differences in networking habits, i decided to do some investigation into the network structures of blogs. Kevin Marks of Technorati kindly gave me a random sample of 500 blogs to play with. I began coding them based on gender (which is surprisingly easy to do given the amount of personal information people put about themselves) and looking for patterns in links and blogrolls.

I decided to do the same for non-group blogs in the Technorati Top 100. I hadn’t looked at the Top 100 in a while and was floored to realize that most of those blogs are group blogs and/or professional blogs (with “editors” and clear financial backing). Most are covered in advertisements and other things meant to make them money. It’s very clear that their creators have worked hard to reach many eyes (for fame, power or money?).

Here are some of the patterns that i saw*:

Blogrolls:

  • All MSNSpaces users have a list of “Updated Spaces” that looks like a blogroll. It’s not. It’s a random list of 10 blogs on MSNSpaces that have been recently updated. As a result, without special code (like in Technorati), search engines get to see MSNSpaces bloggers as connecting to lots of other blogs. This creates the impression of high network density among MSNSpaces blogs, which is inaccurate.
  • Few LiveJournals have a blogroll but almost all have a list of friends one click away. This is not considered by search tools that look only at the front page.
  • Bloggers who use hosting services tend to link to only others on the same hosting service (from the blogrolls on Xanga and Rakuten to the friend links on LJ). The blogroll structure on these is often set up to only accept lists of blogs from that service.
  • Blogrolls seem to be very common on politically-oriented blogs and always connect to blogs with similar political views (or to mainstream media).
  • Blogrolls by group blogging companies (like Weblogs, Inc.) always link to other blogs in the domain, using collective link power to help all.
  • A fraction of the Top 100 have blogrolls of blogs. Some have blogrolls that are a link away (like Crooked Timber). Quite a few use that space to advertise or link to mainstream media or companies.
  • Male bloggers who write about technology (particularly social software) seem to be the most likely to keep blogrolls. Their blogrolls tend to be dominantly male, even when few of the blogs they link to are about technology. I haven’t found one with >25% female bloggers (and most seem to be closer to 10%).
  • On LJ (even though it doesn’t count) and Xanga, there’s a gender division in blogrolls whereby female bloggers have mostly female “friends” and vice versa.
  • I was also fascinated that most of the mommy bloggers that i met at Blogher link to Dooce (in Top 100) but Dooce links to no one. This seems to be true of a lot of topical sites – there’s a consensus on who is in the “top” and everyone links to them but they link to no one.
  • I also get the impression that blogrolls are not frequently updated (although i have to imagine that the blogs one reads are). I wonder how static blogrolls are.
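
To make the “Updated Spaces” point concrete, here is a minimal sketch of how service-generated links can inflate measured network density. Everything in it is hypothetical: the five blog names, the two hand-chosen links, and the assumption that the service shows each blog two auto-generated links instead of ten. It only illustrates the direction of the effect, not its actual size.

```python
def density(nodes, edges):
    """Directed network density: observed links divided by possible links."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


# Hypothetical blogs hosted on the same service.
blogs = ["a", "b", "c", "d", "e"]

# Links the bloggers chose to make themselves (made-up data).
chosen_links = {("a", "d"), ("c", "b")}

# Links generated by the service: each blog "links" to the next two blogs
# in the list, standing in for a recently-updated widget (an assumption).
auto_links = {
    (blogs[i], blogs[(i + k) % len(blogs)])
    for i in range(len(blogs))
    for k in (1, 2)
}

# A crawler that cannot tell the two kinds apart sees their union.
crawled_links = chosen_links | auto_links

print("density from chosen links:", density(blogs, chosen_links))
print("density as crawled:", density(blogs, crawled_links))
```

In this toy case the crawled network looks several times denser than the one the bloggers actually built, which is the kind of distortion a front-page crawler would pick up.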

Linking patterns:

  • The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).
  • Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends’ list). It looks like there’s a gender split in tool use; Mena said that LJ is like 75% female, while Typepad and Movable Type have far fewer women.
  • Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt’s presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.
  • Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There’s a gender split in content type.
  • When bloggers link to another blog, it is more likely to be same gender.
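
For anyone who wants to compute the same-gender pattern rather than eyeball it, here is a minimal sketch. The blog names, gender codes and links are all made up, and the function name is hypothetical; it only shows the bookkeeping, assuming blogs have already been hand-coded and that links to uncoded blogs are skipped.

```python
def same_gender_share(links, gender):
    """Share of links whose source and target blogs have the same coded gender.

    Links touching a blog with no gender code are ignored.
    Returns None when no coded links remain.
    """
    coded = [(src, dst) for src, dst in links if src in gender and dst in gender]
    if not coded:
        return None
    same = sum(1 for src, dst in coded if gender[src] == gender[dst])
    return same / len(coded)


# Hypothetical hand-coded sample.
gender = {"ann": "f", "bea": "f", "eve": "f", "cal": "m", "dan": "m"}

# Hypothetical (source, target) links; "zed" was never coded.
links = [
    ("ann", "bea"),
    ("bea", "eve"),
    ("cal", "dan"),
    ("dan", "cal"),
    ("cal", "ann"),
    ("dan", "zed"),
]

print(same_gender_share(links, gender))
```

A raw share like this only means something next to a baseline: if the sample itself is mostly one gender, a high same-gender share can arise by chance, so it would need comparing against the gender mix of the blogs available to link to.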

I began this investigation curious about gender differences. There are a few things that we know in social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as women know the people in their networks. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary and is often cited as a factor in depression later in life.)

While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated with gender, gender is likely a secondary effect.

Interestingly, there are distinct clusters of norms wrt linking in blogging, not one coherent and consistent set. The search engines (and the Technorati 100 and PubSub’s Daily 100 Top Links) are validating one of those clusters, regardless of whether or not that is what searchers are looking for. The Top 100 is a list of blogs that either fit into those norms or have adopted those norms in their patterns (most commonly the companies).

I also want to point out a few other issues in link biases that are relevant here:

  • All links are created equal. All relationships are not. Treating everything like a consistent weak tie is quantity over quality and in social networks, that means male over female.
  • When the data being measured has inconsistent structure rules, any ranking metric is inherently flawed. In blogs, there’s no consistency for what a link means, no consistent social norms for blogrolls, no agreed-upon linking norms. Metrics inherently squish out this nuance and force all of the square pegs into the round holes.
  • Links indicate no weight, no valence, no attributes. I know Technorati has asked folks to indicate positive/negative in their links or to use nofollow, but few do this. And even if people did, that kind of articulation is a social disaster (::cough:: think Friendster).
  • Traditionally, there is power in keeping your black book shut; one’s position in a network can be quite powerful. You get kudos by helping two unconnected people. You can limit information flow and acquire credit when you take something from one group to another. (This is the basis for some interesting work on creativity – creativity is when bridges connect information from disparate worlds.) While some think that transparency is good, some hide their network to maintain power. For example, if as a blogger, you provide “cool links,” you want others to read you, not the collection of people you read. Of course, a reasonable counter argument is that this person is no longer needed as a bridge, but as a curator. Still, some people hide so that they must be asked for recommendations directly and thus can control who they send people to. (Note: this is a particular kind of power move; transparency can also be a power move through gifting.)
  • There are social consequences to linking structures and those who have a lot of eyes on them are probably more aware of the consequences of their linking habits. This is another reason why people with a lot of eyes may get rid of blogrolls. Having to negotiate lots of requests for links can be a real turn-off.
  • People will try to manipulate any ranking if there is an advantage to being up top. Static measurement algorithms cause harm to the entire community that is being measured. Web search engines know this, but it’s equally critical for blog search.

These services are definitely measuring something but what they’re measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They’re very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there’s nothing neutral about an algorithm.
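
To see how much the counting rule itself decides the outcome, here is a minimal sketch that ranks the same made-up link data two ways: once counting only links inside posts, and once blending post links with blogroll links. Both rules count each linking blog once per target. The data, the link kinds and the function names are all hypothetical, and this is not any service’s actual algorithm.

```python
from collections import defaultdict


def inbound_sources(links, kinds):
    """Count distinct source blogs linking to each target, for the given link kinds."""
    sources = defaultdict(set)
    for src, dst, kind in links:
        if kind in kinds and src != dst:
            sources[dst].add(src)  # a set, so each source counts once per target
    return {dst: len(srcs) for dst, srcs in sources.items()}


def ranking(counts):
    """Targets ordered by descending count, with ties broken by name."""
    return sorted(counts, key=lambda blog: (-counts[blog], blog))


# Hypothetical (source, target, kind) links.
links = [
    # "x" sits in many blogrolls but nobody writes about it.
    ("a", "x", "blogroll"), ("b", "x", "blogroll"), ("c", "x", "blogroll"),
    ("d", "x", "blogroll"), ("e", "x", "blogroll"),
    # "y" is discussed in posts; "a" links to it twice.
    ("a", "y", "post"), ("a", "y", "post"), ("b", "y", "post"),
    ("c", "y", "post"), ("d", "y", "post"),
    # "z" gets one of each.
    ("a", "z", "post"), ("b", "z", "blogroll"),
]

posts_only = inbound_sources(links, {"post"})
blended = inbound_sources(links, {"post", "blogroll"})

print("posts only:", ranking(posts_only))
print("blended:", ranking(blended))
```

Under the posts-only rule, "x" does not appear at all; under the blended rule it comes out on top. Neither answer is the neutral one. Each just encodes a different decision about what a link is supposed to mean.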

While i’ve been looking into the linking patterns, Mary Hodder has been thinking through new metrics for measurement. These are very important but not because one is better than the other. In fact, if we all switched to any of her metrics, we’d have just as many biases as we have now. And many of the Top blogs would try to figure out how to get rank in that system. The significance lies in the ability to offer choice.

Of course, choice is difficult. Lots of people want to know what the “best” one is and don’t want to think about the metrics behind it (yes, these are the “neutral” people). Unfortunately, many of those types have a lot of power that motivates people to want their attention. The press want a list of the best and many bloggers want the attention of the press and thus want to be listed among the best. Breaking this cycle is virtually impossible, but it is how power maintains power. And in our current system, we are doing a damn fine job of replicating the power structures that pervade everyday life under the auspices of creating a new system that usurps power. Ah, what fun.

Still, i think it’s critical to work on new metrics so that we can at least start showing alternate ways of organizing information if for no other reason than to push back against the conception of neutrality. And thus, i’m stoked to help Mary out and i would encourage everyone else interested in altering the power structure to do so as well.

At the least, i do think we need to really think about what is at stake and what we’re inadvertently supporting through our current systems. Are these the power structures that we want to maintain? Because there’s nothing neutral about our technological choices.

* Note: these are patterns, not findings. The methodology used here is not solid enough for findings. I am not offering quantitative data because i want it to be clear that these are trends based on tracking patterns. Think of them as guesstimated hypotheses (and i’d be ecstatic if someone would compute them).

Updated: Related Links

Note: i don’t agree with the points of all of the related posts but i do think they’re important to consider and i want to respond more broadly when i can. In the meantime, i figured that those interested in this post should know about them.

30 thoughts on “the biases of links”

  1. mary hodder

    danah,
    Thanks... these are great points. I think that whatever metric, or group of metrics, is used to describe blogs, it will have bias. I hope we can get the most useful and least biased of these into the metric, and then use it knowing there is still bias there. Thanks also for offering to help. I think it takes many perspectives to achieve the goals on this one, because there are so many different ways of communicating using blog tools, so many kinds of communities, and of course as you say, great gender differences in the use of the tools.

    Also, note that the Top 100, and all link counts at Technorati, are derived by counting each source that links to another blog, either in the blogroll or a post, as long as it sits on the front of a blog, counting once only per blog. Post links scroll off the front page as new posts are added, but blogroll links are there as long as the blog owner doesn’t remove them. So there is a currency to counting post links but also a legacy factor to blogroll links that can last years. I’d really be curious, if someone were to do the calculations for you, about where the bias is within each type of link count. PubSub’s Daily 100 is strictly using the current links, and Technorati is mixing current post links and blogrolls. It would be really interesting to see what is there behind each, and then to see how that compares to Technorati’s blended version.

    Thanks,
    mary

  2. Blogging IT and EDucation

    the biases of links

    Many-to-Many: the biases of links, in particular the differences between male and female bloggers. This is a really interesting article (as is the ref to a blog about Blogher that Danah gives at the start.) It seems that different systems have differe…

  3. Licence to Roam

    Link Bias

    Danah Boyd has written about link bias using a random sample. Although she says that her conclusions are not firm enough to be called findings, they provide good indicators into some online behaviours. Looking at the Technorati 100, many of…

  4. Shelley

    “I have a hard time respecting anyone who believes that science or technology is neutral.”

    Was this in reference to something somebody wrote?

  5. zephoria

    Shelley – it is in response to a few people, but you weren’t amongst them. I didn’t read your post until i received multiple copies after i posted this. That said, i am bothered by your post. Technology reflects the biases of all relevant participants (creators, users, etc). I don’t believe in framing technology in good/evil moral structures because there’s no way to discuss those concepts without dealing in doctrine. That said, your title implies that technology is neutral (although the latter half of your post goes against that). Thus, i’m not sure how i feel about your post because it seems self-contradictory.

  6. 60k Marketing

    the biases of links

    danah boyd has published a thoughtful preliminary analysis on linking at both apophenia (link below) and many to many. Her point on the clusters of linking types quoted below has some interesting implications for the discussion of the tripartite blogo…

  7. mir

    Speaking poetically, or at least with a degree of creative license.

    It is nice to imagine one’s blogroll as a little black book. But I feel like the functions are different.

    IMHO a blogroll is more akin to the signatures in your yearbook. It’s a measure of a certain kind of social success, and not reflective of one’s real connections.

    Real blackbooks are sexy and powerful, because they are little palimpsests of true relationships.

    I used to love going through my dad’s lbb, which was about 20 years old, to try to find the names he wouldn’t explain to me, for whatever reason. I eventually got the story of every name but it took some prying.

    A blogroll is too public and too flat.

    Although probably the backstory of some blogroll connections would be interesting.

    ps: how was your math class? mine is turning out easier than I remembered.

  8. Brad DeLong

    Fame. Definitely fame. And through fame, power. It’s a play in the intellectual influence game. Either the stupidest or the most brilliant thing I’ve ever done.

    Brad DeLong #84

  9. ketsugi

    “I haven’t found one with >25% female bloggers”

    On my website’s sidebar blogroll, 5 out of 6 of my linked blogs are female bloggers. On the extended blogroll page, I’d hazard a guess of at least 50% (I’m not sure of the genders of some of the bloggers on my blogroll).

    And yes, I’m a male blogger. And a Computer Science undergraduate student.

  10. mark

    I think the effort to develop a single set of criteria, however painstakingly formulated, for ranking and searching the entire “blogosphere” fails to take into account the vast differences between the categories of both content type and the “purpose” of individual blogs. That’s why the “metric” of the number of links pointing to a site as a credible sign of a blog’s relevancy and content respectability is flawed from the outset. The fact that the most trafficked or referenced/linked-to blogs are essentially edited digests of other blogs or a collection of links submitted by a blog’s readers, vs. a blog that consists entirely of originally generated content (by both the blogger and their participants) with germane links as reference, suggests that the thing we call blogging today (weren’t these called forums in the 90s?) is really just a new interaction platform that hasn’t evolved into its various niches yet. So many people have gotten into blogging simply because it is the easiest way to self-publish, not because they want to foster dialog or community and not because it is the most appropriate platform. And while attempts by the blog search engines to quantify and categorize the blogosphere are noble (though certainly not without vested interest), it’s like trying to standardize a search of a grocery store for “the best food.” There are a thousand ways to set criteria for such a search, all of which are accurate and none of which is all-encompassing. As a nexus for a dedicated group of people focused on a specific issue or subject, blogs are fine. They are also an appropriate mechanism for those articulate, prolific and original thinkers who can consistently churn out provocative or entertaining content that will attract an audience and evoke intelligent discourse. But even for that, the format/UI/navigational conventions available through most blog hosting services are extremely elementary.

    If the purpose of a blog is to allow the communal mind to postulate, hone and showcase ever-evolving knowledge, and for blog search engines to accurately locate the knowledge that is most relevant to a specific search, why don’t blogs auto-create blogrolls that list the names/links of individuals whose contributions/posts have consistently been rated as the most thoughtful by the other users of that blog? Why don’t blogging platforms allow the blogger (or the platform, automatically) to bubble up posts (with contributor links) to the top page as a sort of “readers recommend” mini-feature? For blog search engines to be effective, these are the kinds of conventions that must be followed by blog sites, or they will never be able to make accurate suggestions and activity assessments. But the fact is, most bloggers are there to promote their own line of thinking and perspective, which includes links to their friends and like-minded sites (or, as zephoria points out, to sites that they hope will point back to their site and, thus, their personal perspective). Most blogs are not really created with the idea of intellectual cross-pollination . . . which is why most do not fall into a true “social networking” category and why, frankly, they don’t get any traffic. The key to minimizing “link bias” in blog search/monitoring is to formalize a way that the content presented by a blog is more important than the number of links.

  11. My Thoughts on Changes

    Gender and linking habits….

    By way of Madame Levy I found an interesting article about linking habits of bloggers. It appears to be divided along gender lines with males overwhelmingly favoring male blog authors and females with the same trend towards their own gender….

  12. Sarah Allen's Weblog

    the value of conversation

    Mary Hodder (napsterization) proposes an alternate ranking system for blogs. The current ranking systems depend largely on inbound links and have the odd effect of making the popular bloggers even more popular, bringing the blogosphere ever closer to th…

  13. Napsterization

    More comments on…

    a community based algorithm and the attendant issues… Michael Frasse on Information authority and ranking: Hodder says, rightly, that the metric for assessing weight in the blogosphere should be open, not closed. “Bloggers should have input about t…

  15. sgcbearcub

    Just a thought: you did touch on the fact that there is a gender split as well as a technological split in how people linked/blogrolled, and someone else commented on the possibility that norms of linking behaviour were network-based. In future metrics, you may want to consider how people learn. A) If they have no preexisting bias, they will be more likely to follow what they see, i.e. their friends (LJ) or their competitors or professional graphic design background (editors for business blogs), unless what they are doing is significantly different to justify breaking from the ‘norm’. B) People can only do what they are motivated to learn. LJ cuts and links take time to locate and learn how to do. Someone who is not comfortable with technology or html/weblog is going to be less inclined to use those features unless there is significant social or risk/reward pressure to do so.

  16. Stan

    IMHO a blogroll is more akin to the signatures in your yearbook. It’s a measure of a certain kind of social success, and not reflective of ones real connections.

  17. Kare Anderson

    In keeping with the topic of your post I (a woman) discovered this post via Penelope Trunk and Bill Weil (a former classmate of yours and the son of a friend of mine).

    Working in male-dominated professions (when I worked in them) I’ve been a longtime, first-hand observer of male/female bonding and weak and strong ties and agree with your findings.

    This post could and should be part of the curriculum for courses in psychology, business or journalism. Thank you!
