FOAF,, and "The Net" (the Real One, the One We All Grew Up Reading About)

7 April, 2005

This post is a little different from others I've made. It is still an overly long and maybe less than precisely clear exposition on a technical issue of limited interest on which I have minimal expertise, so there's no reason to really worry, but the different thing about it is that it's not new. It comes from an email I wrote a while back to Brian Kuhn in the wake of the discussion that arose from my first post on Rory's blog. Chris and I were talking about some of this stuff last night so I forwarded the email to him. He recommended that I post it, so here it is, slightly edited to add in some necessary context. Enjoy!

One of the things that needs to happen for the web to become more a part of "real life" is that online communities need to have some of the advantages of real communities. I remember, as an early- and pre-teen, reading sci-fi books like Orson Scott Card's Ender's Game where the public sphere of the world would take place on "the net". Characters would have online personas that would, in turn, have political rights, be able to make speeches in public forums, become more or less famous, and, in short, have complete public identities. Right now it can be said of certain internet "celebrities" that they have an identity on the web (Rory is an example of this within a limited community; Cory, within a larger one). They have a homepage; when they comment elsewhere, people know it's them; they have jobs in the physical world that are related to their online identities; etc. The system for online identity that I'm trying to imagine would enable this for everyone and, hopefully, in a way that does not make celebrity the cost of entry (as if we didn't place enough importance on that already).

Before moving forward, let me give you a miniature example of what I'm talking about. I'm sure there are all kinds of technical problems with what I'm going to say, but I'm not interested in those; I'm just trying to illustrate a concept by reference to an existing technology (that I happen to understand only in its barest outlines). The example is FOAF (Friend Of A Friend).

Here is a quick link with a good description and some technical examples that I didn't completely understand: it's from the IBM developers' network.

The point of FOAF is to create something like an open Friendster. The way it works is that individuals create their own XML files in which they describe themselves via fixed unique reference points (their email addresses) and, more importantly, their relationships to other people. Combining these various XML files then allows a service to describe a social network (if Billy is friends with Sally and Sally is friends with Joey, then maybe Billy is friends with Joey). The idea is that you could create your own self-describing XML file and then just point a service like Friendster to it, so you could withdraw it if you felt it was necessary. You wouldn't be uploading a batch of your personal data to them that they would then own (and be free to do whatever they want with, including charge you for it, sell it to the Nazis, or accidentally lose it). Also, if you didn't like a Friendster policy (like firing an employee for blogging), you could withdraw your file in protest.
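To make the "combine the files and get a network" idea concrete, here is a minimal sketch in Python. The two documents are simplified, made-up FOAF-style files (the example.org addresses and the function names are my own inventions for illustration), and real FOAF files can be structured in more ways than this parser handles.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
NS = {"rdf": RDF, "foaf": FOAF}
RESOURCE = "{%s}resource" % RDF  # how ElementTree names the rdf:resource attribute

# Hypothetical, simplified FOAF files: each person lists an email and whom they know.
BILLY_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:name>Billy</foaf:name>
    <foaf:mbox rdf:resource="mailto:billy@example.org"/>
    <foaf:knows><foaf:Person>
      <foaf:mbox rdf:resource="mailto:sally@example.org"/>
    </foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

SALLY_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person>
    <foaf:name>Sally</foaf:name>
    <foaf:mbox rdf:resource="mailto:sally@example.org"/>
    <foaf:knows><foaf:Person>
      <foaf:mbox rdf:resource="mailto:billy@example.org"/>
    </foaf:Person></foaf:knows>
    <foaf:knows><foaf:Person>
      <foaf:mbox rdf:resource="mailto:joey@example.org"/>
    </foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def parse_foaf(xml_text):
    """Return (owner_email, set_of_claimed_friend_emails) for one file."""
    person = ET.fromstring(xml_text).find("foaf:Person", NS)
    owner = person.find("foaf:mbox", NS).get(RESOURCE)
    friends = {
        mbox.get(RESOURCE)
        for mbox in person.findall("foaf:knows/foaf:Person/foaf:mbox", NS)
    }
    return owner, friends


def build_network(xml_files):
    """Combine many files into one mapping: owner -> people they claim to know."""
    return dict(parse_foaf(text) for text in xml_files)


def friends_of_friends(network, person):
    """People known by your friends whom you don't already list yourself."""
    direct = network.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= network.get(friend, set())
    return candidates - direct - {person}


network = build_network([BILLY_XML, SALLY_XML])
print(friends_of_friends(network, "mailto:billy@example.org"))
# prints {'mailto:joey@example.org'}
```

Because the service only reads files that each person hosts, withdrawing your file simply removes your entry from the network the next time it is rebuilt.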

Taking it one step further, it seems to me that if a trustworthy third party could verify a relationship (or, better, if there could be some kind of open protocol for this) by checking the XML files of both parties in a claimed relationship (Joey claims to be Sally's friend and Sally claims to be Joey's friend), then we could start to embed public web identities in a network of trust that would establish identities beyond a reasonable doubt without reference to anything like credit card numbers or Social Security numbers. (I think there are significant social benefits to reducing the dependence of digital identity security on these types of government- and corporate-issued markers, but I won't outline those because I want to try to keep my focus here.)
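The two-sided check described above is simple enough to sketch. This is only an illustration under my own assumptions: the `claims` mapping stands in for whatever a verifier would have read out of each person's file, and the names and function names are hypothetical.

```python
# Hypothetical data: each person -> the set of people their own file claims as friends.
claims = {
    "joey": {"sally"},
    "sally": {"joey", "billy"},
    "billy": {"sally"},
    "stranger": {"sally", "joey"},  # claims friendships nobody reciprocates
}


def is_verified(claims, a, b):
    """A relationship counts only if BOTH files claim it."""
    return b in claims.get(a, set()) and a in claims.get(b, set())


def verified_friends(claims, person):
    """Everyone whose friendship with `person` is confirmed from both sides."""
    return {
        other
        for other in claims.get(person, set())
        if is_verified(claims, person, other)
    }


print(is_verified(claims, "joey", "sally"))       # prints True
print(is_verified(claims, "stranger", "sally"))   # prints False
print(sorted(verified_friends(claims, "sally")))  # prints ['billy', 'joey']
```

The point of the design is that a one-sided claim costs nothing to make and so proves nothing; only links that the other party also publishes count toward an identity.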

While the Friendster example of this has somewhat limited scope, imagine the same thing applied to Google. Steven Mallett wrote a mobilizing rant on this subject and then founded a group to do something about it: Data Libre. These areas are where data-decentralization of this type really becomes radical.

Now, here I want to do a little bit of a jump cut to talk about folksonomies before coming back around, hopefully, to link the two areas together.

All of a sudden, seemingly out of nowhere in the last couple of months, the concept of tagging appeared (or at least that's how it looks from my ahistorical seat as a recent arrival to these kinds of issues). Before we get too far, we need a definition of tagging. Here's mine:

Define: tagging. noun. the act whereby users mark certain websites with various words for a variety of purposes including their own future reference and public consumption. The idea being that the convenience to the user drives the action of tagging and then great social utility is derived from combining all the results. See also: folksonomy.

(A great discussion of the comparative merits of folksonomy v. taxonomy (the traditional library way) is taking place on Many-2-Many)

So, the juice for us here is that, just maybe, this solves the problem of credibility. On a web filled with, say, a thousand times the data, Google becomes less useful, especially because the more "amateurs" are empowered to create data, the more the results for the most popular search terms become useless (I had a link to a more authoritative source on this, but I can't find it right now -- why didn't I tag it!?): when you search for Iraq, do you want to get the NYT or some Portlander sitting in the Red and Black cafe blogging a latte-drink-in? Now, if your search was based on tags created by actual people, and organized by which results were tagged with your term by the most people, then this problem would be at least reduced (ironically, the service itself doesn't allow sorting results by most popular anywhere; I think they've got some kind of ideology about causing churn, with most recent results always at the top and their deep commitment to RSS-ing everywhere they can, rather than reinforcing popular links by letting them rise to the top).

Instead, the service places its emphasis on another way of dealing with this problem, which is filtering. You have an inbox there where you can subscribe to other people's tags, essentially marking them as trusted filters of the web. You can also, through RSS, subscribe to anything at all on the site. So you can get a feed of, for example, all of the things I tag with "useful" and "web" (which, since I mentioned it, is a tag combination I use to mark sites I find that relate to the kind of issues we're talking about here; that page has links to all/most of my online reading on the subject so far).
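As a rough sketch of the "organized by what the most people tagged" search from above: the bookmark records, user names, and example.org URLs below are all made up for illustration, and this is not how any particular tagging service actually works.

```python
from collections import defaultdict

# Hypothetical bookmarks: (user, url, tag). One user may tag many pages.
bookmarks = [
    ("ann", "https://news.example.org/iraq", "iraq"),
    ("bob", "https://news.example.org/iraq", "iraq"),
    ("cat", "https://news.example.org/iraq", "iraq"),
    ("dan", "https://blog.example.org/latte", "iraq"),
    ("dan", "https://blog.example.org/latte", "coffee"),
    ("ann", "https://news.example.org/iraq", "iraq"),  # duplicate, counted once
]


def search_by_popularity(bookmarks, tag):
    """Return (url, number_of_distinct_taggers) pairs, most-tagged first."""
    taggers = defaultdict(set)
    for user, url, bookmark_tag in bookmarks:
        if bookmark_tag == tag:
            taggers[url].add(user)  # a set, so repeat tags by one user don't stack
    results = [(url, len(users)) for url, users in taggers.items()]
    # Sort by count descending, then by URL so ties come out in a stable order.
    return sorted(results, key=lambda item: (-item[1], item[0]))


for url, count in search_by_popularity(bookmarks, "iraq"):
    print(count, url)
# prints:
# 3 https://news.example.org/iraq
# 1 https://blog.example.org/latte
```

Counting distinct people rather than raw tag events is the one design choice that matters here: otherwise a single enthusiastic tagger could push a page to the top alone.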

Now, I'd finally like to try to link my two areas (FOAF and tagging) together. I don't exactly know how to do this yet, but I feel like they could be related very powerfully to try to solve the issue of trusted content. What I have in mind is some kind of method for having the trustworthiness of a tagger (determined via their FOAF XML file) weight their tags in search results. So, if your verified links with a lot of people establish you as a trustworthy tagger, then your tags will play a greater role in determining tag-based search results. This is basically a way of enacting the importance of experts in giving authority to information without creating barriers to entry for becoming an expert (no degrees, no learning complicated taxonomies). This would also deter spammers, etc., since their FOAF would reflect their extremely untrustworthy status and so their tags wouldn't affect search results (you could, potentially, even set up something like a do-not-call list, where you could choose to specifically eliminate all sites tagged by a particular user from your results, or set some kind of floor where only people with a certain level of trustworthiness could contribute to your results; you could range this level anywhere from only your direct friends to just high enough to try to keep out bots).
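I said I don't know exactly how to do this, so take the following as one guess rather than a design. It assumes (my assumption, not anything FOAF specifies) that a tagger's trust is simply the number of friendship claims that the other person's file confirms, and it adds the floor and the do-not-call list as plain parameters. All names, URLs, and function names are hypothetical.

```python
# Hypothetical friendship claims: person -> people their file lists as friends.
claims = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann"},
    "bot": {"ann", "bob", "cat"},  # a spammer: claims everyone, confirmed by no one
}

# Hypothetical bookmarks: (user, url, tag).
bookmarks = [
    ("ann", "https://news.example.org/iraq", "iraq"),
    ("bob", "https://news.example.org/iraq", "iraq"),
    ("cat", "https://blog.example.org/latte", "iraq"),
    ("bot", "https://spam.example.org/pills", "iraq"),
]


def trust_scores(claims):
    """Assumed trust measure: how many of your claimed friends claim you back."""
    return {
        person: sum(1 for other in listed if person in claims.get(other, set()))
        for person, listed in claims.items()
    }


def weighted_search(bookmarks, tag, trust, floor=1, blocked=frozenset()):
    """Rank URLs by the summed trust of the distinct users who tagged them."""
    scores = {}
    counted = set()
    for user, url, bookmark_tag in bookmarks:
        if bookmark_tag != tag or user in blocked or (user, url) in counted:
            continue
        weight = trust.get(user, 0)
        if weight < floor:  # below the floor: this tagger contributes nothing
            continue
        counted.add((user, url))
        scores[url] = scores.get(url, 0) + weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


trust = trust_scores(claims)
print(trust)
# prints {'ann': 2, 'bob': 1, 'cat': 1, 'bot': 0}
print(weighted_search(bookmarks, "iraq", trust))
# prints [('https://news.example.org/iraq', 3), ('https://blog.example.org/latte', 1)]
```

Raising `floor` narrows results toward well-connected taggers (with `floor=2` only ann's tags count here), while `blocked` is the do-not-call list: `weighted_search(bookmarks, "iraq", trust, blocked={"ann"})` drops her tags from your results regardless of her trust.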
