Thoughts on APML

by Sebastian Benthall

The Attention Profile Markup Language (APML) is a neat idea–but it’s hard to pin down exactly what that idea is. This is how its creators describe it:

“APML allows you to share your own personal Attention Profile in much the same way that OPML allows the exchange of reading lists between News Readers. The idea is to compress all forms of Attention Data into a portable file format containing a description of your ranked interests.”

The comparison to OPML is useful. APML is supposed to be a standard XML format that makes it easy to transfer important information across web services.

The definition of APML in terms of Other Capitalized Jargon is much less useful and, I think, points to a couple of important flaws in the project’s philosophy.

On the one hand, it aims to be a format of compressed data–“Attention Data”–and especially data of the kind that can be easily collected from internet behavior. The APML FAQ indicates that there are all kinds of Attention Data, including clickstreams, bookmarks, and OPML-described feeds.

But the community emphatically denies that APML is just a data format. “APML is only interested in your Attention Profile,” which is defined as “a list of the topics and sources you are interested in, and a value representing your level of interest in them.”

I see two major problems with this plan.

The first problem is a confusion of data and hypothesis. The language that the project uses–talk about “compressing” Attention Data into an Attention Profile, or “deriving” the latter from the former–demonstrates a lack of awareness that a record of my use of web services just isn’t the same thing as a description of my interests. It’s a category mistake. Somebody trying to discover my interests from my web behavior is going to have to perform a difficult, essentially scientific feat of inference to get anything right. As far as I can tell, this problem hasn’t been discussed in the APML community.
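
To make the category mistake concrete, here is a minimal sketch in Python of what a naive “derivation” might look like. Everything in it is my own assumption for illustration: the clickstream, the idea that a topic can be read off a URL path, and the naive_profile function are hypothetical and are not taken from the APML spec or from any real service.

```python
from collections import Counter

# Hypothetical clickstream: the raw data a web service might record.
clickstream = [
    "news/politics",
    "news/politics",
    "news/politics",
    "recipes/soup",
]

def naive_profile(clicks):
    """Turn click counts into 'interest' scores by simple normalization.

    This is an assumption-laden guess, not a measurement: it presumes
    that clicking on something means being interested in it.
    """
    counts = Counter(click.split("/")[1] for click in clicks)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

profile = naive_profile(clickstream)
```

The counts are data. The scores are a hypothesis about me, and a weak one: the same four clicks would be produced by someone who loves politics and by someone who reads it out of a sense of duty and would rather be making soup.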

The second problem is that nobody knows what a description of people’s interests looks like, and it will take a lot of hard, empirical work to figure that out. The topics I’m interested in, and how interested I am in them, and when I am interested in them, and how my interests are changing today, are all properties of my extremely complicated psychological makeup. A description of my interests is only going to be accurate to the extent that it is an approximate description of that psychology. But the psychology of attention is hard, experimental, and unfinished work. You can’t just pull it out of a database.

The main issue here is that any format that attempts to capture an “Attention Profile” is going to build in implicit presuppositions about how interest works–in the head. And it could be very hard to get these right. E.g., a simple list of “concepts” mapped to numbers representing my level of interest–basically the core of the current APML spec, as far as I can tell–doesn’t allow for the representation of how my interest in one topic might change in the context of another. Users of APML will likely bump up against this constraint in the future.
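
A rough sketch of that constraint, in Python. Both structures below are hypothetical illustrations of mine, not the APML schema: the first stands in for a flat concept-to-score list, the second for one possible way (among many) of letting a score depend on context.

```python
# A flat profile: exactly one interest score per concept.
flat_profile = {
    "statistics": 0.5,
    "baseball": 0.8,
}

# A context-dependent profile: the score for a concept depends on
# which other topic it appears alongside. The contexts are made up.
contextual_profile = {
    ("statistics", "baseball"): 0.9,   # batting averages: fascinating
    ("statistics", "insurance"): 0.1,  # actuarial tables: not so much
}

def flat_interest(profile, concept):
    """Look up the single stored score; context cannot be expressed."""
    return profile.get(concept, 0.0)

def contextual_interest(profile, concept, context):
    """Look up the score for a concept within a given context."""
    return profile.get((concept, context), 0.0)

in_baseball = contextual_interest(contextual_profile, "statistics", "baseball")
in_insurance = contextual_interest(contextual_profile, "statistics", "insurance")
overall = flat_interest(flat_profile, "statistics")
```

The flat profile has to settle on one number for “statistics,” which is wrong in both contexts. The contextual version is not the right model either; it just builds in a different presupposition about how interest works.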

My fear is that these concerns are far, far off the radar of the APML workgroup. For example, the original APML spec distinguishes the “concepts” in which people are interested by a “key,” which is just a string (de facto in English), and in most cases a source, the particular web service that supplied the point of interest. All this is–pardon me–completely fucked.

From a psychological perspective, it conflates strings of text with concepts. Meanwhile–permit me to get off-topic for a second–from a “semantic web” perspective, it cripples machine-readability of the format. From a natural language processing perspective, it opens up huge problems for the disambiguation of what any of the “keys” are supposed to actually mean. From an internationalization perspective, it makes it impossible for Attention Profiles in different languages to communicate with each other. And from an open standards and portability perspective, it keeps the content balkanized into private silos. Even if the syntax is wide open and standard, its semantics will be left undetermined and vulnerable to enclosure.

The fact that there was so much resistance to the idea of distinguishing concepts by URI–the clear albeit only partial solution to a lot of these problems–is discouraging, to say the least. But it looks like the community is coming around.
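
Here is a small Python sketch of why string keys cause trouble when profiles are combined, and what URIs buy you. The profiles, the merge function, and the example.org URIs are all hypothetical and chosen by me for illustration; none of them come from the APML spec.

```python
def merge(profiles):
    """Combine profiles, keeping the highest score per identifier."""
    merged = {}
    for profile in profiles:
        for identifier, score in profile.items():
            merged[identifier] = max(merged.get(identifier, 0.0), score)
    return merged

# Profiles keyed by plain strings, from two hypothetical services.
english_by_key = {"cat": 0.8, "jaguar": 0.6}   # jaguar: the animal
german_by_key = {"Katze": 0.7, "jaguar": 0.9}  # Jaguar: the car

by_key = merge([english_by_key, german_by_key])
# "cat" and "Katze" stay separate even though they name one concept,
# while the two unrelated "jaguar" entries collapse into one.

# The same profiles keyed by (made-up) concept URIs.
CAT = "http://example.org/concept/cat"
JAGUAR_ANIMAL = "http://example.org/concept/jaguar-animal"
JAGUAR_CAR = "http://example.org/concept/jaguar-car"

english_by_uri = {CAT: 0.8, JAGUAR_ANIMAL: 0.6}
german_by_uri = {CAT: 0.7, JAGUAR_CAR: 0.9}

by_uri = merge([english_by_uri, german_by_uri])
```

With URIs, the cat entries line up across languages and the two jaguars stay apart. The solution is only partial because somebody still has to decide which URI each string or click maps to, and that is the same inference problem as before.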

None of this is meant to suggest that APML isn’t a cool idea or one that should be worked on. At its core is an idea that will be essential in the future. Here, the perfect would be the mortal enemy of the good. But right now it appears to be a beast that feeds on the slick stuff of web marketing hype. If it were developed instead in coordination with the right psychological and AI communities, with a clearer vision, and away from what sounds like a profiteering drumbeat behind it, it could become a tremendous force in the world.