What I’m working on

by Sebastian Benthall

I’ve been a bit scatterbrained this semester and distracted by quite a bit of personal stuff and art/hobby stuff. But I need to reorient myself to my actual work. Unfortunately, I keep running into walls trying to configure emacs with the Solarized theme, which is naturally a prerequisite to doing my statistics homework. So instead of working, I’m going to write a blog post about what I’m working on.

Dissent networks. I’m working with Yahel Ben-David and Giulia Fanti to design an anonymized microblogging service that could work over a delay-tolerant network of wirelessly connecting smartphones in case of Internet blackout, due to natural disaster or government intervention. The idea is that as far as effective anti-censorship technology goes, soft rollout of applications on existing mobile hardware is much more feasible than hard rollout of new mesh networking hardware. And ad hoc mesh networks have serious performance issues at scale, which may in theory be circumvented (or assumed away?) in a delay-tolerant network. We are making a lot of progress on the message prioritization model but there may be some scaling problems hiding behind the curtain…
GeoNode interviews. I’m planning on doing a number of interviews with members of the GeoNode community and related projects so I can write up a report of the project for an academic audience. There are a lot of open questions in ICTD that GeoNode addresses as a case study, and in the process I hope to be able to give back to the community somehow through the research. I’m excited to catch up with old colleagues and learn more about how the project has progressed since I started school. Because of some logistical contingencies, I’m also looking into the ideology of the Singularity Institute at the same time. It provides an interesting contrast in terms of internet eschatology. I’m not sure how much I want to get into web utopianism in the write-up of this; I think I’d prefer to focus on practical generalizations. But since the purported generalizations of GeoNode are so often utopian (not necessarily in an unattainable way–I’m still a believer) I’m not sure I can dodge the issue.
Figuring out what information is. Maybe because I’m a painfully literal person, it bothers me that I’m at a School of Information where nobody thinks there’s a usable definition of “information”. It’s especially bothersome because there are some really good theories of information out there, including the mathematical ones (originating with Claude Shannon but evolving through Solomonoff, Kolmogorov, Bennett…) and some philosophical ones (Dretske, Floridi). The biggest barrier to adoption of these definitions in my corner of the world seems to be convincing the social scientists that these concepts are relevant to their work. I think I’ve got a pretty compelling argument in the works though. It takes Andy diSessa’s theory of constructivist epistemology to show that the “phenomenological blending” that leads Nunberg to dismiss the term “information” along with the “Information Age” is actually characteristic of scientific understanding generally (e.g. in learning physics). So I think basically I need to figure out the underlying phenomenological components of people’s naive intuitive theories of information and try to develop a curriculum that reformulates them into the more formal one. I think that if I could pull that off it would make the social scientific and “big data” analytics worlds more comprehensible to each other, maybe.
Bounded symbolic networks. With Dave Tomcik I’ve been working on a paper to try to synthesize Anthony Cohen’s theory of the symbolically constructed community with the kinds of social network theories that are applicable to online social media. This was the work that inspired some of the Weird Twitter performance/experiment/fiasco. It’s funny that that experience shed so much light on the subject for me, but I think I’m going to leave all those insights as out of scope for the paper. It already looks like it may be a stretch to move from Cohen’s sense of ‘symbol’ to the kind of symbol used in digital communication but I’m going to give it a shot. I’m less confident about where any of this goes since I’ve learned since starting the project that there are so many other contesting theories of community, but I still think it’s worth trying to combine a couple of them into a single model since so many communities are digitally constituted and in principle algorithmically detectable.
I’ve got to come up with a project for my statistical learning theory course fast.

Currently backburnered but hopefully getting back to work on next semester:

Deception detection on Twitter. With Alex Kantchellian, using LDA to detect ‘deceptive’ tweets (tweets that link to content that isn’t what the tweet says it is about–think rickrolling) and using deception as an indicator for spam detection. Preliminary results good, just need to check it against a larger data set.
Strategic Bayes Nets. I need to rework the paper I did on computational asymmetry and Strategic Bayes Nets with John Chuang and submit it somewhere. After my statistical learning theory course, I’m better prepared to think through the engineering implications of the model. And an economist friend, Gavin McCormick, has taken an interest in applying the model in some behavioral economics contexts, which would be sweet.

For some reason I keep getting distracted from the two things I imagine myself really focusing on, which are:

Automated argument analysis of mailing list discussions. Using Bluestocking to analyze argumentation on mailing lists (starting with open source projects because it’s where my background is and where there’s a solid benefit to industry, but hopefully generalizing to web standards bodies and collaborative decision making more generally). The idea is to try to build tools that help facilitate consensus or disseminate consensus-building. I think this could be really hard but really awesome if I can pull it off.
Dissertationtron. In my imagination, in the process of writing my dissertation I build a really thin CMS that allows me to logically organize my content, notes, and references while keeping everything open to the web. That way I can publish the content in increments and gather user feedback. Also, I anticipate wanting to build dynamic demos of the concepts and tools used in the process of the dissertation research and making them available through the web. Naturally the whole thing would be backed by a git repository. Ideally I would be running it through the same automated argumentation analysis from the previous bullet point as a way of doing automated tests on my own logic, though getting that to work may involve some crippling limitations on my vocabulary use and sentence structure while writing.

At this stage in the process, I’m having a hard time distinguishing between pipe dreams, productive work, and academic yak shaving. Interestingly, because there are so many different academic communities to pick from in terms of audience, and so few constraints coming from my department, the real problem (in terms of pulling a dissertation together) isn’t getting work done but integrating the disparate components. That, and the problem that since I don’t know what direction I’m going in, I don’t have any guarantee that where I end up will be anything anyone will want to hire into an academic setting. Fingers crossed, industry will continue to be an option. I guess it’s good I’m comfortable with bewilderment.

Digifesto