Planning the Dissertron

by Sebastian Benthall

In my PhD program, I’ve recently finished my coursework and am meant to start focusing on research for my dissertation. Maybe because of the hubbub around open access research, maybe because I still see myself as a ‘hacker’, maybe because it’s somehow recursively tied into my research agenda, or because I’m an open source dogmatic, I’ve been fantasizing about the tools and technology of publication that I want to work on my dissertation with.

For this project, which I call the Dissertron, I’ve got a loose bundle of requirements feature creeping its way into outer space:

  1. Incremental publishing of research and scholarship results openly to the web.
  2. Version control.
  3. Mathematical rendering a la LaTeX.
  4. Code highlighting a la the hacker blogs.
  5. In browser rendering of data visualizations with d3, where appropriate.
  6. Site is statically generated from elements on the file system, wherever possible.
  7. Machine readable metadata on the logical structure of the dissertation argument, which gets translated into static site navigation elements.
  8. Easily generated glossary with links for looking up difficult terms in-line (or maybe in-margin)
  9. A citation system that takes advantage of hyperlinking between resources wherever possible.
  10. Somehow, enable commenting. But more along the lines of marginalia comments (comments on particular lines or fragments of text) rather than blog comments. “Blog” style comments should be facilitated as notes on separately hosted dissertrons, or maybe a dissertron hub that aggregates and coordinates pollination of content between dissertrons.

This is a lot, and arguably just a huge distraction from working on my dissertation. However, it seems like this or something like it is a necessary next step in the advance of science and I don’t see how I really have much choice in the matter.

Unfortunately, I’m traveling, so I’m going to miss the PLOS workshop on Markdown for Science tomorrow. That’s really too bad, because Scholarly Markdown would get me maybe 50% of the way to what I want.

Right now the best tool chain I can imagine for this involves Scholarly Markdown, run using Pandoc, which I just now figured out is developed by a philosophy professor at Berkeley. Backing it by a Git repository would allow for incremental changes and version control.

Static site generation and hosting is a bit trickier. I feel like GitHub’s support of Jekyll make it a compelling choice, but hacking it to make it fit into the academic frame I’m thinking in might be more trouble than its worth. While it’s a bit of an oversimplification to say this, my impression is that at my university at least there is a growing movement to adopt Python as the programming language of choice for scientific computing. The exceptions seem to be people in the Computer Science department that are backing Scala.

(I like both languages and so can’t complain, except that it makes it harder to do interdisciplinary research if there is a technical barrier in their toolsets. As more of scientific research becomes automated, it is bound to get more crucial that scientific processes (broadly speaking) inter-operate. I’m incidentally excited to be working on these problems this summer for Berkeley’s new Social Science Data Lab. A lot of interesting architectural design is being masterminded by Aaron Culich, who manages the EECS department’s computing infrastructure. I’ve been meaning to blog about our last meeting for a while…but I digress)

Problem is, neither Python or Scala is Ruby, and Ruby is currently leading the game (in my estimate, somebody tell me if I’m wrong) in flexible and sexy smooth usable web design. And then there’s JavaScript, improbably leaking into the back end of the software stack after overflowing the client side.

So for the aspiring open access indie web hipster hacker science self-publisher, it’s hard to navigate the technical terrain. I’m tempted to string together my own rig depending mostly on Pandoc, but even that’s written in Haskell.

These implementation-level problems suggest that the problem needs to be pushed up a level of abstraction to the question of API and syntax standards around scientific web publishing. Scholarly Markdown can be a standard, hopefully with multiple implementations. Maybe there needs to be a standard around web citations as well (since in an open access world, we don’t need the same level of indirection between a document and the works it cites. Like blog posts, web publications can link to the content it derives from directly.)