Digifesto

Tag: dissertron

Reinventing wheels with Dissertron

I’ve found a vehicle for working on the Dissertron through the website of the course I’ll be co-teaching this Fall on Open Collaboration and Peer Production.

In the end, I went with Pelican, not Hyde. It was a difficult decision, because I did like the idea of supporting an open source project coming out of Chennai, especially after reading Coding Places. But I had it on good authority that Pelican was more featureful with cleaner code. So here I am.

The features I have in mind are crystallizing as I explore the landscape of existing tools more. This is my new list:

  • Automatically include academic metadata on each Dissertron page so it’s easy to slurp it into Zotero.
  • Include the Hypothes.is widget for annotations. I think Hypothes.is will be better for commenting than Disqus because it does annotations in-line, as opposed to comments in the footer. It also uses the emerging W3C Open Annotation standard. I’d like this to be as standards-based as possible.
  • Use citeproc-js to render citations in the browser cleanly. I think this handles the issue of in-line linked academic citations without requiring a lot of manual work. The citeproc-js looks like it’s come out of Zotero as well. Since Elsevier bought Mendeley, Zotero seems like the more reliable ally to pick for independent scholarship.
  • Trickiest is going to be porting a lot of features from jekyll-scholar into a Pelican plug-in. I really want jekyll-scholar’s bibliographic management. But I’m a little worried that Pelican isn’t well-designed for that sort of flexibility in theming. More soon.
  • I’m interested in trying to get the HTML output of Dissertron as close as possible to emerging de facto standards on what on-line scholarship is like. I’ve asked about what PLOS ONE does about this. The answer sounds way complicated: a tool chain that goes from LaTeX to Word Docs to NLM 3.0 XML (which I didn’t even know was a thing), and at last into HTML. I’m trying to start from Markdown because I think it’s a simple markup language for the future, but I’m not deep enough in that tool chain to understand how to replicate its idiosyncrasies.
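As a rough illustration of the first item on the list, here is a minimal Python sketch that emits per-page metadata as HTML meta tags. It assumes the Highwire Press style tag names (citation_title, citation_author, citation_publication_date, citation_pdf_url), which Google Scholar documents and which Zotero can, as far as I know, pick up from a page. The function name and the sample values are hypothetical, and how this would hook into a Pelican theme or plug-in is left open.

```python
from html import escape


def citation_meta_tags(title, authors, date, pdf_url=None):
    """Return Highwire-style <meta> tags for one page, one tag per line.

    Hypothetical helper: the tag names follow the Highwire Press convention;
    everything else here is an illustrative assumption.
    """
    pairs = [("citation_title", title)]
    # One citation_author tag per author, in order.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", date))
    if pdf_url:
        pairs.append(("citation_pdf_url", pdf_url))
    return "\n".join(
        '<meta name="{}" content="{}">'.format(name, escape(value, quote=True))
        for name, value in pairs
    )


# Placeholder values, not real bibliographic data.
print(citation_meta_tags("An Example Chapter", ["A. Scholar"], "2013/06/01"))
```

The output would be pasted (or templated) into the head of each generated page.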

If I could have all these nice things, and maybe a pony, then I would be happy and have no more excuses for not actually doing research, as opposed to obsessing about the tooling around independent publishing.

Dissertron build notes

I’m going to start building the Dissertron now. These are my notes.

  • I’m going with Hyde as a static site generator on Nick Doty’s recommendation. It appears to be tracking Jekyll in terms of features, but squares better with my Python/Django background (it uses Jinja2 templates in its current, possibly-1.0-but-under-development version). Meanwhile, at Berkeley we seem to be investing a lot in Python as the language of scientific computing. If scientists’ skills should be transferable to their publication tool, this seems like the way to go.
  • Documentation for Hyde is a bit scattered. This first steps guide is sort of helpful, and then there are these docs hosted on GitHub. As mentioned, they’ve moved away from Django templates to Jinja2, which is similar but less idiosyncratic. They refer you to the Jinja2 docs here for templating.
  • Just trying to make a Hello World type site, I ran into an issue with Markdown rendering. I’ve filed an issue with the project, and will use it as a test of the community’s responsiveness. Since Hyde is competing with a lot of other Python static site generators, it’s kind of nice to bump into this kind of thing early.
  • Got this response from the creator of Hyde in less than 3 hours. Problem was with my Jinja2 fu (which is weak at the moment)–turns out I have a lot to learn about Whitespace Control. Super positive community experience. I’ll stick with Hyde.
  • “Hello World” intact and framework chosen, my next step is to convert part 2 of my Weird Twitter work to Markdown and use Hyde’s tools to give it some decent layout. If I can make some headway on the citation formatting and management in the process, so much the better.

Complications in Scholarly Hypertext

I’ve got a lot of questions about on-line academic publishing. A lot of this comes from career anxiety: I am not a very good academic because I don’t know how to write for academic conferences and journals. But I’m also coming from an industry that is totally eating the academy’s lunch when it comes to innovating and disseminating information. People within academia are increasingly feeling the disruptive pressure of alternative publication venues and formats, and moreover seeing the need for alternatives for the sake of the intellectual integrity of the whole enterprise. Open science, open data, reproducible research–these are keywords for new practices that are meant to restore confidence in science itself, in part by making it more accessible.

One manifestation of this trend is the transition of academic group blogs into academic quasi-journals or on-line magazines. I don’t know how common this is, but I recently had a fantastic experience of this writing for Ethnography Matters. Instead of going through an opaque and problematic academic review process, I worked with editor Rachelle Annechino to craft a piece about Weird Twitter that was appropriate for the edition and audience.

During the editing process, I tried to unload everything I had to say about Weird Twitter so that I could at last get past it. I don’t consider myself an ethnographer and I don’t want to write my dissertation on Weird Twitter. But Rachelle encouraged me to split off the pseudo-ethnographic section into a separate post, since the first half was more consistent with the Virtual Identity edition. (Interesting how the word “edition”, which has come to mean “all the copies of a specific issue of a newspaper”, in the digital context returns to its etymological roots as simply something published or produced (past participle).)

Which means I’m still left with the (impossible) task of doing an ethnography (something I’m not very well trained for) about Weird Twitter (which might not exist). Since I don’t want to violate the contextual integrity of Weird Twitter more than I already have, I’m reluctant to write about it in a non-Web-based medium.

This carries with it a number of challenges, not least of which is the reception on Twitter itself.

What my thesaurus and I do in the privacy of our home is our business and anyway entirely legal in the state of California. But I’ve come to realize that forced disclosure is an occupational hazard I need to learn to accept. What these remarks point to, though, is the tension between access to documents as data and access to documents as sources of information. The latter, as we know from Claude Shannon, requires an interpreter who can decode the language in which the information is written.

Expert language is a prison for knowledge and understanding. A prison for intellectually significant relationships. It is time to move beyond the institutional practices of triviledge

– Taylor and Saarinen, 1994, quoted in Kolb, 1997

Is it possible to get away from expert language in scholarly writing? Naively, one could ask experts to write everything “in plain English.” But that doesn’t do language justice: often (though certainly not always) new words express new concepts. Using a technical vocabulary fluently requires not just a thesaurus, but an actual understanding of the technical domain. I’ve been through the phase myself in which I thought I knew everything and so blamed anything that read as opaque to me on obscurantism. Now I’m humbler and harder to understand.

What is so promising about hypertext as a scholarly medium is that it offers a solution to this problem. Wikipedia is successful because it directly links jargon to further content that explains it. Those with the necessary expertise to read something can get the intended meaning out of an article, and those that are confused by terminology can romp around learning things. Maybe they will come back to the original article later with an expanded understanding.

xkcd: The Problem with Wikipedia

Hypertext and hypertext-based reading practices are valuable for making one’s work open and accessible. But it’s not clear how to combine these with scholarly conventions on referencing and citations. Just to take Ethnography Matters as an example, for my article I used in-line linking and, where I got around to it, parenthetical bibliographic information. Contrast with Heather Ford’s article in the same edition, which has no links and a section at the end for academic references. The APA has rules for citing web resources within an academic paper. What’s not clear is how directly linking citations within an academic hypertext document should work.

One reason for lack of consensus around this issue is that citation formatting is a pain in the butt. For off-line documents, word processing software has provided myriad tools for streamlining bibliographic work. But for publishing academic work on the web, we write in markup languages or WYSIWYG editors.

Since standards on the web tend to evolve through “rough consensus and running code”, I expect we’ll see a standard for this sort of thing emerge when somebody builds a tool that makes it easy to follow. This leads me back to fantasizing about the Dissertron. This is a bit disturbing. As much as I’d like to get away from studying Weird Twitter, I see now that a Weird Twitter ethnography is the perfect test-bed for such a tool precisely because of the hostile scrutiny it would attract.

Planning the Dissertron

In my PhD program, I’ve recently finished my coursework and am meant to start focusing on research for my dissertation. Maybe because of the hubbub around open access research, maybe because I still see myself as a ‘hacker’, maybe because it’s somehow recursively tied into my research agenda, or because I’m an open source dogmatist, I’ve been fantasizing about the tools and technology of publication that I want to work on my dissertation with.

For this project, which I call the Dissertron, I’ve got a loose bundle of requirements feature creeping its way into outer space:

  1. Incremental publishing of research and scholarship results openly to the web.
  2. Version control.
  3. Mathematical rendering a la LaTeX.
  4. Code highlighting a la the hacker blogs.
  5. In-browser rendering of data visualizations with d3, where appropriate.
  6. Site is statically generated from elements on the file system, wherever possible.
  7. Machine readable metadata on the logical structure of the dissertation argument, which gets translated into static site navigation elements.
  8. Easily generated glossary with links for looking up difficult terms in-line (or maybe in-margin)
  9. A citation system that takes advantage of hyperlinking between resources wherever possible.
  10. Somehow, enable commenting. But more along the lines of marginalia comments (comments on particular lines or fragments of text) rather than blog comments. “Blog” style comments should be facilitated as notes on separately hosted dissertrons, or maybe a dissertron hub that aggregates and coordinates pollination of content between dissertrons.
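To make requirement 8 a little more concrete, here is a minimal Python sketch of in-line glossary linking. It assumes the input is plain text rather than already-rendered HTML, links only the first occurrence of each term, and points at a hypothetical /glossary/ page with one anchor per term. The function name, CSS class, and URL layout are all illustrative assumptions, not part of any existing tool.

```python
import re


def link_glossary_terms(text, glossary, base_url="/glossary/"):
    """Wrap the first occurrence of each glossary term in a link.

    `glossary` maps terms to definitions; only the terms are used here.
    Assumes `text` is plain text, not HTML (hypothetical helper).
    """
    if not glossary:
        return text
    # Longest terms first so multi-word terms win over their sub-terms.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    seen = set()

    def repl(match):
        key = match.group(1).lower()
        if key in seen:
            return match.group(0)  # later occurrences stay unlinked
        seen.add(key)
        slug = key.replace(" ", "-")
        return '<a class="glossary" href="{}#{}">{}</a>'.format(
            base_url, slug, match.group(0)
        )

    # A single pass over the text, so inserted markup is never re-scanned.
    return pattern.sub(repl, text)


sample = "Contextual integrity matters; contextual integrity is subtle."
print(link_glossary_terms(sample, {"contextual integrity": "(definition here)"}))
```

An in-margin variant would presumably emit a different wrapper element, but the matching step would stay the same.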

This is a lot, and arguably just a huge distraction from working on my dissertation. However, it seems like this or something like it is a necessary next step in the advance of science and I don’t see how I really have much choice in the matter.

Unfortunately, I’m traveling, so I’m going to miss the PLOS workshop on Markdown for Science tomorrow. That’s really too bad, because Scholarly Markdown would get me maybe 50% of the way to what I want.

Right now the best tool chain I can imagine for this involves Scholarly Markdown, run using Pandoc, which I just now figured out is developed by a philosophy professor at Berkeley. Backing it by a Git repository would allow for incremental changes and version control.
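A minimal sketch of the Pandoc step, driven from Python so it can sit next to other build scripts. It uses ordinary Pandoc Markdown as a stand-in, since I am not assuming anything about Scholarly Markdown's own syntax. The flags shown (--standalone, --mathjax, --bibliography, --citeproc) exist in recent Pandoc releases; older releases handled citations with a separate pandoc-citeproc filter instead of --citeproc. The function names and file names are hypothetical.

```python
import subprocess


def pandoc_command(source, target, bibliography=None):
    """Build the argument list for converting one Markdown file to HTML."""
    cmd = ["pandoc", source, "--standalone", "--mathjax", "-o", target]
    if bibliography:
        # Assumes a Pandoc release that ships the built-in --citeproc option.
        cmd += ["--citeproc", "--bibliography", bibliography]
    return cmd


def build(source, target, bibliography=None):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target, bibliography), check=True)


# Show the command without running it (file names are placeholders).
print(" ".join(pandoc_command("chapter1.md", "chapter1.html", "refs.bib")))
```

Committing both the Markdown sources and a script like this to the Git repository would keep each published increment reproducible from a given revision.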

Static site generation and hosting is a bit trickier. I feel like GitHub’s support of Jekyll makes it a compelling choice, but hacking it to make it fit into the academic frame I’m thinking in might be more trouble than it’s worth. While it’s a bit of an oversimplification to say this, my impression is that at my university at least there is a growing movement to adopt Python as the programming language of choice for scientific computing. The exceptions seem to be people in the Computer Science department that are backing Scala.

(I like both languages and so can’t complain, except that it makes it harder to do interdisciplinary research if there is a technical barrier in their toolsets. As more of scientific research becomes automated, it is bound to get more crucial that scientific processes (broadly speaking) inter-operate. I’m incidentally excited to be working on these problems this summer for Berkeley’s new Social Science Data Lab. A lot of interesting architectural design is being masterminded by Aaron Culich, who manages the EECS department’s computing infrastructure. I’ve been meaning to blog about our last meeting for a while…but I digress)

Problem is, neither Python nor Scala is Ruby, and Ruby is currently leading the game (in my estimate, somebody tell me if I’m wrong) in flexible and sexy smooth usable web design. And then there’s JavaScript, improbably leaking into the back end of the software stack after overflowing the client side.

So for the aspiring open access indie web hipster hacker science self-publisher, it’s hard to navigate the technical terrain. I’m tempted to string together my own rig depending mostly on Pandoc, but even that’s written in Haskell.

These implementation-level problems suggest that the problem needs to be pushed up a level of abstraction to the question of API and syntax standards around scientific web publishing. Scholarly Markdown can be a standard, hopefully with multiple implementations. Maybe there needs to be a standard around web citations as well (since in an open access world, we don’t need the same level of indirection between a document and the works it cites. Like blog posts, web publications can link directly to the content they derive from.)
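One way to picture direct citation linking is as a small text transformation: citation markers in the source become in-line links to the cited work. The Python sketch below assumes Pandoc-style markers of the form [@key] and a hypothetical reference table that maps each key to a display label and a URL. The names, the sample entry, and the output format are illustrative assumptions, not a proposed standard.

```python
import re
from html import escape

# Matches simple single-key markers such as [@doe2013].
CITE = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def link_citations(text, references):
    """Replace [@key] markers with parenthetical links to the cited work.

    `references` maps keys to {"label": ..., "url": ...} (hypothetical shape).
    """
    def repl(match):
        ref = references.get(match.group(1))
        if ref is None:
            return match.group(0)  # leave unknown keys visible for review
        return '(<a href="{}">{}</a>)'.format(
            escape(ref["url"], quote=True), escape(ref["label"])
        )

    return CITE.sub(repl, text)


# Placeholder entry, not a real reference.
refs = {"doe2013": {"label": "Doe 2013", "url": "https://example.org/doe"}}
print(link_citations("As argued elsewhere [@doe2013].", refs))
```

A fuller version would also need to emit a conventional reference list for readers who want one, which is where something like citeproc-js would come in.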