Digifesto

Dissertron build notes

I’m going to start building the Dissertron now. These are my notes.

  • I’m going with Hyde as a static site generator on Nick Doty’s recommendation. It appears to be tracking Jekyll in terms of features, but squares better with my Python/Django background (it uses Jinja2 templates in its current, possibly-1.0-but-under-development version). Meanwhile, at Berkeley we seem to be investing a lot in Python as the language of scientific computing. If scientists’ skills should be transferable to their publication tools, this seems like the way to go.
  • Documentation for Hyde is a bit scattered. This first steps guide is sort of helpful, and then there are these docs hosted on Github. As mentioned, they’ve moved away from Django templates to Jinja2, which is similar but less idiosyncratic. They refer you to the Jinja2 docs here for templating.
  • Just trying to make a Hello World type site, I ran into an issue with Markdown rendering. I’ve filed an issue with the project, and will use it as a test of the community’s responsiveness. Since Hyde is competing with a lot of other Python static site generators, it’s kind of nice to bump into this kind of thing early.
  • Got this response from the creator of Hyde in less than 3 hours. The problem was with my Jinja2 fu (which is weak at the moment); turns out I have a lot to learn about Whitespace Control. Super positive community experience. I’ll stick with Hyde.
  • “Hello World” intact and framework chosen, my next step is to convert part 2 of my Weird Twitter work to Markdown and use Hyde’s tools to give it some decent layout. If I can make some headway on citation formatting and management in the process, so much the better.
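
The whitespace issue from the notes above is worth recording concretely. This is a hedged reconstruction rather than my actual template (the `resource.meta.title` variable is just an assumed example): by default Jinja2 keeps the newlines and indentation around block tags, and stray indentation in front of Markdown output can get rendered as a code block. The `-` modifiers trim that whitespace.

```jinja
{# Hypothetical fragment. Without the "-" modifiers, the newline and
   indentation around these block tags leak into the rendered output,
   which can confuse a downstream Markdown pass. #}
{% block content -%}
    {{ resource.meta.title }}
{%- endblock %}
```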

Complications in Scholarly Hypertext

I’ve got a lot of questions about on-line academic publishing. A lot of this comes from career anxiety: I am not a very good academic because I don’t know how to write for academic conferences and journals. But I’m also coming from an industry that is totally eating the academy’s lunch when it comes to innovating and disseminating information. People within academia are increasingly feeling the disruptive pressure of alternative publication venues and formats, and moreover seeing the need for alternatives for the sake of the intellectual integrity of the whole enterprise. Open science, open data, reproducible research–these are keywords for new practices that are meant to restore confidence in science itself, in part by making it more accessible.

One manifestation of this trend is the transition of academic group blogs into academic quasi-journals or on-line magazines. I don’t know how common this is, but I recently had a fantastic experience of this writing for Ethnography Matters. Instead of going through an opaque and problematic academic review process, I worked with editor Rachelle Annechino to craft a piece about Weird Twitter that was appropriate for the edition and audience.

During the editing process, I tried to unload everything I had to say about Weird Twitter so that I could at last get past it. I don’t consider myself an ethnographer and I don’t want to write my dissertation on Weird Twitter. But Rachelle encouraged me to split off the pseudo-ethnographic section into a separate post, since the first half was more consistent with the Virtual Identity edition. (Interesting how the word “edition”, which has come to mean “all the copies of a specific issue of a newspaper”, in the digital context returns to its etymological roots as simply something published or produced (past participle).)

Which means I’m still left with the (impossible) task of doing an ethnography (something I’m not very well trained for) about Weird Twitter (which might not exist). Since I don’t want to violate the contextual integrity of Weird Twitter more than I already have, I’m reluctant to write about it in a non-Web-based medium.

This carries with it a number of challenges, not least of which is the reception on Twitter itself.

What my thesaurus and I do in the privacy of our home is our business and anyway entirely legal in the state of California. But I’ve come to realize that forced disclosure is an occupational hazard I need to learn to accept. What these remarks point to, though, is the tension between access to documents as data and access to documents as sources of information. The latter, as we know from Claude Shannon, requires an interpreter who can decode the language in which the information is written.

Expert language is a prison for knowledge and understanding. A prison for intellectually significant relationships. It is time to move beyond the institutional practices of triviledge

– Taylor and Saarinen, 1994, quoted in Kolb, 1997

Is it possible to get away from expert language in scholarly writing? Naively, one could ask experts to write everything “in plain English.” But that doesn’t do language justice: often (though certainly not always) new words express new concepts. Using a technical vocabulary fluently requires not just a thesaurus, but an actual understanding of the technical domain. I’ve been through the phase myself in which I thought I knew everything and so blamed anything written opaquely to me on obscurantism. Now I’m humbler and harder to understand.

What is so promising about hypertext as a scholarly medium is that it offers a solution to this problem. Wikipedia is successful because it directly links jargon to further content that explains it. Those with the necessary expertise to read something can get the intended meaning out of an article, and those that are confused by terminology can romp around learning things. Maybe they will come back to the original article later with an expanded understanding.

xkcd: The Problem with Wikipedia

Hypertext and hypertext-based reading practices are valuable for making one’s work open and accessible. But it’s not clear how to combine these with scholarly conventions on referencing and citations. Just to take Ethnography Matters as an example, for my article I used in-line linking and, where I got around to it, parenthetical bibliographic information. Contrast with Heather Ford’s article in the same edition, which has no links and a section at the end for academic references. The APA has rules for citing web resources within an academic paper. What’s not clear is how directly linking citations within an academic hypertext document should work.

One reason for the lack of consensus around this issue is that citation formatting is a pain in the butt. For off-line documents, word processing software has provided myriad tools for streamlining bibliographic work. But for publishing academic work on the web, we write in markup languages or WYSIWYG editors.
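
To make the pain concrete, here is a minimal sketch of what a web-native citation helper might do; everything here (the function name, the dict fields, the URL) is a hypothetical illustration, not any existing standard: render a parenthetical citation, hyperlinked directly when the cited work lives on the web.

```python
# Hypothetical sketch: render a citation as APA-ish parenthetical text,
# linking straight to the work when a URL is available.

def cite(entry):
    """Return an inline citation, hyperlinked if the work is on the web."""
    text = "(%s, %s)" % (entry["author"], entry["year"])
    if entry.get("url"):
        return '<a href="%s">%s</a>' % (entry["url"], text)
    return text

ford = {"author": "Ford", "year": 2013, "url": "http://example.com/ford-2013"}
kolb = {"author": "Kolb", "year": 1997}  # no URL: plain parenthetical

print(cite(ford))
print(cite(kolb))
```

The hard part a real tool would face is everything this sketch ignores: multiple authors, page numbers, style variants, and stable links.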

Since standards on the web tend to evolve through “rough consensus and running code”, I expect we’ll see a standard for this sort of thing emerge when somebody builds a tool that makes it easy for them to follow. This leads me back to fantasizing about the Dissertron. This is a bit disturbing. As much as I’d like to get away from studying Weird Twitter, I see now that a Weird Twitter ethnography is the perfect test-bed for such a tool precisely because of the hostile scrutiny it would attract.

Aristotelian legislation and the virtual community

I dipped into Aristotle’s Politics today and was intrigued by William Ellis’ introduction.

Ellis claims that in Aristotle’s day, you would call on a legislator as an external consultant when you set about founding a new city or colony. There were a great variety of constitutions available to be studied. You would study them to become an expert in how to design a community’s laws. Classical political philosophy was part of the very real project of starting new communities supporting human flourishing.

We see a similar situation with on-line communities today. If cyberspace was an electronic frontier, it’s been bulldozed and is now a metropolis with suburbs and strip malls. But there is still innovation in on-line social life and social media infrastructure as users migrate between social networking services.

If Lessig is right and “code is law”, then the variety of virtual communities and the opportunity to found new ones renews the role of the Aristotelian legislator. We can ask questions like: should an on-line community be self-governing or run by an aristocracy? How can it sustain itself economically, or defend itself in (cyber-)wars? How can it best promote human flourishing? The arts? Justice?

It would be easy to trivialize these possibilities by noting that virtual life is not real life. But that would underestimate the shift that is occurring as economic and political engagement moves on-line. In recognition and anticipation of these changes, philosophy has a practical significance in comprehensive design.

Planning the Dissertron

In my PhD program, I’ve recently finished my coursework and am meant to start focusing on research for my dissertation. Maybe because of the hubbub around open access research, maybe because I still see myself as a ‘hacker’, maybe because it’s somehow recursively tied into my research agenda, or because I’m an open source dogmatic, I’ve been fantasizing about the tools and technology of publication that I want to work on my dissertation with.

For this project, which I call the Dissertron, I’ve got a loose bundle of requirements feature creeping its way into outer space:

  1. Incremental publishing of research and scholarship results openly to the web.
  2. Version control.
  3. Mathematical rendering a la LaTeX.
  4. Code highlighting a la the hacker blogs.
  5. In browser rendering of data visualizations with d3, where appropriate.
  6. Site is statically generated from elements on the file system, wherever possible.
  7. Machine readable metadata on the logical structure of the dissertation argument, which gets translated into static site navigation elements.
  8. Easily generated glossary with links for looking up difficult terms in-line (or maybe in-margin).
  9. A citation system that takes advantage of hyperlinking between resources wherever possible.
  10. Somehow, enable commenting. But more along the lines of marginalia comments (comments on particular lines or fragments of text) rather than blog comments. “Blog” style comments should be facilitated as notes on separately hosted dissertrons, or maybe a dissertron hub that aggregates and coordinates pollination of content between dissertrons.
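
Requirement 7 is the one I can sketch most concretely right now. Everything below is hypothetical, since the Dissertron doesn’t exist: the argument’s logical structure is machine-readable nested data, and a recursive pass turns it into static navigation markup.

```python
# Hypothetical sketch: a dissertation argument as nested data, rendered
# into a nested HTML list by a recursive walk over "supports" links.

argument = {
    "title": "Introduction", "slug": "intro",
    "supports": [
        {"title": "Background", "slug": "background", "supports": []},
        {"title": "Claim 1", "slug": "claim-1", "supports": [
            {"title": "Evidence A", "slug": "evidence-a", "supports": []},
        ]},
    ],
}

def render_nav(node):
    """Render a section link, then its supporting sections as a sub-list."""
    item = '<a href="/%s/">%s</a>' % (node["slug"], node["title"])
    if node["supports"]:
        item += "<ul>%s</ul>" % "".join(
            "<li>%s</li>" % render_nav(child) for child in node["supports"])
    return item

html = "<ul><li>%s</li></ul>" % render_nav(argument)
print(html)
```

The same data could just as well drive the glossary and citation requirements; the point is that the argument structure lives on the file system, not in the prose.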

This is a lot, and arguably just a huge distraction from working on my dissertation. However, it seems like this or something like it is a necessary next step in the advance of science and I don’t see how I really have much choice in the matter.

Unfortunately, I’m traveling, so I’m going to miss the PLOS workshop on Markdown for Science tomorrow. That’s really too bad, because Scholarly Markdown would get me maybe 50% of the way to what I want.

Right now the best tool chain I can imagine for this involves Scholarly Markdown, run using Pandoc, which I just now figured out is developed by a philosophy professor at Berkeley. Backing it by a Git repository would allow for incremental changes and version control.

Static site generation and hosting is a bit trickier. I feel like GitHub’s support of Jekyll makes it a compelling choice, but hacking it to fit the academic frame I’m thinking in might be more trouble than it’s worth. While it’s a bit of an oversimplification to say this, my impression is that at my university, at least, there is a growing movement to adopt Python as the programming language of choice for scientific computing. The exceptions seem to be people in the Computer Science department who are backing Scala.

(I like both languages and so can’t complain, except that it makes it harder to do interdisciplinary research if there is a technical barrier in their toolsets. As more of scientific research becomes automated, it is bound to get more crucial that scientific processes (broadly speaking) inter-operate. I’m incidentally excited to be working on these problems this summer for Berkeley’s new Social Science Data Lab. A lot of interesting architectural design is being masterminded by Aaron Culich, who manages the EECS department’s computing infrastructure. I’ve been meaning to blog about our last meeting for a while…but I digress)

Problem is, neither Python nor Scala is Ruby, and Ruby is currently leading the game (in my estimate; somebody tell me if I’m wrong) in flexible and sexy smooth usable web design. And then there’s JavaScript, improbably leaking into the back end of the software stack after overflowing the client side.

So for the aspiring open access indie web hipster hacker science self-publisher, it’s hard to navigate the technical terrain. I’m tempted to string together my own rig depending mostly on Pandoc, but even that’s written in Haskell.

These implementation-level problems suggest that the problem needs to be pushed up a level of abstraction, to the question of API and syntax standards around scientific web publishing. Scholarly Markdown can be a standard, hopefully with multiple implementations. Maybe there needs to be a standard around web citations as well, since in an open access world we don’t need the same level of indirection between a document and the works it cites. Like blog posts, web publications can link directly to the content they derive from.

POSSE homework: how to contribute to FOSS without coding

One of the assignments for the POSSE workshop is the question of how to contribute to FOSS when you aren’t a coder.

I find this an especially interesting topic because I think there’s a broader political significance to FOSS, but those that see FOSS as merely the domain of esoteric engineers can sometimes be a little freaked out by this idea. It also involves broader theoretical questions about whether or how open source jibes with participatory design.

In fact, the POSSE organizers have compiled a list of lists of ways to contribute to FOSS without coding: this, this, this, and this are provided in the POSSE syllabus.

Turning our attention from the question in the abstract, we’re meant to think about it in the context of our particular practices.

For our humanitarian FOSS project of choice, how are we interested in contributing? I’m fairly focused in my interests on open source participation these days: I’m very interested in the problem of community metrics and especially how innovation happens and diffuses within these communities. I would like to be able to build a system for evaluating that kind of thing that can be applied broadly to many projects. Ideally, it could do things like identify talented participants across multiple projects, or suggest interventions for making projects work better.

It’s an ambitious research project, but one for which there is plenty of data to investigate from the open source communities themselves.

What about teaching a course on such a thing? I anticipate that my students are likely to be interested in design as well as positioning their own projects within the larger open source ecosystem. Some of the people who I hope will take the class have been working on FuturePress, an open source e-book reading platform. As they grow the project and build the organization around it, they will want to be working with constituent technologies and devising a business model around their work. How can a course on Open Collaboration and Peer Production support that?

These concerns touch on so many issues outside of the consideration of software engineering narrowly (including industrial organization, communication, social network theory…) that it’s daunting to try to fit it all into one syllabus. But we’ve been working on one that has a significant hands-on component as well. Really I think the most valuable skill in the FOSS world is having the chutzpah to approach a digital community, propose what you are thinking, and take the criticism or responsibility that comes with that.

What concrete contribution a student uses to channel that energy should…well, I feel like it should be up to them. But is that enough direction? Maybe I’m not thinking concretely enough for this assignment myself.

POSSE and the FOSS field trip

I’m excited to be participating in POSSE — Professor’s Open Source Software Experience — this coming weekend in Philadelphia. It’s designed to train computer science professors how to use participation in open source communities to teach computer science. Somehow, they let me in as a grad student.

My goals are somewhat unusual for the program, I imagine. I’m not even in a computer science department. But I do have background in open source development, and this summer I’ll be co-teaching a course on Open Collaboration and Peer Production at Berkeley’s I School. We aren’t expecting all the students to be coders, so though there is a hands-on component, it may be with other open collaborative projects like Wikipedia or OpenStreetMap. Though we aren’t fulfilling a technical requirement (the class is aimed at Masters students), it will fulfill a management requirement. So you might say it’s a class on the theory and practice of open collaborative community management.

My other education interest besides my own teaching is in the role of open source and computer science education in the larger economy. Lately there have been a lot of startups around programmer education. That makes sense, because demand for programming talent exceeds supply now, and so there’s an opportunity to train new developers. I’m curious whether it would be possible to build and market an on-line, MOOC-style programming course based on apprenticeship within open source communities.

One of our assignments to complete before the workshop is an open source community field trip. We’re supposed to check out a number of projects and get a sense of how to distinguish between strong and weak projects at a glance.

The first thing I noticed was that SourceForge is not keeping up with web user experience standards. That’s not so surprising, since as a FOSS practitioner I’m more used to projects hosted on GitHub and Google Code. Obviously, that hasn’t always been the case. But I’m beginning to think SourceForge’s role may now be mainly historic. Either that, or I have a heavy bias in my experience because I’ve been working with a lot of newer, “webbier” code. Maybe desktop projects still have a strong SourceForge presence.

I was looking up mailing server software, because I’m curious about mining and modding mailing lists as a research area. Fundamentally, they seem like one of the lightest weight and most robust forms of on-line community out there, and the data is super rich.

Mailing lists on SourceForge appear to be written mainly in Java, PHP, or some blend of Python and C. There are a couple of Enterprise solutions. Several of the projects have moved their source code, hosted version control, and issue tracking off of SourceForge. Though the projects were ranked from “Inactive” through “Beta” to “Mature” and “Production/Stable”, usage of the tags was a little inconsistent across projects. Projects with a lot of weekly downloads tended to be Mature, Production/Stable, or both.

I investigated Mailman in particular. It’s an impressive piece of software; who knows how many people use Mailman mailing lists? I’m probably on at least ten myself. But it’s a fairly humble project in terms of its self-presentation and what people have done with it.

Turns out it has a lead developer, Barry Warsaw, who works at Canonical, and a couple other core committers, in addition to other contributors. There appears to be a v3.0 in production, which suddenly I’m pretty excited about.

POSSE has a focus on humanitarian FOSS projects. I’m not sure exactly how they define humanitarian beyond their gloss: “that is, projects for which the primary purpose is to provide some social benefit such as economic development, disaster relief, health care, ecology. Examples include Mifos, Sahana and OpenMRS.”

For the purpose of this workshop I plan to look into Ushahidi. I’ve heard so many good things about it, but frustratingly, even after working four years on open source geospatial software, including a couple of crowdsourced mapping apps, I never took a solid look at it. Maybe because it was in PHP. I’m proud to say the project I put the most effort into, GeoNode, also has a humanitarian purpose. (GeoNode also now has a very pretty new website, and a totally revamped user interface for its 2.0 release, now in alpha.) And though not precisely a software project, I’ve spent a lot of time admiring the intrepid Humanitarian OpenStreetMap Team for their use of open data as a humanitarian means and end.

But Ushahidi–there’s something you don’t see everyday.

We’re asked, on the POSSE field trip prompt, how we would decide whether our selected project was worth contributing to as an IT professional. The answer is: it depends on whether I could do it for my job. But I’ve asked around about the project and community, and it seems like great people and usable software. I’d be proud to contribute to it, though at this point I expect my comparative advantage would be on the data analysis end (both of the community that builds it and the data created by it) rather than in the core.

We were also asked to check out projects on Ohloh, which has also had a user interface revamp since I last looked carefully at it. Maybe significantly, we were asked to compare a number of different projects (two of them web browsers), but there was no feature on the website that provided a side-by-side comparison of the projects.

Also, one thing Ohloh doesn’t do yet is analytics on mailing lists. Which is odd, since that’s often where developers within a community get the most visceral sense of how large their community is (in my experience). Mailing lists wind up being the place where users, as participants, can affect software development, and where a lot of conflict resolution occurs. (Though there can be a lot of this on issue tracker discussions as well.)

This summer I hope to begin some more rigorous research into mailing list discussions and open source analytics. Seeing how Ohloh has moved forward reminds me I should be sharing my research with them. The focus of POSSE on project evaluation is encouraging–I’m curious to see where it goes next.

The recursive public as practice and imaginary

Chris Kelty’s Two Bits: The Cultural Significance of Free Software is one of the best synthetic histories of the Internet and Free Culture that I’ve encountered so far. Most exciting about it is his concept of the recursive public, the main insight of his extensive ethnographic work:

A recursive public is a public that is vitally concerned with the material and practical maintenance and modification of the technical, legal, practical, and conceptual means of its own existence as a public; it is a collective independent of other forms of constituted power and is capable of speaking to existing forms of power through the production of actually existing alternatives.

Speaking today about the book with Nick Doty and Ashwin Mathew, we found it somewhat difficult to tease out the boundaries of this concept. What publics aren’t recursive publics? And are the phenomena Kelty sometimes picks out by this concept (events in the history of Free Software) really examples of a public after all?

Just to jot down some thoughts:

  • If what makes the public is a social organization that contests other forms of institutional power (such as the state or the private sector), then there does seem to be an independence to the FOSS movement that makes the label appropriate. I believe this holds even when the organizations embodying this movement explicitly take part in state or commercial activities–as in resistance to SOPA, for example–though Ashwin seemed to think that was problematic.
  • I read recursion to refer to many aspects of this public. These include both the mutual reinforcement of its many components through time and the drive to extend its logic (e.g. the logic of open systems that originated in the IT sector in the 80’s) beyond its limits. If standards are open, then the source code should be next. If the source code is open, then the hardware is next. If the companies aren’t open, then they’re next. Etc.

I find the idea of the recursive public compelling because it labels something aspirational: a functional unit of society that is cohesive despite its internal ideological diversity. However, it can be hard to tell whether Kelty is describing what he thinks is already the case or what he aspires for it to be.

The question is whether the recursive public is referring to the social imaginary of the FOSS movement or its concrete practices (which he lists: arguing about license, sharing source code, conceiving of the open, and coordinating collaboration). He does brilliant work in showing how the contemporary FOSS movement is a convergence of the latter. Misusing a term of Piaget’s, I’m tempted to call this an operational synthesis, analogous to how a child’s concept of time is synthesized through action from multiple phenomenological modalities. Perhaps it’s not irresponsible to refer to the social synthesis of a unified practice from varied origins with the same term.

Naming these practices, then, is a way of making them conscious and providing the imaginary with a new understanding of its situation.

Saskia Sassen in Territory, Authority, Rights notes that in global activism, action and community organization is highly local; what is global is the imagined movement in which one participates. Manuel Castells refers to this as the power of identity in social movements; the deliberate “reprogramming of networks” (of people) with new identities is a form of communication power that can exert political change.

It’s difficult for me to read Two Bits and not suspect Kelty of deliberately proposing the idea of a recursive public as an intellectual contribution to the self-understanding of the FOSS movement in a way that is inclusive of those that vehemently deny that FOSS is a movement. By identifying a certain set of shared practices as a powerful social force with its own logic in spite of and even because of its own internal ideological cacophony (libertarian or socialist? freedom or openness? fun, profit, or justice?), he is giving people engaged in those practices a kind of class consciousness–if they read his book.

That is good, because the recursive public is only one of many powers tussling over control of the Internet, and it’s a force for justice.

Ascendency and overhead in networked ecosystems

Ulanowicz (2000) proposes, in information-theoretic terms, several metrics for ecosystem health, where one models an ecosystem as, for example, a trophic network. Principal among them is ascendency, a measure of the extent to which energy flows in the system are predictably structured, weighted by the total energy of the system. He believes that systems tend towards greater ascendency in expectation, and that this is predictive of ecological ‘succession’ (and to some extent ecological fitness). On the other hand, overhead, which is unpredictability (perhaps, inefficiency) in energy flows (“free energy”?), is important for the system’s resiliency towards external shocks.
At least in the papers I’ve read so far, Ulanowicz is not mathematically specific about the mechanism that leads to greater ascendency, though he sketches some explanations. Autocatalytic cycles within the network reinforce their own positive perturbations and mutations, drawing in resources from external sources, crowding out and competing with other flows. These cycles become agents in themselves, exerting what Ulanowicz suggests is Aristotelian final or formal causal power on the lower-level components. In this way, freely floating energy is drawn into structures of increasing magnificence and complexity.
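
To fix ideas, here is my reconstruction of the two metrics from a flow matrix, so treat it with suspicion: ascendency scales the average mutual information of the flow structure by total system throughput, and overhead is the gap between ascendency and the development capacity.

```python
# Sketch of Ulanowicz-style metrics, as I understand the definitions.
# t[i][j] is the energy flow from compartment i to compartment j.
import math

def ascendency(t):
    """A = sum_ij T_ij * log2(T_ij * T.. / (T_i. * T_.j))."""
    n = len(t)
    total = sum(sum(row) for row in t)
    out_flows = [sum(t[i]) for i in range(n)]
    in_flows = [sum(t[i][j] for i in range(n)) for j in range(n)]
    return sum(
        t[i][j] * math.log2(t[i][j] * total / (out_flows[i] * in_flows[j]))
        for i in range(n) for j in range(n) if t[i][j] > 0)

def capacity(t):
    """Development capacity C = -sum_ij T_ij * log2(T_ij / T..)."""
    total = sum(sum(row) for row in t)
    return -sum(x * math.log2(x / total) for row in t for x in row if x > 0)

# A perfectly ordered two-node cycle: every unit of flow is completely
# predictable, so overhead (C - A) is zero and ascendency maxes out.
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
a, c = ascendency(cycle), capacity(cycle)
print(a, c, c - a)
```

On this reading, a system with many redundant, diffuse pathways would show the same capacity spread across a much larger overhead term, which is exactly the reserve Ulanowicz credits with resilience.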

I’m reminded of Bataille’s The Accursed Share, in which he attempts to account for societal differences and the arc of human history through how a society uses its excess energy. “The sexual act is in time what the tiger is in space,” he says, insightfully. The tiger, as an apex predator, is a flame that clings brilliantly to the less glamorous ecosystem that supports it. That is why we adore them. And yet, their existence is fragile, as it depends on both the efficiency and stability of the rest of the network. When its environment is disturbed, the tiger is the first to suffer.
Ulanowicz cites himself suggesting that a similar framework could be used to analyze computer networks. I have not read his account yet, though I anticipate several difficulties. He suggests that data flows in a computer network are analogous to energy flows within an ecosystem. That has intuitive appeal, but obscures the fact that some data is more valuable than others. A better analogy might be money as a substitute for energy. Or maybe there is a way to reduce both to a common currency, at least for modeling purposes.

Econophysics has been gaining steam, albeit controversially. Without knowing anything about it but based just on statistical hunches, I suspect that this comes down to using more complex models on the super duper complex phenomenon of the economy, and demonstrating their success there. In other words, I’m just guessing that the success of econophysics modeling is due to the greater degrees of freedom in the physics models compared to non-dynamic, structural equilibrium models. However, since ecology models the evolutionary dynamics of multiple competing agents (and systems of those agents), it’s possible that those models could capture quite a bit of what’s really going on and even be a source of strategic insight.

Indeed, economics already has a sense of stable versus unstable equilibria that resonates with the idea of the stability of ecological succession. These ideas translate into game theoretic analysis as well. As we do more work with Strategic Bayesian Networks or other constructs to model equilibrium strategies in a networked, multi-agent system, I wonder if we can reproduce Ulanowicz’s results and use his ideas about ascendency (which, I’ve got to say, are extraordinary and profound) to provide insight into the information economy.

I think that will require translating the ecosystem modeling into Judea Pearl’s framework for causal reasoning. Having been indoctrinated in Pearl’s framework in much of my training, I believe that it is general enough to subsume Ulanowicz’s results. But I have some doubt. In some of his later writings Ulanowicz refers explicitly to a “Hegelian dialectic” between order and disorder as a consequence of some of his theories, and between that and his insistence on his departure from mechanistic thinking over the course of his long career, I am worried that he may have transcended what it’s possible to do even with the modeling power of Bayesian networks. The question is: what then? It may be that once one’s work sublimates beyond our ability to model explicitly and intervene strategically, it becomes irrelevant. (I get the sense that in academia, Ulanowicz’s scientific philosophizing is a privilege reserved for someone tenured who, late in their career, is free to make their peace with the world in their own way.) But reading his papers is so exhilarating to me. I’ve had no prior exposure to ecology before this, so his papers are packed with fresh ideas. So while I don’t know how to justify it to any of my mentors or colleagues, I think I just have to keep diving into it when I can, on the side.

@#$%! : variance annotations in Scala’s unsound parameterized types

[error] /home/sb/ischool/cs294/hw3/src/main/scala/TestScript.scala:32: type mismatch;
[error] found : Array[wikilearn.TestScript.parser.Page]
[error] required: Array[wikilearn.WikiParser#Page]
[error] Note: wikilearn.TestScript.parser.Page <: wikilearn.WikiParser#Page, but class Array is invariant in type T.
[error] You may wish to investigate a wildcard type such as `_ <: wikilearn.WikiParser#Page`. (SLS 3.2.10)

wtf, Scala.  You know exactly what I’m trying to do here.

EDIT: I sent a link to the above post to David Winslow. He responded with a crystal clear explanation that was so great I asked him if I could include it here. This is it, below:

It’s a feature, not a bug :) This is actually the specific issue that Dart had in mind when they put this note in the language spec:

The type system is unsound, due to the covariance of generic types. This is a deliberate choice (and undoubtedly controversial). Experience has shown that sound type rules for generics fly in the face of programmer intuition. It is easy for tools to provide a sound type analysis if they choose, which may be useful for tasks like refactoring.

Which of course caused some hubbub among the static typing crowd.

The whole issue comes down to the variance annotations of type parameters. Variance influences how type parameters relate to the subtyping relationships of parameterized types:

Given types A and B, where A is a supertype of B:

  • trait Invariant[T] means there is no subtype relationship between Invariant[A] and Invariant[B]. (Either could be used as an Invariant[_], though.)
  • trait Covariant[+T] means Covariant[A] is a supertype of Covariant[B].
  • trait Contravariant[-T] means Contravariant[A] is a subtype of Contravariant[B].

The basic rule of thumb is that if you produce values of type T, you can be covariant in T, and if you consume values of type U, you can be contravariant in U. For example, Function1 has two type parameters, the parameter type A and the result type T. It is contravariant in A and covariant in T. An (Any => String) can be used where a (String => Any) is expected, but not the other way around.

So, what about the type parameter for Array[T]? Among other operations, Arrays provide:

class Array[T] {
  def apply(i: Int): T // "producing" a T
  def update(i: Int, t: T): Unit // "consuming" a T
}

When the type parameter appears in both covariant and contravariant positions, the only option is to make it invariant.

Now, it’s interesting to note that in the Java language Arrays are treated as if they are covariant. This means that you can write a Java program that doesn’t use casts, passes the typechecker, and generates a type error at runtime; the body of main() would look like:

String[] strings = new String[1];
Object[] objects = strings;
objects[0] = Integer.valueOf(0); // the runtime error occurs at this step, but even if it didn't: 
System.out.println(strings[0]); // what happens here?

Anyway, the upshot is that immutable collections only use their types in covariant positions (you can get values out, but never insert), so they are much handier. Does your code work better if you replace your usage of Array with Vector? Alternatively, you can always provide the type parameter when you construct your array. Array("") is an Array[String], but Array[AnyRef]("") is an Array[AnyRef].

Bash script for converting all .wav files in a directory to .mp3

I’ve been working with music files lately, trying to get Steve Morrell’s music online. In the process I’ve had to convert his albums, which I’ve ripped in .wav format, to .mp3.

To accomplish this, I’ve written a short bash script. It requires a number of tricks I wasn’t familiar with and had to look up.

#!/bin/bash

# Save the field separator and set it to newline so that
# filenames containing spaces survive the for loop below.
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

for file in $(ls *.wav)
do
  name=${file%%.wav}
  lame -V0 -h -b 160 --vbr-new "$name.wav" "$name.mp3"
done

IFS=$SAVEIFS

Though it isn’t recommended, I did the for loop on ls because I wanted to limit it to .wav files. But that means the script chokes on file names with spaces unless you swap out the IFS variable.

I used LAME for the conversion.