Saturday, I re-upped a piece about coding and biologists, which seemed to make a lot of people upset. I’ll attribute that to ‘instructor error’ on my part, since I obviously wasn’t clear about what I meant (people still might disagree, of course, even if I’m clearer). If there’s one thing I learned from the reaction, it’s that coding means very different things to different people.
I primarily work with computational scientists and software engineers, so my definition of coding is very different from what I think many commenters meant.
I’ll return to coding in a bit, but consider this: I imagine many people reading this have, at some point in their careers, performed a 2×2 contingency table test, or a simple 2-way ANOVA (I don’t want to get into arguments about statistical methods, but suffice it to say, I think these tools do have their uses, even if it’s just to say, “I should probably explore that result further tomorrow because it might not be random”). No one would say, solely on the basis of having performed these tests, however, that they are statisticians (I hope). Yes, you did a statistical test, and if you’re trying to, let’s say, encourage young students in science, there’s nothing wrong with saying that they know some statistics. But you’re not going to apply for that job opening in the applied math department, or that consultant position that requires, well, a trained statistician, on the strength of having used a 2×2 contingency table.
Many biologists (depending on the discipline) were told in days of yore they needed to learn statistics: not the simple techniques, or a basic understanding of methods, but to be really well trained in stats–perhaps not to the level of a Ph.D., but enough that, were you to apply for a job requiring statistical training, you really did have the chops. Now, we don’t hear that so much.
There was also a period when learning math (that is, theoretical biology) was prized: not just what you need for your non-mathematics major requirements (usually calculus or a semester of linear algebra), but enough math to put you in the same league as math majors. Not so much anymore either. That isn’t to say statistics and mathematics aren’t good to know now (I’ve worked in both of those areas). But is having a high level of expertise essential?
So to return to what I meant by coding. It’s useful to use awk and sed to pull data from tables, to write a short bash or Python script to munge data around from one format to another, to use existing packages in R, or to run a software tool on a batch of data files. I do this! But I really don’t consider this coding. Yes, you put something in a file with .sh or .pl at the end and executed it. If I were trying to convince students to get excited about science and so on, I would probably call it coding (good job guys!). But that level of skill isn’t the level at which I would call someone a coder/programmer, any more than performing a chi-square test means you’re a statistician. It’s useful, it’s ‘coding’, but that’s not what I meant in my posts–which is why I referred to having a backup plan (for coding to be a backup plan, you have to have significant skills–you will be competing against trained and/or experienced programmers).
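To make the distinction concrete, here is roughly the kind of throwaway script meant by ‘coding’ in quotes: a minimal sketch that converts a tab-delimited table to CSV while keeping only a couple of columns. The function name, the column names, and the sample data are all made up for illustration.

```python
import csv
import io


def tsv_to_csv(tsv_text, keep_columns):
    """Convert tab-delimited text to CSV text, keeping only the named columns.

    Hypothetical helper for illustration; the column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep_columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy across only the columns we were asked to keep.
        writer.writerow({name: row[name] for name in keep_columns})
    return out.getvalue()


# Made-up sample data standing in for a real results table.
sample = "gene\tsample\tcount\nabcA\ts1\t12\nxyzB\ts2\t7\n"
converted = tsv_to_csv(sample, ["gene", "count"])
print(converted)
```

Handy, certainly, but a dozen lines leaning on the standard library is a long way from what a trained programmer gets hired to do.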
There are opportunity costs to gaining expertise in anything (arguably, life itself is one big sunk opportunity cost), and when I saw the discussion about coding, I interpreted it as meaning someone having spent a significant amount of time, in both training and experience, comparable to being trained as a statistician. I still don’t think many biologists need that level of expertise in coding, even in The Tech Era. Obviously, if you need to solve your scientific problem of interest by building a new software tool, then you need to learn how to do so. But that’s no different than saying if you need to figure out your scientific problem of interest using crystal structure, then you need to learn crystallography. Crystallography is good and useful, but no one is saying ‘biologists need to learn crystallography.’* (And as I noted, people aren’t emphasizing advanced math or statistics as universal skills much these days either–and they’re good to know too!)
From where I sit, ‘coding’ will probably become less important to many biologists as analyses become more routine and self-contained and as ‘wrapper’ tools that allow converting among formats and tools are better developed. More importantly, we will have failed if we aren’t in that position. People on the bleeding edge will always do the bespoke stuff that might require serious programming (i.e., both coding and ‘coding’), but most biologists won’t be doing that–and when they need to, that’s when they should collaborate with an expert.
If we think about all of the things biologists were supposed to know very well (not just a few basic things in each of these areas), including molecular biological techniques, stats, advanced math, and now coding, I don’t see how it’s possible to be an expert in all of these things (even as we do need experts in these areas, along with other areas I haven’t listed). That doesn’t mean you shouldn’t learn new things–I still learn new things–but expertise is very difficult to come by. There are opportunity costs.
Finally, there’s one other tangential thing: the learning-to-think argument, the precision-thinking argument, and so on. I call bullshit. I’ve never heard of an intellectual discipline that doesn’t claim it teaches its trainees ‘how to think.’ Philosophers, lawyers, physicists, statisticians, scholars in the humanities, all argue that their discipline teaches rigor. (No one says, “Actually, we don’t teach our students how to think, we just cram data up their asses. We’re fine with them being dumber than a sack of hammers.”) If you went through college, and are in a PhD program or completed one, and it took coding to teach you how to think, then you have received a shitty education and should be disappointed in your teachers.
*Nor is anyone claiming crystallography ‘teaches you how to think’, though one could probably argue that it could.