‘Coding’, Coding, and Biologists

Saturday, I reupped a piece about coding and biologists, which seemed to make a lot of people upset. I’ll attribute that to ‘instructor error’ on my part, since I obviously wasn’t clear about what I meant (people still might disagree, of course, even if I’m clearer). If there’s one thing I learned from the reaction, it’s that coding means very different things to different people.

I primarily work with computational scientists and software engineers, so my definition of coding is very different from what I think many commenters meant.

I’ll return to coding in a bit, but consider this: I imagine many people reading this have, at some point in their careers, performed a 2×2 contingency table test, or a simple 2-way ANOVA (I don’t want to get into arguments about statistical methods, but suffice it to say, I think these tools do have their uses, even if it’s just to say, “I should probably explore that result further tomorrow because it might not be random”). No one would say, solely on the basis of having performed these tests, however, that they are statisticians (I hope). Yes, you did a statistical test, and if you’re trying to, let’s say, encourage young students in science, there’s nothing wrong with saying that they know some statistics. But you’re not going to apply for that job opening in the applied math department or that consultant position that requires, well, a trained statistician based on the qualification of using a 2×2 contingency table.

Many biologists (depending on the discipline) were told in days of yore they needed to learn statistics: not the simple techniques, or a basic understanding of methods, but to be really well-trained in stats–perhaps not to the level of a Ph.D., but enough that, were you to apply for a job requiring statistical training, you really did have the chops. Now, we don’t hear that so much.

There was also a period when learning math (that is, theoretical biology) was prized: not just what you need for your non-mathematics major requirements (usually calculus or a semester of linear algebra), but enough math to put you in the same league as math majors. Not so much anymore either. That isn’t to say statistics and mathematics aren’t good to know now (I’ve worked in both of those areas). But is having a high level of expertise essential?

So to return to what I meant by coding. It’s useful to use awk and sed to pull data from tables, to write a short bash or python script to munge data around from one format to another, to use existing packages in R, or to run a software tool on a batch of data files. I do this! But I really don’t consider this coding. Yes, you put something in a file with .sh or .pl at the end and executed it. If I were trying to convince students to get excited about science and so on, I would probably call it coding (good job guys!). But that level of skill isn’t the level I would call a coder/programmer, any more than performing a chi-square test means you’re a statistician. It’s useful, it’s ‘coding’, but that’s not what I meant in my posts–which is why I referred to having a backup plan (for coding to be a backup plan, you have to have significant skills–you will be competing against trained and/or experienced programmers).
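
For a sense of scale, here is a minimal sketch of the kind of format-munging script described above: it takes a tab-separated table and writes out a CSV with only a few columns kept. The function name, the column names, and the sample data are all made up for illustration; nothing here comes from a real dataset or tool.

```python
import csv
import io


def tsv_to_csv(tsv_text, keep_columns):
    """Convert tab-separated text to CSV, keeping only the named columns.

    Hypothetical helper for illustration; the column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # extrasaction="ignore" silently drops any column not listed in keep_columns
    writer = csv.DictWriter(
        out, fieldnames=keep_columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up three-column table: keep the sample ID and the count, drop the gene.
sample = "sample_id\tgene\tcount\nS1\tabcA\t12\nS2\tabcB\t7\n"
result = tsv_to_csv(sample, ["sample_id", "count"])
print(result)
```

A dozen or so working lines, all of it leaning on the standard library: useful, and ‘coding’, but a long way from designing and maintaining a software tool.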

There are opportunity costs to gain expertise in anything (arguably, life itself is one big sunk opportunity cost), and when I saw the discussion about coding, I interpreted it as meaning someone has spent a significant amount of time, both training and experience, comparable to being trained as a statistician. I still don’t think many biologists need that level of expertise in coding, even in The Tech Era. Obviously, if you need to solve your scientific problem of interest by building a new software tool, then you need to learn how to do so. But that’s no different than saying if you need to figure out your scientific problem of interest using crystal structure, then you need to learn crystallography. Crystallography is good and useful, but no one is saying ‘biologists need to learn crystallography.’* (and as I noted, people aren’t emphasizing advanced math or statistics as universal skills much these days either–and they’re good to know too!).

From where I sit, ‘coding’ probably will become less important to many biologists as analyses will become more routine and self-contained and as ‘wrapper’ tools that allow converting among formats and tools are better developed. More importantly, we will have failed if we aren’t in that position. People on the bleeding edge will always do the bespoke stuff that might require serious programming (i.e., both coding and ‘coding’), but most biologists won’t be doing that–and when they need to, that’s when you collaborate with an expert.

If we think about all of the things biologists were supposed to know very well (not just a few basic things in each of these areas), including molecular biological techniques, stats, advanced math, and now coding, I don’t see how it’s possible to be an expert in all of these things (even as we do need experts in these areas, along with other areas I haven’t listed). That doesn’t mean you shouldn’t learn new things–I still learn new things–but expertise is very difficult to come by. There are opportunity costs.

Finally, there’s one other tangential thing: the learning to think argument, the precision thinking arguments and so on. I call bullshit. I’ve never heard of an intellectual discipline that doesn’t claim it teaches its trainees ‘how to think.’ Philosophers, lawyers, physicists, statisticians, scholars in the humanities, all argue that their discipline teaches rigor (No one says, “Actually, we don’t teach our students how to think, we just cram data up their asses. We’re fine with them being dumber than a sack of hammers.”) If you went through college, and are in a PhD program or completed one, and it took coding to teach you how to think, then you have received a shitty education and should be disappointed in your teachers.

*Nor is anyone claiming crystallography ‘teaches you how to think’, though one could probably argue that it could.


3 Responses to ‘Coding’, Coding, and Biologists

  1. Joe Shelby says:

    There’s also the option of actually working with your school’s Math and CS department and getting the coding done by them.

    One of the things about software design is the problem domain, and this is where most of the kids out of school are lacking experience: translating a problem domain from another academic/professional genre into code that they can understand and maintain, while providing a user experience that is optimal for the person who is the problem domain expert but knows little about coding. [Eventually software companies learn to hire domain experts to become program managers to provide that link between the coders and the intended audience…because the sales and marketing departments who usually are the ones that companies try to put in that spot first are horrible at it.]

    So if you have a technical problem (and you’re in a university), perhaps turn to your math and cs departments and work with them to find a coder to do it. You get the program you need, and the students or grad students/interns who produce the work get some experience in writing programs outside of the normal “data entry” canned stuff that the cs textbooks usually involve, which improves their ability to deal with non-coders in how to interpret and explain things.

  2. Jonathan Jacobs says:

    At QIAGEN BioInformatics, we hire biologists as domain experts. They aren’t expected to code, but it’s cool if they can. They aren’t really allowed/expected to contribute to the code base though. They help define the problem, requirements, etc. We also hire bioinformaticians who serve as a bridge between the biology and CS sides of the team. They are expected to code, contribute to the code base, and have at least domain expertise in something (human genomics, transcriptomics, microbiome, etc). We also hire computer scientists and mathematicians; these are the folks who do most of the hard core coding in C, C++, and JAVA. They aren’t expected to know much about the domain space they may be developing a solution for, but over time they learn enough. The key piece here is communication and teamwork. Building a de novo assembler (for example) that competes with some of the best ones in the opensource space, but runs on any operating system from a GUI or command line, is a real challenge. Without the communication and teamwork it would be impossible.

  3. zero says:

    I would call what you described ‘scripting’. You needed some data modified, identified a pattern, then applied a transform to get the pattern you wanted. Scripting can be done without needing to understand fundamental computer science ideas. Clever people can learn how to do it in a few days if they have enough motivation, and there’s no need to know about recursion or memory management.

    In the wider world, developers write code. A whole lot of people beyond just developers write scripts. If a scripting problem graduates in scale or importance to a code problem then you hire an expert to develop a solution.

    Just about everyone who works with data should be able to write scripts of some kind. The details may vary; for some people SQL is more useful than bash. I don’t think it is useful for people who already have a primary expertise to invest in becoming a developer unless their goal is to be a developer (vs. a biologist with sharp coding skills).
