Exercise design


How to design useful exercises is an important question for a class like this. I’ve taught in many workshops (SSB’s workshops at Evolution, Arnold & Felsenstein’s Evolutionary Quantitative Genetics workshops at NESCent and NIMBioS, phrapl workshops) and have gotten the best student feedback by teaching with a skeleton of a solution: we stop at certain sections to talk about what is needed, and students figure out some bits themselves, but still within a framework. For example, from the EQG workshop:

#Let's go back to coin flipping in R. Let's have three hypotheses:
prob.heads.hypotheses <- c(0.1, 0.5, 0.8) 


prior.on.hypotheses <- c(-------------------) #add your own priors here: what is the prior probability of each hypothesis (0.1, 0.5, 0.8)?
prior.on.hypotheses <- prior.on.hypotheses/sum(prior.on.hypotheses) #priors should sum to 1

#next, assume that in our next set of flips we get 2/3 heads
num.flips <- 3
num.heads <- round(num.flips*2/3)

#how to calculate the likelihood?
likelihood.data.given.hypotheses <- c(----, -----, ----)

#hint: ?dbinom

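For reference, a filled-in version might look like the sketch below (the flat prior is just an illustrative choice, not necessarily what the actual solution file uses):

#Let's go back to coin flipping in R. Let's have three hypotheses:
prob.heads.hypotheses <- c(0.1, 0.5, 0.8)

prior.on.hypotheses <- c(1, 1, 1) #a flat prior: each hypothesis starts equally plausible
prior.on.hypotheses <- prior.on.hypotheses/sum(prior.on.hypotheses) #priors should sum to 1

#next, assume that in our next set of flips we get 2/3 heads
num.flips <- 3
num.heads <- round(num.flips*2/3)

#likelihood of the observed flips under each hypothesized probability of heads
#(dbinom is vectorized over prob, so this returns one likelihood per hypothesis)
likelihood.data.given.hypotheses <- dbinom(x=num.heads, size=num.flips, prob=prob.heads.hypotheses)
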
This skeleton has sections for students to fill in on their own; we then talk through them in class before moving on. I also include a completely filled-in version (along the lines of the sketch above) so students who fall behind or miss a point can check it and keep up.

Other instructors sometimes live code on screen while students follow along, or try to type the same thing on their computers, but this emphasizes copying exactly rather than thinking about what is being written and why. Another possibility is to have canned exercises students can copy and paste from; this lets students proceed at their own pace, but it’s easy to skim over parts they don’t understand (for example, think about what a student learns by figuring out what the likelihood should be above, rather than just copying a set of dbinom() calls).

The Software Carpentry lessons I’ve looked at (though note that I haven’t taken a course from them) seem to mix self-paced, fully specified code with open-ended challenges: design a function for doing X, draw a diagram to show Y. I think this is great, but it’s hard for a student working alone to figure out whether the challenge was completed successfully.

In the R world, swirl is popular for teaching: it runs within R and provides a mixture of information and questions, and the student has to write answers (multiple choice, supply a function, etc.). I may use this for some aspects of the course, but one thing I don’t like is how brittle it can be at assessing correct answers. For example, in one tutorial that asked us to print out a variable, typing the variable name (i.e., foo) was taken as correct, but print(foo) was not, even though the latter is better R code. This is understandable, but it means the student has to figure out what the examiner wants as well as what the right answer is. It’s also hard to have students return complex things (like a new function) as an answer in this framework.

One thing that is becoming increasingly important in scientific software development is unit testing: automated tests that make sure the code works as intended, which can be rerun as the code changes. testthat is a useful package for writing such tests in R; one can test for matching a given value, having a given class, and many other features. I’m thus following my usual procedure of providing partial solutions and a separate file of actual solutions, but adding testthat tests so students can check their work when they are done. Exercises will be grouped into packages. This is a bit of overkill, but packaging them makes testing easier, and a series of short packages rather than one giant one makes it faster to test just the part the student is working on. The first one covers getting trees from TreeBASE and Open Tree of Life using R.
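
As a sketch of what such checks could look like, the tests below assume a hypothetical exercise function GetTreeFromOpenTree() (a stand-in name, not the actual course code) that is supposed to return an ape phylo object:

library(testthat)

#checks for a hypothetical student-written GetTreeFromOpenTree();
#the function name and taxon are assumptions for illustration
test_that("the returned tree is a phylo object with tips", {
  tree <- GetTreeFromOpenTree(taxon="Formicidae")
  expect_s3_class(tree, "phylo")        #right class of object
  expect_gt(length(tree$tip.label), 1)  #at least two tips
})

test_that("branch lengths, if present, are non-negative", {
  tree <- GetTreeFromOpenTree(taxon="Formicidae")
  if (!is.null(tree$edge.length)) {
    expect_true(all(tree$edge.length >= 0))
  }
})

Because the tests live in a package, students can run devtools::test() on just the exercise they are working on and get immediate, unambiguous feedback on whether their answer behaves correctly.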