Friday, June 17, 2011

Cheerful colors for time series

Over at the Blackboard, Lucia was looking at how to get a good color scheme in R to show multiple time series. It's quite hard to get a set with good contrast.

I've been wondering about that too. It's a personal problem - my ability to distinguish color shades has decreased.

I've been dabbling with an alternative idea - stripy lines. Or at least alternating color segments. Then you don't have to rely on shades to make the distinction.

Lucia illustrated with some solar data from Leif Svalgaard. She used different dot-dash line styles to help make contrasts. I thought it would be really good to make these in alternating colors. You can do this by over-writing: plot the same curve more than once, each time with a different color and dash pattern.
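A minimal sketch of the over-writing idea, using made-up data rather than the TSI file: the curve is drawn once as a solid line, then drawn again in a second color with a dashed line type, so the second color covers only part of the first. The data, the two colors and the "44" dash pattern are illustrative assumptions, not the choices used for the plots in this post.

```r
# Made-up series for illustration only (not the TSI data)
x <- 1:100
y <- sin(x / 10)

pdf(NULL)  # null PDF device: nothing is written to disk
plot(x, y, type = "n", xlab = "x", ylab = "y")
lines(x, y, col = "blue", lwd = 4)                # first pass: solid line
lines(x, y, col = "orange", lwd = 4, lty = "44")  # second pass: 4 units on, 4 off
dev.off()
```

Replacing pdf(NULL) with png("stripy.png") (a hypothetical file name) would write the result to a file instead.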

So here's what I came up with. Some may like it, some not. The lines are in principle more distinctive, but it's harder to see where they are going. Single contrasting colors are certainly better, if you can get enough of them.

Anyway, here's my plot. The R code is below the jump, and I'll put a zip file (TSIcolors.zip) with data on the doc repository. As Lucia noted, Leif's file just has blanks for missing data, so I edited the NA entries in. The colors are automatically and randomly chosen.

Update:
Peter O'Neill (oneillp) in comments suggested using R-supplied palettes. I think this is better, specifically rainbow(). He also suggested a way to fix the line segments in the legend, using seg.len. I found my legend() function would not take that as an argument. I also found that the problem with lines only applied when in jpeg or png mode. I couldn't find the bug, so I wrote my own legend routine - using a subset of the regular arguments.
Update. Replacing the above update. I've redone this in the spirit of Peter's second comment. Instead of a new legend function, I use the values returned by the standard one to overwrite the line segments. I then don't need to use seg.len.

Revised pictures and code below.

Here's a plot with thicker lines. I couldn't get the legend lines right here:

And here is the (revised) code:

#  Program written by Nick Stokes to make multi-colored curves
# File from http://www.leif.org/research/TSI%20%28Reconstructions%29.txt
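# Blanks have been converted to NA
# Assumes w (row 1 = years, rows 2..N+1 = one curve each) and names (curve
# labels) have already been read from that file

N=dim(w)[1]-1  # Number of curves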
x=w[1,]
cl=rainbow(N)
cl=matrix(cl[round(outer(1:N*3-3,0:2*N,"+")/3)%%N+1],N,3) ## Make orthog colors
# Now make dash patterns
k1=1:N%%3+2;k2=1:N%%5+8; k=k1+k2;
lt=matrix(paste(as.hexmode(c(241+k*0,k1*16+k2,15+k))),N,3)
lw=4  # line width
### Now plotting
png("TSI1.png",width=800)
plot(range(x),range(w[1:N+1,],na.rm=T),type="n",xlab="Year",ylab="TSI")
#Now plot curves 3 times with 3 different colors and dash patterns

for(i in 1:N)for(j in 1:3)lines(x,w[i+1,], col=cl[i,j], lty=lt[i,j], lwd=lw)
# Now plot the legend; its line samples are also drawn 3 times
L = legend(1970,1364.74, legend=names, cex=1.0, text.col=cl[,1], col="white",lty=lt[,1])
x=L$rect$left+c(0.02,0.3)*L$rect$w
for(i in 1:N) for(j in 1:3) lines(x,rep(L$text$y[i],2),col=cl[i,j],lty=lt[i,j],lwd=lw)

dev.off()
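To read the hexadecimal lty strings built in the program: R interprets a string such as "39" as alternating on/off lengths, here 3 units drawn and 9 skipped. The sketch below rebuilds the pattern matrix for an assumed N = 10 (in the program N comes from the data file) and prints the three patterns for the first curve. It uses format() instead of paste() for the conversion, which is a substitution made here for illustration.

```r
N <- 10             # assumed number of curves, for illustration only
k1 <- 1:N %% 3 + 2  # "on" length of the second pass: 2..4
k2 <- 1:N %% 5 + 8  # "off" length of the second pass: 8..12
k  <- k1 + k2       # period of the second pass
lt <- matrix(format(as.hexmode(c(241 + k * 0, k1 * 16 + k2, 15 + k))), N, 3)
print(lt[1, ])      # "f1" "39" "1b"
```

For the first curve, pass 1 ("f1") is nearly solid at 15 on and 1 off, pass 2 ("39") is 3 on and 9 off, and pass 3 ("1b") is 1 on and 11 off. Passes 2 and 3 share the period k = 12.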

1. I'm a protanope (no red cells at all, or pretty close). It's the rarer (0.5%) and somewhat more debilitating form of red-green colourblindness. As a result, I find most colour coded graphs with more than 3 lines, and some with fewer, useless. I end up just reading the conclusions. If I'm desperate for a value, I pop the graph up in a graphics tool and read off the RGB values to work out which line is which.

A big help is if the legends are sorted by the same order as the endpoint heights of the lines on the plot (or start, or mean).

Your lines are better for me where the lines are well separated. When they cross it's just a jumble of colour though. And of course reading them by RGB value is now much harder.

Kevin C
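Kevin's suggestion of ordering the legend by endpoint height could be sketched as follows. The data, labels and colors are made up for illustration, and taking the last non-missing value of each series as its "endpoint" is an assumption about what he means.

```r
# Made-up data: 3 series in rows, the last one ending in a missing value
w <- rbind(a = c(1, 2, 3, 2),
           b = c(5, 4, 4, 6),
           c = c(3, 3, 4, NA))

# Last non-missing value of each series
endpoint <- apply(w, 1, function(v) tail(v[!is.na(v)], 1))
ord <- order(endpoint, decreasing = TRUE)  # highest endpoint first

cols <- c("red", "blue", "darkgreen")
pdf(NULL)  # null device: nothing is written to disk
matplot(t(w), type = "l", lty = 1, lwd = 3, col = cols, xlab = "t", ylab = "value")
legend("topleft", legend = rownames(w)[ord], col = cols[ord], lty = 1, lwd = 3)
dev.off()
```

The legend then lists b, c, a from top to bottom, matching the order in which the curves end on the right of the plot.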

2. Kevin,
Thanks for the comment. I guess a better legend would help. I'm afraid the legends here were messed up for reasons that I don't understand. I would like to show a longer segment of line, but I can't see how to make R do that.

For me it also gets to be a jumble of colors at times - but that's when other schemes get messy too. I used random colors - I imagine for red-green it would be better to make sure there was at least one strong blue and one other in each line.

3. The sample below should help solve the "longer segment" difficulty: seg.len is in character width units.

I've modified your example to use solid lines and various colour palettes in R: cm.colors, rainbow, terrain.colors, topo.colors and heat.colors, as well as creating my own palette using the colours used for GISS Global Maps. I find the rainbow palette quite good for distinguishing lines.

You may also find something of interest in two R packages:

dichromat, described as "Collapse red-green distinctions to simulate the effects of colour-blindness", providing "17 color schemes suitable for people with deficient or anomalous red-green vision"

RColorBrewer, described as "The package provides palettes for drawing nice maps shaded according to a variable", with brewer.pal which "makes the color palettes from ColorBrewer available as R palettes". The selection tools at http://colorbrewer.org provide options such as "colorblind safe", "print friendly" and "photocopy-able", and might be worth exploring.

# Select a palette from the selection below (myPal)
#
myPal<-cm.colors(N); paletteTitle<-"cm.colors(N)"
myPal<-rainbow(N); paletteTitle<-"rainbow(N)"
myPal<-terrain.colors(N); paletteTitle<-"terrain.colors(N)"
myPal<-topo.colors(N); paletteTitle<-"topo.colors(N)"
myPal<-heat.colors(N); paletteTitle<-"heat.colors(N)"

myCols<- c(rgb(0.498039,0,1),rgb(0.247059,0.580392,0.988235),
rgb(0.470588,0.796078,0.996078),rgb(0.6,0.929412,0.996078),
rgb(0.85098,0.996078,0.85098),
#rgb(0.996078,0.996078,1), # remove to avoid "white" lines
rgb(1,1,0.298039),rgb(1,0.8,0),rgb(1,0.494118,0),rgb(1,0,0),rgb(0.494118,0,0),
"grey") # grey added for NA areas, when replaced by big value
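getPalette<- function (myCols) {
palette(myCols) # sets the palette; the value it returns is the PREVIOUS palette
palette() # so query the palette now in effect and return that
}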
myPal<-getPalette(myCols); paletteTitle<-"\"GISS Global Maps\""
#
# then use this palette
#
plot(range(x),range(w[1:N+1,],na.rm=T),type="n",xlab="Year",ylab="TSI")
for(i in 1:N) lines(x, w[i+1,], col=myPal[i], lwd=lw)
temp<-legend("bottomright", title=paletteTitle,
legend = c(" ", " ", " ", " ", " ", " ", " ", " ", " ", " "),
text.width = strwidth("DIARAD "), lwd = lw, col=myPal, lty = 1, xjust = 0,
yjust = 1, seg.len = 6)
text(temp$text$x, temp$text$y, names, pos=4)

4. And an extension to my legend code to show how "stripy" lines can be handled. For simplicity I have just overplotted each solid coloured line with a (differently) dashed black line. The added code here calculates the plotting positions for the line segments in the legend. This calculation requires the cex value and the seg.len value, so these are set as variables myCex and mySeglen.

myCex <- 0.9; mySeglen <- 6; lw <- 5
plot(range(x),range(w[1:N+1,],na.rm=T),type="n",xlab="Year",ylab="TSI")
for(i in 1:N) lines(x, w[i+1,], col=myPal[i], lwd=lw)
for(i in 1:N) lines(x, w[i+1,], lwd=lw, lty=(i+1))
temp<-legend("bottomright", title=paletteTitle,
legend = c(" ", " ", " ", " ", " ", " ", " ", " ", " ", " "),
text.width = strwidth("DIARAD "), lwd = lw, col=myPal, lty = 1, xjust = 0,
yjust = 1, seg.len = mySeglen, cex = myCex)
cin <- par("cin")
Cex <- myCex * par("cex")
xc <- Cex * xinch(cin[1L], warn.log = FALSE)
text(temp$text$x, temp$text$y, names, pos=4)
segments(temp$rect$left + xc, temp$text$y, temp$rect$left + (mySeglen + 1) * xc,
temp$text$y, lty=(1+c(1:N)), lwd=lw)

5. Wow, thanks, oneillp
It's early morning here, but soon I'll try your code and post the results.

6. oneillp #4,
Peter, I've been trying to run your code. Have you been tinkering with legend()? My version does not accept seg.len as an argument. It's there as an internal variable (set to 2), and I could hack it easily, but I wondered if there's some other version I'm missing. I'm running R 2.11.1, but I updated to the latest graphics package.

7. No tinkering with legend() - as installed with R 2.12.2 (sessionInfo pasted below)

seg.len modification is accepted, and it appears to work in jpeg mode too. Digging out an older laptop "frozen" at 2.10.1 I find seg.len is not accepted as an argument at that release, and now checking back on changes in versions I find for 2.12.1:

legend() allows the length of lines to be set by the end user
_via_ the new argument seg.len.

My few lines of code to handle the line segments for dashed lines are derived from the legend() function, and assume that cex and seg.len are the only relevant arguments modified by the user. Changing other graphical parameters could require more work. If that happens, examine the body of the legend() function to see what may be needed. I added trace=TRUE to the arguments for the legend() call to see further information in the console log which was helpful when examining the legend() body to find the code needed.

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Ireland.1252 LC_CTYPE=English_Ireland.1252
[3] LC_MONETARY=English_Ireland.1252 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.1252

attached base packages:
[1] grDevices datasets splines graphics stats tcltk utils
[8] methods base

other attached packages:
[1] survival_2.36-5

loaded via a namespace (and not attached):
[1] grid_2.12.2 lattice_0.19-26 tools_2.12.2
>

8. Thanks, Peter
I updated to R 2.13.0. That didn't fix my basic problem, which was that the line segments don't overwrite properly in jpeg or png mode in legend(). But then I realised that I could use your trick of overwriting the lines explicitly using coords returned by legend(). So I made that change - it's much better than rewriting legend(). The default seg.len=2 was OK.
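A minimal sketch of that trick, with made-up labels and colors: legend() invisibly returns a list holding the box geometry ($rect) and the text positions ($text), and those coordinates can be used to draw the line samples by hand. The fractions 0.05 and 0.3 of the box width are chosen by eye here and are an assumption, not a rule.

```r
pdf(NULL)  # null device: nothing is written to disk
plot(1:10, type = "n")
labs <- c("first", "second", "third")  # made-up labels

# Draw the legend with invisible (white) line samples, keeping its geometry
L <- legend("topleft", legend = labs, lty = 1, col = "white")
str(L)  # shows the $rect and $text components

# Overwrite each sample by hand: solid color first, then dashes on top
xs <- L$rect$left + c(0.05, 0.3) * L$rect$w
for (i in seq_along(labs)) {
  lines(xs, rep(L$text$y[i], 2), col = "blue", lwd = 3)
  lines(xs, rep(L$text$y[i], 2), col = "orange", lwd = 3, lty = "44")
}
dev.off()
```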

9. One rule I've taken from countless powerpoint presentations -- thin lines have no color. Thicken them up as you do in your second figure.

Second, though it's harder to apply -- avoid relying on red-green distinctions. Some level of red-green color blindness is in about 5% of the population, iirc. Rainbow-style color bars are bad at this. HSV color bars do a better job, as it's easier to distinguish between high and low saturation red than between red and green.

10. PD,
Yes, I'm not color-blind, just fading - and I agree about thick lines.

I found on looking into it that rainbow() just ranges through the hue of hsv(), with s=v=1. So I've been looking at emphasising parts of the range.
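A quick check of that observation, as a sketch with an arbitrary N = 6: the default rainbow(N) should match hsv() with evenly spaced hues and s = v = 1. The last line shows one way to emphasise part of the range, using the start and end arguments of rainbow(); the values 0.5 and 0.7 are arbitrary.

```r
N <- 6
h <- seq(0, (N - 1) / N, length.out = N)  # evenly spaced hues, stopping short of 1
same <- all(col2rgb(rainbow(N)) == col2rgb(hsv(h, s = 1, v = 1)))
print(same)

# Restrict the hues to a sub-range, here roughly cyan to blue
blues <- rainbow(4, start = 0.5, end = 0.7)
```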

I'll do another post soon.

11. Nick-
I find many 'rainbow()' colors are too light against white backgrounds. Yellow and cyan are particularly bad. I was using rainbow when my color blind readers and many others found some of those colors difficult to see. So, rainbow() is clearly not the solution.

That's why I've gone to fiddling with saturation.
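To put a rough number on "too light", the sketch below computes a simple brightness measure for six rainbow() colors. The Rec. 601 luma weights are a crude stand-in for perceived lightness, an assumption made for illustration. The last line darkens the palette by lowering v in hsv(), which is one possible adjustment and not necessarily the one Lucia settled on.

```r
cols <- rainbow(6)       # red, yellow, green, cyan, blue, magenta
rgb255 <- col2rgb(cols)  # 3 x 6 matrix of 0..255 values
# Rec. 601 luma weights: a rough brightness measure, not a perceptual model
luma <- as.vector(c(0.299, 0.587, 0.114) %*% rgb255) / 255
names(luma) <- c("red", "yellow", "green", "cyan", "blue", "magenta")
print(round(sort(luma, decreasing = TRUE), 2))  # yellow and cyan come first

# One possible remedy: keep the hues but lower the value (brightness)
darker <- hsv(h = seq(0, 5 / 6, length.out = 6), s = 1, v = 0.7)
```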

12. A possible solution would be a rollover where the lines are in dull shades and as you roll over it becomes thicker and brighter and the legend lights up. Of course, that is a JAVA app, but perhaps there could be a frame that everyone could use and just insert data and axes.

13. http://stevemosher.wordpress.com/2011/06/21/rghcnv3-a-new-package/

thanks for all your help nick

14. Thanks, Steven,