Wednesday, June 22, 2011

More visible time series plots

In my previous post I described the use of alternating colors to improve the readability of "spaghetti" plots of time series, especially for readers who have trouble distinguishing fine shades of color. I updated that post several times, so if you read it a while ago, you might like to check it again.

There was feedback, here and at Lucia's, from readers concerned about color-blindness, especially red-green. That got me thinking more about appropriate color schemes.

The benefit of alternating three colors, as I did, is that one can hope that most people can distinguish at least two of them, since they come from different parts of the rainbow() spectrum. But maybe that can be reinforced.

The downside to all this is that alternating colored lines are harder to follow by eye than lines of a single color.

Anyway, I've looked more into the R function rainbow(). It is just scanning the hue spectrum in the hsv() function. I'll talk more about HSV and RGB color numberings below. For the moment, this just makes possible a more flexible approach to the spectrum, which may help with color difficulties.
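As a small check of that statement, the sketch below rebuilds the output of rainbow() by stepping the hue argument of hsv() at full saturation and value. The hue range 0 to (n-1)/n is the documented default of rainbow(); the substr() call is only there because some R versions append two alpha digits to the color strings.

```r
# Sketch: reproduce rainbow(n) by stepping the hue argument of hsv().
n <- 6
hues <- seq(0, (n - 1) / n, length.out = n)  # default hue range used by rainbow()
by_hsv <- hsv(hues, 1, 1)                    # full saturation and value
by_rainbow <- substr(rainbow(n), 1, 7)       # drop any alpha suffix ("#RRGGBBAA")

print(by_hsv)
print(all(by_hsv == by_rainbow))
```

Because the hues are just a numeric vector, any other spacing of values between 0 and 1 can be passed to hsv() in the same way.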

I've plotted here the same TSI example using different spectral ranges. You can click on any plot to see it enlarged. The top blue mini-graph shows the part of the spectrum that is enhanced: more colors are chosen from the region where that function is higher. Below it is a bar with the uniform spectrum, and below that, the spectrum actually chosen. The rest is as before. Below the jump come some thick-line versions.






I'll be interested to hear if any of these color selections seem to be more easily discriminated.

Technical details - RGB, HSV etc

This gets into how colors are represented by numbers. The simplest model is RGB: three numbers representing the amounts of red, green and blue. A notation common to many graphics platforms is the string "#rrggbb", where each of rr, gg and bb is a pair of hex digits. So "#ff0000" is just red, "#bbbb00" is gold, and "#444444" is dark grey. Often two more digits are added to represent transparency, with FF meaning none (fully opaque).
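The notation can be tried out directly in base R. This is only an illustrative sketch using rgb() and col2rgb(); the variable names are made up for the example.

```r
# Sketch: RGB triples and "#rrggbb" strings in base R.
red  <- rgb(1, 0, 0)                                # components on a 0-1 scale
gold <- rgb(0xbb, 0xbb, 0x00, maxColorValue = 255)  # same call on a 0-255 scale
grey <- col2rgb("#444444")                          # string back to 0-255 components
opaque_red <- rgb(1, 0, 0, alpha = 1)               # two extra hex digits for alpha

print(red)         # "#FF0000"
print(gold)        # "#BBBB00"
print(grey[, 1])   # red, green and blue are all 68 (hex 44)
print(opaque_red)  # "#FF0000FF"
```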

HSV is intended to be more in line with the way we perceive colors. Again it's a triple of numbers (in R on a scale of 0-1). H represents hue - like the familiar spectrum. S is saturation, and pretty much represents the amount of color at a given brightness. The brightness (as opposed to darkness) is given by the value (black=0, bright=1).

One thing to note in R is that HSV emulates the spectrum rather than generates it. The violet colors are generally created using reds with blue, so the high end doesn't help if you have trouble with red.
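A quick way to see this is to pull out the red channel of a few fully saturated colors along the hue axis. The hue values below are arbitrary sample points chosen for illustration.

```r
# Sketch: red content of full-saturation colors along the hue axis.
hues <- c(0, 1/3, 2/3, 0.8, 0.9)       # red, green, blue, then two "violet" hues
cols <- hsv(hues, 1, 1)
red_channel <- col2rgb(cols)["red", ]  # 0-255 red component of each color

print(data.frame(hue = round(hues, 3), color = cols, red = red_channel))
```

Green and blue have no red component, but the colors at hues 0.8 and 0.9 do, which is why the top of the hue range is of little help to someone who has trouble with red.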

The following plot should make this clearer. Each bar shows the effect of varying one of h, s or v from 0 to 1.
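Bars of this kind could be drawn along the following lines. This is a sketch, not the code used for the plot above: the helper draw_ramps() and the choice of red (h = 0) as the fixed hue for the s and v bars are assumptions made for the example.

```r
# Sketch: three color ramps, each varying one of h, s, v from 0 to 1.
n <- 100
x <- seq(0, 1, length.out = n)
ramps <- list(
  h = hsv(x, 1, 1),  # hue varies; full saturation and value
  s = hsv(0, x, 1),  # saturation varies: white to red
  v = hsv(0, 1, x)   # value varies: black to red
)

# Hypothetical helper: draw each ramp as a strip of thin colored rectangles.
draw_ramps <- function(ramps) {
  plot(c(0, 1), c(0, length(ramps)), type = "n",
       xlab = "parameter value", ylab = "", yaxt = "n")
  for (k in seq_along(ramps)) {
    m <- length(ramps[[k]])
    rect((seq_len(m) - 1) / m, k - 0.9, seq_len(m) / m, k - 0.1,
         col = ramps[[k]], border = NA)
  }
  axis(2, at = seq_along(ramps) - 0.5, labels = names(ramps), las = 1)
}

# Draw only in an interactive session; in a script, open a device such as png() first.
if (interactive()) draw_ramps(ramps)
```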



R has various routines that support rgb etc. rgb(r,g,b) turns a triple (range 0-1) into a string for color. hsv(h,s,v) likewise creates a string. Then there are rgb2hsv etc. To make all these plots, I just used the hsv() function.
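For example, a color built with hsv() can be taken apart with col2rgb() and converted back with rgb2hsv(). The particular triple is arbitrary, and the round trip is only approximate because the color string stores each channel as an integer from 0 to 255.

```r
# Sketch: convert between the two representations.
col_hsv <- hsv(0.6, 0.5, 0.9)       # color string built from an HSV triple
rgb_mat <- col2rgb(col_hsv)         # 3 x 1 matrix of 0-255 red, green, blue
back <- rgb2hsv(rgb_mat)            # 3 x 1 matrix with rows h, s, v

print(col_hsv)
print(round(back[, 1], 2))          # close to the original 0.6, 0.5, 0.9
```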

To get the modified spectrum, I decide on a function f(x) as shown in the top plots above. Then I invert and sum, and normalize to a 0:1 range. That gives a mapping vector i (length N) that moves rapidly through the unwanted colors. So hsv(i,1,1) gives the N colors to be used.
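Here is a minimal sketch of that recipe under one reading of "invert and sum": take 1/f as the hue step, form the cumulative sum, and rescale. The Gaussian-shaped f and the choice N = 12 are hypothetical, picked only to have something to run; they are not the functions shown in the plots.

```r
# Sketch of the hue remapping, under one reading of "invert and sum".
N <- 12                                  # number of colors wanted
x <- seq(0, 1, length.out = N)
f <- 1 + 4 * exp(-((x - 0.6) / 0.15)^2)  # hypothetical weight: favors hues near 0.6

step <- 1 / f                            # invert: small hue steps where f is large
i <- cumsum(step)                        # sum: running hue position
i <- (i - min(i)) / (max(i) - min(i))    # normalize to the 0:1 range
cols <- hsv(i, 1, 1)                     # the N colors to use

print(round(i, 3))
print(cols)
```

The hue values crowd together where f is large, so more of the N colors come from that part of the spectrum. Note that hues 0 and 1 both give red, so with an exact 0:1 range the first and last colors coincide; scaling i to stop a little short of 1 would avoid that.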

2 comments:

  1. I think you are going to continue to run into trouble with differentiating colors as long as you are relying on simply incrementing through a spectrum. Try out some of the qualitative colormaps from http://colorbrewer2.org/

    It is also more than just the colors that help differentiate data. Have you started looking at Tufte's The Visual Display of Quantitative Information yet? Highly, highly recommended.

  2. Anon, I checked out Colorbrewer. They don't seem to do palettes of more than ten colors, and they focus on gradations of a single hue.

    But I'm thinking more now of the dynamic displays - but it's still good to have color differentiation.

    I haven't read Tufte's book, though when you mentioned him, I looked up his Wiki bio. Interesting guy.
