How to choose color

Guideline: The order of colors in our “rainbow” is easy for everyone to understand, but this order is not universal and will make charts and maps harder to read.

Question: The guideline says that no ordering of colors is universal, yet it seems that an order following the light frequency spectrum would work better. What would you do when you want to separate data into different categories in color maps, and how many colors at most should we use to visualize multi-dimensional data without causing misunderstanding or confusion?

Source: Krzywinski, M., Brol, I., Jones, S., & Marra, M. (2012). Getting into visualization of large biological data sets: 20 imperatives of information design. Poster presented at 2nd IEEE Symposium on Biological Data Visualization (BioVis 2012), Seattle, WA.

Thanks!

Hi, thanks for your post.

There is an interesting discussion about “Rainbow colormaps” on this post: Rainbow Colormap

Please see if it helps; you could follow up on your discussion there.

The first thing to consider when using color to represent data is whether your data are categorical or continuous. The three perceptual dimensions of color (hue, saturation, and luminance) convey information differently. Hue is great for categorical data. The colors of the rainbow are great to use, but if you want more categories, try more desaturated colors (e.g., salmon, teal, amethyst) instead of saturated spectral colors. You can order categories if your data are ordinal. You can, for example, have a set of colors that are ordered, and that ordering can follow the spectrum.
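As a rough illustration of the categorical case, here is a minimal sketch that spaces hues evenly and lowers the saturation. The function name `categorical_palette` and the default saturation and value settings are my own assumptions, not something from the poster or the thread, and evenly spaced HSV hues are only an approximation of perceptually distinct colors.

```python
import colorsys


def categorical_palette(n, saturation=0.55, value=0.85):
    """Return n hex colors with evenly spaced hues and reduced saturation.

    The defaults are illustrative assumptions, not recommended constants.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    palette = []
    for i in range(n):
        # Hue carries the category; saturation is kept below 1 on purpose.
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)
            )
        )
    return palette


print(categorical_palette(6))
```

Because the hues start at red and step around the color wheel, the list also follows a spectral order, which could be used for ordinal categories.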

If the data are continuous (or “quantitative”), then your goal is to convey magnitude. For this, luminance variation is your best bet. Your lowest value should map onto a very dark color and your highest value should map onto a very bright color. These colors, however, do not have to be just black and white. You can map from a very dark red through orange, to yellow and then white, for example. Just be careful to have perceived luminance increase monotonically, and have smooth transitions between hues. If you also want to show that there are different semantic “bands” in your data, you can quantize the color scale into uniform bins, or you can select bins of different sizes and hues to provide that extra dimension. The rainbow colormap is not good for this purpose because its luminance is not monotonic and its color bands are not uniformly spaced, which breaks the perception of monotonicity in the data and draws attention to accidental regions in the data.
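The monotonic-luminance point can be checked numerically. The sketch below is only an approximation: it uses the sRGB relative luminance formula as a stand-in for perceived luminance, the anchor colors in `HEAT_ANCHORS` are values I picked by hand for the dark red, orange, yellow, white ramp, and the "rainbow" is a plain HSV hue sweep from blue to red, not any particular library's colormap.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an sRGB color (a proxy for perceived luminance)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# Hand-picked anchors (assumption): dark red, red-orange, orange, yellow, white.
HEAT_ANCHORS = [
    (0.25, 0.0, 0.0),
    (0.8, 0.1, 0.0),
    (1.0, 0.55, 0.0),
    (1.0, 0.95, 0.2),
    (1.0, 1.0, 1.0),
]


def heat(t):
    """Piecewise-linear interpolation between the anchors, t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(HEAT_ANCHORS) - 1)
    i = min(int(pos), len(HEAT_ANCHORS) - 2)
    frac = pos - i
    a, b = HEAT_ANCHORS[i], HEAT_ANCHORS[i + 1]
    return tuple(x + (y - x) * frac for x, y in zip(a, b))


def rainbow(t):
    """Fully saturated hue sweep from blue (t=0) to red (t=1)."""
    return colorsys.hsv_to_rgb((1.0 - t) * 2.0 / 3.0, 1.0, 1.0)


def is_luminance_monotonic(cmap, steps=256):
    """True if luminance never decreases along the sampled colormap."""
    lum = [relative_luminance(cmap(k / (steps - 1))) for k in range(steps)]
    return all(later >= earlier for earlier, later in zip(lum, lum[1:]))


def quantize(cmap, t, bins):
    """Snap t to the center of one of `bins` uniform bins before coloring."""
    index = min(int(t * bins), bins - 1)
    return cmap((index + 0.5) / bins)


print(is_luminance_monotonic(heat))
print(is_luminance_monotonic(rainbow))
```

Under these assumptions the heat ramp passes the check and the hue sweep fails it, since luminance rises from blue to cyan and then drops again toward green. `quantize(heat, t, 5)` would give five uniform bands on the same ramp.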

You can combine these ideas. For example, if you want to represent continuous data that increase monotonically above zero and decrease monotonically below zero, you can use one hue or hue range above zero and a different hue or hue range below zero. This could be a diverging scale, where colors become more saturated and lower in luminance the greater the distance from zero. Or you can have a monotonic luminance scale which just switches hue above zero, as in a topographic map.
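A diverging scale of this kind could be sketched as follows. This is a simplified illustration under my own assumptions: the function name `diverging_color`, the choice of red for positive and blue for negative values, and the lightness range are all hypothetical, and HLS lightness is only a crude substitute for perceived luminance.

```python
import colorsys

RED_HUE = 0.0         # hue used above zero (assumption)
BLUE_HUE = 2.0 / 3.0  # hue used below zero (assumption)


def diverging_color(value, vmax):
    """Map value in [-vmax, vmax] to an RGB tuple on a two-hue diverging scale.

    Near zero the color is light and unsaturated; farther from zero it
    becomes more saturated and darker.
    """
    if vmax <= 0:
        raise ValueError("vmax must be positive")
    distance = min(abs(value) / vmax, 1.0)  # 0 at zero, 1 at the extremes
    hue = RED_HUE if value >= 0 else BLUE_HUE
    lightness = 0.95 - 0.6 * distance       # lighter near zero
    saturation = distance                   # more saturated away from zero
    return colorsys.hls_to_rgb(hue, lightness, saturation)


for v in (-10, -5, 0, 5, 10):
    print(v, diverging_color(v, 10))
```

Zero maps to a light neutral gray, so the critical value stands out as the boundary between the two hues.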

So, the first step is to understand the structure of your data, then decide which aspect you are trying to represent, and then use the three perceptual dimensions of color to guide your selections.

As Bernice mentioned, perceiving magnitude is one of the tasks of viewing quantitative data. I use the word “perceiving” here to imply that reading values off a visualization is never as accurate as retrieving them from the source dataset itself, regardless of which continuous colormap one uses. Hence, this may not be the main benefit of visualization. We must consider what other visualization tasks a colormap may support. For example, do viewers wish to identify data in a specific range quickly? Do they wish to see gradient changes in a certain range with more visual resolution? Do they know that a certain range of data values is less important than others? Do they know of critical data values that they wish to spot quickly (e.g., 0 in the two-band colormap in Bernice’s answer if 0 is in the data range, or other important values such as the boiling point of water)? In many cases, a specialised colormap (for a type of data in a subject field, not for a specific dataset) can encode important semantic information, such as data ranges and critical values. When there is no standard for such a colormap within a subject field, people naturally try to use a “universal” colormap. A recent paper addresses the need for domain-specific colormaps.

P. Nardini et al. “The Making of Continuous Colormaps.” IEEE TVCG, doi: 10.1109/TVCG.2019.2961674

Hopefully there will be more standard or de facto standard semantic-rich colormaps in different subject fields. No universal colormap can be an optimal solution! The main purpose of visualization is not retrieving values but observing patterns, which are often not well defined beforehand but are defined dynamically by viewers during visualization. Hence, when users define a standard colormap for their subject field, they are taking a step towards defining their patterns.