Heat maps are a common sight in data visualization, where they represent data values with color gradients. This technique is great for identifying patterns, trends, outliers, and missing data when there’s lots of data. Tables can have this sort of treatment as well! Typically, formatted numeric values are shown along with a background color that corresponds to the underlying data values.
We can make this possible in Great Tables by using the data_color() method. Let’s start with a simple example, using a Polars DataFrame with three columns of values. We can introduce that data to GT() and use data_color() without any arguments.
This works but doesn’t look all too appealing. However, we can take note of a few things straight away. The first thing is that data_color() doesn’t format the values; rather, it applies fill colors to the cells. The second thing is that you don’t have to intervene and modify the text color so that there’s enough contrast: Great Tables will do that for you (though this behavior can be deactivated with the autocolor_text= argument).
Setting palette colors
While this first example illustrated some basic things, the common thing to do in practice is to provide a list of colors to the palette= argument. Let’s choose two colors, "blue" and "red", and place them in that order.
GT(simple_df).data_color(palette=["blue", "red"])
integer  float  category
1        2.3    one
2        1.3    two
3        5.1    three
4        None   one
5        4.4    three
Now that we’ve moved away from the default palette and specified colors, we can see that lower numerical values are closer to blue and higher values are closer to red (those in the middle have colors that are a blend of the two; in this case, more in the purple range). Categorical values behave similarly: they take on ordinal values based on their first appearance (from top to bottom), and those values are used to generate the background colors.
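The idea of blending between two palette colors can be sketched in plain Python. This is only an illustration that assumes simple linear mixing of RGB channels; Great Tables may interpolate differently internally, and the helper names scale() and blend() are made up for this example.

```python
# Illustrative only: plain linear RGB interpolation, not Great Tables' own code.
BLUE = (0, 0, 255)
RED = (255, 0, 0)


def scale(value, lo, hi):
    """Rescale value from [lo, hi] to a position between 0 and 1."""
    return (value - lo) / (hi - lo)


def blend(start, end, t):
    """Linearly mix two RGB colors; t=0 gives start, t=1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


integers = [1, 2, 3, 4, 5]
lo, hi = min(integers), max(integers)
integer_colors = [blend(BLUE, RED, scale(v, lo, hi)) for v in integers]

# Categories get an ordinal code in order of first appearance, top to bottom.
categories = ["one", "two", "three", "one", "three"]
codes = {}
for label in categories:
    codes.setdefault(label, len(codes))
category_codes = [codes[label] for label in categories]

print(integer_colors)
print(category_codes)
```

Under this assumption the middle value lands on an even mix of blue and red, which is the purple seen in the table, and the repeated categories "one" and "three" share a code (and therefore a color) with their first occurrence.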
Coloring missing values with na_color
There is a lone "None" value in the float column, and it has a gray background. Throughout the Great Tables package, missing values are treated in different ways and, in this case, the missing value is given a default color. We can change that with the na_color= argument. Let’s try it now:
Now, the gray color has been changed to Bisque. Note that when it comes to colors, you can use any combination of CSS/X11 color names and hexadecimal color codes.
Using domain= to color values across columns
In the previous uses of the data_color() method, the color range spanned exactly the minimum and maximum of the data values. That can be changed with the domain= argument, which expects a list of two values (a lower and an upper value). Let’s use the range [0, 10] on the first two columns, integer and float, and not the third (since a numerical domain is incompatible with string-based values). Here’s the table code for that:
Nice! We can clearly see that the color ramp in the first column (integer) only proceeds from blue (value: 1) to purple (value: 5) and there isn’t a reddish color in sight (that would need a value close to 10).
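The arithmetic behind this can be sketched with a small helper that computes how far along the palette each value sits. The function position() is hypothetical and assumes a simple linear rescaling; it is not part of Great Tables.

```python
def position(value, domain):
    """Fraction of the way along the palette (0 = first color, 1 = last)."""
    lo, hi = domain
    return (value - lo) / (hi - lo)


integers = [1, 2, 3, 4, 5]

# Default-like behavior: the domain is the range of the data itself
data_range = (min(integers), max(integers))
by_data_range = [position(v, data_range) for v in integers]

# Explicit domain of 0 to 10, as in the table above
fixed_domain = (0, 10)
by_fixed_domain = [position(v, fixed_domain) for v in integers]

print(by_data_range)
print(by_fixed_domain)
```

With the data-driven range, the value 5 sits at the very end of the palette (red); with the fixed domain it only gets halfway, which is why the column tops out at purple.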
Bringing it all together
For a more advanced treatment of data colorization in the table, let’s take the sza dataset (available in the great_tables.data submodule) and vigorously reshape it with Polars so that solar zenith angles are arranged as rows by month, and the half-hourly clock times are the columns (from early morning to solar noon).
Once the pivot()ing is done, we can introduce that table to the GT() class, placing the names of the months in the table stub. We will use data_color() with a domain that runs from 90 to 0 (here, 90° is sunrise, and 0° represents the sun directly overhead). In some months the sun rises later in the morning, so the times before sunrise appear as missing values in the dataset; na_color="white" will handle those cases. Okay, that’s the plan, and now here’s the code:
import polars as pl
import polars.selectors as cs
from great_tables import GT, html
from great_tables.data import sza

# Keep latitude 20°N and times up to solar noon, then reshape so that
# months are rows and half-hourly clock times are columns
sza_pivot = (
    pl.from_pandas(sza)
    .filter((pl.col("latitude") == "20") & (pl.col("tst") <= "1200"))
    .select(pl.col("*").exclude("latitude"))
    .drop_nulls()
    # `on=` is the current name for the deprecated `columns=` argument
    .pivot(values="sza", index="month", on="tst", sort_columns=True)
)

(
    GT(sza_pivot, rowname_col="month")
    .data_color(
        domain=[90, 0],
        palette=["rebeccapurple", "white", "orange"],
        na_color="white",
    )
    .tab_header(
        title="Solar Zenith Angles from 05:30 to 12:00",
        subtitle=html("Average monthly values at latitude of 20°N."),
    )
)
Solar Zenith Angles from 05:30 to 12:00
Average monthly values at latitude of 20°N.

month  0530  0600  0630  0700  0730  0800  0830  0900  0930  1000  1030  1100  1130  1200
jan    None  None  None  84.9  78.7  72.7  66.1  61.5  56.5  52.1  48.3  45.5  43.6  43.0
feb    None  None  88.9  82.5  75.8  69.6  63.3  57.7  52.2  47.4  43.1  40.0  37.8  37.2
mar    None  None  85.7  78.8  72.0  65.2  58.6  52.3  46.2  40.5  35.5  31.4  28.6  27.7
apr    None  88.5  81.5  74.4  67.4  60.3  53.4  46.5  39.7  33.2  26.9  21.3  17.2  15.5
may    None  85.0  78.2  71.2  64.3  57.2  50.2  43.2  36.1  29.1  26.1  15.2  8.8   5.0
jun    89.2  82.7  76.0  69.3  62.5  55.7  48.8  41.9  35.0  28.1  21.1  14.2  7.3   2.0
jul    88.8  82.3  75.7  69.1  62.3  55.5  48.7  41.8  35.0  28.1  21.2  14.3  7.7   3.1
aug    None  83.8  77.1  70.2  63.3  56.4  49.4  42.4  35.4  28.3  21.3  14.3  7.3   1.9
sep    None  87.2  80.2  73.2  66.1  59.1  52.1  45.1  38.1  31.3  24.7  18.6  13.7  11.6
oct    None  None  84.1  77.1  70.2  63.3  56.5  49.9  43.5  37.5  32.0  27.4  24.3  23.1
nov    None  None  87.8  81.3  74.5  68.3  61.8  56.0  50.2  45.3  40.7  37.4  35.1  34.4
dec    None  None  None  84.3  78.0  71.8  66.1  60.5  55.6  50.9  47.2  44.2  42.4  41.8
Because this is a table for presentation, we can’t neglect using tab_header(). A title and subtitle can provide just enough information to guide the reader through your table visualization.