Great Tables: The Polars DataFrame Styler of Your Dreams
Author
Michael Chow
Published
January 8, 2024
Analyzing data with polars is a no-brainer in Python. It provides an intuitive, expressive interface to data. When it comes to reports, it’s trivial to plug polars into plotting libraries like seaborn, plotly, and plotnine.
However, there are fewer options for styling tables for presentation. You could convert from polars to pandas, and use the built-in pandas DataFrame styler, but this has one major limitation: you can’t use polars expressions.
As it turns out, polars expressions make styling tables very straightforward. The same polars code that you would use to select or filter combines with Great Tables to highlight, circle, or bold text.
In this post, I’ll show how Great Tables uses polars expressions to make delightful tables, like the one below.
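Daily measurements in New York City (May 1-10, 1973)

|        Time        |                      Measurement                      |
| Year | Month | Day | Ozone, ppbV | Solar R., cal/m² | Wind, mph | Temp, °F |
|------|-------|-----|-------------|------------------|-----------|----------|
| 1973 | 5     | 1   | 41.0        | 190.0            | 7.4       | 67       |
| 1973 | 5     | 2   | 36.0        | 118.0            | 8.0       | 72       |
| 1973 | 5     | 3   | 12.0        | 149.0            | 12.6      | 74       |
| 1973 | 5     | 4   | 18.0        | 313.0            | 11.5      | 62       |
| 1973 | 5     | 5   | None        | None             | 14.3      | 56       |
| 1973 | 5     | 6   | 28.0        | None             | 14.9      | 66       |
| 1973 | 5     | 7   | 23.0        | 299.0            | 8.6       | 65       |
| 1973 | 5     | 8   | 19.0        | 99.0             | 13.8      | 59       |
| 1973 | 5     | 9   | 8.0         | 19.0             | 20.1      | 61       |
| 1973 | 5     | 10  | None        | 194.0            | 8.6       | 69       |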
The parts of a presentation-ready table
Our example table customized three main parts:
Title and subtitle: User friendly titles and subtitles, describing the data.
Column spanners: Group related columns together with a custom label.
Styles: Highlight rows, columns, or individual cells of data.
This is marked below.
Let’s walk through each piece in order to produce the table below.
Creating GT object
First, we’ll import the necessary libraries, and do a tiny bit of data processing.
import polars as pl
import polars.selectors as cs
from great_tables import GT
from great_tables.data import airquality

# Note that we'll use the first 5 rows as we build up our code
airquality_mini = airquality.head(5).assign(Year=1973)
pl_airquality = pl.DataFrame(airquality_mini).select(
    "Year", "Month", "Day", "Ozone", "Solar_R", "Wind", "Temp"
)
pl_airquality
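shape: (5, 7)

| Year | Month | Day | Ozone | Solar_R | Wind | Temp |
| i64  | i64   | i64 | f64   | f64     | f64  | i64  |
|------|-------|-----|-------|---------|------|------|
| 1973 | 5     | 1   | 41.0  | 190.0   | 7.4  | 67   |
| 1973 | 5     | 2   | 36.0  | 118.0   | 8.0  | 72   |
| 1973 | 5     | 3   | 12.0  | 149.0   | 12.6 | 74   |
| 1973 | 5     | 4   | 18.0  | 313.0   | 11.5 | 62   |
| 1973 | 5     | 5   | null  | null    | 14.3 | 56   |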
The default polars output above is really helpful for data analysis! By passing it to the GT constructor, we can start getting it ready for presentation.
gt_air = GT(pl_airquality)
gt_air
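| Year | Month | Day | Ozone | Solar_R | Wind | Temp |
|------|-------|-----|-------|---------|------|------|
| 1973 | 5     | 1   | 41.0  | 190.0   | 7.4  | 67   |
| 1973 | 5     | 2   | 36.0  | 118.0   | 8.0  | 72   |
| 1973 | 5     | 3   | 12.0  | 149.0   | 12.6 | 74   |
| 1973 | 5     | 4   | 18.0  | 313.0   | 11.5 | 62   |
| 1973 | 5     | 5   | None  | None    | 14.3 | 56   |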
In the next section I’ll show setting a title, and then go on to more exciting stuff like styling the body and creating column spanners.
Set title and subtitle
The simplest method in Great Tables is GT.tab_header(), which lets you add a title and subtitle.
(
    gt_air
    # Table header ----
    .tab_header(
        title="New York Air Quality Measurements",
        subtitle="Daily measurements in New York City (May 1-10, 1973)",
    )
)
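New York Air Quality Measurements
Daily measurements in New York City (May 1-10, 1973)

| Year | Month | Day | Ozone | Solar_R | Wind | Temp |
|------|-------|-----|-------|---------|------|------|
| 1973 | 5     | 1   | 41.0  | 190.0   | 7.4  | 67   |
| 1973 | 5     | 2   | 36.0  | 118.0   | 8.0  | 72   |
| 1973 | 5     | 3   | 12.0  | 149.0   | 12.6 | 74   |
| 1973 | 5     | 4   | 18.0  | 313.0   | 11.5 | 62   |
| 1973 | 5     | 5   | None  | None    | 14.3 | 56   |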
Just like with plots, tables need titles so people know what they’re about!
Set body styles
The .tab_style() method sets styles (like fill color or text properties) on different parts of the table. Let’s use it twice with a polars expression: first to highlight the row corresponding to the max Wind value, and then to bold that value.
Functions like style.fill() indicate what style to set.
Functions like loc.body() indicate where to apply the style. Its columns= and rows= parameters let you target specific parts of the table body (using polars expressions).
Currently, Great Tables only supports styling the table body. In the (very near) future, other loc.* functions will allow styling other parts of the table (e.g. the title, column labels, etc.).
Notice that there are now labels for “Time” and “Measurement” sitting above the column names. This is useful for emphasizing columns that share something in common.
Use GT.cols_label() with html() to create human-friendly labels (e.g. convert things like cal_m_2 to cal/m²).
This is helpful because column names are often kept in a funky format (e.g. cal_m_2) that is easier to work with programmatically, so they need to be cleaned up for presentation.
See the Column Labels page of the guide for more details.
Putting it all together
Finally, we’ll combine everything from the sections above into a single block of code, and use a few more rows of data.
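Daily measurements in New York City (May 1-10, 1973)

|        Time        |                      Measurement                      |
| Year | Month | Day | Ozone, ppbV | Solar R., cal/m² | Wind, mph | Temp, °F |
|------|-------|-----|-------------|------------------|-----------|----------|
| 1973 | 5     | 1   | 41.0        | 190.0            | 7.4       | 67       |
| 1973 | 5     | 2   | 36.0        | 118.0            | 8.0       | 72       |
| 1973 | 5     | 3   | 12.0        | 149.0            | 12.6      | 74       |
| 1973 | 5     | 4   | 18.0        | 313.0            | 11.5      | 62       |
| 1973 | 5     | 5   | None        | None             | 14.3      | 56       |
| 1973 | 5     | 6   | 28.0        | None             | 14.9      | 66       |
| 1973 | 5     | 7   | 23.0        | 299.0            | 8.6       | 65       |
| 1973 | 5     | 8   | 19.0        | 99.0             | 13.8      | 59       |
| 1973 | 5     | 9   | 8.0         | 19.0             | 20.1      | 61       |
| 1973 | 5     | 10  | None        | 194.0            | 8.6       | 69       |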
Not too bad!
Table styling is a surprisingly deep topic, and we’ve only scratched the surface of what Great Tables can do. A big one not covered here is the many column formatting methods, like GT.fmt_currency().
Thoughtfully styled tables tell viewers that you care as much about communicating the table as about the data wrangling behind it. Working with polars has been an absolute dream, and we’re excited to explore different ways it and Great Tables make for easy table styling!