Great Tables version 0.10.0 has been released today, and it contains a host of new features to support tables meant for scientific publishing.
In this post, we’ll review the big pieces that scientific tables need:
Unit notation: rendering units and chemical formulas (e.g., °C or C6H6).
Scientific notation: formatting for very large and small numbers (e.g., 3.50 × 10−11).
Nanoplots: compact visualizations for revealing trends.
We’ve also added six new datasets to help quickly show off these scientific publishing features! We’ll use the new reactions and gibraltar datasets to create examples in the fields of Atmospheric Chemistry and Meteorology, respectively.
Tip
Rich will be speaking on this at SciPy!
If you’re at SciPy 2024 in Tacoma, WA, Rich’s talk is scheduled for July 11, 2024 (16:30–17:00 PT). The talk is called Great Tables for Everyone and it’s sure to be both exciting and educational. If you’re not attending, that’s okay: the talk is available on GitHub.
Unit and scientific notation
We added the reactions dataset to serve as the basis for examples in the discipline of Atmospheric Chemistry. The dataset contains reaction rate constants for gas-phase reactions of 1,683 organic compounds. Each of these compounds can potentially undergo reaction with hydroxyl radicals (OH), nitrate radicals (NO3), or chlorine atoms (Cl). These reaction rate constants are typically very small values in units of cm3 molecules–1 s–1. In the upcoming example, we’ll pare down this massive dataset to only 11 rows representing the class of organic compounds known as mercaptans.
To make this table work well in a scientific reporting context, we need three pieces:
A way to represent units, like cm3.
A method for typesetting chemical formulae, as in CH4.
Formatting for very small numbers in scientific notation.
Great Tables provides the necessary functionality for all three requirements. Here is a summary table that tabulates rate constants for mercaptan compounds undergoing reaction with OH, NO3, and Cl:
This is a nice-looking table! And note these pieces:
The label= argument to methods like .tab_spanner() supports the use of curly braces ({{/}}) for the specialized units notation. For example, "{{cm^3 molecules^–1 s^–1}}" in the input becomes cm3 molecules–1 s–1 in the output.
The .fmt_units() method converts values that are already in units notation in the table body. For example, a cell with text "%CH4S%" becomes CH4S (the surrounding % indicates that the text should be interpreted as chemistry notation).
The .fmt_scientific() method formats values (in this case, very small values) to scientific notation (e.g., 3.50 × 10–11). Not doing so would make the table look very strange to a researcher who is familiar with this sort of data.
The combination of units notation (and chemistry notation, which is a part of that) really makes the presentation of this table complete and understandable to a practitioner of the field. Great Tables supports the use of units notation in spanner labels (with .tab_spanner()) and also in column labels (with .cols_label()). The column label ‘NO3’ was created with the latter method by supplying the text "{{%NO3%}}" as the column label for the NO3_k298 column.
Nanoplots
We added the nanoplots feature to Great Tables in v0.4.0 (check out the intro blog post for a quick explainer) so that tables can contain small, info-packed plots that fit reasonably well into a table context. They are interactive in that hovering over the data points provides additional plot information. This approach brings together the advantages of plots (elucidation of trends in data) and tables (access to numerical values representing the data points) in a single summary visualization.
Version 0.10.0 of Great Tables adds the gibraltar dataset, which provides meteorological data (temperature, humidity, wind speed, etc.) for the entire month of May 2023 at Gibraltar Airport Station.
Nanoplots, as mentioned, are great for condensing a lot of information into a small area. Our example here with the gibraltar dataset takes all of the temperature and humidity data for the first 10 days of May 2023 and displays them in easy-to-explore nanoplots across two columns:
Once we have the data aggregated in the form of list columns, the .fmt_nanoplot() method shows us the trends of temperature and relative humidity values throughout the day (from 00:00 to 24:00). One interesting observation that can be made from the table is that on May 9, 2023 there was a late-day temperature increase that coincided with a corresponding decrease in relative humidity. Making such an observation without nanoplots would be considerably harder, since it would require carefully scanning the numbers across a row of cells.
Units notation is ever useful and it is applied in one of the column labels of this table. It could potentially be difficult to format even simple things like the units of temperature. In this case we wanted to add in the temperature units of °C for the temperature column. Units notation has a collection of symbols available, including ":degree:" (colons encapsulate the collection of symbol keywords), for insertion within units notation text. The example takes advantage of the available symbols and so having °C as part of a label is not too hard to express.
Hope all your (science-y) tables are great!
We did scientific work pretty heavily in the past, and so we understand that great tables in the realm of science publication are something that could and should be possible. We’ll keep doing more to make this even better in upcoming releases.