When bar charts take over the world of scientific figures, Mayo Clinic researchers and colleagues sound the alarm.
Looking up into the night sky overwhelms the senses.
Then the brain picks up a pattern—three bright stars? Oh, that’s Orion’s belt.
And from there a viewer can find the Orion nebula, Betelgeuse and Bellatrix, and Saiph and Rigel, all part of Orion, the Hunter constellation.
Connecting the Dots
Patterns in the night sky allow viewers to gain and process information. Similarly, scientific data have patterns that tell a story. But that story can be obscured by the way it’s presented, so Mayo Clinic researchers and their colleagues are working on better ways to present and share scientific data. To do that, they first assessed how data are presented currently.
Information Design Team Assemble!
Tracey Weissgerber, Ph.D., is a Mayo Clinic hypertension researcher and a National Institutes of Health Building Interdisciplinary Research Careers in Women’s Health scholar, mentored by Vesna Garovic, M.D., a nephrology and hypertension specialist at Mayo.
Drs. Weissgerber and Garovic formed a multidisciplinary team with biomedical statistics and informatics specialist Natasa Milic, M.D., Ph.D., and software engineer Marko Savic from the University of Belgrade Medical School, as well as Stacey Winham, Ph.D., a Mayo Clinic biostatistician.
The Rise of the Bar
In a 2015 paper published in PLOS Biology, the team looked at full-length articles published in the top 25 percent of physiology journals during the first quarter of 2014. The authors report that bar charts are one of the most recognizable data visualizations, appearing in 86 percent of the papers reviewed.
When the team’s PLOS Biology paper was published, Dr. Weissgerber tracked tweets that mentioned the paper. Biologists, ecologists, archaeologists, and astronomers all picked up on the story. Whether the subjects are lakes, insects, cells, mice, or humans, small data sets hiding out behind bar graphs are a big problem.
The height of the bar represents the mean, or average, of the numbers within the collected data (also called observations). The Popsicle stick, or error bar, typically shows the range of values in which the average is most likely to fall with repeated experiments.
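As a rough illustration of what goes into a bar and its error bar, the sketch below computes a mean and a standard error of the mean for one group. The values are invented for illustration, and the choice of the standard error is an assumption: error bars in published figures may instead show a standard deviation or a confidence interval.

```python
import math
import statistics

# Hypothetical observations for one group (illustrative values only).
observations = [4.0, 5.0, 6.0, 7.0, 8.0]

n = len(observations)
mean = statistics.mean(observations)   # height of the bar
sd = statistics.stdev(observations)    # sample standard deviation
sem = sd / math.sqrt(n)                # standard error of the mean (assumed error-bar type)

print(f"bar height (mean): {mean:.2f}")
print(f"error bar (mean +/- SEM): {mean - sem:.2f} to {mean + sem:.2f}")
```

Five observations collapse into two numbers here, which is the compression the article goes on to question.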
But for all they show, bar charts hide several pieces of information that scientists need. How many observations were made? What is the range of values observed? What is the overlap between the observations in different groups? This information is essential to help scientists determine how important the difference between groups might be.
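The three questions above can be answered directly from the raw observations. The following is a minimal sketch, not the team's method: the group names, the values, and the helper functions `describe` and `range_overlap` are all hypothetical.

```python
def describe(group):
    """Report what a bar chart leaves out: sample size and observed range."""
    return {"n": len(group), "min": min(group), "max": max(group)}


def range_overlap(a, b):
    """Return the (low, high) interval shared by the ranges of a and b, or None."""
    low = max(min(a), min(b))
    high = min(max(a), max(b))
    return (low, high) if low <= high else None


# Hypothetical observations for two groups (illustrative values only).
control = [3.1, 4.2, 5.0, 5.8, 6.9]
treated = [5.5, 6.1, 7.4, 8.0]

print("control:", describe(control))
print("treated:", describe(treated))
print("shared range:", range_overlap(control, treated))
```

Comparing ranges is a crude measure of overlap; plotting every point, as a dot plot does, shows the same information without reducing it to two endpoints.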
Hidden Data, Clear Medical Consequences
Here’s one example: preeclampsia. This high blood pressure disorder affects women during pregnancy and can cause both maternal and fetal death.
“One of the things we’re concerned with in preeclampsia,” Dr. Weissgerber says, “is that we think that while all women end up with a similar set of symptoms, they get there in different ways. You can’t understand those types of differences if you are only looking at a bar graph showing an average response. You really need to be able to see individual data from each woman.”
In their paper, Dr. Weissgerber and her co-authors show how the same bar graph (A in the image below) can represent very different data sets, with real consequences for interpretation. The bar graph labeled A in the figure could summarize any of the data sets plotted in B, C, D, and E. But B through E each invite a different interpretation once you can see the layout of the data in what are called dot plots:
- The second group has higher values than the first group (symmetric)
- One subject has an unusually high value (outlier)
- Subjects seem to cluster into subgroups with higher and lower values. For example, this might occur if each group includes men and women, and men have lower values than women (bimodal)
- The smaller range of values in the second group might be because there are fewer data points (unequal n)
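The point of the list above can be reproduced with a few invented numbers. In this sketch, four hypothetical groups share the same mean, so each would produce a bar of the same height, yet the sorted values tell different stories. Only the mean is matched here; the values are not taken from the paper's figure.

```python
import statistics

# Hypothetical groups, each constructed to have a mean of exactly 10.
groups = {
    "symmetric": [8, 9, 10, 11, 12],
    "outlier": [7, 7, 8, 8, 20],          # one unusually high value
    "bimodal": [5, 5, 6, 14, 15, 15],     # two clusters, low and high
    "small n": [9, 11],                   # fewer data points, narrower range
}

for label, values in groups.items():
    mean = statistics.mean(values)
    print(f"{label:>9}: mean={mean}  n={len(values)}  values={sorted(values)}")
```

A bar chart would draw four bars of equal height for these groups; listing or plotting the individual values is what exposes the outlier, the two clusters, and the small sample.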
“All the bars do is focus your attention on whether the means look different,” says Dr. Weissgerber. “You’re not really thinking about the sample sizes and the distributions; are there outliers, and all those types of questions.”
But those types of questions are what allow scientists to evaluate data.
“The scientific process is really built around us being able to check each other’s work, build on each other’s work and move it a step forward,” says Dr. Weissgerber. “And it’s hard to do that if you can’t see the data behind a bar graph or line graph.”
Reproducing Results, Clarifying the Data
So to make transparency a little easier, the team developed templates to help scientists switch from bar charts to dot plots. And with the help of Savic, the team created a free software tool allowing scientists (even those without any programming experience) to interact with data sets using another common visualization, the line graph. Using this tool, scientists (or readers and reviewers) can adjust variables to potentially gain deeper insight into the observations.
This is one of several attempts by scientists to steer data visualization back on course. Some have turned to Kickstarter campaigns and Twitter to get their “#barbarplots” message out. In a post on presenting cancer survival rates, Edward Tufte, Ph.D., statistician, visualizer, and artist, laments the “low data-density” of the charts. And that’s the nicest thing he has to say.
But this new interactive tool is a proof of concept, designed to illustrate the idea that interactive graphs can improve transparency and address the unmet statistical needs of scientists, especially those just getting started in the discipline.
“When you see the data points, you really start to think about the assumptions underlying the statistical tests,” says Dr. Weissgerber.
And for scientists, that transparency is what will facilitate interactive discussion, scientific advances, and innovations in the practice of health care. The Mayo Clinic team and their colleagues hope that in time data transparency will, in essence, let those data stars shine.
Publications by this team:
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm, PLOS Biology Perspective
Reinventing Biostatistics Education for Basic Scientists, PLOS Biology Perspective
From Static to Interactive: Transforming Data Visualization to Improve Transparency, PLOS Biology Community Page
Editorial: Transparent Reporting for Reproducible Science, Journal of Neuroscience Research
– Sara Tiner, August 30, 2016