Plotting data

TIMOTHY JONES
FEBRUARY 2ND, 2020 AT 12:36 AM

Tim studied electrical engineering as an undergraduate and a graduate student at the University of Pennsylvania. He enjoyed writing reports in his introductory lab courses and began to obsess over the quality of his plots, teaching himself to use various software tools to make clear and effective visualizations. He started as a postdoctoral researcher in the field of synthetic biology at Boston University in 2017.

In science, communication is inseparable from the process of investigation. We depend on the unique observations of our peers to point out things we may have missed on our own. How do we effectively present the data we have collected in our experiments to take advantage of our peers’ insight, to justify our research to skeptics, or to dazzle a broader audience with our most impactful results? We’ve all heard that a picture is worth a thousand words, and the famous computer scientist Edsger Dijkstra has even suggested that an equation is worth a thousand pictures, but we still usually rely on visuals as the most efficient means to convey a message. In fact, we often assume that most of the prose or other information accompanying our figures may be completely ignored! The many important details in our visuals can make or break our audience’s conclusions about our research. In this article, we will review some guidelines for constructing effective plots. A resources section at the end provides links to useful software tools and more detailed discussion of data visualization.


Tangible tips


  • Consider the type of plot. What sort of plot captures the underlying relationships in the data? Pie charts, bar charts, and radar charts are suited to comparing categories; scatter plots can show the relationship between two different quantities; histograms represent distributions of data. The message you are trying to convey will determine the type of visualization.


  • Consider the audience. The anticipated consumer of your visualization will inform you both on what data to present and how to display it. With highly technical audiences and members of your field, it may be appropriate to present more complicated plots with a lot of detail, whereas with broader audiences you may want to keep it simple.


  • Consider the message. Spending time making beautiful plots can be fun, but the message is more important. Balance your time and effort accordingly.


  • Include all the relevant data in your presentation or report. Include controls, the original data alongside any transformed version you show, etc. Anticipate questions from the audience, whether they are reading a report in isolation or listening to a live presentation, and include additional visualizations that answer those questions.


  • Label the axes. Plotting tools in the resources section have facilities to manipulate axes to your liking. Always label your axes with the name of the quantity that they measure. Include the units of the measurement in brackets or parentheses. If the measurement was not calibrated to any particular unit, you can write “arbitrary units” or “a.u.”


  • Add a legend, if needed. If you have plotted multiple curves with different colors, include a legend specifying what each data set represents.


  • Be consistent. It is good practice to keep to the naming and labeling conventions you have adopted whenever possible. For example, if you label one axis of a plot with units of “Ohms,” label every other axis that uses those units as “Ohms” as well, rather than switching to the symbol for the Ohm, “Ω.” While the same information is conveyed in this case, inconsistency can be distracting. You can also be consistent in how you style your plots. If you associate a particular experiment, data set, physical system, etc. with a particular color in one plot, it may be a good idea to use that color to refer to that thing in all of your other plots. Keep in mind that even minor inconsistencies like a sudden change in font or line thickness can distract the audience from your message.


  • Title your plots. Always provide a title for your plots. If the plot appears alone in a slide of a presentation, you may title the slide instead. If the plot appears in a report with a lot of text, you may provide what you would have otherwise written as a title as the first sentence in a figure caption.


  • Use insets wisely. Insets can be very useful to inform the reader or audience of a related behavior of the system under study, but use insets carefully, don’t unnecessarily crowd the plot, and be mindful of the effects of shrinking a plot to serve as an inset.


  • Select colors carefully. Be cognizant of any color restrictions of the medium in which the plot will appear. Will the plot be printed in color? It may be helpful to make color choices that are grayscale-friendly, meaning that if a reader were to print your plot in grayscale, they would still be able to differentiate between differently colored samples based on their shade of gray. Are your colors clear and visually differentiable? This last question can be a subject of lengthy discussion. See, for example, articles linked in the Resources section as to why using rainbow or other attractive color maps may be a bad idea, and for information on how to make sure your plots are intelligible to readers or audience members with color blindness.
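
The grayscale question in the last tip can be checked roughly before printing anything. The sketch below is one possible approach, not a definitive test: it assumes colors are given as “#rrggbb” hex strings and uses the Rec. 601 luma weights as the gray conversion, while printers and viewers may convert differently. The function names and the example palette are illustrative and do not come from any of the tools listed in this article.

```python
from itertools import combinations

# Rec. 601 luma weights, one common convention for converting RGB to gray.
# Assumption: your printer or viewer may use a different conversion.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def hex_to_rgb(color):
    """Convert a '#rrggbb' string to three floats in the range [0, 1]."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected '#rrggbb', got {color!r}")
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


def gray_level(color):
    """Approximate shade of gray (0 = black, 1 = white) for a hex color."""
    return sum(w * c for w, c in zip(LUMA_WEIGHTS, hex_to_rgb(color)))


def closest_gray_pair(colors):
    """Return (gap, color_a, color_b) for the pair hardest to tell apart in gray."""
    return min(
        (abs(gray_level(a) - gray_level(b)), a, b)
        for a, b in combinations(colors, 2)
    )


if __name__ == "__main__":
    # Hypothetical palette, chosen only to demonstrate the check.
    palette = ["#1b9e77", "#d95f02", "#7570b3"]
    for color in palette:
        print(f"{color}: gray level {gray_level(color):.2f}")
    gap, a, b = closest_gray_pair(palette)
    print(f"closest pair in grayscale: {a} and {b} (gap {gap:.2f})")
```

A small gap suggests that the two colors may print as nearly the same shade of gray; in that case, varying line style or marker shape as well as color is a simple fallback. This check says nothing about color blindness, which needs separate consideration.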



Useful Resources


Software tools for data visualization and data processing

  • Microsoft Excel – The classic spreadsheet software, sufficient for uncomplicated data manipulation and storage, basic plots, and curve-fitting. Google Sheets offers many of the same features.


  • Origin – Powerful plotting software with a graphical user interface, used by many without a penchant for programming. Your lab may have a license. https://www.originlab.com/


  • MATLAB – A widely used numerical computing environment with built-in plotting capabilities. As with most of the tools below, MATLAB allows a user to generate plots in batch very easily and to regenerate them after tweaking, a workflow that is cumbersome in GUI-based tools. https://www.mathworks.com/products/matlab.html




  • matplotlib – Python has emerged as a competitor to MATLAB, offering all the features of a general-purpose programming language and a very large community. Matplotlib is the most commonly used and supported module in Python for data visualization. https://matplotlib.org/



  • pgfplots – If you are a LaTeX power user, give this package a try. It can be used in conjunction with TikZ to achieve very precisely specified generative plots. https://ctan.org/pkg/pgfplots
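
To illustrate the scripted, batch workflow mentioned above, here is a minimal sketch using matplotlib. It routes every figure through one function so that axis labels with units, the legend, the title, and the color assigned to each data set stay consistent, and so that all figures can be regenerated after a tweak. The function name, the style table, the output folder, and the data are all assumptions made for illustration; the curves are synthetic placeholders, not measurements.

```python
import math
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render straight to files, no window needed
import matplotlib.pyplot as plt

# One shared style table keeps each data set's color and line style
# the same in every figure (hypothetical data set names).
STYLE = {
    "control": {"color": "black", "linestyle": "--"},
    "sample": {"color": "tab:blue", "linestyle": "-"},
}


def make_plot(title, series, xlabel, ylabel, out_path):
    """Draw every (x, y) pair in `series` on one set of axes and save it."""
    fig, ax = plt.subplots(figsize=(4, 3))
    for name, (x, y) in series.items():
        ax.plot(x, y, label=name, **STYLE[name])
    ax.set_xlabel(xlabel)  # quantity name with units in parentheses
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path, dpi=200)
    plt.close(fig)  # free memory when generating many figures
    return out_path


if __name__ == "__main__":
    out_dir = Path("figures")  # assumed output folder
    out_dir.mkdir(exist_ok=True)
    t = [0.1 * i for i in range(100)]  # time axis in ms
    # Synthetic placeholder curves: exponential decays with made-up
    # time constants, used only to have something to draw.
    for tau in (1.0, 2.0, 5.0):
        series = {
            "control": (t, [math.exp(-ti / 3.0) for ti in t]),
            "sample": (t, [math.exp(-ti / tau) for ti in t]),
        }
        make_plot(
            f"Decay with time constant {tau} ms",
            series,
            "Time (ms)",
            "Voltage (V)",
            out_dir / f"decay_tau_{tau}.png",
        )
```

Changing a font size, a color, or a label in `make_plot` or `STYLE` and rerunning the script updates every figure at once, which is the main advantage over adjusting each plot by hand in a GUI.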



General purpose drawing tools

  • Adobe Illustrator – Illustrator is well known among digital artists and designers, but is often used in the scientific community to make figures. Illustrator may be used to post-process or add illustrations to plots made in other software. http://www.adobe.com/Illustrator


  • Inkscape – A vector-graphics editor, similar to Illustrator, available for free under an open-source license. https://inkscape.org/


Color selection



Books

  • The Visual Display of Quantitative Information – Edward Tufte
  • Visualize This: The FlowingData Guide to Design, Visualization, and Statistics – Nathan Yau


Other articles and resources
