By the end of this you will have had a whirlwind tour of the very tip of the data-visualization best-practices iceberg. We will cover a broad range of topics generally applicable to data science use cases without diving too deep into any single one. One thing to keep in mind throughout: none of this is set in stone. In the real world you will often have to bend or break some of these rules to do what you want.
“The best camera is the one that’s with you.” –Chase Jarvis.
A very common question for people starting out with R and visualization is "which library should I use?" Like most things, there is no right answer; every situation is different. These are a few points to keep in mind when deciding which tool to use (hint: it really doesn't matter).
Jeff Leek has a fantastic article on his blog about this issue.
Ultimately it comes down to what you know. You can do an absolutely amazing amount in most tools (even excel) so do what you like best.
For most people in R the choice is ggplot2 vs. base graphics. I mostly use ggplot2 because it is what I am most familiar with, and it has nice defaults (more on this later).
Whatever you choose will, in the not too distant future, be old and replaced by the next best thing, so understanding the underlying concepts is a much better investment of your time. The rest of this section tries to reinforce good concepts.
A lot of data visualization is common sense, but some of it isn’t. These are a few of the examples of charts made that are not the best fit for the data that I frequently see.
Okay, let's get the elephant out of the room first. The pie chart elicits from a data-viz person much the same response that a computer scientist's prediction algorithm elicits from a statistician: initial cries of blasphemy, but sometimes, upon closer inspection, grudging respect.
# a simple pie chart
data <- data.frame(
  val  = c(8, 6, 9, 4, 2, 3.5),
  labs = c("a", "b", "c", "d", "e", "f")
)
pie(data$val, labels = data$labs)
So why all the ire?
Humans have a very hard time judging angles, and an angle is exactly how a pie chart encodes each value. From the code and chart above we know that d and f are 0.5 apart (f is only 87.5% of the value of d), but at first glance the average viewer would probably say they are the same.
We could instead use something called a treemap:
library(treemap)
treemap(data, index = "labs", vSize = "val")
This is similar in spirit to a pie chart, but encodes values in physical area rather than angle.
I would argue this particular treemap is actually worse than the pie chart, but treemaps are certainly a good option for some types of data. If you have a large number of values or hierarchically clustered data, treemaps can be an excellent tool for taking in a lot of data quickly.
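To illustrate the hierarchical case, here is a sketch with made-up grouped data (the group column and its values are invented for illustration; the treemap function nests rectangles by each level listed in index):

library(treemap)

# hypothetical hierarchical data: six leaf values nested under two groups
hier <- data.frame(
  group = c("x", "x", "x", "y", "y", "y"),
  labs  = c("a", "b", "c", "d", "e", "f"),
  val   = c(8, 6, 9, 4, 2, 3.5)
)

# index lists the hierarchy from outermost to innermost level
treemap(hier, index = c("group", "labs"), vSize = "val")

Each group becomes an outer rectangle subdivided by its leaf values, which is where treemaps start to beat pie charts.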
Even simpler, you could use a stacked bar chart.
library(ggplot2)
ggplot(data, aes(x = 1, y = val, fill = labs)) +
  geom_col(width = 0.2)
The same concept as the treemap: value is encoded in area rather than angle. This works well for a small number of comparisons with a logical ordering, or as a supplementary figure in a larger visualization.
Out of all of these options, a plain bar chart is probably the clearest.
ggplot(data, aes(y = val, x = labs)) +
geom_col() +
  labs(x = "")
Using a bar chart we can clearly see that f is smaller than d.
There was a paper a few years ago by two superstars of data visualization, Jeffrey Heer and Mike Bostock. They took a set of visual encodings of the same data (much like we are doing here), showed them to people, and asked questions about what the data said. They then recorded the results and plotted the differences in accuracy between encodings.
Pie charts score pretty far down the list, but then again so do treemaps. If you repeated the experiment with a different dataset, I am betting you would get different results depending on which chart type fits the data best. This raises an important question:
Is all the hate warranted for pie charts?
Penn postdoc Randal Olson has a good blog post on pie charts. It is highly recommended reading, but to paraphrase his rules on pie charts:
People intuitively get pie charts so don’t rule out their use entirely, but make sure you are using them properly.
Bar charts are fantastic tools. More often than not they are the best visualization for the job, out-competing more complicated, flashy visualizations in terms of ease of reading and comprehension. There are some instances where they are not appropriate, however.
As a general rule of thumb, if the measure is a quantity of something then a bar chart makes sense: number of infections, a person's weight, etc. A heuristic I like when deciding whether to use a bar chart is: "could I redraw the chart such that the bars are made up of individual instances of whatever the y-axis is encoding?"
Let's look at a group of students and their percentiles for vitamin D levels in their blood.
First we plot with a bar plot.
data <- data.frame(
  student    = c("Tina", "Trish", "Kevin", "Rebecca", "Sarah"),
  percentile = c(25, 95, 54, 70, 99)  # percentile of vitamin D levels
)
p <- ggplot(data, aes(x = student, y = percentile))
p + geom_col()
The ordering of the data is clearly visible, but the intuitive interpretation of the bars is slightly confusing. A percentile is not a sum of values, simply a position on a continuous scale. In addition, we tend to read long bars as good and short bars as bad, when in this case the middle would be best.
Let’s re-visualize the data as a dot-plot.
p +
geom_point(color = "steelblue", size = 4) +
theme_minimal() + # helps make the grid lines look more like guides
coord_flip()
This is more legible and intuitive: the measure is simply the point where each student falls, not an accumulation of percentiles.
There are exceptions to this rule. For instance, a single person's weight over time might be best shown on a line chart. Like almost everything in visualization, it is important to think carefully about what your data are before plotting them.
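As a sketch of that exception, here is a line chart of made-up weight measurements (the months and values are invented for illustration):

library(ggplot2)

# hypothetical monthly weight measurements for one person
weights <- data.frame(
  month = 1:6,
  kg    = c(82, 81.5, 80.8, 80.1, 79.6, 79.9)
)

# a line emphasizes the trend over time better than separate bars would
ggplot(weights, aes(x = month, y = kg)) +
  geom_line() +
  geom_point()

Here the quantity is a repeated measurement of one thing, so connecting the points tells the story.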
What if the thing we’re interested in is showing multiple observations of a given value? For instance, say we were looking at the expression of a protein of interest across different conditions. Like good scientists we took multiple measurements of each condition so we should represent that.
I will forgive you for wanting to use these plots purely because they have the coolest name. However: don't.
To illustrate why let’s look at the dynamite plot of our hypothetical expression experiment…
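The expressions and expression_summarized data frames are never defined in the text, so here is one hypothetical construction (the condition names, measurement values, and column names are all invented for illustration) that makes the following snippets runnable:

# hypothetical raw data: three replicate measurements per condition,
# with similar means but very different spreads
expressions <- data.frame(
  sample     = rep(c("cond_a", "cond_b", "cond_c", "cond_d"), each = 3),
  expression = c(10, 11, 12,   9, 14, 10,   2, 11, 20,   8, 11, 14)
)

# per-condition mean and standard deviation for the summary plots
avg <- tapply(expressions$expression, expressions$sample, mean)
dev <- tapply(expressions$expression, expressions$sample, sd)
expression_summarized <- data.frame(
  sample  = names(avg),
  average = as.numeric(avg),
  sd      = as.numeric(dev)
)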
p <- ggplot(expression_summarized, aes(x = sample, y = average)) +
  # a zero-height errorbar draws only the horizontal cap at average + sd
  geom_errorbar(aes(ymin = average + sd, ymax = average + sd)) +
  # the vertical whisker from the bar top up to the cap
  geom_linerange(aes(ymin = average, ymax = average + sd))
p + geom_col()
These all look pretty similar! Must be nothing really interesting going on. Or maybe the first two conditions have a higher peak? Do the bars represent variability? And does the top of the error bar represent the bottom of an interval, or the middle? I can never remember (because it changes from plot to plot).
Let's actually look at the data that went into these plots. Usually with plots like these the number of data points is tiny, so we may as well just show them.
p +
geom_col(alpha = 0.5) +
geom_jitter(data = expressions, aes(x = sample, y = expression), width = 0.1)
Oh, oh my, those aren’t the same at all. Let’s just clean this up a tiny bit.
ggplot(expression_summarized,aes(x = sample, y = average)) +
geom_pointrange(aes(ymin = average - sd, ymax = average + sd),
shape = 1, color = 'steelblue') +
geom_jitter(data = expressions, aes(x = sample, y = expression),
width = 0.1, alpha = 0.5)
If you want more evidence to push back against your PI with than the word of some random biostats grad student, the blog post "Dynamite plots must die" by Rafael Irizarry, chair of biostatistics at Harvard, has much more in-depth coverage of why dynamite plots are bad.
Box plots are, like the pie chart, one of the first visualization techniques we are taught. However, they are not always a good choice, and many better options have since arisen.
The problem with box plots, much like dynamite plots, is that they obscure trends at a resolution finer than the quartiles. Take for instance the following two box plots:
#Hiding the data input on purpose...
p <- ggplot(data, aes(dataset, val))
p +
geom_boxplot(fill = "steelblue", color = "grey") +
labs(title = "Box Plots")
Given the information that a standard box plot provides, we would say these groups are identical.
What happens if we try another way of visualizing the distribution of data?
First let’s try a(nother form of the) dot plot:
p +
geom_dotplot(binaxis = "y", stackdir = "center",
fill = "steelblue", color = "steelblue", dotsize = 0.7) +
labs(title = "Dot Plots")
Now we can see that these data are very differently distributed.
Another way to visualize the distribution of the groups is a violin plot, essentially a kernel-density version of the dot plot. It is useful when the data are so large that a dot plot becomes cluttered by the sheer number of dots drawn. However, if your data are small enough that you can actually visualize each point, do that.
p +
geom_violin(adjust = .5, fill = "steelblue", color = "steelblue") +
labs(title = "Violin Plots")
If you still want the familiarity of the box plot combined with the enhanced ability to see the underlying distribution you can combine the two plots as well.
p +
geom_dotplot(binaxis = "y", stackdir = "center",
fill = "steelblue", color = "steelblue") +
  geom_boxplot(alpha = 0, size = 1)