multiple - r plot two histograms side by side
How to plot two histograms together in R?
    library(plotly)

    # add username and key
    p <- plotly(username="Username", key="API_KEY")

    # generate data
    x0 = rnorm(500)
    x1 = rnorm(500)+1

    # arrange your graph
    data0 = list(x=x0, name = "Carrots", type='histogramx', opacity = 0.8)
    data1 = list(x=x1, name = "Cukes", type='histogramx', opacity = 0.8)

    # specify type as 'overlay'
    layout <- list(barmode='overlay', plot_bgcolor = 'rgba(249,249,251,.85)')

    # format response, and use 'browseURL' to open graph tab in your browser.
    response = p$plotly(data0, data1, kwargs=list(layout=layout))
    url = response$url
    filename = response$filename
    browseURL(response$url)
Full disclosure: I'm on the team.
I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column which lists the length of all measured carrots (total: 100k carrots) and cucumbers (total: 50k cucumbers).
I wish to plot two histograms, carrot lengths and cucumber lengths, on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies, not absolute numbers, since the number of instances in each group is different.
Something like this would be nice, but I don't understand how to create it from my two tables:
@Dirk Eddelbuettel: The basic idea is excellent but the code as shown can be improved. [Takes long to explain, hence a separate answer and not a comment.]
The hist() function draws a plot by default, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area with a plot(0,0,type="n",...) call, in which you can add the axis labels, plot title, etc. Finally, I would like to mention that one could also use shading to distinguish between the two histograms. Here is the code:
    set.seed(42)
    p1 <- hist(rnorm(500,4),plot=FALSE)
    p2 <- hist(rnorm(500,6),plot=FALSE)
    plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
    plot(p1,col="green",density=10,angle=135,add=TRUE)
    plot(p2,col="blue",density=10,angle=45,add=TRUE)
And here is the result (a bit too wide because of RStudio :-) ):
Here is an even simpler solution using base graphics and alpha-blending (which does not work on all graphics devices):
    set.seed(42)
    p1 <- hist(rnorm(500,4))                                 # centered at 4
    p2 <- hist(rnorm(500,6))                                 # centered at 6
    plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10))              # first histogram
    plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=TRUE)    # second
The key is that the colours are semi-transparent.
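To make the semi-transparent colours concrete, here is a small sketch (not part of the original answer) of how the fourth argument of rgb() encodes alpha, together with a check of whether a graphics device reports support for semi-transparency. The variable names are illustrative, and the pdf(NULL) device is only an assumption chosen so that no file is written.

```r
# Fourth argument of rgb() is alpha: 0 = fully transparent, 1 = fully opaque
blue_quarter <- rgb(0, 0, 1, 1/4)
red_quarter  <- rgb(1, 0, 0, 1/4)

# adjustcolor() adds the same kind of transparency to a named colour
blue_named <- adjustcolor("blue", alpha.f = 1/4)

# The alpha channel shows up as the last two hex digits of the colour string
print(c(blue_quarter, red_quarter, blue_named))

# Ask a device whether it supports semi-transparency (TRUE, FALSE, or NA if unknown).
# pdf(NULL) opens a PDF device without creating a file (assumption for this sketch).
pdf(NULL)
supports_alpha <- dev.capabilities("semiTransparency")$semiTransparency
invisible(dev.off())
print(supports_alpha)
```

If the device reports FALSE, the shaded-line approach with density and angle is a reasonable fallback.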
Edit, more than two years later: As this just got an upvote, I figure I may as well add a visual of what the code produces as alpha-blending is so darn useful:
Here is an example of how you can do it in "classic" R graphics:
    ## generate some random data
    carrotLengths <- rnorm(1000,15,5)
    cucumberLengths <- rnorm(200,20,7)

    ## calculate the histograms - don't plot yet
    histCarrot <- hist(carrotLengths,plot = FALSE)
    histCucumber <- hist(cucumberLengths,plot = FALSE)

    ## calculate the range of the graph
    xlim <- range(histCucumber$breaks,histCarrot$breaks)
    ylim <- range(0,histCucumber$density, histCarrot$density)

    ## plot the first graph
    plot(histCarrot,xlim = xlim, ylim = ylim,
         col = rgb(1,0,0,0.4),xlab = 'Lengths',
         freq = FALSE, ## relative, not absolute frequency
         main = 'Distribution of carrots and cucumbers')

    ## plot the second graph on top of this
    opar <- par(new = FALSE)
    plot(histCucumber,xlim = xlim, ylim = ylim,
         xaxt = 'n', yaxt = 'n', ## don't add axes
         col = rgb(0,0,1,0.4), add = TRUE,
         freq = FALSE) ## relative, not absolute frequency

    ## add a legend in the corner
    legend('topleft',c('Carrots','Cucumbers'),
           fill = rgb(1:0,0,0:1,0.4), bty = 'n',
           border = NA)
    par(opar)
The only issue with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually, through the arguments passed when the histograms are computed.
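One way to align the breaks, sketched here under the assumption that a single common binning suits both samples, is to compute one vector of breaks from the pooled data and pass it to both hist() calls through the breaks argument. The names allLengths and sharedBreaks and the target of roughly 30 bins are illustrative choices, not part of the original answer.

```r
set.seed(42)
carrotLengths   <- rnorm(1000, 15, 5)
cucumberLengths <- rnorm(200, 20, 7)

# One common set of breaks covering both samples (about 30 bins, arbitrary)
allLengths   <- c(carrotLengths, cucumberLengths)
sharedBreaks <- pretty(range(allLengths), n = 30)

# Compute both histograms on the same breaks, without plotting yet
histCarrot   <- hist(carrotLengths,   breaks = sharedBreaks, plot = FALSE)
histCucumber <- hist(cucumberLengths, breaks = sharedBreaks, plot = FALSE)

# y-range that fits the taller of the two density histograms
ylim <- range(0, histCarrot$density, histCucumber$density)

# Draw relative (density) histograms with semi-transparent fills
plot(histCarrot, freq = FALSE, ylim = ylim, col = rgb(1, 0, 0, 0.4),
     xlab = "Lengths", main = "Carrots and cucumbers, aligned breaks")
plot(histCucumber, freq = FALSE, col = rgb(0, 0, 1, 0.4), add = TRUE)
```

Because both histograms share the same bin edges, the bars overlap exactly instead of being offset against each other.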
Here's a version like the ggplot2 one I gave, only in base R. I copied some from @nullglob.
    ## generate the data
    carrots <- rnorm(100000,5,2)
    cukes <- rnorm(50000,7,2.5)
You don't need to put it into a data frame like with ggplot2. The drawback of this method is that you have to write out a lot more of the details of the plot. The advantage is that you have control over more details of the plot.
    ## calculate the density - don't plot yet
    densCarrot <- density(carrots)
    densCuke <- density(cukes)

    ## calculate the range of the graph
    xlim <- range(densCuke$x,densCarrot$x)
    ylim <- range(0,densCuke$y, densCarrot$y)

    ## pick the colours
    carrotCol <- rgb(1,0,0,0.2)
    cukeCol <- rgb(0,0,1,0.2)

    ## plot the carrots and set up most of the plot parameters
    plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
         main = 'Distribution of carrots and cucumbers',
         panel.first = grid())

    ## put our density plots in
    polygon(densCarrot, density = -1, col = carrotCol)
    polygon(densCuke, density = -1, col = cukeCol)

    ## add a legend in the corner
    legend('topleft',c('Carrots','Cucumbers'),
           fill = c(carrotCol, cukeCol), bty = 'n',
           border = NA)
That image you linked to was for density curves, not histograms.
If you've been reading on ggplot then maybe the only thing you're missing is combining your two data frames into one long one.
So, let's start with something like what you have, two separate sets of data and combine them.
    carrots <- data.frame(length = rnorm(100000, 6, 2))
    cukes <- data.frame(length = rnorm(50000, 7, 2.5))

    # Now, combine your two dataframes into one.
    # First make a new column in each that will be a variable
    # to identify where they came from later.
    carrots$veg <- 'carrot'
    cukes$veg <- 'cuke'

    # and combine into your new data frame vegLengths
    vegLengths <- rbind(carrots, cukes)
After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.
    library(ggplot2)
    ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
Now, if you really did want histograms, the following will work. Note that you must change position from the default "stack" argument. You might miss that if you don't really have an idea of what your data should look like. A higher alpha looks better there. Also note that I made them density histograms. It's easy to remove the y = ..density.. to get it back to counts.
    ggplot(vegLengths, aes(length, fill = veg)) +
      geom_histogram(alpha = 0.5, aes(y = ..density..), position = 'identity')
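As a rough illustration of what the density scale means compared with counts, here is a base R sketch (an assumption-laden aside, not the ggplot2 code above) using a made-up sample. It shows that density is the relative frequency per bin divided by the bin width, so the bar areas sum to one regardless of sample size, which is why groups of different sizes become comparable.

```r
set.seed(42)
x <- rnorm(500, 6, 2)          # made-up sample, for illustration only
h <- hist(x, plot = FALSE)     # compute the histogram without drawing it

binWidths <- diff(h$breaks)
relFreq   <- h$counts / sum(h$counts)   # share of observations in each bin

# Density is relative frequency divided by bin width
sameScale <- isTRUE(all.equal(h$density, relFreq / binWidths))
print(sameScale)

# Total bar area on the density scale
totalArea <- sum(h$density * binWidths)
print(totalArea)
```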