SPSS Statistics

SPSS Statistics

Your hub for statistical analysis, data management, and data documentation. Connect, learn, and share with your peers! 

 View Only

Avoid Dynamite Plots! Visualizing dot plots with super-imposed confidence intervals in SPSS and R

By Archive User posted Mon February 20, 2012 01:51 PM

  
Over at the stats.se site I have come across a few questions demonstrating the power of utilizing dot plots to visualize experimental results.

Also some interesting discussion on what error bars to plot in similar experiments is in this question, Follow up: In a mixed within-between ANOVA plot estimated SEs or actual SEs?

Here I will give two examples utilizing SPSS and R to produce similar plots. I haven't annotated the code that much, but if you need anything clarified on what the code is doing let me know in the comments. The data is taken from this question on the stats site.




Citations of Interest to the Topic







SPSS Code to generate below dot plot





 

*******************************************************************************************. data list free /NegVPosA NegVNtA    PosVNegA    PosVNtA NtVNegA NtVPosA.
begin data
0.5 0.5 -0.4 0.8 -0.45 -0.3
0.25 0.7 -0.05 -0.35 0.7 0.75
0.8 0.75 0.65 0.9 -0.15 0
0.8 0.9 -0.95 -0.05 -0.1 -0.05
0.9 1 -0.15 -0.35 0.1 -0.85
0.8 0.8 0.35 0.75 -0.05 -0.2
0.95 0.25 -0.55 -0.3 0.15 0.3
1 1 0.3 0.65 -0.25 0.35
0.65 1 -0.4 0.25 0.3 -0.8
-0.15 0.05 -0.75 -0.15 -0.45 -0.1
0.3 0.6 -0.7 -0.2 -0.5 -0.8
0.85 0.45 0.2 -0.05 -0.45 -0.5
0.35 0.2 -0.6 -0.05 -0.3 -0.35
0.95 0.95 -0.4 0.55 -0.1 0.8
0.75 0.3 -0.05 -0.25 0.45 -0.45
1 0.9 0 0.5 -0.4 0.2
0.9 0.25 -0.25 0.15 -0.65 -0.7
0.7 0.6 -0.15 0.05 0 -0.3
0.8 0.15 -0.4 0.6 -0.05 -0.55
0.2 -0.05 -0.5 0.05 -0.5 0.3
end data.
dataset name dynamite.

*reshaping the data wide to long, to use conditions as factors in the plot.

varstocases
/make condition_score from NegVPosA to NtVPosA
/INDEX = condition (condition_score).

*dot plot, used dodge symmetric instead of jitter.
GGRAPH
/GRAPHDATASET dataset = dynamite NAME="graphdataset" VARIABLES=condition condition_score MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: condition=col(source(s), name("condition"), unit.category())
DATA: condition_score=col(source(s), name("condition_score"))
GUIDE: axis(dim(1), label("condition"))
GUIDE: axis(dim(2), label("condition_score"))
ELEMENT: point.dodge.symmetric(position(condition*condition_score))
END GPL.

*confidence interval plot.

*cant get gpl working (maybe it is because older version) - will capture std error of mean.

dataset declare mean.
OMS /IF LABELS = 'Report'
/DESTINATION FORMAT = SAV OUTFILE = 'mean'.
MEANS TABLES=condition_score BY condition
/CELLS MEAN SEMEAN.
OMSEND.

dataset activate mean.
compute mean_minus = mean - Std.ErrorofMean.
compute mean_plus = mean + Std.ErrorofMean.
execute.

select if Var1 "Total".
execute.

rename variables (Var1 = condition).

*Example just interval bars.
GGRAPH
/GRAPHDATASET dataset = mean NAME="graphdataset2" VARIABLES=condition mean_plus
mean_minus Mean[LEVEL=SCALE]
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s2=userSource(id("graphdataset2"))
DATA: condition=col(source(s2), name("condition"), unit.category())
DATA: mean_plus=col(source(s2), name("mean_plus"))
DATA: mean_minus=col(source(s2), name("mean_minus"))
DATA: Mean=col(source(s2), name("Mean"))
GUIDE: axis(dim(1), label("Var1"))
GUIDE: axis(dim(2), label("Mean Estimate and Std. Error of Mean"))
SCALE: linear(dim(2), include(0))
ELEMENT: interval(position(region.spread.range(condition*(mean_minus+mean_plus))),
shape(shape.ibeam))
ELEMENT: point(position(condition*Mean), shape(shape.square))
END GPL.

*now to put the two datasets together in one chart.
*note you need to put the dynamite source first, otherwise it treats it as a dataset with one observation!
*also needed to do some post-hoc editing to get the legend to look correct, what I did was put an empty text box over top of
*the legend items I did not need.

GGRAPH
/GRAPHDATASET dataset = mean NAME="graphdataset2" VARIABLES=condition mean_plus
mean_minus Mean[LEVEL=SCALE]
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHDATASET dataset = dynamite NAME="graphdataset" VARIABLES=condition condition_score MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: condition2=col(source(s), name("condition"), unit.category())
DATA: condition_score=col(source(s), name("condition_score"))
SOURCE: s2=userSource(id("graphdataset2"))
DATA: condition=col(source(s2), name("condition"), unit.category())
DATA: mean_plus=col(source(s2), name("mean_plus"))
DATA: mean_minus=col(source(s2), name("mean_minus"))
DATA: Mean=col(source(s2), name("Mean"))
GUIDE: axis(dim(1), label("Condition"))
GUIDE: axis(dim(2), label("Tendency Score"))
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), map(("Observation", color.grey), ("Mean", color.black), ("S.E. of Mean", color.black)))
SCALE: cat(aesthetic(aesthetic.color.exterior), map(("Observation", color.grey), ("Mean", color.black), ("S.E. of Mean", color.black)))
SCALE: cat(aesthetic(aesthetic.shape), map(("Observation", shape.circle), ("Mean", shape.square), ("S.E. of Mean", shape.ibeam)))
ELEMENT: point.dodge.symmetric(position(condition2*condition_score), shape("Observation"), color.interior("Observation"), color.exterior("Observation"))
ELEMENT: interval(position(region.spread.range(condition*(mean_minus+mean_plus))),
shape("S.E. of Mean"), color.interior("S.E. of Mean"), color.exterior("S.E. of Mean"))
ELEMENT: point(position(condition*Mean), shape("Mean"), color.interior("Mean"), color.exterior("Mean"))
END GPL.
*******************************************************************************************.

R code using ggplot2 to generate dot plot


 



library(ggplot2)
library(reshape)

#this is where I saved the associated dat file in the post
work <- "F:\Forum_Post_Stuff\dynamite_plot"
setwd(work)

#reading the dat file provided in question
score <- read.table(file = "exp2tend.dat",header = TRUE)

#reshaping so different conditions are factors
score_long <- melt(score)

#now making base dot plot
plot <- ggplot(data=score_long)+
layer(geom = 'point', position =position_dodge(width=0.2), mapping = aes(x = variable, y = value)) +
theme_bw()

#now making the error bar plot to superimpose, I'm too lazy to write my own function, stealing from webpage listed below
#very good webpage by the way, helpful tutorials in making ggplot2 graphs
#http://wiki.stdout.org/rcookbook/Graphs/Plotting%20means%20and%20error%20bars%20(ggplot2)/

##################################################################################
## Summarizes data.
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
## data: a data frame.
## measurevar: the name of a column that contains the variable to be summariezed
## groupvars: a vector containing names of columns that contain grouping variables
## na.rm: a boolean that indicates whether to ignore NA's
## conf.interval: the percent range of the confidence interval (default is 95%)
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {
require(plyr)

# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}

# This is does the summary; it's not easy to understand...
datac <- ddply(data, groupvars, .drop=.drop,
.fun= function(xx, col, na.rm) {
c( N = length2(xx[,col], na.rm=na.rm),
mean = mean (xx[,col], na.rm=na.rm),
sd = sd (xx[,col], na.rm=na.rm)
)
},
measurevar,
na.rm
)

# Rename the "mean" column
datac <- rename(datac, c("mean"=measurevar))

datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean

# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult

return(datac)
}
##################################################################################

summary_score <- summarySE(score_long,measurevar="value",groupvars="variable")

ggplot(data = summary_score) +
layer(geom = 'point', mapping = aes(x = variable, y = value)) +
layer(geom = 'errorbar', mapping = aes(x = variable, ymin=value-se,ymax=value+se))

#now I need to merge these two dataframes together and plot them over each other
#merging summary_score to score_long by variable

all <- merge(score_long,summary_score,by="variable")

#adding variables to data frame for mapping aesthetics in legend
all$observation <- "observation"
all$mean <- "mean"
all$se_mean <- "S.E. of mean"

#these define the mapping of categories to aesthetics
cols <- c("S.E. of mean" = "black")
shape <- c("observation" = 1)

plot <- ggplot(data=all) +
layer(geom = 'jitter', position=position_jitter(width=0.2, height = 0), mapping = aes(x = variable, y = value.x, shape = observation)) +
layer(geom = 'point', mapping = aes(x = variable, y = value.y, color = se_mean)) +
layer(geom = 'errorbar', mapping = aes(x = variable, ymin=value.y-se,ymax=value.y+se, color = se_mean)) +
scale_colour_manual(" ",values = cols) +
scale_shape_manual(" ",values = shape) +
ylab("[pVisual - pAuditory]") + xlab("Condition") + theme_bw()
plot
#I just saved this in GUI to png, saving with ggsave wasn't looking as nice

#changing width/height in ggsave seems very strange, maybe has to do with ymax not defined?
#ggsave(file = "Avoid_dynamite.png", width = 3, height = 2.5)
#adjusting size of plot within GUI works just fine





Feel free to let me know of any suggested improvements in the code. The reason I did code both in SPSS and R is that I was unable to generate a suitable legend in SPSS originally. I was able to figure out how to generate a legend in SPSS, but it still requires some post-hoc editing to eliminate the extra aesthetic categories. Although the chart is simple enough maybe a legend isn't needed anyway.







#datavisualization
#R
#SPSS
#SPSSStatistics
#stackexchange
#Visualization
7 comments
0 views

Permalink

Comments

Thu January 02, 2014 09:58 AM

This message was posted by a user wishing to remain anonymous
[…] size groups in SPSS (2,468 total views), Hacking the default SPSS chart template (2,237), and Avoid Dynamite Plots! Visualizing dot plots with super-imposed confidence intervals in SPSS and R (1,542) are some of my most popular posts. The Junk Charts Challenge post has a total of 1,804 […]

Fri August 02, 2013 03:44 PM

Thanks again, I couldn't find a solution with override, but your code works just as well!

Wed July 31, 2013 05:54 PM

There is probably a more elegant way, but if you map the aesthetics to different variables it will separate the legends.

#seperate legends
########################################
all$group12 <- all$group1
andy3 <- ggplot(all, aes(x=variable, y=value.x)) +
geom_dotplot(binwidth=.1, alpha=.6, position="dodge", binaxis="y", binstataxis="y",
stackdir="up", aes(fill=group1)) +
geom_errorbar(aes(ymin=value.y-se,ymax=value.y+se, width = 0.2, colour=group12), position = position_dodge(width = 0.9)) +
theme_bw() + facet_wrap(~group2, nrow = 2)
andy3
########################################

I wouldn't be surprised if there is a way to use `guide_legend(override.aes(` to make separate legends within the original code.

Tue July 30, 2013 12:25 PM

Thanks a lot for your help Andrew, that did the trick!

Do you think I have to specify any extra arguments for creating the legend? Because it will just plot the mean and errbar legend above the observation legend, other than in the layer option.

Sun July 28, 2013 08:06 AM

Hi lueromat and thanks for the comment. It is a bit busy, but this doesn't look too bad off the cuff to me.

########################################
summary_score <- summarySE(score_long,measurevar="value",groupvars=c("variable", "group1"))
all <- merge(score_long,summary_score,by=c("variable", "group1"))
andy2 <- ggplot(all, aes(x=variable, y=value.x, fill=group1, colour=group1)) +
geom_dotplot(binwidth=.1, alpha=.6, position="dodge", binaxis="y", binstataxis="y",
stackdir="centerwhole") +
geom_errorbar(aes(ymin=value.y-se,ymax=value.y+se, width = 0.2), position = position_dodge(width = 0.9)) +
theme_bw() + facet_wrap(~group2, nrow = 2)
andy2
########################################

I believe in current ggplot2 code the layer argument has been depreciated in exchange for using "geom_*" layers, so there shouldn't be anything you can do with layers that you can't do with adding geom's. Also IMO geom layers are more expressive than layers with geom arguments.

Also check out geom_crossbar, http://docs.ggplot2.org/current/geom_crossbar.html, it may work out to be more visible with the current plot.

Sat July 27, 2013 07:48 PM

Hi Andrew,

I stumpled upon this very helpful code of yours a couple of days ago and have been trying to apply it to a dataset of mine with multiple groups, unfortunately with no succes possibly due to my limited understanding of the ggplot2 package.

I've modified the code you provided to create this example:

score <- read.table(file = "exp2tend.dat",header = TRUE)
score_long <- melt(score)
group1 <- c("A","A","A","A","A","B","B","B","B","B")
group2 <- c("C","C","D")
score_long$group1<-rep(group1, times=12)
score_long$group2<-LETTERS[3:6]

plot <- ggplot(data=score_long) +
layer(geom = 'point', position =position_dodge(width=0.4),
mapping = aes(x = variable, y = value, shape=group1)) +
theme_bw() + facet_wrap(~group2, nrow = 2)
plot

This is almost as I would like to have it, except for the fact that my single observations for the variable groups are not dodging - this option is applied to the arguments from group1 but not for observations within.

Identical observations within a group argument will dodge when I use the geom_dotplot instead of the layer option:

ggplot(score_long, aes(x=variable, y=value, fill=group1)) +
geom_dotplot(binwidth=.1, alpha=.6, position="dodge", binaxis="y", binstataxis="y",
stackdir="centerwhole") + theme_bw() + facet_wrap(~group2, nrow = 2)

But this will cause me trouble later on when I create the errorbars and I find the layer option more elegant in means of recreating your example with the errorbars.

Do you see where my problem lies and possbliby have a solution for it?

Many Thanks!

Wed April 03, 2013 01:13 PM

This message was posted by a user wishing to remain anonymous
[...] confusing to make such an error interval across a spectrum of bar charts. You could always do dynamite graphs, with error bars protruding from each bar, but that does not allow one to assess the general trend [...]