A pluralistic framework for measuring, interpreting and stratifying heterogeneity in meta-analysis
A step-by-step illustration
Update
Last update June 2025
Preface
This step-by-step tutorial is a supplement to our paper targeted at quantifying and interpreting heterogeneity in meta-analyses.
Set-up
Our illustrations use R statistical software and existing R packages, which you will first need to download and install.
If you do not have it on your machine, first install R
(download). We recommend also
downloading RStudio
, a popular integrated development
environment for coding with R
, created by a company named
posit (download).
After installing R
, you must install several packages
that contain necessary functions for performing the analyses in this
tutorial. If the packages are archived in CRAN, use
install.packages()
to install them. For example, to install
the metafor
package (a common meta-analysis package), you
can execute install.packages("metafor")
in the console
(bottom left pane of R Studio
). To install packages that
are not on CRAN and archived in Github repositories, execute
devtools::install_github()
. For example, to install
orchaRd
(a meta-analysis visualization package) from Github
repository, execute
devtools::install_github("daniel1noble/orchaRd", force = TRUE)
.
The package list is as follows:
# install and load necessary library
::p_load(tidyverse,
pacman
here,
DT,
janitor,
ggpubr,
readxl,
metafor,
ggplot2,
ggsignif,
visdat,
cowplot,
patchwork,
pander,
rotl,
ape,
ggstance,
ggtree,
RColorBrewer,
wesanderson )
Custom function
We provide helper functions necessary for our illustrations. To use
them, you need to source
them. You also can paste the
source code into the console, and hit “Enter” to let R
“learn” these custom functions.
source(here("function","custom_function.R"))
Case study 1
Load data
Risely et al. (2018) quantitatively reviewed 85 observations extracted from 41 studies examining the relationship between infection status and intensity of infection and changes in body stores, refuelling rates, movement capacity, phenology and survival of migratory hosts in different taxa1.
#load data
<- read.csv(here("data","Risely_2018_data.csv"))
dat_Risely
#subset observations on infection status
<- subset(dat_Risely, infection.measure=="Infection status")
status head(status)
id rowid
1 99 153
6 39 49
7 39 50
8 39 51
13 2 6
14 81 135
authors
1 Altizer, SM; Hobson, KA; Davis, AK; De Roode, JC, Wassenaar, LI
6 Arizaga, J; Barba, E; Hernandez, MA
7 Arizaga, J; Barba, E; Hernandez, MA
8 Arizaga, J; Barba, E; Hernandez, MA
13 Asghar, M; Hasselquist, D; Hansson, B; Zehtindjiev, P; Westerdahl, H; Bensch, S
14 Bengtsson et al
title
1 Do Healthy Monarchs Migrate Farther? Tracking Natal Origins of Parasitized vs. Uninfected Monarch Butterflies Overwintering in Mexico
6 DO HAEMOSPORIDIANS AFFECT FUEL DEPOSITION RATE AND FUEL LOAD IN MIGRATORY BLACKCAPS SYLVIA ATRICAPILLA?
7 DO HAEMOSPORIDIANS AFFECT FUEL DEPOSITION RATE AND FUEL LOAD IN MIGRATORY BLACKCAPS SYLVIA ATRICAPILLA?
8 DO HAEMOSPORIDIANS AFFECT FUEL DEPOSITION RATE AND FUEL LOAD IN MIGRATORY BLACKCAPS SYLVIA ATRICAPILLA?
13 Hidden costs of infection: Chronic malaria accelerates telomere degradation and senescence in wild birds
14 Does influenza A virus infection affect movement behaviour during stopover in its wild reservoir host?
journal year
1 PLOS ONE 2015
6 ARDEOLA 2009
7 ARDEOLA 2009
8 ARDEOLA 2009
13 SCIENCE 2015
14 ROYAL SOCIETY OPEN SCIENCE 2016
abstract
1 Long-distance migration can lower parasite prevalence if strenuous journeys remove infected animals from wild populations. We examined wild monarch butterflies (Danaus plexippus) to investigate the potential costs of the protozoan燨phryocystis elektroscirrha爋n migratory success. We collected monarchs from two wintering sites in central Mexico to compare infection status with hydrogen isotope (?2H) measurements as an indicator of latitude of origin at the start of fall migration. On average, uninfected monarchs had lower\xa0?2H values than parasitized butterflies, indicating that uninfected butterflies originated from more northerly latitudes and travelled farther distances to reach Mexico. Within the infected class, monarchs with higher quantitative spore loads originated from more southerly latitudes, indicating that heavily infected monarchs originating from farther north are less likely to reach Mexico. We ruled out the alternative explanation that lower latitudes give rise to more infected monarchs prior to the onset of migration using citizen science data to examine regional differences in parasite prevalence during the summer breeding season. We also found a positive association between monarch wing area and estimated distance flown. Collectively, these results emphasize that seasonal migrations can help lower infection levels in wild animal populations. Our findings, combined with recent declines in the numbers of migratory monarchs wintering in Mexico and observations of sedentary (winter breeding) monarch populations in the southern U.S., suggest that shifts from migratory to sedentary behavior will likely lead to greater infection prevalence for North American monarchs.
6 Do haemosporidians affect fuel deposition rate and fuel load in migratory blackcaps, Sylvia atricapilla? Aims: Fuel deposition rate is one of the main parameters determining bird migration strategies. Accordingly, factors compromising fuel deposition rate, Such as parasite infections, might have relevant effects not only on migration, but also on other life history events that depend on migration Success, Such as breeding. We analysed the effect of haemosporidians on fuel load and fuel deposition rate in a population of migratory blackcaps (Sylvia atricapilla) during stopover in northern Iberia. Locality: Loza lagoon, northern Iberia (42 degrees 50' N, 01 degrees 43' W, 400 m a.s.l.). Methods: From blood samples of recaptured blackcaps we determined haemospiridian content by amplification of 479 bp of the parasite's cytochrome b gene. Independent on body size, of the sampled birds, 35% were infected by Haemoproteus-Plasmodium. Results: Mass deposition rate, fat score and body mass showed similar values in non-infected as in infected blackcaps. No differences in age or body size proportions were detected between infected and non-infected birds. Conclusions: These results may be explained by infected birds with high virulence (compromising fuel accumulation) being unable to migrate south from their breeding areas in central and northern Europe. In contrast, more resistant birds may be able to tolerate the parasitaemia and gain fuel normally. Another possibility is that only birds with a low intensity of parasite infection were captured (presumably birds able to overcome the parasitaemia successfully), since these birds may be able to accumulate fuel at a similar rate to non-infected birds. Furthermore, highly infected birds may have a lower likelihood of capture than non-infected birds, because birds lose mobility with a high intensity of infection and under these circumstances they are less likely to be captured.
7 Do haemosporidians affect fuel deposition rate and fuel load in migratory blackcaps, Sylvia atricapilla? Aims: Fuel deposition rate is one of the main parameters determining bird migration strategies. Accordingly, factors compromising fuel deposition rate, Such as parasite infections, might have relevant effects not only on migration, but also on other life history events that depend on migration Success, Such as breeding. We analysed the effect of haemosporidians on fuel load and fuel deposition rate in a population of migratory blackcaps (Sylvia atricapilla) during stopover in northern Iberia. Locality: Loza lagoon, northern Iberia (42 degrees 50' N, 01 degrees 43' W, 400 m a.s.l.). Methods: From blood samples of recaptured blackcaps we determined haemospiridian content by amplification of 479 bp of the parasite's cytochrome b gene. Independent on body size, of the sampled birds, 35% were infected by Haemoproteus-Plasmodium. Results: Mass deposition rate, fat score and body mass showed similar values in non-infected as in infected blackcaps. No differences in age or body size proportions were detected between infected and non-infected birds. Conclusions: These results may be explained by infected birds with high virulence (compromising fuel accumulation) being unable to migrate south from their breeding areas in central and northern Europe. In contrast, more resistant birds may be able to tolerate the parasitaemia and gain fuel normally. Another possibility is that only birds with a low intensity of parasite infection were captured (presumably birds able to overcome the parasitaemia successfully), since these birds may be able to accumulate fuel at a similar rate to non-infected birds. Furthermore, highly infected birds may have a lower likelihood of capture than non-infected birds, because birds lose mobility with a high intensity of infection and under these circumstances they are less likely to be captured.
8 Do haemosporidians affect fuel deposition rate and fuel load in migratory blackcaps, Sylvia atricapilla? Aims: Fuel deposition rate is one of the main parameters determining bird migration strategies. Accordingly, factors compromising fuel deposition rate, Such as parasite infections, might have relevant effects not only on migration, but also on other life history events that depend on migration Success, Such as breeding. We analysed the effect of haemosporidians on fuel load and fuel deposition rate in a population of migratory blackcaps (Sylvia atricapilla) during stopover in northern Iberia. Locality: Loza lagoon, northern Iberia (42 degrees 50' N, 01 degrees 43' W, 400 m a.s.l.). Methods: From blood samples of recaptured blackcaps we determined haemospiridian content by amplification of 479 bp of the parasite's cytochrome b gene. Independent on body size, of the sampled birds, 35% were infected by Haemoproteus-Plasmodium. Results: Mass deposition rate, fat score and body mass showed similar values in non-infected as in infected blackcaps. No differences in age or body size proportions were detected between infected and non-infected birds. Conclusions: These results may be explained by infected birds with high virulence (compromising fuel accumulation) being unable to migrate south from their breeding areas in central and northern Europe. In contrast, more resistant birds may be able to tolerate the parasitaemia and gain fuel normally. Another possibility is that only birds with a low intensity of parasite infection were captured (presumably birds able to overcome the parasitaemia successfully), since these birds may be able to accumulate fuel at a similar rate to non-infected birds. Furthermore, highly infected birds may have a lower likelihood of capture than non-infected birds, because birds lose mobility with a high intensity of infection and under these circumstances they are less likely to be captured.
13 Recovery from infection is not always complete, and mild chronic infection may persist. Although the direct costs of such infections are apparently small, the potential for any long-term effects on Darwinian fitness is poorly understood. In a wild population of great reed warblers, we found that low-level chronic malaria infection reduced life span as well as the lifetime number and quality of offspring. These delayed fitness effects of malaria appear to be mediated by telomere degradation, a result supported by controlled infection experiments on birds in captivity. The results of this study imply that chronic infection may be causing a series of small adverse effects that accumulate and eventually impair phenotypic quality and Darwinian fitness.
14 The last decade has seen a surge in research on avian influenza A viruses (IAVs), in part fuelled by the emergence, spread and potential zoonotic importance of highly pathogenic virus subtypes. The mallard (Anas platyrhynchos) is the most numerous and widespread dabbling duck in the world, and one of the most important natural hosts for studying IAV transmission dynamics. In order to predict the likelihood of IAV transmission between individual ducks and to other hosts, as well as between geographical regions, it is important to understand how IAV infection affects the host. In this study, we analysed the movements of 40 mallards equipped with GPS transmitters and three-dimensional accelerometers, of which 20 were naturally infected with low pathogenic avian influenza virus (LPAIV), at a major stopover site in the Northwest European flyway. Movements differed substantially between day and night, as well as between mallards returning to the capture site and those feeding in natural habitats. However, movement patterns did not differ between LPAIV infected and uninfected birds. Hence, LPAIV infection probably does not affect mallard movements during stopover, with high possibility of virus spread along the migration route as a consequence.
type trait measure exp.methods species
1 O Movement Distance travelled na Monarch butterly
6 O Refuelling Mass change na Blackcap
7 O Body stores Fat score na Blackcap
8 O Body stores Body mass na Blackcap
13 O Survival Lifespan na Great reed warbler
14 O Movement Local movements na Mallard
species_latin taxa Order Family
1 Danaus plexippus Insect Lepidoptera Nymphalidae
6 Sylvia atricapilla Bird Passeriformes Sylviidae
7 Sylvia atricapilla Bird Passeriformes Sylviidae
8 Sylvia atricapilla Bird Passeriformes Sylviidae
13 Acrocephalus arundinaceus Bird Passeriformes Acrocephalidae
14 Anas platyrhynchos Bird Anseriformes Anatidae
strain parasite.taxa life.history.measured setting
1 Ophryocystis elektroscirrha Protozoa Non-breeding Field
6 Haemoparasites Protozoa Migration Field
7 Haemoparasites Protozoa Migration Field
8 Haemoparasites Protozoa Migration Field
13 Haemoparasites Protozoa Breeding Field
14 LPAIV Virus Migration Field
subset Infection.type migratory.leg2 ss ss_infected ss_healthy
1 all Single Y 175 100 75
6 all Single Y 53 18 35
7 all Single Y 53 18 35
8 all Single Y 53 18 35
13 all Single Y 75 32 43
14 all Single N 87 37 50
slope effect stat presented.p adj.p sig0.05 infection.measure
1 Not presented 12.78 Wald X2 <0.0001 0.0001 1 Infection status
6 Not presented 0.1 t 0.9 0.9000 0 Infection status
7 Not presented 0.99 t 0.33 0.3300 0 Infection status
8 Not presented 0.01 F 0.93 0.9300 0 Infection status
13 0.84 2.82 t 0.006 0.0060 1 Infection status
14 Not presented 0.538 t >0.05 0.5000 0 Infection status
z var.z g var.g var.g1 function.
1 -0.28 0.01 -0.56 0.02 0.02 pes(0.0001,100,75) or chies(12.78,175)
6 0.02 0.02 0.04 0.08 0.08 pes(0.9,18,35)
7 -0.13 0.02 -0.28 0.08 0.08 pes(0.33,18,35)
8 -0.01 0.02 -0.03 0.08 0.08 pes(0.93,18,35)
13 -0.32 0.01 -0.65 0.06 0.06 tes(2.82,32,43)
14 -0.06 0.01 -0.12 0.05 0.05 tes(0.538,37,50)
Effect.direction
1 Negative
6 Positive
7 Negative
8 Negative
13 Negative
14 Negative
We also construct the phylogenetic tree to allow for decomposing heterogeneity at both species and phylogeny levels.
#check species we have
#length(unique(status$species_latin)) #25 unique species names, if no misspelling
#find Open Tree Taxonomy (OTT) IDs for each species
<- tnrs_match_names(names = unique(status$species_latin))
taxa
#rough check
#length(taxa$unique_name) # 25 unique species names, which is aligned with the species names in the dataset
#tabyl(taxa,approximate_match) # 0 approximate match
#check whether all otts occur in the synthetic tree
<- ott_id(taxa)[is_in_tree(ott_id(taxa))] # all good
ott_in_tree #length(ott_in_tree) # 25
#make phylo tree
<- tol_induced_subtree(ott_ids = ott_id(taxa)) tree
$tip.label <- strip_ott_ids(tree$tip.label, remove_underscores = TRUE)
tree
# test whether a tree is binary
#is.binary(tree)
#decapitalise species names to match with the search string names in taxa
<- status %>% mutate(search_string = tolower(species_latin))
status
#align data
<- left_join(status, dplyr::select(taxa, search_string, unique_name, ott_id), by = "search_string")
status
#create the variables of spp and phylo
<- status %>% mutate(spp = search_string, phylo = unique_name)
status
#estimate branch lengths using APE
<- compute.brlen(tree, method = "Grafen", power = 1)
tree2 ##create correlation matrix
<- vcv(tree2, corr=TRUE)
phylo_corr plot(tree2, cex = .8, label.offset = .1, no.margin = TRUE)
Fit data
We fit a phylogenetic multilevel model as illustrated in the main
text to the data. Doing so, we can stratify the heterogeneity at
different strata: within-study level (obs_id
),
between-study level (study_id
), species level
(spp
), and phylogeny level (phylo
).
\[ ES_{[i]} = \mu + u_{species[k]} + u_{phylogeny[k]} + \mu_{between[j]} + \mu_{within[j]} + e_{[i]} \]
#unify the name of key variables
<- status
dat $obs_id <- 1:nrow(dat)
dat$study_id <- dat$id
dat# run the model
<- rma.mv(g, var.g, random = list(~ 1 | obs_id, ~1 | study_id, ~ 1 | spp, ~ 1 | phylo), R = list(phylo = phylo_corr), data = dat)
mod_status
summary(mod_status)
Multivariate Meta-Analysis Model (k = 66; method: REML)
logLik Deviance AIC BIC AICc
-30.9191 61.8382 71.8382 82.7102 72.8552
Variance Components:
estim sqrt nlvls fixed factor R
sigma^2.1 0.0072 0.0846 66 no obs_id no
sigma^2.2 0.0000 0.0000 35 no study_id no
sigma^2.3 0.0006 0.0243 23 no spp no
sigma^2.4 0.0296 0.1720 23 no phylo yes
Test for Heterogeneity:
Q(df = 65) = 356.1658, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.2134 0.0917 -2.3274 0.0199 -0.3932 -0.0337 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heterogeneity quantification and stratification
We quantify and stratify the heterogeneity using the formulas
provided in the main text. All functions used are integrated into
orchard
package2.
Typical sampling variance (which captures the statistical noise of the data, but is rarely reported in the current meta-analytic practice):
sigma2_v(mod_status)
[1] 0.00131211
Unstandardised heterogeneity metrics \(\sigma^2\):
Variance-standardised heterogeneity metrics \(I^2\):
<- data.frame(stratum = c("total", mod_status$s.names),
t3 I2 = i2_ml(mod_status) / 100) %>% dfround(1)
$stratum <- c("total", "within-study", "between-study", "species", "phylogeny")
t3rownames(t3) <- NULL
%>% DT::datatable() t3
Mean-standardised heterogeneity metrics \(CVH2\):
<- data.frame(stratum = c("total", mod_status$s.names),
t4 CVH2 = cvh2_ml(mod_status)) %>% dfround(2)
$stratum <- c("total", "within-study", "between-study", "species", "phylogeny")
t4rownames(t4) <- NULL
%>% DT::datatable() t4
Mean- and variance-standardised heterogeneity metrics \(M2\):
<- data.frame(stratum = c("total", mod_status$s.names),
t5 M2 = m2_ml(mod_status)) %>% dfround(2)
$stratum <- c("total", "within-study", "between-study", "species", "phylogeny")
t5rownames(t5) <- NULL
%>% DT::datatable() t5
Let’s visualize all heterogeneity measures:
# Visualize heterogeneity
## make dataframe
<- data.frame(sigma2 = t2$sigma2,
h_status I2 = i2_ml(mod_status),
CVH = cvh2_ml(mod_status),
M2 = m2_ml(mod_status))
rownames(h_status) <- c("Total", "obs_id", "study_id", "spp", "phylo")
$levels <- rownames(h_status)
h_status$levels <- dplyr::recode(h_status$levels, "Total" = "Total", "obs_id" = "Within", "study_id" = "Between", "spp" = "Spp", "phylo" = "Phylo")
h_status$levels <- as.factor(h_status$levels)
h_status$levels <- factor(h_status$levels, levels = c("Total", "Phylo", "Spp", "Between", "Within"))
h_status
## plot
## sigma
<- ggplot(h_status, aes(levels, sigma2)) +
p.sigma geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[1], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[1]) +
labs(y = expression("Variance"), x = "Strata" , title = "Unstandardised heterogeneity metrics") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# I2
<- ggplot(h_status, aes(levels, I2/100)) +
p.I2 geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[2], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[2]) +
labs(y = expression(paste(italic(I)[]^2)), x = "Strata" , title = "Source of heterogeneity") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# CVH
<- ggplot(h_status, aes(levels, CVH)) +
p.CV geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[3], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[3]) +
labs(y = expression(paste(italic(CVH2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# M
<- ggplot(h_status, aes(x = levels, y = M2)) +
p.M geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[4], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[4]) +
labs(y = expression(paste(italic(M2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
#png(filename = "Fig. 6.png", width = 9, height = 8, units = "in", type = "windows", res = 400)
+ p.I2 + p.CV + p.M + plot_layout(ncol =2, nrow = 2) + plot_annotation(tag_levels = "A") & theme(plot.tag = element_text(face = "bold")) p.sigma
Heterogeneity interpretation
The best way to interpret heterogeneity is to place the interpretation of heterogeneity in the specific context of the research topic. When meta-analysts lack sufficient domain knowledge to implement this context-specific interpretation strategy, empirically derived benchmarks, which compare the heterogeneity of interest with published meta-analyses can provide a starting point for interpreting heterogeneity.
We provide an R helper function (het_interpret()
) that
can help determine the percentile range in which the heterogeneity
estimates for a particular meta-analysis fall, based on the
heterogeneity distribution of the published meta-analysis.
The main parameters of het_interpret()
are as
follows:
observed_value
Specify the observed value of
heterogeneity.
het_type = "V_bar"
Specify the heterogeneity measures:
"I2"
, "CVH"
, "M"
, raw
heterogeneity ("sigma2"
), and typical sampling variance
("V_bar"
).
es_type
Specify the effect size measures: standardised
mean difference like Cohen’s d ("SMD"
), log response ratio
("lnRR"
), Fisher’s r-to-z transformed correlation
coefficient ("Zr"
), 2-by-2 table denotes often dichotomous
(“binary
”) effect size measures like log odds ratio, and
uncommon measures ("uncommon"
) represent less frequently
used effect size measures like mean difference.
data
Specify the dataset containing empirical benchmarks
(i.e., the dataset het.benchmark
),
First, let’s upload the benchmark data (het.benchmark
),
which are publicly available in the GitHub repository:
Then, we calculate the percentile range for each type of heterogeneity measure.
het_interpret(observed_value = 0.9, het_type = “CVH”, es_type = “SMD”, data = het.benchmark)
Percentile range of typical sampling variance:
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.00131 V_bar Zr 0%-5% percentile
Unstandardised heterogeneity metrics \(\sigma^2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.0373 sigma2 Zr 10%-15% percentile
Variance-standardised heterogeneity metrics \(I^2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.966 I2 Zr 80%-85% percentile
Mean-standardised heterogeneity metrics \(CVH2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.820 CVH Zr 25%-30% percentile
Mean- and variance-standardised heterogeneity metrics \(M2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.450 M Zr 25%-30% percentile
Case study 2
Load data
Yang et al. (2024) conducted a meta-analysis on an evidence base including 437 effect sizes extracted from 127 experiments investigating the impact of artificial light at night on the melatonin suppression of wildlife3.
#load data
<- read.csv(here("data","Yang_2024_data.csv"))
dat_Yang
#effect size computation - use log response ratio as the effect size measure
<- escalc(measure = "ROM", m1i = Exp_Mean, m2i = Con_Mean, sd1i = Exp_SD, sd2i = Con_SD, n1i = Exp_N, n2i = Con_N, data = dat_Yang) dat_Yang
Fit data
We fit a three-level multilevel model to the data. Doing so, we can
stratify the heterogeneity at within-study level (obs_id
),
and between-study level (study_id
).
#unify the name of key variables
<- dat_Yang
dat # add row no.
$obs_id <- 1:nrow(dat)
dat$study_id <- dat$StudyID
dat# run the model
<- rma.mv(yi, vi, random = list(~ 1 | obs_id, ~1 | study_id), data = dat)
mod_alan
summary(mod_alan)
Multivariate Meta-Analysis Model (k = 372; method: REML)
logLik Deviance AIC BIC AICc
-489.9875 979.9750 985.9750 997.7236 986.0404
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.5677 0.7534 372 no obs_id
sigma^2.2 0.1268 0.3562 35 no study_id
Test for Heterogeneity:
Q(df = 371) = 4916.5818, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.4815 0.0865 -5.5686 <.0001 -0.6510 -0.3120 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heterogeneity quantification and stratification
We quantify and stratify the heterogeneity using the formulas
provided in the main text. All functions used are integrated into
orchard
package2.
Typical sampling variance (which captures the statistical noise of the data, but is rarely reported in the current meta-analytic practice):
sigma2_v(mod_alan)
[1] 0.03674805
Unstandardised heterogeneity metrics \(\sigma^2\):
Variance-standardised heterogeneity metrics \(I^2\):
<- data.frame(stratum = c("total", mod_alan$s.names),
t3 I2 = i2_ml(mod_alan) / 100) %>% dfround(1)
$stratum <- c("total", "within-study", "between-study")
t3rownames(t3) <- NULL
%>% DT::datatable() t3
Mean-standardised heterogeneity metrics \(CVH2\):
<- data.frame(stratum = c("total", mod_alan$s.names),
t4 CVH2 = cvh2_ml(mod_alan)) %>% dfround(2)
$stratum <- c("total", "within-study", "between-study")
t4rownames(t4) <- NULL
%>% DT::datatable() t4
Mean- and variance-standardised heterogeneity metrics \(M2\):
<- data.frame(stratum = c("total", mod_alan$s.names),
t5 M2 = m2_ml(mod_alan)) %>% dfround(2)
$stratum <- c("total", "within-study", "between-study")
t5rownames(t5) <- NULL
%>% DT::datatable() t5
Let’s visualize all heterogeneity measures:
# Visualize heterogeneity
## make dataframe
<- data.frame(sigma2 = t2$sigma2,
h_status I2 = i2_ml(mod_alan),
CVH = cvh2_ml(mod_alan),
M2 = m2_ml(mod_alan))
rownames(h_status) <- c("Total", "obs_id", "study_id")
$levels <- rownames(h_status)
h_status$levels <- dplyr::recode(h_status$levels, "Total" = "Total", "obs_id" = "Within", "study_id" = "Between")
h_status$levels <- as.factor(h_status$levels)
h_status$levels <- factor(h_status$levels, levels = c("Total", "Between", "Within"))
h_status
## plot
## sigma
<- ggplot(h_status, aes(levels, sigma2)) +
p.sigma geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[1], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[1]) +
labs(y = expression("Variance"), x = "Strata" , title = "Unstandardised heterogeneity metrics") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# I2
<- ggplot(h_status, aes(levels, I2/100)) +
p.I2 geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[2], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[2]) +
labs(y = expression(paste(italic(I)[]^2)), x = "Strata" , title = "Source of heterogeneity") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# CVH
<- ggplot(h_status, aes(levels, CVH)) +
p.CV geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[3], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[3]) +
labs(y = expression(paste(italic(CVH2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# M
<- ggplot(h_status, aes(x = levels, y = M2)) +
p.M geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[4], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[4]) +
labs(y = expression(paste(italic(M2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
+ p.I2 + p.CV + p.M + plot_layout(ncol =2, nrow = 2) + plot_annotation(tag_levels = "A") & theme(plot.tag = element_text(face = "bold")) p.sigma
Heterogeneity interpretation
We also illustrate the use of empirical benchmarks using the data from Yang et al. (2024)3.
Upload the benchmark data (het.benchmark
):
Calculate the percentile range for each type of heterogeneity measure.
Percentile range of typical sampling variance:
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.0367 V_bar lnRR 75%-80% percentile
Unstandardised heterogeneity metrics \(\sigma^2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.695 sigma2 lnRR 80%-85% percentile
Variance-standardised heterogeneity metrics \(I^2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.950 I2 lnRR 45%-50% percentile
Mean-standardised heterogeneity metrics \(CVH2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 3.00 CVH lnRR 45%-50% percentile
Mean- and variance-standardised heterogeneity metrics \(M2\):
$declaration
[1] "We encourage interpreting heterogeneity in context. These empirical benchmarks are intended to contextualize heterogeneity magnitude relative to published meta-analyses, not to replace domain-specific interpretation."
$interpretation
# A tibble: 1 x 4
observed_value het_type es_type percentile_range
<dbl> <chr> <chr> <chr>
1 0.750 M lnRR 45%-50% percentile
Additional illustration
In addition, we show how the traditional heterogeneity measure \(I^2\) can distort heterogeneity estimates for some specific effect size measures.
Some meta-analysts may be interested in the difference in variability between two groups. The (log-transformed) ratio of the coefficients of variation of the two groups (also known as the coefficient of variation ratio) can be a useful metric for comparing the variability of the two groups. Due to the properties of the coefficient of variation ratio, it usually has a large sampling variance. This inherently large sampling variance leads to smaller \(I^2\) values, regardless of whether the true heterogeneity is large or not.
Let’s use the example of Yang et al. (2024)3 to illustrate it.
Load data
#load data
<- read.csv(here("data","Yang_2024_data.csv"))
dat_Yang
# calculate the coefficient of variation ratio for each observation
<- escalc(measure = "CVR", m1i = Exp_Mean, m2i = Con_Mean, sd1i = Exp_SD, sd2i = Con_SD, n1i = Exp_N, n2i = Con_N, data = dat_Yang) dat_Yang
Fit data
Fit a three-level multilevel model and stratify the heterogeneity at
within-study level (obs_id
), and between-study level
(study_id
).
#unify the name of key variables
<- dat_Yang
dat # add row no.
$obs_id <- 1:nrow(dat)
dat$study_id <- dat$StudyID
dat# run the model
<- rma.mv(yi, vi, random = list(~ 1 | obs_id, ~1 | study_id), data = dat)
mod_alan
summary(mod_alan)
Multivariate Meta-Analysis Model (k = 372; method: REML)
logLik Deviance AIC BIC AICc
-374.6991 749.3982 755.3982 767.1468 755.4636
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.1035 0.3217 372 no obs_id
sigma^2.2 0.0575 0.2397 35 no study_id
Test for Heterogeneity:
Q(df = 371) = 544.9818, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.2804 0.0605 4.6340 <.0001 0.1618 0.3990 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heterogeneity quantification and stratification
Calculate heterogeneity for the meta-analysis using the coefficient of variation ratio. As you can see, the \(I^2\) values become much smaller, while \(\sigma^2\), \(CVH2\), and \(M2\) are still large.
# Visualize heterogeneity
## make dataframe
<- data.frame(sigma2 = t2$sigma2,
h_status I2 = i2_ml(mod_alan) / 100,
CVH = cvh2_ml(mod_alan),
M2 = m2_ml(mod_alan))
rownames(h_status) <- c("Total", "obs_id", "study_id")
$levels <- rownames(h_status)
h_status$levels <- dplyr::recode(h_status$levels, "Total" = "Total", "obs_id" = "Within", "study_id" = "Between")
h_status$levels <- as.factor(h_status$levels)
h_status$levels <- factor(h_status$levels, levels = c("Total", "Between", "Within"))
h_status
## plot
## sigma
<- ggplot(h_status, aes(levels, sigma2)) +
p.sigma geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[1], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[1]) +
labs(y = expression("Variance"), x = "Strata" , title = "Unstandardised heterogeneity metrics") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# I2
<- ggplot(h_status, aes(levels, I2/100)) +
p.I2 geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[2], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[2]) +
labs(y = expression(paste(italic(I)[]^2)), x = "Strata" , title = "Source of heterogeneity") +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# CVH
<- ggplot(h_status, aes(levels, CVH)) +
p.CV geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[3], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[3]) +
labs(y = expression(paste(italic(CVH2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
# M
<- ggplot(h_status, aes(x = levels, y = M2)) +
p.M geom_col(alpha = 1, color = wes_palette('GrandBudapest1', 4, type = 'discrete')[4], fill = wes_palette('GrandBudapest1', 4, type = 'discrete')[4]) +
labs(y = expression(paste(italic(M2)[])), x = "Strata" , title = "Magnitude of heterogeneity") +
scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
theme_bw() +
theme(legend.background = element_blank(),
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 12, color = "black")
)
+ p.I2 + p.CV + p.M + plot_layout(ncol =2, nrow = 2) + plot_annotation(tag_levels = "A") & theme(plot.tag = element_text(face = "bold")) p.sigma
License
This documented is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International.
Software and package versions
package loadedversion
ape ape 5.8-1
cowplot cowplot 1.1.1
dplyr dplyr 1.0.10
DT DT 0.19
forcats forcats 0.5.2
ggplot2 ggplot2 3.4.4
ggpubr ggpubr 0.4.0
ggsignif ggsignif 0.6.3
ggstance ggstance 0.3.5
ggtree ggtree 3.7.1.001
here here 1.0.1
janitor janitor 2.1.0
knitr knitr 1.37
Matrix Matrix 1.5-3
metadat metadat 1.2-0
metafor metafor 4.7-53
numDeriv numDeriv 2016.8-1.1
pander pander 0.6.4
patchwork patchwork 1.1.1
purrr purrr 0.3.4
RColorBrewer RColorBrewer 1.1-3
readr readr 2.1.2
readxl readxl 1.3.1
rmdformats rmdformats 1.0.3
rotl rotl 3.0.11
stringr stringr 1.5.0
tibble tibble 3.1.8
tidyr tidyr 1.2.1
tidyverse tidyverse 1.3.1
visdat visdat 0.5.3
wesanderson wesanderson 0.3.6