A two-step approach for robust point and variance estimation for meta-analyses with selective reporting and dependent effect sizes

Yefeng Yang, Malgorzata Lagisz, Coralie Williams, Daniel W. A. Noble & Shinichi Nakagawa

Introduction

This online material serves as a supplement to our methodological paper, providing a step-by-step tutorial.

Yefeng Yang, Malgorzata Lagisz, Coralie Williams, Daniel W. A. Noble & Shinichi Nakagawa. A two-step approach for robust point and variance estimation for meta-analyses with selective reporting and dependent effect sizes. 2023.

A glimpse:

Meta-analysis produces a quantitative synthesis of evidence-based knowledge, shaping not only research trends but also policymaking. However, meta-analytic modelling struggles to address two statistical issues concurrently: statistical dependence and selective reporting (e.g., publication bias, P-hacking). Here, we propose a two-step procedure to tackle these challenges. First, we employ bias-robust weighting schemes under the generalized least squares estimator to obtain less biased mean effect size estimates by considering the mechanisms of selective reporting. Second, we utilize cluster-robust variance estimation to account for statistical dependence and reduce bias in estimating standard errors, ensuring valid statistical inference. By re-analysing 448 published meta-analyses, we show our approach is effective in mitigating bias when estimating mean effect sizes and standard errors. To assist adoption of our approach, we provide a step-by-step tutorial website. Complementing current practice with the proposed method facilitates robust analyses and a transition to a more pluralistic approach in quantitative synthesis.

Reach out

For queries, mistakes, or bugs, please contact the corresponding authors:

  • Dr. Yefeng Yang

Evolution & Ecology Research Centre (EERC), School of Biological, Earth and Environmental Sciences (BEES), The University of New South Wales, Sydney, Australia

Email:

  • Professor Shinichi Nakagawa, PhD, FRSN

Evolution & Ecology Research Centre (EERC), School of Biological, Earth and Environmental Sciences (BEES), The University of New South Wales, Sydney, Australia

Email:

Setup R

This tutorial is based on R statistical software and existing R packages.

If you do not have it on your computer, first install R (download).

We also recommend downloading a popular IDE for coding with R: RStudio, which is developed by the company Posit (download).

After installing R and RStudio, the next step is to install the 7 packages that contain the functions needed to implement our proposed method. If a package is available on CRAN, use install.packages() to install it.

For the packages metafor, clubSandwich, ggplot2, readxl, dplyr, and here, which are available on CRAN, run install.packages("package name") in the console (bottom-left pane of RStudio). For example, to install metafor, you will need to run install.packages("metafor"). For the package orchaRd, which is hosted on GitHub, execute devtools::install_github("repository name/package name"). For example, to install orchaRd, you will need to execute devtools::install_github("daniel1noble/orchaRd", force = TRUE).
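For convenience, all installations can be run in one chunk (assuming the devtools package is already installed; if not, install it first with install.packages("devtools")):

# install the CRAN packages
install.packages(c("metafor", "clubSandwich", "ggplot2", "readxl", "dplyr", "here"))
# install orchaRd from GitHub
devtools::install_github("daniel1noble/orchaRd", force = TRUE)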

Dataset

In this tutorial, we use a randomly selected published meta-analysis [1] that showed evidence of publication bias. This example meta-analysis examined the effect of interactions among herbivores on fitness based on 179 species, 167 studies, and 1640 effect sizes [1]. To address the statistical dependence among effect sizes (on average, about 10 effect sizes per study), the original publication employed Bayesian multilevel meta-analytic modelling with phylogenetic relatedness, study, and observation identities as random effects.

First, let’s load the data and take a quick look at it.

study denotes the unique identity of each primary study included in this meta-analysis.

eff.size denotes the effect size estimate. This meta-analysis used the standardised mean difference (SMD) as the effect size measure.

var.eff.size denotes the corresponding sampling variance estimate.
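Before running the code below, load the packages installed in the setup step:

# load the packages used throughout this tutorial
library(metafor)      # meta-analytic models: rma.mv(), vcalc(), robust(), funnel()
library(clubSandwich) # CRVE: impute_covariance_matrix(), coef_test()
library(orchaRd)      # visualisation: orchard_plot(), bubble_plot(), pub_bias_plot()
library(ggplot2)      # plot customisation
library(dplyr)        # data wrangling
library(here)         # file paths
library(readxl)       # reading Excel files (not strictly needed for the CSV used here)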

# load data
bird.et.al.2019.ecoletts <- read.csv(here("data","bird.et.al.2019.ecoletts.csv"))

# only keep relevant variables
dat <- bird.et.al.2019.ecoletts %>% select(study, eff.size, var.eff.size)

# have a look at data
head(dat)
##        study   eff.size var.eff.size
## 1 Harrison_1 -1.7771359    0.1099483
## 2 Harrison_1 -1.7519846    0.2385344
## 3 Harrison_1 -1.3276771    1.7153664
## 4 Harrison_1 -0.3359459    0.4703529
## 5 Harrison_1 -0.6486920    1.0548881
## 6 Harrison_1 -0.3984191    0.7450682

Data check

Before formally modelling the data, it is recommended to visually inspect it as part of good data science practice. Visual inspection allows for an initial examination of the data and helps identify potential issues or patterns. So, let’s do some basic visual inspection of the data.

The distribution of effect size measures:

hist(dat$eff.size, breaks = 50, main = "Effect size distribution", xlab = "Effect size estimate (SMD)")
Figure S1. Histogram of effect size measures in the example dataset

There appear to be outlier data points in the dataset.

To verify this observation, we can plot the relationship between the effect size estimates and their corresponding precision:

plot(dat$eff.size, 1/sqrt(dat$var.eff.size), main = "Effect size vs precision", xlab = "Effect size estimate (SMD)", ylab = "Precision (1/SE)")
Figure S2. Effect size estimates plotted against their precision in the example dataset

Upon closer examination, it is apparent that the dataset contains effect size estimates with extreme values, either extremely large or extremely small. To address this, we can set an arbitrary threshold, such as [-20, 20], as a criterion for identifying and excluding potential outliers. Then we inspect the data again:

# exclude effect size > 20 & < -20
dat <- dat %>% filter(eff.size < 20 & eff.size > -20)

# check again
hist(dat$eff.size, breaks = 50, main = "Effect size distribution", xlab = "Effect size estimate (SMD)")
Figure S3. Histogram of effect size measures after removing potential outliers

plot(dat$eff.size, 1/sqrt(dat$var.eff.size), main = "Effect size vs precision", xlab = "Effect size estimate (SMD)", ylab = "Precision (1/SE)")
Figure S4. Effect size estimates plotted against their precision after removing potential outliers

Based on Figures S3 and S4, after applying the predefined threshold to exclude potential outliers, the data exhibit a more normal distribution. Note that this tutorial does not specifically focus on data-cleaning techniques (we acknowledge that we are not experts in this aspect), so we will not delve further into this and will proceed with our demonstration after the simple cleaning above.

Evidence of publication bias

In line with the publication bias test conducted in the original publication [1], our re-analysis also confirmed the existence of publication bias. We used the recently developed multilevel version of Egger’s test to detect publication bias [2].

## account for non-independence to avoid false positive when detecting publication bias
dat$obs <- 1:nrow(dat)
dat$se.eff.size <- sqrt(dat$var.eff.size)
mod_pb <- rma.mv(yi = eff.size, 
                 V = var.eff.size, 
                 mods = ~ se.eff.size, 
                 random = list(~ 1 | study, ~ 1 | obs), 
                 method = "REML", test = "t", dfs = "contain", 
                 data = dat, 
                 sparse = TRUE, 
                 control=list(optimizer="nlminb", rel.tol=1e-8, iter.max=1000))

summary(mod_pb)
## 
## Multivariate Meta-Analysis Model (k = 1633; method: REML)
## 
##     logLik    Deviance         AIC         BIC        AICc   
## -2383.5518   4767.1035   4775.1035   4796.6913   4775.1281   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2.1  0.2418  0.4918    167     no   study 
## sigma^2.2  0.4539  0.6737   1633     no     obs 
## 
## Test for Residual Heterogeneity:
## QE(df = 1631) = 6170.0259, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## F(df1 = 1, df2 = 1631) = 46.5115, p-val < .0001
## 
## Model Results:
## 
##              estimate      se     tval    df    pval    ci.lb    ci.ub      
## intrcpt       -0.1632  0.0794  -2.0555   165  0.0414  -0.3199  -0.0064    * 
## se.eff.size    0.9010  0.1321   6.8199  1631  <.0001   0.6419   1.1601  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# visualize publication bias
bubble_plot(mod_pb, mod = "se.eff.size", 
            legend.pos = "top.left", 
            group = "study") + 
  labs(x = "Standard error of SMD", y = "Standardized mean difference (SMD)")
Figure S5. The positive relationship between the effect size estimate and its standard error indicates evidence of publication bias

The output of the model shows strong statistical evidence of publication bias or, more precisely, small-study effects. This is also confirmed by Figure S5. It is worth noting, however, that, as we state in the main text, we assume, as do other researchers, that the small-study effect is a typical form of selective reporting or publication bias. But whether this is truly the case, only God knows.

Publication bias can also be visually checked with a contour-enhanced funnel plot, as shown in Figure S6.

# a contour-enhanced funnel plot
with(dat, funnel(eff.size, se.eff.size, 
                 yaxis = "sei", 
                 level=c(90, 95, 99), shade=c("white", "gray55", "gray75"), 
                 legend=F,
                 xlab = "Standardized mean difference (SMD)", ylab = "SE"))
Figure S6. Contour-enhanced funnel plot showing evidence of publication bias

Benchmark method: multilevel model

The multilevel meta-analysis (MLMA) model has gained popularity as a standard method for handling dependent effect sizes in various fields. Its flexible random-effects structure has made it a benchmark approach in many disciplines. We have published several guideline papers recommending the MLMA model as the default method for conducting meta-analyses, each with a specific methodological focus tailored to a different field, such as preclinical or animal sciences [3], environmental sciences [4], experimental biology [5], ecology and evolution [6], and biology in general [7].

However, it is important to note that the MLMA model exhibits systematic error in estimating the mean effect size when selective reporting is present, as can be observed in Figures 2 to 5 in the main text of our study. Similar to the traditional random-effects model, the MLMA model tends to assign roughly equal weights to studies in the presence of high heterogeneity, which can lead to biased estimates when publication bias is present (we inspect these weights directly after the model output below).

An MLMA model can be fitted with:

# fit a MLMA model
mod_MLMA <- rma.mv(yi = eff.size, 
                   V = var.eff.size, 
                   random = list(~ 1 | study, ~ 1 | obs), 
                   method = "REML", 
                   test = "t", 
                   dfs = "contain", 
                   data = dat, 
                   sparse = TRUE, 
                   control=list(optimizer="nlminb", rel.tol=1e-8, iter.max=1000))
summary(mod_MLMA)
## 
## Multivariate Meta-Analysis Model (k = 1633; method: REML)
## 
##     logLik    Deviance         AIC         BIC        AICc   
## -2408.1719   4816.3437   4822.3437   4838.5364   4822.3584   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2.1  0.2503  0.5003    167     no   study 
## sigma^2.2  0.4660  0.6826   1633     no     obs 
## 
## Test for Heterogeneity:
## Q(df = 1632) = 6349.8754, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval   df    pval   ci.lb   ci.ub      
##   0.2748  0.0475  5.7892  166  <.0001  0.1811  0.3685  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above output, we see a statistically significant interaction between herbivores without correcting for publication bias (\(\hat{\beta}\) = 0.275, \({SE(\hat{\beta})}\) = 0.047, 95% CI = [0.181, 0.369], \(t_{166}\) = 5.789, p-value < 0.001). This observation aligns with the original study [1], albeit with a slight difference in the magnitude of the estimate.
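As an aside, the near-equal weighting under high heterogeneity noted earlier can be checked directly from the fitted model; a quick sketch using metafor's weights() method (which, by default, returns the diagonal elements of the model's weight matrix):

# inspect the spread of weights assigned to individual effect sizes;
# under high heterogeneity these tend towards near-equal values
summary(weights(mod_MLMA))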

Two-step approach

In contrast, our proposed two-step approach addresses these issues by employing bias-robust models within the cluster-robust variance estimation (CRVE) framework.

Step one - fit a bias-robust model

In the first step, we employ bias-robust models with bias-robust weighting schemes to obtain less biased mean effect size estimates \(\hat{\beta}\). Specifically, we incorporate a within-study variance-covariance matrix into the fixed-effect model (FE + VCV) and utilize the unrestricted weighted least squares (UWLS) model. The bias-robust weighting schemes counteract selective reporting by considering the underlying mechanisms that contribute to it. For example, the inverse VCV weighting scheme assigns smaller weights to studies with low precision and large effects, thereby penalizing studies that appear to be "selectively reported".
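This tutorial demonstrates the FE + VCV variant. For reference, here is a minimal sketch of the UWLS model, which can be fitted as a weighted least squares regression with inverse-variance weights (the object name uwls is ours; step two below applies equally to this working model):

# a minimal sketch of the UWLS model: a weighted least squares regression
# with inverse-variance weights; the point estimate equals the fixed-effect
# estimate, while the standard error absorbs a multiplicative dispersion term
uwls <- lm(eff.size ~ 1, weights = 1/var.eff.size, data = dat)
summary(uwls)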

To implement step one, you will need to use vcalc() and rma.mv() from the package metafor [8]. We assume a constant sampling correlation of 0.5 (see the sensitivity analysis at the end of this tutorial). Note that vcalc() in metafor can also be replaced by impute_covariance_matrix() from the package clubSandwich [9].

Let’s show the implementation of step one:

# step one - fit a multivariate FE model
## assuming that the effect sizes within studies are correlated with rho=0.5
VCV <- vcalc(vi = var.eff.size,
             cluster = study, 
             rho = 0.5, 
             obs = obs, 
             data = dat) 
## alternative way
# VCV <- impute_covariance_matrix(vi = dat$var.eff.size, cluster = dat$study, r = 0.5)

mod_MLFE <- rma.mv(yi = eff.size, 
                   V = VCV, method = "REML", 
                   test = "t", 
                   dfs = "contain", 
                   data = dat)
summary(mod_MLFE)
## 
## Multivariate Meta-Analysis Model (k = 1633; method: REML)
## 
##     logLik    Deviance         AIC         BIC        AICc   
## -4815.3910   9630.7820   9632.7820   9638.1796   9632.7845   
## 
## Variance Components: none
## 
## Test for Heterogeneity:
## Q(df = 1632) = 9671.1396, p-val < .0001
## 
## Model Results:
## 
## estimate      se    tval    df    pval   ci.lb   ci.ub      
##   0.0738  0.0176  4.1820  1632  <.0001  0.0392  0.1084  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model output indicates that the interaction between herbivores shrinks to a minimal magnitude (\(\hat{\beta}\) = 0.074, \({SE(\hat{\beta})}\) = 0.018, 95% CI = [0.039, 0.108], \(t_{1632}\) = 4.182, p-value < 0.001), although it remains statistically significant. But, as we will show, this statistical significance arises from the non-independence among effect sizes. In this example dataset, each primary study contributes about 10 effect size estimates on average, and the between-study difference accounts for roughly 35% of the variation in effect sizes.
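The 35% figure can be verified from the variance components of the MLMA model fitted above:

# proportion of total heterogeneity attributable to between-study differences
# (from the MLMA output: 0.2503 / (0.2503 + 0.4660), i.e. about 0.35)
round(mod_MLMA$sigma2[1] / sum(mod_MLMA$sigma2), 2)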

Step two - estimate robust standard errors

Moving to the second step, we treat the fitted bias-robust model as the "working" model within the CRVE framework. This step helps mitigate potential biases in the standard error estimates \({SE(\hat{\beta})}\) that could arise from violating the assumption of independence in FE + VCV and UWLS (i.e., model misspecification). By employing CRVE, we obtain robust standard error estimates \({SE(\hat{\beta})}\) that ensure the validity of subsequent statistical inference, including null-hypothesis tests and CI construction. In this step, we will be using robust() from metafor [8]. Alternatively, you can use coef_test() from clubSandwich [9].

The robust standard error estimate \({SE(\hat{\beta})}\) and subsequent statistical inference can be obtained with:

# apply CRVE to multivariate FE model
mod_MLFE_RVE <- robust(mod_MLFE, 
                       cluster = study, 
                       adjust = TRUE, 
                       clubSandwich = TRUE)
summary(mod_MLFE_RVE)
## 
## Multivariate Meta-Analysis Model (k = 1633; method: REML)
## 
##     logLik    Deviance         AIC         BIC        AICc   
## -4815.3910   9630.7820   9632.7820   9638.1796   9632.7845   
## 
## Variance Components: none
## 
## Test for Heterogeneity:
## Q(df = 1632) = 9671.1396, p-val < .0001
## 
## Number of estimates:   1633
## Number of clusters:    167
## Estimates per cluster: 1-50 (mean: 9.78, median: 6)
## 
## Model Results:
## 
## estimate      se¹    tval¹     df¹    pval¹    ci.lb¹   ci.ub¹    
##   0.0738  0.0528   1.3971   51.96   0.1683   -0.0322   0.1798     
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 1) results based on cluster-robust inference (var-cov estimator: CR2,
##    approx t-test and confidence interval, df: Satterthwaite approx)

The corresponding \({SE(\hat{\beta})}\) is now 0.053, which is much closer to that from the standard method (the MLMA model). The null-hypothesis test shows that the interaction between herbivores is not statistically significant (p-value = 0.168). The 95% CI becomes wider [-0.032, 0.18], which means it is more likely to cover the true effect and less likely to yield a false positive. Another interesting point is that the degrees of freedom decreased from 1632 to roughly 52. In more familiar terms, the dataset suffers from pseudo-replication.

Technical note: in terms of small-sample corrections for CRVE, the CR2 correction performs better than CR1. However, CR2 is not applicable to models with non-nested random effects, such as models with crossed random effects. In our case, this is not a concern because the model in the first step does not include random effects.
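As mentioned above, equivalent robust inference can be obtained via clubSandwich; a minimal sketch whose results should closely match those of robust():

# alternative route: CR2 correction with Satterthwaite degrees of freedom
coef_test(mod_MLFE, vcov = "CR2", cluster = dat$study)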

Visualization

We also developed a helper function, integrated into the meta-analysis visualization package orchaRd [10], to visualize the results of the proposed method alongside those from the standard method. The plot includes essential elements such as the point estimate of the mean effect size, the 95% CI, the 95% prediction interval, as well as information on precision and the observation- and study-level sample sizes, allowing for a visual assessment of the robustness of the meta-analytic findings and facilitating transparent reporting.

This figure can be made with the orchard_plot() and pub_bias_plot() functions:

# typical orchard plot
plot <- orchard_plot(mod_MLMA, mod = "1", 
                     group = "study", 
                     xlab = "Standardised mean difference (SMD)") + 
        scale_x_discrete(labels = "Population mean effect estimate") + 
        ylim(-4,4)

# visualize the model estimates of the proposed two-step approach as the sensitivity analyses
pub_bias_plot(plot, mod_MLFE_RVE) +
  theme(panel.grid = element_blank(),
        axis.text.x = element_text(size = 12, color = "black"),
        axis.title.x = element_text(size = 12, color = "black"),
        axis.text.y = element_text(size = 12, color = "black"))
Figure S7. Orchard plot showing the results from the standard meta-analytic modelling (multilevel meta-analysis) and the proposed two-step approach

# not run - fig used in the panel C in figure 6
#png(filename = "Fig.6 panel4.png", 
#    width = 8, 
#    height = 3, 
#    units = "in", 
#    type = "windows", 
#    res = 400)
#pub_bias_plot2(plot, mod_MLFE_RVE, mod_pb) + 
#  theme(panel.grid = element_blank(), 
#        axis.text.x = element_text(size = 12, color = "black"), 
#        axis.title.x = element_text(size = 12, color = "black"), 
#        axis.text.y = element_text(size = 12, color = "black"))
#dev.off()

Additional analysis

Here, we show how to perform a sensitivity analysis examining the extent to which the mean effect is sensitive to the within-study (sampling) correlation \(\rho\) assumed when constructing the variance-covariance matrix (VCV).

First, set a series of \(\rho\) values (i.e., 0.2 to 0.8 in steps of 0.1). Note that we assume these values arbitrarily; end-users can set them based on their expertise in their fields:

# set a range of rho values
rho_range <- c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)

# repeatedly fit the step-one model, varying rho each time
mod_MLFE_range <- list(NULL) 
for (i in 1:length(rho_range)) {
  # impute the VCV matrix under the current rho value
  # VCV_range <- vcalc(vi = var.eff.size, cluster = study, obs = obs, rho = rho_range[i], data = dat)
  VCV_range <- impute_covariance_matrix(vi = dat$var.eff.size, 
                                        cluster = dat$study, 
                                        r = rho_range[i])
  # fit the multivariate FE model with the current VCV matrix
  mod_MLFE_range[[i]] <- rma.mv(yi = eff.size, 
                                V = VCV_range, 
                                method = "REML", 
                                test = "t", 
                                dfs = "contain",
                                data = dat,
                                sparse = TRUE)
}

# step two: apply CRVE to each fitted model
mod_MLFE_RVE_range <- list(NULL)
for (i in 1:length(mod_MLFE_range)) {
  mod_MLFE_RVE_range[[i]] <- robust(mod_MLFE_range[[i]], 
                                    cluster = study, 
                                    adjust = TRUE, 
                                    clubSandwich = TRUE)
}

From Table S1, we see that the mean effect does not change appreciably across the \(\rho\) values, indicating that the model coefficients are robust to different assumptions about \(\rho\).

Table S1 Sensitivity analysis examining the robustness of the mean effect estimate (\(\hat{\beta}\)) to the assumed \(\rho\) values.

t1 <- data.frame(rho  = rho_range,
                 "mean effect"  = sapply(mod_MLFE_RVE_range, function(x) coef(x)),
                 "standard error" = sapply(mod_MLFE_RVE_range, function(x) x$se),
                 "p-value" = sapply(mod_MLFE_RVE_range, function(x) x$pval),
                 "Lower CI" = sapply(mod_MLFE_RVE_range, function(x) x$ci.lb),
                 "Upper CI" = sapply(mod_MLFE_RVE_range, function(x) x$ci.ub))

colnames(t1) <- c("Correlation (ρ)", "Mean effect", "Standard error", "p-value", "Lower CI", "Upper CI")
dfround(t1,3) %>% DT::datatable() # kable(digits=c(1,3,3,4,3,3))

License

This document is licensed under the following license: CC Attribution-NonCommercial-ShareAlike 4.0 International.

Software and package versions

sessionInfo() %>% pander()

R version 4.3.1 (2023-06-16 ucrt)

Platform: x86_64-w64-mingw32/x64 (64-bit)

locale: LC_COLLATE=English_Australia.utf8, LC_CTYPE=English_Australia.utf8, LC_MONETARY=English_Australia.utf8, LC_NUMERIC=C and LC_TIME=English_Australia.utf8

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ggplot2(v.3.4.2), here(v.1.0.1), orchaRd(v.2.0), clubSandwich(v.0.5.8), dplyr(v.1.1.2), metafor(v.4.2-0), numDeriv(v.2016.8-1.1), metadat(v.1.2-0), Matrix(v.1.5-4.1), readr(v.2.1.4), pander(v.0.6.5), rmdformats(v.1.0.4) and knitr(v.1.43)

loaded via a namespace (and not attached): gtable(v.0.3.3), beeswarm(v.0.4.0), xfun(v.0.39), bslib(v.0.5.0), htmlwidgets(v.1.6.2), lattice(v.0.21-8), mathjaxr(v.1.6-0), tzdb(v.0.4.0), crosstalk(v.1.2.0), vctrs(v.0.6.3), tools(v.4.3.1), generics(v.0.1.3), sandwich(v.3.0-2), tibble(v.3.2.1), fansi(v.1.0.4), highr(v.0.10), pacman(v.0.5.1), pkgconfig(v.2.0.3), lifecycle(v.1.0.3), compiler(v.4.3.1), farver(v.2.1.1), stringr(v.1.5.0), munsell(v.0.5.0), vipor(v.0.4.5), htmltools(v.0.5.5), sass(v.0.4.7), yaml(v.2.3.7), pillar(v.1.9.0), jquerylib(v.0.1.4), ellipsis(v.0.3.2), DT(v.0.28), cachem(v.1.0.8), nlme(v.3.1-162), tidyselect(v.1.2.0), digest(v.0.6.31), mvtnorm(v.1.2-2), stringi(v.1.7.12), bookdown(v.0.34), labeling(v.0.4.2), splines(v.4.3.1), latex2exp(v.0.9.6), rprojroot(v.2.0.3), fastmap(v.1.1.1), grid(v.4.3.1), colorspace(v.2.1-0), cli(v.3.6.1), magrittr(v.2.0.3), utf8(v.1.2.3), withr(v.2.5.0), scales(v.1.2.1), ggbeeswarm(v.0.7.2), estimability(v.1.4.1), rmarkdown(v.2.23), emmeans(v.1.8.7), zoo(v.1.8-12), hms(v.1.1.3), coda(v.0.19-4), evaluate(v.0.21), mgcv(v.1.8-42), rlang(v.1.1.1), Rcpp(v.1.0.10), xtable(v.1.8-4), glue(v.1.6.2), rstudioapi(v.0.15.0), jsonlite(v.1.8.5) and R6(v.2.5.1)

References

In this tutorial, we mainly cite literature related to the implementation. Other related literature is credited in the main text.

1. Bird G, Kaczvinsky C, Wilson AE, Hardy NB. When do herbivorous insects compete? A phylogenetic meta-analysis. Ecology Letters. 2019;22: 875–883.
2. Nakagawa S, Lagisz M, Jennions MD, Koricheva J, Noble DW, Parker TH, et al. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods in Ecology and Evolution. 2022;13: 4–21.
3. Yang Y, Macleod M, Pan J, Lagisz M, Nakagawa S. Advanced methods and implementations for the meta-analyses of animal models: Current practices and future recommendations. Neuroscience & Biobehavioral Reviews. 2022; 105016.
4. Nakagawa S, Yang Y, Macartney EL, Spake R, Lagisz M. Quantitative evidence synthesis: A practical guide on meta-analysis, meta-regression, and publication bias tests for environmental sciences. Environmental Evidence. 2023.
5. Noble DW, Pottier P, Lagisz M, Burke S, Drobniak SM, O'Dea RE, et al. Meta-analytic approaches and effect sizes to account for 'nuisance heterogeneity' in comparative physiology. Journal of Experimental Biology. 2022;225: jeb243225.
6. Nakagawa S, Santos ES. Methodological issues and advances in biological meta-analysis. Evolutionary Ecology. 2012;26: 1253–1274.
7. Nakagawa S, Noble DW, Senior AM, Lagisz M. Meta-evaluation of meta-analysis: Ten appraisal questions for biologists. BMC Biology. 2017;15: 1–14.
8. Viechtbauer W. Confidence intervals for the amount of heterogeneity in meta-analysis. Statistics in Medicine. 2007;26: 37–52.
9. Pustejovsky JE, Tipton E. Meta-analysis with robust variance estimation: Expanding the range of working models. Prevention Science. 2022;23: 425–438.
10. Nakagawa S, Lagisz M, O'Dea RE, Rutkowska J, Yang Y, Noble DW, et al. The orchard plot: Cultivating a forest plot for use in ecology, evolution, and beyond. Research Synthesis Methods. 2021;12: 4–12.