Methods

An overview of different Mendelian randomization methods for evaluating the causal effect of an exposure on an outcome.

Standardizing and harmonizing GWAS SumStats

Two-sample MR methods require summary statistics from genome-wide association studies (GWAS), including, for each single nucleotide polymorphism (SNP), the beta coefficient, standard error, p-value, and allele frequency. However, the historical lack of standards for data content and file formats in GWAS summary statistics has resulted in heterogeneous data sets. Standardizing and harmonizing the GWAS summary statistics is therefore crucial before conducting MR analyses. The GWAS Catalog and OpenGWAS platforms have developed formats such as GWAS-SSF (Hayhurst et al. 2022) and GWAS-VCF (Lyon et al. 2021) to facilitate sharing of GWAS SumStats. Tools such as MungeSumstats (Murphy et al. 2021) and GWAS2VCF (Lyon et al. 2021) provide rapid standardization and quality control of GWAS SumStats.

Core MR methods

Tools & Publications: R, TwoSampleMR, MendelianRandomization

Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife 7, e34408 (2018).
Yavorska, O. O. & Burgess, S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol 46, dyx034 (2017).

Methods

IVW: The simplest method for MR causal effect estimation is the inverse variance weighted (IVW) meta-analysis of each genetic instrument's Wald ratio. This is equivalent to a weighted regression of the SNP-outcome effects on the SNP-exposure effects, with the regression line constrained to pass through the origin. Fixed effects IVW is the most powerful method for MR analysis when all instruments are valid and is widely accepted as the primary method. However, it assumes that there is no horizontal pleiotropy and is sensitive to violations of the MR assumptions. Random effects IVW relaxes this assumption by allowing the SNP-specific causal estimates to vary around a common mean, providing an unbiased estimate if horizontal pleiotropy is present in a balanced manner.
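
As a minimal sketch of the fixed effects IVW calculation described above, the snippet below computes per-SNP Wald ratios and combines them with inverse variance weights. The function names and the toy effect sizes are illustrative assumptions, not taken from TwoSampleMR or MendelianRandomization, and the standard errors use the common first-order approximation that ignores uncertainty in the SNP-exposure effects.

```python
import math

def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP Wald ratio and its first-order standard error."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses

def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed effects IVW: inverse variance weighted mean of the Wald ratios."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se

# Toy summary statistics (made up for illustration only)
beta_exp = [0.10, 0.20, 0.30]   # SNP-exposure effects
beta_out = [0.05, 0.10, 0.15]   # SNP-outcome effects
se_out = [0.01, 0.01, 0.01]     # SNP-outcome standard errors

estimate, se = ivw_fixed(beta_exp, beta_out, se_out)
print(f"IVW estimate = {estimate:.3f}, SE = {se:.4f}")
```

Because every toy SNP has the same Wald ratio, the IVW estimate equals that shared ratio; with real data the estimate is pulled towards the most precisely estimated ratios.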

Maximum likelihood: The causal effect is estimated by direct maximization of the likelihood given the SNP-exposure and SNP-outcome effects, assuming a linear relationship between the exposure and outcome. It may provide more reliable results in the presence of measurement error in the SNP-exposure effects, but assumes that there is no heterogeneity or horizontal pleiotropy.

MR-Egger: The IVW method can be modified to account for horizontal pleiotropy by allowing a non-zero intercept in the regression. This allows the net horizontal pleiotropic effect to be estimated even when it is unbalanced (directional). However, this approach assumes that the horizontal pleiotropic effects are uncorrelated with the SNP-exposure effects, known as the InSIDE (Instrument Strength Independent of Direct Effect) assumption. MR-Egger also tends to have lower precision than IVW, reducing statistical power to detect causal relationships.
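
The following sketch illustrates the MR-Egger idea as a weighted linear regression of SNP-outcome effects on SNP-exposure effects with a free intercept. It returns point estimates only (no standard errors or p-values), orients each SNP so that its exposure effect is positive, and uses made-up data; the function name `mr_egger` is a hypothetical helper, not the TwoSampleMR implementation.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """Weighted regression of outcome on exposure effects with an intercept."""
    # Orient every SNP so that its exposure effect is positive.
    bx = [abs(be) for be in beta_exp]
    by = [bo if be >= 0 else -bo for be, bo in zip(beta_exp, beta_out)]
    w = [1.0 / s ** 2 for s in se_out]

    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, bx)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, bx, by))
    if sxx == 0:
        raise ValueError("SNP-exposure effects must not all be identical")

    slope = sxy / sxx                    # causal effect estimate
    intercept = mean_y - slope * mean_x  # average directional pleiotropy
    return intercept, slope

# Toy data built so that outcome effect = 0.02 + 0.5 * exposure effect
beta_exp = [0.10, 0.20, 0.30, -0.40]
beta_out = [0.07, 0.12, 0.17, -0.22]
se_out = [0.01, 0.01, 0.01, 0.01]

intercept, slope = mr_egger(beta_exp, beta_out, se_out)
print(f"Egger intercept = {intercept:.3f}, slope = {slope:.3f}")
```

An intercept far from zero in a sketch like this corresponds to the directional pleiotropy signal that the Egger intercept test formalizes.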

Mode Based Estimators: Cluster SNPs into groups based on the similarity of their causal estimates and return the causal estimate from the cluster with the largest number of SNPs. Each SNP's contribution to the clustering can also be weighted by the inverse variance of its outcome effect. Returns an unbiased causal estimate when all the SNPs in the largest cluster are valid instruments.
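
One way to picture the mode-based estimator is as the peak of a kernel-smoothed density of the per-SNP Wald ratios. The sketch below assumes a Gaussian kernel, a fixed user-supplied bandwidth, and a simple grid search; published implementations choose the bandwidth from the data, so treat `weighted_mode` and its parameters as illustrative assumptions.

```python
import math

def weighted_mode(ratios, weights, bandwidth, grid_size=1001):
    """Grid point with the highest weighted Gaussian kernel density."""
    lo, hi = min(ratios), max(ratios)
    if lo == hi:
        return lo
    step = (hi - lo) / (grid_size - 1)
    best_x, best_density = lo, -1.0
    for k in range(grid_size):
        x = lo + k * step
        density = sum(
            w * math.exp(-0.5 * ((x - r) / bandwidth) ** 2)
            for r, w in zip(ratios, weights)
        )
        if density > best_density:
            best_x, best_density = x, density
    return best_x

# Toy Wald ratios: four SNPs agree on ~0.5, two are pleiotropic outliers
ratios = [0.50, 0.51, 0.49, 0.50, 1.50, -0.80]
weights = [1.0] * len(ratios)  # equal weights, i.e. the simple mode

mode_estimate = weighted_mode(ratios, weights, bandwidth=0.05)
print(f"Mode-based estimate = {mode_estimate:.2f}")
```

Passing inverse variance weights instead of equal weights gives the weighted variant described above.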

Median Based Estimators: Taking the median of the causal estimates from all available SNPs gives a consistent estimate provided that at least half of the SNPs are valid instruments. Weighting the contribution of each SNP by the inverse variance of its association with the outcome allows more precisely estimated SNPs to contribute more to the estimate; in that case at least half of the weight must come from valid instruments.
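
A weighted median can be sketched as follows: sort the Wald ratios, accumulate normalized weights, and interpolate to the point where the cumulative weight reaches 50%. The function name and the toy ratios are assumptions for illustration; standard errors, which are usually obtained by bootstrapping, are omitted.

```python
def weighted_median(values, weights):
    """Interpolated weighted median of per-SNP causal estimates."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    vals = [values[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]

    # Cumulative weight at the midpoint of each SNP's weight interval.
    cum, running = [], 0.0
    for wi in w:
        running += wi
        cum.append(running - wi / 2.0)

    if cum[0] >= 0.5:
        return vals[0]
    for j in range(1, len(vals)):
        if cum[j] >= 0.5:
            frac = (0.5 - cum[j - 1]) / (cum[j] - cum[j - 1])
            return vals[j - 1] + frac * (vals[j] - vals[j - 1])
    return vals[-1]

# Toy Wald ratios with one strongly pleiotropic SNP (2.0)
ratios = [0.50, 0.52, 0.48, 2.00, 0.51]

simple = weighted_median(ratios, [1.0] * len(ratios))       # equal weights
weighted = weighted_median(ratios, [100, 100, 100, 1, 100])  # outlier down-weighted
mean = sum(ratios) / len(ratios)
print(f"mean = {mean:.3f}, simple median = {simple:.3f}, weighted median = {weighted:.3f}")
```

The mean is dragged towards the outlier, whereas both medians stay close to the value shared by the majority of SNPs.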

Results: In the fixed effects IVW analysis, higher genetically predicted total cholesterol levels are associated with increased risk of Alzheimer's disease (AD). However, the majority of the pleiotropy-robust methods are non-significant, suggesting that the IVW estimate may be biased.

The relationship between SNP effects on the exposure and SNP effects on the outcome can be visualized through a scatter plot. The slopes of the lines correspond to the estimated causal effect for each method.

Forest plots can be used to display the Wald ratio for single SNPs and their combined effects.

Diagnostics and sensitivity analyses are used to evaluate whether the causal estimates are robust to violations of the underlying MR assumptions. The intercept term in MR-Egger regression provides an indication of the presence of directional horizontal pleiotropy and helps to determine the robustness of the MR results. Directional horizontal pleiotropy refers to the situation where the genetic variants used as instrumental variables affect the outcome through pathways other than the exposure, and these pleiotropic effects do not average to zero across variants. This can result in biased estimates of the causal effect of the exposure on the outcome and compromise the validity of the MR results.

We observe that the MR-Egger regression intercept for total cholesterol on AD is significant, suggesting that the IVW causal estimate is biased by directional horizontal pleiotropy.

Heterogeneity refers to variability in the causal estimates obtained from the different genetic variants used as instrumental variables. It can arise from several factors, including differences in the strength of the genetic associations with the exposure and outcome, differences in the direction of effect, or differences in the way that the genetic variants interact with other variables that may confound the relationship. Heterogeneity poses a challenge for the validity of the MR results, as it indicates that the assumption of a consistent relationship between the exposure and outcome across SNPs may not be met. If heterogeneity is present, it can indicate that some variants are invalid instruments, for example because of horizontal pleiotropy, or that the underlying causal relationship is more complex than a single shared effect.

We can estimate heterogeneity using Cochran's Q test. We observe that there is significant heterogeneity in the IVW and MR-Egger analyses, further highlighting that the IVW causal estimates are likely biased.
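
As a rough illustration of Cochran's Q for the IVW model, the sketch below sums the weighted squared deviations of each SNP's Wald ratio from the IVW estimate and derives I-squared from it. The helper name and toy values are assumptions; the p-value would come from a chi-square distribution with (number of SNPs - 1) degrees of freedom, for example via `scipy.stats.chi2.sf(q, df)`, which is left out here to keep the example dependency-free.

```python
def cochran_q(ratios, ratio_ses):
    """Cochran's Q, degrees of freedom and I-squared around the IVW estimate."""
    weights = [1.0 / s ** 2 for s in ratio_ses]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # I-squared: share of variation beyond what sampling error would explain.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Toy Wald ratios and their standard errors (made up for illustration)
ratios = [0.40, 0.50, 0.60]
ratio_ses = [0.10, 0.10, 0.10]

q, df, i2 = cochran_q(ratios, ratio_ses)
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.2f}")
```

A Q value much larger than its degrees of freedom, as in the results reported here, indicates more spread among the SNP-specific estimates than sampling error alone would produce.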

Funnel plots, in which the estimate for each SNP is plotted against its precision, can be used to visually inspect for horizontal pleiotropy, with asymmetry indicative of invalid instruments.

Leave-one-out analysis can be used to determine whether an MR causal estimate is driven or biased by a single SNP that might have a particularly large horizontal pleiotropic effect. The MR causal effect is re-estimated by sequentially dropping one SNP at a time.

We observe dramatic changes in the MR causal estimates when two SNPs (rs7412 and rs75687619) are dropped from the analysis. This suggests that the IVW estimate is particularly sensitive to the inclusion of these variants and that they are potentially outliers.
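
The leave-one-out procedure can be sketched by re-running a fixed effects IVW estimate with each SNP removed in turn. The SNP identifiers and effect sizes below are hypothetical placeholders rather than the variants analysed here, and `ivw` and `leave_one_out` are illustrative helper names.

```python
def ivw(beta_exp, beta_out, se_out):
    """Fixed effects IVW estimate from summary statistics."""
    weights = [be ** 2 / so ** 2 for be, so in zip(beta_exp, se_out)]
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def leave_one_out(snps, beta_exp, beta_out, se_out):
    """Re-estimate the IVW effect with each SNP dropped in turn."""
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw(
            [beta_exp[j] for j in keep],
            [beta_out[j] for j in keep],
            [se_out[j] for j in keep],
        )
    return results

# Toy data: snp4 has a much larger outcome effect than the others
snps = ["snp1", "snp2", "snp3", "snp4"]
beta_exp = [0.10, 0.10, 0.10, 0.10]
beta_out = [0.05, 0.05, 0.05, 0.25]
se_out = [0.01, 0.01, 0.01, 0.01]

full = ivw(beta_exp, beta_out, se_out)
loo = leave_one_out(snps, beta_exp, beta_out, se_out)
for snp, est in loo.items():
    print(f"without {snp}: {est:.3f} (all SNPs: {full:.3f})")
```

A SNP whose removal shifts the estimate far more than any other, like `snp4` in this toy example, is a candidate outlier worth inspecting.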

Radial MR

Soon^TM

LHC-MR

Soon^TM

Multivariable MR

Multivariable Mendelian randomization (MVMR) can be used to estimate the causal relationships between multiple exposures and a single outcome. It is particularly useful in situations where multiple exposures are related and have potential inter-related effects on the outcome of interest. MVMR can help to disentangle the complex relationships between these exposures and provide insights into their independent causal effects. Examples of situations where MVMR can be useful include the estimation of the independent effects of related risk factors such as lipid fractions on an outcome, or the assessment of the independent effects of a primary risk factor and a secondary mediator on a disease outcome.

Tools & Publications: R, TwoSampleMR, MendelianRandomization, MVMR, RMVMR, MVMRmode

Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife 7, e34408 (2018).
Yavorska, O. O. & Burgess, S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol 46, dyx034 (2017).
Sanderson, E., Spiller, W. & Bowden, J. Testing and correcting for weak and pleiotropic instruments in two‐sample multivariable Mendelian randomization. Stat Med 40, 5434–5452 (2021).
Woolf, B., Gill, D., Grant, A. J. & Burgess, S. MVMRmode: Introducing an R package for plurality valid estimators for multivariable Mendelian randomisation. medRxiv (2023).

Harmonizing SumStats for MVMR

MVMR uses genetic variants as instrumental variables for multiple exposures simultaneously in the analysis of a single outcome. Harmonizing the exposure and outcome datasets is therefore more complex than in univariable MR, as it requires multiple clumping and proxy-variant procedures. The following is a step-by-step guide to harmonizing the exposure and outcome datasets for MVMR using LDL and HDL cholesterol levels as the exposures and AD as the outcome:

Perform clumping to obtain independent genome-wide significant variants for each exposure. This step involves identifying the SNPs that are independently associated with each exposure and are significant at the genome-wide level.

There are 79 and 89 independent genome-wide significant SNPs for LDL and HDL, respectively.

Combine the exposure SNP lists and extract all the SNPs from each exposure. This step merges the lists of SNPs for each exposure and extracts every SNP in the combined list from each exposure's summary statistics.

There are 164 unique SNPs across the LDL and HDL SNP lists: 76 unique to LDL, 85 unique to HDL, and four that are shared.

Extracting these 164 SNPs from the LDL and HDL SumStats, we get the following counts of genome-wide significant SNPs.

LDL < 5e-08	HDL < 5e-08	n
FALSE	TRUE	76
TRUE	FALSE	64
TRUE	TRUE	24
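
The combine-and-count step above can be sketched with plain sets and a counter. The sketch assumes clumping has already been done, uses made-up SNP identifiers and p-values, and cross-tabulates genome-wide significance in the same LDL/HDL layout as the table.

```python
from collections import Counter

GWS = 5e-8  # genome-wide significance threshold

# Hypothetical p-values for four SNPs in each exposure's summary statistics
ldl_p = {"rsA": 1e-10, "rsB": 3e-9, "rsC": 0.02, "rsD": 1e-12}
hdl_p = {"rsA": 0.4, "rsB": 1e-9, "rsC": 2e-15, "rsD": 0.001}

# Instruments selected for each exposure (clumping assumed already done)
ldl_hits = {snp for snp, p in ldl_p.items() if p < GWS}
hdl_hits = {snp for snp, p in hdl_p.items() if p < GWS}

combined = sorted(ldl_hits | hdl_hits)  # union of the two instrument lists
shared = ldl_hits & hdl_hits            # instruments selected for both exposures

# Cross-tabulate significance as (LDL < 5e-08, HDL < 5e-08) -> n
table = Counter((ldl_p[snp] < GWS, hdl_p[snp] < GWS) for snp in combined)
for (ldl_sig, hdl_sig), n in sorted(table.items()):
    print(ldl_sig, hdl_sig, n, sep="\t")
```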

Identify proxy variants for any SNPs that are missing in each exposure. This step involves identifying proxy variants for any SNPs that are not present in each exposure dataset. This is necessary to ensure that the exposure datasets are complete and that all relevant SNPs are included in the analysis.

There are no missing variants across the LDL and HDL SumStats, so there is no need to identify proxy variants.

Perform LD clumping on the combined SNP list to retain independent SNPs. This step reduces the risk of spurious results arising from multicollinearity among correlated SNPs.

From the table above we can see that there are 24 SNPs that are genome-wide significant (GWS) for both HDL and LDL. After LD clumping on the smallest p-value across the exposures, we retain 136 of the 164 SNPs.

LDL < 5e-08	HDL < 5e-08	n
FALSE	TRUE	66
TRUE	FALSE	57
TRUE	TRUE	13

Extract exposure SNPs from the outcome GWAS. This step involves extracting the SNPs that are associated with the exposures from the genome-wide association study of the outcome.

Of the 136 exposure SNPs, 135 are available in the AD SumStats.

One issue to be aware of is that five of the exposure SNPs are GWS for AD and may therefore violate the exclusion restriction assumption.

LDL < 5e-08	HDL < 5e-08	AD < 5e-08	n
FALSE	TRUE	FALSE	74
FALSE	TRUE	TRUE	1
FALSE	TRUE	NA	1
TRUE	FALSE	FALSE	62
TRUE	FALSE	TRUE	2
TRUE	TRUE	FALSE	22
TRUE	TRUE	TRUE	2

Identify proxy variants for any SNPs that are missing in the outcome. This step involves identifying proxy variants for any SNPs that are not present in the outcome dataset. This is necessary to ensure that the outcome dataset is complete and that all relevant SNPs are included in the analysis.

Harmonize the exposure and outcome datasets. In order to perform MR, the effects of a SNP on the exposure and on the outcome must be harmonized so that they are relative to the same allele.
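
The core of allele harmonization can be sketched as a sign flip whenever the outcome effect allele is the exposure's other allele. This is a simplified illustration with a hypothetical `harmonize_beta` helper: it does not handle strand flips or palindromic SNPs (A/T, C/G), which real harmonization tools treat separately.

```python
def harmonize_beta(exp_ea, exp_oa, out_ea, out_oa, out_beta):
    """Return the outcome beta expressed relative to the exposure effect allele.

    Returns None when the allele pairs cannot be matched without
    strand or palindrome handling.
    """
    if (out_ea, out_oa) == (exp_ea, exp_oa):
        return out_beta    # already aligned
    if (out_ea, out_oa) == (exp_oa, exp_ea):
        return -out_beta   # alleles swapped: flip the sign
    return None            # incompatible alleles: drop or inspect manually

# Toy examples: exposure effect allele A, other allele G
aligned = harmonize_beta("A", "G", "A", "G", 0.03)
flipped = harmonize_beta("A", "G", "G", "A", 0.03)
mismatch = harmonize_beta("A", "G", "C", "T", 0.03)
print(aligned, flipped, mismatch)
```

When a sign is flipped, any effect allele frequency carried alongside the beta would also need to be replaced by one minus its value.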

Results

Using the core univariable MR methods, we observed that higher genetically predicted LDL and HDL are associated with increased and reduced risk of AD, respectively. However, as with total cholesterol levels, we observe significant heterogeneity, suggesting that the IVW estimates are likely to be biased.

Extending these analyses into a multivariable framework, we are able to determine the independent effects of LDL and HDL cholesterol levels on AD. We observed that higher genetically predicted LDL remains significantly causally associated with increased AD risk, while the causal effect of HDL on AD is now non-significant.
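
The multivariable IVW model can be pictured as a weighted regression through the origin of the SNP-outcome effects on the SNP effects for both exposures at once. The sketch below solves the two-exposure normal equations directly; the function name and toy data are assumptions, only point estimates are returned, and the MVMR and MendelianRandomization packages remain the reference implementations.

```python
def mvmr_ivw(bx1, bx2, by, se_y):
    """Two-exposure IVW: weighted least squares through the origin."""
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))

    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("exposure effects are perfectly collinear")

    # Solve the 2x2 normal equations for the direct effect of each exposure.
    effect1 = (s22 * s1y - s12 * s2y) / det
    effect2 = (s11 * s2y - s12 * s1y) / det
    return effect1, effect2

# Toy data built so that only exposure 1 affects the outcome (effect 0.4)
bx1 = [0.10, 0.20, 0.00, 0.15]    # SNP effects on exposure 1
bx2 = [0.00, 0.10, 0.20, -0.10]   # SNP effects on exposure 2
by = [0.04, 0.08, 0.00, 0.06]     # SNP effects on the outcome
se_y = [0.01, 0.01, 0.01, 0.01]

effect1, effect2 = mvmr_ivw(bx1, bx2, by, se_y)
print(f"exposure 1: {effect1:.3f}, exposure 2: {effect2:.3f}")
```

In this toy example the second exposure's estimate is zero once the first is accounted for, mirroring how an apparent univariable effect can disappear in the multivariable model.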

We can use Radial-MVMR to plot MVMR causal estimates.

As with the univariable MR models, we can estimate heterogeneity for the IVW model.

Qstat: 681.9149
Qpval: 1.007841e-75

Importantly, we also need to consider the strength of the instruments for each exposure. As a rule of thumb, the F-statistic should be above 10.

	exposure1	exposure2
F-statistic	70.85811	77.75899
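
For intuition about the rule of thumb, the sketch below computes the simple per-SNP F-statistic, (beta / SE) squared, for a single exposure and flags SNPs below 10. This is the univariable approximation only; it is an assumption that it helps as an illustration here, since instrument strength in MVMR is normally assessed conditional on the other exposures, which this sketch does not do. The helper name and data are made up.

```python
def f_statistics(beta_exp, se_exp):
    """Approximate per-SNP F-statistic for a single exposure."""
    return [(b / s) ** 2 for b, s in zip(beta_exp, se_exp)]

# Toy SNP-exposure effects and standard errors
beta_exp = [0.10, 0.05, 0.02]
se_exp = [0.01, 0.01, 0.01]

per_snp_f = f_statistics(beta_exp, se_exp)
mean_f = sum(per_snp_f) / len(per_snp_f)
weak = [i for i, f in enumerate(per_snp_f) if f < 10]  # indices of weak SNPs

print(f"per-SNP F: {[round(f, 1) for f in per_snp_f]}")
print(f"mean F = {mean_f:.1f}, weak instruments at indices {weak}")
```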

TODO: Q- and F-statistics are estimated assuming a genetic covariance of 0. Re-estimate based on phenotypic correlations calculated from PhenoSpD.

References

Hayhurst, J. et al. A community driven GWAS summary statistics standard. bioRxiv 2022.07.15.500230 (2022) doi:10.1101/2022.07.15.500230.
Lyon, M. S. et al. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol 22, 32 (2021).
Murphy, A. E., Schilder, B. M. & Skene, N. G. MungeSumstats: A Bioconductor package for the standardisation and quality control of many GWAS summary statistics. Bioinformatics 37, btab665 (2021).
