LD clumping is a procedure for selecting SNPs according to their significance (p-value) while accounting for the correlation between them. SNPs that lie close together on the genome are often correlated because of a phenomenon known as Linkage Disequilibrium (LD). During clumping, correlated SNPs are ‘clumped’ together, and each clump is represented by its most significant SNP (the index SNP).
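The greedy procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular tool. It assumes that pairwise r² values are already available in a dictionary, which is a hypothetical input format; real pipelines compute them from a reference panel. The thresholds are also assumptions: p < 5e-8 (genome-wide significance), r² > 0.001 and a 10,000 kb window, values often used in MR analyses. The sketch keeps the most significant remaining SNP as an index SNP and absorbs every nearby SNP that is in LD with it.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical SNP record: (snp_id, chromosome, position_bp, p_value)
Snp = Tuple[str, str, int, float]


def ld_clump(
    snps: List[Snp],
    r2: Dict[FrozenSet[str], float],  # pairwise r^2 keyed by frozenset({id_a, id_b}); missing pairs treated as 0
    p_threshold: float = 5e-8,        # assumed significance cutoff for candidate SNPs
    r2_threshold: float = 0.001,      # assumed LD cutoff: pairs above this are "correlated"
    window_bp: int = 10_000_000,      # assumed clumping window (10,000 kb)
) -> List[str]:
    """Greedy LD clumping sketch: return index SNP IDs, most significant first."""
    # Candidate SNPs pass the p-value threshold and are visited from smallest p-value upward.
    candidates = sorted((s for s in snps if s[3] < p_threshold), key=lambda s: s[3])
    index_snps: List[str] = []
    clumped = set()
    for snp_id, chrom, pos, _p in candidates:
        if snp_id in clumped:
            continue  # already represented by a more significant index SNP
        index_snps.append(snp_id)
        # Absorb every remaining candidate on the same chromosome, inside the window, in LD with this SNP.
        for other_id, other_chrom, other_pos, _ in candidates:
            if other_id == snp_id or other_id in clumped:
                continue
            same_region = other_chrom == chrom and abs(other_pos - pos) <= window_bp
            if same_region and r2.get(frozenset((snp_id, other_id)), 0.0) > r2_threshold:
                clumped.add(other_id)
    return index_snps


# Toy example: rs2 is in strong LD with rs1, rs3 is nearly independent of rs1,
# rs4 sits on another chromosome and rs5 is not significant.
snps = [
    ("rs1", "1", 100_000, 1e-12),
    ("rs2", "1", 150_000, 1e-10),
    ("rs3", "1", 5_000_000, 1e-9),
    ("rs4", "2", 200_000, 1e-20),
    ("rs5", "1", 120_000, 0.01),
]
r2 = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs1", "rs3")): 0.0005}
print(ld_clump(snps, r2))  # ['rs4', 'rs1', 'rs3']
```

In this toy data, rs2 is dropped because rs1 represents its clump. rs3 survives because its r² with rs1 falls below the cutoff. rs5 never enters the procedure because it is not significant.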
The primary purpose of this process is to ensure that we retain only independent genetic variants for the subsequent Mendelian Randomization estimation. But why is this independence important?
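Retaining only independent genetic variants is critical for avoiding overestimation and bias in the causal analysis. Common Mendelian Randomization estimators treat each SNP as a separate piece of evidence. If correlated SNPs were included, the same association signal would be counted several times. The genomic regions they come from would then be overrepresented, skewing the results and typically making the estimate look more precise than it really is. Keeping only independent SNPs avoids this problem: each retained SNP contributes distinct information, and the combined estimate is not inflated by LD between SNPs.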
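A toy calculation makes the overrepresentation concrete. The sketch below assumes the fixed-effect inverse-variance weighted (IVW) estimator, one common MR method that treats SNPs as independent, and uses made-up summary statistics. Adding two copies of one SNP stands in for including SNPs in near-perfect LD with it. The estimate is pulled toward that SNP's own ratio (0.2), and the standard error shrinks even though no new information was added.

```python
import math
from typing import List, Tuple


def ivw_fixed(snps: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate from (beta_exposure, beta_outcome, se_outcome) per SNP.

    Assumes the SNPs are independent instruments.
    """
    num = sum(bx * by / se**2 for bx, by, se in snps)
    den = sum(bx**2 / se**2 for bx, by, se in snps)
    return num / den, 1.0 / math.sqrt(den)


# Illustrative (made-up) summary statistics for three independent SNPs.
independent = [(0.10, 0.020, 0.01), (0.08, 0.012, 0.01), (0.12, 0.030, 0.01)]
# Same SNPs plus two copies of the first one, mimicking SNPs in near-perfect LD with it.
correlated = independent + [independent[0]] * 2

beta_ind, se_ind = ivw_fixed(independent)
beta_cor, se_cor = ivw_fixed(correlated)
print(f"independent: beta={beta_ind:.3f}, se={se_ind:.4f}")
print(f"with LD duplicates: beta={beta_cor:.3f}, se={se_cor:.4f}")
```

With these numbers, the standard error drops from roughly 0.057 to roughly 0.044 purely because one signal was counted three times.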