Genetic Analysis of Human Disease | genetic analysis | GWAS

Posted: 2023-09-27 14:27:39

 

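I have never actually worked in this field, so some of the basic concepts are still fuzzy to me. This post is my attempt to sort them out.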

These notes were prompted by reading a genetics review in this field: The Emerging Genetic Landscape of Hirschsprung Disease and Its Potential Clinical Applications

 

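Questions:

  • What is genetic analysis? What is the logic of its development?
  • How are diseases and variants classified? What is a Mendelian form? What do syndromic and isolated mean? What are de novo and inherited variants?
  • Rare damaging or common regulatory risk variants? Universal and ancestry-specific risk alleles?
  • What is effect size? What is OR (odds ratio)?
  • What is positional cloning? What are linkage analysis and linkage mapping? What is the basic principle?
  • What is the difference between trio-based and case-control studies? What is trio analysis?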

Going deeper:

  • The relationship between statistical power and sample size
  • The relationship between effect size and allele frequency
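
As a rough illustration of the first point, the sketch below approximates the power of a case-control allelic test with a two-proportion z-test under the normal approximation. The function name `allelic_test_power`, the allele frequencies, and the sample sizes are all made up for illustration. The 5e-8 significance level is the conventional genome-wide threshold, which I am assuming here; it is not taken from the review.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Hypothetical helper: each person contributes two alleles, and the
    normal approximation is assumed to hold.
    """
    alleles_case = 2 * n_cases
    alleles_control = 2 * n_controls
    se = sqrt(p_case * (1 - p_case) / alleles_case
              + p_control * (1 - p_control) / alleles_control)
    z_effect = abs(p_case - p_control) / se
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that the statistic lands beyond either critical value.
    return STD_NORMAL.cdf(z_effect - z_crit) + STD_NORMAL.cdf(-z_effect - z_crit)

# Made-up risk allele frequencies: 25% in cases, 20% in controls.
for n in (500, 2000, 5000):
    power = allelic_test_power(0.25, 0.20, n_cases=n, n_controls=n)
    print(f"{n} cases + {n} controls: power = {power:.3f}")
```

With the allele frequency difference held fixed, the standard error shrinks as the sample grows, so the same effect moves from almost undetectable to almost certainly detected.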

 

Genetic analysis

This is a very broad concept. Today's GWAS, as well as WGS, WES, and similar approaches, should all count as genetic analysis.

Genetic analysis is the overall process of studying and researching in fields of science that involve genetics and molecular biology.

 

Logic of development

The three laws of genetics

  1. Law of segregation
  2. Law of independent assortment
  3. Law of linkage and crossing-over

 

Stage 1: linkage mapping to find disease-associated coding mutations

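Background: at this point it was already known that phenotypic traits are controlled by genes, and gene function was thought to be determined by the coding region (the function of non-coding regions was still unclear). Linkage analysis was already available for locating disease-associated genes, mainly starting from pedigree analysis.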

Limitation: rare coding variants in RET appear to play a less prominent role in sporadic and S-HSCR compared to the familial and L-HSCR.

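Explanation: coding variants are, first of all, very rare and therefore not representative. Second, they generally follow an autosomal dominant inheritance pattern, so they are not the focus of research in sporadic cases.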

 

Stage 2: GWAS based on high-throughput SNP arrays to find common variants

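Background: a coding variant is a yes-or-no matter and cannot explain differences in penetrance. Altogether, the rare damaging variants identified in RET and EDNRB pathways explain only a small fraction (<30%) of sporadic HSCR cases. Rare coding variants no longer had enough explanatory power: no new genes were being found, and they could not explain differences in expression or disease severity. The missing heritability problem had to be addressed.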

How should common variants be understood? The problem is largely a result of the small contribution of most variants, either because the variants are too rare to contribute population-wide, or because the effect sizes of common variants are, in general, very small.

How should penetrance be understood? Carrying a genotype does not guarantee the phenotype, because multiple factors interact. Penetrance refers to the likelihood that a clinical condition will occur when a particular genotype is present. For adult-onset diseases, penetrance is usually described by the individual carrier's age, sex, and organ site.
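
To make the definition concrete, here is a minimal sketch that treats penetrance as the fraction of genotype carriers who are affected, stratified by age group. The helper name `penetrance`, the age bands, and all counts are invented for illustration.

```python
def penetrance(affected_carriers, total_carriers):
    """Fraction of carriers of a genotype who show the condition."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    return affected_carriers / total_carriers

# Hypothetical carrier counts per age band: (affected, total).
carriers_by_age = {
    "<40": (5, 100),
    "40-60": (30, 100),
    ">60": (60, 100),
}

penetrance_by_age = {
    age: penetrance(affected, total)
    for age, (affected, total) in carriers_by_age.items()
}

for age, value in penetrance_by_age.items():
    print(f"age {age}: penetrance = {value:.2f}")
```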

How should the over-representation of a haplotype be understood?

How should this be understood: a common functional RET intron 1 enhancer variant (RET+3; rs2435357 T/C) that largely increases risk of HSCR (OR~5)?
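
Since OR comes up here, a small sketch of how an allelic odds ratio is computed from a 2x2 table of allele counts may help. The counts below are invented and are not the RET data; the function name `odds_ratio_with_ci` is hypothetical, and the confidence interval uses the standard log-scale (Woolf) approximation.

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(risk_cases, other_cases, risk_controls, other_controls, z=1.96):
    """Allelic odds ratio with an approximate 95% confidence interval.

    OR = (odds of the risk allele in cases) / (odds of the risk allele in controls).
    """
    odds_ratio = (risk_cases * other_controls) / (other_cases * risk_controls)
    se_log_or = sqrt(1 / risk_cases + 1 / other_cases
                     + 1 / risk_controls + 1 / other_controls)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up allele counts: risk allele is 50% of case alleles, 20% of control alleles.
odds_ratio, lower, upper = odds_ratio_with_ci(500, 500, 200, 800)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An OR of 1 means the allele is equally common in cases and controls. The further above 1 it is, the more the allele is enriched in cases.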

How should epistatic interaction be understood? Epistasis has been used to describe a number of phenomena, including the functional interaction between genes, the genetic outcome of mutations acting within the same genetic pathway, and the statistical deviation from additive gene action.

What GWAS achieved: Altogether, these findings implied that common variants can predispose to HSCR in a low penetrance manner by modifying the phenotypic expression, which opened up a new area of genetic research on HSCR, including family-based and population-based association studies by detecting transmission disequilibrium of common single-nucleotide polymorphisms (SNP) from parents to proband and comparing frequencies of SNPs in cases vs. controls, respectively.
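
The family-based half of that sentence (transmission disequilibrium from parents to proband) is usually tested with the transmission disequilibrium test (TDT). Below is a minimal sketch of the basic statistic, assuming the counts of transmitted and untransmitted alleles from heterozygous parents have already been tallied. The function name `tdt` and the counts are made up.

```python
from math import erfc, sqrt

def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test (McNemar-type chi-square, 1 df).

    Counts come from heterozygous parents only: how often the candidate
    allele was passed to the affected child versus not passed.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value

# Made-up example: 100 heterozygous parents, 60 transmitted the allele.
chi_square, p_value = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Under no association, a heterozygous parent passes either allele with probability 1/2, so a strong imbalance between the two counts is evidence of over-transmission to affected children.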

 

Stage 3: GWAS analysis based on WES and WGS to identify rare variants

Background: SNP-array-based GWAS can only identify common variants. WES can additionally identify rare variants in coding regions, and WGS can also identify variants in non-coding regions.

 

Stage 4: identification of CNVs and SVs

 

Positional cloning

Positional cloning is a laboratory technique used to locate the position of a disease-associated gene along the chromosome. This approach works even when little or no information is available about the biochemical basis of the disease. Positional cloning is used in conjunction with linkage analysis.

It also comes in two broad types: traditional and genome-based.

Traditional versus postgenomic positional cloning strategies. A: Before the availability of the genome sequence, positional cloning involved several labor-intensive steps. After genetic mapping to a chromosomal region, the physical portion of the genome was isolated on large insert DNA clones, often requiring dozens of clones to cover the region. Genes residing on the large insert clones were identified experimentally. Once genes were identified, they were evaluated for potential involvement in the disease by gene expression analysis and DNA sequencing to identify the mutation. B: The availability of the genome sequence streamlines the positional cloning approach, supplanting experimental techniques with in silico analysis of physical map position and gene content. Additional resources, such as gene ontology and gene expression databases, help prioritize candidate genes for mutation analysis. 

 

Linkage analysis

[Not the same thing as association analysis]

Genetic linkage analysis is a powerful tool to detect the chromosomal location of disease genes. It is based on the observation that genes that reside physically close on a chromosome remain linked during meiosis.

Linkage analysis is a statistical genetic method that aims to identify chromosomal regions that cosegregate with a disease of interest through pedigrees.

For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.
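
The statistic classically used to quantify cosegregation in linkage analysis is the LOD score. The sketch below covers only the simplest textbook case (phase-known meioses, a single marker), which is my simplifying assumption and not a description of what real linkage software does. The function name `lod_score` and the counts are hypothetical.

```python
from math import log10

def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta^recombinants * (1 - theta)^(meioses - recombinants).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants

    def log10_power(count, prob):
        # Treat prob**0 as 1 so theta = 0 with no recombinants works.
        if count == 0:
            return 0.0
        if prob == 0.0:
            return float("-inf")
        return count * log10(prob)

    log_linked = log10_power(recombinants, theta) + log10_power(non_recombinants, 1 - theta)
    log_unlinked = meioses * log10(0.5)
    return log_linked - log_unlinked

# Made-up family data: 1 recombinant among 10 informative meioses.
thetas = [i / 100 for i in range(1, 51)]
best_theta = max(thetas, key=lambda t: lod_score(1, 10, t))
print(f"max LOD = {lod_score(1, 10, best_theta):.2f} at theta = {best_theta:.2f}")
```

A LOD of 3 or more is the traditional rule of thumb for declaring linkage. In this toy example a single family of 10 meioses falls short of it, which is one reason large or multiple pedigrees are needed.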

Basic principle

These studies built upon the fundamental idea that the disease-causing variant and nearby genetic markers tend to be transmitted together due to linkage disequilibrium (LD). Such approaches of positional cloning and linkage analysis have been applied to multiplex families, where highly informative genetic markers are used to map disease-associated loci of large effect. Once a locus is linked, a search for rare damaging mutations (i.e., variants with minor allele frequency <1% in the general population) in candidate genes within the locus ensues. This strategy remained very popular, especially before the GWAS era.
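
For reference, LD between two biallelic loci is commonly summarized by the coefficient D and by r². The sketch below computes both from haplotype counts. The function name `ld_stats` and the counts are made up, and both loci are assumed to be polymorphic so the denominator is non-zero.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts.

    D = p_AB - p_A * p_B
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared

# Made-up haplotype counts: AB and ab are over-represented.
d, r_squared = ld_stats(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}")
```

If the two loci were independent, the AB haplotype frequency would equal p_A * p_B and both D and r² would be 0.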

 

Association studies

For technical reasons, association analysis has effectively become synonymous with genome-wide association analysis.

In genomics, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.
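
At its core, a case-control GWAS repeats a simple per-SNP test across the genome and then applies a stringent multiple-testing threshold. Here is a minimal sketch using an allelic chi-square test on 2x2 allele count tables. The SNP names and counts are invented, the 5e-8 cutoff is the conventional genome-wide threshold (assumed here), and real pipelines add quality control and covariate adjustment that this sketch omits.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, assumed for this sketch

def allelic_chi_square(risk_cases, other_cases, risk_controls, other_controls):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    a, b, c, d = risk_cases, other_cases, risk_controls, other_controls
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical allele counts per SNP:
# (risk in cases, other in cases, risk in controls, other in controls)
allele_counts = {
    "snp_hypothetical_1": (500, 500, 200, 800),
    "snp_hypothetical_2": (300, 700, 310, 690),
}

hits = []
for snp, counts in allele_counts.items():
    chi_square, p_value = allelic_chi_square(*counts)
    if p_value < GENOME_WIDE_ALPHA:
        hits.append(snp)
    print(f"{snp}: chi-square = {chi_square:.2f}, p = {p_value:.3g}")

print("genome-wide significant:", hits)
```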

 

The relationship between effect size and allele frequency

Relationship between effect size and allele frequency (adopted from [25, 26]).

  1. Extremely rare genetic variants with large effect sizes (upper left, strong red color) are often identified in family-based genome-wide linkage analyses.
  2. Common genetic variants with small effect sizes (lower right, strong green color) have been identified in traditional GWAS (including only common variants).
  3. Rare variants with small effects (lower left) are difficult to identify.
  4. Common genetic variants with large effects (upper right) have been identified using both linkage analysis and GWAS; however, these are highly unusual for common diseases.

 

Primary research strategies for identification of genetic variants across the allele frequency spectrum (adopted from [27]).

  1. Genome-wide linkage studies are well suited to identification of genetic variants with allele frequencies below 0.3 % with large effect sizes (OR > 5).
  2. Targeted resequencing often leads to identification of genetic variants with allele frequencies between 0.3 and 5 % with moderate effect sizes (2 < OR < 5), but may also be used to identify rare variants with large effects and common variants with modest effects.
  3. Traditional GWAS is suited to identification of common genetic variants with modest effect sizes (OR < 2)
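
The three ranges above can be written down as a small lookup, which makes the gaps between them visible. This is only a transcription of the list into code. The function name `primary_strategy` is hypothetical, and reading "common" as an allele frequency above 5% is my assumption based on the upper bound given for targeted resequencing.

```python
def primary_strategy(allele_freq, odds_ratio):
    """Map an (allele frequency, OR) pair to the strategy listed for that range.

    allele_freq is a fraction (0.003 means 0.3%).
    """
    if allele_freq < 0.003 and odds_ratio > 5:
        return "genome-wide linkage study"
    if 0.003 <= allele_freq <= 0.05 and 2 < odds_ratio < 5:
        return "targeted resequencing"
    # Assumption: "common" is read as allele frequency above 5%.
    if allele_freq > 0.05 and odds_ratio < 2:
        return "traditional GWAS"
    return "outside the three primary ranges"

# Made-up variants to exercise each branch.
examples = [(0.001, 8.0), (0.02, 3.0), (0.30, 1.2), (0.001, 1.1)]
for allele_freq, odds_ratio in examples:
    print(f"AF = {allele_freq:.3f}, OR = {odds_ratio}: {primary_strategy(allele_freq, odds_ratio)}")
```

The last example (a rare variant with a small effect) falls outside all three ranges, which matches the earlier point that such variants are difficult to identify.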

 

Jargon for showing off

post-genomic era

post-GWAS era

post-single cell era

 

 

References

  • Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems
  • Rare and common variants: twenty arguments
  • Genetic linkage analysis in the age of whole-genome sequencing
  • Benefits and limitations of genome-wide association studies
  • The Norwegian preeclampsia family cohort study: A new resource for investigating genetic aspects and heritability of preeclampsia and related phenotypes

Statistical power and significance testing in large-scale genetic studies - Pak Sham