zl程序教程

您现在的位置是:首页 >  其他

当前栏目

跟着存档教程动手学RNAseq分析(六):差异分析流程总结

流程教程 分析 总结 差异 跟着 动手 存档
2023-06-13 09:16:26 时间

跟着存档教程动手学RNAseq分析(五):DESeq2基因水平差异表达分析

我们详细介绍了差异表达分析工作流程中的各个步骤,并提供了理论和示例代码。为了给运行DGE分析所需的代码提供更简洁的参考,我们总结了如下分析中的步骤:

  1. 使用tximport导入Salmon的基因水平计数数据
# Run tximport
 txi <- tximport(files, type="salmon", tx2gene=t2g, countsFromAbundance = "lengthScaledTPM")
 
 # "files" is a vector wherein each element is the path to the salmon quant.sf file, and each element is named with the name of the sample.
 # "t2g" is a 2 column data frame which contains transcript IDs mapped to geneIDs (in that order)
  1. 创建dds对象:
 # Check that the row names of the metadata equal
 # the column names of the **raw counts** data
 all(colnames(txi$counts) == rownames(metadata))
 
 # Create DESeq2Dataset object
 dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ condition)
  1. 探索性数据分析(PCA和层次聚类)-识别数据中的异常值和变异来源:
 # Transform counts for data visualization
 rld <- rlog(dds, blind=TRUE)
 
 # Plot PCA 
 plotPCA(rld, intgroup="condition")
 
 # Extract the rlog matrix from the object
 # and compute pairwise correlation values
 rld_mat <- assay(rld)
 rld_cor <- cor(rld_mat)
 
 # Plot heatmap
 pheatmap(rld_cor, annotation = metadata)
  1. 运行DESeq2
# **Optional step** - Re-create DESeq2 dataset 
# if the design formula has changed after QC analysis
# in include other sources of variation 
# using "dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ covaraite + condition)"

# Run DESeq2 differential expression analysis
 dds <- DESeq(dds)

# **Optional step** - Output normalized counts 
# to save as a file to access outside RStudio 
# using "normalized_counts <- counts(dds, normalized=TRUE)"
  1. 检查离散估计的拟合:
plotDispEsts(dds)
  1. 创建对比,对特定条件下的收缩log2折叠变化进行Wald检验:
 # Output results of Wald test for contrast
 contrast <- c("condition", "level_to_compare", "base_level")
 res <- results(dds, contrast = contrast, alpha = 0.05)
 res <- lfcShrink(dds, contrast = contrast, res=res)
  1. 输出显著性结果:
 # Set thresholds
 padj.cutoff < - 0.05
 
 # Turn the results object into a tibble for use with tidyverse functions
 res_tbl <- res %>%
               data.frame() %>%
               rownames_to_column(var="gene") %>% 
               as_tibble()
 
 # Subset the significant results
 sig_res <- filter(res_tbl, padj < padj.cutoff)

8. 可视化结果:火山图、热图、top基因的标准化计数图等。

9. 进行分析,提取结果的功能意义:GO或KEGG富集,GSEA等。