zl程序教程

您现在的位置是:首页 >  工具

当前栏目

一行命令跑通RNAseq下游

命令 一行 RNAseq 跑通 下游
2023-06-13 09:17:01 时间

本期给大家介绍永和曾老师一起开发的R包"RNAseqStat"。

Github网址 https://xiayh17.top/RNAseqStat/index.html

简单来说"RNAseqStat"是用来对RNAseq上游分析得到的表达矩阵进行下游分析,只需要将表达矩阵整理成示例数据的格式,一行代码即可跑通下游分析

runAll(count_data = counts_input, group_list = group_list, 
       test_group = "T", control_group = "C",
       OrgDb = 'org.Hs.eg.db', dir = results_dir)

一共需要6个参数

  • count_data 处理好的表达矩阵
  • group_list 分组信息
  • test_group 处理组
  • control_group 对照组
  • OrgDb 数据库(需要提前下载好)
  • dir 输出目录

安装

可以通过CRAN和GitHub两种途径安装

CRAN

install.packages("RNAseqStat")

GitHub

install.packages("devtools")
devtools::install_github("xiayh17/RNAseqStat")

runALL演示

正如函数名,一步跑通整个流程

runALL包含五个步骤,

  1. 检查数据
  2. DEG分析
  3. GO分析
  4. KEGG分析
  5. GSEA分析

接下来我们先用示例数据进行演示

示例数据下载

示例表达矩阵

示例分组情况

library(RNAseqStat)
load(file = "counts_input.rda")
load(file = "group_list.rda")

# 安装org.Hs.eg.db
BiocManager::install("org.Hs.eg.db")
library(org.Hs.eg.db)

在安装并调用org.Hs.eg.db时很有可能遇到以下报错

Error: package or namespace load failed for ‘org.Hs.eg.db’:
 loadNamespace()里算'org.Hs.eg.db'时.onLoad失败了,详细内容:
  调用: l$contains
  错误: $ operator is invalid for atomic vectors

解决办法是先运行以下命令

options(connectionObserver = NULL)
library(org.Hs.eg.db)

准备好org.Hs.eg.db后,runALL即可出图

runAll(count_data = counts_input,
       group_list = group_list,
       test_group = "T", control_group = "C",
       OrgDb = "org.Hs.eg.db",
       dir ="results/")

runALL后结果

runALL实质

runAll <- function(count_data, group_list, OrgDb = 'org.Hs.eg.db', dir = ".",test_group = "T", control_group = "C") {

  message(glue("Step1: Check you data"))
  pre_check(counts_data = count_data, group_list = group_list, dir = dir)

  message(glue("Step2: DEG analysis"))
  deg_results <- deg_run(count_data, group_list, test_group = test_group, control_group = control_group,dir = dir)

  message(glue("Step3: EnrichGO analysis"))
  enrichGO_run(deg_results@deg_df_limma, x = "logFC", y = "P.Value",
               OrgDb = OrgDb, dir = dir,prefix = "3-EnrichGO-limma")
  enrichGO_run(deg_results@deg_df_edgeR, x = "logFC", y = "PValue",
               OrgDb = OrgDb, dir = dir,prefix = "3-EnrichGO-edgeR")
  enrichGO_run(deg_results@deg_df_DESeq2, x = "log2FoldChange", y = "pvalue",
               OrgDb = OrgDb, dir = dir,prefix = "3-EnrichGO-DESeq2")

  message(glue("Step4: EnrichKEGG analysis"))
  enrichKEGG_run(deg_results@deg_df_limma, x = "logFC", y = "P.Value",
                 OrgDb = OrgDb, dir = dir,prefix = "4-EnrichKEGG-limma")
  enrichKEGG_run(deg_results@deg_df_edgeR, x = "logFC", y = "PValue",
                 OrgDb = OrgDb, dir = dir,prefix = "4-EnrichKEGG-edgeR")
  enrichKEGG_run(deg_results@deg_df_DESeq2, x = "log2FoldChange", y = "pvalue",
                 OrgDb = OrgDb, dir = dir,prefix = "4-EnrichKEGG-DESeq2")

  message(glue("Step5: EnrichgseKEGG analysis"))
  enrichgesKEGG_run(deg_results@deg_df_limma, x = "logFC", OrgDb = OrgDb,
                    dir = dir,prefix = "5-EnrichgseKEGG-limma")
  enrichgesKEGG_run(deg_results@deg_df_edgeR, x = "logFC", OrgDb = OrgDb,
                    dir = dir,prefix = "5-EnrichgseKEGG-edgeR")
  enrichgesKEGG_run(deg_results@deg_df_DESeq2, x = "log2FoldChange", OrgDb = OrgDb,
                    dir = dir,prefix = "5-EnrichgseKEGG-DESeq2")

}

看完以上源代码,我们可以发现现在的runALL还只是5个步骤的简单加和,需要注意的是由于示例数据使用的是人的数据,如果要做别的物种/数据且需要做GO和KEGG,一定要拿到合适的Org.Db数据库。

逐步操作

除了runALL以外,RNAseqStat还提供分步的操作供大家使用。

1 Check_your_data

pre_check(counts_data = counts_input, group_list = group_list, dir = results_dir)

输出样本PCA图、相关性热图等。

2 DEG_analysis

deg_run(counts_input,group_list,test_group = "T", control_group = "C",dir = results_dir)

主要输出样本质控图、差异基因热图和火山图

热图

火山图

3 GO

enrichGO_run(DEG_df, 
  x = "log2FoldChange", 
  y = "pvalue", 
  dir = results_dir
)

输出GO富集到的通路

4 KEGG

enrichKEGG(gene,
  organism = "hsa",
  keyType = "kegg",
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe,
  minGSSize = 10,
  maxGSSize = 500,
  qvalueCutoff = 0.2,
  use_internal_data = FALSE
)

5 gseKEGG

gseKEGG(
  geneList,
  organism = "hsa",
  keyType = "kegg",
  exponent = 1,
  minGSSize = 10,
  maxGSSize = 500,
  eps = 1e-10,
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  verbose = TRUE,
  use_internal_data = FALSE,
  seed = FALSE,
  by = "fgsea",
  ...
)