zl程序教程

您现在的位置是:首页 >  其他

当前栏目

feature counts定量时报错failed to find the gene identifier

2023-03-07 09:44:39 时间

很久没遇到转录组的报错了,最近在曾老师3月的生信入门马拉松课里有个学员遇到了这样的疑问。

在转录组分析中,用feature counts进行定量时出现了这种报错

此图并非学员提问原图

比如我查看gtf的时候,其第9列明明包含了gene_id,但仍会报错。

早在曾老师那里实习时,我遇到过这个问题,归根结底是因为物种不常见,基因组和注释文件都不够完善。而且不同的人上传的gtf格式还是略有不同的,如果选中gene_id作为-g参数会报错,可能是gene_id存在空值,去掉就好了

查看gtf文件

cat genomic.gtf | grep gene_id\ \"\"\  # 四个\是转义符,将后面的字符转义为字符串
NC_001788.1     RefSeq  exon    1       71      .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Phe"; exon_number "1";
NC_001788.1     RefSeq  exon    72      1045    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "rRNA"; product "s-rRNA"; exon_number "1";
NC_001788.1     RefSeq  exon    1046    1112    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Val"; exon_number "1";
NC_001788.1     RefSeq  exon    1113    2692    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "rRNA"; product "l-rRNA"; exon_number "1";
NC_001788.1     RefSeq  exon    2693    2766    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; codons "2,3"; gbkey "tRNA"; product "tRNA-Leu"; exon_number "1";
NC_001788.1     RefSeq  exon    3725    3793    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Ile"; exon_number "1";
NC_001788.1     RefSeq  exon    3791    3863    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Gln"; exon_number "1";
NC_001788.1     RefSeq  exon    3866    3934    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Met"; exon_number "1";
NC_001788.1     RefSeq  exon    4974    5042    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Trp"; exon_number "1";
NC_001788.1     RefSeq  exon    5048    5116    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Ala"; exon_number "1";
NC_001788.1     RefSeq  exon    5118    5190    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Asn"; exon_number "1";
NC_001788.1     RefSeq  exon    5223    5288    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Cys"; exon_number "1";
NC_001788.1     RefSeq  exon    5289    5355    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Tyr"; exon_number "1";
NC_001788.1     RefSeq  exon    6899    6967    .       -       .       gene_id ""; transcript_id "unknown_transcript_1"; codons "4,5,6,7"; gbkey "tRNA"; product "tRNA-Ser"; exon_number "1";
NC_001788.1     RefSeq  exon    6976    7042    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Asp"; exon_number "1";
NC_001788.1     RefSeq  exon    7730    7798    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Lys"; exon_number "1";
NC_001788.1     RefSeq  exon    9425    9494    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Gly"; exon_number "1";
NC_001788.1     RefSeq  exon    9842    9910    .       +       .       gene_id ""; transcript_id "unknown_transcript_1"; gbkey "tRNA"; product "tRNA-Arg"; exon_number "1";

处理

通过查看我们发现,gtf文件的gene_id存在一些空值,去掉即可。

sed -i -e '/gene_id\ \"\"\;/d' genomic2.gtf

转录组分析是高度依赖基因组和注释文件的质量的,我们组早在14年测有一个不常见物种的转录组数据,一直到今年才开始分析,就是因为该物种的注释文件去年才发表。如果你觉得现有的注释文件不能满足你,以上我只提供了一种思路,具体的处理方法就见仁见智了。