zl程序教程


Linux Basics

Posted: 2023-06-13 09:17:49

Linux basics

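SSH carries the commands you type to the server.

SFTP transfers files to and from the server.

A server is essentially a remote computer, and most servers run Linux. Large data sets call for a well-equipped server; for example, upstream processing of NGS sequencing data in bioinformatics is normally done on one. Servers are usually accessed remotely through the command line rather than through a desktop. A key advantage of a Linux server is that many users can work on it at the same time.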

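Practical details

· In the Termius settings, enable copy-on-select and paste on right-click

· Type exit (shortcut: Ctrl+D) to log out of the server session in Termius; press the Up arrow key to log in again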

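Symbols for directories and paths

· . the current directory

· .. the parent directory (one level up)

· ~ the home directory: each user has their own

· / is the root directory only when it is at the very start of a path; anywhere else, / separates directory levels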
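A minimal sketch of the four symbols, assuming bash and a throwaway directory created with mktemp; project and data are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: the path symbols . .. ~ and / in a throwaway directory.
work=$(mktemp -d)                # absolute path such as /tmp/tmp.XXXXXXXXXX
mkdir -p "$work/project/data"    # the inner / characters separate directory levels

cd "$work/project/data" || exit 1
here=$(pwd)                      # .../project/data

cd .. || exit 1                  # ..  = parent directory
parent=$(pwd)                    # .../project

cd ./data || exit 1              # .   = current directory, so ./data is project/data
back=$(pwd)

home_dir=$(cd ~ && pwd)          # ~   = the current user's home directory ($HOME)

cd / || exit 1                   # a leading / is the root directory
root=$(pwd)                      # /

printf '%s\n' "$here" "$parent" "$back" "$home_dir" "$root"
rm -rf "$work"                   # clean up the scratch directory
```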

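Linux command format: command + options/parameters + file

command -option parameter, with the parts separated by spaces

· command: the command name, a word or abbreviation describing what it does

· []: in a command synopsis, brackets mark a part that can sometimes be omitted, e.g. command [-option] [parameter]

· -option: an option that controls how the command behaves; it can also be omitted

Options come in two forms: short, such as -h, and long, such as --help

· parameter: the arguments passed to the command; there can be zero, one or several

· FILE: the file to process (see Figure 1 for an example)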

Figure 1
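To make the format concrete, here is a small sketch using head on a throwaway file. The file contents are made up, and --lines is the GNU coreutils long spelling of -n, so it may be missing on other systems:

```shell
#!/usr/bin/env bash
# Sketch of "command -option parameter FILE" using head on a throwaway file.
tmp=$(mktemp)
printf 'line1\nline2\nline3\n' > "$tmp"

all=$(head "$tmp")               # command + FILE only: default behaviour (up to 10 lines)
short=$(head -n 2 "$tmp")        # short option -n, its parameter 2, then FILE
long=$(head --lines=2 "$tmp")    # the same option in long form (GNU coreutils)

printf '%s\n' "$short"           # line1, line2
rm -f "$tmp"
```

head --help lists every option the installed version accepts.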

Commands for managing files and directories

· pwd ls cd mkdir touch mv rm cp tar ln (see Figure 2)

Figure 2

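pwd: print the current working directory (see Figure 3)

Figure 3

ls: list directory contents (see Figures 4 and 5). ll -thr is a common way to get a detailed listing: ll is usually a shell alias for a long listing such as ls -l or ls -alF, -t sorts by modification time, -h prints human-readable sizes, and -r reverses the order so the newest files come last.

Figure 4
Figure 5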

* matches any number of characters (zero or more)

? matches exactly one character

Mar402 13:10:27 ~
$ ls *.txt
readme.txt
Mar402 13:10:50 ~
$ ls ??????.txt
readme.txt
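The same matching can be tried safely in a scratch directory. This sketch assumes bash, and the file names are hypothetical. It is the shell, not ls, that expands * and ?, so echo shows exactly which names a pattern selects:

```shell
#!/usr/bin/env bash
# Sketch: how * and ? expand, in a throwaway directory with hypothetical file names.
work=$(mktemp -d)
cd "$work" || exit 1
touch readme.txt a.txt notes.md data1.fq data22.fq

star=$(echo *.txt)        # * matches any number of characters (including none)
qmark=$(echo data?.fq)    # ? matches exactly one character, so data22.fq is excluded
six=$(echo ??????.txt)    # exactly six characters before .txt

echo "$star"              # a.txt readme.txt
echo "$qmark"             # data1.fq
echo "$six"               # readme.txt
cd / && rm -rf "$work"
```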

File attributes: ll -h (Figure 6)

Figure 6

File permissions (Figure 7)

Figure 7

cd: change directory (see Figure 8)

Figure 8

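Absolute and relative paths

· Absolute path: the full path, starting from the root directory /

· Relative path: a path relative to the current working directory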
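A minimal sketch of the difference, assuming bash and a throwaway directory; Data/readme.txt here is a made-up stand-in for the course files:

```shell
#!/usr/bin/env bash
# Sketch: reach one file by an absolute and by a relative path (hypothetical names).
work=$(mktemp -d)                    # mktemp prints an absolute path (starts with /)
mkdir -p "$work/Data"
echo "hello" > "$work/Data/readme.txt"

cd "$work" || exit 1
abs=$(cat "$work/Data/readme.txt")   # absolute: same result from any directory
rel=$(cat Data/readme.txt)           # relative: resolved against the current directory
cd Data || exit 1
rel2=$(cat ./readme.txt)             # now relative to $work/Data
up=$(ls ..)                          # .. climbs back to $work, which holds Data

printf '%s %s %s %s\n' "$abs" "$rel" "$rel2" "$up"
cd / && rm -rf "$work"
```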

Creating directories: mkdir (Figure 9)

Figure 9
Mar402 13:24:33 /home/t_linux
$ cd ~
Mar402 13:26:18 ~ 
$ ls
Data  Data.tar.gz  readme.txt
Mar402 13:26:19 ~
$ mkdir test     # create a directory named test
Mar402 13:27:13 ~
$ ls
Data  Data.tar.gz  readme.txt  test
Mar402 13:27:28 ~
$ mkdir -p test1/test2  # create test1 with test2 inside it; -p also creates any missing parent directories
Mar402 13:28:05 ~
$ tree            # view the directory tree
.
├── Data
│   ├── example.fa
│   ├── example.fq
│   ├── example_gene.gtf
│   ├── example.gtf
│   ├── Homo_sapiens.GRCh38.102.chromosome.Y.gff3.gz
│   ├── md5.txt
│   ├── readme.txt
│   ├── reads.1.fq.gz
│   └── reads.2.fq.gz
├── Data.tar.gz
├── readme.txt
├── test
└── test1
    └── test2

4 directories, 11 files

Creating an empty file: touch

Mar402 13:28:07 ~
$ touch file
Mar402 13:32:40 ~
$ tree
.
├── Data
│   ├── example.fa
│   ├── example.fq
│   ├── example_gene.gtf
│   ├── example.gtf
│   ├── Homo_sapiens.GRCh38.102.chromosome.Y.gff3.gz
│   ├── md5.txt
│   ├── readme.txt
│   ├── reads.1.fq.gz
│   └── reads.2.fq.gz
├── Data.tar.gz
├── file
├── readme.txt
├── test
└── test1
    └── test2

4 directories, 12 files

Moving and renaming files: mv (see Figures 10 and 11)

Figure 10
Figure 11
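A small sketch of both uses of mv, with hypothetical names in a scratch directory. Note that mv overwrites an existing target without asking; mv -i asks first:

```shell
#!/usr/bin/env bash
# Sketch: mv both renames and moves (hypothetical names in a throwaway directory).
work=$(mktemp -d)
cd "$work" || exit 1
touch file
mkdir dir

mv file file_renamed           # same directory, new name -> rename
mv file_renamed dir/           # an existing directory as target -> move
mv dir/file_renamed dir/final  # move and rename can be combined in one step

listing=$(ls dir)
echo "$listing"                # final
cd / && rm -rf "$work"
```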

Deleting files: rm (Figure 12)

Figure 12
Mar402 14:15:15 ~
$ ls
Data  Data.tar.gz  readme.txt  test  test1  test2
Mar402 14:24:29 ~
$ rm Data.tar.gz  # rm deletes a file, including an archive
Mar402 14:24:41 ~
$ ls
Data  readme.txt  test  test1  test2
Mar402 14:24:43 ~
$ rm -r test1      # add -r to delete a directory and its contents
Mar402 14:25:09 ~
$ ls
Data  readme.txt  test  test2
Mar402 14:25:10 ~
$ rm -r test2 
Mar402 14:26:04 ~
$ rm -i -r test   # -i asks for confirmation before removing
rm: remove directory 'test'? y
Mar402 14:26:48 ~
$ ls
Data  readme.txt

Copying files: cp

Mar402 14:31:58 ~
$ ls
Data  readme.txt  test1
Mar402 14:31:59 ~
$ cp readme.txt  test1 # copy readme.txt into test1
Mar402 14:32:14 ~
$ ls test1/   # test1 now contains readme.txt
readme.txt
Mar402 14:32:19 ~
$ cp readme.txt  test1/read # copy readme.txt into test1 under the new name read
Mar402 14:32:47 ~
$ ls test1/
read  readme.txt # read has the same content as readme.txt
Mar402 14:33:17 ~
$ tree
.
├── Data
│   ├── example.fa
│   ├── example.fq
│   ├── example_gene.gtf
│   ├── example.gtf
│   ├── Homo_sapiens.GRCh38.102.chromosome.Y.gff3.gz
│   ├── md5.txt
│   ├── readme.txt
│   ├── reads.1.fq.gz
│   └── reads.2.fq.gz
├── readme.txt
└── test1
    ├── read
    └── readme.txt

2 directories, 12 files
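Figure 13

Compressing and decompressing: tar (Figure 14)

tar -zxvf ARCHIVE: extract (x) a gzip-compressed (z) archive file (f), listing each file as it goes (v)

tar -zcvf ARCHIVE FILE...: create (c) a gzip-compressed archive from the listed files or directories

Figure 14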
Mar402 15:11:46 ~/mydir
$ ls
Data.tar.gz
Mar402 15:11:49 ~/mydir
$ tar  -zxvf  Data.tar.gz # extract
Data/
Data/reads.1.fq.gz
Data/example_gene.gtf
Data/example.fq
Data/example.fa
Data/reads.2.fq.gz
Data/example.gtf
Data/readme.txt
Data/md5.txt
Data/Homo_sapiens.GRCh38.102.chromosome.Y.gff3.gz
Mar402 15:11:54 ~/mydir
$ tar -zcvf test.tar.gz   # create an archive, but no files are listed
tar: Cowardly refusing to create an empty archive # tar needs the files or directories to pack after the archive name
Try 'tar --help' or 'tar --usage' for more information.
Mar402 15:16:04 ~/mydir
$ cd ../   # go back up one level
Mar402 15:16:10 ~
$ pwd
/trainee/Mar402
Mar402 15:16:14 ~
$ ls
Data  mydir  readme.txt  samtools-1.14.tar.bz2  test1 # the contents of this directory
Mar402 15:17:00 ~
$ tar -zcvf test.tar.gz  # press Tab to list the candidates, then add the names to pack
.bash_history          Data/                  .profile               test1/
.bashrc                .gnupg/                readme.txt             
.cache/                mydir/                 samtools-1.14.tar.bz2  
$ tar -zcvf test.tar.gz test1/ samtools-1.14.tar.bz2 readme.txt 
test1/
test1/readme.txt
test1/read
samtools-1.14.tar.bz2
readme.txt
Mar402 15:20:09 ~
$ ls      # the archive has been created
Data  mydir  readme.txt  samtools-1.14.tar.bz2  test1  test.tar.gz
Mar402 15:20:48 ~
$ ll -thr   # detailed listing of this directory
total 56K
drwxrwxr-x   2 Mar402 Mar402  4.0K Oct 25  2021 Data/
-rw-r--r--   1 Mar402 root     207 Mar 20 14:31 readme.txt
-rw-r--r--   1 Mar402 root     807 Mar 20 14:31 .profile
drwx------   3 Mar402 Mar402  4.0K Mar 20 19:51 .gnupg/
drwx------   2 Mar402 Mar402  4.0K Mar 20 19:51 .cache/
-rw-r--r--   1 Mar402 root    3.3K Mar 20 20:53 .bashrc
drwxrwxr-x   2 Mar402 Mar402  4.0K Mar 25 14:32 test1/
lrwxrwxrwx   1 Mar402 Mar402    35 Mar 25 14:50 samtools-1.14.tar.bz2 -> /home/t_linux/samtools-1.14.tar.bz2
drwxrwxr-x   3 Mar402 Mar402  4.0K Mar 25 15:11 mydir/
drwxr-xr-x   7 Mar402 trainee 4.0K Mar 25 15:19 ./
-rw-rw-r--   1 Mar402 Mar402   430 Mar 25 15:20 test.tar.gz
drwxr-xr-x 173 root   root     12K Mar 25 15:20 ../
-rw-------   1 Mar402 Mar402  3.8K Mar 25 15:20 .bash_history

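Other compression commands (Figure 15)

zip and unzip: compress and decompress *.zip files

gzip and gunzip: compress and decompress *.gz files

bzip2 and bunzip2: compress and decompress *.bz2 files

Figure 15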
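A minimal gzip/gunzip round trip, assuming bash and gzip are installed (zip and bzip2 work in a similar way but are not always installed by default); numbers.txt is a hypothetical file:

```shell
#!/usr/bin/env bash
# Sketch: gzip/gunzip round trip on a throwaway file.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1000 > numbers.txt
orig_sum=$(cksum < numbers.txt)     # checksum before compressing

gzip numbers.txt                    # replaces numbers.txt with numbers.txt.gz
after_gzip=$(ls)                    # numbers.txt.gz
zcat numbers.txt.gz | head -n 3     # peek at the content without unpacking to disk
gunzip numbers.txt.gz               # restores numbers.txt and removes the .gz

new_sum=$(cksum < numbers.txt)
[ "$orig_sum" = "$new_sum" ] && echo "round trip OK"
cd / && rm -rf "$work"
```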

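Packing vs. compressing: pack first, then compress

General-purpose extraction command: tar -xf FILE (GNU tar recognizes gzip, bzip2 and other compression formats on its own when extracting)

Packing (tar): combining many files or directories into a single file

Compressing: shrinking a large file into a smaller one with a compression algorithm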
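Running the two steps separately makes the difference visible. A sketch assuming bash and GNU tar (for the automatic format detection in tar -xf); project and out are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: pack, then compress, as two separate steps (hypothetical directory names).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p project/sub
echo "a" > project/a.txt
echo "b" > project/sub/b.txt

tar -cf project.tar project        # 1. pack: many files -> one .tar (not compressed)
gzip project.tar                   # 2. compress: project.tar -> project.tar.gz
# tar -zcvf project.tar.gz project would do both steps in one command

mkdir out
tar -xf project.tar.gz -C out      # GNU tar detects the gzip compression by itself
restored=$(cat out/project/sub/b.txt)
echo "$restored"                   # b
cd / && rm -rf "$work"
```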

Command summary (Figure 16)

Figure 16

Common Linux keyboard shortcuts (Figure 17)

Figure 17

Linux command manual: http://linux.51yip.com/

Linux book: https://wizardforcel.gitbooks.io/vbird-linux-basic-4e/content/

Viewing files: cat

Mar402 10:52:03 ~
$ ls
Da  Data  Miniconda3-latest-Linux-x86_64.sh  mydir  readme.txt
Mar402 11:04:22 ~
$ cat readme.txt  # print the whole file; check its size first, as a large file floods the screen
Welcome to Biotrainee() !
This is your personal account in our Cloud.
Have a fun with it.
Please feel free to contact with me( email to jmzeng1314@163.com )
(http://www.biotrainee.com/thread-1376-1-1.html)

Mar402 11:04:30 ~
$ cat -A readme.txt  # show non-printing characters too: $ marks the end of each line
Welcome to Biotrainee() !$
This is your personal account in our Cloud.$
Have a fun with it.$
Please feel free to contact with me( email to jmzeng1314@163.com )$
(http://www.biotrainee.com/thread-1376-1-1.html)$
$
Mar402 11:08:56 ~
$ cat -n readme.txt # number every line
     1  Welcome to Biotrainee() !
     2  This is your personal account in our Cloud.
     3  Have a fun with it.
     4  Please feel free to contact with me( email to jmzeng1314@163.com )
     5  (http://www.biotrainee.com/thread-1376-1-1.html)
     6
Mar402 11:09:08 ~
$ cat -b readme.txt # number only non-blank lines
     1  Welcome to Biotrainee() !
     2  This is your personal account in our Cloud.
     3  Have a fun with it.
     4  Please feel free to contact with me( email to jmzeng1314@163.com )
     5  (http://www.biotrainee.com/thread-1376-1-1.html)

Mar402 11:09:16 ~
$ ls
Da  Data  Miniconda3-latest-Linux-x86_64.sh  mydir  readme.txt
Mar402 11:11:27 ~
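$ cat > file # > redirects standard output (what would normally go to the screen) into file, creating or overwriting it; type the lines and press Ctrl+D to finish (Ctrl+C, used here, also ends input). A line cannot be corrected once Enter is pressed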
1
2
3
^C
Mar402 11:11:38 ~
$ cat file # view file
1
2
3
Mar402 11:11:45 ~
$ ls
Da  Data  file  Miniconda3-latest-Linux-x86_64.sh  mydir  readme.txt  # file now exists in the current directory
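Text typed into cat > file cannot be edited afterwards; a here-document in a script does the same job and can be corrected freely. A sketch, assuming bash; the file name and contents mirror the transcript above:

```shell
#!/usr/bin/env bash
# Sketch: create "file" from a here-document instead of typing into cat > file.
work=$(mktemp -d)
cd "$work" || exit 1

cat > file <<'EOF'
1
2
3
EOF

cat >> file <<'EOF'
sjdiaf
EOF
# >  creates or overwrites the file; >> appends to the end of it

content=$(cat file)
lines=$(wc -l < file)             # 4
cat -n file                       # numbered view, as in the transcript
cd / && rm -rf "$work"
```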

zcat: view a compressed (.gz) text file; tac: print a file with its lines in reverse order

Mar402 12:13:45 ~
$ zcat Data/reads.1.fq.gz
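A small sketch of both commands on a made-up three-line file, assuming GNU coreutils (tac is not available on every system) and gzip:

```shell
#!/usr/bin/env bash
# Sketch: tac and zcat on a made-up three-line file.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'first\nsecond\nthird\n' > small.txt
gzip -c small.txt > small.txt.gz      # -c writes to stdout, so small.txt is kept

reversed=$(tac small.txt)             # third, second, first
from_gz=$(zcat small.txt.gz)          # the original text, read without unpacking to disk
top=$(zcat small.txt.gz | head -n 1)  # pipe large files into head/less instead of the screen
printf '%s\n' "$reversed"
cd / && rm -rf "$work"
```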

head/tail -n N: view the first/last N lines of a file (10 by default)

Mar402 12:22:15 ~
$ head Data/example.fa
>gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT
Mar402 12:22:46 ~
$ head -2 Data/example.fa # first two lines (same as head -n 2)
>gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
Mar402 12:23:02 ~
$ head -1 Data/example.fa # first line
>gi|556503834|ref|NC_000913.3| Escherichia coli str. K-12 substr. MG1655, complete genome
Mar402 12:23:08 ~
$ head -1 Data/example.fa | rev # rev reverses the characters of each line
emoneg etelpmoc ,5561GM .rtsbus 21-K .rts iloc aihcirehcsE |3.319000_CN|fer|438305655|ig>
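The -n option, together with its tail counterpart, on numbers generated by seq. A sketch assuming GNU coreutils, where tail -n +K means "from line K to the end":

```shell
#!/usr/bin/env bash
# Sketch: head/tail -n on a small generated file (seq 20 stands in for a real data file).
work=$(mktemp -d)
cd "$work" || exit 1
seq 20 > nums.txt

first3=$(head -n 3 nums.txt)               # 1 2 3
last2=$(tail -n 2 nums.txt)                # 19 20
default=$(head nums.txt | wc -l)           # 10: the default line count
from15=$(tail -n +15 nums.txt)             # +15 means "starting at line 15": 15..20
middle=$(head -n 12 nums.txt | tail -n 3)  # lines 10-12
printf '%s\n' "$middle"
cd / && rm -rf "$work"
```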
# more: page through a file (Space: next page; Enter: next line)
$ more Data/example.fq
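# less: view a file (see Figure 18)

less -NS is the usual form: -N shows line numbers and -S keeps each long line on a single screen line instead of wrapping it. Type / followed by a keyword to search (n: next match, N: previous match); G jumps to the end and g (or gg) to the beginning; q quits. See Figure 19.

Figure 18
Figure 19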

Text statistics: wc

· wc -l counts lines

· wc -w counts words

· wc -c counts bytes

$ wc -l Data/example.fq Data/example.fa Data/example.gtf # wc -l can count the lines of several files at once
   4000 Data/example.fq
  64995 Data/example.fa
    237 Data/example.gtf
  69232 total
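To see exactly what each option counts, here is a sketch on a two-line made-up file; reading from standard input with < makes wc print only the number, without the file name:

```shell
#!/usr/bin/env bash
# Sketch: what wc -l, -w and -c count, on a made-up two-line file.
tmp=$(mktemp)
printf 'hello world\nbye\n' > "$tmp"

lines=$(wc -l < "$tmp")   # 2: number of newline characters
words=$(wc -w < "$tmp")   # 3: whitespace-separated words
bytes=$(wc -c < "$tmp")   # 16: 12 + 4 bytes, newlines included
echo "$lines $words $bytes"
rm -f "$tmp"
```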

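Cutting columns: cut

cut -d: set the field delimiter (tab by default); any single character, such as a letter, digit or symbol, can be used (Figure 22)

cut -f: choose which fields (columns) to output (see Figures 20 and 21)

Figure 20
Figure 21: cut cannot reorder the extracted columns; they always come out in their original order
Figure 22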
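A sketch on a tiny hypothetical tab-separated table, showing the default delimiter, a custom one, and the fixed column order noted under Figure 21:

```shell
#!/usr/bin/env bash
# Sketch: cut -f and -d on a tiny hypothetical tab-separated table.
tmp=$(mktemp)
printf 'chr1\tgene\t11869\nchr1\texon\t12010\n' > "$tmp"

col2=$(cut -f 2 "$tmp")                  # tab is the default delimiter: gene, exon
cols=$(cut -f 3,1 "$tmp")                # still prints column 1 before column 3
rest=$(echo "a:b:c" | cut -d ':' -f 2-)  # custom delimiter; 2- means field 2 onward: b:c
printf '%s\n' "$cols"
rm -f "$tmp"
```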

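Sorting: sort (see Figures 23 and 24)

Figure 23: -k specifies the field (key) to sort on
Figure 24
$ cat Data/example.gtf | sort -k 4 -n | column -t | less -SN  # sort numerically on column 4 (the start position); compare with Figure 24, see Figure 25
Figure 25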
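Why -n matters: a sketch on a made-up two-column table, where text sorting puts 100 before 20 and 9. It assumes GNU sort; -k 2,2 limits the key to column 2:

```shell
#!/usr/bin/env bash
# Sketch: text vs numeric sorting on column 2 of a made-up tab-separated table.
tmp=$(mktemp)
printf 'b\t100\na\t9\nc\t20\n' > "$tmp"

text_order=$(sort -k 2,2 "$tmp" | cut -f 1)      # b c a: "100" < "20" < "9" as text
num_order=$(sort -k 2,2 -n "$tmp" | cut -f 1)    # a c b: 9 < 20 < 100 as numbers
reverse=$(sort -k 2,2 -n -r "$tmp" | cut -f 1)   # b c a: -r reverses the numeric order
printf '%s\n' "$num_order"
rm -f "$tmp"
```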

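uniq: removes duplicate lines, but only adjacent ones, so it is normally used after sort

uniq -c: prefixes each line with the number of consecutive times it occurs (Figure 26)

Figure 26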
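A sketch showing why sort comes first, using made-up feature names (sort -u is a shortcut for sort | uniq):

```shell
#!/usr/bin/env bash
# Sketch: uniq only collapses *adjacent* duplicates, so sort first.
data=$(printf 'exon\nCDS\nexon\nexon\ngene\n')

unsorted=$(printf '%s\n' "$data" | uniq | wc -l)       # 4: the first exon is not next to the others
sorted=$(printf '%s\n' "$data" | sort | uniq | wc -l)  # 3 distinct values
counts=$(printf '%s\n' "$data" | sort | uniq -c)       # one count per value, e.g. "3 exon"
printf '%s\n' "$counts"
```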

Merging text: paste

· paste -d: join with the specified delimiter (tab by default)

· paste -s: join all lines of each file into a single line

Mar402 15:53:58 ~
$ cat file # the file named file
1
2
3
sjdiaf
Mar402 15:54:03 ~
$ cat > file2 # create file2
asdfg
edcvfr
ikmnju
^C
Mar402 15:54:29 ~
$ cat file file2 # cat joins the files vertically, one after the other
1
2
3
sjdiaf
asdfg
edcvfr
ikmnju
Mar402 15:54:54 ~
$ cat file file2 > file3 # save the vertical join as file3
Mar402 15:55:07 ~
$ cat file3
1
2
3
sjdiaf
asdfg
edcvfr
ikmnju
Mar402 15:55:20 ~ 
$ paste file file2 # paste joins them side by side, line by line
1       asdfg
2       edcvfr
3       ikmnju
sjdiaf
Mar402 16:51:31 ~
$ paste -s file file2 # paste -s turns each file into a single row
1       2       3       sjdiaf
asdfg   edcvfr  ikmnju
# A second use of paste
Mar402 15:55:54 ~
$ seq 20 # seq prints the numbers 1 to 20
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Mar402 16:00:51 ~
$ seq 20 | paste - - - - # each - reads the next line of standard input, so four of them build a 4-column matrix
1       2       3       4
5       6       7       8
9       10      11      12
13      14      15      16
17      18      19      20

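tr: translate characters

· tr 'SET1' 'SET2': replace each character in SET1 with the matching character in SET2

· tr -d: delete the specified characters

· tr -s: squeeze each run of a repeated character into a single one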
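The three forms side by side, on a literal string so the results are easy to check. In the transcript below, the brackets in '[a-z]' '[A-Z]' are harmless, since tr maps [ to [ and ] to ]:

```shell
#!/usr/bin/env bash
# Sketch: the three tr forms on a literal string.
s='aa  bb   cc'

upper=$(printf '%s' "$s" | tr 'a-z' 'A-Z')   # translate a-z to A-Z: "AA  BB   CC"
nospace=$(printf '%s' "$s" | tr -d ' ')      # delete every space: "aabbcc"
squeezed=$(printf '%s' "$s" | tr -s ' ')     # squeeze runs of spaces: "aa bb cc"
tabs=$(printf '%s' "$s" | tr -s ' ' '\t')    # translate spaces to tabs, then squeeze the tabs
printf '%s\n' "$squeezed"
```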

Mar402 16:01:03 ~
$ cat readme.txt | tr '[a-z]' '[A-Z]' # convert lowercase a-z to uppercase
WELCOME TO BIOTRAINEE() !
THIS IS YOUR PERSONAL ACCOUNT IN OUR CLOUD.
HAVE A FUN WITH IT.
PLEASE FEEL FREE TO CONTACT WITH ME( EMAIL TO JMZENG1314@163.COM )
(HTTP://WWW.BIOTRAINEE.COM/THREAD-1376-1-1.HTML)
Mar402 16:20:59 ~
$ cat readme.txt | tr 'a' 'A' # replace a with A
Welcome to BiotrAinee() !
This is your personAl Account in our Cloud.
HAve A fun with it.
PleAse feel free to contAct with me( emAil to jmzeng1314@163.com )
(http://www.biotrAinee.com/threAd-1376-1-1.html)
Mar402 16:26:35 ~
$ cat readme.txt  | tr ' ' '\t' # replace spaces with tabs (\t)
Welcome to      Biotrainee()    !
This    is      your    personal        account in      our     Cloud.
Have    a       fun     with    it.
Please  feel    free    to      contact with    me(     email   to  jmzeng1314@163.com       )
(http://www.biotrainee.com/thread-1376-1-1.html)
Mar402 16:27:22 ~
$ cat readme.txt  | tr ' ' '\t' | cat -A  # cat -A makes each tab visible as ^I
Welcome^Ito^IBiotrainee()^I!$
This^Iis^Iyour^Ipersonal^Iaccount^Iin^Iour^ICloud.$
Have^Ia^Ifun^Iwith^Iit.$
Please^Ifeel^Ifree^Ito^Icontact^Iwith^Ime(^Iemail^Ito^Ijmzeng1314@163.com^I)$
(http://www.biotrainee.com/thread-1376-1-1.html)$
$
Mar402 16:27:28 ~
$ cat readme.txt | cat -A 
Welcome to Biotrainee() !$
This is your personal account in our Cloud.$
Have a fun with it.$
Please feel free to contact with me( email to jmzeng1314@163.com )$
(http://www.biotrainee.com/thread-1376-1-1.html)$
$
Mar402 16:28:03 ~
$ cat readme.txt | tr -d ' ' # delete spaces
WelcometoBiotrainee()!
ThisisyourpersonalaccountinourCloud.
Haveafunwithit.
Pleasefeelfreetocontactwithme(emailtojmzeng1314@163.com)
(http://www.biotrainee.com/thread-1376-1-1.html)
Mar402 16:29:04 ~
$ cat readme.txt | tr -d '\n' # delete newlines, joining everything into one line
Welcome to Biotrainee() !This is your personal account in our Cloud.Have a fun with it.Please feel free to contact with me( email to jmzeng1314@163.com )(http://www.biotrainee.com/thread-1376-1-1.html)
Mar402 16:29:23 ~
$ cat Data/example.gtf | head -20 | cut -f 3 | sort | uniq -c # uniq -c pads the counts with leading spaces
      1 CDS
     10 exon
      1 gene
      1 start_codon
      2 transcript
      5 UTR
Mar402 16:38:41 ~
$ cat Data/example.gtf | head -20 | cut -f 3 | sort | uniq -c | cut -d ' ' -f 6 # field 6 when splitting on single spaces: the amount of padding differs between rows, so the fields do not line up

10




Mar402 16:39:04 ~
$ cat Data/example.gtf | head -20 | cut -f 3 | sort | uniq -c | cut -d ' ' -f 7
1
exon
1
1
2
5
Mar402 16:39:11 ~
$ cat Data/example.gtf | head -20 | cut -f 3 | sort | uniq -c |tr -s ' ' # tr -s squeezes each run of spaces into one
 1 CDS
 10 exon
 1 gene
 1 start_codon
 2 transcript
 5 UTR
Mar402 16:39:39 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' # goal: add up the counts in field 2 (field 1 is empty because of the leading space)
 29 CDS
 111 exon
 20 gene
 7 start_codon
 9 stop_codon
 34 transcript
 27 UTR
Mar402 16:49:04 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' | cut -d ' ' -f 2 # extract field 2, using a space as the delimiter
29
111
20
7
9
34
27
Mar402 16:50:33 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' | cut -d ' ' -f 2 | paste -s # paste -s joins the lines into one row
29      111     20      7       9       34      27
Mar402 16:53:37 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' | cut -d ' ' -f 2 | paste -s -d ':'  # -d ':' joins them with colons
29:111:20:7:9:34:27
Mar402 16:53:53 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' | cut -d ' ' -f 2 | paste -s -d '+'
29+111+20+7+9+34+27
Mar402 16:53:58 ~
$ cat Data/example.gtf | cut -f 3 | sort | uniq -c | tr -s ' ' | cut -d ' ' -f 2 | paste -s -d '+' | bc # bc evaluates the sum; 237 matches the line count of example.gtf reported by wc -l
237

Summary of file-viewing commands (Figure 27)

Figure 27

Exercise 7
Mar402 12:52:01 ~
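$ less Data/example.gtf | wc # lines, words and bytes; when its output is a pipe, less just passes the file through, so wc Data/example.gtf gives the same counts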
    237    6944   77781
Mar402 12:52:17 ~
$ cat Data/example.gtf | cut -f 9 | head
gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1-201"; level 3; havana_gene "OTTHUMG00000000961";
gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1-201"; level 3; havana_gene "OTTHUMG00000000961";
gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1-201"; level 3; havana_gene "OTTHUMG00000000961";
gene_id "ENSG00000223972"; transcript_id "ENSG00000223972"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1"; level 2; havana_gene "OTTHUMG00000000961";
gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "RP11-34P13-001"; level 2; havana_gene "OTTHUMG00000000961"; havana_transcript "OTTHUMT00000002844"; ont "PGO:0000005";
gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "RP11-34P13-001"; level 2; havana_gene "OTTHUMG00000000961"; havana_transcript "OTTHUMT00000002844"; ont "PGO:0000005";
gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "RP11-34P13-001"; level 2; havana_gene "OTTHUMG00000000961"; havana_transcript "OTTHUMT00000002844"; ont "PGO:0000005";
gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "unprocessed_pseudogene"; transcript_status "KNOWN"; transcript_name "RP11-34P13-001"; level 2; havana_gene "OTTHUMG00000000961"; havana_transcript "OTTHUMT00000002844"; ont "PGO:0000005";
gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1-201"; level 3; havana_gene "OTTHUMG00000000961";
gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "RP11-34P13.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "RP11-34P13.1-201"; level 3; havana_gene "OTTHUMG00000000961";
Mar402 12:52:24 ~
$ cat Data/example.gtf | cut -f 9 | cut -d ';' -f 1 | head
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
gene_id "ENSG00000223972"
Mar402 12:53:26 ~
$ cat Data/example.gtf | cut -f 9 | cut -d ';' -f 1 | sort | uniq -c
      8 gene_id "ENSG00000177693"
     15 gene_id "ENSG00000184731"
      3 gene_id "ENSG00000221311"
      3 gene_id "ENSG00000222623"
     19 gene_id "ENSG00000223972"
      4 gene_id "ENSG00000227061"
     83 gene_id "ENSG00000227232"
      8 gene_id "ENSG00000233004"
      3 gene_id "ENSG00000233750"
     15 gene_id "ENSG00000237613"
     12 gene_id "ENSG00000237683"
     18 gene_id "ENSG00000238009"
     12 gene_id "ENSG00000239368"
      4 gene_id "ENSG00000239906"
      4 gene_id "ENSG00000239945"
      3 gene_id "ENSG00000240361"
      3 gene_id "ENSG00000240786"
      4 gene_id "ENSG00000241599"
      8 gene_id "ENSG00000241860"
      8 gene_id "ENSG00000243485"
Mar402 12:53:46 ~
$ cat Data/example.gtf | cut -f 9 | cut -d ';' -f 1 | sort | uniq -c | tr -s ' ' '\t'
        8       gene_id "ENSG00000177693"
        15      gene_id "ENSG00000184731"
        3       gene_id "ENSG00000221311"
        3       gene_id "ENSG00000222623"
        19      gene_id "ENSG00000223972"
        4       gene_id "ENSG00000227061"
        83      gene_id "ENSG00000227232"
        8       gene_id "ENSG00000233004"
        3       gene_id "ENSG00000233750"
        15      gene_id "ENSG00000237613"
        12      gene_id "ENSG00000237683"
        18      gene_id "ENSG00000238009"
        12      gene_id "ENSG00000239368"
        4       gene_id "ENSG00000239906"
        4       gene_id "ENSG00000239945"
        3       gene_id "ENSG00000240361"
        3       gene_id "ENSG00000240786"
        4       gene_id "ENSG00000241599"
        8       gene_id "ENSG00000241860"
        8       gene_id "ENSG00000243485"
Exercise 7, part 2
Mar402 19:18:50 ~
$ cat Data/md5.txt # irregular layout: several MD5 values and file names share one line, separated by ;
fastq_md5       fastq_aspera
d57df747bc142e9850074d512ab9d6db;3331c6a9e0183ff9d398a3292dd45f66       SRR1039508_1.fastq.gz;SRR1039508_2.fastq.gz
49400c5685f36f830a277a59004b119d;ab4410a432cc18c1b9f10f93634e5310       SRR1039509_1.fastq.gz;SRR1039509_2.fastq.gz
d2c2d92c67c943648fdde6c70bc0d920;3e4223e08b97f37f3da17d686739e75c       SRR1039510_1.fastq.gz;SRR1039510_2.fastq.gz
4073b1519608c24c0c1119b580dfd9eb;2fcb23d5fb63e322d80cd3cab75faa0b       SRR1039511_1.fastq.gz;SRR1039511_2.fastq.gz
a35f30576f25ea548c7b3a28895a81cf;83bbe3c587d9477938826ea19c53a281       SRR1039512_1.fastq.gz;SRR1039512_2.fastq.gz
b3073b5b057f24208ac1853fdd4b5875;945cb34259d6dbf0362fe9018f769de4;ecb43490d03c9b325352e70488d58611      SRR1039513.fastq.gz;SRR1039513_1.fastq.gz;SRR1039513_2.fastq.gz
ae35fe0ce13badacc48c65717e811528;9ef4fe59d6378c513f933e24d12f6047       SRR1039514_1.fastq.gz;SRR1039514_2.fastq.gz
929b988eb5730eba77aeac98bf8be35f;c674d2ea79835165828b37258abbc925;5640a85f2c181d4886e905e74a32f041      SRR1039515.fastq.gz;SRR1039515_1.fastq.gz;SRR1039515_2.fastq.gz
8f97b3dc8170ecd6fffb39101c3e5bf5;2c4d2ba3b812f14bce25966c98b5b5df;8599c02799338b9514e8d0077a8409e4      SRR1039516.fastq.gz;SRR1039516_1.fastq.gz;SRR1039516_2.fastq.gz
1f2796f07033ec3bfab0981bd0674bb9;008ba2b3b589d553e3e9f8890d5481c2       SRR1039517_1.fastq.gz;SRR1039517_2.fastq.gz
64d1444ad727f48066aeb6ad314d9190;a24eea863bdca0284591fcd5eb076a93       SRR1039518_1.fastq.gz;SRR1039518_2.fastq.gz
f11f41c013ffaf3a031c9836ce81e6ef;9283f111ef774248f6f666e4bf2b1f81;9bcb6c9675631b1dcb8b07f6916d546c      SRR1039519.fastq.gz;SRR1039519_1.fastq.gz;SRR1039519_2.fastq.gz
d8251c87ba3c803d4344c2b24c77b19d;ca8e0014e7ba56982adc37439cea0755;62838f21e66ec78030b51ee6019420ef      SRR1039520.fastq.gz;SRR1039520_1.fastq.gz;SRR1039520_2.fastq.gz
637e08d030778c6581731647f3c3d8cc;4be82ad33d7d4990bed3c4bc701dc070;435aa5e48ba77e4c42218930a0be0de1      SRR1039521.fastq.gz;SRR1039521_1.fastq.gz;SRR1039521_2.fastq.gz
789e86036c81a85d2c1f014f79822d64;54c572cead4074b126f0b81b344af1be;c461a163b72a71efb4027045e6b4d2f6      SRR1039522.fastq.gz;SRR1039522_1.fastq.gz;SRR1039522_2.fastq.gz
ae33f7f6d536d020a2562b8be6e9cc33;083213dc45820db2eb62d66b89e77ce9       SRR1039523_1.fastq.gz;SRR1039523_2.fastq.gz
Mar402 20:04:38 ~
$ cat Data/md5.txt | cut -f 1 # first extract column 1
fastq_md5
d57df747bc142e9850074d512ab9d6db;3331c6a9e0183ff9d398a3292dd45f66
49400c5685f36f830a277a59004b119d;ab4410a432cc18c1b9f10f93634e5310
d2c2d92c67c943648fdde6c70bc0d920;3e4223e08b97f37f3da17d686739e75c
4073b1519608c24c0c1119b580dfd9eb;2fcb23d5fb63e322d80cd3cab75faa0b
a35f30576f25ea548c7b3a28895a81cf;83bbe3c587d9477938826ea19c53a281
b3073b5b057f24208ac1853fdd4b5875;945cb34259d6dbf0362fe9018f769de4;ecb43490d03c9b325352e70488d58611
ae35fe0ce13badacc48c65717e811528;9ef4fe59d6378c513f933e24d12f6047
929b988eb5730eba77aeac98bf8be35f;c674d2ea79835165828b37258abbc925;5640a85f2c181d4886e905e74a32f041
8f97b3dc8170ecd6fffb39101c3e5bf5;2c4d2ba3b812f14bce25966c98b5b5df;8599c02799338b9514e8d0077a8409e4
1f2796f07033ec3bfab0981bd0674bb9;008ba2b3b589d553e3e9f8890d5481c2
64d1444ad727f48066aeb6ad314d9190;a24eea863bdca0284591fcd5eb076a93
f11f41c013ffaf3a031c9836ce81e6ef;9283f111ef774248f6f666e4bf2b1f81;9bcb6c9675631b1dcb8b07f6916d546c
d8251c87ba3c803d4344c2b24c77b19d;ca8e0014e7ba56982adc37439cea0755;62838f21e66ec78030b51ee6019420ef
637e08d030778c6581731647f3c3d8cc;4be82ad33d7d4990bed3c4bc701dc070;435aa5e48ba77e4c42218930a0be0de1
789e86036c81a85d2c1f014f79822d64;54c572cead4074b126f0b81b344af1be;c461a163b72a71efb4027045e6b4d2f6
ae33f7f6d536d020a2562b8be6e9cc33;083213dc45820db2eb62d66b89e77ce9
Mar402 20:08:31 ~
$ cat Data/md5.txt | cut -f 1 | tr ';' '\n' # turn each ; into a newline
fastq_md5
d57df747bc142e9850074d512ab9d6db
3331c6a9e0183ff9d398a3292dd45f66
49400c5685f36f830a277a59004b119d
ab4410a432cc18c1b9f10f93634e5310
d2c2d92c67c943648fdde6c70bc0d920
3e4223e08b97f37f3da17d686739e75c
4073b1519608c24c0c1119b580dfd9eb
2fcb23d5fb63e322d80cd3cab75faa0b
a35f30576f25ea548c7b3a28895a81cf
83bbe3c587d9477938826ea19c53a281
b3073b5b057f24208ac1853fdd4b5875
945cb34259d6dbf0362fe9018f769de4
ecb43490d03c9b325352e70488d58611
ae35fe0ce13badacc48c65717e811528
9ef4fe59d6378c513f933e24d12f6047
929b988eb5730eba77aeac98bf8be35f
c674d2ea79835165828b37258abbc925
5640a85f2c181d4886e905e74a32f041
8f97b3dc8170ecd6fffb39101c3e5bf5
2c4d2ba3b812f14bce25966c98b5b5df
8599c02799338b9514e8d0077a8409e4
1f2796f07033ec3bfab0981bd0674bb9
008ba2b3b589d553e3e9f8890d5481c2
64d1444ad727f48066aeb6ad314d9190
a24eea863bdca0284591fcd5eb076a93
f11f41c013ffaf3a031c9836ce81e6ef
9283f111ef774248f6f666e4bf2b1f81
9bcb6c9675631b1dcb8b07f6916d546c
d8251c87ba3c803d4344c2b24c77b19d
ca8e0014e7ba56982adc37439cea0755
62838f21e66ec78030b51ee6019420ef
637e08d030778c6581731647f3c3d8cc
4be82ad33d7d4990bed3c4bc701dc070
435aa5e48ba77e4c42218930a0be0de1
789e86036c81a85d2c1f014f79822d64
54c572cead4074b126f0b81b344af1be
c461a163b72a71efb4027045e6b4d2f6
ae33f7f6d536d020a2562b8be6e9cc33
083213dc45820db2eb62d66b89e77ce9
Mar402 20:09:07 ~
$ cat Data/md5.txt | cut -f 1 | tr ';' '\n' > tmp1 # save as tmp1
Mar402 20:09:41 ~
$ cat Data/md5.txt | cut -f 2 | tr ';' '\n' > tmp2  # split column 2 the same way and save as tmp2
$ paste tmp1 tmp2 # paste tmp1 and tmp2 side by side
fastq_md5       fastq_aspera
d57df747bc142e9850074d512ab9d6db        SRR1039508_1.fastq.gz
3331c6a9e0183ff9d398a3292dd45f66        SRR1039508_2.fastq.gz
49400c5685f36f830a277a59004b119d        SRR1039509_1.fastq.gz
ab4410a432cc18c1b9f10f93634e5310        SRR1039509_2.fastq.gz
d2c2d92c67c943648fdde6c70bc0d920        SRR1039510_1.fastq.gz
3e4223e08b97f37f3da17d686739e75c        SRR1039510_2.fastq.gz
4073b1519608c24c0c1119b580dfd9eb        SRR1039511_1.fastq.gz
2fcb23d5fb63e322d80cd3cab75faa0b        SRR1039511_2.fastq.gz
a35f30576f25ea548c7b3a28895a81cf        SRR1039512_1.fastq.gz
83bbe3c587d9477938826ea19c53a281        SRR1039512_2.fastq.gz
b3073b5b057f24208ac1853fdd4b5875        SRR1039513.fastq.gz
945cb34259d6dbf0362fe9018f769de4        SRR1039513_1.fastq.gz
ecb43490d03c9b325352e70488d58611        SRR1039513_2.fastq.gz
ae35fe0ce13badacc48c65717e811528        SRR1039514_1.fastq.gz
9ef4fe59d6378c513f933e24d12f6047        SRR1039514_2.fastq.gz
929b988eb5730eba77aeac98bf8be35f        SRR1039515.fastq.gz
c674d2ea79835165828b37258abbc925        SRR1039515_1.fastq.gz
5640a85f2c181d4886e905e74a32f041        SRR1039515_2.fastq.gz
8f97b3dc8170ecd6fffb39101c3e5bf5        SRR1039516.fastq.gz
2c4d2ba3b812f14bce25966c98b5b5df        SRR1039516_1.fastq.gz
8599c02799338b9514e8d0077a8409e4        SRR1039516_2.fastq.gz
1f2796f07033ec3bfab0981bd0674bb9        SRR1039517_1.fastq.gz
008ba2b3b589d553e3e9f8890d5481c2        SRR1039517_2.fastq.gz
64d1444ad727f48066aeb6ad314d9190        SRR1039518_1.fastq.gz
a24eea863bdca0284591fcd5eb076a93        SRR1039518_2.fastq.gz
f11f41c013ffaf3a031c9836ce81e6ef        SRR1039519.fastq.gz
9283f111ef774248f6f666e4bf2b1f81        SRR1039519_1.fastq.gz
9bcb6c9675631b1dcb8b07f6916d546c        SRR1039519_2.fastq.gz
d8251c87ba3c803d4344c2b24c77b19d        SRR1039520.fastq.gz
ca8e0014e7ba56982adc37439cea0755        SRR1039520_1.fastq.gz
62838f21e66ec78030b51ee6019420ef        SRR1039520_2.fastq.gz
637e08d030778c6581731647f3c3d8cc        SRR1039521.fastq.gz
4be82ad33d7d4990bed3c4bc701dc070        SRR1039521_1.fastq.gz
435aa5e48ba77e4c42218930a0be0de1        SRR1039521_2.fastq.gz
789e86036c81a85d2c1f014f79822d64        SRR1039522.fastq.gz
54c572cead4074b126f0b81b344af1be        SRR1039522_1.fastq.gz
c461a163b72a71efb4027045e6b4d2f6        SRR1039522_2.fastq.gz
ae33f7f6d536d020a2562b8be6e9cc33        SRR1039523_1.fastq.gz
083213dc45820db2eb62d66b89e77ce9        SRR1039523_2.fastq.gz
Mar402 20:10:57 ~
$ paste tmp1 tmp2 > tmp3 # save the pasted result as tmp3
Mar402 20:11:27 ~
$ mv tmp3 md5 # rename tmp3 to md5
Mar402 20:11:49 ~
$ cat md5 # view the result
fastq_md5       fastq_aspera
d57df747bc142e9850074d512ab9d6db        SRR1039508_1.fastq.gz
3331c6a9e0183ff9d398a3292dd45f66        SRR1039508_2.fastq.gz
49400c5685f36f830a277a59004b119d        SRR1039509_1.fastq.gz
ab4410a432cc18c1b9f10f93634e5310        SRR1039509_2.fastq.gz
d2c2d92c67c943648fdde6c70bc0d920        SRR1039510_1.fastq.gz
3e4223e08b97f37f3da17d686739e75c        SRR1039510_2.fastq.gz
4073b1519608c24c0c1119b580dfd9eb        SRR1039511_1.fastq.gz
2fcb23d5fb63e322d80cd3cab75faa0b        SRR1039511_2.fastq.gz
a35f30576f25ea548c7b3a28895a81cf        SRR1039512_1.fastq.gz
83bbe3c587d9477938826ea19c53a281        SRR1039512_2.fastq.gz
b3073b5b057f24208ac1853fdd4b5875        SRR1039513.fastq.gz
945cb34259d6dbf0362fe9018f769de4        SRR1039513_1.fastq.gz
ecb43490d03c9b325352e70488d58611        SRR1039513_2.fastq.gz
ae35fe0ce13badacc48c65717e811528        SRR1039514_1.fastq.gz
9ef4fe59d6378c513f933e24d12f6047        SRR1039514_2.fastq.gz
929b988eb5730eba77aeac98bf8be35f        SRR1039515.fastq.gz
c674d2ea79835165828b37258abbc925        SRR1039515_1.fastq.gz
5640a85f2c181d4886e905e74a32f041        SRR1039515_2.fastq.gz
8f97b3dc8170ecd6fffb39101c3e5bf5        SRR1039516.fastq.gz
2c4d2ba3b812f14bce25966c98b5b5df        SRR1039516_1.fastq.gz
8599c02799338b9514e8d0077a8409e4        SRR1039516_2.fastq.gz
1f2796f07033ec3bfab0981bd0674bb9        SRR1039517_1.fastq.gz
008ba2b3b589d553e3e9f8890d5481c2        SRR1039517_2.fastq.gz
64d1444ad727f48066aeb6ad314d9190        SRR1039518_1.fastq.gz
a24eea863bdca0284591fcd5eb076a93        SRR1039518_2.fastq.gz
f11f41c013ffaf3a031c9836ce81e6ef        SRR1039519.fastq.gz
9283f111ef774248f6f666e4bf2b1f81        SRR1039519_1.fastq.gz
9bcb6c9675631b1dcb8b07f6916d546c        SRR1039519_2.fastq.gz
d8251c87ba3c803d4344c2b24c77b19d        SRR1039520.fastq.gz
ca8e0014e7ba56982adc37439cea0755        SRR1039520_1.fastq.gz
62838f21e66ec78030b51ee6019420ef        SRR1039520_2.fastq.gz
637e08d030778c6581731647f3c3d8cc        SRR1039521.fastq.gz
4be82ad33d7d4990bed3c4bc701dc070        SRR1039521_1.fastq.gz
435aa5e48ba77e4c42218930a0be0de1        SRR1039521_2.fastq.gz
789e86036c81a85d2c1f014f79822d64        SRR1039522.fastq.gz
54c572cead4074b126f0b81b344af1be        SRR1039522_1.fastq.gz
c461a163b72a71efb4027045e6b4d2f6        SRR1039522_2.fastq.gz
ae33f7f6d536d020a2562b8be6e9cc33        SRR1039523_1.fastq.gz
083213dc45820db2eb62d66b89e77ce9        SRR1039523_2.fastq.gz
Mar402 20:11:56 ~
$ cat Data/md5.txt # the original file, for comparison
fastq_md5       fastq_aspera
d57df747bc142e9850074d512ab9d6db;3331c6a9e0183ff9d398a3292dd45f66       SRR1039508_1.fastq.gz;SRR1039508_2.fastq.gz
49400c5685f36f830a277a59004b119d;ab4410a432cc18c1b9f10f93634e5310       SRR1039509_1.fastq.gz;SRR1039509_2.fastq.gz
d2c2d92c67c943648fdde6c70bc0d920;3e4223e08b97f37f3da17d686739e75c       SRR1039510_1.fastq.gz;SRR1039510_2.fastq.gz
4073b1519608c24c0c1119b580dfd9eb;2fcb23d5fb63e322d80cd3cab75faa0b       SRR1039511_1.fastq.gz;SRR1039511_2.fastq.gz
a35f30576f25ea548c7b3a28895a81cf;83bbe3c587d9477938826ea19c53a281       SRR1039512_1.fastq.gz;SRR1039512_2.fastq.gz
b3073b5b057f24208ac1853fdd4b5875;945cb34259d6dbf0362fe9018f769de4;ecb43490d03c9b325352e70488d58611      SRR1039513.fastq.gz;SRR1039513_1.fastq.gz;SRR1039513_2.fastq.gz
ae35fe0ce13badacc48c65717e811528;9ef4fe59d6378c513f933e24d12f6047       SRR1039514_1.fastq.gz;SRR1039514_2.fastq.gz
929b988eb5730eba77aeac98bf8be35f;c674d2ea79835165828b37258abbc925;5640a85f2c181d4886e905e74a32f041      SRR1039515.fastq.gz;SRR1039515_1.fastq.gz;SRR1039515_2.fastq.gz
8f97b3dc8170ecd6fffb39101c3e5bf5;2c4d2ba3b812f14bce25966c98b5b5df;8599c02799338b9514e8d0077a8409e4      SRR1039516.fastq.gz;SRR1039516_1.fastq.gz;SRR1039516_2.fastq.gz
1f2796f07033ec3bfab0981bd0674bb9;008ba2b3b589d553e3e9f8890d5481c2       SRR1039517_1.fastq.gz;SRR1039517_2.fastq.gz
64d1444ad727f48066aeb6ad314d9190;a24eea863bdca0284591fcd5eb076a93       SRR1039518_1.fastq.gz;SRR1039518_2.fastq.gz
f11f41c013ffaf3a031c9836ce81e6ef;9283f111ef774248f6f666e4bf2b1f81;9bcb6c9675631b1dcb8b07f6916d546c      SRR1039519.fastq.gz;SRR1039519_1.fastq.gz;SRR1039519_2.fastq.gz
d8251c87ba3c803d4344c2b24c77b19d;ca8e0014e7ba56982adc37439cea0755;62838f21e66ec78030b51ee6019420ef      SRR1039520.fastq.gz;SRR1039520_1.fastq.gz;SRR1039520_2.fastq.gz
637e08d030778c6581731647f3c3d8cc;4be82ad33d7d4990bed3c4bc701dc070;435aa5e48ba77e4c42218930a0be0de1      SRR1039521.fastq.gz;SRR1039521_1.fastq.gz;SRR1039521_2.fastq.gz
789e86036c81a85d2c1f014f79822d64;54c572cead4074b126f0b81b344af1be;c461a163b72a71efb4027045e6b4d2f6      SRR1039522.fastq.gz;SRR1039522_1.fastq.gz;SRR1039522_2.fastq.gz
ae33f7f6d536d020a2562b8be6e9cc33;083213dc45820db2eb62d66b89e77ce9       SRR1039523_1.fastq.gz;SRR1039523_2.fastq.gz
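The tmp1/tmp2 steps can be collapsed into one command with bash process substitution <( ... ), which avoids temporary files. A sketch on a hypothetical two-row excerpt in the same layout as Data/md5.txt; the short values such as aaa are placeholders, not real checksums, and plain sh does not support <( ... ):

```shell
#!/usr/bin/env bash
# Sketch (bash only): exercise 7 part 2 without tmp1/tmp2, via process substitution.
work=$(mktemp -d)
cd "$work" || exit 1
# hypothetical excerpt with the same layout as Data/md5.txt (placeholder checksums)
printf 'fastq_md5\tfastq_aspera\n' > md5.txt
printf 'aaa;bbb\tS1_1.fastq.gz;S1_2.fastq.gz\n' >> md5.txt
printf 'ccc;ddd;eee\tS2.fastq.gz;S2_1.fastq.gz;S2_2.fastq.gz\n' >> md5.txt

# each <( ... ) behaves like a temporary file holding that command's output
paste <(cut -f 1 md5.txt | tr ';' '\n') \
      <(cut -f 2 md5.txt | tr ';' '\n') > md5
cat md5

# sanity check: both columns must split into the same number of entries
left=$(cut -f 1 md5.txt | tr ';' '\n' | wc -l)
right=$(cut -f 2 md5.txt | tr ';' '\n' | wc -l)
[ "$left" -eq "$right" ] && echo "columns line up"
rows=$(wc -l < md5)
third=$(sed -n 3p md5)
cd / && rm -rf "$work"
```

The sanity check matters on real data: if one row listed a different number of checksums and file names, paste would silently pair the wrong entries.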

----- From 生信技能树 -----