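Biotrainee (生信技能树): data.frame exercises, part 1
Introduction:
Biotrainee exercise collection: http://www.biotrainee.com/thread-1754-1-1.html by Jimmy
Vectors and data frames (data.frame) are the two most common and most important data types when using R for bioinformatics analysis. A programming language only sticks with practice; there is no shortcut, and what you learn but do not use is soon forgotten.
Today I worked through the first set of data frame exercises; I will do the others when I have time.
Exercise link: https://www.r-exercises.com/2016/01/04/data-frame-exercises/
Solution link: https://www.r-exercises.com/2016/01/04/data-frame-exercises-solutions/
Exercises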
Exercise 1
Create the following data frame, afterwards invert Sex for all individuals.
My answer
Basic=data.frame(
Age=c(25,31,23,52,76,49,26),
Height=c(177,163,190,179,163,183,164),
Weight=c(57,69,83,75,70,83,53),
Sex=c('F','F','M','M','F','M','F')
)
rownames(Basic)=c('Alex','Lilly','Mark','Oliver','Martha','Lucas','Caroline')
Basic$Sex=c('M','M','F','F','M','F','M')  # assign into the data frame; a bare Sex=... only creates a separate vector
Reference answer
Name <- c("Alex", "Lilly", "Mark", "Oliver", "Martha", "Lucas", "Caroline")
Age <- c(25, 31, 23, 52, 76, 49, 26)
Height <- c(177, 163, 190, 179, 163, 183, 164)
Weight <- c(57, 69, 83, 75, 70, 83, 53)
Sex <- as.factor(c("F", "F", "M", "M", "F", "M", "F"))
df <- data.frame (row.names = Name, Age, Height, Weight, Sex)
levels(df$Sex) <- c("M", "F")
df
Analysis
This was my first encounter with factors and the ordering of their levels; worth studying further.
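A minimal sketch of why reassigning the levels flips every value. The vector `sex` and the name `flipped` are hypothetical and not part of the exercise data. A factor stores integer codes plus a vector of level labels, so replacing the labels relabels every element at once.

```r
# Hypothetical example vector; levels are sorted alphabetically by default.
sex <- factor(c("F", "F", "M", "M", "F"))
levels(sex)        # "F" "M"
as.integer(sex)    # underlying integer codes: 1 1 2 2 1

# Overwriting the labels keeps the codes but swaps what they stand for.
flipped <- sex
levels(flipped) <- c("M", "F")
as.character(flipped)   # "M" "M" "F" "F" "M"
```

This only works as an inversion because there are exactly two levels; with more levels the new labels would have to be listed in the matching order.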
Exercise 2
Create this data frame (make sure you import the variable Working as character and not factor).
Add this data frame column-wise to the previous one.
a) How many rows and columns does the new data frame have?
b) What class of data is in each column?
My answer
Basic2=data.frame(
Working=c('Yes','No','No','Yes','Yes','No','Yes')
)
rownames(Basic2)=c('Alex','Lilly','Mark','Oliver','Martha','Lucas','Caroline')
~~Basic3=merge(Basic,Basic2)~~# I did not know how to combine two data frames that share row names, so I wrote this for now
ncol(Basic3);nrow(Basic3)
class(col(Basic3))
Reference answer
Name <- c("Alex", "Lilly", "Mark", "Oliver", "Martha", "Lucas", "Caroline")
Working <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes")
dfa <- data.frame(row.names = Name, Working, stringsAsFactors = FALSE)  # keeps Working as character (before R 4.0 the default was factor)
dfa <- cbind (df,dfa)
dim(dfa)
#or:
nrow(dfa)
ncol(dfa)
sapply(dfa, class)
str(dfa)
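Analysis
cbind combines two data frames column-wise directly; rows are paired by position, so both must list the individuals in the same order.
Besides ncol and nrow, dim(Basic3) returns both counts at once.
sapply applies a function to each element of its input; for a data frame that means each column, so sapply(dfa, class) returns the class of every column.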
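A small sketch of the points above, using two hypothetical data frames (`left` and `right`) rather than the exercise data. It also shows one way to combine by row name instead of by position, merge with by = "row.names", which is what I was missing in my own attempt.

```r
# Two small hypothetical data frames sharing row names.
left  <- data.frame(Age = c(25, 31, 23), row.names = c("Alex", "Lilly", "Mark"))
right <- data.frame(Working = c("Yes", "No", "No"),
                    row.names = c("Alex", "Lilly", "Mark"),
                    stringsAsFactors = FALSE)

# cbind pairs rows by position, so both inputs must be in the same order.
by_position <- cbind(left, right)

# merge with by = "row.names" pairs rows by name instead; the names come back
# in a column called Row.names.
by_name <- merge(left, right, by = "row.names")

dim(by_position)              # number of rows, then number of columns
sapply(by_position, class)    # class of each column, as a named character vector
```

cbind is the simpler choice when both data frames are known to be in the same row order; merge is safer when they might not be.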
Exercise 3
Check what class of data is the (built-in data set) state.center and convert it to data frame.
My answer
class(state.center)
as.data.frame(state.center)
Reference answer
class (state.center)
df <- as.data.frame(state.center)
Exercise 4
Create a simple data frame from 3 vectors. Order the entire data frame by the first column.
My answer
df1=data.frame(
a=rnorm(10,0,1),
b=rnorm(10,0,2),
c=rnorm(10,0,3)
)
# I did not know how to sort it
Reference answer
# Example vectors
v <- c(45:41, 30:33)
b <- LETTERS[rep(1:3, 3)]
n <- round(rnorm(9, 65, 5))
df <- data.frame(Age = v, Class = b, Grade = n)
df[with (df, order(Age)),]
#or:
df[order(df$Age), ]
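Analysis
Sorting with order(): it returns the row positions in sorted order, and indexing the data frame with them reorders the rows. Worth studying.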
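A minimal sketch of what order() returns, with hypothetical data (`age`, `people`) chosen so the result is easy to follow by eye.

```r
# Hypothetical vector to show what order() returns.
age <- c(45, 41, 43)
idx <- order(age)     # 2 3 1: positions of the smallest, middle, largest value
age[idx]              # 41 43 45

# Using those positions as row indices sorts a whole data frame.
people <- data.frame(Age = age, Class = c("A", "B", "C"))
sorted <- people[order(people$Age), ]

# decreasing = TRUE reverses the direction.
sorted_desc <- people[order(people$Age, decreasing = TRUE), ]
```

Note the difference from sort(), which returns the sorted values themselves and so cannot be used to reorder the other columns.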
Exercise 5
Create a data frame from a matrix of your choice, change the row names so every row says id_i (where i is the row number) and change the column names to variable_i (where i is the column number). I.e., for column 1 it will say variable_1, and for row 2 will say id_2 and so on.
My answer
ma=matrix(1:12,3,4)
nrow(ma);ncol(ma)
rownames(ma)=paste('id',1:3,sep = '_')
colnames(ma)=paste('variable',1:4,sep = '_')
Reference answer
matr <- matrix(1:20, ncol = 5)
df <- as.data.frame(matr)
# paste0 joins without a separator; plain paste would insert a space ("variable_ 1")
colnames(df) <- paste0("variable_", 1:ncol(df))
rownames(df) <- paste0("id_", 1:nrow(df))
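Analysis
When building names, or anywhere else the number of rows or columns is needed, use nrow and ncol instead of hard-coding the counts.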
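A short sketch of name building on a hypothetical 2 x 3 matrix (`m`, `df_m` are made-up names). It also shows the one trap in this exercise: paste separates its arguments with a space unless told otherwise, while paste0 does not.

```r
m <- matrix(1:6, nrow = 2)            # hypothetical 2 x 3 matrix
df_m <- as.data.frame(m)

paste("id_", 1:2)                     # "id_ 1" "id_ 2"  (default sep is a space)
paste0("id_", 1:2)                    # "id_1"  "id_2"

# seq_len(n) is 1..n and, unlike 1:n, is empty when n is 0.
rownames(df_m) <- paste0("id_", seq_len(nrow(df_m)))
colnames(df_m) <- paste0("variable_", seq_len(ncol(df_m)))
```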
Exercise 6
For this exercise, we’ll use the (built-in) dataset VADeaths.
a) Make sure the object is a data frame, if not change it to a data frame.
b) Create a new variable, named Total, which is the sum of each row.
c) Change the order of the columns so total is the first variable.
My answer
class(VADeaths)
dfv=as.data.frame(VADeaths)
dfv$Total=rowSums(dfv)
# I found rowSums by searching
# I did not know how to reorder the columns
Reference answer
class(VADeaths)
df <- as.data.frame(VADeaths)
df$Total <- df[, 1] + df[, 2] + df[, 3] + df[, 4]
#or:
df$Total <- rowSums(df[1:4])
df <- df[, c(5, 1:4)]
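Analysis
Reordering columns simply means taking a new subset of the original data frame, listing the columns (or rows) in the order you want.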
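A minimal sketch of that idea on a hypothetical data frame `scores`, once by position and once by name. Selecting by name is my own addition here, not something from the reference answer.

```r
scores <- data.frame(a = 1:3, b = 4:6, c = 7:9)   # hypothetical data
scores$Total <- rowSums(scores)

# By position: last column first, then the rest.
by_pos <- scores[, c(ncol(scores), 1:(ncol(scores) - 1))]

# By name: still correct if columns are added or moved later.
by_nm <- scores[, c("Total", setdiff(names(scores), "Total"))]
```

Both give the same result here; the by-name form avoids hard-coding column numbers such as c(5, 1:4).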
Exercise 7
For this exercise we’ll use the (built-in) dataset state.x77.
a) Make sure the object is a data frame, if not change it to a data frame.
b) Find out how many states have an income of less than 4300.
c) Find out which is the state with the highest income.
My answer
class(state.x77)
dfs=as.data.frame(state.x77)
table(dfs$Income<4300)
dfsh=dfs[dfs$Income==max(dfs$Income),]
rownames(dfsh)
Reference answer
class (state.x77)
df <- as.data.frame(state.x77)
nrow(subset(df, df$Income < 4300))
row.names(df)[(which(max(df$Income) == df$Income))]
Analysis
The which() function returns the positions of the TRUE elements of a logical vector; worth studying.
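A short sketch on the same built-in dataset, with hypothetical variable names `st` and `below`. It shows which() next to two related shortcuts, sum() on a logical vector and which.max(), that answer parts b and c more directly.

```r
st <- as.data.frame(state.x77)

below <- st$Income < 4300          # logical vector, one entry per state
sum(below)                         # TRUE counts as 1, so this counts the states
head(which(below), 3)              # positions of the first three TRUE entries

# which.max returns the position of the (first) maximum.
rownames(st)[which.max(st$Income)]
```

which.max returns only the first position if there is a tie, whereas which(x == max(x)) returns all of them.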
Exercise 8
With the dataset swiss, create a data frame of only the rows 1, 2, 3, 10, 11, 12 and 13, and only the variables Examination, Education and Infant.Mortality.
a) The infant mortality of Sarine is wrong, it should be a NA, change it.
b) Create a row that will be the total sum of the column, name it Total.
c) Create a new variable that will be the proportion of Examination (Examination / Total)
My answer
class(swiss)
dfs2=swiss[c(1,2,3,10,11,12,13),c('Examination','Education','Infant.Mortality')]
dfs2['Sarine','Infant.Mortality']=NA
dfs2['Total',]=colSums(dfs2)
newvariable=dfs2$Examination[1:(nrow(dfs2)-1)]/rowSums(dfs2[nrow(dfs2)-1,])
Reference answer
df <- swiss[c(1:3, 10:13), c("Examination", "Education", "Infant.Mortality")]
df[4,3] <- NA
df["Total",] <- c(sum(df$Examination), sum(df$Education), sum(df$Infant.Mortality, na.rm = TRUE))
df$proportion <- round(df$Examination / df["Total", "Examination"], 3)
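Analysis
For the final proportion I made a simple thing complicated because I wanted to avoid the Total/Total entry. Also, the Examination total can be selected directly with df["Total", "Examination"]; there is no need to recompute it with rowSums(dfs2[nrow(dfs2)-1,]). The round function sets the number of decimal places.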
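A sketch of the whole exercise written the simpler way, under the assumption that colSums with na.rm = TRUE is an acceptable substitute for the three separate sum() calls in the reference answer. The names `sw` and `total_exam` are hypothetical.

```r
sw <- swiss[c(1:3, 10:13), c("Examination", "Education", "Infant.Mortality")]
sw["Sarine", "Infant.Mortality"] <- NA

# Column totals, skipping the NA; assigning to a new row name appends the row.
sw["Total", ] <- colSums(sw, na.rm = TRUE)

# Pick one cell by row name and column name, then divide the whole column by it.
total_exam <- sw["Total", "Examination"]
sw$proportion <- round(sw$Examination / total_exam, 3)
```

The Total row ends up with a proportion of 1, which is harmless and keeps the code to a single vectorized division.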
Exercise 9
Create a data frame with the datasets state.abb, state.area, state.division, state.name, state.region. The row names should be the names of the states.
a) Rename the column names so only the first 3 letters after the full stop appear (e.g. States.abb will be abb).
My answer
dfstate=data.frame(state.abb,state.area,state.division,state.region,row.names = state.name)
# I did not know how to take a substring
Reference answer
df <- data.frame(state.abb, state.area, state.division, state.region, row.names = state.name)
names(df) <- substr(names(df), 7, 9)
Analysis
The substr function extracts part of a string by start and stop position; worth studying.
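A minimal sketch on the four column names from this exercise (stored in a hypothetical vector `nm`). The second form, which strips the prefix with sub() before taking three characters, is my own variation and not part of the reference answer.

```r
nm <- c("state.abb", "state.area", "state.division", "state.region")
substr(nm, 7, 9)            # characters 7 to 9: "abb" "are" "div" "reg"

# An alternative that does not depend on counting characters:
# drop everything up to the first dot, then keep three characters.
short <- substr(sub("^[^.]*\\.", "", nm), 1, 3)
```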
Exercise 10
Add the previous data frame column-wise to state.x77
a) Remove the variable div.
b) Also remove the variables Life Exp, HS Grad, Frost, abb, and are.
c) Add a variable to the data frame which should categorize the level of illiteracy:
[0,1) is low, [1,2) is some, [2, inf) is high.
d) Find out which state from the west, with low illiteracy, has the highest income, and what that income is.
My answer
dfstate2=cbind(state.x77,dfstate)
# part a
dfstate2=dfstate2[,!(colnames(dfstate2)=='div')]
# part b
~~dfstate2=dfstate2[,!(colnames(dfstate2)==('Life Exp'|'HS Grad'|'Frost'|'abb'|'are'))]~~
# The line above throws an error that I could not fix by dealing with the spaces, so I tried %in% instead
dfstate2=dfstate2[,!(colnames(dfstate2)%in% c('Life Exp','HS Grad','Frost','abb','are'))]
# part c: I did not know how to classify values by interval; solved after reading the answer
# part d
dfstate3=dfstate2[dfstate2$reg=='West'&dfstate2$illi=='Low Illiteracy',]
rownames(dfstate3[dfstate3$Income==max(dfstate3$Income),])
Reference answer
dfa <- cbind(state.x77, df)
#a)
dfa$div <- NULL
#b)
dfa <- subset(dfa, ,-c(4, 6, 7, 9, 10))
# c)
dfa$illi <- ifelse(dfa$Illiteracy < 1,"Low Illiteracy",
ifelse(dfa$Illiteracy >= 1 & dfa$Illiteracy < 2, "Some Illiteracy",
"High Illiteracy")
)
# Or:
dfa$illi <- cut(dfa$Illiteracy,
c(0, 1, 2, 3),
include.lowest = TRUE,
right = FALSE,
labels = c("Low Illiteracy", "Some Illiteracy", "High Illiteracy"))
# d)
sub <- subset(dfa, illi == "Low Illiteracy" & reg == "West")
max <- max(sub$Income)
stat <- row.names(sub)[which (sub$Income == max)]
cat("Highest income from the West is", max , "the state where it's from is", stat, "\n")
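Analysis
1. Part b uses the subset function; worth studying.
2. Part c turns a numeric value into a factor according to the interval it falls in. ifelse is easy to follow, while cut is made for converting numeric to factor and is general; once learned, it applies everywhere. The four breaks 0, 1, 2, 3 split 0 to 3 into three intervals. right controls which side of each interval is closed: right = TRUE gives left-open, right-closed intervals (a, b], and right = FALSE gives left-closed, right-open intervals [a, b). include.lowest = TRUE additionally closes the one outer endpoint that would otherwise be excluded (the lowest break when right = TRUE, the highest when right = FALSE). labels names the three levels.
3. Part d uses the cat function to print a complete sentence: ## Highest income from the West is 5149 the state where it's from is Nevada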
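To make the interval rules in point 2 concrete, here is a small sketch on hypothetical values `x` that sit exactly on the breaks. The short labels and the Inf variant are my own choices for illustration, not part of the reference answer.

```r
x <- c(0, 0.5, 1, 1.9, 2, 3)   # hypothetical illiteracy values

# right = FALSE gives left-closed, right-open bins: [0,1) [1,2) [2,3).
# include.lowest = TRUE then also closes the last bin on the right: [2,3].
lvl <- cut(x, breaks = c(0, 1, 2, 3), right = FALSE, include.lowest = TRUE,
           labels = c("low", "some", "high"))
as.character(lvl)              # "low" "low" "some" "some" "high" "high"

# Without include.lowest, a value equal to the top break falls outside every bin.
cut(3, breaks = c(0, 1, 2, 3), right = FALSE)   # NA

# An open-ended top bin, matching [2, inf) literally, uses Inf as the last break.
open_top <- cut(x, breaks = c(0, 1, 2, Inf), right = FALSE,
                labels = c("low", "some", "high"))
```

Using Inf as the last break avoids having to know the maximum of the data in advance.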
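Closing notes
Judging from the code I have been writing and test-running over the past two days, 90% of my errors come from forgetting one of three things: c, quotes (''), and commas (,). Forgetting c means writing the elements directly without creating a vector. Forgetting quotes means a string is typed as a variable name, and since that variable does not exist, R reports an error. Forgetting commas happens mostly when selecting rows or columns of a data frame (writing only the row or column condition without the comma that says which one it is), and also when separating the columns while creating a data frame. So when an error appears, checking c, '' and , first often solves it.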
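For reference, a minimal sketch of the correct form for each of the three cases, on a hypothetical data frame `people`:

```r
people <- data.frame(Age = c(25, 31, 23), Sex = c("F", "F", "M"))

# 1. c(): several values must be wrapped in c() to form a vector.
ages <- c(25, 31, 23)

# 2. Quotes: "F" is a string; a bare F is R's built-in shorthand for FALSE.
is_f <- people$Sex == "F"

# 3. Comma: row conditions go before the comma, columns after it.
older   <- people[people$Age > 24, ]      # rows where Age > 24, all columns
age_col <- people[, "Age"]                # the Age column, all rows
```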
I will post the other exercises when I have time.