grep, gsub, and sub in R: regular expression matching and replacement
The grep() function searches a character vector for elements that match a pattern. It returns either the indices of the matching elements or the matching elements themselves.
https://www.bioconductor.org/help/course-materials/2015/BioC2015/
grep(pattern, x,
ignore.case = FALSE, perl = FALSE,
value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
• pattern: character string to be matched; interpreted as a regular expression unless fixed = TRUE
• x: character string or character vector to search
• ignore.case: logical. If FALSE, matching is case sensitive; if TRUE, case is ignored
• perl: logical. If TRUE, Perl-compatible regular expressions are used
• value: logical. If FALSE, the indices of the matching elements are returned; if TRUE, the matching elements themselves are returned
• fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments
• useBytes: logical. If TRUE, matching is done byte-by-byte rather than character-by-character
• invert: logical. If TRUE, return the indices or values of the elements that do not match
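As a rough illustration of how these arguments interact, the sketch below runs grep() on a small made-up vector. The name `items` and its contents are assumptions chosen for the example, not part of the original text.

```r
# Hypothetical sample data, chosen only for illustration
items <- c("Apple", "banana", "a.b", "axb")

# Case-sensitive by default: "ap" does not match "Apple"
case_sensitive <- grep("ap", items)
# ignore.case = TRUE lets "ap" match "Apple"
case_insensitive <- grep("ap", items, ignore.case = TRUE)

# As a regular expression, "." matches any single character
as_regex <- grep("a.b", items, value = TRUE)
# fixed = TRUE treats "a.b" as a literal string
as_literal <- grep("a.b", items, value = TRUE, fixed = TRUE)

# invert = TRUE returns the elements that do NOT match
not_matching <- grep("a.b", items, value = TRUE, invert = TRUE)

print(case_insensitive)
print(as_regex)
print(as_literal)
print(not_matching)
```

Using fixed = TRUE is usually the simpler choice when the search text contains characters such as "." or "+" that would otherwise need escaping.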
grep(value = FALSE) returns an integer vector of the indices of the elements of x that yielded a match (or not, for invert = TRUE).
grep("rect", "draw a rectangle")
[1] 1
str <- c("Regular", "expression", "examples of R language")
x <- grep("ex", str, value = FALSE)
x
[1] 2 3
x <- "line 4322: He is now 25 years old, and weights 130lbs"
x <- grep("\\d", x)
x
[1] 1
• grep(value = TRUE) returns a character vector containing the selected elements of x (after coercion, preserving names but no other attributes).
grep("rect", "draw a rectangle", value = TRUE)
[1] "draw a rectangle"
x <- grep("ex", str, value = TRUE)
x
[1] "expression" "examples of R language"
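The note about preserving names can be seen with a named vector. This is a minimal sketch; the vector `named` and its labels a, b, c are hypothetical.

```r
# Hypothetical named character vector
named <- c(a = "Regular", b = "expression", c = "examples of R language")

# value = TRUE keeps the names of the matching elements
hits <- grep("ex", named, value = TRUE)
hit_names <- names(hits)

# The default (value = FALSE) gives positions that can be used for subsetting
idx <- grep("ex", named)
by_index <- named[idx]

print(hits)
print(hit_names)
```

Subsetting with the returned indices selects the same elements as value = TRUE.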
• grepl returns a logical vector (match or not for each element of x).
x <- grepl("ex", str)
x
[1] FALSE TRUE TRUE
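Because grepl returns one logical value per element, it is convenient for subsetting and counting. The following sketch reuses the `str` vector from the examples above; the names `has_ex`, `matched`, `n_matched`, and `labels` are introduced here only for illustration.

```r
str <- c("Regular", "expression", "examples of R language")

# One TRUE/FALSE per element of str
has_ex <- grepl("ex", str)

# Logical subsetting keeps only the matching elements
matched <- str[has_ex]

# TRUE counts as 1, so sum() gives the number of matches
n_matched <- sum(has_ex)

# The logical vector can also drive a conditional label
labels <- ifelse(has_ex, "match", "no match")

print(matched)
print(n_matched)
print(labels)
```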
R provides several functions for regular-expression matching and replacement. The grep, grepl, regexpr and gregexpr functions search for matches, while sub and gsub perform replacement.
• sub replaces only the first match in each element, while gsub replaces every match. Both return a character vector of the same length and with the same attributes as x (after possible coercion to character). Elements of character vectors x which are not substituted will be returned unchanged (including any declared encoding). If useBytes = FALSE a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g. if there is a UTF-8 input, and in a multibyte locale unless fixed = TRUE).
str <- c("Regular", "expression", "examples of R language")
x <- sub("x.ress", "", str)
x
[1] "Regular" "eion" "examples of R language"
x <- sub("x.+e", "", str)
x
[1] "Regular" "ession" "e"
x <- "line 4322: He is now 25 years old, and weights 130lbs"
x <- gsub("[[:digit:]]", "", x)
x
[1] "line : He is now  years old, and weights lbs"
x <- "line 4322: He is now 25 years old, and weights 130lbs"
x <- gsub("\\d+", "", x)
x
[1] "line : He is now  years old, and weights lbs"
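The difference between sub and gsub, and the need to double the backslash inside R string literals, can be sketched as follows. The replacement marker "#" and the "hello world" input are assumptions picked for the example.

```r
x <- "line 4322: He is now 25 years old"

# sub() replaces only the first run of digits
first_only <- sub("[0-9]+", "#", x)
# gsub() replaces every run of digits
all_runs <- gsub("[0-9]+", "#", x)

# In an R string literal the backslash must be escaped,
# so the regex \d+ is written "\\d+" (three characters: \ d +)
digit_pattern <- "\\d+"
pattern_length <- nchar(digit_pattern)
same_result <- identical(gsub(digit_pattern, "#", x), all_runs)

# Parenthesised groups can be reused in the replacement as \\1, \\2, ...
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")

print(first_only)
print(all_runs)
print(swapped)
```

Writing "\d" with a single backslash is rejected by the R parser as an unrecognized escape, which is why the doubled form is needed.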
• regexpr returns an integer vector of the same length as text giving the starting position of the first match or -1 if there is none, with attribute “match.length”, an integer vector giving the length of the matched text (or -1 for no match). The match positions and lengths are in characters unless useBytes = TRUE is used, when they are in bytes.
str <- c("Regular", "expression", "examples of R language")
x <- regexpr("x*ress", str)
x
[1] -1 4 -1
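The start positions and the "match.length" attribute are usually combined to pull out the matched text. One way to do this, sketched here with the base R helper regmatches() (not mentioned in the original text), is:

```r
str <- c("Regular", "expression", "examples of R language")
m <- regexpr("x*ress", str)

# Start position of the first match in each element (-1 means no match)
starts <- as.integer(m)
# Length of each match, stored as an attribute
lengths <- attr(m, "match.length")

# regmatches() returns the matched substrings, skipping non-matching elements
found <- regmatches(str, m)

# The same substring extracted by hand from the second element
by_hand <- substring(str[2], starts[2], starts[2] + lengths[2] - 1)

print(starts)
print(lengths)
print(found)
```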
• gregexpr returns a list of the same length as text each element of which is of the same form as the return value for regexpr, except that the starting positions of every (disjoint) match are given.
str <- c("Regular", "expression", "examples of R language")
x <- gregexpr("x*ress", str)
x
[[1]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
[[2]]
[1] 4
attr(,"match.length")
[1] 4
attr(,"useBytes")
[1] TRUE
[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
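Since gregexpr reports every match, it is a natural fit for extracting all occurrences of a pattern from a string. The sketch below applies it to the sample sentence used earlier, again with the base R helper regmatches(); the variable names are illustrative.

```r
x <- "line 4322: He is now 25 years old, and weights 130lbs"

# One list element per input string; each holds all match positions
m <- gregexpr("[0-9]+", x)
starts <- as.integer(m[[1]])

# regmatches() with a gregexpr result returns a list of character vectors
nums <- regmatches(x, m)[[1]]

# Convert the extracted digit strings to numbers
values <- as.numeric(nums)

print(starts)
print(nums)
print(values)
```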
Regular Expression Syntax:
In R string literals the backslash itself must be escaped, so \d is written "\\d". The POSIX classes such as [:digit:] must be used inside a bracket expression, e.g. [[:digit:]].
Syntax        Description
\d            Digit: 0, 1, 2 ... 9
\D            Not a digit
\s            Whitespace character
\S            Not a whitespace character
\w            Word character: letter, digit, or underscore
\W            Not a word character
\t            Tab
\n            New line
^             Beginning of the string
$             End of the string
\             Escapes special characters, e.g. \\ matches "\", \+ matches "+"
|             Alternation, e.g. (e|d)n matches "en" and "dn"
.             Any character, except \n or line terminator
[ab]          a or b
[^ab]         Any character except a and b
[0-9]         Any digit
[A-Z]         Any uppercase letter A to Z
[a-z]         Any lowercase letter a to z
[A-Za-z]      Any uppercase or lowercase letter (the range [A-z] also matches the punctuation characters between Z and a in ASCII order)
i+            i at least one time
i*            i zero or more times
i?            i zero or one time
i{n}          i occurs n times in sequence
i{n1,n2}      i occurs between n1 and n2 times in sequence
i{n1,n2}?     Non-greedy match: as few repetitions as possible
i{n,}         i occurs n or more times
[:alnum:]     Alphanumeric characters: [:alpha:] and [:digit:]
[:alpha:]     Alphabetic characters: [:lower:] and [:upper:]
[:blank:]     Blank characters: e.g. space, tab
[:cntrl:]     Control characters
[:digit:]     Digits: 0 1 2 3 4 5 6 7 8 9
[:graph:]     Graphical characters: [:alnum:] and [:punct:]
[:lower:]     Lower-case letters in the current locale
[:print:]     Printable characters: [:alnum:], [:punct:] and space
[:punct:]     Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:space:]     Space characters: tab, newline, vertical tab, form feed, carriage return, space
[:upper:]     Upper-case letters in the current locale
[:xdigit:]    Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
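A few of the syntax elements above can be tried out directly. The inputs in this sketch are made up for illustration, and the results assume R's default regular-expression engine.

```r
# Anchors: ^ ties the match to the start, $ to the end
starts_with_ex <- grepl("^ex", c("expression", "regex"))
ends_with_ex <- grepl("ex$", c("expression", "regex"))

# Alternation inside a group
en_or_dn <- grepl("(e|d)n", c("pen", "pdn", "pan"))

# Counted repetition: two or three "a" characters and nothing else
two_or_three <- grepl("^a{2,3}$", c("a", "aa", "aaa", "aaaa"))

# POSIX classes go inside a bracket expression
no_punct <- gsub("[[:punct:]]", "", "Hello, world!")
one_space <- gsub("[[:space:]]+", " ", "a  b\t\tc")

# Greedy vs. non-greedy: .+ takes as much as possible, .+? as little as possible
s <- "<b>bold</b> and <i>italic</i>"
greedy <- sub("<.+>", "", s)
lazy <- sub("<.+?>", "", s)

print(no_punct)
print(one_space)
print(greedy)
print(lazy)
```

The greedy pattern spans from the first "<" to the last ">", so it removes the whole string, whereas the non-greedy version removes only the first tag.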
http://www.endmemo.com/r/grep.php