
Regular Expressions: Official Regular Expression Tutorial (Regular Expressions Explained)


For readers who want to learn about Regular Expressions in Grep Command with 10 Examples --reference, this article is one not to miss. It also provides useful information on 10 Linux DIG Command Examples for DNS Lookup--reference, 10 Questions To Make Programming Interviews Less Expensive--reference, 15 Advanced PostgreSQL Commands with Examples, and 15 Linux Split and Join Command Examples to Manage Large Files--reference.

Contents:

Regular Expressions in Grep Command with 10 Examples --reference


Regular expressions are used to search and manipulate text based on patterns. Most Linux commands and programming languages support regular expressions.

The grep command is used to search for a specific string in a file. Please refer to our earlier article on grep for details.

You can also use regular expressions with the grep command when you want to search for text containing a particular pattern. grep tests the pattern against each line of the file, which simplifies the search. This article is part of a two-article series.

This part 1 article covers grep examples for simple regular expressions. The future part 2 article will cover advanced regular expression examples in grep.

Let us use the /var/log/messages log file in our examples.

Example 1. Beginning of line ( ^ )

In grep, the caret symbol ^ matches the expression at the start of a line. The following example displays all lines that start with Nov 10, i.e. all the messages logged on November 10.

$ grep "^Nov 10" messages.1
Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s
Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3
Nov 10 13:21:26 gs123 ntpd[2241]: time reset +0.146664 s
Nov 10 13:25:46 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Nov 10 13:26:27 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3

The ^ anchors the expression to the beginning of a line only if it is the first character in the regular expression. For example, ^N matches lines beginning with N.
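To see the effect of the anchor in isolation, here is a minimal sketch that runs against a small temporary file rather than a real system log. The log lines and variable names are made up for illustration.

```shell
# Build a small sample log (hypothetical contents) in a temporary file.
log=$(mktemp)
printf '%s\n' \
  'Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s' \
  'Nov 11 09:00:00 gs123 ntpd[2241]: note that Nov 10 appears mid-line here' \
  'Nov 10 13:21:26 gs123 ntpd[2241]: time reset +0.146664 s' > "$log"

# Anchored: counts only lines that begin with "Nov 10".
anchored=$(grep -c '^Nov 10' "$log")
# Unanchored: counts lines containing "Nov 10" anywhere.
unanchored=$(grep -c 'Nov 10' "$log")

echo "anchored=$anchored unanchored=$unanchored"
rm -f "$log"
```

The anchored pattern skips the second line, where the date only appears in the middle of the message.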

Example 2. End of the line ( $ )

The character $ matches the expression at the end of a line. The following command finds all the lines that end with the word “terminating” followed by one more character. Here that character is the trailing period, because an unescaped dot matches any single character.

$ grep "terminating.$" messages
Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.
Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating.

From the above output you can tell when the kernel log daemon was terminated. Just as ^ matches the beginning of the line only if it is the first character, $ matches the end of the line only if it is the last character in the regular expression.
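The dot before $ in the pattern above is itself a metacharacter. The sketch below uses made-up sample lines to compare it with an escaped dot, which matches only a literal period.

```shell
sample=$(mktemp)
printf '%s\n' \
  'kernel: Kernel log daemon terminating.' \
  'kernel: Kernel log daemon terminating!' \
  'kernel: terminating. (more text follows)' > "$sample"

# Unescaped dot: any single character may precede the end of the line.
any_char=$(grep -c 'terminating.$' "$sample")
# Escaped dot: only a literal period may precede the end of the line.
literal=$(grep -c 'terminating\.$' "$sample")

echo "any_char=$any_char literal=$literal"
rm -f "$sample"
```

The third line matches neither pattern, because “terminating.” is not at the end of that line.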

Example 3. Count of empty lines ( ^$ )

Using the ^ and $ characters together, you can find the empty lines in a file. “^$” matches an empty line.

$ grep -c  "^$" messages anaconda.log
messages:0
anaconda.log:3

The above command displays the count of empty lines in the messages and anaconda.log files.
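One detail worth checking is that “^$” counts only truly empty lines. A line that contains spaces is not empty. The following sketch, using a throwaway file, compares it with a pattern built on the POSIX [[:space:]] class.

```shell
f=$(mktemp)
# Seven lines: three are empty and one contains only spaces.
printf 'first\n\nsecond\n   \n\nthird\n\n' > "$f"

empty=$(grep -c '^$' "$f")              # strictly empty lines
blank=$(grep -c '^[[:space:]]*$' "$f")  # empty or whitespace-only lines

echo "empty=$empty blank=$blank"
rm -f "$f"
```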

Example 4. Single Character (.)

The special meta-character “.” (dot) matches any single character except the end-of-line character. Consider an input file with the following content.

$ cat input
1. first line
2. hi hello
3. hi zello how are you
4. cello
5. aello
6. eello
7. last line

Now let us search for a word that has any single character followed by ello, i.e. hello, cello, etc.

$ grep ".ello" input
2. hi hello
3. hi zello how are you
4. cello
5. aello
6. eello

If you want to search for a word that has exactly 4 characters, you can use grep -w "....", where each dot represents any single character.
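As a rough illustration of the four-dot pattern combined with -w, the sketch below runs it over a few made-up lines. Note that a dot also matches spaces and punctuation, so on real text this pattern can match more than four-letter words.

```shell
words=$(mktemp)
printf '%s\n' 'cello' 'hell' 'a tree fell' 'go' > "$words"

# -w requires the matched text to be delimited like a whole word.
matches=$(grep -w '....' "$words")

echo "$matches"
rm -f "$words"
```

Here “cello” is skipped because no four-character slice of it stands alone as a word, while “hell” and the line containing “tree” are printed.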

Example 5. Zero or more occurrence (*)

The special character “*” matches zero or more occurrences of the previous character. For example, the pattern '1*' matches zero or more '1' characters.

The following example searches for the pattern “kernel: *.”, i.e. kernel: followed by zero or more space characters and then any single character.

$ grep "kernel: *." *
messages.4:Jul 12 17:01:02 cloneme kernel: ACPI: PCI interrupt for device 0000:00:11.0 disabled
messages.4:Oct 28 06:29:49 cloneme kernel: ACPI: PM-Timer IO Port: 0x1008
messages.4:Oct 28 06:31:06 btovm871 kernel:  sda: sda1 sda2 sda3
messages.4:Oct 28 06:31:06 btovm871 kernel: sd 0:0:0:0: Attached scsi disk sda
.
.

The above example matches kernel and a colon followed by any number of spaces (including none). The final “.” then matches any single character.
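The role of the trailing dot becomes clearer with a line that ends right after the colon. This is a small sketch on made-up lines, not real kernel messages.

```shell
k=$(mktemp)
printf '%s\n' \
  'host kernel:ACPI enabled' \
  'host kernel: ACPI enabled' \
  'host kernel:   sda: sda1' \
  'host kernel:' > "$k"

# "kernel: *" alone matches even when nothing follows the colon.
star_only=$(grep -c 'kernel: *' "$k")
# Adding "." requires at least one more character after the optional spaces.
star_dot=$(grep -c 'kernel: *.' "$k")

echo "star_only=$star_only star_dot=$star_dot"
rm -f "$k"
```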

Example 6. One or more occurrence (\+)

The special character “\+” matches one or more occurrences of the previous character. “ \+” matches one or more space characters.

If there is no space, it will not match. The character “+” belongs to extended regular expressions, so in grep's default (basic) syntax you have to escape it as \+.

$ cat input
hi hello
hi    hello how are you
hihello

$ grep "hi \+hello" input
hi hello
hi    hello how are you

In the above example, grep matches ‘hi’, followed by one or more space characters, followed by “hello”.

If there is no space between hi and hello, it won't match. The * character, however, matches zero or more occurrences.

So “hihello” is matched by * as shown below.

$ grep "hi *hello" input
hi hello
hi    hello how are you
hihello
$
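The sketch below compares four ways of writing “one or more spaces”, using the same three input lines as above. It assumes GNU grep for the \+ form, which is an extension to basic syntax. The last form uses only POSIX basic syntax.

```shell
input=$(mktemp)
printf '%s\n' 'hi hello' 'hi    hello how are you' 'hihello' > "$input"

literal_plus=$(grep -c 'hi +hello' "$input")   # basic syntax: + is an ordinary character
escaped_plus=$(grep -c 'hi \+hello' "$input")  # GNU extension to basic syntax
extended=$(grep -Ec 'hi +hello' "$input")      # extended syntax (-E): + is a quantifier
portable=$(grep -c 'hi  *hello' "$input")      # one space, then zero or more spaces

echo "literal=$literal_plus escaped=$escaped_plus extended=$extended portable=$portable"
rm -f "$input"
```

The first count is zero because, without -E or a backslash, grep looks for a literal plus sign.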

Example 7. Zero or one occurrence (\?)

The special character “\?” matches zero or one occurrence of the previous character. “0\?” matches a single zero or nothing.

$ grep "hi \?hello" input
hi hello
hihello

“hi \?hello” matches hi and hello separated by a single space (hi hello) or by no space (hihello).

The line which has more than one space between hi and hello did not get matched in the above command.
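For comparison, the same zero-or-one match can be written with an interval. This sketch assumes GNU grep for \? and uses \{0,1\} as the POSIX basic-syntax spelling.

```shell
input=$(mktemp)
printf '%s\n' 'hi hello' 'hi    hello how are you' 'hihello' > "$input"

escaped=$(grep -c 'hi \?hello' "$input")        # GNU extension to basic syntax
extended=$(grep -Ec 'hi ?hello' "$input")       # extended syntax
interval=$(grep -c 'hi \{0,1\}hello' "$input")  # POSIX basic interval: 0 or 1 spaces

echo "escaped=$escaped extended=$extended interval=$interval"
rm -f "$input"
```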

Example 8. Escaping the special character (\)

If you want to search for special characters (for example * or dot) in the content, you have to escape them in the regular expression.

$ grep "127\.0\.0\.1"  /var/log/messages.4
Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo, 127.0.0.1#123 Enabled
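To show why the escaping matters, here is a short sketch on made-up lines. It also includes grep -F, which treats the whole pattern as a fixed string and so needs no escaping.

```shell
addrs=$(mktemp)
printf '%s\n' 'listen 127.0.0.1' 'serial 127x0y0z1' 'code 127001' > "$addrs"

unescaped=$(grep -c '127.0.0.1' "$addrs")   # each dot matches any character
escaped=$(grep -c '127\.0\.0\.1' "$addrs")  # each dot must be a literal period
fixed=$(grep -Fc '127.0.0.1' "$addrs")      # -F: no metacharacters at all

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
rm -f "$addrs"
```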

Example 9. Character Class ([0-9])

A character class is a list of characters within square brackets. It matches exactly one character out of those listed.

$ grep -B 1 "[0123456789]\+ times" /var/log/messages.4
Oct 28 06:38:35 btovm871 init: open(/dev/pts/0): No such file or directory
Oct 28 06:38:35 btovm871 last message repeated 2 times
Oct 28 06:38:38 btovm871 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Oct 28 06:38:38 btovm871 last message repeated 3 times

Repeated messages are logged in the messages log file as “last message repeated n times”. The above example searches for lines that have a number (digits 0 to 9) followed by the word “times”. For each match, it displays the line before the matched line as well as the matched line itself.

Within the square brackets, you can use a hyphen to specify a range of characters. For example, [0123456789] can be written as [0-9]. Alphabetic ranges can be specified in the same way, such as [a-z], [A-Z], etc. So the above command can also be written as

$ grep -B 1 "[0-9]\+ times" /var/log/messages.4
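The following sketch reproduces the idea on a made-up five-line log. It writes “one or more digits” as [0-9][0-9]*, which avoids the GNU-specific \+, and it relies on -B, which GNU and BSD grep provide.

```shell
log=$(mktemp)
printf '%s\n' \
  'init: open failed' \
  'last message repeated 2 times' \
  'unrelated line' \
  'pcscd: reader not found' \
  'last message repeated 13 times' > "$log"

# Count the lines that contain a number followed by " times".
count=$(grep -c '[0-9][0-9]* times' "$log")
# -B 1 also prints the line before each match.
context=$(grep -B 1 '[0-9][0-9]* times' "$log")

echo "count=$count"
echo "$context"
rm -f "$log"
```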

Example 10. Exception in the character class

If you want to search for all characters except those in the square brackets, use the ^ (caret) symbol as the first character after the opening square bracket. The following example searches the Linux dictionary word file for lines that do not start with a vowel.

$ grep -i  "^[^aeiou]" /usr/share/dict/linux.words
1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2

The first caret symbol in the regular expression represents the beginning of the line. The caret symbol inside the square brackets, however, means “except”, i.e. match any character except those in the square brackets.
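Since a system dictionary file may not be installed everywhere, the sketch below uses a short made-up word list to show the two meanings of the caret and the effect of -i.

```shell
list=$(mktemp)
printf '%s\n' apple Banana orange 10th Umbrella kiwi > "$list"

# With -i, both lowercase and uppercase vowels are excluded.
no_vowel_start=$(grep -i '^[^aeiou]' "$list")
# Without -i, "Umbrella" passes because U is not in the lowercase list.
case_sensitive=$(grep -c '^[^aeiou]' "$list")

echo "$no_vowel_start"
echo "case_sensitive=$case_sensitive"
rm -f "$list"
```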

http://www.thegeekstuff.com/2011/01/regular-expressions-in-grep-command/

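Character

\

Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n". "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".

^

Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following "\n" or "\r".

$

Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding "\n" or "\r".

*

Matches the preceding character or subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.

+

Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?

Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.

{n}

n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".

{n,}

n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" and matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".

{n,m}

m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note: you cannot put a space between the comma and the numbers.

?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.

.

Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".

(pattern)

Matches pattern and captures the matched subexpression. The captured match can be retrieved from the resulting Matches collection using the $0…$9 properties. To match the parenthesis characters ( ), use "\(" or "\)".

(?:pattern)

Matches pattern but does not capture the matched subexpression, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.

(?=pattern)

A subexpression that performs a positive lookahead search, which matches the string at any point where a string matching pattern begins. It is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

(?!pattern)

A subexpression that performs a negative lookahead search, which matches the search string at any point where a string not matching pattern begins. It is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.

x|y

Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".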

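[xyz]

A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".

[^xyz]

A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".

[a-z]

A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" through "z".

[^a-z]

A negative range of characters. Matches any character not in the specified range. For example, "[^a-z]" matches any character not in the range "a" through "z".

\b

Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".

\B

Matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".

\cx

Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a non-digit character. Equivalent to [^0-9].

\f

Matches a form-feed character. Equivalent to \x0c and \cL.

\n

Matches a newline character. Equivalent to \x0a and \cJ.

\r

Matches a carriage return character. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t

Matches a tab character. Equivalent to \x09 and \cI.

\v

Matches a vertical tab character. Equivalent to \x0b and \cK.

\w

Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".

\W

Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".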

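\xn

Matches n, where n is a hexadecimal escape code. Hexadecimal escape codes must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". Allows ASCII codes to be used in regular expressions.

\num

Matches num, where num is a positive integer. This is a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.

\n

Identifies either an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape code if n is an octal digit (0-7).

\nm

Identifies either an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal value nm, where n and m are octal digits (0-7).

\nml

Matches the octal escape code nml when n is an octal digit (0-3) and m and l are octal digits (0-7).

\un

Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).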
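A few of the entries above can be tried with grep. The sketch below covers interval quantifiers, an optional group, alternation, and a backreference. It assumes a grep that supports -o (GNU and BSD grep do), and it writes the backreference in basic syntax. Other entries in the table, such as lookahead, non-greedy quantifiers and \d, belong to other regex flavors and are not part of POSIX grep syntax, so they are left out.

```shell
# Interval quantifier: -o prints every match on its own line (joined with commas here).
runs=$(echo 'fooooood' | grep -oE 'o{1,3}' | tr '\n' ',')

# Optional group: -x requires the whole line to match.
optional=$(printf '%s\n' do does done | grep -cxE 'do(es)?')

# Alternation inside a group.
alternation=$(printf '%s\n' zood food wood | grep -cE '^(z|f)ood$')

# Backreference in basic syntax: any character followed by the same character.
doubled=$(printf '%s\n' book bank letter | grep -c '\(.\)\1')

echo "runs=$runs optional=$optional alternation=$alternation doubled=$doubled"
```

The six o's in “fooooood” come out as two runs of three, which shows that each match takes at most three characters.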

http://msdn.microsoft.com/zh-cn/library/ae5bf541(v=vs.80).aspx


10 Linux DIG Command Examples for DNS Lookup--reference


by araJAN on 

Dig stands for domain information groper.

Using the dig command, you can query DNS name servers for DNS lookup tasks. This article explains 10 examples of how to use the dig command.

1. Simple dig Command Usage (Understand dig Output)

When you pass a domain name to the dig command, by default it displays the A record (the IP address of the queried site), as shown below. In this example, it displays the A record of redhat.com in the “ANSWER SECTION” of the dig command output.

$ dig redhat.com

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62863
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 3

;; QUESTION SECTION:
;redhat.com. IN A

;; ANSWER SECTION:
redhat.com. 37 IN A 209.132.183.81

;; AUTHORITY SECTION:
redhat.com. 73 IN NS ns4.redhat.com.
redhat.com. 73 IN NS ns3.redhat.com.
redhat.com. 73 IN NS ns2.redhat.com.
redhat.com. 73 IN NS ns1.redhat.com.

;; ADDITIONAL SECTION:
ns1.redhat.com. 73 IN A 209.132.186.218
ns2.redhat.com. 73 IN A 209.132.183.2
ns3.redhat.com. 73 IN A 209.132.176.100

;; Query time: 13 msec
;; SERVER: 209.144.50.138#53(209.144.50.138)
;; WHEN: Thu Jan 12 10:09:49 2012
;; MSG SIZE rcvd: 164

The dig command output has the following sections:

  • Header: This displays the dig command version number, the global options used by the dig command, and some additional header information.
  • QUESTION SECTION: This displays the question that was asked of the DNS, i.e. your input. Since we ran ‘dig redhat.com’ and the default query type of the dig command is the A record, this section shows that we asked for the A record of redhat.com.
  • ANSWER SECTION: This displays the answer received from the DNS, i.e. your output. Here it displays the A record of redhat.com.
  • AUTHORITY SECTION: This displays the DNS name servers that have the authority to respond to this query. In practice, it lists the available name servers of redhat.com.
  • ADDITIONAL SECTION: This displays the IP addresses of the name servers listed in the AUTHORITY SECTION.
  • Stats section: The section at the bottom displays a few dig statistics, including how long the query took to execute.
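Because each section starts with a “;; ... SECTION:” comment line and ends at a blank line, saved dig output can be split by section with ordinary text tools. The sketch below works on an abbreviated copy of the output shown above, stored in a variable so that it needs no network access. The variable names are illustrative only.

```shell
# Abbreviated copy of the dig output above, kept in a variable.
dig_output=$(cat <<'EOF'
;; QUESTION SECTION:
;redhat.com. IN A

;; ANSWER SECTION:
redhat.com. 37 IN A 209.132.183.81

;; AUTHORITY SECTION:
redhat.com. 73 IN NS ns4.redhat.com.
redhat.com. 73 IN NS ns3.redhat.com.
EOF
)

# Print only the records between the ANSWER SECTION header and the next blank line.
answer=$(printf '%s\n' "$dig_output" | awk '
  /^;; ANSWER SECTION:/ { in_answer = 1; next }
  /^$/                  { in_answer = 0 }
  in_answer             { print }
')

echo "$answer"
```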

2. Display Only the ANSWER SECTION of the Dig Command Output

For the most part, all you need to look at is the “ANSWER SECTION” of the dig command output. So we can turn off all the other sections as shown below.

  • +nocomments – Turn off the comment lines
  • +noquestion – Turn off the question section
  • +noauthority – Turn off the authority section
  • +noadditional – Turn off the additional section
  • +nostats – Turn off the stats section
  • +noanswer – Turn off the answer section (of course, you wouldn’t want to turn off the answer section)

The following dig command displays only the ANSWER SECTION.

$ dig redhat.com +nocomments +noquestion +noauthority +noadditional +nostats

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com +nocomments +noquestion +noauthority +noadditional +nostats
;; global options: +cmd
redhat.com. 9 IN A 209.132.183.81

Instead of disabling the unwanted sections one by one, we can disable all sections using +noall (this turns off the answer section as well) and then add +answer, which shows only the answer section.

The above command can also be written in a short form as shown below, which displays only the ANSWER SECTION.

$ dig redhat.com +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com +noall +answer
;; global options: +cmd
redhat.com. 60 IN A 209.132.183.81

3. Query MX Records Using dig -t MX

To query MX records, pass MX as an argument to the dig command as shown below.

$ dig redhat.com  MX +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com MX +noall +answer
;; global options: +cmd
redhat.com. 513 IN MX 5 mx1.redhat.com.
redhat.com. 513 IN MX 10 mx2.redhat.com.

You can also use option -t to pass the query type (for example: MX) as shown below.

$ dig -t MX redhat.com +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> -t MX redhat.com +noall +answer
;; global options: +cmd
redhat.com. 489 IN MX 10 mx2.redhat.com.
redhat.com. 489 IN MX 5 mx1.redhat.com.

4. Query NS Records Using dig -t NS

To query the NS record use the type NS as shown below.

$ dig redhat.com NS +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com NS +noall +answer
;; global options: +cmd
redhat.com. 558 IN NS ns2.redhat.com.
redhat.com. 558 IN NS ns1.redhat.com.
redhat.com. 558 IN NS ns3.redhat.com.
redhat.com. 558 IN NS ns4.redhat.com.

You can also use option -t to pass the query type (for example: NS) as shown below.

$ dig -t NS redhat.com +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> -t NS redhat.com +noall +answer
;; global options: +cmd
redhat.com. 543 IN NS ns4.redhat.com.
redhat.com. 543 IN NS ns1.redhat.com.
redhat.com. 543 IN NS ns3.redhat.com.
redhat.com. 543 IN NS ns2.redhat.com.

5. View ALL DNS Records Types Using dig -t ANY

To view all the record types (A, MX, NS, etc.), use ANY as the record type as shown below.

$ dig redhat.com ANY +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com ANY +noall +answer
;; global options: +cmd
redhat.com. 430 IN MX 5 mx1.redhat.com.
redhat.com. 430 IN MX 10 mx2.redhat.com.
redhat.com. 521 IN NS ns3.redhat.com.
redhat.com. 521 IN NS ns1.redhat.com.
redhat.com. 521 IN NS ns4.redhat.com.
redhat.com. 521 IN NS ns2.redhat.com.

(or) Use -t ANY

$ dig -t ANY redhat.com  +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> -t ANY redhat.com +noall +answer
;; global options: +cmd
redhat.com. 367 IN MX 10 mx2.redhat.com.
redhat.com. 367 IN MX 5 mx1.redhat.com.
redhat.com. 458 IN NS ns4.redhat.com.
redhat.com. 458 IN NS ns1.redhat.com.
redhat.com. 458 IN NS ns2.redhat.com.
redhat.com. 458 IN NS ns3.redhat.com.

6. View Short Output Using dig +short

To view just the IP address of a website (i.e. the A record), use the +short option as shown below.

$ dig redhat.com +short
209.132.183.81

You can also specify a record type that you want to view with the +short option.

$ dig redhat.com ns +short
ns2.redhat.com.
ns3.redhat.com.
ns1.redhat.com.
ns4.redhat.com.

7. DNS Reverse Look-up Using dig -x

To perform a DNS reverse lookup from an IP address, use dig -x as shown below.

For example, if you only have an external IP address and would like to know the website that belongs to it, do the following.

$ dig -x 209.132.183.81 +short
www.redhat.com.

To view the full details of the DNS reverse look-up,remove the +short option.

$ dig -x 209.132.183.81

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> -x 209.132.183.81
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY,id: 62435
;; flags: qr rd ra; QUERY: 1,ADDITIONAL: 3

;; QUESTION SECTION:
;81.183.132.209.in-addr.arpa. IN PTR

;; ANSWER SECTION:
81.183.132.209.in-addr.arpa. 600 IN PTR www.redhat.com.

;; AUTHORITY SECTION:
183.132.209.in-addr.arpa. 248 IN NS ns2.redhat.com.
183.132.209.in-addr.arpa. 248 IN NS ns1.redhat.com.
183.132.209.in-addr.arpa. 248 IN NS ns3.redhat.com.
183.132.209.in-addr.arpa. 248 IN NS ns4.redhat.com.

;; ADDITIONAL SECTION:
ns1.redhat.com. 363 IN A 209.132.186.218
ns2.redhat.com. 363 IN A 209.132.183.2
ns3.redhat.com. 363 IN A 209.132.176.100

;; Query time: 35 msec
;; SERVER: 209.144.50.138#53(209.144.50.138)
;; WHEN: Thu Jan 12 10:15:00 2012
;; MSG SIZE rcvd: 193
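The QUESTION SECTION above shows that dig -x queries a PTR record for a name built by reversing the octets of the address and appending in-addr.arpa. As a small sketch of that transformation for IPv4 addresses, done with awk and without any DNS query:

```shell
ip=209.132.183.81

# Split on dots, reverse the four octets, and append the reverse-lookup zone.
ptr_name=$(printf '%s\n' "$ip" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }')

echo "$ptr_name"
# With network access, "dig $ptr_name PTR" would ask the same question as "dig -x $ip".
```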

8. Use a Specific DNS server Using dig @dnsserver

By default, dig uses the DNS servers defined in your /etc/resolv.conf file.

If you would like to use a different DNS server to perform the query, specify it on the command line as @dnsserver.

The following example uses ns1.redhat.com as the DNS server to get the answer (instead of using the DNS servers from the /etc/resolv.conf file).

$ dig @ns1.redhat.com redhat.com

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> @ns1.redhat.com redhat.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY,id: 20963
;; flags: qr aa rd; QUERY: 1,ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;redhat.com. IN A

;; ANSWER SECTION:
redhat.com. 60 IN A 209.132.183.81

;; AUTHORITY SECTION:
redhat.com. 600 IN NS ns1.redhat.com.
redhat.com. 600 IN NS ns4.redhat.com.
redhat.com. 600 IN NS ns3.redhat.com.
redhat.com. 600 IN NS ns2.redhat.com.

;; ADDITIONAL SECTION:
ns1.redhat.com. 600 IN A 209.132.186.218
ns2.redhat.com. 600 IN A 209.132.183.2
ns3.redhat.com. 600 IN A 209.132.176.100
ns4.redhat.com. 600 IN A 209.132.188.218

;; Query time: 160 msec
;; SERVER: 209.132.186.218#53(209.132.186.218)
;; WHEN: Thu Jan 12 10:22:11 2012
;; MSG SIZE rcvd: 180

9. Bulk DNS Query Using dig -f (and command line)

Query multiple websites using a data file:

You can perform a bulk DNS query based on the data from a file.

First, create a sample names.txt file that contains the websites that you want to query.

$ vi names.txt
redhat.com
centos.org

Next, execute dig -f as shown below, which performs a DNS query for each website listed in the names.txt file and displays the output.

$ dig -f names.txt +noall +answer
redhat.com.             60      IN      A       209.132.183.81
centos.org.             60      IN      A       72.232.194.162

You can also combine record type with the -f option. The following example displays the MX records of multiple websites that are located in the names.txt file.

$ dig -f names.txt MX +noall +answer
redhat.com.             600     IN      MX      10 mx2.redhat.com.
redhat.com.             600     IN      MX      5 mx1.redhat.com.
centos.org.             3600    IN      MX      10 mail.centos.org.

Query multiple websites from dig command line:

You can also query multiple websites from the dig command line as shown below. The following example queries the MX record for redhat.com and the NS record for centos.org from the command line.

$ dig redhat.com mx +noall +answer centos.org ns +noall +answer

; <<>> DiG 9.7.3-RedHat-9.7.3-2.el6 <<>> redhat.com mx +noall +answer centos.org ns +noall +answer
;; global options: +cmd
redhat.com. 332 IN MX 10 mx2.redhat.com.
redhat.com. 332 IN MX 5 mx1.redhat.com.
centos.org. 3778 IN NS ns3.centos.org.
centos.org. 3778 IN NS ns4.centos.org.
centos.org. 3778 IN NS ns1.centos.org.

10. Use $HOME/.digrc File to Store Default dig Options

If you always want to view only the ANSWER section of the dig output, you don’t have to keep typing “+noall +answer” on every dig command. Instead, add your dig options to the .digrc file as shown below.

$ cat $HOME/.digrc
+noall +answer

Now, any time you execute the dig command, it will use the +noall and +answer options by default. The dig command line becomes simple and easy to read, without you having to type those options every time.

$ dig redhat.com
redhat.com.             60      IN      A       209.132.183.81

$ dig redhat.com MX
redhat.com. 52 IN MX 5 mx1.redhat.com.
redhat.com. 52 IN MX 10 mx2.redhat.com.
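If you would rather experiment without touching your real home directory, one option is to create the file under a throwaway directory and point HOME at it for a single command. This is only a sketch. The directory is temporary and the dig call is left commented out because it needs network access.

```shell
# Create a scratch "home" directory and put the default options in its .digrc.
demo_home=$(mktemp -d)
printf '%s\n' '+noall +answer' > "$demo_home/.digrc"

contents=$(cat "$demo_home/.digrc")
echo "$contents"

# HOME="$demo_home" dig redhat.com   # would read the options from the scratch .digrc

rm -rf "$demo_home"
```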


10 Questions To Make Programming Interviews Less Expensive--reference


Conducting sql Expert and next one is the full stack developer you are looking for. If you have trust them blindly and invite disappointed.

One of the first things you should do is filter out candidates who claim to have certain skills, e.g. SQL, but don't have them. The faster you can weed out those candidates, the cheaper the hiring process will be. A phone interview is just for that purpose: it doesn't cost you much and it also suits candidates, as they don't have to take time off and come down to your office. It's flexible for both parties. When I phone-interview someone, I spend the first few minutes listening to them, and then I go through my list of weed-out programming questions to see whether the candidate is good enough to spend another 30 to 40 minutes on. These questions have saved me a lot of time: I have found that candidates described with words like "Strong knowledge of Java", "Exceptional in SQL" and "Programming guru" fail to answer these simple questions.

If you are a candidate and have gone through a couple of interviews, you might have noticed that almost all interviewers make up their minds in the first 10 minutes. The rest of the interview gives them reasons supporting that decision, but not all is lost. If you ever feel that you have messed up your chance, try coming up with some really good answers to the rest of the questions. If you can impress the interviewer enough that they encourage you to go deeper, you may be able to change their initial decision.

To get some feedback and improve my method, I have decided to share my list of weed-out programming questions (I am not worried about sharing them: I have many similar questions in my secret question bank, and you can easily create them as well). I have chosen one or two question from 

10 Questions to Start Your Programming Interview

sql expert who can't write JOIN queries. My first weed-out question asks them to describe a . They don't have to get it exactly right, I just 

 Developer the first weed-out question is to explain . At a minimum, I want them to know that a GET is what you generally see in the URL and a POST is commonly what you see in HTML forms. Depending on their answer, you can ask further questions about the limitations, security and usage of the GET and POST methods. This question gives you a good hint as to whether they really know something about the internet or not.

, one of the popular weed-out questions is rather simple: how do you find a particular process and kill it? Here I expect them to tell me about ps, grep and kill. To gauge their level of understanding, you can also ask about ps options, e.g. what a, f and e mean in the ps -afe command. A second-level weed-out question in UNIX can be about  e.g. files which are greater than 2GB etc. Don't get me wrong, but if a person cannot answer these questions, it would be difficult for them to work on a project which has tons of processes and is connected to tons of other servers. One counter-argument I always hear against my weed-out questions is that it would take just 5 minutes to learn those commands, but nobody can answer me when I ask why they didn't spend those five minutes before coming to the interview.

 
 

(Object Oriented Programming), my weed-out question is  Here I expect slightly more than the popular definition that classes are blueprints for creating objects. Yes, that's correct, but how do you know that the candidate understood the concept and has not just memorized it? Ask for examples, and then cross-question on them, e.g. where does an object get created, who creates it, etc.

, particularly when it comes to code, the most popular question to weed out the non-programming programmer is . If a programmer cannot write Fizz-buzz in 10 to 15 minutes, they probably need more practice and are not ready yet. This is something I don't ask in a phone interview but in the written test I give before face-to-face interviews. There have been instances in the past, before we had a proper multi-round interview process, where I literally asked Fizzbuzz and the answer took the better part of an hour. Another weed-out question on my programming list is to have them write  and ask them to optimize it. Fibonacci is very common, but you would be surprised by the number of programmers who fail to write it with pen and paper, and even in an IDE. It also separates programmers who understand recursion from those who don't. In my experience, programmers who understand recursion are usually better than those who don't. This is where most natural programmers come in.

, my weed-out question is  Some may say that it is slightly harsh to judge someone's XML skill with just one question, but you would agree that this is a fundamental. I know there are many programmers who have worked with XML and can work with XML but aren't familiar with this fundamental. Still, isn't it their responsibility to learn fundamentals like this? Just getting things working is not enough, you also need to fill your gaps.

, my weed-out question is  It's such a fundamental that I expect anyone who has worked with or learned Java to know about it. Here I expect them to mention some of the tools that come with the JDK, at least javac (the Java compiler) and the JVM, which actually runs every Java program. One more question on my list to weed out non-Java programmers is  I have had a hard time teaching this fundamental to a couple of people, and have found that if you don't know the difference between these two, you will struggle with setting up your project, debugging, and fixing those nightmarish ClassNotFoundException and NoClassDefFoundError problems. It's again a must-know detail for anyone who claims to work in Java.

 be it in Java or any other language, one of the good weed-out questions is asking the candidate to . You can ask this question in different ways, either by giving a practical scenario or by just asking how to write code so that deadlock doesn't happen. If you have not done many interviews, you will be surprised by how many programmers with 2 to 4 years of professional experience fail to answer this question correctly.

, the first question I ask the candidate is about , because I believe that as a programmer you must know array, linked list, set, map and string algorithms. If you want to add another level of cushion, then you can also ask about  without using any library function. This will give you enough of an idea whether to proceed further or not.
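For readers who have not met the Fizz-buzz exercise mentioned above, here is a minimal sketch in shell. The article does not spell out the rules, so this assumes the conventional ones: print the numbers from 1 to n, replacing multiples of 3 with Fizz, multiples of 5 with Buzz, and multiples of both with FizzBuzz. The function name is illustrative.

```shell
# fizzbuzz N: print the conventional Fizz-buzz sequence from 1 to N.
fizzbuzz() {
  i=1
  while [ "$i" -le "$1" ]; do
    if [ $((i % 15)) -eq 0 ]; then
      echo FizzBuzz          # divisible by both 3 and 5
    elif [ $((i % 3)) -eq 0 ]; then
      echo Fizz
    elif [ $((i % 5)) -eq 0 ]; then
      echo Buzz
    else
      echo "$i"
    fi
    i=$((i + 1))
  done
}

fizzbuzz 15
```

Checking divisibility by 15 first is one simple way to keep the combined case from being swallowed by the Fizz or Buzz branches.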

 trivia is not a good way to find programmers, but questions closely related to practical experience are a good way to weed out someone who claims to know something but is not there yet. The best way to find a programmer is to sit down with them and examine their projects, or have them pair-program with you. Ask them which part they are most proud of and which part they would change, why they would change it and how. Once you do this, there is nothing more that you need to ask, other than personality questions, to gauge their ability to program. But if you do this with 100 programmers, you are wasting not only a lot of your own time but also your organization's time and money. Before you invite a programmer for a face-to-face interview, you must ensure that they deserve to be there. It's not practical to call everyone based only on their agents' claims. Let me know what your set of weed-out questions is, and what you ask a C, C++, Ruby, Python or JavaScript developer to check whether they deserve your time or not.

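The Fibonacci sequence (Italian: Successione di Fibonacci) is also known as the golden-ratio sequence.

In mathematics, the Fibonacci sequence is defined recursively:

  • $F_0 = 0$
  • $F_1 = 1$
  • $F_n = F_{n-1} + F_{n-2}$ (n ≥ 2)

In words, the Fibonacci sequence starts with 0 and 1, and each subsequent Fibonacci number is the sum of the previous two. The first few Fibonacci numbers are (OEIS):

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Note that 0 is not the first term but the zeroth term.

Source: http://zh.wikipedia.org/wiki/%E6%96%90%E6%B3%A2%E9%82%A3%E5%A5%91%E6%95%B0%E5%88%97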
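The definition above translates directly into a short loop. The following is a sketch in shell that computes the n-th term iteratively, with 0 as the zeroth term. It is the kind of optimized answer the interview question asks for, compared with naive recursion. The function name is illustrative.

```shell
# fib N: print the N-th Fibonacci number, counting from the zeroth term (fib 0 = 0).
fib() {
  a=0          # current term, starts at the zeroth term
  b=1          # next term
  n=$1
  while [ "$n" -gt 0 ]; do
    t=$((a + b))   # each new term is the sum of the previous two
    a=$b
    b=$t
    n=$((n - 1))
  done
  echo "$a"
}

fib 10
```

The loop keeps only the last two terms, so the running time grows linearly with n instead of exponentially.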

15 Advanced PostgreSQL Commands with Examples

Source: http://www.thegeekstuff.com/2009/04/15-practical-postgresql-database-adminstration-commands/


1. How do you find the largest table in a PostgreSQL database?

$ /usr/local/pgsql/bin/psql test
Welcome to psql 8.3.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# SELECT relname,relpages FROM pg_class ORDER BY relpages DESC;
              relname              | relpages
-----------------------------------+----------
 pg_proc                           |       50
 pg_proc_proname_args_nsp_index    |       40
 pg_depend                         |       37
 pg_attribute                      |       30

If you only want the single largest table, use LIMIT to restrict the number of rows returned, like this:

# SELECT relname,relpages FROM pg_class ORDER BY relpages DESC limit 1;
 relname | relpages
---------+----------
 pg_proc |       50
(1 row)
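  • relname - the relation (table) name
  • relpages - the number of pages in the relation (a page is 8 kB by default)
  • pg_class - the system catalog that holds details of all relations
  • limit 1 - restricts the result to a single row

2. How do you calculate the disk space used by a PostgreSQL database?

pg_database_size is the function dedicated to reporting database size; it returns the size in bytes: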

# SELECT pg_database_size('geekdb');
pg_database_size
------------------
         63287944
(1 row)

If you want a more readable result, use the **pg_size_pretty** function, which converts a byte count into a friendlier, human-readable format.

# SELECT pg_size_pretty(pg_database_size('geekdb'));
 pg_size_pretty
----------------
 60 MB
(1 row)

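3. How do you calculate the disk space used by a PostgreSQL table?

The size reported by the following command includes indexes and TOASTed data. If you are interested only in the table itself, excluding indexes, use the command given after it.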

# SELECT pg_size_pretty(pg_total_relation_size('big_table'));
 pg_size_pretty
----------------
 55 MB
(1 row)

How do you find the size of a PostgreSQL table without its indexes?

Use **pg_relation_size** instead of **pg_total_relation_size**.

# SELECT pg_size_pretty(pg_relation_size('big_table'));
 pg_size_pretty
----------------
 38 MB
(1 row)

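4. How do you view the indexes of a PostgreSQL table?

Syntax: # \d table_name

Consider the example below. If the table has indexes, they are listed under the heading Indexes near the end of the command output. In this example, the pg_attribute table has two btree indexes. PostgreSQL uses btree as the default index type because it suits the great majority of cases.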

test=# \d pg_attribute
   Table "pg_catalog.pg_attribute"
    Column     |   Type   | Modifiers
---------------+----------+-----------
 attrelid      | oid      | not null
 attname       | name     | not null
 atttypid      | oid      | not null
 attstattarget | integer  | not null
 attlen        | smallint | not null
 attnum        | smallint | not null
 attndims      | integer  | not null
 attcacheoff   | integer  | not null
 atttypmod     | integer  | not null
 attbyval      | boolean  | not null
 attstorage    | "char"   | not null
 attalign      | "char"   | not null
 attnotnull    | boolean  | not null
 atthasdef     | boolean  | not null
 attisdropped  | boolean  | not null
 attislocal    | boolean  | not null
 attinhcount   | integer  | not null
Indexes:
    "pg_attribute_relid_attnam_index" UNIQUE, btree (attrelid, attname)
    "pg_attribute_relid_attnum_index" UNIQUE, btree (attrelid, attnum)

5. How do you create an index of a specific type?

Indexes are btree by default, but you can specify the type of a new index as follows.

Syntax: CREATE INDEX name ON table USING index_type (column);

# CREATE INDEX test_index ON numbers using hash (num);

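6. How do you use transactions in PostgreSQL?

How do you start a transaction?

# BEGIN -- start a transaction

How do you commit or roll back a transaction?

Everything you do after BEGIN is committed to the PostgreSQL database only when you issue COMMIT. Alternatively, you can use ROLLBACK to undo everything done in the transaction.

# ROLLBACK -- roll back the current transaction
# COMMIT -- commit the current transaction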

7. How do you view the execution plan PostgreSQL chooses for a SQL query?

# EXPLAIN query;

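8. How do you display the plan by actually executing the query on the server?

The following command executes the query on the server but, instead of returning the query results to the user, returns the plan as actually executed.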

# EXPLAIN ANALYZE query;

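9. How do you generate a series of numbers and insert them into a table?

The following command generates the numbers 1 through 1000 and inserts them into the numbers table.

# INSERT INTO numbers (num) SELECT generate_series(1,1000);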

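10. How do you count the rows in a PostgreSQL table?

This command returns the total number of rows in the table.

# select count(*) from table;

This command returns the number of rows in which the specified column is not null.

# select count(col_name) from table;

This command returns the number of distinct non-null values in the specified column.

# select count(distinct col_name) from table;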

11. How do you find the **second largest** value in a column?

Find the largest value in a column:

# select max(col_name) from table;

Find the second largest value in a column:

# SELECT MAX(num) from number_table where num < ( select MAX(num) from number_table );

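12. How do you find the **second smallest** value in a column?

Find the smallest value in a column:

# select min(col_name) from table;

Find the second smallest value in a column: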

# SELECT MIN(num) from number_table where num > ( select MIN(num) from number_table );

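13. How do you list the base data types in a PostgreSQL database?

The output below is an excerpt. This command shows the available data types and the number of bytes each occupies.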

test=# SELECT typname,typlen from pg_type where typtype='b';
    typname     | typlen
----------------+--------
 bool           |      1
 bytea          |     -1
 char           |      1
 name           |     64
 int8           |      8
 int2           |      2
 int2vector     |     -1
  • typname - the name of the type
  • typlen - the size of the type in bytes (-1 means variable length)

14. How do you save the result of a query to a file?

# \o output_file
# SELECT * FROM pg_class;

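The result of the query above is saved to the file "output_file". Once redirection is active, the results of all subsequent queries are no longer printed to the screen. To turn screen output back on, run the \o command again without any argument.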

# \o

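As mentioned in an earlier article, you can use pg_dump and psql to back up and restore your database.

15. Storing encrypted passwords

PostgreSQL can hash data with the crypt function shown below, which is a convenient way to store user names and passwords.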

# SELECT crypt ( 'sathiya',gen_salt('md5') );

Possible problem with the PostgreSQL crypt function:

crypt may not be available in your environment, in which case you get the following error.

ERROR:  function gen_salt("unknown") does not exist
HINT:  No function matches the given name and argument types.
         You may need to add explicit type casts.

Solution:

To fix this, install the postgresql-contrib-<version> package and then run the following command in psql.

# \i /usr/share/postgresql/8.1/contrib/pgcrypto.sql

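Comments on the original article

In command 13, what does typtype='b' mean?

typtype='b' means base type (b == base type).

PostgreSQL has these kinds of data types: base types, composite types, domains, and pseudo-types.

http://developer.postgresql.org/pgdocs/postgres/extend-type-system.html

Efficiency of finding the second largest/smallest value

To find the second smallest value of a column, the following query is much faster, provided the m column is indexed:

SELECT m FROM mytable ORDER BY m LIMIT 1 OFFSET 1;

Efficiency of COUNT(*)

Running count(*) on a large table has a noticeable performance cost.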

15 Linux Split and Join Command Examples to Manage Large Files--reference



http://www.thegeekstuff.com/2012/10/15-linux-split-and-join-command-examples-to-manage-large-files/

Linux split and join commands are very helpful when you are manipulating large files. This article explains how to use Linux split and join command with descriptive examples.

Join and split command syntax:

join [OPTION]… FILE1 FILE2
split [OPTION]… [INPUT [PREFIX]]

Linux Split Command Examples

1. Basic Split Example

Here is a basic example of split command.

$ split split.zip 

$ ls
split.zip xab xad xaf xah xaj xal xan xap xar xat xav xax xaz xbb xbd xbf xbh xbj xbl xbn
xaa xac xae xag xai xak xam xao xaq xas xau xaw xay xba xbc xbe xbg xbi xbk xbm xbo

So we see that the file split.zip was split into smaller files named x**, where ** is the two-character suffix added by default. Also, by default each x** file contains 1000 lines.

$ wc -l *
   40947 split.zip
    1000 xaa
    1000 xab
    1000 xac
    1000 xad
    1000 xae
    1000 xaf
    1000 xag
    1000 xah
    1000 xai
...
...
...

So the output above confirms that by default each x** file contains 1000 lines.

2. Change the Suffix Length using -a option

As discussed in example 1 above, the default suffix length is 2. This can be changed with the -a option.

The following example uses a suffix of length 5 on the split files.

$ split -a5 split.zip
$ ls
split.zip  xaaaac  xaaaaf  xaaaai  xaaaal  xaaaao  xaaaar  xaaaau  xaaaax  xaaaba  xaaabd  xaaabg  xaaabj  xaaabm
xaaaaa     xaaaad  xaaaag  xaaaaj  xaaaam  xaaaap  xaaaas  xaaaav  xaaaay  xaaabb  xaaabe  xaaabh  xaaabk  xaaabn
xaaaab     xaaaae  xaaaah  xaaaak  xaaaan  xaaaaq  xaaaat  xaaaaw  xaaaaz  xaaabc  xaaabf  xaaabi  xaaabl  xaaabo


3. Customize Split File Size using -b option

The size of each output split file can be controlled using the -b option.

In this example, the split files were created with a size of 200000 bytes.

$ split -b200000 split.zip 

$ ls -lart
total 21084
drwxrwxr-x 3 himanshu himanshu 4096 Sep 26 21:20 ..
-rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xad
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xac
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xab
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaa
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xah
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xag
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xaf
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xae
-rw-rw-r-- 1 himanshu himanshu 200000 Sep 26 21:35 xar
...
...
...

4. Create Split Files with Numeric Suffix using -d option

As seen in the examples above, the output files are named x**, where ** are letters. You can change the suffix to numbers using the -d option.

Here is an example. This has numeric suffix on the split files.

$ split -d split.zip
$ ls
split.zip  x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
x00        x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38  x40

5. Customize the Number of Split Chunks using -n option

To control the number of chunks, use the -n option.

This example creates 50 chunks.

$ split -n50 split.zip
$ ls
split.zip  xac  xaf  xai  xal  xao  xar  xau  xax  xba  xbd  xbg  xbj  xbm  xbp  xbs  xbv
xaa        xad  xag  xaj  xam  xap  xas  xav  xay  xbb  xbe  xbh  xbk  xbn  xbq  xbt  xbw
xab        xae  xah  xak  xan  xaq  xat  xaw  xaz  xbc  xbf  xbi  xbl  xbo  xbr  xbu  xbx

6. Avoid Zero Sized Chunks using -e option

When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they add no value. This can be done with the -e option.

Here is an example:

$ split -n50 testfile

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
-rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
...
...
...

So we see that many zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:

$ split -n50 -e testfile
$ ls
split.zip  testfile  xaa  xab  xac  xad  xae  xaf

$ ls -lart x*
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
-rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa

So we see that no zero sized chunk was produced in the above output.

7. Customize Number of Lines using -l option

Number of lines per output split file can be customized using the -l option.

As seen in the example below, split files are created with 20000 lines each.

$ split -l20000 split.zip

$ ls
split.zip testfile xaa xab xac

$ wc -l x*
20000 xaa
20000 xab
947 xac
40947 total

Get Detailed Information using --verbose option

To get a diagnostic message each time a new split file is created, use the --verbose option as shown below.

$ split -l20000 --verbose split.zip
creating file `xaa'
creating file `xab'
creating file `xac'

Linux Join Command Examples

8. Basic Join Example

By default, the join command pairs up lines from the two input files whose first fields match.

Here is an example :

$ cat testfile1
1 India
2 US
3 Ireland
4 UK
5 Canada

$ cat testfile2
1 NewDelhi
2 Washington
3 dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland dublin
4 UK London
5 Canada Toronto

So we see that a file containing countries was joined with another file containing capitals on the basis of first field.

9. Join works on Sorted List

If either of the two files supplied to join is not sorted on the join field, join prints a warning and the out-of-order entry is not joined.

In this example, since the input file is not sorted, a warning/error message is displayed.

$ cat testfile1
1 India
2 US
3 Ireland
5 Canada
4 UK

$ cat testfile2
1 NewDelhi
2 Washington
3 dublin
4 London
5 Toronto

$ join testfile1 testfile2
1 India NewDelhi
2 US Washington
3 Ireland dublin
join: testfile1:5: is not sorted: 4 UK
5 Canada Toronto

10. Ignore Case using -i option

When comparing fields, differences in case can be ignored using the -i option, as shown below.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
a NewDelhi
B Washington
c dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
c Ireland dublin
d UK London
e Canada Toronto

$ join -i testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London
e Canada Toronto

11. Verify that Input is Sorted using --check-order option

Here is an example. Since testfile1 is unsorted towards the end, an error is produced in the output.

$ cat testfile1
a India
b US
c Ireland
d UK
f Australia
e Canada

$ cat testfile2
a NewDelhi
b Washington
c dublin
d London
e Toronto

$ join --check-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London
join: testfile1:6: is not sorted: e Canada

12. Do not Check the Sort Order using --nocheck-order option

This is the opposite of the previous example. The sort order is not checked, and no error message is displayed.

$ join --nocheck-order testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London

13. Print Unpairable Lines using -a option

If the lines of the two input files cannot all be paired one to one, the -a[FILENUM] option also prints the lines from file FILENUM that could not be paired. FILENUM is the file number (1 or 2).

In the following example, -a1 prints the last line of testfile1, which has no pair in testfile2.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada
f Australia

$ cat testfile2
a NewDelhi
b Washington
c dublin
d London
e Toronto

$ join testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London
e Canada Toronto

$ join -a1 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London
e Canada Toronto
f Australia

14. Print Only Unpaired Lines using -v option

In the above example, both paired and unpaired lines were produced in the output. If only the unpaired lines are wanted, use the -v option as shown below.

$ join -v1 testfile1 testfile2
f Australia

15. Join Based on Different Columns from Both Files using -1 and -2 option

By default, the first column of each file is used as the join field. You can change this behavior using the -1 and -2 options.

In the following example, the first column of testfile1 is compared with the second column of testfile2 to produce the join output.

$ cat testfile1
a India
b US
c Ireland
d UK
e Canada

$ cat testfile2
NewDelhi a
Washington b
dublin c
London d
Toronto e

$ join -1 1 -2 2 testfile1 testfile2
a India NewDelhi
b US Washington
c Ireland dublin
d UK London
e Canada Toronto

Regular Expressions -- Official Regular Expressions Tutorial (Regular Expressions Explained)

http://docs.oracle.com/javase/tutorial/essential/regex/index.html

This lesson explains how to use the java.util.regex API for pattern matching with regular expressions. Although the syntax accepted by this package is similar to that of the Perl programming language, knowledge of Perl is not a prerequisite. This lesson starts with the basics, and gradually builds to cover more advanced techniques.

Provides a general overview of regular expressions. It also introduces the core classes that comprise this API.
Defines a simple application for testing pattern matching with regular expressions.
Introduces basic pattern matching, metacharacters, and quoting.
Describes simple character classes, negation, ranges, unions, intersections, and subtraction.
Describes the basic predefined character classes for whitespace, word, and digit characters.
Explains greedy, reluctant, and possessive quantifiers for matching a specified expression x number of times.
Explains how to treat multiple characters as a single unit.
Describes line, word, and input boundaries.
Examines other useful methods of the Pattern class, and explores advanced features such as compiling with flags and using embedded flag expressions.
Describes the commonly-used methods of the Matcher class.
Describes how to examine a PatternSyntaxException.
To read more about regular expressions, consult this section for additional resources.
Introduction
What Are Regular Expressions?

Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set. They can be used to search, edit, or manipulate text and data. You must learn a specific syntax to create regular expressions, one that goes beyond the normal syntax of the Java programming language. Regular expressions vary in complexity, but once you understand the basics of how they're constructed, you'll be able to decipher (or create) any regular expression.

This trail teaches the regular expression syntax supported by the java.util.regex API and presents several working examples to illustrate how the various objects interact. In the world of regular expressions, there are many different flavors to choose from, such as grep, Perl, Tcl, Python, PHP, and awk. The regular expression syntax in the java.util.regex API is most similar to that found in Perl.

How Are Regular Expressions Represented in This Package?

The java.util.regex package primarily consists of three classes: Pattern, Matcher, and PatternSyntaxException.

  • A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument; the first few lessons of this trail will teach you the required syntax.
  • A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
  • A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
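As a rough illustration of how the three classes fit together, here is a minimal sketch. The class name RegexClassesSketch and the helper names occursIn and syntaxError are invented for this example and are not part of the tutorial or the API; only Pattern.compile, Pattern.matcher, Matcher.find, and PatternSyntaxException.getDescription are real API calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesSketch {

    // Compiles the regex and reports whether it occurs anywhere in the input.
    static boolean occursIn(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // static factory; there is no public constructor
        Matcher matcher = pattern.matcher(input);  // a Matcher is always obtained from a Pattern
        return matcher.find();
    }

    // Returns null if the regex compiles, otherwise the description of the syntax error.
    static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {       // unchecked, so catching it is optional
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(occursIn("foo", "a foo b")); // true
        System.out.println(occursIn("foo", "bar"));     // false
        System.out.println(syntaxError("[abc"));        // a message about the unclosed character class
    }
}
```

Because PatternSyntaxException is unchecked, code that compiles a fixed, known-good pattern does not need a try/catch; catching it is mainly useful when the pattern comes from user input, as in the test harness.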

The last few lessons of this trail explore each class in detail. But first, you must understand how regular expressions are actually constructed. Therefore, the next section introduces a simple test harness that will be used repeatedly to explore their syntax.

Test Harness
This section defines a reusable test harness, RegexTestHarness.java, for exploring the regular expression constructs supported by this API. The command to run this code is java RegexTestHarness; no command-line arguments are accepted. The application loops repeatedly, prompting the user for a regular expression and input string. Using this test harness is optional, but you may find it convenient for exploring the test cases discussed in the following pages.

import java.io.Console;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexTestHarness {

public static void main(String[] args){
    // System.console() returns null when no interactive console is attached.
    Console console = System.console();
    if (console == null) {
        System.err.println("No console.");
        System.exit(1);
    }
    while (true) {

        Pattern pattern =
        Pattern.compile(console.readLine("%nEnter your regex: "));

        Matcher matcher =
        pattern.matcher(console.readLine("Enter input string to search: "));

        boolean found = false;
        // Report every match with its start (inclusive) and end (exclusive) index.
        while (matcher.find()) {
            console.format("I found the text" +
                " \"%s\" starting at " +
                "index %d and ending at index %d.%n",
                matcher.group(),
                matcher.start(),
                matcher.end());
            found = true;
        }
        if(!found){
            console.format("No match found.%n");
        }
    }
}

}

Before continuing to the next section, save and compile this code to ensure that your development environment supports the required packages.

String Literals
The most basic form of pattern matching supported by this API is the match of a string literal. For example, if the regular expression is foo and the input string is foo, the match will succeed because the strings are identical. Try this out with the test harness:

Enter your regex: foo
Enter input string to search: foo
I found the text "foo" starting at index 0 and ending at index 3.

This match was a success. Note that while the input string is 3 characters long, the start index is 0 and the end index is 3. By convention, ranges are inclusive of the beginning index and exclusive of the end index, as shown in the following figure:

[Figure: The string literal foo, with numbered cells and index values.]

Each character in the string resides in its own cell, with the index positions pointing between each cell. The string "foo" starts at index 0 and ends at index 3, even though the characters themselves only occupy cells 0, 1, and 2.

With subsequent matches, you'll notice some overlap; the start index for the next match is the same as the end index of the previous match:

Enter your regex: foo
Enter input string to search: foofoofoo
I found the text "foo" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "foo" starting at index 6 and ending at index 9.
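The same index behavior can be checked without the interactive harness. The sketch below is only an illustration; the class name MatchIndexSketch and the helper spans are hypothetical names introduced here, built on the real Matcher.find, Matcher.start, and Matcher.end methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchIndexSketch {

    // Collects a "start-end" string for every match of regex in input.
    // start is inclusive and end is exclusive, as described above.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // Each match starts where the previous one ended.
        System.out.println(spans("foo", "foofoofoo")); // [0-3, 3-6, 6-9]
    }
}
```

Because the end index is exclusive, input.substring(matcher.start(), matcher.end()) returns exactly the matched text.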

Metacharacters

This API also supports a number of special characters that affect the way a pattern is matched. Change the regular expression to cat. and the input string to cats. The output will appear as follows:

Enter your regex: cat.
Enter input string to search: cats
I found the text "cats" starting at index 0 and ending at index 4.

The match still succeeds, even though the dot "." is not present in the input string. It succeeds because the dot is a metacharacter, a character with special meaning interpreted by the matcher. The metacharacter "." means "any character", which is why the match succeeds in this example.

The metacharacters supported by this API are: <([{\^-=$!|]})?*+.>

Note: In certain situations the special characters listed above will not be treated as metacharacters. You'll encounter this as you learn more about how regular expressions are constructed. You can, however, use this list to check whether or not a specific character will ever be considered a metacharacter. For example, the characters @ and # never carry a special meaning.

There are two ways to force a metacharacter to be treated as an ordinary character:

  • precede the metacharacter with a backslash, or
  • enclose it within \Q (which starts the quote) and \E (which ends it).

When using this technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.
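Both quoting techniques can be tried in a few lines of Java. This is a hedged sketch: QuotingSketch and found are names made up for the example. It also shows Pattern.quote, a standard library helper the text above does not mention, which produces a literal pattern for a string. Note that inside Java source each regex backslash is written as two backslashes.

```java
import java.util.regex.Pattern;

public class QuotingSketch {

    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped dot is a metacharacter: it matches the "s" in "cats".
        System.out.println(found("cat.", "cats"));         // true

        // Backslash-escaped dot matches only a literal ".".
        System.out.println(found("cat\\.", "cats"));       // false
        System.out.println(found("cat\\.", "cat."));       // true

        // \Q...\E quotes everything in between.
        System.out.println(found("\\Qcat.\\E", "cats"));   // false
        System.out.println(found("\\Qcat.\\E", "cat."));   // true

        // Pattern.quote builds a literal pattern from an arbitrary string.
        System.out.println(found(Pattern.quote("cat."), "cat.")); // true
    }
}
```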

Character Classes
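If you browse through the Pattern class specification, you'll see tables summarizing the supported regular expression constructs. In the "Character Classes" section you'll find the following:

 Construct     | Description
---------------+-----------------------------------------------------------
 [abc]         | a, b, or c (simple class)
 [^abc]        | Any character except a, b, or c (negation)
 [a-zA-Z]      | a through z, or A through Z, inclusive (range)
 [a-d[m-p]]    | a through d, or m through p: [a-dm-p] (union)
 [a-z&&[def]]  | d, e, or f (intersection)
 [a-z&&[^bc]]  | a through z, except for b and c: [ad-z] (subtraction)
 [a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)

The left-hand column specifies the regular expression constructs, while the right-hand column describes the conditions under which each construct will match.

Note: The word "class" in the phrase "character class" does not refer to a .class file. In the context of regular expressions, a character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string.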

Simple Classes

The most basic form of a character class is to simply place a set of characters side-by-side within square brackets. For example, the regular expression [bcr]at will match the words "bat", "cat", or "rat" because it defines a character class (accepting either "b", "c", or "r") as its first character.

Enter your regex: [bcr]at
Enter input string to search: bat
I found the text "bat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: cat
I found the text "cat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: hat
No match found.

In the above examples, the overall match succeeds only when the first letter matches one of the characters defined by the character class.

Negation

To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.

Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3.

The match is successful only if the first character of the input string does not match any of the characters defined by the character class.

Ranges

Sometimes you'll want to define a character class that includes a range of values, such as the letters "a through h" or the numbers "1 through 5". To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h]. You can also place different ranges beside each other within the class to further expand the match possibilities. For example, [a-zA-Z] will match any letter of the alphabet: a to z (lowercase) or A to Z (uppercase).

Here are some examples of ranges and negation:

Enter your regex: [a-c]
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: b
I found the text "b" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: c
I found the text "c" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: d
No match found.

Enter your regex: foo[1-5]
Enter input string to search: foo1
I found the text "foo1" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo5
I found the text "foo5" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo6
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo1
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo6
I found the text "foo6" starting at index 0 and ending at index 4.

Unions

You can also use unions to create a single character class comprised of two or more separate character classes. To create a union, simply nest one class inside the other, such as [0-4[6-8]]. This particular union creates a single character class that matches the numbers 0, 1, 2, 3, 4, 6, 7, and 8.

Enter your regex: [0-4[6-8]]
Enter input string to search: 0
I found the text "0" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 5
No match found.

Enter your regex: [0-4[6-8]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 8
I found the text "8" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 9
No match found.

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 2
No match found.

Enter your regex: [0-9&&[345]]
Enter input string to search: 6
No match found.

And here's an example that shows the intersection of two ranges:

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 3
No match found.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 7
No match found.

Subtraction

Finally, you can use subtraction to negate one or more nested character classes, such as [0-9&&[^345]]. This example creates a single character class that matches everything from 0 to 9, except the numbers 3, 4, and 5.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 2
I found the text "2" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 3
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 4
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 9
I found the text "9" starting at index 0 and ending at index 1.

Now that we've covered how character classes are created, you may want to review the Character Classes table before continuing with the next section.
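The union, intersection, and subtraction examples above can be summarized in one small program. This is a sketch for experimentation only; the class CharacterClassSketch and its accepted helper are names invented here, and the helper simply tests each candidate character against the class with Matcher.matches.

```java
import java.util.regex.Pattern;

public class CharacterClassSketch {

    // Returns the characters of candidates that the given single-character class matches.
    static String accepted(String characterClass, String candidates) {
        Pattern pattern = Pattern.compile(characterClass);
        StringBuilder kept = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        System.out.println(accepted("[0-4[6-8]]", digits));    // union:        01234678
        System.out.println(accepted("[0-9&&[345]]", digits));  // intersection: 345
        System.out.println(accepted("[2-8&&[4-6]]", digits));  // intersection: 456
        System.out.println(accepted("[0-9&&[^345]]", digits)); // subtraction:  0126789
        System.out.println(accepted("[^bcr]", "bcrh"));        // negation:     h
    }
}
```

Nested classes and && are features of the java.util.regex syntax; other regex flavors may not accept these patterns.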

Predefined Character Classes
The Pattern API contains a number of useful predefined character classes, which offer convenient shorthands for commonly used regular expressions:

 Construct | Description
-----------+-------------------------------------------------------
 .         | Any character (may or may not match line terminators)
 \d        | A digit: [0-9]
 \D        | A non-digit: [^0-9]
 \s        | A whitespace character: [ \t\n\x0B\f\r]
 \S        | A non-whitespace character: [^\s]
 \w        | A word character: [a-zA-Z_0-9]
 \W        | A non-word character: [^\w]

In the table above, each construct in the left-hand column is shorthand for the character class in the right-hand column. For example, \d means a range of digits (0-9), and \w means a word character (any lowercase letter, any uppercase letter, the underscore character, or any digit). Use the predefined classes whenever possible. They make your code easier to read and eliminate errors introduced by malformed character classes.

Constructs beginning with a backslash are called escaped constructs. We previewed escaped constructs in the String Literals section, where we mentioned the use of backslash and \Q and \E for quotation. If you are using an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. For example:

private final String REGEX = "\\d"; // a single digit

In this example \d is the regular expression; the extra backslash is required for the code to compile. The test harness reads the expressions directly from the Console, so the extra backslash is unnecessary.
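To make the double-backslash rule concrete, the following sketch counts matches for a few predefined classes written as Java string literals. The class name EscapedConstructSketch and the helper countMatches are assumptions made for this illustration, not names from the tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedConstructSketch {

    // In source code the literal is written "\\d"; the resulting string holds the two characters \ and d.
    static final String DIGIT = "\\d";

    // Counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(DIGIT.length());                    // 2
        System.out.println(countMatches(DIGIT, "a1b22"));      // 3 digits: 1, 2, 2
        System.out.println(countMatches("\\D", "a1b22"));      // 2 non-digits: a, b
        System.out.println(countMatches("\\s", "a b\tc"));     // 2 whitespace characters
        System.out.println(countMatches("\\w+", "foo bar_9!")); // 2 words: foo, bar_9
    }
}
```

A regex read at run time, for example from the console or a configuration file, needs only the single backslash, because the doubling is purely a requirement of Java string literal syntax.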

The following examples demonstrate the use of predefined character classes.

Enter your regex: .
Enter input string to search: @
I found the text "@" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: .
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: 1
I found the text "1" starting at index 0 and ending at index 1.

Enter your regex: \d
Enter input string to search: a
No match found.

Enter your regex: \D
Enter input string to search: 1
No match found.

Enter your regex: \D
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search:
I found the text " " starting at index 0 and ending at index 1.

Enter your regex: \s
Enter input string to search: a
No match found.

Enter your regex: \S
Enter input string to search:
No match found.

Enter your regex: \S
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: \w
Enter input string to search: !
No match found.

Enter your regex: \W
Enter input string to search: a
No match found.

Enter your regex: \W
Enter input string to search: !
I found the text "!" starting at index 0 and ending at index 1.

In the first three examples, the regular expression is simply . (the "dot" metacharacter), which indicates "any character." Therefore, the match is successful in all three cases (a randomly selected @ character, a digit, and a letter). The remaining examples each use a single regular expression construct from the predefined character classes table. You can refer to this table to figure out the logic behind each match:

  • \d matches all digits
  • \s matches whitespace characters
  • \w matches word characters

Alternatively, a capital letter means the opposite:

  • \D matches non-digits
  • \S matches non-whitespace characters
  • \W matches non-word characters
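
Outside the test harness, the same checks can be run in a few lines of Java. This is a rough sketch only; PredefinedClassDemo and matchesChar are hypothetical names chosen for illustration. It tests single characters against each predefined class.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // True when the single character c matches the given one-character regex.
    static boolean matchesChar(String regex, char c) {
        return Pattern.matches(regex, String.valueOf(c));
    }

    public static void main(String[] args) {
        String[] regexes = {"\\d", "\\D", "\\s", "\\S", "\\w", "\\W"};
        char[] samples = {'1', 'a', ' ', '!', '_'};
        for (String regex : regexes) {
            for (char c : samples) {
                System.out.println(regex + " vs '" + c + "': " + matchesChar(regex, c));
            }
        }
    }
}
```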

Quantifiers
Quantifiers allow you to specify the number of occurrences to match against. For convenience, the three sections of the Pattern API specification describing greedy, reluctant, and possessive quantifiers are presented below. At first glance it may appear that the quantifiers X?, X??, and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences, which are explained near the end of this section.

Greedy    Reluctant   Possessive   Meaning
X?        X??         X?+          X, once or not at all
X*        X*?         X*+          X, zero or more times
X+        X+?         X++          X, one or more times
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times

Let's start our look at greedy quantifiers by creating three different regular expressions: the letter "a" followed by either ?, *, or +. Let's see what happens when these expressions are tested against an empty input string "":

Enter your regex: a?
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a*
Enter input string to search:
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a+
Enter input string to search:
No match found.

Zero-Length Matches

In the above example, the match is successful in the first two cases because the expressions a? and a* both allow for zero occurrences of the letter a. You'll also notice that the start and end indices are both zero, which is unlike any of the examples we've seen so far. The empty input string "" has no length, so the test simply matches nothing at index 0. Matches of this sort are known as zero-length matches. A zero-length match can occur in several cases: in an empty input string, at the beginning of an input string, after the last character of an input string, or in between any two characters of an input string. Zero-length matches are easily identifiable because they always start and end at the same index position.

Let's explore zero-length matches with a few more examples. Change the input string to a single letter "a" and you'll notice something interesting:

Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

All three quantifiers found the letter "a", but the first two also found a zero-length match at index 1; that is, after the last character of the input string. Remember, the matcher sees the character "a" as sitting in the cell between index 0 and index 1, and our test harness loops until it can no longer find a match. Depending on the quantifier used, the presence of "nothing" at the index after the last character may or may not trigger a match.

Now change the input string to the letter "a" five times in a row and you'll get the following:

Enter your regex: a?
Enter input string to search: aaaaa
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 1 and ending at index 2.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "a" starting at index 3 and ending at index 4.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a*
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a+
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.

The expression a? finds an individual match for each character, since it matches when "a" appears zero or one times. The expression a* finds two separate matches: all of the letter "a"'s in the first match, then the zero-length match after the last character at index 5. And finally, a+ matches all occurrences of the letter "a", ignoring the presence of "nothing" at the last index.

At this point, you might be wondering what the results would be if the first two quantifiers encounter a letter other than "a". For example, what happens if it encounters the letter "b", as in "ababaaaab"?

Let's find out:

Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.

Even though the letter "b" appears in cells 1, 3, and 8, the output reports a zero-length match at those locations. The regular expression a? is not specifically looking for the letter "b"; it's merely looking for the presence (or lack thereof) of the letter "a". If the quantifier allows for a match of "a" zero times, anything in the input string that's not an "a" will show up as a zero-length match. The remaining a's are matched according to the rules discussed in the previous examples.
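
To reproduce these results without the interactive harness, a loop over find() is enough. The sketch below is an illustration, not the tutorial's own harness; the class name ZeroLengthDemo and the text[start,end) output format are assumptions made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthDemo {
    // Collects every match as text[start,end), looping on find() like the test harness.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"a?", "a*", "a+"}) {
            System.out.println(regex + " -> " + findAll(regex, "ababaaaab"));
        }
    }
}
```

A zero-length match shows up as an entry with empty text and equal indices, such as [1,1).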

To match a pattern exactly n number of times, simply specify the number inside a set of braces:

Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.

Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.

Here, the regular expression a{3} is searching for three occurrences of the letter "a" in a row. The first test fails because the input string does not have enough a's to match against. The second test contains exactly 3 a's in the input string, which triggers a match. The third test also triggers a match because there are exactly 3 a's at the beginning of the input string. Anything following that is irrelevant to the first match. If the pattern should appear again after that point, it would trigger subsequent matches:

Enter your regex: a{3}
Enter input string to search: aaaaaaaaa
I found the text "aaa" starting at index 0 and ending at index 3.
I found the text "aaa" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

To require a pattern to appear at least n times, add a comma after the number:

Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
I found the text "aaaaaaaaa" starting at index 0 and ending at index 9.

With the same input string, this test finds only one match, because the 9 a's in a row satisfy the need for "at least" 3 a's.

Finally, to specify an upper limit on the number of occurrences, add a second number inside the braces:

Enter your regex: a{3,6} // find at least 3 (but no more than 6) a's in a row
Enter input string to search: aaaaaaaaa
I found the text "aaaaaa" starting at index 0 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

Here the first match is forced to stop at the upper limit of 6 characters. The second match includes whatever is left over, which happens to be three a's, the minimum number of characters allowed for this match. If the input string were one character shorter, there would not be a second match since only two a's would remain.
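
The three brace forms can be compared side by side by recording only the length of each match. This is a small sketch under our own naming (BoundedQuantifierDemo and matchLengths are not from the tutorial):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedQuantifierDemo {
    // Returns the length of each successive match of regex in input.
    static List<Integer> matchLengths(String regex, String input) {
        List<Integer> lengths = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lengths.add(m.end() - m.start());
        }
        return lengths;
    }

    public static void main(String[] args) {
        String nine = "aaaaaaaaa";
        System.out.println(matchLengths("a{3}", nine));
        System.out.println(matchLengths("a{3,}", nine));
        System.out.println(matchLengths("a{3,6}", nine));
        System.out.println(matchLengths("a{3,6}", "aaaaaaaa")); // one character shorter
    }
}
```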

Capturing Groups and Character Classes with Quantifiers

Until now, we've only tested quantifiers on input strings containing one character. In fact, quantifiers can only attach to one character at a time, so the regular expression "abc+" would mean "a, followed by b, followed by c one or more times". It would not mean "abc" one or more times. However, quantifiers can also attach to character classes and capturing groups, such as [abc]+ (a or b or c, one or more times) or (abc)+ (the group "abc", one or more times).

Let's illustrate by specifying the group (dog), three times in a row.

Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.

Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.

Here the first example finds two matches, each made up of three consecutive occurrences of "dog", since the quantifier applies to the entire capturing group. Remove the parentheses, and the match fails because the quantifier {3} now applies only to the letter "g".

Similarly, we can apply a quantifier to an entire character class:

Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.

Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.

Here the quantifier {3} applies to the entire character class in the first example, but only to the letter "c" in the second.
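
A quick way to see what a quantifier is attached to is to test inputs that only one reading can match. The following sketch uses a hypothetical helper, found, in a class we have named GroupQuantifierDemo:

```java
import java.util.regex.Pattern;

public class GroupQuantifierDemo {
    // True when regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("(dog){3}", "dogdogdog")); // quantifier covers the group
        System.out.println(found("dog{3}", "dogdogdog"));   // quantifier covers only "g"
        System.out.println(found("dog{3}", "doggg"));       // "do" followed by three g's
        System.out.println(found("[abc]{3}", "cab"));       // any three of a, b, c
        System.out.println(found("abc{3}", "abccc"));       // "ab" followed by three c's
    }
}
```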

Differences Among Greedy, Reluctant, and Possessive Quantifiers

There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.
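
The same comparison can be scripted. The sketch below is illustrative only (QuantifierModeDemo and firstMatch are assumed names); it returns the first match for each quantifier style, or null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModeDemo {
    // Returns the first match of regex in input, or null when there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + firstMatch(".*foo", input));
        System.out.println("reluctant:  " + firstMatch(".*?foo", input));
        System.out.println("possessive: " + firstMatch(".*+foo", input));
    }
}
```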

Capturing Groups
In the previous section, we saw how quantifiers attach to one character, character class, or capturing group at a time. But until now, we have not discussed the notion of capturing groups in any detail.

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d" "o" and "g". The portion of the input string that matches the capturing group will be saved in memory for later recall via backreferences (as discussed below in the Backreferences section).

Numbering

As described in the Pattern API, capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern. In this example, groupCount would return the number 4, showing that the pattern contains 4 capturing groups.

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount. Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total. (You'll see examples of non-capturing groups later in the section Methods of the Pattern Class.)

It's important to understand how groups are numbered because some Matcher methods accept an int specifying a particular group number as a parameter:

  • public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
  • public int end(int group): Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.
  • public String group(int group): Returns the input subsequence captured by the given group during the previous match operation.
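
The numbering rule can be confirmed by running the expression ((A)(B(C))) against the input "ABC". This is a minimal sketch; the class GroupNumberDemo and its helpers are names we made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberDemo {
    // Number of capturing groups in the pattern (group 0 is not counted).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns group 0..n for a full match, or an empty array when input does not match.
    static String[] captures(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "((A)(B(C)))";
        System.out.println(groupCount(regex));
        System.out.println(Arrays.toString(captures(regex, "ABC")));

        Matcher m = Pattern.compile(regex).matcher("ABC");
        if (m.matches()) {
            // Group 3 is (B(C)), which covers the characters at indices 1 and 2.
            System.out.println(m.start(3) + " to " + m.end(3));
        }
    }
}
```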

Backreferences

The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference. A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled. For example, the expression (\d\d) defines one capturing group matching two digits in a row, which can be recalled later in the expression via the backreference \1.

To match any 2 digits, followed by the exact same two digits, you would use (\d\d)\1 as the regular expression:

Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.

If you change the last two digits, the match will fail:

Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.

For nested capturing groups, backreferencing works in exactly the same way: Specify a backslash followed by the number of the group to be recalled.
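
In Java source, the backreference needs the same doubled backslash as any other escaped construct. The sketch below is illustrative; BackreferenceDemo and fullMatch are assumed names, and the nested pattern ((\d)\d)\2 is our own example of recalling an inner group.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // True when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \1 recalls whatever group 1 captured.
        System.out.println(fullMatch("(\\d\\d)\\1", "1212"));
        System.out.println(fullMatch("(\\d\\d)\\1", "1234"));
        // Nested groups: \2 recalls the inner group, here the first digit.
        System.out.println(fullMatch("((\\d)\\d)\\2", "121"));
        System.out.println(fullMatch("((\\d)\\d)\\2", "122"));
    }
}
```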

Boundary Matchers
Until now, we've only been interested in whether or not a match is found at some location within a particular input string. We never cared about where in the string the match was taking place.

You can make your pattern matches more precise by specifying such information with boundary matchers. For example, maybe you're interested in finding a particular word, but only if it appears at the beginning or end of a line. Or maybe you want to know if the match is taking place on a word boundary, or at the end of the previous match.

The following table lists and explains all the boundary matchers.

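Boundary Construct   Description
^                    The beginning of a line
$                    The end of a line
\b                   A word boundary
\B                   A non-word boundary
\A                   The beginning of the input
\G                   The end of the previous match
\Z                   The end of the input but for the final terminator, if any
\z                   The end of the input

The following examples demonstrate the use of boundary matchers ^ and $. As noted above, ^ matches the beginning of a line, and $ matches the end.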

Enter your regex: ^dog$
Enter input string to search: dog
I found the text "dog" starting at index 0 and ending at index 3.

Enter your regex: ^dog$
Enter input string to search:        dog
No match found.

Enter your regex: \s*dog$
Enter input string to search:             dog
I found the text "            dog" starting at index 0 and ending at index 15.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
I found the text "dogblahblah" starting at index 0 and ending at index 11.

The first example is successful because the pattern occupies the entire input string. The second example fails because the input string contains extra whitespace at the beginning. The third example specifies an expression that allows for unlimited white space, followed by "dog" on the end of the line. The fourth example requires "dog" to be present at the beginning of a line followed by an unlimited number of word characters.

To check if a pattern begins and ends on a word boundary (as opposed to a substring within a longer string), just use \b on either side; for example, \bdog\b:

Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

To match the expression on a non-word boundary, use \B instead:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

To require the match to occur only at the end of the previous match, use \G:

Enter your regex: dog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \Gdog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.

Here the second example finds only one match, because the second occurrence of "dog" does not start at the end of the previous match.
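
Counting matches makes the effect of each boundary matcher easy to compare. The following is a rough sketch with an assumed class name, BoundaryDemo, and an assumed helper, count:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("\\bdog\\b", "The dog plays in the yard."));
        System.out.println(count("\\bdog\\b", "The doggie plays in the yard."));
        System.out.println(count("\\bdog\\B", "The doggie plays in the yard."));
        System.out.println(count("dog", "dog dog"));
        // \G anchors each match to the end of the previous one, so the space stops the scan.
        System.out.println(count("\\Gdog", "dog dog"));
        System.out.println(count("\\Gdog", "dogdog"));
    }
}
```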

Methods of the Pattern Class
Until now, we've only used the test harness to create Pattern objects in their most basic form. This section explores advanced techniques such as creating patterns with flags and using embedded flag expressions. It also explores some additional useful methods that we haven't yet discussed.

Creating a Pattern with Flags

The Pattern class defines an alternate compile method that accepts a set of flags affecting the way the pattern is matched. The flags parameter is a bit mask that may include any of the following public static fields:

  • Pattern.CANON_EQ: Enables canonical equivalence. When this flag is specified, two characters will be considered to match if, and only if, their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account. Specifying this flag may impose a performance penalty.
  • Pattern.CASE_INSENSITIVE: Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag. Case-insensitive matching can also be enabled via the embedded flag expression (?i). Specifying this flag may impose a slight performance penalty.
  • Pattern.COMMENTS: Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression (?x).
  • Pattern.DOTALL: Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators. Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)
  • Pattern.LITERAL: Enables literal parsing of the pattern. When this flag is specified, the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous. There is no embedded flag character for enabling literal parsing.
  • Pattern.MULTILINE: Enables multiline mode. In multiline mode, the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. Multiline mode can also be enabled via the embedded flag expression (?m).
  • Pattern.UNICODE_CASE: Enables Unicode-aware case folding. When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case folding can also be enabled via the embedded flag expression (?u). Specifying this flag may impose a performance penalty.
  • Pattern.UNIX_LINES: Enables UNIX lines mode. In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $. UNIX lines mode can also be enabled via the embedded flag expression (?d).

In the following steps we will modify the test harness, RegexTestHarness.java, to create a pattern with case-insensitive matching.

First, modify the code to invoke the alternate version of compile:

Pattern pattern = Pattern.compile(console.readLine("%nEnter your regex: "), Pattern.CASE_INSENSITIVE);

Then compile and run the test harness to get the following results:

Enter your regex: dog
Enter input string to search: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.

As you can see, the string literal "dog" matches both occurrences, regardless of case. To compile a pattern with multiple flags, separate the flags to be included using the bitwise OR operator "|". For clarity, the following code samples hardcode the regular expression instead of reading it from the Console:

pattern = Pattern.compile("[az]$", Pattern.MULTILINE | Pattern.UNIX_LINES);

You could also specify an int variable instead:

final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile("aa", flags);
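
To see what the flags change, it helps to compile the same expression with and without them. The sketch below is an illustration under our own naming (FlagsDemo, count); a flags value of 0 means that no flags are set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Counts matches of regex in input when compiled with the given flag bit mask.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("dog", 0, "DoGDOg"));
        System.out.println(count("dog", Pattern.CASE_INSENSITIVE, "DoGDOg"));
        // Without MULTILINE, ^ and $ only anchor to the whole input.
        System.out.println(count("^b$", 0, "a\nb\nc"));
        System.out.println(count("^b$", Pattern.MULTILINE, "a\nb\nc"));
        // Flags combine with the bitwise OR operator.
        System.out.println(count("^B$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, "a\nb\nc"));
    }
}
```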


Embedded Flag Expressions

It's also possible to enable various flags using embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, and are specified in the regular expression itself. The following example uses the original test harness, RegexTestHarness.java, with the embedded flag expression (?i) to enable case-insensitive matching.

Enter your regex: (?i)foo
Enter input string to search: FOOfooFoOfoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.
I found the text "foO" starting at index 9 and ending at index 12.

Once again, all matches succeed regardless of case.
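
Because an embedded flag lives inside the expression, the one-argument compile method is all that is needed. Here is a short sketch with assumed names (EmbeddedFlagDemo, count) that compares (?i) and (?s) with their unflagged forms:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddedFlagDemo {
    // Counts matches of regex in input using the one-argument compile method.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("foo", "FOOfooFoOfoO"));
        System.out.println(count("(?i)foo", "FOOfooFoOfoO"));
        // (?s) turns on dotall mode, so . also matches the line terminator.
        System.out.println(count(".", "a\nb"));
        System.out.println(count("(?s).", "a\nb"));
    }
}
```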

The embedded flag expressions that correspond to Pattern's publicly accessible fields are presented in the following table:

Constant                    Equivalent Embedded Flag Expression
Pattern.CANON_EQ            None
Pattern.CASE_INSENSITIVE    (?i)
Pattern.COMMENTS            (?x)
Pattern.MULTILINE           (?m)
Pattern.DOTALL              (?s)
Pattern.LITERAL             None
Pattern.UNICODE_CASE        (?u)
Pattern.UNIX_LINES          (?d)

Using the matches(String, CharSequence) Method

The Pattern class defines a convenient matches method that allows you to quickly check if a pattern is present in a given input string. As with all public static methods, you should invoke matches by its class name, such as Pattern.matches("\\d", "1");. In this example, the method returns true, because the digit "1" matches the regular expression \d.
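
One detail worth checking for yourself is that Pattern.matches succeeds only when the entire input matches the expression, whereas Matcher.find looks for a match anywhere. The sketch below contrasts the two; WholeInputDemo and its two helper methods are hypothetical names.

```java
import java.util.regex.Pattern;

public class WholeInputDemo {
    // True only when the entire input matches regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when regex matches anywhere inside input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d", "1"));
        System.out.println(wholeMatch("\\d", "12"));  // two digits, but \d is one digit
        System.out.println(contains("\\d", "12"));
        System.out.println(wholeMatch("\\d+", "12"));
    }
}
```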

Using the split(String) Method

The split method is a great tool for gathering the text that lies on either side of the pattern that's been matched. As shown below in SplitDemo.java, the split method could extract the words "one two three four five" from the string "one:two:three:four:five":

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SplitDemo {

    private static final String REGEX = ":";
    private static final String INPUT =
        "one:two:three:four:five";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

OUTPUT:

one
two
three
four
five

For simplicity, we've matched a string literal, the colon (:), instead of a complex regular expression. Since we're still using Pattern and Matcher objects, you can use split to get the text that falls on either side of any regular expression. Here's the same example, SplitDemo2.java, modified to split on digits instead:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SplitDemo2 {

    private static final String REGEX = "\\d";
    private static final String INPUT =
        "one9two4three7four1five";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

OUTPUT:

one
two
three
four
five

Other Utility Methods

You may find the following methods to be of some use as well:

  • public static String quote(String s): Returns a literal pattern String for the specified String. This method produces a String that can be used to create a Pattern that would match String s as if it were a literal pattern. Metacharacters or escape sequences in the input sequence will be given no special meaning.
  • public String toString(): Returns the String representation of this pattern. This is the regular expression from which this pattern was compiled.
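
The quote method matters when the text to match contains metacharacters. The following minimal sketch uses names of our own choosing (QuoteDemo, literalMatch, regexMatch, source); the input "1+1=2" was picked because + is a quantifier when left unquoted.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Matches input against literal, with every metacharacter in literal quoted.
    static boolean literalMatch(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    // Matches input against regex with no quoting.
    static boolean regexMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // The regular expression a pattern was compiled from, via toString().
    static String source(String regex) {
        return Pattern.compile(regex).toString();
    }

    public static void main(String[] args) {
        System.out.println(literalMatch("1+1=2", "1+1=2")); // + is treated literally
        System.out.println(regexMatch("1+1=2", "1+1=2"));   // + is a quantifier here
        System.out.println(regexMatch("1+1=2", "11=2"));
        System.out.println(source("a*b"));
    }
}
```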

Pattern Method Equivalents in java.lang.String

Regular expression support also exists in java.lang.String through several methods that mimic the behavior of java.util.regex.Pattern. For convenience, key excerpts from their API are presented below.

  • public boolean matches(String regex): Tells whether or not this string matches the given regular expression. An invocation of this method of the form str.matches(regex) yields exactly the same result as the expression Pattern.matches(regex, str).
  • public String[] split(String regex, int limit): Splits this string around matches of the given regular expression. An invocation of this method of the form str.split(regex, n) yields the same result as the expression Pattern.compile(regex).split(str, n).
  • public String[] split(String regex): Splits this string around matches of the given regular expression. This method works the same as if you invoked the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are not included in the resulting array.

There is also a replace method that replaces one CharSequence with another:

  • public String replace(CharSequence target, CharSequence replacement): Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end; for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
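
The String methods can be tried directly, without creating a Pattern. The sketch below is illustrative (StringRegexDemo and its wrapper methods are assumed names); it shows how the limit argument affects split and how replace works from left to right.

```java
import java.util.Arrays;

public class StringRegexDemo {
    // Thin wrapper over String.split(regex, limit) so the effect of limit is easy to compare.
    static String[] split(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    // Literal (non-regex) replacement via String.replace.
    static String replaceLiteral(String input, String target, String replacement) {
        return input.replace(target, replacement);
    }

    public static void main(String[] args) {
        System.out.println("1".matches("\\d"));
        System.out.println(Arrays.toString(split("one:two:three", ":", 2)));
        System.out.println(Arrays.toString(split("a:b::", ":", 0)));  // trailing empty strings dropped
        System.out.println(Arrays.toString(split("a:b::", ":", -1))); // trailing empty strings kept
        System.out.println(replaceLiteral("aaa", "aa", "b"));
    }
}
```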

Methods of the Matcher Class
This section describes some additional useful methods of the Matcher class. For convenience, the methods listed below are grouped according to functionality.

Index Methods

Index methods provide useful index values that show precisely where the match was found in the input string:

  • public int start(): Returns the start index of the previous match.
  • public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
  • public int end(): Returns the offset after the last character matched.
  • public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.

Study Methods

Study methods review the input string and return a boolean indicating whether or not the pattern is found.

  • public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
  • public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.
  • public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
  • public boolean matches(): Attempts to match the entire region against the pattern.

Replacement Methods

Replacement methods are useful methods for replacing text in an input string.

  • public Matcher appendReplacement(StringBuffer sb,String replacement): Implements a non-terminal append-and-replace step.
  • public StringBuffer appendTail(StringBuffer sb): Implements a terminal append-and-replace step.
  • public String replaceAll(String replacement): Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
  • public String replaceFirst(String replacement): Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
  • public static String quoteReplacement(String s): Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Backslashes ('\') and dollar signs ('$') will be given no special meaning.

Using the start and end Methods

Here's an example, MatcherDemo.java,that counts the number of times the word "dog" appears in the input string.

import java.util.regex.Pattern; import java.util.regex.Matcher;

public class MatcherDemo {

private static final String REGEX =
    "\\bdog\\b";
private static final String INPUT =
    "dog dog dog doggie dogg";

public static void main(String[] args) {
   Pattern p = Pattern.compile(REGEX);
   //  get a matcher object
   Matcher m = p.matcher(INPUT);
   int count = 0;
   while(m.find()) {
       count++;
       System.out.println("Match number "
                          + count);
       System.out.println("start(): "
                          + m.start());
       System.out.println("end(): "
                          + m.end());
  }

}
}

OUTPUT:

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11

You can see that this example uses word boundaries to ensure that the letters "d" "o" "g" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred. The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last character matched, plus one.

Using the matches and lookingAt Methods

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to be matched, while lookingAt does not. Both methods always start at the beginning of the input string. Here's the full code, MatchesLooking.java:

import java.util.regex.Pattern; import java.util.regex.Matcher;

public class MatchesLooking {

private static final String REGEX = "foo";
private static final String INPUT =
    "fooooooooooooooooo";
private static Pattern pattern;
private static Matcher matcher;

public static void main(String[] args) {

    // Initialize
    pattern = Pattern.compile(REGEX);
    matcher = pattern.matcher(INPUT);

    System.out.println("Current REGEX is: "
                       + REGEX);
    System.out.println("Current INPUT is: "
                       + INPUT);

    System.out.println("lookingAt(): "
        + matcher.lookingAt());
    System.out.println("matches(): "
        + matcher.matches());
}

}

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

Using replaceFirst(String) and replaceAll(String)

The replaceFirst and replaceAll methods replace text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences. Here's the ReplaceDemo.java code:

import java.util.regex.Pattern; import java.util.regex.Matcher;

public class ReplaceDemo {

private static String REGEX = "dog";
private static String INPUT =
    "The dog says meow. All dogs say meow.";
private static String REPLACE = "cat";

public static void main(String[] args) {
    Pattern p = Pattern.compile(REGEX);
    // get a matcher object
    Matcher m = p.matcher(INPUT);
    INPUT = m.replaceAll(REPLACE);
    System.out.println(INPUT);
}

}

OUTPUT: The cat says meow. All cats say meow.

In this first version, all occurrences of dog are replaced with cat. But why stop here? Rather than replace a simple literal like dog, you can replace text that matches any regular expression. The API for this method states that "given the regular expression a*b, the input aabfooaabfooabfoob, and the replacement string -, an invocation of this method on a matcher for that expression would yield the string -foo-foo-foo-."

Here's the ReplaceDemo2.java code:

import java.util.regex.Pattern; import java.util.regex.Matcher;

public class ReplaceDemo2 {

private static String REGEX = "a*b";
private static String INPUT =
    "aabfooaabfooabfoob";
private static String REPLACE = "-";

public static void main(String[] args) {
    Pattern p = Pattern.compile(REGEX);
    // get a matcher object
    Matcher m = p.matcher(INPUT);
    INPUT = m.replaceAll(REPLACE);
    System.out.println(INPUT);
}

}

OUTPUT: -foo-foo-foo-

To replace only the first occurrence of the pattern,simply call replaceFirst instead of replaceAll. It accepts the same parameter.

Using appendReplacement(StringBuffer,String) and appendTail(StringBuffer)

The Matcher class also provides appendReplacement and appendTail methods for text replacement. The following example, RegexDemo.java,uses these two methods to achieve the same effect as replaceAll.

import java.util.regex.Pattern; import java.util.regex.Matcher;

public class RegexDemo {

private static String REGEX = "a*b";
private static String INPUT = "aabfooaabfooabfoob";
private static String REPLACE = "-";

public static void main(String[] args) {
    Pattern p = Pattern.compile(REGEX);
    Matcher m = p.matcher(INPUT); // get a matcher object
    StringBuffer sb = new StringBuffer();
    while(m.find()){
        m.appendReplacement(sb,REPLACE);
    }
    m.appendTail(sb);
    System.out.println(sb.toString());
}

}

OUTPUT: -foo-foo-foo-

Matcher Method Equivalents in java.lang.String

For convenience,the String class mimics a couple of Matcher methods as well:

  • public String replaceFirst(String regex, String replacement): Replaces the first substring of this string that matches the given regular expression with the given replacement. An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceFirst(repl).
  • public String replaceAll(String regex, String replacement): Replaces each substring of this string that matches the given regular expression with the given replacement. An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceAll(repl).
Methods of the PatternSyntaxException Class
PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong:

  • public String getDescription(): Retrieves the description of the error.
  • public int getIndex(): Retrieves the error index.
  • public String getPattern(): Retrieves the erroneous regular expression pattern.
  • public String getMessage(): Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error index within the pattern.

The following source code, RegexTestHarness2.java,updates our test harness to check for malformed regular expressions:

import java.io.Console; import java.util.regex.Pattern; import java.util.regex.Matcher; import java.util.regex.PatternSyntaxException;

public class RegexTestHarness2 {

public static void main(String[] args){
    Pattern pattern = null;
    Matcher matcher = null;

    Console console = System.console();
    if (console == null) {
        System.err.println("No console.");
        System.exit(1);
    }
    while (true) {
        try{
            pattern = 
            Pattern.compile(console.readLine("%nEnter your regex: "));

            matcher = 
            pattern.matcher(console.readLine("Enter input string to search: "));
        }
        catch(PatternSyntaxException pse){
            console.format("There is a problem" +
                           " with the regular expression!%n");
            console.format("The pattern in question is: %s%n", pse.getPattern());
            console.format("The description is: %s%n", pse.getDescription());
            console.format("The message is: %s%n", pse.getMessage());
            console.format("The index is: %s%n", pse.getIndex());
            System.exit(0);
        }
        boolean found = false;
        while (matcher.find()) {
            // Three format specifiers need three arguments: the matched
            // text, its start index, and its end index.
            console.format("I found the text" +
                " \"%s\" starting at " +
                "index %d and ending at index %d.%n",
                matcher.group(), matcher.start(), matcher.end());
            found = true;
        }
        if(!found){
            console.format("No match found.%n");
        }
    }
}

}

To run this test,enter ?i)foo as the regular expression. This mistake is a common scenario in which the programmer has forgotten the opening parenthesis in the embedded flag expression (?i). Doing so will produce the following results:

Enter your regex: ?i)
There is a problem with the regular expression!
The pattern in question is: ?i)
The description is: Dangling meta character '?'
The message is: Dangling meta character '?' near index 0
?i)
^
The index is: 0

From this output, we can see that the syntax error is a dangling metacharacter (the question mark) at index 0. A missing opening parenthesis is the culprit.

Unicode Support

Unicode Character Properties

Each Unicode character,in addition to its value,has certain attributes,or properties. You can match a single character belonging to a particular category with the expression \p{prop}. You can match a single character not belonging to a particular category with the expression \P{prop}.

The three supported property types are scripts,blocks,and a "general" category.

Scripts

To determine if a code point belongs to a specific script,you can either use the script keyword,or the sc short form,\p{script=Hiragana}. Alternatively,you can prefix the script name with the string Is,such as \p{IsHiragana}.

Valid script names supported by Pattern are those accepted by UnicodeScript.forName.

Blocks

A block can be specified using the block keyword,or the blk short form, \p{block=Mongolian}. Alternatively,you can prefix the block name with the string In,such as \p{InMongolian}.

Valid block names supported by Pattern are those accepted by UnicodeBlock.forName.

General Category

Categories can be specified with the optional prefix Is. For example, IsL matches the category of Unicode letters. Categories can also be specified by using the general_category keyword, or the short form gc. For example, an uppercase letter can be matched using general_category=Lu or gc=Lu.

Supported categories are those of The Unicode Standard in the version specified by the Character class.

Additional Resources
Now that you've completed this lesson on regular expressions, you'll probably find that your main references will be the API documentation for the following classes: Pattern, Matcher, and PatternSyntaxException.

For a more precise description of the behavior of regular expression constructs,we recommend reading the book Mastering Regular Expressions by Jeffrey E. F. Friedl.

Questions and Exercises: Regular Expressions
Questions
  1. What are the three public classes in the java.util.regex package? Describe the purpose of each.
  2. Consider the string literal "foo". What is the start index? What is the end index? Explain what these numbers mean.
  3. What is the difference between an ordinary character and a metacharacter? Give an example of each.
  4. How do you force a metacharacter to act like an ordinary character?
  5. What do you call a set of characters enclosed in square brackets? What is it for?
  6. Here are three predefined character classes: \d, \s, and \w. Describe each one, and rewrite it using square brackets.
  7. For each of \d, \s, and \w, write two simple expressions that match the opposite set of characters.
  8. Consider the regular expression (dog){3}. Identify the two subexpressions. What string does the expression match?

Exercises

  1. Use a backreference to write an expression that will match a person's name only if that person's first name and last name are the same.


7.2. re — Regular expression operations正则表达式 p...


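Where this translation falls short, please consult the original at http://docs.python.org/library/re.html

The Python regular expression HOWTO is also recommended: http://docs.python.org/howto/regex

A Chinese translation is available at http://www.cnblogs.com/ltang/archive/2011/07/31/2122914.html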



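This module provides regular expression matching operations similar to those found in Perl. Both the patterns and the strings to be searched can be Unicode strings as well as 8-bit strings.

    Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This causes a problem: for example, to match a literal backslash, the pattern may have to be written as '\\\\', because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

    The solution is to use Python's raw string notation. Backslashes are not handled in any special way in a string literal prefixed with 'r', so r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually, patterns are expressed in Python code using raw string notation.

    It is important to note that most regular expression operations are available both as module-level functions and as RegexObject methods. The functions are shortcuts that don't require you to compile a regex object first, but they miss some fine-tuning parameters. See also the book Mastering Regular Expressions.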
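The raw-string point can be checked directly. The following is a minimal sketch, assuming Python 3; the sample string text is made up for illustration.

```python
import re

# "\n" is a single newline character; r"\n" is a backslash followed by 'n'.
newline_len = len("\n")
raw_len = len(r"\n")

# To match one literal backslash, the regex itself must be \\ .
pattern_plain = "\\\\"   # regular literal: four backslashes in the source
pattern_raw = r"\\"      # raw literal: two backslashes in the source

text = "C:\\temp"        # the 7-character string C:\temp
match = re.search(pattern_raw, text)
print(newline_len, raw_len, pattern_plain == pattern_raw, match.start())
```

Both spellings compile to the same pattern; the raw form is simply easier to read.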

7.2.1. Regular Expression Syntax

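    A regular expression (RE) specifies a set of strings that matches it; the functions in this module let you check whether a particular string matches a given regular expression (or whether a given regular expression matches a particular string, which comes down to the same thing).

    Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations; boundary conditions between A and B; or have numbered group references. Thus, complex expressions can be built from smaller, simpler expressions.

    Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'B', or 'C', are the simplest regular expressions; they simply match themselves. Ordinary characters can be concatenated, so last matches the string 'last'. (The rest of this section describes the special forms.)

    Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted. Patterns may not contain null bytes, but can specify the null byte using the \number notation, e.g. '\x00'.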

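    The special characters are:

'.' : In the default mode, matches any character except a newline. If the DOTALL flag has been specified, matches any character including a newline.

'^' : Matches the start of the string, and in MULTILINE mode also matches at the start of each line.

'*' : Causes the resulting RE to match 0 or more repetitions of the preceding RE. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.

'?' : Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.

*?, +?, ?? : The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using <.*?> instead will match only '<H1>'.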
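A short sketch of the greedy versus non-greedy behaviour described above, assuming Python 3 and the same sample string:

```python
import re

html = "<H1>title</H1>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.match(r"<.*>", html).group(0)

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.match(r"<.*?>", html).group(0)

print(greedy)
print(lazy)
```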

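{m} : Specifies that exactly m copies of the preceding RE should be matched; for example, a{6} will match exactly six 'a' characters, but not five.

{m,n} : Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand 'a' characters followed by a 'b', but not 'aaab'.

{m,n}? : Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.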
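The counted qualifiers can be exercised as follows; this is an illustrative sketch assuming Python 3, using the strings from the descriptions above.

```python
import re

s = "aaaaaa"

greedy = re.match(r"a{3,5}", s).group(0)    # takes as many as allowed
lazy = re.match(r"a{3,5}?", s).group(0)     # takes as few as allowed

# a{4,}b needs at least four 'a' characters before the 'b'.
too_short = re.match(r"a{4,}b", "aaab")
long_enough = re.match(r"a{4,}b", "aaaab")

# a{6} needs exactly six 'a' characters; five are not enough.
five_only = re.match(r"a{6}", "aaaaa")

print(greedy, lazy, too_short, bool(long_enough), five_only)
```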

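'\' : Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence.

If you're not using a raw string to express the pattern, remember that Python also uses the backslash as an escape character in string literals; if the escape sequence isn't recognized by Python's parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so using raw strings is highly recommended.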

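[] : Used to indicate a set of characters. In a set:

1. Characters can be listed individually; [amk] will match 'a', 'm', or 'k'.

2. Ranges of characters can be indicated by giving two characters separated by a '-': [a-z] will match any lowercase letter, [0-5][0-9] will match all the two-digit numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If '-' is escaped (e.g. [a\-z]) or placed as the first or last character (e.g. [a-]), it will match a literal '-'.

3. Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.

4. Character classes such as \w or \S are also accepted inside a set, although the characters they match depend on whether LOCALE or UNICODE mode is in force.

5. Characters that are not within the set can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. '^' has no special meaning if it is not the first character in the set.

6. To match a literal ']' inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a parenthesis.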

 

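'|' : A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].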
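A small sketch of the left-to-right behaviour of '|', assuming Python 3; the input strings are illustrative only.

```python
import re

# The first alternative that matches wins, even if a later one is longer.
first_wins = re.match(r"a|ab", "ab").group(0)

# Reordering the alternatives changes the result.
reordered = re.match(r"ab|a", "ab").group(0)

# A literal '|' can be matched inside a character class.
literal_bars = re.findall(r"[|]", "x|y|z")

print(first_wins, reordered, literal_bars)
```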

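(...) : Matches whatever regular expression is inside the parentheses and marks it as a group, so that it can be referenced later with the \number special sequence. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].

(?...) : This is an extension notation. The first character after the '?' determines the meaning and further syntax of the construct.

(?iLmsux) : One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'. The group matches the empty string; the letters set the corresponding flags (re.I, re.L, re.M, re.S, re.U, re.X) for the entire regular expression.

(?:...) : A non-capturing version of regular parentheses. The substring matched by the group cannot be retrieved after the match or referenced later in the pattern.

(?P<name>...) : Similar to regular parentheses, but the substring matched by the group is also accessible via the group name name. For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can later be referenced as m.group('id') or m.end('id').

(?P=name) : Matches whatever text was matched by the earlier group named name.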
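The named-group forms above can be tried out like this. A minimal sketch, assuming Python 3; the group name word and both input strings are made up for illustration.

```python
import re

# (?P<id>...) makes the captured text available by name.
m = re.match(r"(?P<id>[a-zA-Z_]\w*)", "total_count = 3")
identifier = m.group("id")
identifier_end = m.end("id")

# (?P=word) must match the same text that the group 'word' captured.
repeated = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

print(identifier, identifier_end, repeated.group(0))
```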

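(?#...) : A comment; the contents of the parentheses are simply ignored.

(?=...) : Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

(?!...) : Matches if ... doesn't match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it's not followed by 'Asimov'.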
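Both lookahead forms in one sketch, assuming Python 3. Note that the asserted text is checked but never becomes part of the match.

```python
import re

positive = re.compile(r"Isaac (?=Asimov)")
negative = re.compile(r"Isaac (?!Asimov)")

pos_hit = positive.search("Isaac Asimov")
pos_miss = positive.search("Isaac Newton")
neg_hit = negative.search("Isaac Newton")
neg_miss = negative.search("Isaac Asimov")

# The lookahead does not consume 'Asimov': the match is just 'Isaac '.
print(repr(pos_hit.group(0)), pos_miss, repr(neg_hit.group(0)), neg_miss)
```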

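(?<=...) : Matches if the current position in the string is preceded by a match for .... This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.

>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'

This example looks for a word following a hyphen:

>>> m = re.search('(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'

(?<!...) : Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. As above, the contained pattern must only match strings of some fixed length.

(?(id/name)yes-pattern|no-pattern) : Will try to match with yes-pattern if the group with the given id or name exists, and with no-pattern if it doesn't. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<user@host.com>' as well as 'user@host.com', but not with '<user@host.com'.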

 

New in version 2.4.
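The conditional form can be checked against the three sample addresses. This sketch assumes Python 3 and uses match(), so that the pattern is anchored at the start of each string.

```python
import re

# Group 1 is the optional '<'; the closing '>' is required only if it matched.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)")

samples = ("<user@host.com>", "user@host.com", "<user@host.com")
results = {s: bool(email.match(s)) for s in samples}

for sample, ok in results.items():
    print(sample, ok)
```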

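\number : Matches the contents of the group of the same number.

    Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'the end' (note the space after the group). This special sequence can only be used to match one of the first 99 groups.

\A : Matches only at the start of the string.

\b : Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning or end of the string. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python's string literals.

\B : Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so it is also subject to the settings of LOCALE and UNICODE.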
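The \b and \B examples above can be verified with a short sketch, assuming Python 3:

```python
import re

word = re.compile(r"\bfoo\b")
candidates = ("foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3")
whole_word_hits = [s for s in candidates if word.search(s)]

inside = re.compile(r"py\B")
samples = ("python", "py3", "py2", "py", "py.", "py!")
inside_hits = [s for s in samples if inside.search(s)]

print(whole_word_hits)
print(inside_hits)
```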

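\d : When the UNICODE flag is not specified, matches any decimal digit; this is equivalent to the set [0-9]. With UNICODE, it will match whatever is classified as a decimal digit in the Unicode character properties database.

\D : When the UNICODE flag is not specified, matches any non-digit character; this is equivalent to the set [^0-9]. With UNICODE, it will match anything other than characters marked as digits in the Unicode character properties database.

\s : When the UNICODE flag is not specified, matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v].

\S : When the UNICODE flag is not specified, matches any non-whitespace character; this is equivalent to the set [^ \t\n\r\f\v].

\w : When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_].

\W : When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_].

\Z : Matches only at the end of the string.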

7.2.2. Module Contents

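The module defines several functions, constants, and an exception. Some of the functions are simplified versions of ... Most applications use the compiled form.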

 

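re.compile(pattern, flags=0)

    Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods.

    The expression's behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to:

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient.

re.DEBUG : Display debug information about the compiled expression.

re.I, re.IGNORECASE : Perform case-insensitive matching; expressions like [A-Z] will match lowercase letters, too. This is not affected by the current locale.

re.L, re.LOCALE : Make \w, \W, \b, \B, \s and \S dependent on the current locale.

re.M, re.MULTILINE : When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line; and the pattern character '$' matches at the end of the string and at the end of each line. By default, '^' matches only at the beginning of the string, and '$' only at the end of the string.

re.S, re.DOTALL : Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

re.U, re.UNICODE : Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database.

re.X, re.VERBOSE : This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or when preceded by an unescaped backslash. A '#' that is neither in a character class nor preceded by an unescaped backslash starts a comment that runs to the end of the line.

This means that the two following regular expression objects are equivalent:

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")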
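To illustrate how the flags change matching and how they combine with |, here is a small sketch assuming Python 3; the three-line sample text is made up for the example.

```python
import re

text = "first line\nSecond line\nthird LINE"

# Without MULTILINE, '^' only matches at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# With MULTILINE, '^' also matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Flags are combined with bitwise OR.
line_ends = re.findall(r"line$", text, re.MULTILINE | re.IGNORECASE)

# '.' skips newlines unless DOTALL is given.
no_dotall = re.search(r"line.Second", text)
with_dotall = re.search(r"line.Second", text, re.DOTALL)

print(default_starts, line_starts, line_ends, no_dotall, bool(with_dotall))
```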

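re.search(pattern, string, flags=0)

    Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

re.match(pattern, string, flags=0)

    If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

    Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

    If you want to locate a match anywhere in string, use search() instead.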

re.split(pattern, string, maxsplit=0, flags=0)

    Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.

>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']

If there are capturing groups in the separator (the pattern, or part of it, is enclosed in parentheses) and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string:

>>> re.split('(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']

Note that split will never split a string on an empty pattern match:

>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']

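re.findall(pattern, string, flags=0)

    Return all non-overlapping matches of pattern in string, as a list of strings, one list element per match. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

re.finditer(pattern, string, flags=0)

    Return an iterator yielding MatchObject instances over all non-overlapping matches of pattern in string.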
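The difference in return values between findall() and finditer() is easiest to see side by side. A minimal sketch, assuming Python 3; the sample text is invented for illustration.

```python
import re

text = "x=1, y=22, z=333"

# No groups: a list of the matched strings.
whole = re.findall(r"\w=\d+", text)

# Two groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)

# finditer yields match objects, so positions are available too.
spans = [(m.group(1), m.span()) for m in re.finditer(r"(\w)=\d+", text)]

print(whole)
print(pairs)
print(spans)
```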

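re.sub(pattern, repl, string, count=0, flags=0)

    Return the string obtained by replacing the non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern:

>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single MatchObject argument, and returns the replacement string:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'

The pattern may be a string or an RE object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'.

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.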
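The \g<...> replacement forms can be sketched as follows, assuming Python 3; the group names y, m, and d and the sample inputs are chosen only for this illustration.

```python
import re

# Named references in the replacement string.
date = "2011-07-31"
reordered = re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
                   r"\g<d>/\g<m>/\g<y>", date)

# \g<2>0 means "group 2, then a literal 0"; \20 would mean group 20.
group_then_zero = re.sub(r"(\d)(\d)", r"\g<2>0", "12")

# \g<0> stands for the whole match.
bracketed = re.sub(r"\w+", r"[\g<0>]", "a bc")

print(reordered, group_then_zero, bracketed)
```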

 

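re.subn(pattern, repl, string, count=0, flags=0)

    Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).

re.escape(string)

    Return string with all non-alphanumerics backslashed.

re.purge()

    Clear the regular expression cache.

exception re.error

    Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern.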
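A brief sketch covering subn(), escape(), and re.error together, assuming Python 3. The exact set of characters that escape() backslashes differs between Python versions, so the example only relies on the escaped pattern matching the literal text.

```python
import re

# subn returns the new string together with the replacement count.
new_text, count = re.subn(r"\d+", "#", "a1b22c333")

# escape() makes a string safe to use as a literal pattern.
literal = "1+1=2?"
found = re.search(re.escape(literal), "is 1+1=2? yes")

# An invalid pattern raises re.error at compile time.
try:
    re.compile("(unclosed")
    compile_failed = False
except re.error:
    compile_failed = True

# A pattern that simply does not match is not an error; the result is None.
no_match = re.search("zzz", "abc")

print(new_text, count, found.group(0), compile_failed, no_match)
```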

7.2.3. Regular Expression Objects

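class re.RegexObject

    The RegexObject class supports the following methods and attributes:

search(string[, pos[, endpos]])

    Scan through string looking for a location where this regular expression produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

    The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but ...

       The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0).

>>> pattern = re.compile("d")
>>> pattern.search("dog")     # Match at index 0
<_sre.SRE_Match object at ...>
>>> pattern.search("dog", 1)  # No match; search doesn't include the "d"

match(string[, pos[, endpos]])

    If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

The optional pos and endpos parameters have the same meaning as for the search() method.

>>> pattern = re.compile("o")
>>> pattern.match("dog")      # No match as "o" is not at the start of "dog".
>>> pattern.match("dog", 1)   # Match as "o" is the 2nd character of "dog".
<_sre.SRE_Match object at ...>

    If you want to locate a match anywhere in string, use search() instead of match().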
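The pos and endpos parameters can be compared with slicing in a short sketch, assuming Python 3; the names anchored and digits and the sample strings are illustrative only.

```python
import re

# pos is not the same as slicing: '^' still refers to the real string start.
anchored = re.compile(r"^o")
with_pos = anchored.search("dog", 1)     # searching from index 1
with_slice = anchored.search("dog"[1:])  # searching the slice "og"

# endpos behaves as if the string were cut off at that index.
digits = re.compile(r"\d+")
s = "abc 123 456"
limited = digits.search(s, 0, 6).group(0)
sliced = digits.search(s[:6], 0).group(0)

# match() with pos starts matching at that index.
o_at_one = re.compile("o").match("dog", 1)

print(with_pos, bool(with_slice), limited, sliced, bool(o_at_one))
```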

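split(string, maxsplit=0)

    Identical to the split() function, using the compiled pattern.

findall(string[, pos[, endpos]])

    Similar to the findall() function, using the compiled pattern, but also accepts optional pos and endpos parameters that limit the search region like for match().

finditer(string[, pos[, endpos]])

    Similar to the finditer() function, using the compiled pattern, but also accepts optional pos and endpos parameters that limit the search region like for match().

sub(repl, string, count=0)

    Identical to the sub() function, using the compiled pattern.

subn(repl, string, count=0)

    Identical to the subn() function, using the compiled pattern.

flags

    The regex matching flags.

groups

    The number of capturing groups in the pattern.

groupindex

    A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.

pattern

    The pattern string from which the RE object was compiled.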
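The attributes of a compiled pattern can be inspected directly. A minimal sketch, assuming Python 3 (where the compiled object is called re.Pattern rather than RegexObject); the group names first and last are invented for the example.

```python
import re

source = r"(?P<first>\w+) (?P<last>\w+)"
rx = re.compile(source, re.IGNORECASE)

group_count = rx.groups
name_to_number = dict(rx.groupindex)   # copy into a plain dict
ignorecase_set = bool(rx.flags & re.IGNORECASE)

print(group_count, name_to_number, rx.pattern == source, ignorecase_set)
```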

7.2.4. Match Objects

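class re.MatchObject

    MatchObject instances always have a boolean value of True, so you can test whether, for example, match() produced a match with a simple if statement.

    expand(template)

        Return the string obtained by doing backslash substitution on the template string template, as done by the sub() method. Escapes such as \n are converted to the appropriate characters, and numeric backreferences (\1, \2) and named backreferences (\g<1>, \g<name>) are replaced by the contents of the corresponding group.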
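A small sketch of truth-testing a match and calling expand(), assuming Python 3; the name used as input is the one from the examples further below.

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Malcolm Reynolds")

# A successful match is truthy; a failed match() returns None.
matched = bool(m)
missed = re.match(r"\d+", "no digits here")

# Numeric and named references can both be used in the template.
by_number = m.expand(r"\2, \1")
by_name = m.expand(r"\g<last>, \g<first>")

print(matched, missed, by_number, by_name)
```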

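    group([group1, ...])

         Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)       # The entire match
'Isaac Newton'
>>> m.group(1)       # The first parenthesized subgroup.
'Isaac'
>>> m.group(2)       # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2)    # Multiple arguments give us a tuple.
('Isaac', 'Newton')

If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an IndexError exception is raised.

A moderately complicated example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'

Named groups can also be referred to by their index:

>>> m.group(1)
'Malcolm'
>>> m.group(2)
'Reynolds'

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

    groups([default])

        Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.

        For example:

>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')

Not all groups necessarily participate in the match. Such groups default to None unless the default argument is given:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()      # Second group defaults to None.
('24', None)
>>> m.groups('0')   # Now, the second group defaults to '0'.
('24', '0')

    groupdict([default])

        Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name. The default argument is used for groups that did not participate in the match; it defaults to None:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}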

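    start([group])

    end([group])

           Return the indices of the start and end of the substring matched by group; group defaults to zero (meaning the whole matched substring). Return -1 if group exists but did not contribute to the match. For a match object m, and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is:

m.string[m.start(g):m.end(g)]

 Note that m.start(group) will equal m.end(group) if group matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an IndexError exception.

An example that will remove remove_this from email addresses:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'

    span([group])

        For MatchObject m, return the 2-tuple (m.start(group), m.end(group)). Note that if group did not contribute to the match, this is (-1, -1).

group defaults to zero, the entire match.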
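The index values quoted above can be confirmed with a short sketch, assuming Python 3; the second pattern, (a)|(b), is added here only to show a group that does not take part in the match.

```python
import re

m = re.search(r"b(c?)", "cba")
whole = (m.start(0), m.end(0))
empty_group = m.span(1)          # group 1 matched the empty string

# A group number that is not in the pattern raises IndexError.
try:
    m.start(2)
    raised = False
except IndexError:
    raised = True

# A group that exists but did not take part in the match reports -1.
m2 = re.match(r"(a)|(b)", "a")
unused = m2.span(2)

print(whole, empty_group, raised, unused)
```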

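    pos

        The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

    endpos

        The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.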

    lastindex   

    lastgroup

    re

    string

7.2.5. Examples

7.2.5.1. Checking For a Pair

In this example, we'll use the following helper function to display match objects a little more gracefully:

def displaymatch(match):
    if match is None:
        return None
    return '<Match: %r, groups=%r>' % (match.group(), match.groups())

Suppose you are writing a poker program where a player's hand is represented as a 5-character string with each character representing a card, 'a' for ace, 'k' for king, 'q' for queen, 'j' for jack, 't' for 10, and '2' through '9' representing the card with that value.

To see if a given string is a valid hand, one could do the following:

>>> valid = re.compile(r"^[a2-9tjqk]{5}$")
>>> displaymatch(valid.match("akt5q"))  # Valid.
"<Match: 'akt5q', groups=()>"
>>> displaymatch(valid.match("akt5e"))  # Invalid.
>>> displaymatch(valid.match("akt"))    # Invalid.
>>> displaymatch(valid.match("727ak"))  # Valid.
"<Match: '727ak', groups=()>"

That last hand, "727ak", contained a pair, or two of the same valued cards. To match this with a regular expression, one could use backreferences:

>>> pair = re.compile(r".*(.).*\1")
>>> displaymatch(pair.match("717ak"))     # Pair of 7s.
"<Match: '717', groups=('7',)>"
>>> displaymatch(pair.match("718ak"))     # No pairs.
>>> displaymatch(pair.match("354aa"))     # Pair of aces.
"<Match: '354aa', groups=('a',)>"

7.2.5.2. Simulating scanf()

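Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.

scanf() Token     Regular Expression
%c                .
%5c               .{5}
%d                [-+]?\d+
%e, %E, %f, %g    [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
%i                [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)
%o                0[0-7]*
%s                \S+
%u                \d+
%x, %X            0[xX][\dA-Fa-f]+

To extract the filename and numbers from a string like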

usr/sbin/sendmail - 0 errors, 4 warnings

you would use a scanf() format like

%s - %d errors, %d warnings

The equivalent regular expression would be

(\S+) - (\d+) errors, (\d+) warnings
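Applied to the sample line, the regular expression above can be used like this. The sketch assumes Python 3; the variable names are illustrative, and the captured numbers are converted with int() because groups are always returned as strings.

```python
import re

line = "usr/sbin/sendmail - 0 errors, 4 warnings"
m = re.match(r"(\S+) - (\d+) errors, (\d+) warnings", line)

filename = m.group(1)
errors = int(m.group(2))      # scanf's %d would yield an int directly
warnings = int(m.group(3))

print(filename, errors, warnings)
```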

7.2.5.3. search() vs. match()

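Python offers two different primitive operations based on regular expressions: re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string (this is what Perl does by default). For example:

>>> re.match("c", "abcdef")  # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>

Regular expressions beginning with '^' can be used with search() to restrict the match to the beginning of the string:

>>> re.match("c", "abcdef")  # No match
>>> re.search("^c", "abcdef") # No match
>>> re.search("^a", "abcdef")  # Match
<_sre.SRE_Match object at ...>

Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.

>>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<_sre.SRE_Match object at ...>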

7.2.5.4. Making a Phonebook

split() splits a string into a list delimited by the given pattern. The method is valuable for converting textual data into data structures that are easy to read and modify.

First, here is the input. Normally it would come from a file; here we use triple-quoted string syntax:

>>> text = """Ross McFluff: 834.345.1254 155 Elm Street
...
... Ronald Heathmore: 892.345.3428 436 Finley Avenue
... Frank Burger: 925.541.7625 662 South Dogwood Way
...
...
... Heather Albrecht: 548.326.4584 919 Park Place"""

The entries are separated by one or more newlines. Now we convert the string into a list in which each non-empty line is one entry:

>>> entries = re.split("\n+", text)
>>> entries
['Ross McFluff: 834.345.1254 155 Elm Street',
'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
'Frank Burger: 925.541.7625 662 South Dogwood Way',
'Heather Albrecht: 548.326.4584 919 Park Place']

Finally, split each entry into a list of first name, last name, telephone number, and address. We use the maxsplit parameter because the address itself contains spaces, which are our splitting pattern:

>>> [re.split(":? ", entry, 3) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],
['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],
['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],
['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]

The ':?' pattern matches the colon after the last name. With a maxsplit of 4, we can also separate the house number from the street name:

>>> [re.split(":? ", entry, 4) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],
['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],
['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],
['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]
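A possible next step, sketched here under the assumption that named fields are more convenient than positional lists, is to wrap the same split in a helper. The function parse_entry and its field names are hypothetical, and maxsplit is passed by keyword, which current Python 3 releases prefer:

```python
import re

text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue
Frank Burger: 925.541.7625 662 South Dogwood Way


Heather Albrecht: 548.326.4584 919 Park Place"""

# One entry per non-empty line: "\n+" swallows runs of blank lines.
entries = re.split(r"\n+", text)

def parse_entry(entry):
    """Split an entry into a dict; the field names are illustrative."""
    first, last, phone, address = re.split(r":? ", entry, maxsplit=3)
    return {"first": first, "last": last, "phone": phone, "address": address}

phonebook = [parse_entry(e) for e in entries]
print(phonebook[0])
```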

7.2.5.5. Text Munging

sub() replaces every occurrence of a pattern with a string or with the result of a function. This example uses sub() with a function as repl to "munge" text, that is, to randomize the order of all the characters in each word except the first and the last:

>>> import random
>>> def repl(m):
...   inner_word = list(m.group(2))
...   random.shuffle(inner_word)
...   return m.group(1) + "".join(inner_word) + m.group(3)
>>> text = "Professor Abdolmalek, please report your absences promptly."
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'
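Because the output above changes on every call, it can be hard to test. One option, sketched below, is to wrap the same technique in a function that takes a seed; the munge function and its seed parameter are assumptions added for illustration and are not part of the original example:

```python
import random
import re

def munge(text, seed=None):
    """Shuffle the inner letters of each word, keeping first and last in place."""
    rng = random.Random(seed)  # a private generator, so a seed gives repeatable output

    def repl(m):
        inner = list(m.group(2))
        rng.shuffle(inner)
        return m.group(1) + "".join(inner) + m.group(3)

    return re.sub(r"(\w)(\w+)(\w)", repl, text)

original = "Professor Abdolmalek, please report your absences promptly."
munged = munge(original, seed=0)
print(munged)
```

Whatever the shuffle produces, every word keeps its first and last character and the same multiset of letters, which is the property worth asserting.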

7.2.5.6. Finding all Adverbs

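findall() matches all occurrences of a pattern, not just the first one as search() does. For example, a writer who wanted to find all of the adverbs in some text might use findall() like this:

>>> text = "He was carefully disguised but captured quickly by police."
>>> re.findall(r"\w+ly", text)
['carefully', 'quickly']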

7.2.5.7. Finding all Adverbs and their Positions

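If more information about each match is needed than the matched text alone, finditer() is useful because it provides MatchObject instances instead of strings. Continuing with the previous example, a writer who wanted to find all of the adverbs and their positions in some text could use finditer() like this:

>>> text = "He was carefully disguised but captured quickly by police."
>>> for m in re.finditer(r"\w+ly", text):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
07-16: carefully
40-47: quickly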
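The two functions can be compared side by side. In this short sketch (the variable names are illustrative), findall() yields plain strings while finditer() yields match objects from which the positions are read:

```python
import re

text = "He was carefully disguised but captured quickly by police."

# findall() returns only the matched strings.
adverbs = re.findall(r"\w+ly", text)

# finditer() yields match objects, which also carry the positions.
positions = [(m.start(), m.end(), m.group(0)) for m in re.finditer(r"\w+ly", text)]

for start, end, word in positions:
    print("%02d-%02d: %s" % (start, end, word))
```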

7.2.5.8. Raw String Notation

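Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another backslash to escape it. The following two lines of code are functionally identical:

>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>

To match a literal backslash, it must be escaped in the regular expression. With raw string notation this means r"\\". Without raw string notation one must write "\\\\", so the following two lines are functionally identical:

>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>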
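The equivalence can be verified directly by comparing the pattern strings themselves. This is a small sketch with illustrative variable names:

```python
import re

# The raw and the escaped spelling denote the same pattern string.
raw_pattern = r"\W(.)\1\W"
escaped_pattern = "\\W(.)\\1\\W"
same_pattern = raw_pattern == escaped_pattern

# r"\\" is two characters (two backslashes); as a regex it matches one literal backslash.
backslash_regex = r"\\"
m = re.match(backslash_regex, "\\path")  # the subject starts with one backslash

print(same_pattern, len(backslash_regex), m.group(0))
```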

ABAP Regular Expressions

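Regular expressions are widely used in other programming languages and there is plenty of material about them online, but very little covers their use in ABAP. The syntax is broadly similar across languages, yet there are some differences. This article gives a brief introduction to the basic syntax and summarizes it for easy reference.

Reposting is welcome; please credit the source. Corrections are appreciated. (Email: hubin0809@126.com)

I. A brief introduction

A regular expression is a "string" that describes a feature, which is then used to check whether another "string" has that feature. For example, the expression "ab+" describes the feature "one 'a' followed by one or more 'b'", so 'ab', 'abb', and 'abbbbbbbbbb' all have it.

Regular expressions can be used to: (1) validate that a string has a given feature, for example that it is a well-formed email address; (2) search, finding substrings with a given feature in a long text, which is more flexible than searching for a fixed string; (3) replace, which is more powerful than plain replacement.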

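Example

DATA: matcher TYPE REF TO cl_abap_matcher,
      match   TYPE c LENGTH 1.
matcher = cl_abap_matcher=>create( pattern = '\w+@\w+(\.\w+)+'
                                   text    = 'hubin0809@126.com' ).
match = matcher->match( ).
WRITE match.

Output: X

Explanation:

1> In '\w+@\w+(\.\w+)+', \w stands for any single letter, digit, or underscore; + means one or more of the preceding element; @ is the literal '@' character.

2> matcher is a reference to the class cl_abap_matcher. The static method create builds the matcher object for the pattern and the text (take it that way for now; I admit I don't know how best to describe it). The match method is then called, and its return value is 'X' for a match and SPACE for no match.

The details are covered later; this program simply checks whether an email address is well formed.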
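For readers without an ABAP system at hand, the same check can be approximated in Python. This is only a sketch: Python's re module is a different regex engine from ABAP's, the function name looks_like_email is hypothetical, and re.fullmatch() is used on the assumption that the ABAP match method tests the whole text.

```python
import re

# Same pattern as the ABAP example; fullmatch() requires the whole text to match.
EMAIL_PATTERN = re.compile(r"\w+@\w+(\.\w+)+")

def looks_like_email(text):
    """Return 'X' on a full match and ' ' otherwise, mirroring the ABAP flag values."""
    return "X" if EMAIL_PATTERN.fullmatch(text) else " "

print(looks_like_email("hubin0809@126.com"))
print(looks_like_email("not an email"))
```

The pattern is deliberately simple; it rejects an address such as "a@b" because at least one ".xxx" part is required, but it is far from a complete email validator.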

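II. Syntax rules

In the tables below, Pattern is the regular expression, Text is the string to be matched, and Match is the result: 'X' means a match and SPACE means no match.

1. Ordinary characters

Letters, digits, Chinese characters, underscores, and any punctuation not given a special meaning later on are all "ordinary characters". An ordinary character in an expression matches the same single character in the text.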

Pattern   Text   Match
A         A      X
A         a      -
AB        AB     X

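2. Escape characters

Characters that are awkward to write literally are written with a preceding "\", for example "\.".

Expression   Matches
\\           A literal "\"
\.           A literal period (.)
\Q...\E      The characters in between, taken as ordinary characters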

Pattern

Match

.\.

f.

X

f\f

-

\w\d

\w\d

\\w\\d

\Q\w\d\E
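The escape rules above can be tried out in Python, with one caveat: Python's re module has no \Q...\E construct, so re.escape() is used below as the closest stand-in. That substitution is an assumption of this sketch, not something the ABAP engine does.

```python
import re

# An escaped period matches only a literal '.', so ".\." means "any character, then a period".
dot_ok = re.fullmatch(r".\.", "f.") is not None
dot_bad = re.fullmatch(r".\.", r"f\f") is not None

# "\\w\\d" (as a regex) matches the literal text \w\d rather than "word char + digit".
literal = re.fullmatch(r"\\w\\d", r"\w\d") is not None

# Python has no \Q...\E; re.escape() plays a similar role by escaping a whole string.
quoted = re.fullmatch(re.escape(r"\w\d"), r"\w\d") is not None

print(dot_ok, dot_bad, literal, quoted)
```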

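3. Expressions that match any one of several characters

Some notations in regular expressions match any one character out of a set. For example, "\d" matches any single digit. Although such an expression can match any character in the set, it matches only one character, not several. It is like the jokers in a card game: a joker can stand in for any card, but only for one card. (Never played? Fine, go play Gouji on QQ. OK, that gives me away: I admit I'm from Shandong.)

Expression   Matches
\d           Any single digit, 0 to 9
\w           Any single letter, digit, or underscore, i.e. one of A-Z, a-z, 0-9, _
\s           Any single whitespace character, including space, tab, and form feed
.            Any single character except the newline (\n)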

\d

9

25

-

\d\d

\w

\s

\n

...

4zF
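The "one character only" rule is easy to confirm. The sketch below uses Python's re module as a stand-in for the ABAP engine (an assumption; the two engines agree on these basic classes), with a hypothetical helper full that tests whether the whole text matches:

```python
import re

def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Each class consumes exactly one character, so "\d" cannot cover "25".
one_digit = full(r"\d", "9")       # True
two_digits = full(r"\d", "25")     # False
pair = full(r"\d\d", "25")         # True
any_three = full(r"...", "4zF")    # True
newline = full(r".", "\n")         # False: '.' skips the newline by default

print(one_digit, two_digits, pair, any_three, newline)
```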

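4. Defining your own expressions that match any one of several characters

Square brackets [ ] enclose a set of characters and match any one of them. [^ ] encloses a set of characters and matches any one character that is not in the set. As before, although any one of the characters can match, only one character is matched, not several.

Expression   Matches
[ab5@]       "a", "b", "5", or "@"
[^abc]       Any single character other than "a", "b", and "c"
[f-k]        Any single letter from "f" to "k"
[^A-F0-3]    Any single character other than "A" to "F" and "0" to "3"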

[abc]

abc

[^abc]b

cb

[a-g]b
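A few of these sets can be exercised as follows. Again this is a Python sketch standing in for ABAP, and the helper full is hypothetical:

```python
import re

def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

in_set = full(r"[ab5@]", "@")        # True: '@' is listed in the set
whole_abc = full(r"[abc]", "abc")    # False: a set matches one character only
negated = full(r"[^abc]b", "cb")     # False: 'c' is excluded by the negated set
ranged = full(r"[a-g]b", "cb")       # True: 'c' lies in the range a-g

print(in_set, whole_abc, negated, ranged)
```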

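5. Supported POSIX character classes

POSIX class   Matches
[:alnum:]     Any letter or digit (A-Z, a-z, 0-9)
[:alpha:]     Any letter (A-Z, a-z)
[:cntrl:]     Any control character (\x00-\x1F, \x7F)
[:digit:]     Any digit (0-9)
[:space:]     Any whitespace character (\x09-\x0D, \x20)
[:graph:]     Any visible ASCII character, excluding the space
[:lower:]     Any lowercase letter (a-z)
[:upper:]     Any uppercase letter (A-Z)
[:punct:]     Any printable character ([:print:]) that is not alphanumeric ([:alnum:])
[:blank:]     Space or tab (\x20, \x09)

In my view these are of limited use, though they may help with some control characters. Just be aware of them.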

[[:alnum:]]

[:lower:][:digit:]

a9

[[:lower:][:digit:]]

b

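6. Special symbols that specify the number of repetitions

The expressions covered so far, whether they match one specific character or any one of several characters, match only once. Adding a quantifier to an expression lets it match repeatedly without writing the expression out again and again, which would be exhausting.

Quantifier   Meaning
{n}          Repeat the expression exactly n times; "\w{2}" is equivalent to "\w\w", and "a{5}" to "aaaaa"
{m,n}        Repeat at least m and at most n times; "ba{1,3}" matches "ba", "baa", or "baaa"
{m,}         Repeat at least m times; "\w\d{2,}" matches "a12", "_456", "M12344", ...
?            Match zero or one time, equivalent to {0,1}; "a[cd]?" matches "a", "ac", "ad"
+            Match at least once, equivalent to {1,}; "a+b" matches "ab", "aab", "aaab", ...
*            Match zero or more times, equivalent to {0,}; "c*b" matches "b", "cccb", ...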

[abc]{3}

bca

.{3,5}

abcd

\d{5,}

12345

a*b

a+b

-
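The quantifiers can be checked against a handful of pattern/text pairs. This sketch uses Python's re module in place of ABAP (an assumption; these quantifiers behave the same in both), printing 'X' for a full match and '-' otherwise:

```python
import re

def full(pattern, text):
    return re.fullmatch(pattern, text) is not None

results = {
    "[abc]{3} / bca": full(r"[abc]{3}", "bca"),   # exactly three characters from the set
    ".{3,5} / abcd": full(r".{3,5}", "abcd"),     # between three and five characters
    r"\d{5,} / 12345": full(r"\d{5,}", "12345"),  # five or more digits
    "a*b / b": full(r"a*b", "b"),                 # '*' allows zero occurrences of 'a'
    "a+b / b": full(r"a+b", "b"),                 # '+' needs at least one 'a'
    "c*b / cccb": full(r"c*b", "cccb"),
}

for case, matched in results.items():
    print(case, "->", "X" if matched else "-")
```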

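7. Other special symbols with abstract meanings

Symbol   Meaning
^        Matches at the start of the string; consumes no characters
$        Matches at the end of the string; consumes no characters
\b       Matches a word boundary, i.e. the position between a word character and a non-word character (for example, between a word and a space); consumes no characters
|        Alternation: matches either the expression on its left or the one on its right
( )      (1) When a quantifier is applied, the expression in parentheses is repeated as a whole. (2) The text matched by the expression in parentheses can be retrieved separately from the result.
(?: )    Matches the pattern without capturing the result, i.e. a non-capturing group whose match is not stored for later use.

A verbal description is still rather abstract, so here are some examples.

Example 1: Matching the expression "^aaa" against "xxx aaa xxx" fails. Because "^" must match at the start of the string, "^aaa" matches only when "aaa" is at the beginning of the string, as in "aaa xxx xxx".

Example 2: Matching the expression "aaa$" against "xxx aaa xxx" fails. Because "$" must match at the end of the string, "aaa$" matches only when "aaa" is at the end of the string, as in "xxx xxx aaa".

Example 3: The expression ".\b." finds a match in "@@@abc". The matched text is "@a"; the match starts at position 2 and ends at position 4.
Further explanation: like "^" and "$", "\b" matches no characters itself, but it requires that, at its position in the match, one side is a "\w" character and the other side is a non-"\w" character.

Example 4: The expression "\bend\b" finds a match in "weekend,endfor,end". The matched text is "end"; the match starts at position 15 and ends at position 18.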
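The four examples above can be reproduced with Python's re module, whose 0-based start and exclusive end positions agree with the positions quoted in the text. This is a sketch with illustrative variable names; it assumes Python's handling of ^, $, and \b matches what the ABAP engine does here.

```python
import re

# '^' and '$' pin the match to the start and the end of the string.
start_fail = re.search(r"^aaa", "xxx aaa xxx")   # None
start_ok = re.search(r"^aaa", "aaa xxx xxx")
end_fail = re.search(r"aaa$", "xxx aaa xxx")     # None
end_ok = re.search(r"aaa$", "xxx xxx aaa")

# '\b' consumes nothing; it only requires a \w character on exactly one side.
boundary = re.search(r".\b.", "@@@abc")
whole_word = re.search(r"\bend\b", "weekend,endfor,end")

print(boundary.group(0), boundary.span())
print(whole_word.group(0), whole_word.span())
```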

Pattern

Match

(.{1,3})|(.{5,})

bcade

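III. Some advanced rules of regular expressions (partially supported in ABAP)

1. Greedy and non-greedy repetition

Greedy mode:

When quantifiers are used, several notations allow the same expression to match a varying number of times, such as "{m,n}", "{m,}", "?", "*", and "+"; the actual number of repetitions depends on the text being matched. Such variable-count expressions always match as much as possible. For example, for the text "dxxxdxxxd":

Expression    Result
(d)(\w+)      "\w+" matches everything after the first "d": "xxxdxxxd"
(d)(\w+)(d)   "\w+" matches everything between the first "d" and the last "d": "xxxdxxx". Although "\w+" could also match the last "d", it "gives up" that "d" so that the whole expression can match

As this shows, "\w+" always matches as many characters as its rule allows. In the second example it does not match the last "d", but only so that the whole expression can succeed. Likewise, expressions with "*" and "{m,n}" match as much as possible, and an expression with "?" prefers to match when it has the choice. This matching principle is called "greedy" mode.

Non-greedy mode: (not currently supported in ABAP, but worth understanding)

Adding a "?" after a quantifier makes a variable-count expression match as little as possible, and makes an optional expression prefer not to match. This principle is called "non-greedy" mode, also known as "reluctant" mode. If matching less would make the whole expression fail, then, as in greedy mode, the non-greedy expression matches just enough more for the whole expression to succeed. For example, for the text "dxxxdxxxd":

Expression     Result
(d)(\w+?)      "\w+?" matches as few characters as possible after the first "d": just one "x"
(d)(\w+?)(d)   For the whole expression to succeed, "\w+?" has to match "xxx" so that the following "d" can match; the result is that "\w+?" matches "xxx"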
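Since the text notes that ABAP does not support the non-greedy form, Python's re module is a convenient place to watch both modes side by side. The following sketch (variable names are illustrative) prints what the quantified group consumed in each of the four cases discussed:

```python
import re

text = "dxxxdxxxd"

# Group 2 of each pattern shows how much the quantified part consumed.
greedy = re.search(r"(d)(\w+)", text).group(2)
greedy_bounded = re.search(r"(d)(\w+)(d)", text).group(2)
lazy = re.search(r"(d)(\w+?)", text).group(2)
lazy_bounded = re.search(r"(d)(\w+?)(d)", text).group(2)

print(greedy, greedy_bounded, lazy, lazy_bounded)
```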

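Using Regular Expressions (RE) with the grep Command

Anyone who uses the grep command will sooner or later need regular expressions. So how are regular expressions used with grep?

Regular expression metacharacters

grep supports many regular expression metacharacters, which let the user define the pattern to search for more precisely. In addition, options can be specified to, for example, turn off case sensitivity or show line numbers.

Metacharacter  Function                                           Example      Matches
^              Start-of-line anchor                               '^user'      Lines that begin with user
$              End-of-line anchor                                 'user$'      Lines that end with user
.              Matches any single character                       'u.r'        Lines containing a u, then any one character, then an r
*              Matches zero or more of the preceding character    'u*ser'      Lines containing zero or more u's followed by ser
[]             Matches any one character in the set               '[uU]ser'    Lines containing user or User
[^]            Matches any one character not in the set           '[^A-S]ser'  Lines containing a character outside A to S immediately followed by ser
\<             Start-of-word anchor                               '\<user'     Lines containing a word that begins with user
\>             End-of-word anchor                                 'user\>'     Lines containing a word that ends with user
\<..\>         Anchors both ends of a word                        '\<user\>'   Lines containing user as a whole word
\{M\}          Exactly M occurrences of the preceding character   'u\{4\}'     Lines containing 4 consecutive u's
\{M,\}         At least M occurrences                             'u\{5,\}'    Lines containing at least 5 consecutive u's
\{M,N\}        At least M but no more than N occurrences          'u\{5,8\}'   Lines containing at least 5 and at most 8 consecutive u's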

 

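grep regular expression examples:

Find the lines in /etc/passwd that contain the string "user1":

[root@devops ~]# grep user1 /etc/passwd
user1:x:502:503::/home/user1:/bin/bash

To ignore case while searching, use the -i option:

# grep -i user1 /etc/passwd

The "." metacharacter matches any single character. For example, the following command matches lines containing a three-character word that starts with "u", ends with "r", and has any one character in between:

[root@devops ~]# grep '\<u.r\>' /etc/passwd
game:x:12:100:games:/usr/games:/sbin/nologin

How can grep match exactly those lines that consist of just two characters?

[root@devops ~]# grep '^..$' /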
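To experiment with these patterns without touching real system files, the line filtering that grep performs can be imitated in a few lines of Python. This is only a sketch under several assumptions: the grep function and the sample lines are hypothetical, Python's re syntax is used instead of grep's basic regular expressions, and \b stands in for the \< and \> word anchors.

```python
import re

# Hypothetical sample lines standing in for a file such as /etc/passwd.
lines = [
    "user1:x:502:503::/home/user1:/bin/bash",
    "games:x:12:100:games:/usr/games:/sbin/nologin",
    "ab",
    "User2:x:503:504::/home/User2:/bin/bash",
]

def grep(pattern, lines, ignore_case=False):
    """Return the lines in which the pattern is found, like grep without options."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]

starts_with_user = grep(r"^user", lines)                   # grep '^user'
any_case = grep(r"^user", lines, ignore_case=True)         # grep -i '^user'
word_u_r = grep(r"\bu.r\b", lines)                         # grep '\<u.r\>'
two_chars = grep(r"^..$", lines)                           # grep '^..$'

print(starts_with_user, any_case, word_u_r, two_chars, sep="\n")
```

In the sample data, the word-anchored pattern finds the line containing /usr/ because "usr" is a complete three-character word delimited by slashes.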

iOS – NSRegularExpression behaves strangely (the regular expression is correct)

I have this regular expression

([0-9]+)\(([0-9]+),([0-9]+)\)

which I use to construct an NSRegularExpression with no options (0). The expression should match the string

1(135,252)

and produce three matches: 1, 135, 252. I have confirmed with debuggex.com that the expression is correct and does what I want. However, iOS refuses to acknowledge my efforts, and the following code

NSString *nodeString = @"1(135,252)";
NSArray *r = [nodeRegex matchesInString:nodeString options:0 range:NSMakeRange(0,nodeString.length)];
NSLog(@"--- %@",nodeString);
for(NSTextCheckingResult *t in r) {
    for(int i = 0; i < t.numberOfRanges; i++) {
        NSLog(@"%@",[nodeString substringWithRange:[t rangeAtIndex:i]]);
    }
}

insists on printing

--- 1(135,252)
135,252
13
5,252
5
252

which is obviously wrong.

Thoughts?

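Solution

Your regular expression should be created like this:

[NSRegularExpression regularExpressionWithPattern:@"([0-9]+)\\(([0-9]+),([0-9]+)\\)" 
                                          options:0 
                                            error:NULL];

Note the double backslashes in the pattern. They are required because the backslash is used to escape special characters (such as quotes) in C string literals, and Objective-C is a superset of C.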
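The strange output in the question is consistent with the regex engine having received the pattern without its backslashes, so that every parenthesis became a grouping parenthesis. The Python sketch below reproduces that behavior; it rests on the assumption that the compiler dropped the single backslashes from the Objective-C string literal, which the question does not show directly.

```python
import re

node_string = "1(135,252)"

# The pattern as intended: escaped parentheses match literal '(' and ')'.
intended = r"([0-9]+)\(([0-9]+),([0-9]+)\)"

# Assumption: with single backslashes in the Objective-C literal, the compiler
# dropped them, so the regex engine saw plain (grouping) parentheses instead.
unescaped = r"([0-9]+)(([0-9]+),([0-9]+))"

good = re.search(intended, node_string)
bad = re.search(unescaped, node_string)

good_groups = [good.group(i) for i in range(good.re.groups + 1)]
bad_groups = [bad.group(i) for i in range(bad.re.groups + 1)]

print(good_groups)
print(bad_groups)
```

The second list matches the five lines logged in the question: the whole match "135,252" followed by the four groups that the unescaped parentheses create.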

If you are looking for a handy tool for working with regular expressions, I can recommend Patterns. It is inexpensive and can export directly to NSRegularExpression.
