RegEx cont.
Variables
- Match vars.
- if regex have parens i.e. groups
- if group got successful match
- then the content of matched pattern stored in
$1, meaning first match against first group - e.g.
/(\S+)(\S+),(\S+)/corresponding matched content isprint $1 $2 $3
- then the content of matched pattern stored in
- classical match idiom:
if(xx =~ xx) do sth else die - three special vars.
- a regex matching a string divides the string into 3 pars
$&: the part actually matched$\backslash: the part before matched$': the part after matched
- Noncapturing parens
- sometimes, we use parens only for precedence, however, it still trigger matching variable.
- how can we escape from that, to tell perl not to trigger matching var in parens ?
- to avoid triggering memory, change
(…)to(?:…) - e.g.
/(\S+)(?:and|or)(\S+)/the middle parens won’t be counted as $2, instead, the last parens is $2.
- Named captures
- even better way, to tell which parens I want to capture, and give name to that matching var.
- e.g.
/(?<name1>\S+)(lorem)/nowprint "$+{name1}"will print the content of matching \S+ in the named capture you create.
Substitution
s/old/new/- old is RegEx
- new is treated as double quoted string, so if there is
$varinside, will be replaced as its value. never think of putting RegEx in new! - RegEx triggers back reference var:
s/with(\w+)/without$1/like said before, $1 is special var in double quoted string context, will be replaced by its value - idiom:
if (s/old/new/) {…}
- Alternate delimiter
- e.g.
#^https://#http://#again, don’t put^in http, it’s merely treated as “string”!
- e.g.
- More Modes
- Global matching,
s/\s+/ /gtransform to canonical whitespace - case-insensitive,
s/old/new/i .including new lines,s/old/new/s
- Global matching,
- List-context match
- you can store parens matched result in a list
- e.g.
my($first, $second, $third) = /(\S)(\S),(\S)/; - e.g.
my @words = /(\w+)/g
- **Non-greedy quantifier
- quantifier normally go long, then back off
- we can tell perl go as short as you can !
- Append
?to quantifier to make it lazy
- Split function
my @fields = split /separator/, $string- separator here is RegEx
my @ip = split /\./, "192.168.1.1";
- Case shifting
\U: take every var following\U, make them uppercase\L: take every var following\L, make them lowercase\u: take first char of followed, make it uppercase\l: take first char of followed, make it lowercase\E: where case shifting stops- N.B. all these happen in “string”!
- e.g.
s/(\w+)/\U$1/ - e.g.
s/(\w{2})/\u\L$1/iginitial cap