REGEX Extended


  • 8/2/2019 REGEX Extended

    1/39

    Metacharacters

    1. The 12 punctuation characters that make regular expressions work their magic are:
       $ ( ) * + . ? [ \ ^ { |

    2. Notably absent from the list are ], - and }. The first two become metacharacters only after an unescaped [, and the } only after an unescaped {.

    3. If you want your regex to match any of them literally, you need to escape them by placing a backslash in front of them.
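A minimal JavaScript sketch of rule 3: it escapes the 12 metacharacters so that arbitrary text is matched literally. The helper name escapeRegex is hypothetical and not part of any library.

```javascript
// Hypothetical helper: put a backslash in front of each of the 12 metacharacters.
function escapeRegex(text) {
  // "$&" in the replacement re-inserts the character that was just matched.
  return text.replace(/[$()*+.?[\\^{|]/g, "\\$&");
}

const literal = "1+1=2 (maybe?)";
const re = new RegExp(escapeRegex(literal));

console.log(escapeRegex(literal));
console.log(re.test("I think 1+1=2 (maybe?)"));
```

Without the escaping step, the + and ? would act as quantifiers and the parentheses as a group, so the pattern would not match the literal text.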


    Matching literal string

    Any regular expression that does not include any of the dozen characters $ ( ) * + . ? [ \ ^ { | simply matches itself.

    By default, regular expressions are case sensitive: regex matches regex but not Regex, REGEX, or ReGeX.

    Turn on case insensitivity by using the (?i) mode modifier in .NET, such as (?i)regex or sensitive(?i)caseless(?-i)sensitive (local mode modifiers), or by setting the /i flag when creating the regex in JavaScript.


    Matching non printable characters

    Representation  Meaning          Hex   Flavors
    \a              bell             0x07  .NET
    \e              escape           0x1B  .NET
    \f              form feed        0x0C  .NET, JScript
    \n              new line         0x0A  .NET, JScript
    \r              carriage return  0x0D  .NET, JScript
    \t              horizontal tab   0x09  .NET, JScript
    \v              vertical tab     0x0B  .NET, JScript

    Variations:

    Using \cA through \cZ, you can match one of the 26 control characters that occupy positions 1 through 26 in the ASCII table.

    A lowercase \x followed by two uppercase hexadecimal digits matches a single character in the ASCII set.


    Matching [$"'\n\d/\\] :

    C# - "[$\"'\n\\d/\\\\]"
    - Double quotes and backslashes must be escaped with a backslash.
    Note: "\n" is a string with a literal line break, which the regex engine matches as a newline (or ignores as whitespace in free-spacing mode). "\\n" is a string with the regex token \n, which matches a newline.

    C# - @"[$""'\n\d/\\]"
    - To include a double quote in a verbatim string, double it up.
    Note: @"\n" is always the regex token \n, which matches a newline; verbatim strings do not support \n at the string level.

    JavaScript - /[$"'\n\d\/\\]/
    - Simply place your regular expression between two forward slashes.
    - If any forward slashes occur within the regular expression itself, escape those with a backslash.


    Creating Regular Expression Objects

    C#:
    try {
        // UserInput is a string variable that holds the pattern
        Regex regexObj = new Regex(UserInput, RegexOptions.Compiled);
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }

    Note: a regex constructed with RegexOptions.Compiled can run up to 10 times faster than one constructed without this option (it compiles the regular expression down to CIL).

    JavaScript:
    var myregexp = /regex pattern/;
    var myregexp = new RegExp(userinput);
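In JavaScript, the RegExp constructor throws a SyntaxError for an invalid pattern, so the same try/catch idea applies there. The sketch below assumes a hypothetical helper tryCompile that returns null instead of throwing.

```javascript
// Hypothetical helper: compile a user-supplied pattern, returning null on a syntax error.
function tryCompile(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (e) {
    if (e instanceof SyntaxError) return null; // invalid regular expression
    throw e;                                   // anything else is unexpected
  }
}

const ok = tryCompile("\\d+", "g");
const bad = tryCompile("(unclosed");

console.log(ok !== null, bad === null);
```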


    Match One of Many Characters

    [ ] - a character class matches a single character out of a list of possible characters

    ^ (caret) - negates the character class if you place it immediately after the opening bracket

    - (hyphen) - creates a range when it is placed between two characters (order given by the ASCII or Unicode character table)

    Examples:

    o Hexadecimal character : [a-fA-F0-9]

    o Nonhexadecimal character : [^a-fA-F0-9]

    o Characters group : [aeiou]


    Shorthands

    Six regex tokens that consist of a backslash and a letter form shorthand character classes. Each lowercase shorthand has an associated uppercase shorthand with the opposite meaning.

    Token  Matches                                                                     Opposite
    \d     a single digit                                                              \D (same as [^\d])
    \w     a single word character                                                     \W
    \s     any whitespace character (this includes spaces, tabs, and line breaks)      \S

    Note - In JavaScript \w is always identical to [a-zA-Z0-9_]. In .NET it includes letters and digits from all other scripts (Cyrillic, Thai, etc.)


    Matching any character

    Solution  Matches                                Flavor         Notes
    .         any character, except line breaks      .NET, JScript  .NET: the "dot matches line breaks" option must not be set
    .         any character, including line breaks   .NET           .NET: the "dot matches line breaks" option must be set [1] (RegexOptions.Singleline)
    [\s\S]    any character, including line breaks   JScript [2]

    [1] You can also place a mode modifier at the start of the regular expression: (?s) is the mode modifier for "dot matches line breaks" mode in .NET.

    [2] An alternative solution is needed for JavaScript, which (before ES2018 added the /s flag) doesn't have a "dot matches line breaks" option; [\d\D] and [\w\W] have the same effect.
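A short JavaScript illustration of the difference between the dot and [\s\S]; the subject string is made up for the example.

```javascript
const subject = "first line\nsecond line";

// Without a "dot matches line breaks" option, the dot stops at \n.
const dotMatch = subject.match(/first.*second/);

// [\s\S] matches any character, including the line break.
const anyMatch = subject.match(/first[\s\S]*second/);

console.log(dotMatch);     // null
console.log(anyMatch[0]);
```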


    Match Something at the Start and/or the End of a Line (1)

    Anchors (^, $, \A, \Z, and \z) match at certain positions, effectively anchoring the regular expression match at those positions.

    \A (.NET) - at the very start of the subject text, before the first character (to test whether the subject text begins with the text you want to match). The A must be uppercase.

    ^ (.NET, JScript) - equivalent to \A, as long as you do not turn on the "^ and $ match at line breaks" option; otherwise it will match at the very start of each line. In .NET this option is RegexOptions.Multiline.

    \Z, \z (.NET) - at the very end of the subject text, after the last character (to test whether the subject text ends with the text you want to match). The difference between \Z and \z shows only when the last character in your subject text is a line break. In that case, \Z can match at the very end of the subject text, after the final line break, as well as immediately before that line break, while \z matches only at the very end.

    $ (.NET, JScript) - equivalent to \Z, as long as you do not turn on the "^ and $ match at line breaks" option; otherwise it will match at the very end of each line. In .NET this option is RegexOptions.Multiline.


    Match Something at the Start and/or the End of a Line (2)

    Examples:

    ^alpha (.NET, JavaScript) - matches alpha at the start of the subject text if "^ and $ match at line breaks" is not set, or at the start of each line otherwise

    \Aalpha (.NET) - matches alpha at the start of the subject text

    omega$ (.NET, JavaScript) - matches omega at the end of the subject text if "^ and $ match at line breaks" is not set, or at the end of each line otherwise

    omega\Z (.NET) - matches omega at the end of the subject text


    Match Something at the Start and/or

    the End of a Line (3)

    Combining two anchors:

    \A\Z matches the empty string, as well as

    the string that consists of a single newline

    \A\z matches only the empty string

    ^$ matches each empty line in the subject

    text (in ^ and $ match at line breaks mode)

    Note - In .NET, if you cannot turn on ^ and $ match at line breaks mode outside

    the regular expression, you can place (?m) mode modifier at the start of the

    regular expression
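A small JavaScript sketch of how the /m flag ("^ and $ match at line breaks") changes the anchors; the four-line subject text is an assumption made for the example.

```javascript
const text = "alpha\n\nbeta\nomega";

const startOnly = /^beta/.test(text);   // without /m, ^ matches only at the start of the text
const perLine = /^beta/m.test(text);    // with /m, ^ also matches after each line break
const endOfText = /omega$/.test(text);  // $ matches at the very end of the text
const emptyLines = text.match(/^$/gm);  // ^$ with /m finds each empty line

console.log(startOnly, perLine, endOfText, emptyLines.length);
```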


    Regular Expression Options (C#)

    None - Specifies that no options are set.

    IgnoreCase - Specifies case-insensitive matching.

    Multiline - Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string (caret and dollar match at line breaks).

    ExplicitCapture - Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:...).

    Compiled - Specifies that the regular expression is compiled to an assembly. This yields faster execution but increases startup time.

    Singleline - Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n) (dot matches line breaks).

    IgnorePatternWhitespace - Eliminates unescaped white space from the pattern and enables comments marked with # (free-spacing).

    RightToLeft - Specifies that the search will be from right to left instead of from left to right.

    ECMAScript - Enables ECMAScript-compliant behavior for the expression (JavaScript flavor). This value can be used only in conjunction with the IgnoreCase, Multiline, and Compiled values; using it with any other value results in an exception. The most important effect is that with this option, \w and \d are restricted to ASCII characters, as they are in JavaScript.

    CultureInvariant - Specifies that cultural differences in language are ignored.


    Setting Regular Expression Options

    C#:
    Regex regexObj = new Regex("regex pattern",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase |
        RegexOptions.Singleline | RegexOptions.Multiline);

    JavaScript:
    var myregexp = /regex pattern/im;

    Regex Options
    1. Free-spacing: Not supported by JavaScript.
    2. Case insensitive: /i
    3. Dot matches line breaks: Not supported by JavaScript.
    4. Caret and dollar match at line breaks: /m
    5. Additional Language-Specific Options: apply a regular expression repeatedly to the same string: /g


    Test Whether a Match Can Be Found Within a Subject String

    C#:
    bool foundMatch = false;
    try {
        foundMatch = Regex.IsMatch(subjectString, UserInput);
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the regular expression or subject string
    } catch (ArgumentException ex) {
        // Syntax error in the regular expression
    }

    or

    bool foundMatch = Regex.IsMatch(subjectString, "regex pattern");

    Note: @"\Aregex pattern\Z" - the regex must match the subject string entirely

    JavaScript:
    if (/regex pattern/.test(subjectString)) {
        // Successful match
    } else {
        // Match attempt failed
    }

    Note: /^regex pattern$/.test(subjectString) - the regex must match the subject string entirely


    Retrieve the Matched Text

    C#:
    Regex regexObj = new Regex(@"\d+");
    string resultString = regexObj.Match(subjectString).Value;

    Note:
    1. regexObj.Match("123456", 3, 2) tries to find a match in "45"
    2. regexObj.Match(subjectString).Index - position of the match in the subject string
    3. regexObj.Match(subjectString).Length - length of the match

    JavaScript:
    var result = subject.match(/\d+/);
    if (result) {
        result = result[0];
    } else {
        result = '';
    }

    var matchstart = -1;
    var matchlength = -1;
    var match = /\d+/.exec(subject);
    if (match) {
        matchstart = match.index;
        matchlength = match[0].length;
    }



    Match Whole Words

    \b - word boundary - matches at the start or the end of a word, in three positions:
    - before the first character of the subject, if that character is a word character
    - after the last character of the subject, if that character is a word character
    - between two characters of the subject, where one is a word character and the other is not

    Example: \bdog\b - The first \b requires the d to occur at the very start of the string, or after a nonword character. The second \b requires the g to occur at the very end of the string, or before a nonword character (line break characters are nonword characters). It matches dog in My dog is stupid, but not in I will build a doghouse.

    \B matches at every position in the subject text where \b does not match, that is, at every position that is not at the start or end of a word.

    Example: \Bcat\B matches cat in scatter, but not in My cat is lazy, category, or bobcat.

    Note: to match cat anywhere except as a whole word, you need to use alternation to combine \Bcat and cat\B into \Bcat|cat\B
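The word boundary examples can be checked directly in JavaScript; the variable names below are only illustrative.

```javascript
const wholeWord = /\bdog\b/;
const insideWord = /\Bcat\B/;
const notWholeWord = /\Bcat|cat\B/;

console.log(wholeWord.test("My dog is stupid"));        // true
console.log(wholeWord.test("I will build a doghouse")); // false
console.log(insideWord.test("scatter"));                // true
console.log(insideWord.test("bobcat"));                 // false
console.log(notWholeWord.test("category"));             // true
console.log(notWholeWord.test("My cat is lazy"));       // false
```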


    Unicode Code Points, Properties, Blocks, and Scripts (1)

    \u2122 - Unicode code point (.NET, JScript)
    - A code point is one entry in the Unicode character database (\u2122 is the trademark sign).
    - The \u syntax requires exactly four hexadecimal digits (U+0000 through U+FFFF).

    \p{Sc} - Unicode property or category (.NET)
    \p{L} - Any kind of letter from any language
    \p{M} - A character intended to be combined with another character (accents etc.)
    \p{Z} - Any kind of whitespace or invisible separator
    \p{S} - Math symbols, currency signs etc.
    \p{N} - Any kind of numeric character in any script
    \p{P} - Any kind of punctuation character
    \p{C} - Invisible control characters and unused code points

    \p{IsGreekExtended} - Unicode block (.NET)
    Other flavors write block names as \p{InBasic_Latin}, \p{InGreek_and_Coptic}, \p{InCyrillic}, \p{InKatakana} etc.; .NET uses the Is prefix without underscores, as in \p{IsBasicLatin}, \p{IsCyrillic} and \p{IsKatakana}.

    \P{M}\p{M}* - Unicode grapheme (.NET)
    A grapheme is a base character followed by any combining marks: à can be encoded as the single code point \u00E0 or as the sequence \u0061\u0300.


    Unicode Code Points, Properties, Blocks, and Scripts (2)

    The uppercase \P is the negated variant of the lowercase \p. Example: \P{Sc} matches any character that does not have the Currency Symbol Unicode property.

    The JavaScript flavor does not support Unicode categories, blocks, or scripts. Instead, you can list the characters that are in the category, block, or script in a character class. Alternative versions for:

    Blocks - [\u1F00-\u1FFF] replaces \p{IsGreekExtended}

    Categories and scripts - create a character class with all the code points from the specific category or script

    See also: http://www.unicode.org/

    Character class subtractions in .NET

    General form: [class-[subtract]]

    Examples:

    1. [a-zA-Z0-9-[g-zG-Z]]

    2. [\p{IsThai}-[\P{N}]] matches any of the 10 Thai digits.

    \p{IsThai} - matches any character in the Thai block

    \P{N} - matches any character that doesn't have the Number property


    Match One of Several Alternatives

    The vertical bar, or pipe symbol, splits the regular expression into multiple alternatives.

    Example: when you apply Mary|Jane|Sue to "Mary, Jane, and Sue went to Mary's house", the match Mary is immediately found at the start of the string.

    The order of the alternatives in the regex matters only when two of them can match at the same position in the string. The solution is to put the most general (shortest) alternative last in the enumeration, for example Janet|Jane rather than Jane|Janet.
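A brief JavaScript sketch of both points: the leftmost match wins, and order only matters when two alternatives can match at the same position. The Jane/Janet pair is an assumed example.

```javascript
const subject = "Mary, Jane, and Sue went to Mary's house";

const first = subject.match(/Mary|Jane|Sue/);   // leftmost match in the subject
const all = subject.match(/Mary|Jane|Sue/g);    // every match, in subject order

// Both alternatives can match at position 0, so the one listed first wins.
const shortFirst = "Janet".match(/Jane|Janet/)[0];
const longFirst = "Janet".match(/Janet|Jane/)[0];

console.log(first[0], first.index, all, shortFirst, longFirst);
```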


    Group and Capture Parts of the Match

    A capturing group is a pair of parentheses that captures only part of the regular expression match.

    Example: \b(\d\d\d\d)-(\d\d)-(\d\d)\b
    1. Has three capturing groups: (\d\d\d\d), (\d\d) and (\d\d)
    2. During the matching process the regular expression engine stores the part of the text matched by each capturing group
    3. Applied to the subject string 2012-10-02, the groups hold 2012, 10 and 02

    Noncapturing groups:
    - (?: opens a noncapturing group
    - You can specify mode modifiers, for example (?i:...) for a case insensitive noncapturing group (not available in the JScript flavor)

    Benefits:
    - You can add them to an existing regex without upsetting the references to numbered capturing groups
    - Performance - a capturing group adds unnecessary overhead that you can eliminate by using a noncapturing group

    Note: parts of the match can be named: \b(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)\b or \b(?'year'\d\d\d\d)-(?'month'\d\d)-(?'day'\d\d)\b (only .NET).
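In JavaScript the captured parts are available as numbered entries of the match array. The sketch below uses an assumed subject string and also shows that a noncapturing group does not take up a group number.

```javascript
const subject = "Released on 2012-10-02.";

// Three capturing groups: year, month, day.
const m = /\b(\d\d\d\d)-(\d\d)-(\d\d)\b/.exec(subject);

// The year is grouped with (?:...) and is not captured, so month becomes group 1.
const n = /\b(?:\d\d\d\d)-(\d\d)-(\d\d)\b/.exec(subject);

console.log(m[0], m[1], m[2], m[3]);
console.log(n[1], n[2]);
```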


    Match Previously Matched Text Again

    Steps

    1. Capture a text in a group

    2. Match the same text anywhere in the regex using a backreference (backslash followed by a number)

    Example: \b\d\d(\d\d)-\1-\1\b matches dates whose two-digit year, month and day are identical, such as 2010-10-10, 2011-11-11 and 2012-12-12.

    Note: you can use a named group and a named backreference: \b\d\d(?<name>\d\d)-\k<name>-\k<name>\b
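A quick JavaScript check of the numbered backreference: \1 must repeat exactly the text captured by group 1, so only dates whose two-digit year, month and day are the same can match. The variable name is illustrative.

```javascript
const repeated = /\b\d\d(\d\d)-\1-\1\b/;

console.log(repeated.test("2012-12-12")); // true: 12, 12, 12
console.log(repeated.test("2008-08-08")); // true: 08, 08, 08
console.log(repeated.test("2012-09-09")); // false: group 1 holds 12, not 09
```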


    Retrieve Part of the Matched Text

    C#:
    string resultString = Regex.Match(subjectString,
        "http://([a-z0-9.-]+)").Groups[1].Value;

    string resultString = Regex.Match(subjectString,
        "http://(?<domain>[a-z0-9.-]+)").Groups["domain"].Value;

    JavaScript:
    var result = "";
    var match = /http:\/\/([a-z0-9.-]+)/.exec(subject);
    if (match) {
        result = match[1];
    } else {
        result = '';
    }


    Retrieve a List of All Matches

    C#:
    Regex regexObj = new Regex(@"\d+");
    MatchCollection matchlist = regexObj.Matches(subjectString);

    JavaScript:
    var list = subject.match(/\d+/g);

    Note:
    - the /g flag tells the match() function to iterate over all matches in the string and put them into an array
    - for a regex with the /g flag, string.match() returns only the matched texts and does not provide any further details about the individual matches


    Iterate over All Matches

    C#:
    Match matchResult = Regex.Match(subjectString, @"\d+");
    while (matchResult.Success) {
        // Here you can process the match stored in matchResult
        matchResult = matchResult.NextMatch();
    }

    JavaScript:
    var regex = /\d+/g;
    var match = null;
    while (match = regex.exec(subject)) {
        // Don't let browsers such as Firefox get stuck in an infinite loop
        if (match.index == regex.lastIndex) regex.lastIndex++;
        // Here you can process the match stored in the match variable
    }

    Note: exec() should set lastIndex to the first character after the match. If the match is zero characters long, the next match attempt will begin at the position of the match just found, resulting in an infinite loop.


    Repeat Part of the Regex a Certain Number of Times

    \b\d{100}\b - A decimal number with 100 digits

    \b[a-f0-9]{1,8}\b - A 32-bit hexadecimal number

    \b[a-f0-9]{1,8}h?\b - A 32-bit hexadecimal number with an optional h suffix

    \b\d*\.\d+(e\d+)? - A floating-point number with an optional integer part, a mandatory fractional part, and an optional exponent

    Token  Result                                         Notes
    {n}    repeats the preceding regex token n times
    {n,m}  variable repetition (between n and m times)    \d{0,1} matches zero or one digit, same as \d?
    {n,}   infinite repetition (n or more times)          \d{1,} matches one or more digits, same as \d+
                                                          \d{0,} matches zero or more digits, same as \d*

    +, *, ? - greedy quantifiers
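A small JavaScript sketch of the three quantifier forms. To keep the checks simple, the patterns here are anchored with ^ and $ rather than \b, which is a choice made for this example only.

```javascript
const threeDigits = /^\d{3}$/;          // {n}: exactly three digits
const hex32 = /^[a-f0-9]{1,8}h?$/i;     // {n,m}: one to eight hex digits, optional h suffix
const oneOrMore = /\d{1,}/;             // {n,}: same as \d+

console.log(threeDigits.test("123"), threeDigits.test("1234"));
console.log(hex32.test("DEADBEEF"), hex32.test("1fh"), hex32.test("123456789"));
console.log("abc2012".match(oneOrMore)[0] === "abc2012".match(/\d+/)[0]);
```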


    Choose Minimal or Maximal Repetition (1)

    A lazy quantifier repeats as few times as it has to, stores one backtracking position, and allows the regex to continue. The regex then goes ahead only one character at a time, each time checking whether the following text can be matched.

    You can make any quantifier lazy by placing a question mark after it: *?, +?, ??, and {7,42}?

    Example:

    <p>The very first task is to find the beginning of a paragraph.</p>
    <p>Then you have to find the end of the paragraph</p>

    <p>.*</p> (greedy) matches from the first <p> to the last </p>

    vs

    <p>.*?</p> (lazy) stops at the first </p>
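The greedy and lazy variants can be compared in JavaScript. The sketch assumes that the two paragraphs are delimited by HTML p tags and sit on a single line, so that the dot can run across both of them.

```javascript
const html = "<p>The very first task.</p><p>Then the end.</p>";

// Greedy: .* runs to the end of the line and backtracks to the LAST </p>.
const greedy = html.match(/<p>.*<\/p>/)[0];

// Lazy: .*? expands one character at a time and stops at the FIRST </p>.
const lazy = html.match(/<p>.*?<\/p>/)[0];

console.log(greedy);
console.log(lazy);
```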


    Choose Minimal or Maximal Repetition (2)

    Possessive quantifiers
    - try to repeat as many times as possible
    - will never give back, not even when giving back is the only way that the remainder of the regular expression could match
    - do not keep backtracking positions
    - you can make any quantifier possessive by placing a plus sign after it: *+, ++, ?+, and {7,42}+

    Atomic group (not available in JScript)
    - a noncapturing group, with the extra job of refusing to backtrack
    - the opening bracket simply consists of the three characters (?>

    Possessive quantifier   Equivalent atomic group
    \b\d++\b                \b(?>\d+)\b
    \w++\d++                (?>\w+)(?>\d+)


    Test for a Match Without Adding It to the Overall Match

    Lookaround checks whether certain text can be matched without actually matching it:

    1. lookbehind

    positive : (?<=a)b matches a "b" that is preceded by an "a"

    negative : (?<!a)b matches a "b" that is not preceded by an "a"

    2. lookahead

    positive : q(?=u) matches a "q" that is followed by a "u"

    negative : q(?!u) matches a "q" not followed by a "u"

    Note: JavaScript supports only lookahead
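A JavaScript sketch of the two lookahead forms. It shows that the text inside the lookahead is tested but not included in the match; the sample words are assumptions.

```javascript
const followedByU = "quit".match(/q(?=u)/);   // "u" is checked but not consumed
const notFollowedByU = /q(?!u)/;

// Only the "q" is replaced, because the lookahead adds nothing to the match.
const replaced = "quit".replace(/q(?=u)/, "Q");

console.log(followedByU[0], replaced);
console.log(notFollowedByU.test("Iraq"), notFollowedByU.test("quit"));
```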


    Match One of Two Alternatives Based on a Condition

    (?(1)then|else) - checks whether the first capturing group has already matched something; if it has, the regex continues with "then", otherwise with "else"

    Example:

    1. \b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))

    (?(1)|(?!)) - if group 1 has matched

    - then: empty regex "" (always passes)

    - else: empty negative lookahead (?!) (always fails)

    2. (a)?b(?(1)c|d) matches the same text as abc|bd


    Insert Literal Text into the Replacement Text (1)

    Key characters:

    \ - a literal character that does not need to be escaped

    $ - needs to be escaped only when it is followed by a digit, &, `, ', _, +, or $; to escape a dollar sign, precede it with another dollar sign.

    Example: $%\*$1\1 => $%\*$$1\1

    Note: $1 is a backreference to a capturing group and $& refers to the whole regex match; \1 is plain literal text in .NET and JavaScript replacement strings.
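A JavaScript sketch of the escaping rule. For simplicity the hypothetical helper escapeReplacement doubles every dollar sign, not only those followed by a special character; the resulting replacement text is the same.

```javascript
// Hypothetical helper: make a string safe to use as literal replacement text.
function escapeReplacement(text) {
  // "$$$$" is itself replacement text: each "$$" pair produces one "$".
  return text.replace(/\$/g, "$$$$");
}

const unescaped = "a".replace(/(a)/, "$1!");                  // $1 acts as a backreference
const escaped = "a".replace(/(a)/, escapeReplacement("$1!")); // $1 is inserted literally

console.log(unescaped, escaped);
```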


    Insert Literal Text into the Replacement Text (2)

    Examples:

    1. Regular expression: http:\S+
       Replacement: $&

    2. Regular expression: \b(\d{4})(\d{3})(\d{3})\b
       Replacement: ($1) $2-$3

    3. Regular expression: \b(?<g1>\d{3})(?<g2>\d{3})(?<g3>\d{4})\b
       Replacement: (${g1}) ${g2}-${g3}

    Note: .NET and JavaScript leave backreferences to groups that don't exist as literal text in the replacement.


    Replace All Matches

    C#:
    Regex regexObj = new Regex("pattern");
    string resultString = regexObj.Replace(subjectString, replacement, count);

    Example: Replace(subject, replacement, 3) replaces only the first three regular expression matches, and further matches are ignored.

    JavaScript:
    result = subject.replace(/before/g, "after");

    Note: if you want to replace all regex matches in the string, set the /g flag when creating your regular expression object; if you don't use the /g flag, only the first match will be replaced.


    Replace Matches Reusing Parts of the Match

    C#:
    string resultString = Regex.Replace(subjectString, @"(\w+)=(\w+)", "$2=$1");

    or

    Regex regexObj = new Regex(@"(\w+)=(\w+)");
    string resultString = regexObj.Replace(subjectString, "$2=$1");

    With named groups:

    Regex regexObj = new Regex(@"(?<left>\w+)=(?<right>\w+)");
    string resultString = regexObj.Replace(subjectString, "${right}=${left}");

    JavaScript:
    result = subject.replace(/(\w+)=(\w+)/g, "$2=$1");


    Replace Matches with Replacements Generated in Code

    C#:
    Regex regexObj = new Regex(@"\d+");
    string resultString = regexObj.Replace(subjectString,
        new MatchEvaluator(ComputeReplacement));

    public String ComputeReplacement(Match matchResult) {
        int t = int.Parse(matchResult.Value) * 2;
        return t.ToString();
    }

    JavaScript:
    var result = subject.replace(/\d+/g,
        function(match) { return match * 2; }
    );

    Note: the replacement function may accept one or more parameters. The first parameter will be set to the text matched by the regular expression. If the regular expression has capturing groups, the second parameter will hold the text matched by the first capturing group, the third parameter gives you the text of the second capturing group, and so on.
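The parameter order described in the note can be sketched in JavaScript as follows; the parameter names key and value and the sample strings are illustrative only.

```javascript
// First parameter: the whole match. Second and third: capturing groups 1 and 2.
const swapped = "left=right; a=b".replace(/(\w+)=(\w+)/g,
  function (match, key, value) {
    return value + "=" + key;
  }
);

// With no capturing groups, only the matched text is needed.
const doubled = "3 apples and 12 pears".replace(/\d+/g,
  function (match) {
    return String(Number(match) * 2);
  }
);

console.log(swapped);
console.log(doubled);
```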


    Split a string

    C#:
    string[] splitArray = Regex.Split(subjectString, "regex pattern");

    JavaScript:
    var list = [];
    var regex = /regex pattern/g;
    var match = null;
    var lastIndex = 0;
    while (match = regex.exec(subject)) {
        // Don't let browsers such as Firefox get stuck in an infinite loop
        if (match.index == regex.lastIndex) regex.lastIndex++;
        // Add the text before the match
        list.push(subject.substring(lastIndex, match.index));
        lastIndex = match.index + match[0].length;
    }
    // Add the remainder after the last match
    list.push(subject.substring(lastIndex));


    Search Line by Line

    C#:
    string[] lines = Regex.Split(subjectString, "\r?\n");
    Regex regexObj = new Regex("regex pattern");
    for (int i = 0; i < lines.Length; i++) {
        if (regexObj.IsMatch(lines[i])) {
            // The regex matches lines[i]
        } else {
            // The regex does not match lines[i]
        }
    }

    JavaScript:
    var lines = subject.split(/\r?\n/);
    var regexp = /regex pattern/;
    for (var i = 0; i < lines.length; i++) {
        if (lines[i].match(regexp)) {
            // The regex matches lines[i]
        } else {
            // The regex does not match lines[i]
        }
    }


    Validation and Formatting (1)

    Email address (case insensitive):
    ^[\w!#$%&'*+/=?`{|}~^-]+(?:\.[\w!#$%&'*+/=?`{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$

    International Phone Numbers:
    ^\+(?:[0-9]\x20?){6,14}[0-9]$

    Validate Traditional Date Formats:
    ^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$

    Limit the Number of Lines in Text:
    ^(?:(?:\r\n?|\n)?[^\r\n]*){0,5}$

    Validate Affirmative Responses (case insensitive):
    ^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$
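Two of the validation patterns above can be tried out in JavaScript as follows. The /i flag on the affirmative-response regex and the sample inputs are assumptions made for this sketch.

```javascript
const phone = /^\+(?:[0-9]\x20?){6,14}[0-9]$/;
const affirmative = /^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$/i;

console.log(phone.test("+1 555 123 4567")); // true: plus sign, then 11 digits with single spaces
console.log(phone.test("+12 34"));          // false: fewer than 7 digits
console.log(affirmative.test("Yes"));       // true
console.log(affirmative.test("okay"));      // true
console.log(affirmative.test("tru"));       // false: only t or true is accepted
```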


    Validation and Formatting (2)

    Find Words Near Each Other:
    \b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b

    Remove Duplicate Lines:
    ^(.*)(?:(?:\r?\n|\r)\1)+$ replaced with $1

    Validating URL:
    ^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$

    Extracting the Query from a URL:
    ^[^?#]+\?([^#]+)

    Validate Windows Paths:
    ^(?:[a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)\\(?:[^\\/:*?"|\r\n]+\\)*[^\\/:*?"|\r\n]*$