Regular Expression rules

Syntax

The syntax for the REGEXP_SUBSTR function in Oracle is:

REGEXP_SUBSTR( string, pattern [, start_position [, nth_appearance [, match_parameter [, subexpression ] ] ] ] )

Parameters or Arguments

string
The string to search. It can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
pattern
The regular expression matching information. It can be a combination of the following:

Value Description
^ Matches the beginning of a string. If used with a match_parameter of ‘m’, it matches the start of a line anywhere within string.
$ Matches the end of a string. If used with a match_parameter of ‘m’, it matches the end of a line anywhere within string.
* Matches zero or more occurrences.
+ Matches one or more occurrences.
? Matches zero or one occurrence.
. Matches any character except NULL. By default it does not match the newline character unless match_parameter includes ‘n’.
| Used like an “OR” to specify more than one alternative.
[ ] Used to specify a matching list where you are trying to match any one of the characters in the list.
[^ ] Used to specify a nonmatching list where you are trying to match any character except for the ones in the list.
( ) Used to group expressions as a subexpression.
{m} Matches m times.
{m,} Matches at least m times.
{m,n} Matches at least m times, but no more than n times.
\n n is a number between 1 and 9. Matches the nth subexpression found within ( ) before encountering \n.
[..] Matches one collation element that can be more than one character.
[::] Matches character classes.
[==] Matches equivalence classes.
\d Matches a digit character.
\D Matches a nondigit character.
\w Matches a word character.
\W Matches a nonword character.
\s Matches a whitespace character.
\S Matches a non-whitespace character.
\A Matches only at the beginning of a string.
\Z Matches at the end of a string, or before a newline character at the end of a string.
*? Matches the preceding pattern zero or more times, as few times as possible (non-greedy).
+? Matches the preceding pattern one or more times, as few times as possible (non-greedy).
?? Matches the preceding pattern zero or one time, preferring zero (non-greedy).
{n}? Matches the preceding pattern exactly n times (non-greedy).
{n,}? Matches the preceding pattern at least n times, as few times as possible (non-greedy).
{n,m}? Matches the preceding pattern at least n times, but not more than m times, as few times as possible (non-greedy).
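
The pattern elements above can be tried directly against the DUAL table. The following is a minimal sketch; the sample strings are made up for illustration, and the last two columns contrast a greedy quantifier with its non-greedy form.

```sql
-- Sample strings are hypothetical; only the pattern elements come from the table above.
SELECT REGEXP_SUBSTR('Order 4521 shipped', '\d+')            AS digits,      -- 4521
       REGEXP_SUBSTR('Order 4521 shipped', '[[:alpha:]]+')   AS first_word,  -- Order
       REGEXP_SUBSTR('Order 4521 shipped', '[0-9]{2,3}')     AS two_or_three,-- 452
       REGEXP_SUBSTR('<b>one</b><b>two</b>', '<b>.*</b>')    AS greedy,      -- <b>one</b><b>two</b>
       REGEXP_SUBSTR('<b>one</b><b>two</b>', '<b>.*?</b>')   AS non_greedy   -- <b>one</b>
FROM dual;
```

The greedy form .* runs to the last closing tag in the string, while .*? stops at the first one.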
start_position
Optional. It is the position in string where the search will start. If omitted, it defaults to 1 which is the first position in the string.
nth_appearance
Optional. It is the nth appearance of pattern in string. If omitted, it defaults to 1 which is the first appearance of pattern in string.
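
As a short sketch of how start_position and nth_appearance interact, the query below searches a made-up comma-separated string. The nth appearance is counted from start_position, not from the beginning of the string.

```sql
-- '10,20,30,40' is a hypothetical sample value.
SELECT REGEXP_SUBSTR('10,20,30,40', '[0-9]+')       AS first_match,     -- 10
       REGEXP_SUBSTR('10,20,30,40', '[0-9]+', 1, 3) AS third_match,     -- 30
       REGEXP_SUBSTR('10,20,30,40', '[0-9]+', 4)    AS from_position_4, -- 20
       REGEXP_SUBSTR('10,20,30,40', '[0-9]+', 4, 2) AS second_from_4    -- 30
FROM dual;
```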
match_parameter
Optional. It allows you to modify the matching behavior for the REGEXP_SUBSTR function. It can be a combination of the following:

Value Description
‘c’ Perform case-sensitive matching.
‘i’ Perform case-insensitive matching.
‘n’ Allows the period character (.) to match the newline character. By default, the period does not match the newline character.
‘m’ string is assumed to have multiple lines, where ^ matches the start of a line and $ matches the end of a line anywhere within string. By default, string is assumed to be a single line.
‘x’ Whitespace characters are ignored. By default, whitespace characters are matched like any other character.
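
The effect of match_parameter is easiest to see by running the same pattern with different values. This is an illustrative sketch with made-up strings; CHR(10) is used to build a two-line value.

```sql
-- Sample strings are hypothetical.
SELECT REGEXP_SUBSTR('Oracle Database', 'database', 1, 1, 'i') AS case_insensitive, -- Database
       REGEXP_SUBSTR('Oracle Database', 'database', 1, 1, 'c') AS case_sensitive,   -- NULL
       -- With 'm', ^ and $ match at line boundaries inside the string
       REGEXP_SUBSTR('line one' || CHR(10) || 'line two',
                     '^line two$', 1, 1, 'm')                  AS multi_line,       -- line two
       -- Without 'm', ^ matches only at the very start of the string
       REGEXP_SUBSTR('line one' || CHR(10) || 'line two',
                     '^line two$')                             AS single_line       -- NULL
FROM dual;
```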
subexpression
Optional. This is used when pattern has subexpressions and you wish to indicate which subexpression in pattern is the target. It is an integer value from 0 to 9 indicating the subexpression to match on in pattern. A value of 0, which is the default, returns the entire match.
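
The sketch below shows the subexpression argument picking individual groups out of one match. The date string is a made-up sample, and NULL is passed for match_parameter so that the default matching behavior applies.

```sql
-- '2024-03-15' is a hypothetical sample value with three parenthesized groups.
SELECT REGEXP_SUBSTR('2024-03-15', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1, 1, NULL, 0) AS whole_match, -- 2024-03-15
       REGEXP_SUBSTR('2024-03-15', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1, 1, NULL, 1) AS year_part,   -- 2024
       REGEXP_SUBSTR('2024-03-15', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1, 1, NULL, 2) AS month_part,  -- 03
       REGEXP_SUBSTR('2024-03-15', '([0-9]{4})-([0-9]{2})-([0-9]{2})', 1, 1, NULL, 3) AS day_part     -- 15
FROM dual;
```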

Returns

The REGEXP_SUBSTR function returns a string value.
If the REGEXP_SUBSTR function does not find any occurrence of pattern, it will return NULL.
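
Because a failed match returns NULL rather than an empty result you can test for, it is often wrapped in NVL. A minimal sketch, assuming a made-up input string and the placeholder text 'none' as the fallback:

```sql
-- 'no digits here' and the fallback 'none' are hypothetical.
SELECT REGEXP_SUBSTR('no digits here', '[0-9]+')              AS raw_result,   -- NULL
       NVL(REGEXP_SUBSTR('no digits here', '[0-9]+'), 'none') AS with_default  -- none
FROM dual;
```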

Note

  • If there are conflicting values provided for match_parameter, the REGEXP_SUBSTR function will use the last value.
  • If you omit the match_parameter argument, the REGEXP_SUBSTR function uses the NLS_SORT parameter to determine whether the search is case-sensitive, assumes that string is a single line, and assumes that the period character matches any character except the newline character.
  • See also the SUBSTR function.
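
To illustrate the first note, the query below passes both ‘i’ and ‘c’ in the same match_parameter. The sample string is made up; the value that appears last decides the behavior.

```sql
-- 'Oracle' is a hypothetical sample value.
SELECT REGEXP_SUBSTR('Oracle', 'ORACLE', 1, 1, 'ic') AS last_is_c, -- NULL   (case-sensitive wins)
       REGEXP_SUBSTR('Oracle', 'ORACLE', 1, 1, 'ci') AS last_is_i  -- Oracle (case-insensitive wins)
FROM dual;
```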
