오라클 정규식을 찾던중에 레퍼런스 페이지에서 발견~
10g부터 지원이 된다하니.... 9i까지는 삽질의 세계로~
Writing Better SQL Using Regular Expressions
By Alice Rischert
The regular expressions feature in Oracle Database 10g is a powerful tool for manipulating textual data
A new feature of Oracle Database 10g vastly improves your ability to search and manipulate character data. This feature, regular expressions, is a notation for describing textual patterns. It has long been available in many programming languages and a number of UNIX utilities.
Oracle's implementation of regular expressions comes in the form of various SQL functions and a WHERE clause operator. If you are unfamiliar with regular expressions, this article can give you a glimpse into this new and extremely powerful yet seemingly cryptic capability. Readers already familiar with regular expressions can gain an understanding of how to apply this functionality within the context of the Oracle SQL language.
What Is a Regular Expression?
A regular expression comprises one or more character literals and/or metacharacters. In its simplest format, a regular expression can consist only of character literals, such as the regular expression cat. It is read as the letter c followed by the letters a and t and this pattern matches strings such as cat, location, and catalog. Metacharacters provide algorithms that specify how Oracle should process the characters that make up a regular expression. When you understand the meaning of the various metacharacters, you will see that regular expressions are powerful for isolating and replacing specific textual data.
Data validation, identification of duplicate word occurrences, detection of extraneous white spaces, or parsing of strings are just some of the many uses of regular expressions. You can apply them in order to validate the formats of phone numbers, zip codes, email addresses, Social Security numbers, IP addresses, filenames and pathnames, and so on. Furthermore, you can locate patterns such as HTML tags, numbers, dates, or anything that fits any pattern within any textual data and replace them with other patterns.
Using Regular Expressions With Oracle Database 10g
To harness the power of regular expressions, you can exploit the newly introduced Oracle SQL REGEXP_LIKE operator and the REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE functions. You will see how this new functionality supplements the existing LIKE operator and the INSTR, SUBSTR, and REPLACE functions. In fact, they are similar to the existing operator and functions but now offer powerful pattern-matching capabilities. The searched data can be simple strings or large volumes of text stored in the database character columns. Regular expressions let you search, replace, and validate data in ways you have never thought of before, with a high degree of flexibility.
Basic Examples of Regular Expressions
Before using the new functionality, you need to understand the meaning of some of the metacharacters. The period (.) matches any character (except newline) in a regular expression. For example, the regular expression a.b matches a string containing the letter a, followed by any other single character (except newline), followed by the letter b. The strings axb, xaybx, and abba are matches because this pattern is buried in the string. If you want to exactly match a three-letter string in which the line begins with a and ends with b, you must anchor the regular expression. The caret (^) metacharacter indicates the start of a line, and the dollar symbol ($) designates the end of the line (see Table 1). Therefore, the regular expression ^a.b$ matches the strings aab, abb, or axb. To contrast this approach with the familiar pattern matching available with the LIKE operator, you can express such a pattern as a_b, where the underbar (_) is the one-character wildcard.
By default, an individual character or character list in a regular expression matches just once. To indicate multiple occurrences of a character in a regular expression, you apply a quantifier, also called a repetition operator. If you want a match that starts with the letter a and ends with the letter b, your regular expression looks like this: ^a.*b$. The * metacharacter repeats the preceding match any metacharacter (.) zero, one, or more times. The equivalent pattern with the LIKE operator is a%b, with the percent (%) indicating zero, one, or multiple occurrences of any character.
Table 2 shows the complete list of repetition operators. Notice that it contains specific repetition choices that allow more flexibility than the existing LIKE wildcard characters. If you use parentheses around an expression, effectively creating a subexpression, the subexpression can be repeated a certain number of times. For example, the regular expression b(an)*a matches ba, bana, banana, yourbananasplit, and so on.
Oracle's regular expression implementation supports the POSIX (Portable Operating System Interface) character classes, as listed in Table 3. This means that you can be very specific about the type of character you are looking for. Imagine writing a LIKE condition that looks only for nonalphabetic characters—the resulting WHERE clause could easily become very complex.
The POSIX character class must be enclosed by a character list indicated by square brackets ([]). For example, the regular expression [[:lower:]] matches a lowercase character and [[:lower:]]{5} matches five consecutive lowercase characters.
Besides the POSIX character classes, you can place individual characters in a character list. For example, the regular expression ^ab[cd]ef$ matches the strings abcef and abdef. Either c or d must be chosen.
Most metacharacters inside a character list are understood as literals, with the exception of the caret (^) and the hyphen (-). Regular expressions appear complicated because some metacharacters have multiple meanings, depending on the context. The ^ is just one such metacharacter. If you use it as the first character inside a character list, it means the negation of a character list. Therefore, [^[:digit:]] looks for a pattern consisting of any nondigit character whereas ^[[:digit:]] looks for matches that start with a digit. The hyphen (-) indicates a range; the regular expression [a-m] matches any of the letters a through letter m. But it means the literal hyphen if it is the first character in a character list such as in [-afg].
One of the previous examples introduced the use of parentheses to create a subexpression; they allow you to enter alternates separated by the vertical bar (|) alteration metacharacter.
For example, the regular expression t(a|e|i)n allows three possible alternate characters between the letters t and n. Matches include words such as tan, ten, tin, and Pakistan but not teen, mountain, or tune. Alternatively, the regular expression t(a|e|i)n can also be expressed as a character list, which is t[aei]n. Table 4 summarizes these metacharacters. Although more metacharacters exist, this brief overview is sufficient for understanding the regular expressions this article uses.
The REGEXP_LIKE Operator
The REGEXP_LIKE operator introduces you to regular expression functionality when applied within the Oracle database. Table 5 lists the syntax of REGEXP_LIKE.
The following SQL query's WHERE clause shows the REGEXP_LIKE operator, which searches the ZIP column for a pattern that satisfies the regular expression [^[:digit:]]. It will retrieve those rows in the ZIPCODE table for which the ZIP column values contain any character that is not a numeric digit.
SELECT zip
FROM zipcode
WHERE REGEXP_LIKE(zip, '[^[:digit:]]')
ZIP
-----
ab123
123xy
007ab
abcxy
This example of a regular expression consists only of metacharacters—more specifically the POSIX character class digit delimited by colons and square brackets. The second set of brackets (as in [^[:digit:]]) encloses a character class list. As previously mentioned, this is required because you can use POSIX character classes only for constructing a character list.
The REGEXP_INSTR Function
This function returns the starting position of a pattern, so it works much like the familiar INSTR function. The syntax of the new REGEXP_INSTR function is shown in Table 6. The main difference between the two functions is that REGEXP_INSTR lets you specify a pattern instead of a specific search string; thus providing greater versatility. The next example uses REGEXP_INSTR to return the starting position of the five-digit zip-code pattern within the string Joe Smith, 10045 Berry Lane, San Joseph, CA 91234. If the regular expression is written as [[:digit:]]{5}, you will get the house number's starting position instead of the zip code's, because 10045 is the first occurrence of five consecutive digits. Therefore, you must anchor the expression to the end of the line, as indicated with the $ metacharacter, and the function will display the starting position of the zip code regardless of the number of digits for the house number.
SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234',
'[[:digit:]]{5}$')
AS rx_instr
FROM dual
RX_INSTR
----------
45
Writing More Complex Patterns
Let's expand on the zip code pattern of the previous example to include an optional four digits. Your pattern may now look like this: [[:digit:]]{5}(-[[:digit:]]{4})?$. If your source string ends in either the 5-digit zip code or the 5-digit + 4 zip-code format, you'll be able to show the pattern's starting position.
SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234',
' [[:digit:]]{5}(-[[:digit:]]{4})?$')
AS starts_at
FROM dual
STARTS_AT
----------
44
In this example the parenthesized subexpression (-[[:digit:]]{4}) is repeated zero or one times, as indicated by the ? repetition operator. Again, attempting to use the traditional SQL functions to accomplish the same result poses a challenge even to SQL experts. To better explain the various components of the regular expression example, Table 7 contains a description of the individual literals and metacharacters.
The REGEXP_SUBSTRFunction
The REGEXP_SUBSTR function, much like the SUBSTR function, extracts part of a string. Table 8 displays the syntax of the new function. In the following example, the string that matches the pattern , [^,]*, is returned. The regular expression searches for a comma followed by a space; then zero or more characters that are not commas, as indicated by [^,]*; and lastly looks for another comma. The pattern will look somewhat similar to a comma-separated values string.
SELECT REGEXP_SUBSTR('first field, second field , third field',
', [^,]*,')
FROM dual
REGEXP_SUBSTR('FIR
------------------
, second field ,
The REGEXP_REPLACE Function
Let's first look at the traditional REPLACE SQL function, which substitutes one string with another. Assume your data has extraneous spaces in the text and you would like to replace them with a single space. With the REPLACE function, you need to list exactly how many spaces you want to replace. However, the number of extra spaces may not be the same everywhere in the text. The next example has three spaces between Joe and Smith. The REPLACE function's parameter specifies that two spaces should be replaced with one space. In this case, the result leaves an extra space where there were three spaces in the original string between Joe and Smith.
SELECT REPLACE('Joe Smith',' ', ' ')
AS replace
FROM dual
REPLACE
---------
Joe Smith
The REGEXP_REPLACE function takes the substitution a step further; the syntax is listed in Table 9. The following query replaces any two or more spaces with a single space. The ( ) subexpression contains a single space, which can be repeated two or more times, as indicated by {2,}.
SELECT REGEXP_REPLACE('Joe Smith',
'( ){2,}', ' ')
AS RX_REPLACE
FROM dual
RX_REPLACE
----------
Joe Smith
|
|
'Source Storage > DataBase' 카테고리의 다른 글
Oracle Table Partitioning (0) | 2008.04.22 |
---|---|
ORACLE HINT RULE (0) | 2008.04.04 |
ORACLE TABLE INDEX RULE (0) | 2008.03.27 |
SQL Function Reference: Oracle vs. SQL Server (0) | 2007.01.16 |
SQL Server에서 사용할 수 있는 서버측 페이징기법 (0) | 2005.10.14 |