MySQL offers the ability to use regular expressions to perform complex searches against your data. A regular expression is a tool that provides for a concise and flexible way to identify strings of text based on user-defined patterns.
This article will discuss the MySQL regular expression operators, review their use and syntax, and identify the constructs and special characters that can be used in a MySQL regular expression, as well as provide a few examples of their use.
MySQL Regular Expression Operators
The following operators are used in MySQL to perform regular expression operations. These are used in a WHERE clause similar to the well-known and often used LIKE operator.
- REGEXP: The pattern matching operator for using regular expressions.
- NOT REGEXP: The negation of the REGEXP operator.
- RLIKE: A synonym for the REGEXP operator.
MySQL Regular Expression Syntax
The basic syntax used for MySQL regular expression operations is:
-- For the REGEXP Operator SELECT {COLUMN_NAME} FROM {TABLE_NAME} WHERE {COLUMN_NAME} REGEXP '{REGEXP_PATTERN}'; -- For the NOT REGEXP Operator SELECT {COLUMN_NAME} FROM {TABLE_NAME} WHERE {COLUMN_NAME} NOT REGEXP '{REGEXP_PATTERN}'; -- For the RLIKE Alias Operator SELECT {COLUMN_NAME} FROM {TABLE_NAME} WHERE {COLUMN_NAME} RLIKE '{REGEXP_PATTERN}';
To provide more detailed, yet simple, example of a MySQL regular expression operation, take the following statement. It will retrieve all the columns of each record in the table PRICE where the PRICELIST_ID matches the pattern specified (starts with the numeric range 0-9 occurring one or more times, followed by an ‘_’ (underscore), and then the character sequence ‘USD’.
SELECT * FROM PRICE WHERE PRICELIST_ID REGEXP '^[0-9]+_USD';
Another example of a MySQL regular expression operation, can be shown in the following statement. It will retrieve all columns of each record from the PRICE table where the PRICE_ID matches the pattern specified (starts with an O, followed by and ‘_’ (underscore), then the numeric range 0-9 occurring one or more times, followed by and ‘_’ (underscore), then ending with either the character sequence USD, or BRA.
SELECT * FROM PRICE WHERE PRICE_ID REGEXP '^O_[0-9]+_[USD|BRA]';
MySQL REGEXP Constructs and Special Characters
A MySQL regular expression may use any of the following constructs and special characters to construct a pattern for use with the REGEXP operators. The construct or special character is shown, followed by a description of each and what operations in performs within the pattern for the regular expression.
- ^ : Match the beginning of a string.
- $ : Match the end of a string.
- . : Match any character (including carriage return and newline characters).
- a* : Match any sequence of zero or more a characters.
- a+ : Match any sequence of one or more a characters.
- a? : Match either zero or one a characters.
- de|abc : Match either of the character sequences, de or abc.
- (abc)* : Match zero or more instances of the character sequence abc.
- {1},{2,3} : Provides a more general way of writing regular expressions that match many occurences of the previous atom (or “piece”) of the pattern. i.e. a? can be written as a{0,1}.
- [a-dX],[^a-dX] : Matches any character that is (or is not, if ^ is used) either a, b, c, d, or X. A “-” character between two other characters forms a range that maches all characters from the first character to the second.
- [.characters.] : Within a bracket expression (using “[” and “]”), matches the sequence of characters of that collating element. i.e. the pattern [[.period.]] would match the ‘.’ (period) character.
- [=character_class=] : Within a bracket expression, represents an equivalence class. It matches all characters with the same collation value, including itself.
- [:character_class:] : Within a bracket expression, represents a character class that matches all characters belonging to that class. i.e. the pattern [[:alpha:]] would match against a string that is all aphabetic characters.
- [[:<:]],[[:>:]] : These markers stand for word boundaries, and as such they match the beginning and ending of words, respectively.
* NOTE: MySQL interprets the “\” (backslash) character as an escape character. If you choose to use the “\” character as part of your pattern in a regular expression it will need to escaped with another backslash “\\”.
For further documentation on the MySQL regular expression operator, please visit Regular Expressions in the MySQL Reference Manual (v5.1 currently linked).