RegEx

Regular expressions (RegEx) are common in many programming languages, and they are also available in ABAP.

When should you use a regular expression? When you need to search for or check complex patterns in string (or character) data. For example, to check whether an email address is plausibly valid, you can use the REGEX addition of the FIND or REPLACE statement.

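DATA lv_count TYPE i.
" FIND respects case by default, so only the letters A-Z are counted
FIND ALL OCCURRENCES OF
  REGEX '[A-Z]' IN 'ABCDE12345'
  MATCH COUNT lv_count.  " lv_count = 5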

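DATA lv_input TYPE string VALUE `ABC123`.
" Replace every uppercase letter in lv_input with the constant space
REPLACE ALL OCCURRENCES OF
  REGEX '[A-Z]' IN lv_input
  WITH space.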
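FIND and REPLACE can do more than count or overwrite matches. The minimal sketch below (assuming a release that supports string literals in backquotes) shows two related features: the SUBMATCHES addition copies the text of each parenthesized group into a variable, and $1, $2, ... in the replacement text of REPLACE refer to those same groups. The report name and the sample date are hypothetical.

```abap
REPORT zdemo_regex_submatches. " hypothetical report name

DATA: lv_date  TYPE string VALUE `Delivery date: 2024-05-17`,
      lv_year  TYPE string,
      lv_month TYPE string,
      lv_day   TYPE string.

" Each ( ) group is copied into the next SUBMATCHES target, left to right
FIND REGEX '(\d{4})-(\d{2})-(\d{2})' IN lv_date
  SUBMATCHES lv_year lv_month lv_day.
" lv_year = `2024`, lv_month = `05`, lv_day = `17`

" $3, $2 and $1 insert the captured groups: reorder the ISO date to DD.MM.YYYY
REPLACE FIRST OCCURRENCE OF REGEX '(\d{4})-(\d{2})-(\d{2})'
  IN lv_date WITH '$3.$2.$1'.
" lv_date = `Delivery date: 17.05.2024`

WRITE: / lv_year, lv_month, lv_day,
       / lv_date.
```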

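A commonly used RegEx for a basic email plausibility check is ^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$. It is not a full validation: it only accepts lowercase letters (combine it with IGNORING CASE) and requires a top-level domain of two to four letters, which means an address such as info@example.museum is rejected.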
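Putting the email pattern to work with FIND could look like the minimal sketch below. IGNORING CASE compensates for the lowercase-only character sets, and sy-subrc tells you whether the pattern matched (0) or not (4). The report name and the sample address are hypothetical.

```abap
REPORT zdemo_regex_email. " hypothetical report name

DATA: lv_email TYPE string VALUE `John.Doe@Example.com`,
      lv_rc    TYPE i.

" ^ and $ force the pattern to cover the whole address, not just a part of it
FIND REGEX '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$'
  IN lv_email IGNORING CASE.
lv_rc = sy-subrc. " 0 = plausible address, 4 = pattern not found

IF lv_rc = 0.
  WRITE: / lv_email, 'looks like a valid email address'.
ELSE.
  WRITE: / lv_email, 'does not look like a valid email address'.
ENDIF.
```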

The REPLACE statement also supports a table variant (IN TABLE) that processes every line of an internal table:

REPLACE ALL OCCURRENCES OF REGEX '\b(DM)\b' 
  IN TABLE itab WITH 'EUR' 
  RESPECTING CASE.

This replaces every standalone occurrence of the currency code DM with EUR in all lines of itab (example taken from the SAP help text on REPLACE).
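A self-contained version of the table example could look like the sketch below; the report name and table contents are hypothetical. It adds the REPLACEMENT COUNT addition to report how many replacements were made, and shows that \b (word boundary) leaves the DM inside ADMIN untouched.

```abap
REPORT zdemo_regex_table. " hypothetical report name

DATA: itab    TYPE STANDARD TABLE OF string,
      lv_line TYPE string,
      lv_cnt  TYPE i.

lv_line = `Total: 250 DM`.
APPEND lv_line TO itab.
lv_line = `ADMIN fee: 5 DM`.
APPEND lv_line TO itab.

" \b is a word boundary: DM inside ADMIN is not a separate word, so it stays
REPLACE ALL OCCURRENCES OF REGEX '\b(DM)\b'
  IN TABLE itab WITH 'EUR'
  RESPECTING CASE
  REPLACEMENT COUNT lv_cnt.
" lv_cnt = 2; itab now holds `Total: 250 EUR` and `ADMIN fee: 5 EUR`

LOOP AT itab INTO lv_line.
  WRITE / lv_line.
ENDLOOP.
```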

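Besides the REGEX addition of FIND and REPLACE, regular expressions are also available through the classes CL_ABAP_REGEX (which represents a pattern) and CL_ABAP_MATCHER (which applies a pattern to a text). These offer more pattern-processing power.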
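A minimal sketch of the class-based approach, assuming a release where abap_true is available without a TYPE-POOLS abap statement (on older releases, add that statement). CL_ABAP_REGEX holds the pattern, CREATE_MATCHER binds it to a text, and FIND_NEXT steps through the matches one at a time. The report name and the sample text are hypothetical.

```abap
REPORT zdemo_regex_matcher. " hypothetical report name

DATA: lo_regex   TYPE REF TO cl_abap_regex,
      lo_matcher TYPE REF TO cl_abap_matcher,
      lv_text    TYPE string VALUE `Order 4711 shipped on 2024-05-17`,
      lv_match   TYPE string,
      lv_count   TYPE i.

" The regex object represents the pattern \d+ (one or more digits)
CREATE OBJECT lo_regex
  EXPORTING
    pattern = '\d+'.

" A matcher applies the pattern to one specific text
lo_matcher = lo_regex->create_matcher( text = lv_text ).

" FIND_NEXT returns abap_true as long as another match is found
WHILE lo_matcher->find_next( ) = abap_true.
  lv_match = lo_matcher->get_submatch( 0 ). " index 0 = the whole match
  WRITE / lv_match.                         " 4711, 2024, 05, 17
  lv_count = lv_count + 1.
ENDWHILE.
```

Because the regex object is separate from the matcher, the same pattern object can be reused for many texts. For one-off checks, the REGEX addition of FIND is usually simpler.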

Use report DEMO_REGEX_TOY to experiment with regular expressions. It not only demonstrates how regular expressions work, but also lets you build and test your own.

Regular expression options (metacharacters) explained

Regular expressions offer a great variety of options. An overview of the most common metacharacters:

Metacharacter	Description
`.`	Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, `a.c` matches "abc", etc., but `[a.c]` matches only "a", ".", or "c".
`[ ]`	A bracket expression. Matches a single character that is contained within the brackets. For example, `[abc]` matches "a", "b", or "c". `[a-z]` specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: `[abcx-z]` matches "a", "b", "c", "x", "y", or "z", as does `[a-cx-z]`. The `-` character is treated as a literal character if it is the last or the first (after the `^`, if present) character within the brackets: `[abc-]`, `[-abc]`. Note that backslash escapes are not allowed. The `]` character can be included in a bracket expression if it is the first (after the `^`) character: `[]abc]`.
`[^ ]`	Matches a single character that is not contained within the brackets. For example, `[^abc]` matches any character other than "a", "b", or "c". `[^a-z]` matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
`^`	Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
`$`	Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
`( )`	Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, `\n`). A marked subexpression is also called a block or capturing group. BRE mode requires `\( \)`.
`\n`	Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups.
`*`	Matches the preceding element zero or more times. For example, `ab*c` matches "ac", "abc", "abbbc", etc. `[xyz]*` matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. `(ab)*` matches "", "ab", "abab", "ababab", and so on.
`?`	Matches the preceding element zero or one time. For example, `ab?c` matches only "ac" or "abc".
`+`	Matches the preceding element one or more times. For example, `ab+c` matches "abc", "abbc", "abbbc", and so on, but not "ac".
`|`	The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, `abc|def` matches "abc" or "def".
`{m,n}`	Matches the preceding element at least m and not more than n times. For example, `a{3,5}` matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regular expressions. BRE mode requires `\{m,n\}`.
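Several of these metacharacters combined in ABAP, as a minimal sketch with hypothetical sample values: a capturing group with a backreference finds doubled words, while anchors, alternation and bounded repetition check the format of complete strings. ABAP uses the extended syntax, so groups, `|` and `{m,n}` are written without backslashes.

```abap
REPORT zdemo_regex_metachars. " hypothetical report name

DATA: lv_text      TYPE string VALUE `the the cat sat on on the mat`,
      lv_amount    TYPE string VALUE `USD 12500`,
      lv_code      TYPE string VALUE `AB-1234`,
      lv_count     TYPE i,
      lv_rc_amount TYPE i,
      lv_rc_code   TYPE i.

" ( ) captures a word and \1 requires the same word again: finds doubled words
FIND ALL OCCURRENCES OF REGEX '\b(\w+) \1\b' IN lv_text
  MATCH COUNT lv_count.
" lv_count = 2 ("the the" and "on on")

" ^ and $ anchor the whole string, | picks one alternative, {1,6} allows 1 to 6 digits
FIND REGEX '^(EUR|USD) [0-9]{1,6}$' IN lv_amount.
lv_rc_amount = sy-subrc. " 0 = pattern found

" [A-Z]+ needs at least one capital letter, ? makes the hyphen optional
FIND REGEX '^[A-Z]+-?[0-9]{4}$' IN lv_code.
lv_rc_code = sy-subrc.   " 0 = pattern found

WRITE: / lv_count, lv_rc_amount, lv_rc_code.
```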

Note that the source code search report RS_ABAP_SOURCE_SCAN also supports regular expressions, so you can search ABAP code for just about any pattern. Very useful!
