reMatch Function

Purpose

Searches a string expression or a string array for a substring that matches a regular expression pattern.

Syntax

n = reMatch(sexp, pattern [, hash[] | address% ])

n = reMatch(array$(), pattern, from, to [, hash[]])

sexp, pattern: string expression
address, from, to: iexp
hash[]: Hash String or Hash Int
array$(): string array
n: iexp

Description

In its simplest form, reMatch searches for a substring within a string, like InStr(). The return value n is the position of the first occurrence of pattern within sexp; n = 0 when no match is found. With regular expression patterns, however, reMatch can locate much more. For instance, the following pattern matches an A followed by a b or d, then an e and an optional r, and finally a period, comma, or space, or the end of the line:

"A[bd]er?([., ]|$)"
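To illustrate the return value, the sketch below uses Python's `re` module as a stand-in for the GFA-BASIC 32 engine. This is an assumption: the two syntaxes overlap for the constructs used here but are not identical. The helper `rematch_pos` is hypothetical and only mimics the documented behavior: a 1-based position of the first match, or 0 when there is none.

```python
import re

# Python's re module is an assumed stand-in for GFA-BASIC 32's engine;
# rematch_pos is a hypothetical helper, not part of GFA-BASIC 32.
def rematch_pos(s: str, pattern: str) -> int:
    """1-based position of the first match, or 0 if there is none (like InStr)."""
    m = re.search(pattern, s)
    return m.start() + 1 if m else 0

PATTERN = r"A[bd]er?([., ]|$)"

print(rematch_pos("hello world", "world"))     # plain substring: 7
print(rematch_pos("Aber, sagte er", PATTERN))  # "Aber," at position 1
print(rematch_pos("The Abe.", PATTERN))        # "Abe." at position 5
print(rematch_pos("Ader", PATTERN))            # "Ader" + end of input: 1
print(rematch_pos("Adder", PATTERN))           # no match: 0
```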

Special characters and sequences are used in writing patterns for regular expressions. The following table describes these characters and includes short examples showing how the characters are used.

Character Description
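\ Escapes the next character. Before a character that has a special meaning in a pattern, it makes that character literal: \. a period; \\ a backslash; \* star; \+ plus; \[; \]; \(; \); \^; \$. Before certain letters, it introduces a special sequence such as \d or \w (see below).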
^ Matches the beginning of input or line.
$ Matches the end of input or line.
* Matches the preceding character zero or more times. "zo*" matches "z" and "zoo".
+ Matches the preceding character one or more times. "zo+" matches "zoo" but not "z".
? Matches the preceding character zero or one time. "a?ve?" matches the "ve" in "never".
. Matches any single character except a newline character.
(pattern) A group. To match parentheses characters ( ), use "\(" or "\)".
x|y Matches either x or y. "z|food?" matches "z", "foo", or "food"; in "zoo" it finds only the "z".
[xyz] A character set. Matches any one of the enclosed characters. "[abc]" matches the "a" in "plain". The special characters (, ), *, ., and $ have no special meaning inside a set; neither does ^, unless it is the first character, where it negates the set.
[^xyz] A negative character set. Matches any character not enclosed. "[^abc]" matches the "p" in "plain."
\A Matches the beginning of input or line, same as ^.
\Z Matches the end of input or line, same as $.
\e Matches an escape character (Esc).
\cX Matches a control character; for example, \cA matches Ctrl+A.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a nondigit character. Equivalent to [^0-9].
\f Matches a form-feed character.
\n Matches a linefeed character.
\r Matches a carriage return character.
\s Matches any white space including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v]
\S Matches any nonwhite space character. Equivalent to [^ \f\n\r\t\v]
\t Matches a tab character.
\v Matches a vertical tab character.
\w Matches any word character including underscore. Equivalent to [A-Za-z0-9_].
\W Matches any nonword character. Equivalent to [^A-Za-z0-9_].
\num Matches num, where num is a positive integer.
\xnn Matches the character with hexadecimal code nn, like \x1b (Esc).
\onn Matches the character with octal code nn, like \0033.
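The table's own examples can be checked against a Perl-style engine. The sketch below again assumes Python's `re` module as a stand-in; `re.search` returns the leftmost match, which is what these examples describe. The helper `first_match` is hypothetical.

```python
import re

# Python's re is an assumed stand-in; it returns the leftmost match,
# which is what the table's examples describe.
def first_match(pattern: str, text: str):
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"zo*", "z"))         # z
print(first_match(r"zo*", "zoo"))       # zoo
print(first_match(r"zo+", "z"))         # None
print(first_match(r"a?ve?", "never"))   # ve
print(first_match(r"z|food?", "zoo"))   # z
print(first_match(r"z|food?", "foo"))   # foo
print(first_match(r"[abc]", "plain"))   # a
print(first_match(r"[^abc]", "plain"))  # p
print(first_match(r"\(\d\)", "f(1)"))   # (1), escaped parentheses
```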


Some examples of character sets and alternation:

"[abc]" An a, b, or c.
"a|b|c" An a, b, or c.
"[^abc]" Any character other than a, b, or c.
"[A-F0-9a-f]" A hexadecimal digit (0 to 9, A to F, or a to f).
"[-A]" A minus sign or an A (a leading - inside a set is literal).
"[\dA-Fa-f]" Also a hexadecimal digit, using \d for 0 to 9.
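A character set always matches exactly one character. The sketch below, again assuming Python's `re` as a stand-in, checks that the two hexadecimal sets accept the same characters and that a leading minus inside a set is literal.

```python
import re
import string

# Python's re as an assumed stand-in for GFA-BASIC 32.
HEX1 = re.compile(r"[A-F0-9a-f]")
HEX2 = re.compile(r"[\dA-Fa-f]")

# Each set matches exactly one character; test every printable ASCII char.
accepted1 = {c for c in string.printable if HEX1.fullmatch(c)}
accepted2 = {c for c in string.printable if HEX2.fullmatch(c)}
print(accepted1 == accepted2 == set(string.hexdigits))  # True

# A leading minus inside a set is a literal character.
print(re.findall(r"[-A]", "A-B-C"))  # ['A', '-', '-']
```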

Combinations:

\d+ A number: at least one digit.
\w+ A word: a sequence of letters, digits, and _.
.* Any character sequence, including an empty one.
.+ Ditto, but with at least one character.
^a.*r\.$ A sentence starting with a and ending with r and a period.
[A-Z][a-z]* A capitalized word: an uppercase character followed by any number of lowercase characters.
^\w+\s+(\w+)\s The second word of a sentence (enclosed in the group).
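The combinations above can be tried out with the sketch below, which assumes Python's `re` as a stand-in. Python's `group(1)` is used only to show what the parenthesized group encloses; this page does not state whether reMatch reports group contents.

```python
import re

# Python's re as an assumed stand-in for GFA-BASIC 32.
print(re.search(r"\d+", "order 4711 shipped").group(0))         # 4711
print(re.search(r"\w+", "  snake_case2 rest").group(0))          # snake_case2
print(repr(re.search(r".*", "").group(0)))                       # '' (empty match)
print(re.search(r".+", ""))                                      # None
print(bool(re.search(r"^a.*r\.$", "a quick letter.")))           # True
print(re.search(r"[A-Z][a-z]*", "see Berlin soon").group(0))     # Berlin
print(re.search(r"^\w+\s+(\w+)\s", "The second word").group(1))  # second
```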

Special sequences:

(?b) Binary sort order: the range A-Z does not include umlauts or lowercase characters.
(?t) Text sort order: [A-B] also includes Ä and other accented A's (Á, À, Â, Å, ...), as well as lowercase characters.
(?bi) Binary sort order, ignoring case: uppercase and lowercase characters are both matched automatically.
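Python has no (?b) or (?t) switches, so the sketch below only approximates two of them under stated assumptions: Python's default range test compares code points, which resembles the binary sort order (Ä has code 0xC4 in both Windows-1252 and Unicode, outside A-Z), and its inline flag (?i) is treated as an analogue of (?bi). There is no stand-in for the text sort order.

```python
import re

# Python has no (?b)/(?t) switches; its default comparison works on code
# points, similar to the binary sort order, and (?i) is an assumed analogue of (?bi).
print(bool(re.fullmatch(r"[A-Z]", "Q")))      # True
print(bool(re.fullmatch(r"[A-Z]", "q")))      # False: lowercase is outside A-Z
print(bool(re.fullmatch(r"[A-Z]", "Ä")))      # False: Ä (0xC4) is outside A-Z
print(bool(re.fullmatch(r"(?i)[A-Z]", "q")))  # True: case is ignored
print(bool(re.fullmatch(r"(?i)[A-Z]", "ä")))  # False: still no umlauts
```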

n = reMatch(sexp, pattern, h[])

When h[] is a Hash String, the text of the first match is placed in h[1].

When h[] is a Hash Int, the position of the first match is placed in h[1] and its length in h[2].

n = reMatch(sexp, pattern, V:i%(0))

When the third parameter is the address of a 32-bit integer array (Dim i%(1)), the start position of the match is placed in i%(0) and its length in i%(1).
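The values stored by the Hash Int, Hash String, and address forms can be sketched as follows, using the same input as the Example section. This is a Python analogue under assumptions: `rematch_info` is a hypothetical helper, and returning (0, "", 0) when nothing matches is an assumption, since this page does not say what the hash or array contains in that case.

```python
import re

# Hypothetical analogue of the Hash Int / address forms: returns the
# 1-based position, the matched text (Hash String) and its length.
# Returning (0, "", 0) on failure is an assumption, not documented behavior.
def rematch_info(s: str, pattern: str):
    m = re.search(pattern, s)
    if m is None:
        return 0, "", 0
    return m.start() + 1, m.group(0), m.end() - m.start()

s = "zz 2a"
pos, text, length = rematch_info(s, r"[A-F0-9a-f]+")
print(pos, text, length)               # 4 2a 2
print(s[pos - 1 : pos - 1 + length])   # 2a, like Mid(s$, pos, length)
```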

n = reMatch(array$(), pattern, from, to [, hi[]])

Searches for pattern in the string array elements array$(from) to array$(to) and returns the index of the first element that contains a match.

When a Hash Int is passed as the fifth parameter, the indices of all elements that contain a match are added to the hash. This is similar to VB's Filter function, except that indices rather than strings are collected.
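The array form can be sketched as a filter that collects indices. This is a Python analogue only: `rematch_array` is hypothetical, 2**31 - 1 stands in for _maxInt, and clamping the upper bound to the last element is an assumption about how an oversized to value is treated.

```python
import re

# Hypothetical analogue of reMatch(array$(), pattern, from, to, hi[]):
# collects the indices of all elements in arr[first..last] that contain a match.
def rematch_array(arr, pattern, first, last):
    rx = re.compile(pattern)
    last = min(last, len(arr) - 1)  # assumed clamping, so a huge 'to' is safe
    return [i for i in range(first, last + 1) if rx.search(arr[i])]

a = ["zz", "zzz 3a ", "c = 0xaa"]
hits = rematch_array(a, r"[A-F0-9a-f]+", 0, 2**31 - 1)
print(hits)      # [1, 2]
print(hits[0])   # 1: the index the four-argument form would return
```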

Example

Find a hexadecimal value

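Debug.Show
Dim s$ = "zz 2a"
' Return value only: position of the first hex run ("2a")
Trace reMatch(s$, "[A-F0-9a-f]+")
' Hash Int: hi[% 1] = position, hi[% 2] = length
Dim hi As Hash Int
Trace reMatch(s$, "[A-F0-9a-f]+", hi[])
Debug.Print "hi[]-Found "; hi[% 1]; hi[% 2], Mid(s$, hi[% 1], hi[% 2])
' Hash String: hs[% 1] = matched text
Local hs As Hash String
Trace reMatch(s$, "[A-F0-9a-f]+", hs[])
Debug.Print "hs[]-Found", hs[% 1]
' Integer array passed by address: ii(0) = position, ii(1) = length
Dim ii(1) As Int
Trace reMatch(s$, "[A-F0-9a-f]+", V:ii(0))
Debug.Print "ii()-Found "; ii(0); ii(1), Mid(s$, ii(0), ii(1))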

Locate in an array

Debug.Show
Dim a$() : Array a$() = "zz" #10 "zzz 3a " #10 "c = 0xaa"
Dim i As Int, hi As Hash Int
' Collect the indices of all elements that contain a hex run
Trace reMatch(a$(), "[A-F0-9a-f]+", 0, _maxInt, hi[])
Debug.Print "a$()-Found at indices:"
For i = 1 To hi[%]
  Debug hi[% i]
Next i

Remarks

The regular expression syntax is closely modeled on Perl's. GFA-BASIC 32 does not support Perl's more exotic features, such as {n,m} repetition counts, (?#...) comments, and non-greedy quantifiers like *?. Unlike Perl, GFA-BASIC 32 allows 8-bit ANSI characters.

Search patterns are handled more simply internally than in Perl, and performance is slightly better as well.

The preMatch function converts a pattern into an internal format for faster execution, which makes regular expressions more efficient to use in loops.
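The idea behind preMatch, preparing a pattern once and reusing it inside a loop, can be illustrated with Python's `re.compile`. This is an analogue only; the calling convention of preMatch itself is not described on this page, and Python's `re` also caches recently used patterns internally, so the gain there is smaller.

```python
import re

# Analogue only: compile the pattern once, outside the loop, then reuse it.
number = re.compile(r"\d+")

lines = ["id=17", "no digits", "x=3, y=42"]
results = []
for line in lines:
    m = number.search(line)  # the compiled pattern is reused on each iteration
    results.append(m.group(0) if m else None)
print(results)  # ['17', None, '3']
```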

See Also

preMatch, reSub, reStop, Hash

{Created by Sjouke Hamstra; Last updated: 22/10/2014 by James Gaite}