Chapter 4. Regular Expressions, Classes, and Dynamic Evaluation and Execution

With the introduction of the VBScript 5.0 scripting engine, VBScript developers now have three very powerful techniques at their disposal which were previously unavailable with VBScript:

  • The availability of a Regular Expression object, RegExp. Regular expressions provide for advanced string matching and parsing. If you are new to regular expressions, you’ll soon realize their tremendous advantages and wonder how you lived without them!

  • The ability to create object-oriented code! VBScript now supports classes! I am very excited about this new feature, for it allows the creation of robust and reusable ASP pages. It’s also great for those working in large development teams where certain developers aren’t as experienced as others with ASP. The more experienced developers can create classes that encapsulate some of the more difficult functionality, and the more novice developers can simply use these classes to accomplish the common ASP tasks!

  • Dynamic evaluation and execution. Dynamic evaluation allows a code snippet contained in a string to be evaluated as though it had been entered directly by the programmer creating the script. Dynamic execution allows a code snippet contained in a string to be executed as though it had been entered directly by the programmer. With dynamic evaluation and execution, a script can achieve flexibility not previously available. We’ll discuss how to perform dynamic evaluations and executions in Section 4.3.

If you are running ASP 3.0, you already have the VBScript 5.0 scripting engine. If you are running an older version of ASP/IIS, be sure to download the latest Microsoft scripting engines from: http://msdn.microsoft.com/scripting/.

Using the RegExp Object

The RegExp object provides the VBScript engine with a means to perform regular expression pattern matching. We’ve discussed regular expressions previously: Chapter 2 covered using regular expressions in JScript and PerlScript.

RegExp’s Properties

Take a moment to look back at Chapter 2 to see JScript’s implementation of regular expressions. Note that the regular expression syntax contains a pattern and an optional switch. The switch can have one of three values:

i

Ignore case

g

Perform a global search for all occurrences of pattern

gi

Perform a global search for all occurrences of pattern, ignoring case

The RegExp object contains three properties that allow you to set the pattern and these switches for regular expression usage in VBScript. These three properties are Pattern, IgnoreCase, and Global. Pattern expects a string, and is the regular expression pattern to search for. Global is a Boolean value indicating whether the regular expression search should match all occurrences in a string or just the first one. If not specified, Global defaults to False. IgnoreCase is also a Boolean value, indicating whether or not a regular expression search is case-sensitive. By default, IgnoreCase is set to False.
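As a rough sketch of how these properties map onto the JScript switches, the following snippet configures a RegExp object so that it behaves like a JScript pattern written with the gi switch. The pattern “asp” is only a placeholder chosen for illustration.

```vbscript
<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp
   Set objRegExp = New RegExp    'Global and IgnoreCase both start out False

   objRegExp.Pattern = "asp"     'The pattern to search for (placeholder)
   objRegExp.IgnoreCase = True   'Plays the role of the "i" switch
   objRegExp.Global = True       'Plays the role of the "g" switch

   'Echo the settings back to the browser
   Response.Write "Pattern = " & objRegExp.Pattern & "<BR>"
   Response.Write "IgnoreCase = " & objRegExp.IgnoreCase & "<BR>"
   Response.Write "Global = " & objRegExp.Global & "<BR>"

   Set objRegExp = Nothing
%>
```

Leaving out the two Boolean assignments would give a case-sensitive search that stops at the first match, since both properties default to False.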

Legal Regular Expression Syntax

The Pattern property contains the regular expression. A regular expression pattern is not restricted to just simple strings. The pattern can also contain special characters, which allow for much more sophisticated searches. Table 4.1 contains a listing of these special characters and their meanings.

Table 4-1. Special Characters in Regular Expression Patterns

Symbol

Description

Any alphanumeric character

Matches the alphanumeric character(s) literally.

\

Indicates that the following character is a special character or a literal. For example, a pattern containing “b” matches the character “b,” while “\b” matches a word boundary.

^

Matches the beginning of a string. For example, “^A” would match only the first “A” in “ASP is Awesome.”

$

Matches only the end of a string. For example, “d$” would match only the final “d” in the string “Todd is mad” (with no trailing period).

\b

Matches a word boundary. A word boundary exists between two characters, where one of the characters is a word character and the other is not. Furthermore, the beginning and end of a string are considered word boundaries. For example, if you searched for “\bscience\b” in “science has no conscience,” only the first word of the string would be returned. The “science” in “conscience” would not be matched since “science” is not preceded by a word boundary.

\B

The opposite of \b. Matches any position that is not a word boundary.

[abc...]

Matches any single character that exists between the brackets. For example, “[aeiou]” would match the first vowel found in a string. You can also use the hyphen for a range of characters. “[a-m]” would match the first occurrence of a character belonging in the first half of the alphabet.

[^abc...]

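Matches any single character not between the brackets. For example, “[^aeiou]” matches the first character in a string that is not one of the listed vowels. That character may be a consonant, but it may also be a digit, a space, or a punctuation mark. (You can also use the hyphen to represent a range of characters.)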

\w

Matches any word character. A word character is an alphanumeric character or an underscore. Functionally identical to [A-Za-z0-9_].

\W

Matches any nonword character.

\d

Matches any digit. Functionally identical to [0-9].

\D

Matches any nondigit.

\s

Matches any space character (including a space, a newline character, a carriage return, or a tab).

\S

Matches any nonspace character.

.

Matches any character other than \n: functionally identical to [^\n].

\n

Matches a newline character.

\r

Matches a carriage return.

\t

Matches a tab.

{n}

Matches exactly n occurrences of a regular expression. For example, “\w{10}” matches 10 consecutive word characters.

{n,}

Matches n or more occurrences of a regular expression. For example, “\d{2,}” matches two or more consecutive digits.

{n,m}

Matches between n and m occurrences of a regular expression. For example, “\w{2,4}” matches either two, three, or four consecutive word characters.

?

Matches zero or one occurrences; functionally identical to {0,1}. For example, “\w\d?” matches a word character followed by zero or one digits.

*

Matches zero or more occurrences; functionally identical to {0,}.

+

Matches one or more occurrences; functionally identical to {1,}.

( )

Used to group a series of symbols. For example, “xyz?” matches “xy” and “xyz,” while “x(yz)?” matches “x” and “xyz.”

|

Matches either one of two groups. For example, “(Scott)|(James)” matches “Scott” or “James.”

Tip

To search for a literal that is also used as a special symbol, precede the literal with a backslash. For example, to match a question mark character in a string, use the pattern “\?”, since the question mark is a special symbol for regular expressions.

A regular expression can contain any number of the special symbols and literals listed in Table 4.1. Regular expressions provide a powerful tool for validating input. For example, imagine that a user will enter his social security number. The expected format is ###-##-####. This can be validated with the regular expression:

\d{3}-\d{2}-\d{4}

which checks for three digits, followed by a dash, followed by two digits, followed by a dash, followed by four digits. A regular expression that could validate a phone number in either ###-###-#### or (###) ###-#### format could be:

(\d{3}-\d{3}-\d{4})|(\(\d{3}\) \d{3}-\d{4})

Note that when matching a literal left or right parenthesis, the parenthesis needs to be preceded by a backslash.

RegExp’s Methods

The RegExp object contains three methods: Test, Replace, and Execute. The first method, Test, accepts one parameter, which is the string to apply the regular expression to. If a match is found, Test returns True; otherwise, it returns False. For example, if you asked a user to input her name, you might want to make sure that only letters, apostrophes, spaces, and hyphens exist within the name. As Example 4.1 illustrates, you can use the Test method to quickly determine whether the name entered by the user contains any characters outside the set of accepted characters.

Example 4-1. Using the Test Method to Validate a String

<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp
   Set objRegExp = New RegExp   'Create a RegExp object instance

   'Set the pattern (match any character that is NOT a letter, apostrophe, space, or hyphen)
   objRegExp.Pattern = "[^a-z' \-]"

   'Ignore case
   objRegExp.IgnoreCase = True

   'See if the regular expression is found in strName
   Dim strName
   strName = "Scott Mitchell"
   If objRegExp.Test(strName) then
      Response.Write strName & " is not a valid name!<BR>"
   Else
      Response.Write strName & " is a valid name!<BR>"
   End If

   strName = "Roger O'Grady"
   If objRegExp.Test(strName) then
      Response.Write strName & " is not a valid name!<BR>"
   Else
      Response.Write strName & " is a valid name!<BR>"
   End If

   strName = "Tim 7sten"
   If objRegExp.Test(strName) then
      Response.Write strName & " is not a valid name!<BR>"
   Else
      Response.Write strName & " is a valid name!<BR>"
   End If
%>

We set the RegExp’s Pattern property to search for the occurrence of a character that is not a letter, not an apostrophe, not a space, and not a hyphen. We want to ignore case, so we set the IgnoreCase property to True. Then we use the Test method to see if any of the illegal characters exist within the string strName. If Test returns True, then the name is invalid; if Test returns False, then the name is valid. (In the above example, the first two names are valid while the last is invalid.)
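The same Test technique can be applied to the social security number and phone number patterns shown earlier in this section. One caveat: Test returns True if the pattern is found anywhere in the string, so a format check generally needs the ^ and $ symbols to force the pattern to span the entire input. The sketch below is one way to do that; the MatchesWhole function and the two constant names are hypothetical helpers introduced only for this illustration.

```vbscript
<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   'Hypothetical helper: returns True only if the ENTIRE string
   'strInput matches strPattern
   Function MatchesWhole(strPattern, strInput)
      Dim objRegExp
      Set objRegExp = New RegExp

      'Wrap the pattern in ^( ... )$ so that the anchors apply to the
      'whole pattern, even when it contains the | symbol
      objRegExp.Pattern = "^(" & strPattern & ")$"

      MatchesWhole = objRegExp.Test(strInput)
      Set objRegExp = Nothing
   End Function

   'The two patterns discussed earlier in this section
   Const SSN_PATTERN = "\d{3}-\d{2}-\d{4}"
   Const PHONE_PATTERN = "(\d{3}-\d{3}-\d{4})|(\(\d{3}\) \d{3}-\d{4})"

   'Well-formed input: expected to be accepted
   Response.Write MatchesWhole(SSN_PATTERN, "123-45-6789") & "<BR>"
   Response.Write MatchesWhole(PHONE_PATTERN, "555-555-1212") & "<BR>"
   Response.Write MatchesWhole(PHONE_PATTERN, "(555) 555-1212") & "<BR>"

   'Extra characters around a valid number: expected to be rejected,
   'although the unanchored pattern alone would have found a match
   Response.Write MatchesWhole(SSN_PATTERN, "x123-45-6789y") & "<BR>"
%>
```

The extra pair of parentheses matters: without it, “^” would bind only to the first alternative of the phone pattern and “$” only to the second.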

The Replace method expects two string parameters. The first parameter is the string to which to apply the regular expression; the second contains the text used to replace each matching occurrence. Replace returns a new string reflecting the appropriate substitutions. For example, if we wanted to replace all instances of the acronym “asp” with “ASP,” we could use the Replace method as shown in Example 4.2.

Example 4-2. Using the Replace Method

<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp
   Set objRegExp = New RegExp   'Create a RegExp object instance

   'Set the pattern (match "asp" only when it appears as a whole word)
   objRegExp.Pattern = "\basp\b"

   objRegExp.IgnoreCase = True  'Ignore case
   objRegExp.Global = True      'Make all possible changes

   Dim strSentence
   strSentence = "Asp is a fun language.  I aspire to learn asp!"

   Response.Write "<B>Before Replace</B><BR>" & strSentence
   Response.Write "<P><B>After Replace</B><BR> "

   'Use the replace method!
   strSentence = objRegExp.Replace(strSentence, "ASP")

   Response.Write strSentence
%>

Note that the regular expression used did not simply search for “asp,” but rather for “\basp\b.” Recall that the “\b” special symbol searches for a word boundary. If the regular expression contained just “asp,” the “asp” in “aspire” would also have been capitalized.
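To see the difference for yourself, a short comparison like the following can be run against the same sentence. It is only a sketch that reuses the settings from Example 4.2 and swaps the pattern between the two variants.

```vbscript
<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp, strSentence
   Set objRegExp = New RegExp
   objRegExp.IgnoreCase = True
   objRegExp.Global = True

   strSentence = "Asp is a fun language.  I aspire to learn asp!"

   'Without word boundaries, the "asp" inside "aspire" is also replaced:
   '   ASP is a fun language.  I ASPire to learn ASP!
   objRegExp.Pattern = "asp"
   Response.Write objRegExp.Replace(strSentence, "ASP") & "<BR>"

   'With word boundaries, only the standalone word is replaced:
   '   ASP is a fun language.  I aspire to learn ASP!
   objRegExp.Pattern = "\basp\b"
   Response.Write objRegExp.Replace(strSentence, "ASP") & "<BR>"

   Set objRegExp = Nothing
%>
```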

The third and final method of the RegExp object is Execute. Like the Test method, Execute takes one parameter: the string to which to apply the regular expression. The Execute method returns a Matches collection, which contains a Match object for each successful regular expression match.

The Match object contains three read-only properties: FirstIndex, Length, and Value. FirstIndex contains the position in the string where the match occurred. Unfortunately, FirstIndex is zero-based; VBScript, as you probably know, indexes its strings starting at one. In other words, you have to add one to the value of FirstIndex to actually identify the starting position of the substring found by the Match object. As its name suggests, Length contains the total length of the matched string. The final property, Value, contains the matched text.
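The off-by-one adjustment is easiest to see when a Match object’s properties are fed to one of VBScript’s own string functions. The following sketch uses Mid, which is one-based, to pull each match back out of the original string; the sentence and pattern are just sample values.

```vbscript
<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp, objMatches, objMatch, strSentence, strExtracted
   Set objRegExp = New RegExp
   objRegExp.Pattern = "\basp\b"
   objRegExp.IgnoreCase = True
   objRegExp.Global = True

   strSentence = "Asp is a fun language.  I aspire to learn asp!"
   Set objMatches = objRegExp.Execute(strSentence)

   For Each objMatch In objMatches
      'FirstIndex is zero-based but Mid is one-based, so add one
      strExtracted = Mid(strSentence, objMatch.FirstIndex + 1, objMatch.Length)

      'The extracted substring should be the same text as the Value property
      Response.Write "Position " & (objMatch.FirstIndex + 1) & ": " & _
                     strExtracted & " (Value = " & objMatch.Value & ")<BR>"
   Next

   Set objRegExp = Nothing
%>
```

Forgetting the + 1 would shift every extracted substring one character to the left, and for a match at the very start of the string Mid would raise an error because its start position must be at least one.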

In Example 4.2, we used the Replace method to find all the instances of “asp” and replace them with “ASP.” We can use the Execute method to grab all of the matches. The code in Example 4.3 uses the Execute method to return a Matches collection; next, the script iterates through the Matches collection, outputting the properties of each of the individual Match objects.

Example 4-3. Using the Execute Method and the Matches Collection

<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim objRegExp
   Set objRegExp = New RegExp   'Create a RegExp object instance

   'Set the pattern (match "asp" only when it appears as a whole word)
   objRegExp.Pattern = "\basp\b"

   objRegExp.IgnoreCase = True  'Ignore case
   objRegExp.Global = True      'Find all matches, not just the first

   Dim strSentence
   strSentence = "Asp is a fun language.  I aspire to learn asp!"

   'Use the execute method to obtain a matches collection
   Dim objMatches, objMatch
   Set objMatches = objRegExp.Execute(strSentence)

   Response.Write "There were " & objMatches.Count & " matches.<BR>"
   Response.Write "<P><HR><P>"

   'Loop through each Match object in the Matches collection
   Dim iCount
   iCount = 1
   For Each objMatch in objMatches
     Response.Write "<B>Match " & iCount & "</B><BR>"
     Response.Write "FirstIndex = " & objMatch.FirstIndex & "<BR>"
     Response.Write "Length = " & objMatch.Length & "<BR>"
     Response.Write "Value = " & objMatch.Value & "<BR>"
     Response.Write "<P><HR><P>"

     iCount = iCount + 1
   Next

   Set objRegExp = Nothing
%>

Since the Execute method returns an object, be sure to use the Set keyword when assigning the Matches collection returned by Execute to a variable. Also, since the Matches collection is, after all, a collection, you have access to the basic methods and properties of a collection, such as the Count property. Furthermore, you can completely iterate through the Matches collection using a For Each ... Next loop.

The code in Example 4.3 will generate the output shown in Figure 4.1. Note that the first instance of “Asp” is at the beginning of the string strSentence, so the Match object reports its FirstIndex property as zero instead of the more VBScript-friendly one.

Figure 4-1. The web page produced by Example 4.3

Personally, I find regular expressions to be neat and fun to use, partially because no other method allows for such powerful string parsing with such convoluted code. For example, what does the following regular expression match?

^\s*((\$\s?)|(£\s?))?((\d+(\.(\d\d)?)?)|(\.\d\d))\s*(UK|GBP|GB|USA|US|USD)?\s*$

(This regular expression is an example from Microsoft’s web site: http://msdn.microsoft.com/workshop/languages/clinic/scripting051099.asp.)

I think you’ll find regular expressions are much easier to build than to analyze after the fact. If you know what patterns you wish to search for, you will most likely be able to create a valid regular expression. However, when presented with an unwieldy regular expression, I find it a bit more difficult to work backwards and predict what type of data the regular expression was intended to validate.
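One way to keep a long pattern readable is to assemble it from smaller, named pieces. The sketch below builds a currency pattern along the lines of the one shown above. The variable names are hypothetical, and the comments reflect my own reading of what each piece matches rather than any official description.

```vbscript
<% @LANGUAGE = "VBScript" %>
<% Option Explicit %>
<%
   Dim strSymbol, strAmount, strCode, objRegExp

   'An optional currency symbol ($ or the pound sign), itself optionally
   'followed by one whitespace character. ChrW(163) is the pound sign;
   'it is used here so the script does not depend on the file's encoding.
   strSymbol = "((\$\s?)|(" & ChrW(163) & "\s?))?"

   'Either digits, optionally followed by a decimal point and,
   'optionally, exactly two more digits; or a decimal point
   'followed by exactly two digits
   strAmount = "((\d+(\.(\d\d)?)?)|(\.\d\d))"

   'An optional country or currency code
   strCode = "(UK|GBP|GB|USA|US|USD)?"

   Set objRegExp = New RegExp
   objRegExp.Pattern = "^\s*" & strSymbol & strAmount & "\s*" & strCode & "\s*$"

   Response.Write objRegExp.Test("$ 12.50 USD") & "<BR>"   'Expected: True
   Response.Write objRegExp.Test(".99") & "<BR>"           'Expected: True
   Response.Write objRegExp.Test("12.5") & "<BR>"          'Expected: False (one decimal digit)

   Set objRegExp = Nothing
%>
```

Naming the pieces does not change what the pattern matches, but it gives the next reader a much better chance of working out what the pattern was meant to validate.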

That about wraps up using regular expressions in VBScript. While we looked at the code needed to perform a regular expression search in VBScript, we only touched upon how to effectively use regular expressions to achieve powerful string parsing. To truly master regular expressions, you’ll almost certainly need an entire book dedicated to the subject, such as O’Reilly’s Mastering Regular Expressions. You can also visit Microsoft’s scripting site (http://msdn.microsoft.com/scripting/) for more information on the RegExp object.
