Using Regular Expressions in LoadRunner

It has previously been shown how to enable regular expressions in LoadRunner. A big thanks to Charlie and Tim for getting this working, and to Dmitry for proposing the challenge.
In this post, I am going to demonstrate a practical use of bolting the regular expression engine on top of LoadRunner. After all, it is more effort than LB/RB, so why go to all the trouble? Hopefully the following example will demonstrate a scenario where regular expressions can be invaluable.

The first step is to bolt-on the regular expression engine.
If you have read any of the previous posts on RegEx in LoadRunner, you can skip this section, as Dmitry has explained this process in detail.
The following three files are required for enabling RegEx in a LoadRunner script.
  • pcre3.dll – The Perl Compatible Regular Expressions (PCRE) library
  • pcreposix.h – The header file for the POSIX-compatible API of PCRE
  • regex.h – The LoadRunner Regular Expression Library
These files can be embedded in your script, making the script more portable between your script development machine and the load generators. Right-clicking on the action pane will allow you to add files to your script.
After adding these files to your script, they should be listed in your action pane.
Next, you will need to comment out the stdlib include line in pcreposix.h:
//#include <stdlib.h>
Finally, we have to add regex.h and pcreposix.h header files to globals.h via the following lines.
#include "pcreposix.h"
#include "regex.h"
The following code is based on Tim Koopmans' functions, with one main change. By changing “REG_EXTENDED” to “REG_DOTALL”, the “.” character now matches everything, including newlines. This allows a single match to span several lines of HTML, as in the example later in this post.
Note that “REG_NEWLINE” is a related option but not an equivalent one: according to the PCRE documentation, the POSIX wrapper maps it to multiline mode, which changes where “^” and “$” match rather than what “.” matches. There is a whole kettle of fish here, and “REG_DOTALL” works for me. If you want to investigate it more, it's described in detail in the PCRE Documentation.
// regex.h
// PCRE Regular Expression Function Library

buffer() {
  // This will save a 2MB buffer of the response body when called
  web_reg_save_param("buffer", "LB=", "RB=", "Search=Noresource", LAST);
  return 0;
}

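match(const char *string, char *pattern, char *match, int matchnum) {
  // The match function will return 0 if match found
  //          1 if match not found
  //          2 if pattern incorrect (or matchnum out of range)
  // The matched text is saved to the parameter named by "match"

  int  status;
  int  eflag = 0;          // no special execution flags
  int  len;
  char buf[1024] = "";
  char out[1024] = "";

  regex_t re;
  regmatch_t pmatch[128];
  lr_load_dll("pcre3.dll");

  if (matchnum < 0 || matchnum >= 128) {
    lr_output_message("Match number %d out of range", matchnum);
    return 2;
  }

  // REG_DOTALL lets "." match newlines as well
  if ((status = regcomp(&re, pattern, REG_DOTALL)) != 0) {
    regerror(status, &re, buf, sizeof(buf));
    lr_output_message("Match PCRE Exit 2: %s", buf);
    return 2;
  }

  // Ask for every group up to matchnum so pmatch[matchnum] is filled in
  status = regexec(&re, string, matchnum + 1, pmatch, eflag);
  regfree(&re);

  if (status != 0 || pmatch[matchnum].rm_so < 0) {
    // match not found (or the requested group did not take part)
    lr_log_message("Match not found");
    return 1;
  }

  // Copy the requested group, truncating it to fit the output buffer
  len = pmatch[matchnum].rm_eo - pmatch[matchnum].rm_so;
  if (len >= (int)sizeof(out)) len = (int)sizeof(out) - 1;
  strncpy(out, string + pmatch[matchnum].rm_so, len);
  out[len] = '\0';
  lr_save_string(out, match);
  return 0;
}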

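replace(const char *string, char *pattern, char *replace, char *match) {
  // Replaces every match of pattern in string with "replace" and saves
  // the result to the parameter named by "match"
  // Returns 0 on success, 2 if pattern incorrect
  int  status;
  int  eflag = 0;          // first pass: string starts at beginning of line
  int  room;
  char buf[1024] = "";
  char out[1024] = "";

  regex_t re;
  regmatch_t pmatch[128];
  lr_load_dll("pcre3.dll");

  if ((status = regcomp(&re, pattern, REG_DOTALL)) != 0) {
    regerror(status, &re, buf, sizeof(buf));
    lr_output_message("Replace PCRE Exit 2: %s", buf);
    return 2;
  }

  while ((status = regexec(&re, string, 1, pmatch, eflag)) == 0) {
    // An empty match at the start would never advance, so stop here
    if (pmatch[0].rm_eo == 0) break;

    // Append the text before the match, then the replacement.
    // Anything that does not fit into "out" is dropped.
    room = sizeof(out) - strlen(out) - 1;
    strncat(out, string, pmatch[0].rm_so < room ? pmatch[0].rm_so : room);
    strncat(out, replace, sizeof(out) - strlen(out) - 1);
    string += pmatch[0].rm_eo;
    eflag = REG_NOTBOL;    // the rest of the string is not a line start
  }
  // Append whatever follows the last match
  strncat(out, string, sizeof(out) - strlen(out) - 1);
  regfree(&re);
  lr_save_string(out, match);
  return 0;
}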
Now that we have the setup out of the way, let's get into it.
I have chosen Slashdot.org as the application under test. This page has dynamic content (and contains some interesting HTML), so it's a great example.
Say we wanted to click on the third item in the Quick Links sidebar menu. At present, it's “Penny Arcade”.

The name suggests that the topics may change from time to time. How do we keep our script from breaking when the link name changes? The answer is regular expressions.
Since LoadRunner works on the raw HTML of a webpage, let's have a look at the code for the Quick Links sidebar:

Quick Links

To start our regular expression, we look for something we can always identify and work from there. In this case it's “id="index_qlinks-content"”. From there we know that our anchor link will be the third occurrence of “href=”. Anything between “id="index_qlinks-content"” and that third “href=” is irrelevant to us. Translating this into a regular expression could look like this:
id="index_qlinks-content".*?href=.*?href=.*?href="(.*?)">
Notice that the parts of the code we can rely on remain, and anything we don't care about is replaced with “.*?”. The part of the expression that we wish to extract is surrounded by round brackets. Now let's look at how this translates to the LoadRunner function.
match(lr_eval_string("{buffer}"),
      "id=\"index_qlinks-content\".*?href=.*?href=.*?href=\"(.*?)\">",
      "EXTRACT_QuickLink_Item",
      1);
This reads the string stored in the parameter “buffer”, applies the RegEx above, and saves the value captured by the first set of brackets (match number 1) to the parameter “EXTRACT_QuickLink_Item”.
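The capture-group mechanics can also be tried outside LoadRunner. The following is a minimal sketch that uses the system's POSIX <regex.h> (not the regex.h from this post) in place of the PCRE wrapper. The helper name extract_group and the 16-group limit are made up for illustration. POSIX extended syntax has no lazy “.*?”, so the sketch uses a negated character class instead.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies capture group `group` of `pattern`, applied
   to `text`, into `out`.
   Returns 0 on a match, 1 if nothing matched, 2 on a bad pattern,
   an out-of-range group, or an output buffer that is too small. */
static int extract_group(const char *text, const char *pattern,
                         size_t group, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t pm[16];
    size_t len;

    if (group >= 16 || out_size == 0)
        return 2;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 2;
    if (regexec(&re, text, 16, pm, 0) != 0) {
        regfree(&re);
        return 1;
    }
    regfree(&re);

    /* rm_so is -1 when the group did not take part in the match */
    if (pm[group].rm_so < 0)
        return 1;

    /* rm_so and rm_eo are byte offsets into `text` */
    len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
    if (len >= out_size)
        return 2;
    memcpy(out, text + pm[group].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Group 0 is always the whole match, which is why the first set of brackets is match number 1.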
Using regular expressions, this script will continue to work even when the links change. We can extend this further to click on a random link in our Quick Links menu. Assuming there will always be 7 items in this menu, we first create a random number parameter between 1 and 7.
Next we extend our regular expression to capture each href value in the list. Note that I have documented the RegEx over multiple lines to increase readability.
id="index_qlinks-content"
.*?href="(.*?)"      // Match 1
.*?href="(.*?)"      // Match 2
.*?href="(.*?)"      // Match 3
.*?href="(.*?)"      // Match 4
.*?href="(.*?)"      // Match 5
.*?href="(.*?)"      // Match 6
.*?href="(.*?)"      // Match 7
Now we put it all together, using the random number as the match number, noting that we have 7 possible matches (defined by brackets). Our final LoadRunner command looks like this. Note that I have trimmed the RegEx to its bare bones.
match(lr_eval_string("{buffer}"),
    "id=\"index_qlinks-content\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\"",
    "EXTRACT_QuickLink_Item",
    atoi(lr_eval_string("{RandomNumber1to7}")));
This will save the href of a random quick link from our list of 7 available to the parameter “EXTRACT_QuickLink_Item”.
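Typing seven identical groups by hand is easy to get wrong, so the pattern could instead be assembled in code. The following is a sketch in standard C only. The helper name build_href_pattern is hypothetical, and snprintf is assumed to be available in your environment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes `anchor` followed by `count` copies of the
   lazy group  .*?href="(.*?)"  into `out`.
   Returns 0 on success, -1 if the result does not fit in `out`. */
static int build_href_pattern(const char *anchor, int count,
                              char *out, size_t out_size)
{
    static const char group[] = ".*?href=\"(.*?)\"";
    size_t used;
    int i, n;

    n = snprintf(out, out_size, "%s", anchor);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;

    for (i = 0; i < count; i++) {
        n = snprintf(out + used, out_size - used, "%s", group);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Called with the anchor id=\"index_qlinks-content\" and a count of 7, this produces a pattern of the same shape as the one above. The resulting buffer can be passed to match() as its pattern argument.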
The full script will look something like this:
Action()
{
 // Save the entire webpage into the buffer parameter
 buffer();

 lr_start_transaction("Regular_Expressions_01");

 web_url("web_url",
  "URL=http://slashdot.org",
  "TargetFrame=",
  "Resource=0",
  "Referer=",
  LAST);

 lr_end_transaction("Regular_Expressions_01", LR_AUTO);

 match(lr_eval_string("{buffer}"),
    "id=\"index_qlinks-content\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\"",
    "EXTRACT_QuickLink_Item",
    atoi(lr_eval_string("{RandomNumber1to7}")));

 return 0;
}
Important: In LoadRunner, there is no need for expression delimiters (e.g. the / in /\w+/). There is, however, an additional requirement to escape the \ character. Instead of \w matching a word character, we must write \\w. This is because the C language treats \ as a special character inside string literals, and the same applies to the " character, which must be written as \".
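The escaping rule is mechanical, so it can be sketched in code. The helper below is hypothetical (the name regex_to_c_literal is mine) and is written in standard C. It takes a regular expression as you would write it on paper and produces the text to paste between the quotes of a C string literal.

```c
#include <stddef.h>

/* Hypothetical helper: escapes a raw regular expression so it can be
   pasted between the quotes of a C string literal.
   Backslashes and double quotes each gain a leading backslash.
   Returns 0 on success, -1 if `out` is too small. */
static int regex_to_c_literal(const char *regex, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    for (; *regex != '\0'; regex++) {
        int needs_escape = (*regex == '\\' || *regex == '"');

        /* Keep one byte free for the terminating '\0' */
        if (used + (needs_escape ? 2 : 1) >= out_size)
            return -1;
        if (needs_escape)
            out[used++] = '\\';
        out[used++] = *regex;
    }
    out[used] = '\0';
    return 0;
}
```

Given the raw expression href="(\w+)", the helper yields href=\"(\\w+)\", which is the form used in the match() calls in this post.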
The reasoning for excluding RegEx from LoadRunner is often based around the additional processing overhead of the regular expression engine. While regular expressions do require slightly more processing power, the increased flexibility they provide can be invaluable. The rule of thumb is to use LB/RB whenever possible and keep RegEx for the rest.
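For comparison, boundary-based extraction amounts to two plain substring searches, which gives a feel for why it is the cheaper option. The following is a rough sketch in standard C, not LoadRunner's actual implementation. The helper name between is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text found between the first `lb`
   (left boundary) and the next `rb` (right boundary) into `out`.
   Returns 0 on success, 1 if a boundary is missing,
   2 if `out` is too small. */
static int between(const char *text, const char *lb, const char *rb,
                   char *out, size_t out_size)
{
    const char *start;
    const char *end;
    size_t len;

    start = strstr(text, lb);
    if (start == NULL)
        return 1;
    start += strlen(lb);            /* skip past the left boundary */

    end = strstr(start, rb);
    if (end == NULL)
        return 1;

    len = (size_t)(end - start);
    if (len >= out_size)
        return 2;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Fixed boundaries like these cannot express "the third link after this id", and that is the gap the regular expression fills.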
Regular Expressions do however open up a world of possibilities and hopefully may get you out of a tight spot next time you are scripting.
