WEB_REG_SAVE_PARAM_REGEXP() – A REGEXP PRIMER FOR PERFORMANCE TESTERS

LoadRunner 11 and later versions come with the long overdue ability to correlate values using regular expressions. The standard web_reg_save_param_ex() function relies on left and right boundaries plus a few simple attributes such as length, offset and ordinal to narrow down a search. This is generally functional, but regexps are better: they describe the value you want far more precisely, which makes correlations more accurate and more reliable, and once you know them they are faster to write. There’s a reason why they are arguably the de facto standard method for extracting values from strings.
JMeter uses regexps as standard and, once you have a solid understanding of them, I think it is substantially easier and faster to correlate scripts using that tool. Regexps are said to have a steep learning curve but, personally, I reckon you can get the basics in a couple of hours. Besides, any half-decent load tester should have this skill even if they don’t use it to correlate; it is extremely useful for data manipulation, which is a common requirement in performance testing. Knowing regexps, awk and sed together is going to solve 90% of your data manipulation needs.
Before getting into detail, anyone starting out with regexps will want a handy regexp tester, like rubular.com. There are plenty of others.
So, a recent post I read asked how to correlate the value 1945 from this JSON response:
"...:[{"containerName":" ","containerSize":"12","containerStartRow":"0","containerEndRow":"0","rows":[["NW,RO,RA","DLY","10/07/2011","10/17/2011","10/01/2011","RA","Y","FR","AMEA","AC","1945","50","50","AC 100IOSH-08","UserDefined","10/07/2011","Reassigned"..."
Classically, in LoadRunner you would try to use left and right boundaries, but this gets horrible with JSON.
The obvious LB and RB here would be:
"LB=\"AC\",\"",
"RB=\"",
The problem with this is: what if AC is dynamic too? It probably is. A reliable correlation would have to start from the unique text "rows":[[ (I’m assuming this is unique) but then you’d have to end at ], so you’d capture the whole row and be left with some fun C string manipulation to get the required value.
Another method might be to use SaveOffset, but the risk here is that one or more values might have dynamic lengths.
There are probably some ways it could be done (there are always ways) but a regular expression with web_reg_save_param_regexp() is probably better.
The syntax for this function is:
int web_reg_save_param_regexp("ParamName=<output parameter name>", "RegExp=regular_expression", [<List of Attributes>,] [<SEARCH FILTERS>,] LAST );
where Attributes and SEARCH FILTERS are standard. This is pretty simple so I will focus on just the regexp syntax from here on.
In the case of the JSON string above, one regexp you could use is:
rows":\[\["[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","RA","[^"\r\n]*","FR","AMEA","[^"\r\n]*","([^"\r\n]*)"
This will return:
1945
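If you want to check the regexp above without firing up VuGen, a few lines of Python will do. This is only a sketch: it uses Python’s re module rather than LoadRunner’s regexp engine, and the RESPONSE string is just the visible part of the sample response with the elided ends left out, so treat it as a sanity check and not as proof of how LoadRunner will behave.

```python
import re

# Visible fragment of the sample response from the post (elided ends omitted).
RESPONSE = (
    '"containerEndRow":"0","rows":[["NW,RO,RA","DLY","10/07/2011",'
    '"10/17/2011","10/01/2011","RA","Y","FR","AMEA","AC","1945","50","50",'
    '"AC 100IOSH-08","UserDefined","10/07/2011","Reassigned"'
)

# The regexp from the post; the single pair of parentheses marks the value to save.
PATTERN = (
    r'rows":\[\["[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*",'
    r'"RA","[^"\r\n]*","FR","AMEA","[^"\r\n]*","([^"\r\n]*)"'
)

match = re.search(PATTERN, RESPONSE)
captured = match.group(1) if match else None
print(captured)  # 1945
```

Each [^"\r\n]* simply steps over one quoted field, so the capture lands on the eleventh value in the row regardless of how long the earlier fields are.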
That might look like a bunch of gibberish but it’s actually nice and logical. I think most performance testers should be able to grasp the concept, and then you just need to learn a few rules. There’s lots of material out there so I won’t repeat it here.
But one thing that is worth highlighting here is the use of:
([^"\r\n]*)
This can actually be simplified as:
([^"]*)
It basically means match everything that is not a double quote; the previous expression matched everything that was neither a double quote nor a carriage return or newline. This is really useful: you can use it to capture any value that is enclosed in double quotes which, frankly, makes up a large part of correlation.
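The difference between the two character classes only shows up when a quoted value spans more than one line. The sketch below uses Python’s re module and a made-up two-line sample (the text variable is purely illustrative, not from any real response) to show both forms side by side.

```python
import re

# Hypothetical sample: the second quoted value contains a newline.
text = 'name="first"\nnote="line one\nline two"'

# Excludes quotes, carriage returns and newlines: a value may not cross a line.
strict = re.findall(r'"([^"\r\n]*)"', text)

# Excludes quotes only: a value may run across line breaks.
loose = re.findall(r'"([^"]*)"', text)

print(strict)  # ['first']
print(loose)   # ['first', 'line one\nline two']
```

So the shorter form is fine when values never contain line breaks; the longer one is a guard against a stray unclosed quote swallowing several lines.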
If your response contained something like:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="JHGYTFDIUSI"
Then the regexp would be:
VIEWSTATE" value="([^"]+)"
(Technically, it should be VIEWSTATE"\s+value="([^"]+)" where the \s+ matches one or more whitespace characters. It’s safer.)
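To see why the \s+ version is safer, here is a small Python sketch that runs both VIEWSTATE patterns against the tag as shown and against the same tag with the value attribute pushed onto a new line. The reformatted variant is my own assumption about how a server might change its whitespace; the patterns are the ones given above.

```python
import re

# The tag as shown in the post, and a hypothetical re-wrapped version of it.
single_space = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="JHGYTFDIUSI"'
reformatted = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"\n    value="JHGYTFDIUSI"'

LITERAL = r'VIEWSTATE" value="([^"]+)"'     # exactly one space
TOLERANT = r'VIEWSTATE"\s+value="([^"]+)"'  # any run of whitespace


def extract(pattern, html):
    """Return the first captured group, or None when nothing matches."""
    found = re.search(pattern, html)
    return found.group(1) if found else None


print(extract(LITERAL, single_space))   # JHGYTFDIUSI
print(extract(LITERAL, reformatted))    # None
print(extract(TOLERANT, reformatted))   # JHGYTFDIUSI
```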
That’s the basics, but what makes regexps so great is that they can do so much more, and this is exactly why they are far superior to the old web_reg_save_param() (wrsp() for short) with its clunky boundaries. The JSON string given above is one example where a regexp works better, but once you get the hang of them it’s surprising what you can do. For example, multiple matches.
Multiple matches work in the same way as with the classic wrsp() using boundaries: you specify Ordinal=ALL and get param_1, param_2 … param_count, which by itself is useful. But regexps can do even more than this. You can put multiple pairs of parentheses into a single regexp to get multiple groups, and if any of these groups match multiple times then you create a multi-dimensional array. Give this a little thought and you will realise the potential. Sadly, LR11 & 12 only support a single matching group, so no multi-dimensional arrays yet in HP land.
The lack of support for multiple groups is a shame. In JMeter you can have a regexp like:
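As a rough picture of what Ordinal=ALL gives you, the Python sketch below collects every match of a one-group regexp and stores the results under param_1, param_2 and so on, plus a param_count. The response string and the dictionary are assumptions made for illustration; this mimics the naming described above and is not LoadRunner code.

```python
import re

# Hypothetical response with three rows (invented for this example).
response = '"rows":[["AC","1945"],["AC","2001"],["AC","1776"]]'

# One capture group, matched as many times as it occurs.
values = re.findall(r'"AC","([^"]*)"', response)

# Store the matches the way the post describes: param_1 ... param_N, param_count.
params = {f"param_{i}": value for i, value in enumerate(values, start=1)}
params["param_count"] = str(len(values))

print(params)
# {'param_1': '1945', 'param_2': '2001', 'param_3': '1776', 'param_count': '3'}
```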
rows":\[\["[^"\r\n]*","([A-Z]{3})","[^"\r\n]*","[^"\r\n]*","[^\/]+\/[\d]+?\/2011","[A-Za-z]*","[^"\r\n]*","[^"\r\n]*","([^"\r\n]*)","[^"\r\n]*","([^"\r\n]*)"
which returns multiple matches looking like:
1    {param_g1} = DLY
2    {param_g2} = AMEA
3    {param_g3} = 1945
Instead, for LR, you’d need multiple web_reg_save_param_regexp() statements.
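The three-group regexp can be tried out in any engine that supports multiple groups. Here is a sketch in Python, again using only the visible part of the sample response; the variable names are mine and the engine is Python’s re, not JMeter’s, so the output naming differs from the {param_gN} listing above.

```python
import re

# Visible fragment of the sample response from the post (elided ends omitted).
RESPONSE = (
    '"rows":[["NW,RO,RA","DLY","10/07/2011","10/17/2011","10/01/2011",'
    '"RA","Y","FR","AMEA","AC","1945","50","50"'
)

# The three-group regexp from the post, split over lines for readability.
PATTERN = (
    r'rows":\[\["[^"\r\n]*","([A-Z]{3})","[^"\r\n]*","[^"\r\n]*",'
    r'"[^\/]+\/[\d]+?\/2011","[A-Za-z]*","[^"\r\n]*","[^"\r\n]*",'
    r'"([^"\r\n]*)","[^"\r\n]*","([^"\r\n]*)"'
)

match = re.search(PATTERN, RESPONSE)
groups = match.groups() if match else ()
print(groups)  # ('DLY', 'AMEA', '1945')
```

One match yields all three values in a single pass over the response, which is what the LR version needs three separate statements to do.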
One general thing to be aware of when using regexps in load testing is greediness causing backtracking. This is crucial: if you don’t take care, you’ll eat CPU on your Load Generator machines.
Note: there is also a web_reg_save_param_xpath() function, which works better for XML responses. This has also long been a feature of JMeter.
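A quick way to see greediness in action is to compare a greedy .* capture with a lazy .*? and with the negated class used throughout this post. The sketch below uses Python’s re module on a short made-up line; it shows the difference in what gets captured, not the CPU cost, which depends on the engine and the size of the response.

```python
import re

# Short illustrative line (taken from the shape of the sample row).
line = '"AC","1945","50","50"'

# Greedy: .* runs to the end of the line, then backs up to the LAST quote.
greedy = re.search(r'"AC","(.*)"', line).group(1)

# Lazy: .*? grows one character at a time until the next quote.
lazy = re.search(r'"AC","(.*?)"', line).group(1)

# Negated class: cannot cross a quote at all, so there is nothing to back out of.
negated = re.search(r'"AC","([^"]*)"', line).group(1)

print(greedy)   # 1945","50","50
print(lazy)     # 1945
print(negated)  # 1945
```

On a 21-character line the backing up is free; on a large response body, repeated for every virtual user, it is where the CPU goes. That is one reason to prefer [^"]* over .* for quoted values.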

1 comment:

  1. Looks like there's a lack of familiarity with the web_reg_save_param function in this article. There is no mention of "text flags", which address the issue. I suggest the reader try both functions and decide which is the better option.
