Web applications Performance Symptoms and Bottlenecks Identification

Abstract
Introduction
Web Applications Architecture
Web Applications Symptoms of Performance Bottlenecks
  • Extended response time of user
  • Extended response time of server
  • High CPU usage
  • Invalid data returned
  • HTTP errors (4xx, 5xx)
  • Lots of open connections 
  • Lengthy queues of requests
  • Memory leaks
  • Extensive table scans of database
  • Database deadlocks
  • Pages unavailable
  • Load balancing ineffectiveness
  • Insufficient/poorly configured network interface card
  • Very tight security
  • Inadequate overall bandwidth
  • Poor network architecture
  • Broken links
  • Inadequate transaction design
  • Very tight security
  • Inadequate hardware capacity
  • High SSL transactions
  • Server poorly configured
  • Servers with ineffective load balancing
  • Low utilization of OS resources
  • Insufficient throughput
  • Memory leaks
  • Useless/inefficient garbage collection
  • DB connections poor configuration
  • Useless/inefficient code transactions
  • Sub-optimal session model
  • Application server poor configuration
  • Useless/inefficient hardware resources
  • Useless/inefficient object access model
  • Useless/inefficient security model
  • Low utilization of OS resources
  • Inefficient/ineffective SQL statement
  • Small/insufficient query plan cache
  • Inefficient/ineffective SQL query model
  • Inefficient/ineffective DB configurations
  • Small/insufficient data cache
  • Excess DB connections
  • Processing too many rows at a time
  • Missing/ineffective indexing
  • Inefficient/ineffective concurrency model
  • Outdated statistics
  • Deadlocks
For rich internet applications with lots of images, videos, etc., the client-side aspects have a bigger bearing on the actual response time than the server-side response time and should be given due importance.
Using many modern AJAX architectures, it is possible to place so much code in the client that a significant amount of time is required before the request is transmitted to the application server. This is particularly true for underpowered client machines with inadequate memory and slow processors.
  • Slow CSS Selectors on Internet Explorer
  • Slow executing external services
  • Multiple CSS Lookups for same object
  • Extensive XHR Calls
  • Large DOM
  • Expensive DOM Manipulations
  • Extensive Visual Effects
  • Extensive JavaScript files
  • Extensive Event Handler Bindings
  • Overly fine-grained logging and monitoring
  • Increased page size
  • Third-party services consuming excessive bandwidth
  • Resources not reduced/minified
  • Slow response times from third-party component providers

Web Applications Performance Bottlenecks

Performance bottleneck identification is an uphill task in web application testing because there can be many reasons behind a slow-performing web application. Identifying performance bottlenecks requires complete non-functional (architectural and behavioral) information about the application, the necessary input from all stakeholders and an exhaustive performance testing activity to achieve the desired outcomes. In this paper we identify the performance symptoms, potential problem areas and bottlenecks of each web application component that need to be focused on during performance testing.
Performance testing establishes an application's current state against its performance goals and measures its behavior under expected and unexpected conditions in the production environment. It ensures that the application bottlenecks that would cause a bad user experience are identified. It is hard to identify performance bottlenecks without prior knowledge of the possible performance issues and their symptoms in each problem area of the application, and unless the bottlenecks and their root causes are identified, it is tough to make the application perform better.
Performance testing is a team effort in which all stakeholders (business owner, business analyst, marketing team, network team, development team, QA and the performance testing team) participate to make it happen. The performance tester needs proper input from stakeholders to understand the application architecture, its users' behavior and the application's performance goals in order to generate the required load and identify performance bottlenecks successfully.
A performance testing report that simply states “The application is not performing to expectations” is not enough. You have to provide the reasons behind the weak performance and solutions to improve it. Stakeholders are far more interested in knowing “Why is the application slow?” and “What should be done to fix the issues?” than in the basic information alone. Identifying the bottlenecks and their root causes is therefore the core of performance testing, and a successful activity will always answer these questions in detail.
In the following sections we identify the sources of bottlenecks, the typical performance symptoms and the major bottlenecks of every component of the web application architecture.
It is important to understand web application architecture before listing performance bottlenecks, because it helps in understanding the impact of each bottleneck.
In software engineering, different client-server architectures are used for web application development. These architectures logically separate data presentation, application processing and data management functions. The 3-tier architecture (web server, application server and database server) is the most common N-tier client-server architecture used for enterprise web application development.
Diagram: 3-tier web application architecture (web client, web server, application server, database server)
Following is a brief description of each component of the 3-tier web application architecture.
Network Devices
Although network devices do not form a tier of the 3-tier architecture, devices such as network interface cards, firewalls, cables, load balancers and routers are used to support and connect the other components.
Web Server                              
The web server, on the first tier of the 3-tier architecture, consists of low-capacity computer(s) that receive user requests, forward them to the required server and present the returned results to users.
Application Server 
The application server, on the second tier, consists of one or more medium-capacity computer(s) that receive user requests from the web server, apply business logic to them and send the results back to the web server.
Database Server
The database server sits on the last tier of the 3-tier client-server architecture and normally consists of a high-capacity computer with a stand-by facility that manages database access to serve user data requests.
Sources of Bottlenecks
As discussed in the previous section, the architectural components of a web application (web server, application server, database server and network resources) are the potential areas where most performance bottlenecks reside.
It is tough to test each and every component's performance thoroughly. Server hardware and network resources are often assumed to be the main culprits behind poor performance; in particular, server upgrades are treated as the best route to performance optimization, which is why the old saying “When all else fails, throw more hardware at it” keeps being recalled. However, studies and experience reveal that it is mainly the application code that causes performance bottlenecks. Based on experience, the following are the statistics of performance bottleneck probability for each component.
Chart: performance bottleneck probability by source
These results show that 76% of performance bottlenecks appear in the application and database servers, where most of the application code resides and where that code causes most of the issues, while a smaller percentage of bottlenecks originates in the web server and network resources. Identifying the bottleneck source is important for optimizing application performance.

Once all the performance bottleneck sources have been spotted, the next step is to identify the application's performance issues. People assume that conventional performance optimization techniques (increasing server memory, optimizing application code, replacing the server, changing the database indexing, upgrading the network/internet connection, etc.) applied without identifying the root cause will be sufficient for improvement. These techniques may work sometimes, but they do not guarantee a permanent fix and mostly end up wasting a lot of effort and money.
Diagnosing web application performance problems is a demanding task. An application may have various types of issues (functional, usability, security, cross-browser compatibility, performance, etc.), and in such situations it is extremely important to separate the performance issues from the others.
There is a long list of potential web application performance bottlenecks, some of which are as follows:

An application architecture is formed by several components, and there can be dozens of bad performance symptoms in each component. A good performance tester must know the list of performance symptoms on each tier to diagnose bottlenecks effectively.
Below is the detailed list of symptoms for each component of the 3-tier web application.
Network Performance Bottlenecks
Network bottlenecks contribute comparatively little, but they are still important enough to be discussed in detail, because even minor issues can lead to disasters. The following are the major network performance symptoms in the context of 3-tier web applications:
Network performance bottlenecks do not have a single definite source; load balancing, security and network architecture can be the major sources. The pie chart below depicts the percentage of each source to illustrate its impact on performance bottlenecks.
Pie chart: network performance bottleneck sources
Web Server Performance Bottlenecks
Like network bottlenecks, web server bottlenecks do not make a major contribution to performance issues. However, web servers act as a liaison between clients and the processing servers (application and database), so web server bottlenecks need to be addressed properly since they can affect the performance of other components to a great extent.
Below is the list of bottlenecks which can affect web server performance:
Secure transactions are the major contributor to web server performance bottlenecks; load balancing is usually a factor as well, and sometimes resource-intensive specialized functions degrade web server performance. Below is a graphical representation of each web server performance bottleneck with its percentage.
Chart: web server performance bottlenecks by percentage
Application Server Performance Bottlenecks
The business logic of an application resides on the application server. Application server hardware, software and application design can affect performance to a great extent, so poor application server performance can be a critical source of performance bottlenecks.
Below is the list of causes of bad application server performance:
Object caching, SQL and database connection pooling are the main causes of application server bottlenecks, together contributing about 60% of application server issues. In roughly 20% of cases an inefficiently configured application server itself causes poor performance.
Below is the complete detail of application server bottlenecks with their impact.
Chart: application server bottlenecks and their impact
Database Server Performance Bottlenecks
Database performance is the most critical factor for application performance, as the database is the main culprit in performance bottlenecks. Database software, hardware and design can really impact whole-system performance.
Following is the comprehensive list of causes of poor database performance:
Bad SQL and bad indexes contribute nearly 60% of database server performance bottlenecks. The chart below shows the complete detail of database server causes with percentages.
Chart: database server bottleneck causes by percentage
Client Side Performance Bottlenecks
Client-side performance has come under increased interest in recent years with the release of Google's performance optimization best practices: caching, fewer static files, file minification, compression, JavaScript processing time, page rendering, etc.
Top 10 client side performance symptoms:

Third Party Services Performance Issues
Today's web applications rely heavily on third-party components, which affect page loading and result in a bad user experience. It is common practice for third-party tools not to be properly analyzed from a performance point of view before being integrated into the application. If you have ever observed page component load times, you will have noticed that third-party components take more time. These third-party components can cause various performance bottlenecks, but the following are the most common:


Conclusion
Web applications are becoming more complex day by day and, at the same time, identifying their performance bottlenecks is becoming a tougher task. Various factors contribute to performance issues, and knowledge of these issues and their symptoms is necessary to rectify the bottlenecks. Web server, application server and database server hardware, software and design, along with network configuration, can be major contributors to performance bottlenecks. Moreover, client-side design and third-party components can also affect web application performance. Knowing the complete list of symptoms for all potential problem areas will help in identifying the root cause and its remedy.

Tibco Business Events Entity Cache Performance Trap

Thanks again for another great story from A. Alam, a performance engineer working for Infosys Ltd. who conducts large-scale load tests for a very large enterprise. Mr. Alam and his team shared this story from a JMeter load test they ran in their production Tibco Business Events environment. The load test was meant to verify the performance of an item availability system that handles REST calls and large-scale inventory changes for ecommerce, mobile apps and retail stores.
Using Dynatrace on their Tibco servers helped them identify sporadic spikes across the whole system when the tested transaction load exceeded 500 TPS (Transactions per Second). The following Dynatrace chart shows these spikes; each line represents a different test script, with response time captured by Dynatrace on the actual application server:
Response time spiked across most of the simulated transactions once load exceeded 500 TPS (Transactions per Second)
I am not an expert in Tibco, but thanks to our friends from Infosys, the insight of Dynatrace PurePath and the help of the Tibco engineering team, the problem was identified in Active Space. It turns out that when querying large object sets (>10k), queries get split into multiple requests to Active Space, causing much more load than anticipated. The exact technical problem, how they found it and the chosen workaround will be discussed in the remainder of the blog.
To give you a better understanding of their environment check out the architectural diagram below. It shows the Tibco BE REST Service that is used by different end consumers to access inventory data managed by Active Space. It also shows the CDC Load coming from Tibco EMS and how it is used to test the Cache Service which also queries and updates data from Active Space.
During their load test, when load reached 400-500 TPS, performance spikes were seen both in CDC message processing and, at the same time, by the end users of the Tibco BE REST Service:
Architectural Diagram of their Tibco Environment. Active Users and Load Test both rely on Active Spaces
Tibco BE has a default batch setting to retrieve 10k Object IDs from the BE cache. If more than 10k Object IDs are requested, several roundtrips to Active Space are necessary in order to retrieve the next batch. They identified this behavior by looking at the PurePaths captured in both the REST Service and the Cache Service. The PurePaths showed them that the method nextEntityId goes off and keeps requesting batches of 10k objects from Active Space whenever more than 10k objects are requested. It does this by sending an asynchronous request to Active Space and letting the main thread wait until the next batch is available:
Internals of the Active Space API: When 10k ObjectIDs are exhausted the next batch is requested from Active Space
As both the Tibco BE REST interface and the Cache Service (via the load test) were requesting objects in the millions per request, the Active Space service was simply overloaded with too many parallel requests. If you think about it: an average request for 1 million objects results in 100 roundtrips, which means that the Active Space server on average receives 100 times more requests than are sent to the REST and Cache Services.
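To make that arithmetic concrete, here is a tiny back-of-the-envelope sketch in plain C (not Tibco API code; the 10k default batch size and the 1-million-object request size are the figures quoted above):

/* roundtrips.c - illustrates the batching math described above */
#include <stdio.h>

/* one Active Space roundtrip per batch, i.e. ceiling division */
static long roundtrips(long objects_requested, long batch_size)
{
    return (objects_requested + batch_size - 1) / batch_size;
}

int main(void)
{
    printf("1,000,000 objects @ 10k batch size -> %ld roundtrips\n",
           roundtrips(1000000L, 10000L));       /* 100 */
    printf("1,000,000 objects @ 10M batch size -> %ld roundtrips\n",
           roundtrips(1000000L, 10000000L));    /* 1   */
    return 0;
}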
The solution that they chose was one given by the Tibco engineering team: Increasing the Object Entity Batch size from 10k to 10 million. This allowed both the REST interface and the Cache Service to fetch data from Active Space with a single roundtrip. This took a lot of load off of Active Space and therefore improved overall performance. It has yet to be seen if there are any other side effects, e.g: higher memory usage due to larger batch sizes.
It would be interesting to hear whether other Tibco and Active Space users have run into similar issues and how they solved them. Please let us know. If you don't know whether your system is facing this issue, simply install the Dynatrace Free Trial on your Tibco servers; it gives you exactly these details. Keep us posted on your findings!

Why do we get a sudden spike in response times?






We have an API implemented using ServiceStack which is hosted in IIS. While performing load testing of the API we discovered that the response times are good but deteriorate rapidly as soon as we hit about 3,500 concurrent users per server. We have two servers, and when hitting them with 7,000 users the average response times sit below 500ms for all endpoints. The boxes are behind a load balancer, so we get 3,500 concurrent users per server. However, as soon as we increase the number of total concurrent users we see a significant increase in response times. Increasing the concurrent users to 5,000 per server gives us an average response time per endpoint of around 7 seconds.
The memory and CPU usage on the servers are quite low, both while the response times are good and after they deteriorate. At peak, with 10,000 concurrent users, the CPU averages just below 50% and the RAM sits around 3-4 GB out of 16. This leaves us thinking that we are hitting some kind of limit somewhere. The screenshot below shows some key counters in perfmon during a load test with a total of 10,000 concurrent users. The highlighted counter is requests/second. To the right of the screenshot you can see the requests per second graph becoming really erratic; this is the main indicator for slow response times, and as soon as we see this pattern we notice slow response times in the load test.
perfmon screenshot with requests per second highlighted
How do we go about troubleshooting this performance issue? We are trying to identify whether this is a coding issue or a configuration issue. Are there any settings in web.config or IIS that could explain this behaviour? The application pool is running .NET v4.0 and the IIS version is 7.5. The only change we have made from the default settings is to update the application pool Queue Length value from 1,000 to 5,000. We have also added the following config settings to the Aspnet.config file:
<system.web>
    <applicationPool 
        maxConcurrentRequestsPerCPU="5000"
        maxConcurrentThreadsPerCPU="0" 
        requestQueueLimit="5000" />
</system.web>
More details:
The purpose of the API is to combine data from various external sources and return as JSON. It is currently using an InMemory cache implementation to cache individual external calls at the data layer. The first request to a resource will fetch all data required and any subsequent requests for the same resource will get results from the cache. We have a 'cache runner' that is implemented as a background process that updates the information in the cache at certain set intervals. We have added locking around the code that fetches data from the external resources. We have also implemented the services to fetch the data from the external sources in an asynchronous fashion so that the endpoint should only be as slow as the slowest external call (unless we have data in the cache of course). This is done using the System.Threading.Tasks.Task class. Could we be hitting a limitation in terms of number of threads available to the process?

Using Regular Expressions in LoadRunner

It has previously been identified how to enable regular expressions in LoadRunner. A big thanks to Charlie and Tim for getting this working, and Dmitry for proposing the challenge.
In this post, I am going to demonstrate a practical use of bolting the regular expression engine on top of LoadRunner. After all, it is more effort than LB/RB, so why go to all the trouble? Hopefully the following example will demonstrate a scenario where regular expressions can be invaluable.

The first step is to bolt-on the regular expression engine.
If you have read any of the previous posts on RegEx in LoadRunner, you can skip this section, as Dmitry has explained this process in detail.
The following three files are required for enabling RegEx in a LoadRunner script.
  • pcre3.dll – The Perl Compatible Regular Expression library
  • pcreposix.h – The PCRE (POSIX compatible) library header file
  • regex.h – The LoadRunner Regular Expression Library
These files can be embedded in your script, making the script more portable between your script development machine and the load generators. Right-clicking on the action pane will allow you to add files to your script.
Add Files to Script
After adding these files to your script, your action pane should look something like this.
RegEx Action Pane
Next, you will need to comment out the stdlib include line from pcreposix.h
//#include <stdlib.h>
Finally, we have to add regex.h and pcreposix.h header files to globals.h via the following lines.
#include "pcreposix.h"
#include "regex.h"
The following code is based on that of Tim Koopmans, with one main change from Tim’s functions: by changing “REG_EXTENDED” to “REG_DOTALL”, our expressions now let the “.” character match everything, including newlines. This allows for matching such as in this example.
Note that similar behaviour can also be achieved with the “REG_NEWLINE” option. There is a whole kettle of fish here, and “REG_DOTALL” works for me. If you want to investigate further, it’s described in detail in the PCRE documentation.
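If you want to see the difference for yourself, here is a quick self-contained sketch (a hypothetical helper, separate from the library code below, assuming the pcre3.dll bolt-on described above is in place):

// Demonstrates why REG_DOTALL matters: "." crossing a newline
dotall_demo()
{
  regex_t    re;
  regmatch_t m[1];
  char       *body = "<a\nhref=\"/faq\">FAQ</a>";   // note the embedded newline

  lr_load_dll("pcre3.dll");

  // Default flags: "." does not match the newline, so the pattern fails
  regcomp(&re, "<a.*?href", 0);
  lr_output_message("default:    %s", regexec(&re, body, 1, m, 0) == 0 ? "match" : "no match");
  regfree(&re);

  // REG_DOTALL: "." also matches newlines, so the pattern succeeds
  regcomp(&re, "<a.*?href", REG_DOTALL);
  lr_output_message("REG_DOTALL: %s", regexec(&re, body, 1, m, 0) == 0 ? "match" : "no match");
  regfree(&re);

  return 0;
}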
// regex.h
// PCRE Regular Expression Function Library

buffer() {
  // This will save a 2MB buffer of the response body when called
  web_reg_save_param("buffer", "LB=", "RB=", "Search=Noresource", LAST);
  return 0;
}

match(const char *string, char *pattern, char *match, int matchnum) {
  // The match function will return 0 if a match is found,
  //                                1 if no match is found,
  //                                2 if the pattern is incorrect.
  // The match will be placed into the parameter "{match}"

  int  status;
  int  eflag = 0;   // no special regexec flags on the first call
  char buf[1024] = "";
  char out[1024] = "";

  regex_t re;
  regmatch_t pmatch[128];
  lr_load_dll("pcre3.dll");

  if((status = regcomp(&re, pattern, REG_DOTALL)) != 0){   // REG_DOTALL so "." matches newlines (see above)
    regerror(status, &re, buf, 120);
    lr_output_message("Match PCRE Exit 2");
    return 2;
  }

  if((status = regexec(&re, string, 10, pmatch, eflag)) == 0) {

    strncpy(out, string + pmatch[matchnum].rm_so, pmatch[matchnum].rm_eo - pmatch[matchnum].rm_so);
    lr_save_string(out, match);
    eflag = REG_NOTBOL;
    regfree(&re);
    string = "";
    return 0;
  } else {
    lr_log_message("Match not found");
    // match not found
    regfree(&re);
    string = "";
    return 1;
  }
}

replace(const char *string, char *pattern, char *replace, char *match) {
  int  status;
  int  eflag = 0;   // no special regexec flags on the first call
  char buf[1024] = "";
  char out[1024] = "";

  regex_t re;
  regmatch_t pmatch[128];
  lr_load_dll("pcre3.dll");

  if((status = regcomp(&re, pattern, REG_DOTALL)) != 0){
    regerror(status, &re, buf, 120);
    lr_output_message("Match PCRE Exit 2");
    return 2;
  }

  while((status = regexec(&re, string, 1, pmatch, eflag)) == 0){
    //lr_output_message("match found at: %d, string=%s\n",
    //  pmatch[0].rm_so, string + pmatch[0].rm_so);

    strncat(out, string, pmatch[0].rm_so);
    strcat(out, replace);
    string += pmatch[0].rm_eo;
    eflag = REG_NOTBOL;
  }
  strcat(out, string);
  lr_save_string(out, match);
  regfree(&re);
  return 0;
}
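As a quick illustration of the replace() helper (this call is not part of the original walkthrough, and the input string is just a hypothetical example), it can be used to strip characters out of a value before further processing:

  // Hypothetical use of replace(): remove the commas from "1,234,567"
  // and store the result in the parameter {CleanNumber}.
  replace("1,234,567", ",", "", "CleanNumber");
  lr_output_message("Cleaned value: %s", lr_eval_string("{CleanNumber}"));   // 1234567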
Now that we have the setup out of the way, let's get into it.
I have chosen Slashdot.org as the application under test. This page has dynamic content (and contains some interesting HTML), so it's a great example. The page at the time of writing looks like this.
Slashdot.org
Say we wanted to click on the third item in the Quick Links sidebar menu. At present, it's “Penny Arcade”.

The name suggests that the topics may change from time to time. How do we keep our script from breaking when the link name changes? The answer is regular expressions.
Since LoadRunner reads the raw HTML of a web page, let's have a look at the HTML for the Quick Links sidebar:

[HTML of the “Quick Links” sidebar: a block with id="index_qlinks-content" containing the anchor links, separated by <br> tags]

To start our regular expression, we are going to look for something we can always identify and work from there. In this case it's id="index_qlinks-content". From there we know that our anchor link will be after the third occurrence of "<br>". Anything in between id="index_qlinks-content" and that point is irrelevant to us. Translating this into a regular expression could look like this:
id="index_qlinks-content".*?href=.*?href=.*?href="(.*?)\">
Notice that the highlighted sections of the code remain, and we replace anything that we don't care about with ".*?". The part of the expression that we wish to extract is surrounded by brackets. Now let's look at how this translates to the LoadRunner function.
match(lr_eval_string("{buffer}"),
      "id=\"index_qlinks-content\".*?href=.*?href=.*?href=\"(.*?)\">",
      "EXTRACT_QuickLink_Item",
      1);
This will read the string stored in the parameter "buffer", apply the RegEx above, and save the value captured by the first set of brackets (the match number argument) into "EXTRACT_QuickLink_Item".
Using regular expressions, this script will continue to work even when the links change. We can extend this further to click on a random link in our Quick Links menu. Assuming there will always be 7 items in this menu, we first create a random number parameter between 1 and 7.
LoadRunner Random Number Parameter
Next we extend our regular expression to capture each href value in the list. Note that I have documented the RegEx over multiple lines to increase readability.
id="index_qlinks-content".*?
href=\"(.*?)\"      // Match 1
href=\"(.*?)\"      // Match 2
href=\"(.*?)\"      // Match 3
href=\"(.*?)\"      // Match 4
href=\"(.*?)\"      // Match 5
href=\"(.*?)\"      // Match 6
href=\"(.*?)\"      // Match 7
Now we put it all together, using the random number as the match number, noting that we have 7 possible matches (defined by brackets). Our final LoadRunner command looks like this. Note that I have trimmed the RegEx to its bare bones.
match(lr_eval_string("{buffer}"),
    "id=\"index_qlinks-content\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\"href=\"(.*?)\".*?href=\"(.*?)\"",
    "EXTRACT_QuickLink_Item",
    atoi(lr_eval_string("{RandomNumber1to7}")));
This will save the href of a random quick link from our list of 7 available to the parameter "EXTRACT_QuickLink_Item".
The full script will look something like this:
Action()
{
 // Save the entire webpage into the buffer parameter
 buffer();

 lr_start_transaction("Regular_Expressions_01");

 web_url("web_url",
  "URL=http://slashdot.org",
  "TargetFrame=",
  "Resource=0",
  "Referer=",
  LAST);

 lr_end_transaction("Regular_Expressions_01", LR_AUTO);

 match(lr_eval_string("{buffer}"),
    "id=\"index_qlinks-content\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\".*?href=\"(.*?)\"href=\"(.*?)\".*?href=\"(.*?)\"",
    "EXTRACT_QuickLink_Item",
    atoi(lr_eval_string("{RandomNumber1to7}")));

 return 0;
}
Important: in LoadRunner, we remove the need for expression delimiters (e.g. the / in /\w+/). We also add an additional requirement of escaping the \ character: instead of writing \w to match a word character, we must write \\w instead. This is because the C language treats \ as a special character in string literals.
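For example, using the match() helper from earlier (and assuming buffer() has been registered before the request, as in the script above), a pattern for one or more digits has to be written with a doubled backslash in the C source; the parameter name here is arbitrary:

  // "\\d+" in the C source becomes the regex \d+ at runtime;
  // writing "\d+" would be mangled by C string escaping.
  match(lr_eval_string("{buffer}"), "\\d+", "EXTRACT_FirstNumber", 0);
  lr_output_message("First number found: %s", lr_eval_string("{EXTRACT_FirstNumber}"));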
The reasoning for excluding RegEx from LoadRunner is often based on the additional processing overhead of the regular expression engine. While RegExps do require slightly more processing power, the increased flexibility they provide can be invaluable. The rule of thumb is to use LB/RB whenever possible and keep RegEx for the rest.
Regular Expressions do however open up a world of possibilities and hopefully may get you out of a tight spot next time you are scripting.

WEB_REG_SAVE_PARAM_REGEXP() – A REGEXP PRIMER FOR PERFORMANCE TESTERS

LoadRunner 11 and later versions come with the long overdue feature of being able to use regular expressions to correlate values. The standard web_reg_save_param_ex() function relies on left and right boundaries and some simple attributes like length, offset and ordinal to narrow down searches. This is generally functional, but regexps are better. They are more accurate, faster and more reliable. There’s a reason why they are arguably the de-facto standard method for extracting values from strings.
JMeter uses regexps as standard and, as a result, once you have a solid understanding of them, I think it is substantially easier and faster to correlate scripts using that tool. It's said that regexps have a steep learning curve, but personally I reckon you can get the basics in a couple of hours. Besides, any half-decent load tester should have this skill even if they don't use it to correlate; it is extremely useful for data manipulation, and this is a common requirement in performance testing. Knowing just regexps, awk and sed together will solve 90% of your data manipulation needs.
Before getting into detail, anyone starting out with regexps will want a handy regexp tester, like rubular.com. There’s lots and lots of others.
So, a recent post I read asked how to correlate the string 1945 from this json response:
"...:[{"containerName":" ","containerSize":"12","containerStartRow":"0","containerEndRow":"0","rows":[[“NW,RO,RA","DLY","10/07/2011","10/17/2011","10/01/2011","RA","Y","FR","AMEA","AC","1945","50","50","AC 100IOSH-08","UserDefined","10/07/2011","Reassigned"..."
Classically, in LoadRunner you would try to use left and right boundaries but this gets horrible with json.
The obvious LB and RBs here would be
"LB=\"AC\", \"",
"RB=\"",
But the problem with this is: what if AC is dynamic too? It probably is. A reliable correlation would have to use the unique text "rows":[[ (I'm assuming this is unique), but then you'd have to end at ] and you'd end up capturing the whole string and be left with some fun C string manipulation to get the required value.
Another method might be to use SaveOffset but the risk here is that one or more values might have dynamic lengths.
There are probably some ways it could be done – there are always ways – but using a regular expression with web_reg_save_param_regexp() is probably better.
The syntax for this function is:
int web_reg_save_param_regexp("ParamName=<output parameter name>", "RegExp=regular_expression", [<List of Attributes>,] [<SEARCH FILTERS>,] LAST );
where Attributes and SEARCH FILTERS are standard. This is pretty simple so I will focus on just the regexp syntax from here on.
In the case of the json string above, one regexp you could use is:
rows":\[\[“[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","[^"\r\n]*","RA","[^"\r\n]*","FR","AMEA","[^"\r\n]*","([^"\r\n]*)"
This will return:
1945
That might look like a bunch of gibberish, but it's actually nice and logical, and I think most performance testers should be able to grasp the concept; then you just need to learn a few rules. There's lots of material out there so I won't repeat it here.
But one thing that is worth highlighting here is the use of:
([^"\r\n]*)
This can actually be simplified as:
([^"]*)
It basically means match everything that is not a double quote. The previous expression matched everything that was not a double quote or a newline. This is really useful: you can use it to capture any value that is enclosed in double quotes, which, frankly, makes up a large part of correlation.
If your response contained something like:
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="JHGYTFDIUSI"
Then the regexp would be:
VIEWSTATE" value="([^"]+)"
(Technically, it should be VIEWSTATE"\s+value="([^"]+)" where the \s+ matches on white space(s) – it's safer.)
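Plugged into the LoadRunner function, the registration might look like this (the parameter name is arbitrary; as usual, the function has to appear before the request whose response you want to search):

web_reg_save_param_regexp(
    "ParamName=corrViewState",
    "RegExp=VIEWSTATE\"\\s+value=\"([^\"]+)\"",
    LAST);
// ... followed by the web_url()/web_submit_data() step that returns the page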
That's the basics, but what makes regexps so great is that they can do so much more, and this is exactly why they are far superior to the old wrsp() with its clunky boundaries. The json string given above is one example where a regexp works better, but once you get the hang of them it's surprising what you can do. For example, multiple matches.
Multiple matches work in the same way as with the classic wrsp() using boundaries: you specify Ordinal=ALL and get param_1, param_2 … param_count, etc., which by itself is useful. But regexps can do even more than this: you can insert multiple parentheses into a single regexp to get multiple groups, and if any of these groups match multiple times then you create a multi-dimensional array – give this a little thought and you will realise the potential. Sadly, LR11 & 12 only support using a single matching group, so no multi-dimensional arrays yet in HP land.
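A single group with Ordinal=ALL is still fully supported, though. For instance, to capture every href on a page you could register something like this (names illustrative), after which LoadRunner creates QuickLink_1, QuickLink_2, … and QuickLink_count:

web_reg_save_param_regexp(
    "ParamName=QuickLink",
    "RegExp=href=\"([^\"]+)\"",
    "Ordinal=ALL",
    LAST);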
The lack of support for multiple groups is a shame; in JMeter you can have a regexp like:
rows":\[\[“[^"\r\n]*","([A-Z]{3})","[^"\r\n]*","[^"\r\n]*","[^\/]+\/[\d]+?\/2011","[A-Za-z]*","[^"\r\n]*","[^"\r\n]*","([^"\r\n]*)","[^"\r\n]*","([^"\r\n]*)"
which returns multiple matches looking like:
1    {param_g1} = DLY
2    {param_g2} = AMEA
3    {param_g3} = 1945
Instead, for LR, you’d need multiple wrsp_regexp() statements.
In general, one thing you need to be aware of when using regexps in load testing is greediness causing backtracking. This is crucial: if you don't take care, you'll eat CPU on your load generator machines.
Note: there is also a web_reg_save_param_xpath() function, which works better for XML responses. This has also long been a feature of JMeter.