The recorded application becomes unresponsive during the recording

This could be caused by VuGen’s recording mechanism not being able to connect to the application’s server. Network connection errors can be seen in the Recording Log:

[Net An. Warning (1068:197c)] Request Connection: Remote Server @ 123.123.123.123 - 5222 (Service=) Failed attempt #1. Unable to connect to remote server: rc = -1 , le = 10060)
[Net An. Warning (1068:197c)] Request Connection: Remote Server @ 123.123.123.123 - 5222 (Service=) Failed attempt #2. Unable to connect to remote server: rc = -1 , le = 10060)

Note: the Recording log is in the ‘Output’ pane. Make sure ‘Recording’ is selected in the combo-box on the left of that pane.

In order to fix this problem, a port mapping for the specific IP and port should be added to the Port Mapping dialog (under the Record > Recording Options… menu), and the entry should be unchecked. That ensures that the above IP and port are not recorded; the application simply connects to them without any LoadRunner involvement. For the messages above, the correct setting is an entry for server 123.123.123.123 on port 5222, left unchecked.

This workaround is appropriate when the communication to 123.123.123.123:5222 is not important for the business process and can be omitted. If that is not the case, the same entry should still be added, but left checked. That ensures the traffic to this address is captured correctly.

Creating a custom .DLL file in LoadRunner

Sometimes LoadRunner does not provide all the features one needs when doing performance testing. One option is to write DLLs to handle the missing functionality, and writing your own custom DLLs is really easy once you get the hang of it!

I needed to do a few HTTP calls “under the radar” from a script, so using Delphi 2009 I wrote my own HTTPClient DLL that allows me to do an HTTP GET to a specific URL without adding to the statistics (the Pages and Hits/sec stats).

Creating DLLs in Delphi is really simple. One of the things I had to keep in mind was that D2009 uses Unicode internally, so I had to take care to convert any internal strings to non-Unicode before returning them to LR. I opted for using the Indy 10 component TIdHTTP as the base HTTP client, adding some cookie handling and compression support (gzip, deflate) and making sure the DLL was thread-safe (as many client threads/processes would be using it). I also soon realized I needed HTTP proxy support, so in the end I added that too.

I finally had a DLL that exported the following methods:
http_Initialize( VUserID: Integer )
http_WebProxy( VUser: Integer; Host:PAnsiChar; Port: Integer )
http_WebGet( VUserID: Integer; URL: PAnsiChar; DestBuf: PAnsiChar; BufSize: Integer )
http_Finalize( VUserID: Integer )

The VUserID is the identifier for the connection since the DLL supports keep-alive. Calling Finalize() will destroy all cookies and disconnect the client from the server.

 Sample script:

vuser_init()
{
	int ret, VUserID;
	char buf[10240];

	// Get the VUserID parameter
	VUserID = atoi(lr_eval_string("{VUserID}"));

	// Load the DLL
	lr_load_dll("HTTPClient.dll");

	// Initialize the VUser object inside the DLL
	http_Initialize( VUserID );

	// Set the HTTP Proxy
	http_WebProxy( VUserID, "127.0.0.1", 8080 );

	// Clear the buffer, and get the URL's response
	memset(buf, 0, sizeof(buf));
	ret = http_WebGet( VUserID, "http://www.google.com", buf, sizeof(buf) );

	// Check for errors: a non-zero return code means the buffer was
	// too small, and the returned value is the needed size
	if (ret != 0)
		lr_error_message("Error: RetCode=%d", ret);
	else
		lr_output_message("%s", buf);

	// Finalize the VUser (free the VUser object inside the DLL)
	http_Finalize( VUserID );

	return 0;
}


An additional benefit is that the DLL can be loaded under ANY protocol, so HTTP calls can now be made in any script type!

Creating DLL files and using them in LoadRunner scripts

Creating DLLs has tons of advantages, especially when you want to take your scripts off the beaten path. I won’t blabber much in this post and would like to take you right into the topic I want to cover.

In the previous post I mentioned a Search and Replace function. To serve as an example, I have recreated the same Search and Replace function in VC++ 2010 Express Edition and built a DLL file.

Below is the code created in VC++:

#include "stdafx.h"
#include "C:\Program Files\HP\LoadRunner\include\lrun.h"

extern "C" __declspec( dllexport ) int lr_searchReplace(char* inputStr, char* outputStr, char lookupChar, char repChar);

int lr_searchReplace(char* inputStr, char* outputStr, char lookupChar, char repChar)
{

char *ptr =inputStr;
char xchar;
int len=0;
int i=0;

lr_output_message("%s",inputStr);
xchar = *ptr;//Copy initial
len=strlen(inputStr);
while (len>0)
{

len--;
xchar = *ptr;
if(xchar==' ')
{
inputStr[i]= repChar;

}

ptr++;
i++;

}

lr_save_string(inputStr,outputStr);
lr_output_message("%s",inputStr);

return 0;
}


Before attempting to build this DLL project in VC++, ensure that you have added lrun50.lib to the additional linker dependencies in the Project Options, else the project won't build/compile successfully. The library should be present inside the LoadRunner \lib folder. Also make sure you include the lrun.h header file as shown in the above code snippet.

After the project is built, dig into the Debug folder to find the DLL file, then copy it into the LoadRunner script folder and use it as shown below:

Action()
{
	lr_load_dll("SearchNReplace.dll");
	lr_save_string("this is a dummy text", "cProjectName");
	lr_searchReplace(lr_eval_string("{cProjectName}"), "convProjName", ' ', '+');
	lr_output_message("%s", lr_eval_string("{convProjName}"));
	return 0;
}
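With the input above, the value saved in the {convProjName} parameter should come out as "this+is+a+dummy+text", since every space is replaced with a '+'.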

Difference between HTML and URL Mode Recording in LoadRunner

HTML Mode:
  • Default and recommended recording mode.
  • Records HTML actions in the context of the current Web page, so that everything we see on a web page is recorded in a single function, which makes the script easier to read.
  • The advantage of this mode is that it generates a script that is intuitive to the reader in terms of what the step is requesting (in the form of an entire web page).
  • Best for browser applications.

URL Mode:
  • Instructs VuGen to record all requests and resources from the server. It automatically records every HTTP resource as a URL step.
  • Generates a script that has all known resources downloaded for your viewing, which works well with non-HTML applications such as applets and non-browser applications (e.g. Win32 executables).
  • Having everything together creates another problem of overwhelming low-level information and making the script unintuitive. Difficult to read.
  • In the Web (HTTP/HTML) protocol, requests that VuGen cannot recognize are recorded as web_custom_request. In URL mode, an option can also be selected so that recording defaults to web_custom_request for every request (see the sketch after these lists for how the two modes differ).

GUI Mode:
  • Introduced with Web (Click & Script) protocol.
  • The GUI-mode option instructs VuGen to record actions on editable fields in object-based or non-browser applications. It detects the fields that have been edited and generates the script accordingly.
  • The concept is similar to functional testing, where objects are detected at the GUI level. The script is easier to read because it is based on the GUI presented to the real user, in the context of the object.
  • Best for applets and non-browser applications.
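
For a rough feel of the difference between HTML and URL mode, here is a sketch of how the same login action might be recorded in each mode (the step names, URLs and field names below are made up; real recordings vary by application):

// HTML mode: one intuitive, page-level step per user action
web_submit_form("login.do",
	ITEMDATA,
	"Name=username", "Value=jdoe", ENDITEM,
	"Name=password", "Value=secret", ENDITEM,
	LAST);

// URL mode: the same action broken into individual HTTP-level steps,
// one per request and one per resource
web_submit_data("login.do",
	"Action=http://server/app/login.do",
	"Method=POST",
	"Mode=HTTP",
	ITEMDATA,
	"Name=username", "Value=jdoe", ENDITEM,
	"Name=password", "Value=secret", ENDITEM,
	LAST);

web_url("logo.gif",
	"URL=http://server/app/images/logo.gif",
	"Resource=1",
	"Mode=HTTP",
	LAST);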

Disabling and Enabling Rendezvous Points in LoadRunner

Rendezvous Points:
During a scenario run, you can instruct multiple Vusers to perform tasks simultaneously by using rendezvous points. A rendezvous point creates intense user load on the server and enables LoadRunner to measure server performance under load. When a Vuser arrives at the rendezvous point, it is held there by the Controller. You then set a rendezvous policy according to which the Controller releases the Vusers from the rendezvous point, either when the required number of Vusers arrives or when a specified amount of time has passed. You define rendezvous points in the Vuser script.

B. Using the Controller, you can influence the level of server load by selecting:

1. Which of the rendezvous points will be active during the scenario.
2. How many Vusers will take part in each rendezvous.

C. In the VuGen script, include lr_rendezvous(“rendezvous_name”); just above the start of the transaction where the rendezvous of all the Vusers must occur.
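
A minimal sketch of what that looks like in a script (the rendezvous, transaction and URL names below are made up):

	// Hold this Vuser here until the Controller's rendezvous policy releases it
	lr_rendezvous("submit_order");

	lr_start_transaction("submit_order");

	web_url("order",
		"URL=http://server/app/order.do",
		LAST);

	lr_end_transaction("submit_order", LR_AUTO);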

D. How to Set Up a Rendezvous in a Scenario

Prerequisites:

To set up a rendezvous in the scenario, your scenario must include Vuser scripts that
have rendezvous points inserted in them. When you add a Vuser group or script to
the scenario, LoadRunner scans the included scripts for the names of the rendezvous
points and adds them to the list of rendezvous points. You can see the list of all the
rendezvous points in your scenario by selecting Scenario > Rendezvous.

Note: In goal-oriented scenarios, a script’s rendezvous points are disabled.

Set the level of emulated user load


Select the rendezvous points to take part in the scenario, and the number of Vusers
to participate in each rendezvous. You can temporarily disable a rendezvous and
exclude it from the scenario. You can disable a rendezvous point for all Vusers in a
scenario, or you can temporarily disable specific Vusers from participating in the
rendezvous. By disabling and enabling a rendezvous, you influence the level of server
load.

 Set the attributes for the rendezvous policy – Optional


In the Rendezvous Information dialog box, for each rendezvous:
a. Select the rendezvous, and click the Policy button.
b. In the Policy dialog box, set the policy attributes as follows:
I. Release. How many Vusers will be released from a rendezvous at a time.
II. Timeout. How long the Controller waits before releasing Vusers from a rendezvous.


By disabling and enabling rendezvous points, you can influence the level of server load

Disabling a rendezvous temporarily removes it from the rendezvous list and excludes it from the scenario.

Enabling a rendezvous returns it to the Rendezvous list and includes it in the scenario.

You use the Disable and Enable commands to change the status of rendezvous points during a scenario.

To disable a rendezvous: 


1 Open the Rendezvous window. The Rendezvous menu appears in the LoadRunner menu bar.

2 Click a rendezvous. The selected rendezvous is highlighted.

3 Choose Rendezvous > Disable, or click the Disable button. The rendezvous name changes from black to gray and the rendezvous is disabled.
To enable a rendezvous: 

1 Open the Rendezvous window. The Rendezvous menu appears in the LoadRunner menu bar.

2 Click a disabled rendezvous. The selected rendezvous is highlighted.

3 Choose Rendezvous > Enable, or click the Enable button. The rendezvous name changes from gray to black and the rendezvous is enabled.

How to Review SRS Document and Create Test Scenarios

SDLC’s Design Phase:

The next phase in the SDLC is “Design”; this is where the functional requirements are translated into technical details. The dev, design, environment and data teams are involved in this step. The outcome of this step is typically a Technical Design Document (TDD). The input is the SRS document, both for the creation of the TDD and for the QA team to start working on the QA aspect of the project, which is to review the SRS and identify the test objectives.


What is an SRS review?

SRS is a document that is created by the development team in collaboration with business analysts and environment/data teams. Typically, once finalized, this document is shared with the QA team in a meeting where a detailed walkthrough is arranged. Sometimes, for an already existing application, we might not need a formal meeting or someone guiding us through this document; we might have the necessary information to do this by ourselves.

SRS review is nothing but going through the functional requirements specification document and trying to understand what the target application is going to be like.

The formal format and a sample were shared with you all in the previous article. It does not necessarily mean that all SRSs are going to be documented that way exactly. Always, form is secondary to the content. Some teams will just choose to write a bulleted list, some teams will include use cases, some teams will include sample screenshots (like the document we had) and some just describe the details in paragraphs.
Pre-steps to software requirements specification review:

Step #1: Documents go through multiple revisions, so make sure we have the right version of the reference document, the SRS.

Step #2: Establish guidelines on what is expected at the end of the review from each team member. In other words, decide on what deliverables are expected from this step – typically, the output of this step is to identify the test scenarios. Test scenarios are nothing but one line pointers of ‘what to test’ for a certain functionality.

Step #3: Also establish guidelines on how this deliverable is to be presented; in other words, the template.

Step #4: Decide on whether each member of the team is going to work on the entire document or divide it among themselves. It is highly recommended that everyone reads everything, because that will prevent knowledge concentration with certain team members. But in the case of a huge project, with the SRS document running close to 1000 pages, the approach of breaking the document up module-wise and assigning modules to individual team members is most practical.

Step #5: SRS review also helps in better understanding if there are any specific prerequisites required for the testing of the software.

Step #6: As a byproduct, a list of queries is identified: places where some functionality is difficult to understand, where more information needs to be incorporated into the functional requirements, or where mistakes were made in the SRS.
What do we need to get started?
  • The correct version of the SRS document
  • Clear instructions on who is going to work on what and how much time they have
  • A template to create test scenarios
  • Other information on who to contact in case of a question, or whom to report to in case of a documentation inconsistency

Who would provide this information?

Team leads are generally responsible for providing all the items listed in the section above. However, team members’ inputs are always important for the success of this entire endeavor.

Team leads often ask: What kind of inputs? Wouldn’t it be better to assign a certain module to someone interested in it than to a team member who is not? Wouldn’t it be nicer to decide on the target date based on the team’s opinion rather than thrust a decision on them? Also, for the success of a project, templates are important. As a general rule, templates have a higher rate of efficiency when they are tailored to the specific team’s convenience and comfort.

It should therefore be noted that team leads, more than anything, are team members. Getting your team on board with the day-to-day decisions is crucial for the smooth running of the project.
Why a template for test scenarios – isn’t it enough if we just make a list?

It sure is. However, software projects are not ‘one-man’ shows; they involve teamwork. Imagine a team of 4, where each member decides to review one module of the software requirements specification. Team member 1 makes a list on a sheet of paper, team member 2 uses an Excel sheet, team member 3 uses a notepad, and team member 4 uses a Word doc. How do we consolidate all the work done for the project at the end of the day?

Also, how can we decide which one is the standard and how can we say what is right and what’s not if we did not create the rules to begin with?

That is what a template is: a set of guidelines and an agreed format for uniformity and concurrence across the entire team.

The template below will let us create the test scenarios. The columns included are:

Column #1: Test scenario ID
Every entity in our testing process has to be uniquely identifiable, so every test scenario has to be assigned an ID. The rules to follow while assigning this ID have to be defined. For the sake of this article we are going to follow this naming convention: TS (a prefix that stands for Test Scenario) followed by ‘_’, the module name MI (the My Info module of the Orange HRM project) followed by ‘_’, and then the sub-section (e.g. MIM for the My Info module, P for photograph, and so on) followed by a serial number. An example would be: “TS_MI_MIM_01”.

Column #2: Requirement
It helps, when we create a test scenario, to be able to map it back to the section of the SRS document it was picked from. If the requirements have IDs we can use those. If not, section numbers or even page numbers of the SRS document where we identified a testable requirement will do.

Column #3: Test scenario description
A one-liner specifying ‘what to test’. We also refer to it as the test objective.

Column #4: Importance
This is to give an idea about how important a certain functionality is for the AUT. Values like high, medium and low can be assigned to this field. You could also choose a point system, like 1-5, with 5 being most important and 1 being least important. Whatever values this field can take, they have to be pre-decided.

Column #5: No. of Test cases
A rough estimate of how many individual test cases we might end up writing for that one test scenario. For example, to test the login we include these situations: correct username and password; correct username and wrong password; correct password and wrong username. So validating the login functionality will result in 3 test cases.
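
For illustration, a hypothetical filled-in row of the template (the section reference and test case count here are made up) could read: TS_MI_MIM_01 | SRS section 5.2 | Validate that the My Info page displays the logged-in employee’s details | High | 4.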

Note: You can expand this template or remove the fields as you see fit.

Distributed load testing in JMeter

What is distributed load testing?
Distributed load testing is the process in which multiple systems are used to simulate the load of a large number of users. In JMeter this is achieved by creating a Master-Slave configuration.

Why is it required?
The reason for using more than one system for load testing is the limited ability of a single system to generate a large number of threads (users).

What other options do we have?
Apart from distributed load testing, we can also perform load testing over the cloud. Load testing on the cloud (for example on Amazon’s EC2) has several advantages: easy scalability, no maintenance, fast deployment and no artificial network bottlenecks.
Another alternative is BlazeMeter, a cloud-based service compatible with Apache JMeter. It generates a large amount of instant load and provides very comprehensive reporting and analysis features.
Also, we can perform distributed load testing on the cloud, in which multiple machines on the cloud are used for generating a large amount of load.


Distributed Load Testing using JMeter:
For distributed load testing we need to create a Master-Slave configuration in which the Master controls all the slaves and collects the test results. To make the setup work, the firewall needs to be turned off and all the systems need to be in the same subnet.
Also, preferably all the systems should use the same versions of JMeter and Java.


1. First of all we need to start the JMeter server on the slave systems. To do this, just go to the bin folder inside the JMeter home directory and run the batch file jmeter-server.bat (for Windows) or jmeter-server (for Linux).
2. Now, on the master system, open the properties file jmeter.properties and edit the remote_hosts entry. Remove the loopback address (127.0.0.1) from the remote_hosts entry and specify the IP addresses of all the slave systems, separated by commas (see the example after these steps).


3. Finally, we need to remote-start all the slave machines from JMeter. To do this, just open JMeter on the Master machine (the one whose properties file was just edited), open your test script and remote start all the nodes.
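
As a rough example (the IP addresses below are placeholders for your actual slave machines), the edited entry in jmeter.properties would look something like this:

remote_hosts=192.168.0.101,192.168.0.102,192.168.0.103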

Understanding the Summary Report in JMeter

The summary report shows values for the measurements JMeter has made while calling the same page as if many users were calling it. It gives the results in tabular format, which you can save as a .csv file.

These are the main headings in the Summary Report listener. Let’s understand them in detail:


Summary Report:
The columns of the Summary Report are: Label, Samples, Average, Min, Max, Std. Dev, Error %, Throughput, KB/sec and Avg. Bytes.

Label: In the Label column you will be able to see all the recorded HTTP requests, during or after the test run.

Samples: Samples denotes the number of times the HTTP request ran. For example, if we have one HTTP request and we run it with 5 users, the number of samples will be 5x1=5.
Similarly, if the sample runs two times for a single user, the number of samples for 5 users will be 5x2=10.

Average: Average is the average response time for that particular HTTP request, in milliseconds. For example, for the first label the number of samples is 4, because that sample ran 2 times for a single user and the test was run with 2 users; for those 4 samples the average response time was 401 ms.

Min: Min denotes the minimum response time taken by the HTTP request. In the example, the minimum response time for the first four samples is 266 ms, meaning one HTTP request out of the four samples responded in 266 ms.

Max: Max denotes the maximum response time taken by the HTTP request. In the example, the maximum response time for the first four samples is 552 ms, meaning one HTTP request out of the four samples responded in 552 ms.

Std. Dev: This shows how much the samples deviated from the average response time. The lower this value, the more consistent the response time pattern is assumed to be.

Error %: This denotes the percentage of failed samples during the run. The failure can be a 404 (file not found), an exception, or any other kind of error during the test run. In the example the Error % is zero, because all the requests ran successfully.

Throughput: The throughput is the number of requests per unit of time (seconds, minutes, hours) that are sent to your server during the test.
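
For example (with purely illustrative numbers), if 30 requests complete over a 10-second test, the reported throughput is 30/10 = 3 requests per second.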

Why is the Average Decreasing while Percentiles are Increasing? (LoadRunner and Performance Testing)

Why Averages Suck and Percentiles are Great

Anyone who has ever monitored or analyzed an application uses or has used averages. They are simple to understand and calculate, yet we tend to ignore just how wrong the picture is that averages paint of the world. To emphasize the point, let me give you a real-world example outside of the performance space that I read recently in a newspaper.

The article explained that the average salary in a certain region in Europe was 1900 Euros (to be clear, this would be quite good in that region!). However, when looking closer they found out that the majority, namely 9 out of 10 people, only earned around 1000 Euros and one earned 10,000 (I have oversimplified this of course, but you get the idea). If you do the math you will see that the average of this is indeed 1900, but we can all agree that this does not represent the “average” salary as we would use the word in day-to-day life. So now let’s apply this thinking to application performance.
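
To make the arithmetic explicit: (9 × 1,000 + 1 × 10,000) / 10 = 19,000 / 10 = 1,900 Euros, even though 9 out of 10 people earn barely more than half of that “average”.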


The Average Response Time:


The average response time is by far the most commonly used metric in application performance management. We assume that this represents a “normal” transaction; however, this would only be true if the response time is always the same (all transactions run at equal speed) or the response time distribution is roughly bell-curved.




A Bell Curve represents the “normal” distribution of response times, in which the average and the median are the same. It rarely ever occurs in real applications.

In a Bell Curve the average (mean) and median are the same. In other words, the observed performance would represent the majority (half or more than half) of the transactions.
In reality most applications have a few very heavy outliers; a statistician would say that the curve has a long tail. A long tail does not imply many slow transactions, but a few that are magnitudes slower than the norm.




This is a typical Response Time Distribution with few but heavy outliers – it has a long tail. The average here is dragged to the right by the long tail.

We recognize that the average no longer represents the bulk of the transactions but can be a lot higher than the median.


You can now argue that this is not a problem as long as the average doesn’t look better than the median. I would disagree, but let’s look at another real-world scenario experienced by many of our customers:




This is another typical Response Time Distribution. Here we have quite a few very fast transactions that drag the average to the left of the actual median

In this case a considerable percentage of transactions are very, very fast (10-20 percent), while the bulk of transactions are several times slower. The median would still tell us the true story, but the average all of a sudden looks a lot faster than most of our transactions actually are. This is very typical in search engines or when caches are involved: some transactions are very fast, but the bulk are normal. Another reason for this scenario is failed transactions, more specifically transactions that failed fast. Many real-world applications have a failure rate of 1-10 percent (due to user errors or validation errors). These failed transactions are often magnitudes faster than the real ones and consequently distort the average.

Of course performance analysts are not stupid and regularly try to compensate with higher-frequency charts (compensating by looking at smaller aggregates visually) and by taking in minimum and maximum observed response times. However, we can often only do this if we know the application very well; those unfamiliar with the application might easily misinterpret the charts. Because of the depth and type of knowledge required for this, it is difficult to communicate your analysis to other people; think how many arguments between IT teams have been caused by this. And that’s before we even begin to think about communicating with business stakeholders!


A far better metric is the percentile, because it allows us to understand the distribution. But before we look at percentiles, let’s take a look at a key feature in every production monitoring solution: Automatic Baselining and Alerting.
Automatic Baselining and Alerting


In real-world environments, performance gets attention when it is poor and has a negative impact on the business and users. But how can we identify performance issues quickly to prevent negative effects? We cannot alert on every slow transaction, since there are always some. In addition, most Operations teams have to maintain a large number of applications and are not familiar with all of them, so manually setting thresholds can be inaccurate, quite painful and time consuming.


The industry has come up with a solution called Automatic Baselining. Baselining calculates the “normal” performance and only alerts us when an application slows down or produces more errors than usual. Most approaches rely on averages and standard deviations.


Without going into statistical details, this approach again assumes that the response times are distributed over a bell curve:




One standard deviation around the mean covers roughly 68% of all transactions, and two standard deviations cover roughly 95%, so everything outside that range could be considered an outlier. However, most real-world scenarios are not bell-curved…

Typically, transactions that are outside 2 times the standard deviation are treated as slow and captured for analysis. An alert is raised if the average moves significantly. In a bell curve this would account for roughly the slowest 2.5 percent (and you can of course adjust that); however, if the response time distribution does not represent a bell curve it becomes inaccurate. We either end up with a lot of false positives (transactions that are a lot slower than the average but, when looking at the curve, lie within the norm) or we miss a lot of problems (false negatives). In addition, if the curve is not a bell curve, then the average can differ a lot from the median, and applying a standard deviation to such an average can lead to quite a different result than you would expect! To work around this problem these algorithms have many tunable variables and a lot of “hacks” for specific use cases.

Why do we consider percentiles rather than average response times?

Why Percentiles are always chosen:

A percentile tells me which part of the curve I am looking at and how many transactions are represented by that metric. To visualize this, look at the following chart:



This chart shows the 50th and 90th percentiles along with the average of the same transaction. It shows that the average is influenced far more heavily by the 90th percentile, and thus by outliers, than by the bulk of the transactions.

The green line represents the average. As you can see it is very volatile. The other two lines represent the 50th and 90th percentiles. As we can see, the 50th percentile (or median) is rather stable but has a couple of jumps. These jumps represent real performance degradation for the majority (50%) of the transactions. The 90th percentile (this is the start of the “tail”) is a lot more volatile, which means that the outliers’ slowness depends on data or user behavior. What’s important here is that the average is heavily influenced (dragged) by the 90th percentile, the tail, rather than by the bulk of the transactions.
If the 50th percentile (median) of a response time is 500ms that means that 50% of my transactions are either as fast or faster than 500ms. If the 90th percentile of the same transaction is at 1000ms it means that 90% are as fast or faster and only 10% are slower. The average in this case could either be lower than 500ms (on a heavy front curve), a lot higher (long tail) or somewhere in between. A percentile gives me a much better sense of my real world performance, because it shows me a slice of my response time curve.
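
To make this concrete, here is a small, self-contained C sketch (the response-time values are made up) that computes the average and the nearest-rank 50th and 90th percentiles for a handful of samples; the single 4000 ms outlier drags the average to roughly 695 ms while the median stays at 260 ms:

#include <stdio.h>
#include <stdlib.h>

/* comparison helper for qsort */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;
	return (x > y) - (x < y);
}

/* Nearest-rank percentile over a sorted array: the value below which
   roughly p percent of the samples fall. */
static double percentile(const double *sorted, int n, double p)
{
	int rank = (int)(p / 100.0 * n + 0.5);   /* round to the nearest rank */
	if (rank < 1) rank = 1;
	if (rank > n) rank = n;
	return sorted[rank - 1];
}

int main(void)
{
	/* hypothetical response times in milliseconds */
	double rt[] = { 180, 210, 230, 240, 260, 280, 300, 350, 900, 4000 };
	int n = sizeof(rt) / sizeof(rt[0]);
	double sum = 0.0;
	int i;

	qsort(rt, n, sizeof(rt[0]), cmp_double);
	for (i = 0; i < n; i++)
		sum += rt[i];

	/* The average is dragged up by the one heavy outlier,
	   while the median still represents the bulk of the samples. */
	printf("average = %.0f ms\n", sum / n);
	printf("p50     = %.0f ms\n", percentile(rt, n, 50.0));
	printf("p90     = %.0f ms\n", percentile(rt, n, 90.0));
	return 0;
}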


For exactly that reason percentiles are perfect for automatic baselining. If the 50th percentile moves from 500ms to 600ms I know that 50% of my transactions suffered a 20% performance degradation. You need to react to that.

In many cases we see that the 75th or 90th percentile does not change at all in such a scenario. This means the slow transactions didn’t get any slower, only the normal ones did. Depending on how long your tail is the average might not have moved at all in such a scenario!

In other cases we see the 98th percentile degrading from 1s to 1.5 seconds while the 95th is stable at 900ms. This means that your application as a whole is stable, but a few outliers got worse, nothing to worry about immediately. Percentile-based alerts do not suffer from false positives, are a lot less volatile and don’t miss any important performance degradations! Consequently a baselining approach that uses percentiles does not require a lot of tuning variables to work effectively.


The screenshot below shows the Median (50th percentile) for a particular transaction jumping from about 50ms to about 500ms and triggering an alert, as it is significantly above the calculated baseline (green line). The chart labeled “Slow Response Time” on the other hand shows the 90th percentile for the same transaction. These “outliers” also show an increase in response time, but not significant enough to trigger an alert.




Here we see an automatic baselining dashboard with a violation at the 50th percentile. The violation is quite clear; at the same time the 90th percentile (right upper chart) does not violate. Because the outliers are so much slower than the bulk of the transactions, an average would have been influenced by them and would not have reacted quite as dramatically as the 50th percentile. We might have missed this clear violation!

How can we use percentiles for tuning?

Percentiles are also great for tuning, and for giving your optimizations a particular goal. Let’s say that something within my application is too slow in general and I need to make it faster. In this case I want to focus on bringing down the 90th percentile. This would ensure that the overall response time of the application goes down. In other cases, where I have unacceptably long outliers, I want to focus on bringing down the response time for transactions beyond the 98th or 99th percentile (only the outliers). We see a lot of applications that have perfectly acceptable performance for the 90th percentile, with the 98th percentile being magnitudes worse.

In throughput-oriented applications, on the other hand, I want to make the majority of my transactions very fast, while accepting that an optimization might make a few outliers slower. I might therefore make sure that the 75th percentile goes down while trying to keep the 90th percentile stable, or at least not getting a lot worse.

I could not make the same kind of observations with averages, minimum and maximum, but with percentiles they are very easy indeed.

Conclusion
Averages are ineffective because they are too simplistic and one-dimensional. Percentiles are a really great and easy way of understanding the real performance characteristics of your application. They also provide a great basis for automatic baselining, behavioural learning and optimizing your application with a proper focus. In short, percentiles are great!