Mitigation of Performance Testing Impediments

An impediment is anything that prevents people from doing their job. Here are some impediments that performance testing teams have encountered.

A. Unavailability of subject matter / technical experts such as developers and operations staff.

B. Unavailability of applications to test due to delays or defects in the functionality of the system under test.

C. Lack of connectivity/access to resources due to network security ports being blocked or other network restrictions.

D. The script recorder fails to recognize applications (due to non-standard security apparatus or other complexity in the application).

E. Not enough test data to cover the unique conditions needed during runs that often last several hours.

F. Delays in obtaining enough software licenses and hardware in the performance testing environment.

G. Lack of correspondence between versions of applications in performance versus in active development.

H. Managers not familiar with the implications of ad-hoc approaches to performance testing.

Fight Or Flight? Proactive or Reactive?

      Some call the list above "risks" which an organization may theoretically face.

      Risks become "issues" when they actually impact a project.

      A proactive management style sees value in investing up-front to ensure that desired outcomes occur, rather than "fighting fires" that break out without preparation.

      A reactive management style believes in "conserving resources" by not dedicating them to situations that may never occur, and addresses risks only when they materialize.

 
Subject Matter Expertise

      The Impediment
      Knowledge about a system and how it works is usually not readily available to those outside the development team.

      What documentation exists is often one or more versions behind what is under development.

      Requirements and definitions are necessary to determine whether a particular behavior is intended or a deviation from a requirement.

      Even if load testers have access to up-to-the-minute wiki entries, load testers usually are not free to interact as a peer of developers.

      Load testers are usually not considered part of the development team or even the development process, and are therefore perceived as an intrusion by developers.

      To many developers, performance testers are a nuisance who waste time poking around a system that is "already perfect" or "one we already know is slow".

      What can reactive load testers do?
      Work among developers and eavesdrop on their conversations (like those studying animals in the wild).

      What can proactive load testers do?
      Up-front, an executive formally establishes expectations for communication and coordination between developers and load testers.

      Ideally, load testers participate in the development process from the moment a development team is formed so that they are socially bonded with the developers.

      Recognizing that developers are under tight deadlines, the load test team member defines exactly what is needed from the developer and when it is needed.

      This requires up-front analysis of the development organization:

          o the components of the application
          o which developers work on which component
          o contact information for each developer
          o existing documents available and who wrote each document
          o comments in blogs written by each developer

      An executive assigns a "point person" within the development organization who can provide this information.

      Assignments for each developer need to originate from the development manager under whom that developer works.

            When one asks or demands something without the authority to do so, that person will over time be perceived as a nuisance.

            No one can serve two masters. For you will hate one and love the other; you will be devoted to one and despise the other.

      A business analyst who is familiar with the application's intended behavior makes a video recording of the application using a utility such as Camtasia from TechSmith. A recording has the advantage of capturing the timing as well as the steps.

              

The U.S. military developed the web-based CAVNET system to collaborate on innovations to improvise around impediments found in the field.


Availability of applications

      The Impediment
      Parts of an application under active development become inaccessible while developers are in the middle of working on them.

      The application may not have been built successfully. There are many root causes for bad builds:

          o Specifications of what goes into each build are not accurate or complete.
          o Resources intended to go into a particular build are not made available.
          o An incorrect version of a component is built with newer incompatible components.
          o Build scripts and processes do not recognize these potential errors, leading to build errors.
          o Inadequate verification of build completeness.

      What can reactive load testers do?
      Frequent (nightly) builds give testers more opportunities than losing perhaps weeks waiting for the next good build.

      Testers switch to another project/application when one application cannot be tested.

      What can proactive load testers do?
      Use a separate test environment that is updated from the development system only when parts of the application become stable enough to test.

      Have a separate test environment for each version so that work on a prior version can continue when a build fails in one particular environment.

      Develop a "smoke test" suite to ensure that applications are testable.

      Coordinate testing schedules with what is being changed by developers.

      Analyze the root causes of why builds are not successful, and track progress on eliminating those causes over time.

              
Connectivity/access to resources

      The Impediment
      Workers may not be able to reach the application because of network (remote VPN) connectivity or security access.

      What can reactive load testers do?
      Work near the physical machine.

      Grant unrestricted access to those working on the system.

      What can proactive load testers do?
      Analyze the access for each functionality required by each role.

      Pre-schedule when those who grant access are available to the project.
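The access analysis above can be backed by an automated pre-flight check. Here is a minimal sketch (the host/port pairs are hypothetical; substitute the endpoints your load generators must actually reach) that probes each required endpoint before a test window opens, so blocked ports surface before scripting begins:

```python
import socket

def check_endpoints(endpoints, timeout=3.0):
    """Probe each (host, port) pair; return the ones that are unreachable."""
    failures = []
    for host, port in endpoints:
        try:
            # Succeeds only if the port is open and reachable through firewalls.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            failures.append((host, port, str(exc)))
    return failures

# Illustrative endpoints a load generator might need (hypothetical names).
required = [("app.test.example", 443), ("db.test.example", 1521)]
```

Running a check like this from each load generator at the start of every test day turns a vague "cannot connect" impediment into a concrete list for the network team.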

              
 Script Recorder Recognition

      The Impediment
      Load test script creation software such as LoadRunner works by capturing what goes across the wire and displaying those conversations as script code which may be modified by humans.

      Such recording mechanisms are designed to recognize only standard protocols going through the wire.

      Standard recording mechanisms will not recognize custom communications, especially within applications using advanced security mechanisms.

      Standard recording mechanisms also have difficulty recognizing complex use of Javascript or CSS syntax in SAP portal code.

      What can reactive load testers do?
      Skip (de-scope) portions which cannot be easily recognized.

      What can proactive load testers do?
      To ensure that utility applications (such as LoadRunner) can be installed, install them before locking down the system.

              
 Test Data

      The Impediment
      Applications often accept only certain combinations of values. For example, only specific postal ZIP codes are valid within a given US state.

      Using the same value repeatedly during load testing does not create a realistic emulation of actual behavior, because most modern systems cache data in memory, which is orders of magnitude faster than retrieving data from disk.

      Role permissions also have a different impact on the system. For example, the screen of an administrator or manager has more options, and the more options, the more resources it takes to display the screen and to edit input fields.

      A wide variation in data values forces databases to take time to scan through files. Specifying an index used to retrieve data is the most common approach to make applications more efficient.

      Growth in the data volume handled by a system can render indexing schemes inefficient at the new level of data.

      What can reactive load testers do?
      Use a single role for all testing.

      Qualify results from each test with the amount of data used to conduct each test.

      Use trial-and-error approaches to find combinations of values which meet field validation rules.

      Examine application source code to determine the rules.

      Analyze existing logs to define the distribution of function invocations during test runs.

      What can proactive load testers do?
      Project the likely growth of the application in terms of the number of rows in each key data entity. This information is then used to define the growth in rows in each table.

      Define procedures for growing the database size, using randomized data values in names.
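A minimal sketch of such a growth procedure, using Python's built-in sqlite3 and a hypothetical customers table (a real procedure would target your actual schema and projected row counts):

```python
import random
import sqlite3
import string

def grow_table(conn, n_rows, seed=42):
    """Insert n_rows customers with randomized names so that caching
    during load tests behaves more like production."""
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
    rows = (("".join(rng.choices(string.ascii_lowercase, k=8)),)
            for _ in range(n_rows))
    conn.executemany("INSERT INTO customers (name) VALUES (?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
grow_table(conn, 10_000)  # e.g., the projected one-year row count
```

The randomized names keep the database from serving every request out of a tiny cached working set, which is the failure mode described above.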

              
 Test Environment

      The Impediment
      Creating a separate environment for load testing can be expensive for a large, complex system.

      In order to avoid overloading the production network, the load testing environment is often set up so that no communication is possible with the rest of the network. This makes it difficult to deploy resources into the environment and then retrieve run result files from it.

      A closed environment requires its own set of utility services such as DNS, authentication (LDAP), time synchronization, etc.

      What can reactive load testers do?
      Change network firewalls temporarily while using the development environment for load testing (when developers do not use it).

      Use the production fail-over environment temporarily and hope that it is not needed during the test.

      What can proactive load testers do?
      Build up a production environment and use it for load testing before it is used in actual production.

              
Correspondence Between Versions

      The Impediment
      Defects found in the version running on the perftest environment may not be reproducible by developers in the development/unit test environments running a different (more recent) version.

      Developers may have moved on to a different version, different projects, or even different employers.

      What can reactive load testers do?
      Rerun short load tests on development servers. If a server is shared, the productivity of developers will be affected.

      What can proactive load testers do?
      Before testing, freeze the total state of the application in a full back-up so that the exact state of the system can be restored, even after changes are tried to diagnose or fix the application on the system where it's found.

      Run load tests with trace logging enabled. Note that this does not duplicate how the system actually runs in production mode.

              
 Ad-hoc Approaches

      The Impediment
      Most established professional fields (such as accounting and medicine) have laws, regulations, and defined industry practices which give legitimacy to certain approaches. People are trained to follow them. The consequences of certain courses of action are known.

      But the profession of performance and load testing has not matured to that point.

      The closest industry document, ITIL, is not yet universally adopted. And ITIL does not clarify the work of performance testing in much detail.

      Consequently, each individual involved with load testing is likely to have his/her own opinions about what actions should be taken.

      This makes rational exploration of the implications of specific courses of action a conflict-ridden and thus time-consuming and expensive endeavor.

      What can reactive load testers do?
      Allocate time for planning before starting actual work, until concurrence on the project plan is achieved among the stakeholders.

      Revise project completion estimates or scope as new information becomes available.

      What can proactive load testers do?
      Before the project gets under way, agree on the rationale for elements of the project plan and who will do what when (commitments of tasks and deliverables). This is difficult for those who are not accustomed to being accountable, and requests for it may result in withdrawal or other defensive behavior.

      Identify alternative approaches and analyze them before managers come up with them themselves.

      Up-front, identify how to contact each stakeholder and keep them updated at least weekly, and immediately if decisions impact what they are actively working on.

      If a new manager is inserted in the project after it starts, review the project plan and rationale for its elements.

Top 10 performance issues with a Database

Here is a list of the top 10 performance issues with a database and their most probable solutions.

Too many calls to the DB - There might be multiple trips to a single DB from various middleware components, and any of the following scenarios may occur:


1. More data is requested than necessary, primarily for faster rendering (but in the end slowing down overall performance)
2. Multiple applications requesting the same data.
3. Multiple queries are executed which in the end return the same result
This kind of problem generally arises when there is too much object orientation. The key is to strike a balance between how many objects to create and what to put in each object. Object-oriented programming may be good for maintenance, but it surely degrades performance if objects are not handled correctly.
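A common form of "too many calls" is the N+1 pattern, where each object fetches its own data. The sketch below uses Python's sqlite3 with a hypothetical customers/orders schema to show the same result obtained in one round trip instead of N+1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Anti-pattern: one trip per customer (N+1 queries).
def totals_n_plus_one(conn):
    totals = {}
    for cid, name in conn.execute("SELECT id, name FROM customers"):
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cid,)).fetchone()
        totals[name] = row[0]
    return totals

# Better: one round trip that returns the same result.
def totals_single_query(conn):
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name
    """)
    return dict(rows)
```

With an in-memory database the difference is invisible, but over a network each extra round trip adds latency that multiplies under load.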

Too much synchronization – Most developers tend to over-synchronize: large pieces of code are synchronized when smaller critical sections would do. This is generally fine under low load, but under high load the performance of the application will definitely take a beating. How do you determine whether the application has sync issues? The easiest way (though not 100% foolproof) is to chart CPU time and execution time.

CPU time – the time spent on the CPU by the executed code.
Execution time – the total time the method takes to execute. It includes everything: CPU, I/O, waiting to enter a sync block, etc.
Generally the gap between the two times gives the waiting time. If our troublemaking method makes neither an I/O call nor an external call, then it is most probably a sync issue that is causing the slowness.
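The same comparison can be scripted. In Python terms (the idea carries over to Java profilers), time.process_time() tracks CPU time while time.perf_counter() tracks execution (wall) time; a large gap in a method that does no I/O points to waiting, for example on a lock. The sleeping function below is a stand-in for a blocked thread:

```python
import time

def profile(fn):
    """Return (cpu_seconds, wall_seconds) for one call to fn."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - cpu0, time.perf_counter() - wall0

def waits_on_a_lock():
    # Stand-in for time spent blocked on a contended monitor or sync block.
    time.sleep(0.5)

cpu, wall = profile(waits_on_a_lock)
gap = wall - cpu  # approximate waiting time
```

Charting this gap per method over a load test run is one way to spot over-synchronized code paths.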

Joining too many tables – The worst kind of SQL issues creep up when too many tables are joined and data is to be extracted from them. Sometimes it is just unfortunate that so many tables must be joined to pull out the necessary data.
There are two ways to attack this problem
1) Is it possible to denormalize a few tables so that fewer joins are needed?
2) Is it possible to create a summary table, updated periodically, with most of the information?
Returning a large result set - Generally no user will go through thousands of records in the result set. Most users will limit themselves to the first few hundred (or the first 3-4 pages). By returning all the results, the developer is not only slowing the database but also choking the network. Breaking the result set into batches (on the database side) will generally solve this issue (though it is not always possible).
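One way to batch on the database side is keyset pagination, sketched here with sqlite3 and a hypothetical items table. Each page picks up where the previous one ended, so the database never materializes the full result set for one request:

```python
import sqlite3

def fetch_page(conn, last_id=0, page_size=100):
    """Keyset pagination: fetch the page after last_id (cheaper than OFFSET)."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(1, 1001)])

page1 = fetch_page(conn)                        # rows 1..100
page2 = fetch_page(conn, last_id=page1[-1][0])  # rows 101..200
```

Seeking on the key rather than using a growing OFFSET keeps later pages as cheap as the first one.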

Joining tables in the middleware – SQL is a fantastic language for data manipulation and retrieval. There is simply no need to move data to a middle tier and join tables there. Joining data in the middle tier generally causes:
1. Unnecessary load on the network, as it has to transport data back and forth
2. Increased memory requirements on the application server to handle the extra load
3. A drop in server performance, as the app tier is mainly held up with processing large queries
The best way to approach this problem is to use inner and outer joins right in the database itself. This way, all the power of SQL and the database is utilized for processing the query.

Ad hoc queries – Just because SQL gives you the ability to create and use ad hoc queries, there is no point in abusing them. In quite a few cases, ad hoc queries create more mess than the advantage they bring. The best way is to use stored procedures. This is not always possible; sometimes ad hoc queries are necessary, and then there is no option but to use them. But whenever possible, it is recommended to use stored procedures. The main advantages of stored procedures are:
1. Precompiled and ready
2. Optimized by the DB
3. Stored on the DB server, i.e., no network transmission of large SQL requests

Lack of indices – You see that the data is not large, yet the DB seems to take an abnormally long time to retrieve results. The most probable cause is a missing or misconfigured index. At first sight it might seem trivial, but when the data grows large it plays a significant role: there can be a significant hit in performance if the indices are not configured properly.
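The effect is easy to demonstrate. This sketch (sqlite3, hypothetical orders table) captures the query plan before and after adding an index; the plan flips from a full table scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [(f"cust-{i % 500}", float(i)) for i in range(5000)])

def plan(conn, sql):
    """Return SQLite's query plan as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r[-1]) for r in rows)

query = "SELECT total FROM orders WHERE customer = 'cust-42'"
before = plan(conn, query)   # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
after = plan(conn, query)    # search using the new index
```

Checking plans like this in the performance environment catches the "small data now, slow at scale" trap before data volume grows.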

Fill factor – One of the other things to consider along with indexing is the fill factor. MSDN describes fill factor as a percentage that indicates how much the Database Engine should fill each index page during index creation or rebuild. The fill-factor setting applies only when the index is created or rebuilt. Why is this so important? If the fill factor is too high and a new record is inserted, the DB will more often than not have to split the index page (page splitting). This is very resource-intensive and causes fragmentation. On the other hand, a very low fill factor means that a lot of space is reserved for the index alone. The easiest way to choose is to look at the type of queries that come to the DB: if they are mostly SELECT queries, it is best to leave the default fill factor. On the other hand, if there are lots of INSERT, UPDATE, and DELETE operations, a fill factor other than 0 or 100 can be good for performance if the new data is evenly distributed throughout the table.

My query was fine last week but it is slow this week – We get to see a lot of this. The load test ran fine last week, but this week the search page is taking a long time. What is wrong with the database? The main issue could be that the execution plan (the way the query gets executed on the DB) has changed. The easiest way to check is to get the current explain plan and the explain plan from the previous week, compare them, and look for the differences.

High CPU and Memory Utilization on the DB – There is a high CPU and high Memory utilization on the database server. There could be a multitude of possible reasons for this.
1. See if there are full table scans happening (soln: create index and update stats)
2. See if there is too much context switching (soln: increase the memory)
3. Look for memory leaks (in terms of tables not being freed even after their usage is complete) (soln: recode!)

There can be many more reasons, but these are the most common ones.

Low CPU and memory utilization yet poor performance – This is another case (though not frequent): CPU and memory are optimally used, yet performance is still slow. This usually comes down to one of two reasons:
1. Bad network – the database server is waiting for a socket read or write
2. Bad disk management – the database server is waiting for a disk controller to become free

As always these are only the most common database performance issues that might come up in any performance test. There are many more of them out there.

VM STAT COMMANDS IN UNIX

Importance of the vmstat command:

The first tool to use is the vmstat command, which quickly provides compact information about various system resources and their related performance problems.


kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  0 22478  1677   0   0   0   0    0   0 188 1380 157 57 32  0 10
 1  0 22506  1609   0   0   0   0    0   0 214 1476 186 48 37  0 16
 0  0 22498  1582   0   0   0   0    0   0 248 1470 226 55 36  0  9

 2  0 22534  1465   0   0   0   0    0   0 238  903 239 77 23  0  0
 2  0 22534  1445   0   0   0   0    0   0 209 1142 205 72 28  0  0
 2  0 22534  1426   0   0   0   0    0   0 189 1220 212 74 26  0  0
 3  0 22534  1410   0   0   0   0    0   0 255 1704 268 70 30  0  0
 2  1 22557  1365   0   0   0   0    0   0 383  977 216 72 28  0  0

 2  0 22541  1356   0   0   0   0    0   0 237 1418 209 63 33  0  4
 1  0 22524  1350   0   0   0   0    0   0 241 1348 179 52 32  0 16
 1  0 22546  1293   0   0   0   0    0   0 217 1473 180 51 35  0 14

The vmstat command reports statistics about kernel threads in the run and wait queue, memory, paging, disks, interrupts, system calls, context switches, and CPU activity. The reported CPU activity is a percentage breakdown of user mode, system mode, idle time, and waits for disk I/O.

Note: If the vmstat command is used without any interval, then it generates a single report. The single report is an average report from when the system was started. You can specify the Count parameter only with the Interval parameter. If the Interval parameter is specified without the Count parameter, then reports are generated continuously.


As a CPU monitor, the vmstat command is superior to the iostat command in that its one-line-per-report output is easier to scan as it scrolls and there is less overhead involved if there are many disks attached to the system. The following example can help you identify situations in which a program has run away or is too CPU-intensive to run in a multiuser environment.

This output shows the effect of introducing a program in a tight loop to a busy multiuser system. The first three reports (the summary has been removed) show the system balanced at 50-55 percent user, 30-35 percent system, and 10-15 percent I/O wait. When the looping program begins, all available CPU cycles are consumed. Because the looping program does no I/O, it can absorb all of the cycles previously unused because of I/O wait. Worse, it represents a process that is always ready to take over the CPU when a useful process relinquishes it. Because the looping program has a priority equal to that of all other foreground processes, it will not necessarily have to give up the CPU when another process becomes dispatchable. The program runs for about 10 seconds (five reports), and then the activity reported by the vmstat command returns to a more normal pattern.

Optimum use would have the CPU working 100 percent of the time. This holds true in the case of a single-user system with no need to share the CPU. Generally, if us + sy time is below 90 percent, a single-user system is not considered CPU constrained. However, if us + sy time on a multiuser system exceeds 80 percent, the processes may spend time waiting in the run queue. Response time and throughput might suffer.
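The us + sy rule of thumb can be checked mechanically. This sketch parses body lines of the AIX-style vmstat output shown above (the column list matches that header; other platforms differ) and flags CPU-constrained samples. Note that sy appears twice in the header, once under faults and once under cpu, so the second occurrence is renamed here:

```python
# Column order taken from the AIX vmstat header shown earlier in this section.
VMSTAT_COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
                  "in", "sy", "cs", "us", "sy_cpu", "id", "wa"]

def parse_vmstat(lines):
    """Turn vmstat body lines into dicts keyed by column name."""
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(VMSTAT_COLUMNS):
            continue  # skip headers, separators, and blank lines
        samples.append(dict(zip(VMSTAT_COLUMNS, map(int, fields))))
    return samples

def cpu_constrained(samples, threshold=80):
    """Samples where user + system CPU time exceeds the multiuser threshold."""
    return [s for s in samples if s["us"] + s["sy_cpu"] > threshold]
```

Feeding a whole run's vmstat log through this gives the fraction of intervals spent CPU constrained, which is more defensible than eyeballing the scroll.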

To check if the CPU is the bottleneck, consider the four cpu columns and the two kthr (kernel threads) columns in the vmstat report. It may also be worthwhile looking at the faults column:
cpu

Percentage breakdown of CPU time usage during the interval. The cpu columns are as follows:
us: The us column shows the percent of CPU time spent in user mode. A UNIX process can execute in either user mode or system (kernel) mode. When in user mode, a process executes within its application code and does not require kernel resources to perform computations, manage memory, or set variables.

sy: The sy column details the percentage of time the CPU was executing a process in system mode. This includes CPU resource consumed by kernel processes (kprocs) and others that need access to kernel resources. If a process needs kernel resources, it must execute a system call and is thereby switched to system mode to make that resource available. For example, reading or writing of a file requires kernel resources to open the file, seek a specific location, and read or write data, unless memory mapped files are used.

id: The id column shows the percentage of time the CPU is idle, or waiting, without pending local disk I/O. If there are no threads available for execution (the run queue is empty), the system dispatches a thread called wait, which is also known as the idle kproc. On an SMP system, one wait thread per processor can be dispatched. The report generated by the ps command (with the -k or -g 0 option) identifies this as kproc or wait. If the ps report shows a high aggregate time for this thread, it means there were significant periods of time when no other thread was ready to run or waiting to be executed on the CPU. The system was therefore mostly idle and waiting for new tasks.

wa: The wa column details the percentage of time the CPU was idle with pending local disk I/O and NFS-mounted disks. If there is at least one outstanding I/O to a disk when wait is running, the time is classified as waiting for I/O. Unless asynchronous I/O is being used by the process, an I/O request to disk causes the calling process to block (or sleep) until the request has been completed. Once an I/O request for a process completes, it is placed on the run queue. If the I/Os were completing faster, more CPU time could be used.

A wa value over 25 percent could indicate that the disk subsystem might not be balanced properly, or it might be the result of a disk-intensive workload. 
kthr
Number of kernel threads in various queues averaged per second over the sampling interval. The kthr columns are as follows:
r: Average number of kernel threads that are runnable, which includes threads that are running and threads that are waiting for the CPU. If this number is greater than the number of CPUs, there is at least one thread waiting for a CPU, and the more threads there are waiting for CPUs, the greater the likelihood of a performance impact.
b: Average number of kernel threads in the VMM wait queue per second. This includes threads that are waiting on filesystem I/O or threads that have been suspended due to memory load control.
If processes are suspended due to memory load control, the blocked column (b) in the vmstat report indicates the increase in the number of threads rather than the run queue.
p: For vmstat -I, the number of threads waiting on I/Os to raw devices per second. Threads waiting on I/Os to filesystems are not included here.
faults
Information about process control, such as trap and interrupt rate. The faults columns are as follows:
in: Number of device interrupts per second observed in the interval. Additional information can be found in Assessing disk performance with the vmstat command.

sy: The number of system calls per second observed in the interval. Resources are available to user processes through well-defined system calls. These calls instruct the kernel to perform operations for the calling process and exchange data between the kernel and the process. Because workloads and applications vary widely, and different calls perform different functions, it is impossible to define how many system calls per second are too many. But typically, when the sy column rises over 10000 calls per second on a uniprocessor, further investigation is called for (on an SMP system the number is 10000 calls per second per processor). One reason could be "polling" subroutines like the select() subroutine. For this column, it is advisable to have a baseline measurement that gives a count for a normal sy value.

cs: Number of context switches per second observed in the interval. The physical CPU resource is subdivided into logical time slices of 10 milliseconds each. Assuming a thread is scheduled for execution, it will run until its time slice expires, until it is preempted, or until it voluntarily gives up control of the CPU. When another thread is given control of the CPU, the context or working environment of the previous thread must be saved and the context of the current thread must be loaded. The operating system has a very efficient context switching procedure, so each switch is inexpensive in terms of resources. Any significant increase in context switches, such as when cs is a lot higher than the disk I/O and network packet rate, should be cause for further investigation.

JVM MONITORING

JVM is an acronym for Java Virtual Machine. An abstract computing machine, or virtual machine, the JVM is a platform-independent execution environment that converts Java bytecode into machine language and executes it. Most programming languages compile source code directly into machine code designed to run on a specific microprocessor architecture or operating system, such as Windows or UNIX. The JVM -- a machine within a machine -- mimics a real Java processor, enabling Java bytecode to be executed as actions or operating system calls on any processor regardless of the operating system.

For example, establishing a socket connection from a workstation to a remote machine involves an operating system call. Since different operating systems handle sockets in different ways, the JVM translates the programming code so that the two machines that may be on different platforms are able to connect.


The JVM consists of the following components:

1) Bytecode verifier: verifies the bytecode, checking for illegal code.

2) Class loader: after verification, the class loader loads the bytecode into memory for execution.

3) Execution engine:
It further consists of two parts:
a) Interpreter: interprets the bytecode and runs it.
b) JIT (Just-In-Time compiler): compiles frequently executed bytecode to native machine code.
The JVM HotSpot heuristics define when to use the interpreter or the JIT.

4) Garbage collector: periodically checks for objects on the heap that are no longer reachable, so it can collect the garbage from the heap.

5) Security manager: constantly monitors the code. It is the second level of security (the first level is the bytecode verifier).

How can I take a thread dump and heap dump automatically when my CPU utilization is above 80%?

Thread dumps are vital artifacts to diagnose CPU spikes, deadlocks, memory problems, unresponsive applications, poor response times, and other system problems. There are great online thread dump analysis tools, such as http://fastthread.io/, that can analyze and spot problems. But you need to provide proper thread dumps as input to those tools. Thus, in this article, I have documented 7 different options to capture thread dumps.

1. jstack

‘jstack’ is an effective command line tool to capture thread dumps. The jstack tool is shipped in the JDK_HOME\bin folder. Here is the command that you need to issue to capture a thread dump:
jstack -l  <pid> > <file-path>
Where:
pid: is the Process Id of the application, whose thread dump should be captured
file-path: is the file path where the thread dump will be written to.
Example:
jstack -l 37320 > /opt/tmp/threadDump.txt
As per the example thread dump of the process would be generated in /opt/tmp/threadDump.txt file.
The jstack tool has been included in the JDK since Java 5. If you are running an older version of Java, consider using the other options.
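To answer the question this section opened with, polling can trigger jstack automatically. Here is a minimal sketch for Unix-like systems; it assumes ps and jstack are on the PATH, and the CPU reader and dump action are injectable so the logic can be tested without a real JVM:

```python
import subprocess
import time

def cpu_percent(pid):
    """CPU usage of a process as reported by ps (Linux/macOS)."""
    out = subprocess.check_output(["ps", "-o", "%cpu=", "-p", str(pid)])
    return float(out.strip())

def watch(pid, threshold=80.0, interval=5.0, cpu=cpu_percent, dump=None):
    """Poll the JVM's CPU usage; capture a thread dump when it crosses threshold."""
    if dump is None:
        # A heap dump could be added here too, e.g. via jmap.
        dump = lambda: subprocess.run(["jstack", "-l", str(pid)])
    while True:
        if cpu(pid) > threshold:
            dump()
            return
        time.sleep(interval)
```

In practice this would run as a small daemon next to the JVM, redirecting the jstack output to a timestamped file for later analysis.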

2. Kill -3

In major enterprises, for security reasons, only JREs are installed on production machines. Since jstack and other such tools are part of the JDK only, you wouldn't be able to use jstack. In such circumstances, the ‘kill -3’ option can be used.
kill -3 <pid>
Where:
pid: is the Process Id of the application, whose thread dump should be captured
Example:
kill -3 37320
When the ‘kill -3’ option is used, the thread dump is sent to the standard error stream. If you are running your application in Tomcat, the thread dump will be written to the <TOMCAT_HOME>/logs/catalina.out file.
Note: To my knowledge, this option is supported on most flavors of *nix operating systems (Unix, Linux, HP-UX). I am not sure about other operating systems.

3. JVisualVM

Java VisualVM is a graphical user interface tool that provides detailed information about applications while they are running on a specified Java Virtual Machine (JVM). It is located at JDK_HOME\bin\jvisualvm.exe and has been part of Sun’s JDK distribution since JDK 6 update 7.
Launch jvisualvm. On the left panel, you will see all the Java applications running on your machine. Select your application from the list (see the red highlight in the diagram below). This tool can also capture thread dumps from Java processes running on a remote host.
Fig: Java VisualVM
Now go to the “Threads” tab and click the “Thread Dump” button, as shown in the image below. The thread dump will then be generated.
Fig: Highlighting "Thread Dump" button in the “Threads” tab

4. JMC

Java Mission Control (JMC) is a tool that collects and analyzes data from Java applications running locally or deployed in production environments. It has been packaged with the JDK since Oracle JDK 7 Update 40 and is located at JDK_HOME\bin\jmc.exe. This tool also provides an option to take thread dumps from the JVM.
Once you launch the tool, you will see all the Java processes running on your local host. (JMC can also connect to Java processes running on a remote host.) On the left panel, click the “Flight Recorder” option listed below the Java process for which you want to take thread dumps. You will then see the “Start Flight Recording” wizard, as shown in the figure below.
Fig: Flight Recorder wizard showing the ‘Thread Dump’ capture option.
In the “Thread Dump” field, you can select the interval at which thread dumps should be captured. As per the above example, a thread dump will be captured every 60 seconds. After the selection is complete, start the Flight Recorder. Once recording is complete, you will see the thread dumps in the “Threads” panel, as shown in the figure below.
Fig: Showing captured ‘Thread Dump’ in JMC.

5. Windows (Ctrl + Break)

This option works only on the Windows operating system.
  • Select the command-line console window in which you launched the application.
  • Issue the “Ctrl + Break” key combination in that console window.
This will generate a thread dump, printed in the console window itself.
Note 1: On several laptops (like my Lenovo T series) the “Break” key is missing. In such circumstances you have to search for the equivalent key for “Break”. In my case it turned out that “Fn + B” is the equivalent of the “Break” key, so I had to use “Ctrl + Fn + B” to generate thread dumps.
Note 2: One disadvantage of this approach is that the thread dump is printed in the console itself. Without the thread dump in a file, it is hard to use thread dump analysis tools such as http://fastthread.io. Thus, when you launch the application from the command line, redirect the output to a text file. For example, if you launch the application “SampleThreadProgram” with the command:
java -classpath . SampleThreadProgram
instead, launch SampleThreadProgram like this:
java -classpath . SampleThreadProgram > C:\workspace\threadDump.txt 2>&1
Then, when you issue “Ctrl + Break”, the thread dump will be sent to the C:\workspace\threadDump.txt file.

6. ThreadMXBean

ThreadMXBean was introduced in JDK 1.5. It is the management interface for the thread system of the Java Virtual Machine, and it lets you generate thread dumps programmatically with only a few lines of code. Below is a skeleton implementation based on ThreadMXBean that generates a thread dump from within the application.
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public void dumpThreadDump() {
        ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
        // true, true: include locked monitors and locked synchronizers
        for (ThreadInfo ti : threadMxBean.dumpAllThreads(true, true)) {
            System.out.print(ti.toString());
        }
    }

7. APM Tool – App Dynamics

A few Application Performance Monitoring tools provide options to generate thread dumps. If you are monitoring your application with AppDynamics (an APM tool), here are the instructions to capture a thread dump:
1. Create an action, selecting Diagnostics->Take a thread dump in the Create Action window.
2. Enter a name for the action, the number of samples to take, and the interval between the thread dumps in milliseconds.
3. If you want to require approval before the thread dump action can be started, check the Require approval before this Action checkbox and enter the email address of the individual or group that is authorized to approve the action. See Actions Requiring Approval for more information.
4. Click OK.
Fig: App dynamics thread dump capturing wizard

8. JCMD

The jcmd tool was introduced in Oracle’s Java 7. It is useful for troubleshooting issues with JVM applications and has various capabilities, such as identifying Java process ids, acquiring heap dumps, acquiring thread dumps, acquiring garbage collection statistics, and more.
You can generate a thread dump with the following jcmd command:
jcmd <pid> Thread.print > <file-path>
Where:
pid: the process id of the application whose thread dump should be captured
file-path: the file path to which the thread dump will be written
Example:
jcmd 37320 Thread.print > /opt/tmp/threadDump.txt
As per the example, the thread dump of the process will be generated in the /opt/tmp/threadDump.txt file.
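Since the original question also asks for heap dumps, jcmd can capture those too via its GC.heap_dump command. Here is a sketch, assuming jcmd is on the PATH and /opt/tmp exists; the pid-lookup helper and the reuse of the SampleThreadProgram class name from earlier are illustrative, not part of jcmd itself:

```shell
#!/bin/sh
# Find the pid of a Java application by matching its main class in the
# `jcmd -l` process listing, then capture both a thread dump and a heap dump.

# Print the first pid whose listing line matches pattern $2; $1 is the listing.
first_pid_matching() {
  printf '%s\n' "$1" | awk -v p="$2" '$0 ~ p { print $1; exit }'
}

if command -v jcmd >/dev/null 2>&1; then
  PID=$(first_pid_matching "$(jcmd -l)" SampleThreadProgram)
  if [ -n "$PID" ]; then
    jcmd "$PID" Thread.print > /opt/tmp/threadDump.txt       # thread dump
    jcmd "$PID" GC.heap_dump /opt/tmp/heapDump.hprof         # heap dump
  fi
fi
```

The heap dump lands in HPROF format, which the usual heap analysis tools (e.g. Eclipse MAT) can open.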

Conclusion

Even though eight different options are listed to capture thread dumps, IMHO ‘jstack’ and ‘kill -3’ are the best ones, because they are:
a. Simple (straightforward, easy to implement)
b. Universal (they work in most cases regardless of OS, Java vendor, JVM version, etc.)