Summary:Thread Dump is a very useful tool for diagnosing Java application problems.
This article is shared from Huawei Cloud Community “Debugging and Troubleshooting – Thread Dump Analysis of Java Thread Analysis”, author: Long Ge Notes.
Thread Dump is a very useful tool for diagnosing Java application problems. Each Java virtual machine has the ability to generate thread-dumps of all threads at a certain point in time. Although the thread dumps printed by each Java virtual machine are slightly different, most of them provide a snapshot of the current active thread, and all threads in the JVM. The stack trace information of a Java thread. The stack information generally includes the complete class name and method executed, and if possible, the line number of the source code.
Thread Dump Features
- Can be used under various operating systems;
- Can be used under various Java application servers;
- Can be used in a production environment without affecting system performance;
- Ability to pinpoint problems directly to the line of code in the application;
Thread Dump capture
Generally, when the server hangs, crashes or has low performance, it is necessary to capture the server’s thread stack (Thread Dump) for subsequent analysis. In actual operation, the information of one dump is often not enough to confirm the problem. In order to reflect the dynamic changes of the thread state, it is necessary to perform thread dumps several times in succession, with an interval of 10-20s each time. It is recommended to generate dump information at least three times. If each dump points to the same problem, we can determine the typicality of the problem.
- OS command to get ThreadDump
ps –ef | grep java
kill -3 <pid>
Notice:
Be careful, one careless step may kill the server process. The kill -9 command kills the process.
- The tools that come with the JVM get the thread stack
jps 或 ps –ef | grep java (获取PID)
jstack [-l ] <pid> | tee -a jstack.log(获取ThreadDump)
Thread Dump Analysis
Thread Dump Information
- Header information: time, JVM information
2011-11-02 19:05:06
Full thread dump Java HotSpot(TM) Server VM (16.3-b01 mixed mode):
1. "Timer-0" daemon prio=10 tid=0xac190c00 nid=0xaef in Object.wait() [0xae77d000]
# 线程名称:Timer-0;线程类型:daemon;优先级: 10,默认是5;
# JVM线程id:tid=0xac190c00,JVM内部线程的唯一标识(通过java.lang.Thread.getId()获取,通常用自增方式实现)。
# 对应系统线程id(NativeThread ID):nid=0xaef,和top命令查看的线程pid对应,不过一个是10进制,一个是16进制。(通过命令:top -H -p pid,可以查看该进程的所有线程信息)
# 线程状态:in Object.wait();
# 起始栈地址:[0xae77d000],对象的内存地址,通过JVM内存查看工具,能够看出线程是在哪儿个对象上等待;
2. java.lang.Thread.State: TIMED_WAITING (on object monitor)
3. at java.lang.Object.wait(Native Method)
4. -waiting on <0xb3885f60> (a java.util.TaskQueue) # 继续wait
5. at java.util.TimerThread.mainLoop(Timer.java:509)
6. -locked <0xb3885f60> (a java.util.TaskQueue) # 已经locked
7. at java.util.TimerThread.run(Timer.java:462)
Java thread statck trace:是上面2-7行的信息。到目前为止这是最重要的数据,Java stack trace提供了大部分信息来精确定位问题根源。
- Detailed explanation of Java thread statck trace:
Stack information should be interpreted in reverse: The program executes line 7 first, then line 6, and so on.
- locked <0xb3885f60> (a java.util.ArrayList)
- waiting on <0xb3885f60> (a java.util.ArrayList)
That is to say, the object is locked first, lock the object 0xb3885f60, then release the object lock and enter the waiting state. Why does this happen? Take a look at the following java code example and you will understand:
synchronized(obj) {
.........
obj.wait();
.........
}
As above, during the execution of the thread, first use synchronized to obtain the Monitor of this object (corresponding to locked <0xb3885f60>). When obj.wait() is executed, the thread gives up the ownership of Monitor and enters the “wait set” queue (corresponding to waiting on <0xb3885f60> ).
In the first line of information on the stack, the state of the thread at the code level is further marked,For example:
java.lang.Thread.State: TIMED_WAITING (parking)
The explanation is as follows:
|blocked|
> This thread tried to enter asynchronized block, but the lock was taken by another thread. This thread isblocked until the lock gets released.
|blocked (on thin lock)|
> This is the same state asblocked, but the lock in question is a thin lock.
|waiting|
> This thread calledObject.wait() on an object. The thread will remain there until some otherthread sends a notification to that object.
|sleeping|
> This thread calledjava.lang.Thread.sleep().
|parked|
> This thread calledjava.util.concurrent.locks.LockSupport.park().
|suspended|
> The thread's execution wassuspended by java.lang.Thread.suspend() or a JVMTI agent call.
Thread state analysis
The state of the thread is a very important thing, so these states will be displayed in the thread dump. Through the analysis of these states, the running status of the thread can be obtained, and then possible problems can be found.The state of the thread is defined in the Thread.State enumeration type:
public enum State
{
/**
* Thread state for a thread which has not yet started.
*/
NEW,
/**
* Thread state for a runnable thread. A thread in the runnable
* state is executing in the Java virtual machine but it may
* be waiting for other resources from the operating system
* such as processor.
*/
RUNNABLE,
/**
* Thread state for a thread blocked waiting for a monitor lock.
* A thread in the blocked state is waiting for a monitor lock
* to enter a synchronized block/method or
* reenter a synchronized block/method after calling
* {@link Object#wait() Object.wait}.
*/
BLOCKED,
/**
* Thread state for a waiting thread.
* A thread is in the waiting state due to calling one of the
* following methods:
* <ul>
* <li>{@link Object#wait() Object.wait} with no timeout</li>
* <li>{@link #join() Thread.join} with no timeout</li>
* <li>{@link LockSupport#park() LockSupport.park}</li>
* </ul>
*
* <p>A thread in the waiting state is waiting for another thread to
* perform a particular action.
*
* For example, a thread that has called <tt>Object.wait()</tt>
* on an object is waiting for another thread to call
* <tt>Object.notify()</tt> or <tt>Object.notifyAll()</tt> on
* that object. A thread that has called <tt>Thread.join()</tt>
* is waiting for a specified thread to terminate.
*/
WAITING,
/**
* Thread state for a waiting thread with a specified waiting time.
* A thread is in the timed waiting state due to calling one of
* the following methods with a specified positive waiting time:
* <ul>
* <li>{@link #sleep Thread.sleep}</li>
* <li>{@link Object#wait(long) Object.wait} with timeout</li>
* <li>{@link #join(long) Thread.join} with timeout</li>
* <li>{@link LockSupport#parkNanos LockSupport.parkNanos}</li>
* <li>{@link LockSupport#parkUntil LockSupport.parkUntil}</li>
* </ul>
*/
TIMED_WAITING,
/**
* Thread state for a terminated thread.
* The thread has completed execution.
*/
TERMINATED;
}
Each thread has a corresponding Thread object in the heap memory. Thread t = new Thread(); When the Thread object has just been created in the heap memory and the t.start() method has not been called, the thread is in the NEW state. In this state, the thread is no different from an ordinary java object, it is just an object in the heap memory.
This state indicates that the thread has all the running conditions, is in the run queue for scheduling by the operating system, or is running. Threads in this state are relatively normal, but if the thread stays in this state for a long time, it is not normal. question).
The thread is waiting to acquire the monitor (also called built-in lock) of the java object, that is, the thread is waiting to enter the method or code block protected by synchronized. Synchronized is used to ensure atomicity. Only one thread can enter the critical area at any time, and other threads can only wait in line.
In the state of the thread, it is waiting for an event to occur. Only when certain conditions are met can it be executed. And this specific event is usually another thread. That is to say, if a specific event does not occur, the thread in this state has been waiting and cannot get a chance to execute. for example:
A thread calls obj.wait() method of obj object, if no thread calls obj.notify or obj.notifyAll, then A thread has no way to resume running; if A thread calls LockSupport.park(), no other thread Call LockSupport.unpark(A), then A has no way to resume operation. TIMED_WAITING:
Many thread-related classes in JUC provide time-limited and unlimited-time APIs. TIMED_WAITING means that the thread called a time-limited version of the API and is waiting for time to elapse. When the waiting time has elapsed, the thread can resume running. If a thread enters the WAITING state, a specific event must occur before it can resume running; while a thread in TIMED_WAITING will resume running if a specific event occurs or the time elapses.
After the thread is executed, the run method returns normally after execution, or ends when a runtime exception is thrown, the thread will stay in this state. At this time, only the Thread object is left in the thread, which is useless.
Critical State Analysis
- Wait on condition: The thread is either sleeping or waiting to be notified by another thread.
This state means that it is waiting for another condition to wake itself up, or it simply called sleep(n).
At this time, the thread status is roughly as follows:
java.lang.Thread.State: WAITING (parking):一直等那个条件发生;
java.lang.Thread.State: TIMED_WAITING (parking或sleeping):定时的,那个条件不到来,也将定时唤醒自己。
- Waiting for Monitor Entry and in Object.wait(): The thread is waiting to get the lock for an object (some other thread may be holding the lock). This happens if two or more threads try to execute synchronized code. Note that the lock is always for an object and not for individual methods .
In a multi-threaded JAVA program, to achieve synchronization between threads, we must talk about Monitor.Monitor is the main means in Java to realize mutual exclusion and cooperation between threads, and it can be regarded as an object or Class lock. Each object has, and only one Monitor . The following figure describes the relationship between threads and Monitors, as well as the state transition diagram of threads:
As shown above, each Monitor can only be owned by one thread at a certain moment.This thread is “ActiveThread”, and other threads are “Waiting Thread”, waiting in two queues “Entry Set” and “Wait Set” respectively. The state of the thread waiting in the “Entry Set” is “Waiting for monitor entry”, while the state of the thread waiting in the “Wait Set” is “in Object.wait()”.
First look at the threads in “Entry Set”. We call the code segment protected by synchronized a critical section.When a thread applies to enter the critical section, it enters the “Entry Set” queue. The corresponding code is like:
synchronized(obj) {
.........
}
At this point there are two possibilities:
- The monitor is not owned by other threads, and there are no other waiting threads in the Entry Set. This thread becomes the Owner of the Monitor of the corresponding class or object, and executes the code in the critical section.
- The monitor is owned by other threads, and this thread is waiting in the Entry Set queue.
In the first case, the thread will be in the “Runnable” state, while in the second case, the thread DUMP will show that it is in “waiting for monitor entry”. as follows:
"Thread-0" prio=10 tid=0x08222eb0 nid=0x9 waiting for monitor entry [0xf927b000..0xf927bdb8]
at testthread.WaitThread.run(WaitThread.java:39)
- waiting to lock <0xef63bf08> (a java.lang.Object)
- locked <0xef63beb8> (a java.util.ArrayList)
at java.lang.Thread.run(Thread.java:595)
The setting of the critical section is to ensure the atomicity and integrity of the code execution inside it. But because the critical section only allows threads to pass serially at any time, this is contrary to the original intention of our multithreaded program.If synchronized is used extensively in a multi-threaded program, or it is used improperly, a large number of threads will wait at the entrance of the critical section, resulting in a significant drop in system performance.. If you find this situation in the thread DUMP, you should review the source code and improve the program.
Look at the thread in “Wait Set”.When the thread obtains the Monitor and enters the critical section, if it finds that the condition for the thread to continue running is not met, it calls the wait() method of the object (usually the synchronized object), abandons the Monitor, and enters the “Wait Set” queue.Only when another thread calls notify() on the object or notifyAll(), threads in the “Wait Set” queue get a chance to compete, but only one thread obtains the Monitor of the object and returns to the running state. Threads in the “Wait Set” are represented in DUMP as: in Object.wait(). as follows:
"Thread-1" prio=10 tid=0x08223250 nid=0xa in Object.wait() [0xef47a000..0xef47aa38]
at java.lang.Object.wait(Native Method)
- waiting on <0xef63beb8> (a java.util.ArrayList)
at java.lang.Object.wait(Object.java:474)
at testthread.MyWaitThread.run(MyWaitThread.java:40)
- locked <0xef63beb8> (a java.util.ArrayList)
at java.lang.Thread.run(Thread.java:595)
综上,一般CPU很忙时,则关注runnable的线程,CPU很闲时,则关注waiting for monitor entry的线程。
As mentioned above, if the synchronized and monitor mechanisms are not used properly, it may cause performance problems of multi-threaded programs. In JDK 5.0, the Lock mechanism was introduced, so that developers can develop high-performance concurrent multi-threaded programs more flexibly, and can replace the synchronized and Monitor mechanisms in the previous JDK. but,It should be noted that because the Lock class is just an ordinary class, the JVM has no way of knowing the occupancy of the Lock object, so the thread DUMP will not include information about the LockOn issues such as deadlocks, it is not as easy to identify as using synchronized programming.
critical state example
package jstack;
public class BlockedState
{
private static Object object = new Object();
public static void main(String[] args)
{
Runnable task = new Runnable() {
@Override
public void run()
{
synchronized (object)
{
long begin = System.currentTimeMillis();
long end = System.currentTimeMillis();
// 让线程运行5分钟,会一直持有object的监视器
while ((end - begin) <= 5 * 60 * 1000)
{
}
}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
The thread that gets the object first will execute for 5 minutes.During these 5 minutes, the monitor of the object will be held all the time, and another thread cannot execute and is in the BLOCKED state:
Full thread dump Java HotSpot(TM) Server VM (20.12-b01 mixed mode):
"DestroyJavaVM" prio=6 tid=0x00856c00 nid=0x1314 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"t2" prio=6 tid=0x27d7a800 nid=0x1350 waiting for monitor entry [0x2833f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at jstack.BlockedState$1.run(BlockedState.java:17)
- waiting to lock <0x1cfcdc00> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:662)
"t1" prio=6 tid=0x27d79400 nid=0x1338 runnable [0x282ef000]
java.lang.Thread.State: RUNNABLE
at jstack.BlockedState$1.run(BlockedState.java:22)
- locked <0x1cfcdc00> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:662)
You can see through thread dump:The t2 thread is indeed in BLOCKED (on object monitor). waiting for monitor entry waiting to enter the synchronized protected area.
package jstack;
public class WaitingState
{
private static Object object = new Object();
public static void main(String[] args)
{
Runnable task = new Runnable() {
@Override
public void run()
{
synchronized (object)
{
long begin = System.currentTimeMillis();
long end = System.currentTimeMillis();
// 让线程运行5分钟,会一直持有object的监视器
while ((end - begin) <= 5 * 60 * 1000)
{
try
{
// 进入等待的同时,会进入释放监视器
object.wait();
} catch (InterruptedException e)
{
e.printStackTrace();
}
}
}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
Full thread dump Java HotSpot(TM) Server VM (20.12-b01 mixed mode):
"DestroyJavaVM" prio=6 tid=0x00856c00 nid=0x1734 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"t2" prio=6 tid=0x27d7e000 nid=0x17f4 in Object.wait() [0x2833f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x1cfcdc00> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at jstack.WaitingState$1.run(WaitingState.java:26)
- locked <0x1cfcdc00> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:662)
"t1" prio=6 tid=0x27d7d400 nid=0x17f0 in Object.wait() [0x282ef000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x1cfcdc00> (a java.lang.Object)
at java.lang.Object.wait(Object.java:485)
at jstack.WaitingState$1.run(WaitingState.java:26)
- locked <0x1cfcdc00> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:662)
It can be found that both t1 and t2 are in WAITING (on object monitor), and the reason for entering the waiting state is that in Object.wait() is called. Through the lock and conditional queue under the JUC package, this effect is also achieved, and you can practice it yourself.
- Show TIMED_WAITING status
package jstack;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
public class TimedWaitingState
{
// java的显示锁,类似java对象内置的监视器
private static Lock lock = new ReentrantLock();
// 锁关联的条件队列(类似于object.wait)
private static Condition condition = lock.newCondition();
public static void main(String[] args)
{
Runnable task = new Runnable() {
@Override
public void run()
{
// 加锁,进入临界区
lock.lock();
try
{
condition.await(5, TimeUnit.MINUTES);
} catch (InterruptedException e)
{
e.printStackTrace();
}
// 解锁,退出临界区
lock.unlock();
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
Full thread dump Java HotSpot(TM) Server VM (20.12-b01 mixed mode):
"DestroyJavaVM" prio=6 tid=0x00856c00 nid=0x169c waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"t2" prio=6 tid=0x27d7d800 nid=0xc30 waiting on condition [0x2833f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x1cfce5b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
at jstack.TimedWaitingState$1.run(TimedWaitingState.java:28)
at java.lang.Thread.run(Thread.java:662)
"t1" prio=6 tid=0x280d0c00 nid=0x16e0 waiting on condition [0x282ef000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x1cfce5b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
at jstack.TimedWaitingState$1.run(TimedWaitingState.java:28)
at java.lang.Thread.run(Thread.java:662)
It can be seen that both t1 and t2 threads are in java.lang.Thread.State: TIMED_WAITING (parking), this parking represents the tool class under the called JUC, not the default java monitor.
case analysis
problem scenario
- High CPU, high load, slow response
- Multiple dumps during a request;
- Comparing the runnable threads of multiple dump files, if the execution method has a relatively large change, it is relatively normal. If you are executing the same method, there are some problems;
- Find the most CPU-intensive threads
- Use the command: top -H -p pid (pid is the process number of the system under test) to find the thread ID that causes the high CPU, corresponding to the nid of the thread in the thread dump information, except that one is decimal and the other is hexadecimal;
- In the thread dump, find the corresponding thread stack information according to the thread ID searched by the top command;
- Low CPU usage but slow response
Perform a dump to check whether there are many thread strikes in i/o, database, etc., and locate the cause of the bottleneck;
Dump multiple times to compare whether all runnable threads have been executing the same method. If yes, congratulations, it is locked!
deadlock
A deadlock often manifests itself as a program stalling, or no longer responding to user requests. Observed from the operating system, the CPU usage of the corresponding process is zero, and will soon disappear from the output of top or prstat.
For example, in the following example, it is a typical deadlock situation:
"Thread-1" prio=5 tid=0x00acc490 nid=0xe50 waiting for monitor entry [0x02d3f000
..0x02d3fd68]
at deadlockthreads.TestThread.run(TestThread.java:31)
- waiting to lock <0x22c19f18> (a java.lang.Object)
- locked <0x22c19f20> (a java.lang.Object)
"Thread-0" prio=5 tid=0x00accdb0 nid=0xdec waiting for monitor entry [0x02cff000
..0x02cff9e8]
at deadlockthreads.TestThread.run(TestThread.java:31)
- waiting to lock <0x22c19f20> (a java.lang.Object)
- locked <0x22c19f18> (a java.lang.Object)
Deadlock detection has been enhanced in Java 5.Java-level deadlocks can be directly reported in thread dump,As follows:
Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x0003f334 (object 0x22c19f18, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x0003f314 (object 0x22c19f20, a java.lang.Object),
which is held by "Thread-1"
thermal lock
Thermal locks are also often the main factor leading to system performance bottlenecks. Its performance characteristics are:Due to multiple threads competing for critical sections, or locksmay appear:
- Frequent thread context switching: From the perspective of thread scheduling by the operating system, when a thread is blocked waiting for resources, the operating system will switch it out and put it in the waiting queue. After the thread obtains the resource, the scheduling algorithm will switch the thread in. into the execution queue.
- lots of system calls: Because of the context switching of threads, the competition of hot locks, or the frequent entry and exit of critical sections, a large number of system calls may be caused.
- Most of the CPU overhead is spent in “system mode”: Thread context switching and system calls will cause the CPU to run in the “system state”. In other words, although the system is very busy, the proportion of CPU used in the “user state” is small, and the application program cannot get enough CPU. resource.
- As the number of CPUs increases, the performance of the system decreases. Because the number of CPUs is large, the more threads run at the same time, it may cause more frequent thread context switching and system CPU overhead, resulting in worse performance.
The above descriptions are all manifestations of a system with poor scalability. From the perspective of overall performance indicators, due to the existence of thread hot locks, the response time of the program will be longer and the throughput will be reduced.
So, how to know where the “hot lock” appears??
An important method is to combine various tools of the operating system to observe the usage of system resources, and to collect dump information of Java threads, to see what methods the threads are blocked on, and to understand the reasons to find the corresponding solutions.
VM important thread
Some of the more important threads generated during the running of the JVM are listed as follows:
Click to follow and learn about Huawei Cloud’s fresh technologies for the first time~
#Detailed #explanation #JAVA #thread #problem #diagnosis #tool #Thread #Dump #HUAWEI #CLOUD #Developer #Alliances #personal #space #News Fast Delivery