(JMAP, JHAT & GCORE)
Intended Audience
Basic knowledge of memory settings for Java processes and of UNIX commands is required. The knowledge in this article can be used to debug memory issues in applications deployed in application servers as well as in standalone Java processes.
Last week I was looking into a batch application designed for 24x7x365 uptime. The application processes data very efficiently and fast when it is first started, but after a certain point it stopped processing data, or rather it became progressively slower. To fix the issue I had to do some research, and since I did not find any directly usable documentation, I decided to write it up myself. Although the experience described here is from a batch application, the same techniques can be used for web applications and any other type of application running on JDK6 or above.
This article explains a few tactics that can be used to identify whether an application is hanging or not responding due to memory issues, or whether there are possible memory leaks. It assumes that you are running the application in a UNIX environment with the Sun JVM. The syntax may differ slightly on Windows if you want to tailor it for that environment.
To give a gist of the application I used: it processes and compares different versions of XML documents, in batches. During the initial hours, the rate of completion was 12 batches/hour (ignore what a batch actually is; it is used here only for relative figures). After 3 hours the rate dropped to 3 batches/hour, and after 5 hours it dropped to 1 batch/hour. This showed that the application had some problem, which could be memory leaks, locks, CPU contention, network issues, or database issues.
CPU Usage
The CPU usage during the first, third and fifth hour was 99.3%, 75% and 4% respectively. This hinted that the application was doing less and less real work as it progressed. So the next step was to find out what was actually happening. The likely suspects were activities that keep an application busy without consuming much CPU, such as GC pauses, IO waits, or lock contention.
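For reference, per-process CPU usage can be sampled with standard UNIX tools; a hedged example (5831 is just an illustrative pid, matching the one used later in this article):

top -p 5831        # Linux: live CPU usage for the process
prstat -p 5831     # Solaris equivalent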
Thread Dump
The thread dump of the process was taken to figure out the state of each thread. The application was using a batch framework built around multithreading, so the thread dump showed details for all 20 threads it was using.
The syntax to take the thread dump in UNIX is
kill -3 <pid>
The output is redirected to the standard output log configured for your process. Remember, it's not the log4j log; it's the standard output log. Most application servers name it SystemOut.log (of course, you can configure the name).
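As a hedged example (jps ships with the JDK; 5831 is an illustrative pid):

jps -l             # list running java processes with their pids
kill -3 5831       # send SIGQUIT; the thread dump goes to the stdout log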
The thread dump showed that all 20 threads were in the RUNNABLE state. This indicated that the application threads were not in a deadlock and were not waiting for a monitor entry (such as synchronized blocks). It also showed that the threads were not waiting on IO.
Memory Profiling
The next area to investigate was the memory. The process was started again with the following command line parameters to get the GC details printed.
-Xloggc:/usr/berfty/gc_log.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
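For context, a hedged example of a complete launch command using these flags (batch-app.jar is a made-up name standing in for the real application):

java -Xloggc:/usr/berfty/gc_log.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar batch-app.jar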
We focused on the GC details because they explain the events happening in memory. The GC log can tell you:
- How many objects are still referenced by the application.
- How many objects are removed from memory.
- How many times a Full GC happens.
- The time taken by each GC.
There are two types of entries you will find in the GC log.
Partial GC: This clears objects from the young generation area, either removing them from the heap (if they are no longer referenced by the application) or moving them to the old generation area. A partial GC is represented as shown below.
6930.574: [GC [PSYoungGen: 3099243K->15566K(3111424K)] 4976226K->1905642K(5208576K), 0.0220170 secs]
The above line means that the young generation had an occupancy of 3099243K before the GC. After the GC completed, the young generation occupancy was reduced to 15566K; the unwanted objects were either removed or moved to the old generation area. The second section shows the details for the entire heap (young and old generation areas): the heap occupancy before the GC was 4976226K, and after the GC it was reduced to 1905642K. Note that the young generation shrank by 3083677K while the whole heap shrank by only 3070584K; the difference, roughly 13093K, is the amount promoted into the old generation. The entire GC took 0.0220170 secs.
Full GC: A full GC is performed when the old generation is reaching its capacity and tenured objects from the young generation can no longer be promoted into it. A Full GC is represented as shown below.
6966.094: [Full GC [PSYoungGen: 8110K->0K(3106624K)] [ParOldGen: 2069790K->1024980K(2097152K)] 2077901K->1024980K(5203776K) [PSPermGen: 49032K->48483K(131072K)], 2.0098750 secs]
The above line means that the young generation occupancy changed from 8110K to 0K due to the GC. The old generation occupancy before the GC was 2069790K, and after the GC it was 1024980K. The log also gives details about the permanent generation.
To generalize the above, the GC entries in the gc_log file follow this pattern (reconstructed from the samples in this article; the field names in angle brackets are my own labels, not JVM output):
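<seconds since JVM start>: [GC [PSYoungGen: <before>K-><after>K(<capacity>K)] <heap before>K-><heap after>K(<heap capacity>K), <pause> secs]
<seconds since JVM start>: [Full GC [PSYoungGen: <before>K-><after>K(<capacity>K)] [ParOldGen: <before>K-><after>K(<capacity>K)] <heap before>K-><heap after>K(<heap capacity>K) [PSPermGen: <before>K-><after>K(<capacity>K)], <pause> secs]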
Coming back to our application: after 5 hours, the GC log showed only Full GCs, happening every 2 seconds or so. The reason was that the old generation was not getting cleared, which is indicated by the nearly identical before and after occupancy of the old generation in the entries below (a quick way to confirm this pattern is shown after the excerpt).
51655.720: [Full GC [PSYoungGen: 1857920K->217698K(2501824K)] [ParOldGen: 2023876K->2022817K(2097152K)] 3881796K->2240515K(4598976K) [PSPermGen: 53644K->53633K(131072K)], 1.4887940 secs]
51658.184: [Full GC [PSYoungGen: 1857920K->224477K(2501824K)] [ParOldGen: 2022817K->2022657K(2097152K)] 3880737K->2247134K(4598976K) [PSPermGen: 53643K->53598K(131072K)], 1.4882530 secs]
51660.675: [Full GC [PSYoungGen: 1857920K->236322K(2501824K)] [ParOldGen: 2022657K->2020841K(2097152K)] 3880577K->2257163K(4598976K) [PSPermGen: 53645K->53595K(131072K)], 1.9492500 secs]
51663.604: [Full GC [PSYoungGen: 1857920K->242417K(2501824K)] [ParOldGen: 2020841K->2020278K(2097152K)] 3878761K->2262696K(4598976K) [PSPermGen: 53607K->53597K(131072K)], 1.9473010 secs]
51666.424: [Full GC [PSYoungGen: 1857920K->217779K(2501824K)] [ParOldGen: 2020278K->2018873K(2097152K)] 3878198K->2236652K(4598976K) [PSPermGen: 53608K->53595K(131072K)], 1.4736700 secs]
51669.022: [Full GC [PSYoungGen: 1857920K->217271K(2501824K)] [ParOldGen: 2018873K->2018606K(2097152K)] 3876793K->2235878K(4598976K) [PSPermGen: 53654K->53595K(131072K)], 1.9685320 secs]
51671.963: [Full GC [PSYoungGen: 1857920K->226794K(2501824K)] [ParOldGen: 2018606K->2018585K(2097152K)] 3876526K->2245380K(4598976K) [PSPermGen: 53609K->53595K(131072K)], 1.5949040 secs]
51674.840: [Full GC [PSYoungGen: 1857920K->242015K(2501824K)] [ParOldGen: 2018585K->2017538K(2097152K)] 3876505K->2259553K(4598976K) [PSPermGen: 53647K->53597K(131072K)], 1.8560720 secs]
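A hedged way to spot this pattern straight from the shell (the log path matches the -Xloggc flag used earlier):

grep -c "Full GC" /usr/berfty/gc_log.log            # how many Full GCs so far
tail -f /usr/berfty/gc_log.log | grep "Full GC"     # watch them happen live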
Diagnosing the Memory
The above log and inferences suggested that the application had a memory leak. We tried several approaches and hit roadblocks with some of them. The approaches we attempted are listed below.
To identify a memory leak, we need to figure out how many objects are created by the application and how many of them are removed after use. For example, if a byte stream is opened and never closed after use, its byte array will continue to exist in memory, gradually filling the heap. Since the byte array stays referenced for a long time, it will be promoted from the new area to the old area. Hence the aim was to identify the objects that were occupying and filling the memory.
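The sketch below illustrates this pattern; it is a hypothetical example, not the actual application code (BatchProcessor and openStreams are made-up names):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the leak pattern: each batch fills a stream whose
// internal byte[] stays strongly referenced, so it survives minor GCs and
// is eventually tenured into the old generation.
public class BatchProcessor {
    // Long-lived collection that keeps every stream (and its byte[]) alive.
    private static final List<ByteArrayOutputStream> openStreams =
            new ArrayList<ByteArrayOutputStream>();

    public static byte[] readBatch(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        openStreams.add(out); // BUG: never removed, so the byte[] lingers
        return out.toByteArray();
    }
}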
gcore
Using gcore, a core dump of the process can be created. From the core dump, jmap can generate a heap dump file, and from the heap dump file, jhat can generate a detailed histogram of the objects in the heap. Our application was running on JDK5. gcore is a BSD/UNIX utility that generates an image of a running process. The syntax to generate the core is
gcore <pid>
where <pid> is the process id of the java process. The file will be created with the name core.<pid>.
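For example, with the process id 5831 that appears later in this article:

gcore 5831         # writes core.5831 in the current directory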
The core file generated is passed as a parameter to jmap, which generates the heap dump (bin) file. Since the core was generated from a JDK5 process, the jmap from JDK5 should be used to generate the heap dump. The syntax used is
/usr/java/jdk1.5.0_12/bin/jmap -heap:format=b `which java` core.5831
Attaching to core core.5831 from executable /opt/java/bin/java, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 1.6.0-ea-b52
Dumping heap to heap.bin ...
Heap dump file created
`which java` is used to get the path of the java executable. The output is written to the heap.bin file. Using heap.bin, we can generate the heap histogram and details with the jhat utility introduced in JDK6.
/usr/java/jdk1.6.0/bin/jhat -stack true -J-d64 -J-mx10240m -port 8080 heap.bin
jhat can, in theory, read heap dumps created by JDK5 and JDK4 processes (though it did not work for us). Another point to note is that jhat requires a lot of memory to load a dump (approximately 6-10GB of heap for a 2GB dump file) and hence requires a 64-bit JVM.
jhat (JDK 6 & above)
As mentioned before, we tried to generate the core file and produce the heap dump from it, but jhat threw various exceptions while loading the heap dump created from the JDK5 process. So we ran our application on JDK6 instead. With JDK6, the approach we took is given below.
We first used jmap to generate the heap dump file. The syntax used is given below
/usr/java/jdk1.6.0/bin/jmap -dump:format=b,file=<heap dump file> <pid>
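For example, against a process with pid 5831, writing to the heap.bin file used throughout this article:

/usr/java/jdk1.6.0/bin/jmap -dump:format=b,file=heap.bin 5831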
After the heap dump file is created, we use the jhat utility to generate the heap description. The syntax used for the jhat is
/usr/java/jdk1.6.0/bin/jhat -stack true -J-d64 -J-mx10240m -port 8080 heap.bin
The jhat utility extracts the data from the heap dump file and loads it into memory. Once loading is complete, jhat starts a web server on the port specified in the command shown above. The heap details can then be accessed at the URL
http://<host>:8080
Heap Histogram
All Classes (excluding platform)
Class                                  Instance Count    Total Size
class [B                                       337060    1428530365
class [C                                      3708107     302392746
class java.lang.String                        3653039      73060780
class [I                                        52573      51248160
class xh7_6_13.yl                              321018      33064854
class [Ljava.util.HashMap$Entry;                78852      13199256
class [Ljava.lang.Object;                       72781      12525120
class java.util.HashMap$Entry                  156231       4374468
The [ prefix represents an array of the type indicated by the character that follows it:
class [C is char[]
class [B is byte[]
class [Z is boolean[]
class [S is short[]
class [I is int[]
class [J is long[]
class [D is double[]
The heap histogram details also include which objects refer to each object; this is found by following the referrer and reference links for each type of object. Using this data, we can identify the objects loaded in memory and the objects that are unintentionally lingering there. Once we identify such an object, we can trace the code and remove the unwanted references.
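jhat also exposes an Object Query Language (OQL) page for ad-hoc queries against the dump; a hedged example that lists byte arrays larger than 1MB (the threshold is arbitrary):

select b from [B b where b.length > 1048576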
jmap
With JDK6 and above, jmap can directly print the heap histogram as text. This is much easier and can give early indications of a memory leak. Since the output is plain text, details such as who loaded an object and who is referencing it are not present; for those details, jhat needs to be used.
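The syntax is shown below (5831 again stands in for the real pid, and redirecting to histo.txt is just an example):

/usr/java/jdk1.6.0/bin/jmap -histo 5831 > histo.txt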
Memory Profiler
We also tried a well-known commercial memory profiler, but it did not serve the purpose.
Though the intention of this document is not to show how the issue was fixed, a few reviewers felt it should be added. From the heap histogram, it was evident that byte arrays were consuming almost 1.4GB of the available 2GB of old generation space. The remaining 0.6GB was occupied by a cache, which was designed to behave that way. The byte arrays were occupying memory because one of the byte streams was not being closed by the application.
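A hedged sketch of the fix, using made-up helper names (openDocumentStream and process stand in for the real application code):

InputStream in = null;
try {
    in = openDocumentStream();   // hypothetical: opens the XML byte stream
    process(in);                 // hypothetical: compares the documents
} finally {
    if (in != null) {
        try {
            in.close();          // releasing the stream lets GC reclaim the byte[]
        } catch (IOException ignored) {
        }
    }
}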
The above cases work with the Sun JDK. For other JVMs, there are other tools such as Heap Analyzer and MDD4J.