We had a Java Spring Boot based microservice that was not doing any heavy lifting as such, yet in whichever environment we deployed it, it always ran at close to 200% CPU usage.
To debug the cause of this unexpected behavior, we followed the steps below and found the reason behind it.
Checking CPU Usage
How did we find out that the service had high CPU usage in the first place?
You can use the kubectl top command to see the resource usage of pods.
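A minimal example; the pod name and the memory figure below are illustrative rather than our actual values:

    kubectl top pods

    NAME                          CPU(cores)   MEMORY(bytes)
    the-service-7d9f6c8b5-abcde   1998m        512Mi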
We had 2 CPU cores assigned to each pod of the service.
In the output above, 200% CPU usage shows up as a CPU value of roughly 2000m (millicores).
Taking the Thread Dump
In the output of the kubectl top command, you can see that the CPU usage of the-service is very high. It did not matter how many pods we ran, all of them had very high CPU usage. As there was no load on the service, something had to be going wrong within the service itself to consume resources like this.
From this we knew we had to look at a thread dump of the service, but the app was running in Kubernetes and the Docker base image it used did not ship jmap with its Java installation. Nor could we install a tool like jstack into the running pod.
We had another option though: send a kill -3 signal to the Java process of the service, which would not kill the app but would print the thread dump to the console/logs.
To capture the logs, keep a kubectl logs command running in one terminal. A sample command looks like this:
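    kubectl logs -f the-service-7d9f6c8b5-abcde

The pod name here is just a placeholder; use the name of one of your actual pods from kubectl get pods.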
To send the kill signal to the Java process, we need to exec into the pod. For that, run the following in another terminal:
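    kubectl exec -it the-service-7d9f6c8b5-abcde -- /bin/bash

Again the pod name is a placeholder, and if the image is a slim one that only ships sh, use /bin/sh instead of /bin/bash.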
Once inside the pod's bash, we can look at the running processes with just the top command. The top command printed the processes running in the pod.
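Trimmed to the relevant columns, and with illustrative numbers, the output looked roughly like this:

      PID USER      %CPU %MEM COMMAND
        1 root     199.0 25.0 java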
In the top output you can see the Java process with PID 1 running at CPU usage close to 200 percent.
Before sending the kill -QUIT signal to this process, we also needed to look at which threads were consuming CPU at this level.
To show per-thread details in top, press SHIFT + H while top is running.
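In thread mode the view changed to something roughly like this (again trimmed and with illustrative numbers):

      PID USER      %CPU %MEM COMMAND
       50 root      99.8 25.0 java
       26 root      99.5 25.0 java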
Here you can see that the threads with PIDs 50 and 26 were each taking close to 100% of a CPU core.
Now that we have the thread and process PIDs, we are ready to take a dump, a thread dump.
For that, just run the kill command like this:
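    kill -3 1

Here 1 is the PID of the Java process from the top output, and -3 is SIGQUIT, so kill -QUIT 1 is equivalent.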
When you run this command from inside the pod, the terminal that is tailing the logs will simply print the thread dump.
Finding the Culprit
Once you have the dump, you need to find the threads that are causing the high CPU usage.
For that, convert the thread PIDs to their hex representation:
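    printf '%x\n' 50    # 32
    printf '%x\n' 26    # 1a

Here 50 and 26 are the thread PIDs we saw in top; printf is just one way to do the conversion.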
Then search the thread dump for the thread stacks whose nid matches these hex values. Those are the threads you are looking for.
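One way to search, assuming you saved the captured logs to a file such as dump.txt (the file name and the number of context lines are arbitrary):

    grep -A 20 'nid=0x32' dump.txt
    grep -A 20 'nid=0x1a' dump.txt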
In our case, the problematic threads were running an infinite loop with while(true), inside which they were waiting for a condition. Between these condition checks, the threads were supposed to sleep for some time. By mistake, the sleep time was misconfigured to 0, so the threads never actually slept and kept running forever, keeping the CPU, and us, awake.
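A simplified Java sketch of the kind of loop we found; the class and member names are made up for illustration and are not the actual service code:

    public class ConditionPoller implements Runnable {

        // Misconfigured to 0 by mistake; with no pause between checks,
        // each polling thread spins and pins a full CPU core.
        private static final long SLEEP_MILLIS = 0;

        private volatile boolean conditionMet = false;

        @Override
        public void run() {
            while (true) {
                if (conditionMet) {
                    doWork();
                    conditionMet = false;
                }
                try {
                    // Thread.sleep(0) returns almost immediately,
                    // so the thread never really pauses between checks.
                    Thread.sleep(SLEEP_MILLIS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        private void doWork() {
            // placeholder for the actual work the thread does
        }
    }

Restoring a sane non-zero sleep (or, better, replacing the polling with a blocking wait or a queue) lets the threads yield the CPU between checks.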