Operating micro services in the cloud should be as resource-efficient as possible, and the services should scale with user demand. With that in mind, I started tuning a micro service written in Java 17 that serves hundreds of API requests per second. The goal was to improve its horizontal scalability. Along the way I learned how the JVM chooses its default garbage collector, and in this article I want to share what I found.
Resource Definitions for the Kubernetes cluster
Every application should have some resource definitions. In my case I have a Kubernetes cluster, and I use Helm charts to deploy the micro service with the correct settings. If you want to know how this works, you can follow the official guide: Managing resources
To achieve horizontal scalability, I defined a horizontal pod autoscaler with a minimum of 3 pods and a maximum of 20. You can choose whatever maximum suits you best. Additionally, I set the CPU request to 500m, which is half of a CPU core. The reasoning is to have small instances that scale down at night, when the application receives little traffic, and scale up during the day. The definition looks like this:
resources:
  requests:
    cpu: 500m
    memory: 2048Mi
So far so good. The change was deployed to the test system and all was fine. Before it got deployed to production, I wanted to run a performance test to verify the change. Looking at the test results, I noticed that overall performance was quite OK, but according to the metrics garbage collection was slow.
I was not quite sure what was going on, and after some research I found out that the JVM was using the SerialGC as its garbage collector. I had always thought the Garbage-First (G1) collector was the default choice…
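One way to check which collector a running JVM picked is to ask the JVM itself. The following is a minimal sketch; the class name GcInfo is only for illustration. It prints the number of processors the JVM sees, the maximum heap size, and the names of the registered collector beans. As far as I know, the SerialGC reports names like "Copy" and "MarkSweepCompact", while G1 reports names starting with "G1".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class GcInfo {

    // Names of the collector beans registered by the running JVM.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Inside a container this reflects what the JVM derived from the cgroup settings.
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println("Max heap (MiB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Collectors: " + collectorNames());
    }
}
```

Running it inside the pod with the same JVM options as the service shows what the JVM actually selected. Starting the JVM with -Xlog:gc should give the same information in the first log line.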
How the JVM chooses the default garbage collector (GC)
Interestingly, when only half a CPU is available to the JVM, the JVM ergonomics choose the SerialGC. The serial garbage collector uses a single thread to perform all garbage collection work and is intended for applications with few CPUs and little RAM available.
After some investigation I also found the code path where this logic is implemented. If you are interested, check it out: GitHub openjdk os.cpp#L1673
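The selection logic can be approximated in a few lines. This is a simplified sketch of the "server-class machine" check as I understand it, not a copy of the HotSpot code: with at least 2 processors and roughly 2 GB of memory (the check allows some slack, about 1792 MB) the JVM defaults to G1, otherwise to the SerialGC. The class and method names are hypothetical, and which CPU figure the JVM reads from the container depends on the JDK version and cgroup setup, so the millicore input is an assumption.

```java
public class DefaultGcSketch {

    // Thresholds assumed from the HotSpot server-class check.
    static final int SERVER_PROCESSORS = 2;
    static final long SERVER_MEMORY_BYTES = 1792L * 1024 * 1024;

    // A fractional CPU allotment is rounded up to a whole processor.
    static int processorsFromMillicores(int millicores) {
        return Math.max(1, (millicores + 999) / 1000);
    }

    // Returns the collector the ergonomics would be expected to pick.
    static String expectedDefaultCollector(int processors, long memoryBytes) {
        boolean serverClass = processors >= SERVER_PROCESSORS
                && memoryBytes >= SERVER_MEMORY_BYTES;
        return serverClass ? "G1" : "Serial";
    }

    public static void main(String[] args) {
        long memory = 2048L * 1024 * 1024; // 2048Mi, as in the resource definition

        // Half a CPU is seen as one processor, so the machine is not server class.
        System.out.println(expectedDefaultCollector(processorsFromMillicores(500), memory));

        // With two visible processors the same memory setting is enough for G1.
        System.out.println(expectedDefaultCollector(2, memory));
    }
}
```

With these assumptions the first call prints Serial and the second prints G1, which matches the behaviour described above.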
Defining Request and Limit Resources
As mentioned, the service runs on a Kubernetes cluster as a Docker image. It has half a CPU defined, but the underlying infrastructure probably has more resources. To use this infrastructure efficiently, it is possible to define resource limits:
resources:
  limits:
    cpu: 2000m
    memory: 2048Mi
  requests:
    cpu: 500m
    memory: 2048Mi
This is the configuration I am using right now for the mentioned micro service. It can use up to 2 CPUs if there is higher demand. Especially on startup the full limit is reached, which helps to bring up pods faster. The SerialGC does not benefit from more CPU resources, but other garbage collectors do.
The JVM chooses the garbage collector at startup, depending on the configuration of the requested resources.
In my case, this means the SerialGC is selected at startup and the JVM ignores the fact that more CPUs are available.
Define the Garbage Collector
To ensure that the proper GC is used, we can set it explicitly with the JVM option -XX:+UseG1GC. The JVM has multiple GCs available, so make sure to choose one that fits your use case. A nice summary can be found here: Available Collectors - Note: Shenandoah is missing
In my opinion, G1 is not even the best solution any more. For my use case I ended up using the Z Garbage Collector (ZGC) by setting the flag -XX:+UseZGC. The reasoning is that the application should keep latencies low, and I have seen that GC pauses are shorter with ZGC than with the other collectors. As every application has a different workload, I recommend running some tests yourself and choosing the GC that performs best for you.
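For a first rough comparison, a small allocation test can be run with different collector flags. The sketch below is only an illustration, with a hypothetical class name and an artificial workload. It creates a lot of short-lived garbage and then prints the collection count and accumulated collection time that each collector bean reports. For concurrent collectors such as ZGC the reported time is not the same as pause time, so GC logs and a realistic load test remain the better source of truth.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcComparison {

    // Allocates short-lived arrays to generate garbage. The checksum keeps
    // the allocations from being optimized away.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] block = new byte[64 * 1024];
            block[i % block.length] = 1;
            checksum += block[i % block.length];
        }
        return checksum;
    }

    // Accumulated collection time in milliseconds across all collector beans.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    public static void main(String[] args) {
        churn(200_000);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time (ms): " + totalGcTimeMillis());
    }
}
```

Assuming a JDK that supports single-file launch, it can be run as java -XX:+UseSerialGC GcComparison.java, and again with -XX:+UseG1GC and -XX:+UseZGC, using the same CPU and memory settings as the real service.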
Depending on the load and resource requirements of your Java micro services, you should ensure that the correct JVM settings are in place. Then you will be able to use the available resources as well as possible and potentially save operating costs. For me it was very insightful to read about the new garbage collectors in detail and to find a proper setting for my micro service.