Add max CPU and max RAM usage to Resources tab
Mattia Giuffrida
The Resources tab is helpful for seeing memory and CPU usage, but readings at 15-second intervals are not enough. It would be great if the Resources tab could show the maximum CPU and memory a job has used, so it is easier to tell whether a build failed due to OOM or not.
Sebastian Lerner
Hi Mattia Giuffrida! This exists in "Resource Class Insights" https://discuss.circleci.com/t/resource-class-insights/43790
Does that help your use case? Or is there a reason why that would not work?
Mattia Giuffrida
Sebastian Lerner: Hi Sebastian, thank you for pointing me there, it's a really useful aggregated view!
However, in my case I had some jobs which were failing with a "job was killed" error.
Looking at those insights and the "Resources" tab for the job, the memory consumption never hit 100%, yet when I reached out to CircleCI support, they confirmed the jobs were being killed due to an Out Of Memory issue. They also explained that I couldn't see this in the Resources tab because readings are only taken every 15s.
This feature request is to make this case (OOM) clearer, either by explicitly communicating when jobs are killed due to OOM, or by improving the Resources tab with a proper measurement of the maximum memory used. That way we would have a clear indication that memory is the issue and could bump the resources accordingly, or modify the job to ease the memory pressure.
The Resource class insights shows aggregated data and is therefore not useful to debug errors.
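As a rough illustration of the sampling problem described above, here is a minimal Python sketch. The memory trace and the `sampled_peak` helper are made up for the example and are not CircleCI data or API; the sketch only shows how a short spike can fall between two 15-second readings.

```python
def sampled_peak(trace, interval):
    """Return the highest value seen when reading the trace every `interval` seconds."""
    return max(trace[::interval])


# Hypothetical per-second memory usage (percent of the limit) for a 60-second job.
trace = [40] * 60
# A short spike that falls between two 15-second readings (seconds 20 to 24).
for second in range(20, 25):
    trace[second] = 100

true_peak = max(trace)
peak_seen_at_15s = sampled_peak(trace, 15)

print(f"true peak: {true_peak}%, peak seen at 15s sampling: {peak_seen_at_15s}%")
```

In this made-up trace the readings at seconds 0, 15, 30 and 45 all land outside the spike, so the sampled view never shows 100% even though the job briefly reached it. A reported maximum would not depend on when the readings happen to be taken.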
Sebastian Lerner
Mattia Giuffrida: Thanks for the context, super helpful. Can you share the support ticket request number so I can read through it with my engineering team? Feel free to send it to sebastian @ circleci.com.
Sebastian Lerner
One more question Mattia Giuffrida. Did upgrading the resource class size help resolve the issue or was there something else you did?
Mattia Giuffrida
Sebastian Lerner: Sure, I'll send the info to your email.
In the end we bumped the instances to xlarge, and for now this has helped. I was also advised to find a way to ease the memory pressure, but in our particular case that unfortunately doesn't seem to be an option.