
Throttling Jenkins pipelines per node

It turns out that throttling pipeline jobs on Jenkins is broken. At least, it is not possible to limit the number of pipeline jobs that run per node with the throttling plugin, even though that feature does exist for freestyle jobs. Instead, we can set up a workaround using the lockable resources plugin.

First, let’s talk about the difference between throttling and locking on Jenkins. It’s always good to understand the conceptual difference between two things when you have to violate it for the sake of practicality.

  • Locking is used when a job needs to make exclusive use of a resource external to Jenkins. The job places a lock on that resource when it is used. Other jobs that need to use the same resource will be blocked until the job releases that lock, after which the next job can get access. Resources are defined globally, so all Jenkins agents have access to the same pool of resources.
  • Throttling is used to limit concurrent runs of the same (or similar) jobs on Jenkins itself. Throttling settings can be applied either globally or per agent, so we can limit the number of certain jobs that each agent individually can run at a time without limiting the number of executors for un-throttled jobs.

Both strategies limit the number of copies of a job (or category of jobs) that can run simultaneously. There is naturally a lot of overlap in how they can be used. There are some low level differences in how the limits are evaluated and applied, but the key conceptual difference is this:

  • Throttled jobs can scale with Jenkins’ own resources.
  • Locked jobs scale with an external resource.

If I only have one database server, adding more Jenkins agents won’t let me run more jobs; this is a case for locks. If I have one database installed on each Jenkins agent directly, adding more agents does let me run more jobs; this is a job for throttles.
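
As a minimal sketch of what that difference looks like in a Jenkinsfile (the resource name, category name, and shell scripts here are hypothetical; the throttle() step comes from the throttling plugin's pipeline support):

// Locking: one shared external database, no matter how many agents exist.
lock(resource: 'shared-database') {
    sh './run-db-tests.sh'
}

// Throttling: limit concurrent builds in a category defined in Jenkins' global config.
throttle(['heavy-jobs']) {
    node {
        sh './run-heavy-job.sh'
    }
}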

The problem is: throttling per node doesn’t actually work for Pipeline jobs. The option exists in the Jenkins UI, but it does not actually do anything. There is an open ticket about it, if you’re interested. I even tried opening a PR to fix it, but the implementation I tried has some unaddressed edge cases and doesn’t work with multibranch pipelines at all.

The workaround I settled on is this: manually force lockable resources to scale with the number of agents. I consider this just a workaround because it abuses locks to do something they aren’t designed for, which makes it more fragile and requires more maintenance on my part. But at least it works.

Let’s say my goal is to limit jobs with an unusually high memory requirement to only run 2 at a time on a node. The node is perfectly capable of running 8 “normal” jobs at once, so it’s not worth lowering the number of executors. What I can do is define 2 lockable resources per node. If we have the master node and one agent, that means 4 lockable resources:

Resource name                Resource label
high-memory-master-slot1     high-memory-master
high-memory-master-slot2     high-memory-master
high-memory-node2-slot1      high-memory-node2
high-memory-node2-slot2      high-memory-node2
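
These resources can be created by hand in the global configuration, but as a rough sketch, a system Groovy script along these lines could define them programmatically (this assumes the plugin's createResourceWithLabel scripting method; treat it as an illustration of the naming scheme rather than a drop-in script):

import org.jenkins.plugins.lockableresources.LockableResourcesManager

// Define two "slots" per node, all sharing a per-node label.
def manager = LockableResourcesManager.get()
['master', 'node2'].each { nodeName ->
    (1..2).each { slot ->
        manager.createResourceWithLabel("high-memory-${nodeName}-slot${slot}", "high-memory-${nodeName}")
    }
}
manager.save()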

The important piece is that on one node, all my “high memory slots” have the same label, which includes the name of the node. In the pipeline jobs that I want to be subject to this limit, I request a lock for any resource matching that label:

String runningNode = env.NODE_NAME ?: 'master'
lock(label: "high-memory-${runningNode}", quantity: 1) { ... }

The quantity argument is required to prevent Jenkins from locking all resources with that label. Since env.NODE_NAME is only set inside a node{} environment, I set a fallback to say if we’re not in a node environment, then assume we are running on the “master” or controller node. (As far as I know this assumption is true, but I admit I don’t know enough about how Pipelines are executed to know for sure if that is universal.) This design works assuming that if there is a node{} block at all, it wraps the lock{} block.
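
Putting it together, a minimal sketch of a job that respects this limit might look like the following (the 'linux' agent label and the shell step are placeholders):

node('linux') {
    // env.NODE_NAME is set here because we are inside a node block.
    String runningNode = env.NODE_NAME ?: 'master'
    lock(label: "high-memory-${runningNode}", quantity: 1) {
        // At most 2 of these run per node, because only 2 matching resources exist.
        sh './run-high-memory-task.sh'
    }
}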

There are some very important drawbacks to this:

  • In my case, I’m lucky that the high-memory function is part of a global shared library, so I can code this lock once and have it automatically apply to all jobs that use it (see the sketch after this list). If that weren’t the case, then in order for this limit to be respected I would have to ensure every individual job implements this lock.
  • Similarly, this implementation assumes a name for a master node. If this is duplicated across many jobs, it will be a pain to change. There may be a more robust way of getting the current node, but everything I’ve found returns an empty string for the controller.
  • This can easily break if there are node{} blocks inside the lock{}, since you might end up locking one of the slots on a node different from where your job is actually running.
  • Since the job has to start running in order to ask for a lock, any waiting jobs will sit idle occupying an executor while they wait for a resource to be freed.
  • When new agents are added, I have to manually update the configuration to add new slots for these “throttled” jobs.
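
For the shared library case mentioned above, a sketch of such a wrapper might look like this (withHighMemorySlot is a hypothetical step name in the library's vars/ directory):

// vars/withHighMemorySlot.groovy
def call(Closure body) {
    // Assumes this is called from inside the node{} block that does the work.
    String runningNode = env.NODE_NAME ?: 'master'
    lock(label: "high-memory-${runningNode}", quantity: 1) {
        body()
    }
}

Jobs then call withHighMemorySlot { ... } around the memory-hungry step, and the limit applies without each Jenkinsfile repeating the lock logic.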

It happens that with my particular setup none of these drawbacks is a major concern, but they could easily be deal-breakers. Throttles, if they worked per node, would have avoided most of these:

  • Throttles can be defined as part of the job definition itself rather than inside the Jenkinsfile, meaning as an administrator I can more easily adjust the throttles of all jobs on a server shared by multiple teams.
  • Throttles defined for a job can prevent the job from starting at all, leaving executors free to execute other jobs.
  • When a new agent is added, no configuration changes are needed, since throttles would automatically know that the new agent can take a certain number of throttled jobs.

One nice advantage of using locks instead of throttles is that I can adjust how many slots are available on each node independently; I can define 2 slots on one node and 4 slots on another if I want. It is also possible to write a helper method that can check if lockable resources exist on a particular agent, so that you can default to “unthrottled” on an agent if no resources are defined on it:

import jenkins.model.Jenkins

// Returns true if any lockable resources following the
// high-memory-<node>-slotN naming scheme are defined for this node.
private Boolean isNodeThrottled(String nodeName) {
    def lockable = Jenkins.getInstance().getPlugin('lockable-resources')
    if (lockable != null) {
        // Match against the resource name, not the resource object itself.
        def limiter = lockable.getResources().find {
            it.name =~ "high-memory-${nodeName}-.*"
        }
        return limiter != null
    }
    // Plugin not installed: nothing can be throttled this way.
    return false
}

This function checks if there are any lockable resources that match the naming pattern I’ve established in the table above. If there aren’t any, it returns false to indicate that this node is not throttled. This saves us from defining, say, 8 separate resources on an agent with 8 executors. For my high-memory example, this only makes sense if I have some nodes with effectively infinite memory, but this kind of logic might be applicable for other types of resources.
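
For example, a sketch of how the helper could gate the lock (runHighMemoryTask being a placeholder for whatever the job actually does):

String runningNode = env.NODE_NAME ?: 'master'
if (isNodeThrottled(runningNode)) {
    // Slots are defined for this node, so compete for one of them.
    lock(label: "high-memory-${runningNode}", quantity: 1) {
        runHighMemoryTask()
    }
} else {
    // No slots defined: treat this node as unthrottled.
    runHighMemoryTask()
}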

There are a lot of pros and cons to weigh when deciding which approach is right, regardless of the conceptual difference between locking and throttling. The throttling plugin actually does have some functionality for throttling pipelines, so for your specific use case throttles may still be the right way to go. For me, defining slots as lockable resources was the way to get the functionality I needed.

3 Comments

  1. okhatavkar007

    @ March 29, 2022, 11:48

    Hey Gregory,

    In our case, we do have enough resources, and each job runs in a different container. But still we were not able to limit the concurrency using the throttling plugin.

    What we have is a trigger job that runs multiple pytest jobs, which are created from Job DSL based on a YAML configuration. We want to run these pytest jobs with a concurrency of 10. When we trigger them, all of them start, but we only want to run a batch of 10 at a time. Is this really possible with the throttling plugin? Or have you seen this scenario?

    • Gregory Paciga

      @ May 19, 2022, 18:30

      I think you can achieve this by having each job lock a resource that has only 10 copies. The other jobs will show as started (up to the maximum available executors), but they will wait on that lock step until resources are free. You’d think it would work with throttling, but as discussed, it doesn’t work in all the scenarios you’d expect it to.

  2. monger39

    @ February 22, 2023, 05:50

    We use an alternative mechanism with similar locks: we first lock a resource (which contains the nodeName), then start a node(#nodeName){}. This ensures a free lockable resource, but it has the risk that the node does not have a free slot or, even worse, that the node could be offline. There is some ongoing work around the lockable resources plugin (see GitHub) to address that problem.
