Connectwise Automate leverages Microsoft’s IIS webserver to facilitate application communications. Like other components of the Automate stack, IIS needs proper configuration to perform well under the heavy load of an RMM system. Below you’ll find the core IIS tuning config we use here at Automation Theory whenever we’re doing consulting for partners.
As we explore IIS tuning for Connectwise Automate, note that we’re working in reverse order: we start with the smallest structures and work toward the largest, because tuning lower-level items first simplifies things as we move up the stack.
Application Pool Recycling
IIS application pools have a feature known as recycling, which restarts the worker processes at a regular interval. The original intent was to address memory leaks in web applications whose code couldn’t be modified. While we have occasionally seen excessive memory usage with Automate worker processes, the true utility of this setting lies elsewhere.
The Automate application uses connection pooling when communicating with MySQL. The application will open up to 500 connections to MySQL and leave them open to facilitate the rapid processing of database requests. This is a best practice, but historically the pooled connections don’t always get closed. For larger partners, the result is a server that regularly hits the MySQL connection limit (max_connections), at which point the Automate application freezes or crashes.
Recycling the IIS worker processes closes these connections and prevents the database from being overwhelmed. The exact setting to correct this behavior will vary between servers, but recycling the IIS application pool every 60-90 minutes is adequate for most partners. While the recycle is happening the application will appear to hang for ~30 seconds. If this would be disruptive to daily operations it’s possible to specify a set of times for recycling to occur to ensure it happens outside of business hours.
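As a rough sketch, both recycling styles can be set with the WebAdministration PowerShell module that ships with IIS. The pool name "Labtech" is the one the Automate installer creates; the 90-minute interval and the 03:00 schedule are illustrative values only, so treat them as assumptions to adjust for your own server. Run this from an elevated PowerShell session.

```powershell
# Sketch only: confirm the pool name and pick times that suit your environment.
Import-Module WebAdministration

# Assumed pool name; list the real ones with: Get-ChildItem IIS:\AppPools
$pool = 'IIS:\AppPools\Labtech'

# Option 1: recycle on a fixed interval (90 minutes here; the IIS default is 1740 minutes).
Set-ItemProperty -Path $pool -Name recycling.periodicRestart.time -Value '01:30:00'

# Option 2: recycle at set times outside business hours instead.
# An interval of 00:00:00 turns off interval-based recycling.
# Set-ItemProperty -Path $pool -Name recycling.periodicRestart.time -Value '00:00:00'
# New-ItemProperty -Path $pool -Name recycling.periodicRestart.schedule -Value @{ value = '03:00:00' }

# Review the resulting interval.
(Get-Item $pool).recycling.periodicRestart.time
```

The same change would need to be repeated for any other pool that holds MySQL connections open, such as "CwaRestApi".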
Worker Processes
Inside IIS, incoming web requests are handled by worker processes (w3wp.exe), each of which runs its own pool of threads. The default number of worker processes per application pool is 1, and this will be fine for smaller Automate instances. However, the theoretical limit for simultaneous web requests (worst-case scenario) approaches ~33% of the agent count (there are many variables at play, so please treat this as a loose number and use PowerShell to measure TCP connections on your webserver). For a 300-agent server, it’s quite plausible that a single worker process could handle 100 simultaneous requests. It is much less plausible that a single worker process could efficiently serve 1,000 simultaneous requests on a 3,000-agent server.
# PowerShell to measure current web requests
Get-NetTCPConnection -LocalPort 443 | Measure-Object
So, obviously bigger servers will need more worker processes, but how many? The old-school logic when doing IIS tuning for Connectwise Automate is to have half as many worker processes as you have CPU cores (so a server with 8 cores/vCPUs would have 4 worker processes). The idea behind this is to prevent resource starvation. However, IIS has settings for controlling CPU usage by the worker processes, and they provide much more granular control over resource utilization.
For most partners, starting with the old-school method is advisable, but if performance is still lacking (or the ratio of connections to worker processes is still disproportionate), it would be worth gradually adding more worker processes (with the IIS resource controls) until an optimal balance is reached. As with other concepts in resource allocation, it’s worth noting that more is not always better, and there is a point of diminishing returns (the tipping point is normally when worker processes exceed core count). It’s also worth noting that additional worker processes will open additional database connections, so care should be taken to prevent hitting the max_connections threshold.
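The old-school starting point and the IIS CPU controls can be sketched in PowerShell as follows. The pool name is the installer's "Labtech" pool, and the 80% CPU cap is purely an assumed example value, not a recommendation.

```powershell
# Sketch only: half as many worker processes as cores, plus an optional CPU cap.
Import-Module WebAdministration

$pool = 'IIS:\AppPools\Labtech'   # assumed pool name

# Logical processors visible to Windows.
$cores = [Environment]::ProcessorCount

# Half the core count, rounded down, but never fewer than one worker process.
$workers = [Math]::Max(1, [int][Math]::Floor($cores / 2.0))
Set-ItemProperty -Path $pool -Name processModel.maxProcesses -Value $workers

# Optional resource control. cpu.limit is expressed in 1/1000ths of a percent,
# so 80000 means 80%. The 'Throttle' action requires IIS 8.0 or later.
Set-ItemProperty -Path $pool -Name cpu.limit -Value 80000
Set-ItemProperty -Path $pool -Name cpu.action -Value 'Throttle'
```

When raising the worker process count past this starting point, it is worth checking the MySQL connection count after each step, since every additional worker process opens its own database connections.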
Queue Length
IIS application pools also have a property known as queue length. This is simply the number of pending web requests the server will queue before returning the HTTP 503 status code (Service Unavailable). All requests pass through this queue as they are processed by the webserver (there are also other queues, but they are out of scope for our discussion here). The queue length plays a role in determining the total number of concurrent connections, as any request that a worker process can’t accept immediately remains in the queue.
In an ideal world, the webserver can always process incoming requests rapidly, and this queue never contains a large volume of requests. However, as discussed in our blog post here, there are inherent performance issues with the Automate database, and they can leave web requests waiting on the database while the queue fills up. Because of this tendency, the Automate installer sets the queue length to 11,000 on each of the “Labtech” and “CwaRestApi” application pools, a full 11 times the default of 1,000.
Cause and effect run in both directions when tuning this setting. In most cases, a full queue is caused by contention in MySQL: IIS can’t complete requests, and the queue fills. However, once MySQL recovers, the queued requests flood in, and application response remains poor until the server catches up. It’s during these times that the Control Center will lock up and be unresponsive.
The million-dollar question of course is: when doing IIS tuning for Connectwise Automate, what should the queue length be? In the spirit of the proper use of the queue, we’d suggest setting it to a count that could hold ~60 seconds’ worth of requests (normally this is ~30% of the agent count). If it takes longer than 60 seconds to process a web request, that indicates a real issue, and it makes sense to start returning the 503 status code.
As a simple test to gauge this, set Performance Monitor to watch the queue size and recycle the application pool. The peak queue size during the recycle is normally lower than 30% of the agent count, but it’s representative of what a short processing delay on the server looks like. If your queue length is comfortably larger than this peak, all is well. For smaller Automate instances where 30% of the agent count would be less than the default of 1,000, sticking with the default setting is the best course of action.
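The sizing rule and the recycle test above can be sketched like this. The agent count is a hypothetical placeholder, and the pool names are the two the installer configures; swap in your own values.

```powershell
# Sketch only: size the queue at ~30% of agents, never below the IIS default.
Import-Module WebAdministration

$agentCount   = 10000   # hypothetical agent count
$defaultQueue = 1000    # IIS default queue length

$queueLength = [Math]::Max($defaultQueue, [int][Math]::Ceiling($agentCount * 0.30))

foreach ($name in 'Labtech', 'CwaRestApi') {
    Set-ItemProperty -Path "IIS:\AppPools\$name" -Name queueLength -Value $queueLength
}

# Watch the HTTP.sys request queues for 60 seconds (one sample per second).
# While this runs, recycle the pool from a second elevated session with:
#   Restart-WebAppPool -Name 'Labtech'
Get-Counter -Counter '\HTTP Service Request Queues(*)\CurrentQueueSize' -SampleInterval 1 -MaxSamples 60 |
    ForEach-Object {
        $_.CounterSamples |
            Where-Object { $_.CookedValue -gt 0 } |
            Select-Object InstanceName, CookedValue
    }
```

Only non-empty queues are printed, so a quiet server produces little or no output; the largest value seen during the recycle is the peak to compare against the configured queue length.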
Website Connection Limit
IIS also has a connection limit for all requests to a website on the server. The default value is 4,294,967,295. It’s a setting that normally doesn’t get much attention, but it was in the spotlight during the 2020.7 Automate patch. This setting is defined at the website level, and it is the funnel that feeds all of the different application pools.
So, why does this setting default to roughly the number of IPv4 addresses in existence? The value is the largest unsigned 32-bit integer, so out of the box connections are effectively never refused at this layer (the developers are betting that the whole internet won’t be accessing one IIS server simultaneously). The intent, however, is that this value can be lowered to make sure that the sum of all connections to the server doesn’t exceed the available resources.
If this setting is left at the default, the connection limits at the application pool layer are the controlling factor. The danger here is that it’s possible for the sum of the application pool connections to be greater than the number of connections the server can accommodate. This appears to be the issue with the 2020.7 patch, where the queue length on the application pool was too large, and thus the website connection limit was used to prevent overloading of the server. Obviously, the conditions of the patch were a rather special case, but in general, it is of benefit to partners to configure their servers to be tolerant of connection spikes.
When doing IIS tuning for Connectwise Automate, this value should be set to the maximum number of connections the server can process; however, there are a lot of variables in that calculation. As a starting point, it’s advisable to set this to the sum of the application pool queues once they are properly scaled, and work down from there. Please keep in mind that this value will be the cap on agent communications, web interface users, and API calls combined, and the normal count of those will differ depending on user count, integrations, and how Automate is used in your particular environment.
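A minimal sketch of that starting point follows. The site name "Default Web Site" is an assumption (check which site actually hosts Automate on your server), and the pool list assumes the two installer-created pools are the ones fed by that site.

```powershell
# Sketch only: cap site connections at the sum of the application pool queue lengths.
Import-Module WebAdministration

# Assumed site name; list the real ones with: Get-ChildItem IIS:\Sites
$site  = 'IIS:\Sites\Default Web Site'
$pools = 'Labtech', 'CwaRestApi'

# Add up the queue length currently configured on each pool.
$sum = ($pools |
    ForEach-Object { (Get-Item "IIS:\AppPools\$_").queueLength } |
    Measure-Object -Sum).Sum

Set-ItemProperty -Path $site -Name limits.maxConnections -Value ([uint32]$sum)

# Review the resulting limit.
(Get-Item $site).limits.maxConnections
```

From there the value can be worked downward while watching for refused connections during normal agent check-in peaks.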
We hope that this has been helpful for you. Here at Automation Theory, we’re certified MySQL DBAs dedicated exclusively to the Connectwise Automate software stack. Be sure to check out our integrations and services.