Top 5 myths about the Connectwise Automate Database

As we begin discussing myths about the Connectwise Automate database, we’d like to go on record and say that most of the published information is wrong or misleading. Databases are extremely complex, and most of the existing tuning or maintenance advise is not holistic in nature — it treats symptoms and not the root causes. In this blog post, we’ll explore the top five myths about tuning and maintaining the Automate database.

Myth 1: The Automate database is too big

This myth is propagated by almost everyone you talk to, from support to other admins to consulting firms. If your server is running poorly they will happily tell you to start chopping out data from certain tables. The cause and effect aren’t 100% false, but the premise is, and sometimes the application is as well.

The premise of a database is that it holds data — if it can’t hold data, then something has gone wrong. It’s worth noting that there are production MySQL databases for other applications that are hundreds of gigabytes in size and hold hundreds of millions of rows that do not have the sorts of issues that Automate does. So, if a database is meant to hold data, and other MySQL databases can hold data orders of magnitude larger than an Automate database, what’s the issue?

The symptoms of slow system performance are caused by inefficient querying of the large tables. The existing database schema as of this time (July 2020) is almost exclusively using single-column indexes. This is an issue since MySQL can only use one index per query in most cases. If you want to search the eventlogs table for rows matching certain event ID’s and belonging to certain computers only one index can be used — so you either get all the event logs for all the computers, or you get every event log with that ID (these are the two dismal “best case” scenarios with the stock indexes). In cases like this, the query optimizer will look at this situation and often determine the fastest way to query the data is to ignore all the indexes and to check each row in the entire table for matches (a full table scan).

So, when it comes to full table scans less data is indeed better, and this is the bit of truth to this myth. It is logical after all that if we need to sort through all the data to find what we’re looking for that the less there is to sort through the faster we can do it. However, if there are general stability or performance issues randomly deleting data isn’t the solution.

The answer to the table level performance problem is to create better indexes (compound indexes with multiple columns); deleting data is simply optimizing for full table scans (which is so absurd words fail to describe it).

Here at Automation Theory we’re doing R&D on better indexes for the Automate database, but until that work is done the suggestions we have are two-fold:

Use the explain statement when writing custom SQL to catch non-optimal queries
Try Database Tuner; it adjusts optimizer settings to prefer the existing indexes and reduce table scans.

Myth 2: The server needs more RAM

Server resources seem to be a common scapegoat for performance issues. Support often will tell you that the database needs more RAM, and some partners in the past have had to add extreme amounts of memory just to prove that their issues stem from something else. It’s very common to see partners with more RAM than data in their servers.

While it’s very true that you need adequate resources, at some point you’ll start to see diminishing returns. The more frustrating point is that, as we describe in our post The MySQL Performance Problem, is that MySQL needs to be given instructions on how to best use these resources (which our plugin Database Tuner does, along with our Database Tuneup service). Blindly throwing resources at the issue won’t help.

The exact amount of RAM that a server should have depends on the size of the database and the server architecture. It’s important to note that MySQL does need other memory structures besides the buffer pool (approximately 10% more), so on a single server, it’s important to make sure that IIS, the Automate application, the DB and the OS all have enough RAM and aren’t competing against each other — which might result in paging. Paging is the enemy of database performance since it writes to disk data that we want in RAM for rapid access — and in cases of resource contention reducing the amount of RAM allocated to MySQL could increase performance.

Myth 3: Your storage must be optimized for random writes

The only thing more complicated than a database is trying to understand how databases interact with other layers of hardware and software. There’s a thread (here) where a case is made that Automate servers shouldn’t be run in Azure as the disk performance for random writes is so poor that the database will be prone to locking issues.

While it’s true that faster storage is inherently better, the above logic fails to account for MySQL behavior holistically. Locking in MySQL is a property of a transaction, and it is only secondarily related to I/O — and on a properly configured server the transaction I/O should be buffered in memory and then asynchronously written to disk (and with the huge amount of RAM most partners have the buffer isn’t the issue here). If there is a query pileup due to transaction locking we can solve that issue with proper transaction configuration (Database Tuner optimizes this too).

Now, let’s talk about storage for a moment. As described in The MySQL Performance Problem, MySQL needs to be configured to use additional I/O resources. The default settings in MySQL assume it’s installed on a mechanical disk. Just like RAM, you can give MySQL the highest-end storage controller on the market, but without the proper configuration it won’t use it — things might be faster since the pipe is bigger, but in no way, shape, or form are you using the whole pipe.

Myth 4: max_connections should be 3x agent count

This myth is something that was recommended by CW for some time. I’ve personally spoken with a consultant who advised that for a 10,000 agent server we should have 30,000 as our max_connections for the Automate DB. The official documentation has since changed, first to 3000 (a reasonable value) and now to 1000 (this is too low for large servers). However, many partners have max_connections set extremely high, and this is unnecessary at best and can cause issues at worst.

The idea of max_connections in MySQL is that if the server is overwhelmed it’s better to turn away some requests instead of having the server crash. While in an ideal world we don’t want either, for an Automate server it’s much better for some queries (most commonly by volume heartbeats, check-ins, inventory updates, etc.) to fail than for the whole DB to malfunction.

The danger comes from how memory allocation works in MySQL. Certain memory structures like the buffer pool are allocated when the server starts and don’t change. However, several buffers are allocated per connection as needed. On a stock Automate server, each thread could use 129.1MB of RAM if it maxed out all its possible buffers. While it’s unlikely to happen, that puts the MySQL theoretical max memory usage at ~378GB for 3000 connections.

Large Automate instances also can have pain points with the nightly maintenance routines and can be prone to connection spikes. Some would advocate that a larger max_connections value is a valid solution — but this simply results in a query pileup, and the Automate application becoming more or less brain dead for some time as the database has not been scaled to process the inbound requests. The application is effectively down at this point, even if none of the underlying services are reporting hard errors. From a stability and administration point of view it’s much better to impose hard limits and fix root causes for issues when they happen — max_connections is a poor band-aid fix at best.

So, what’s a good max_connections value? First, do a quick spot check run the statement:

show global status like 'max_used_connect%';

It will return the high water mark for simultaneous connections since the server was started (and on MySQL 5.7 it will show the timestamp of the high watermark). Most partners will find that their server doesn’t use any more than ~2000 connections, and having extremely high connection limits doesn’t serve a purpose. The previous guidance from CW of 3000 is suitable for most partners (1000 is too low due to the connection pooling config which will use up to 500 of those connections).

Myth 5: If my buffer pool is large enough my Automate database runs in RAM

Let’s preface this by saying that the InnoDB buffer pool is one of the most debated (and perhaps most misunderstood) items in MySQL tuning. It seems that there are seas of opinions on this topic, and the proper configuration depends on the usage of the server (so most of the advice you see on sizing, instances, etc. does not apply to Automate). With that out of the way, this particular myth goes as follows: if the buffer pool is larger than the database size all the data is stored in RAM and therefore the database is faster since there’s less I/O (as if it were using the MEMORY storage engine).

That is a false statement because that’s not how the buffer pool works. The buffer pool is made up of three parts (the buffer pool proper, the change buffer, and the adaptive hash index), and the buffer pool proper is only used for reading data (so you do save I/O on reads). The pool populates as data is accessed, and it only stays in the buffer if it is frequently accessed — MySQL employs an algorithm to remove infrequently accessed data from the buffer. This is why the buffer should never be at 100% capacity on a healthy server.

Also, unlike memory tables the buffer pool cycles data. By default anytime any data is accessed it’s added to the buffer pool, and it might push out data that’s more frequently used (think of when big tables are accessed by reports or internal monitors). MySQL also has a read-ahead function where it prefetches data into the buffer pool based on what you’ve already accessed, so the data in the buffer pool at any given point in time might be the most recently accessed or most anticipated accessed data instead of the most frequently accessed data (Database Tuner does optimize these settings to reduce data cycling in the buffer pool).

Given the above behavior, the buffer pool is definitely not a pure-memory database like popular opinion holds it to be. With that said a properly sized buffer pool is important for a properly tuned MySQL server — and it’s better to err on the side of being too large rather than too small. While it’s outside the scope here to have a complete discussion about buffer pool sizing, we’d advise starting at 80% of the size of the database and tune from there.

Addressing the root causes

Now that we’ve reviewed the most common myths, it’s time to answer: what can we do about the root causes? Here at Automation Theory, we’re certified MySQL DBAs working exclusively with the Connectwise Automate stack. We have a couple of products purpose-built to address the root causes and properly tune the Automate database.

Database Tuner is a plugin for Automate that dynamically tunes your database in real-time. It will alleviate some of the index woes, properly allocate RAM to a number of buffers, provide an interface to allocate I/O resources, and adjust the buffer pool to the workload of the Automate application. Database Tuner can be used in both on-premise and cloud-hosted servers.

Our Database Tuneup flat-rate service complements the Database Tuner plugin for on-premise servers. It will help size your buffer pool, connection limits, and it contains additional I/O settings that can’t be adjusted while MySQL is running (and other best practice items as well).

We hope this information has been helpful to you. Please feel free to contact us if you have questions about your particular server!