How to Optimize Queries in SQL Server | by Jamie Burns | May, 2022

When you've got a query that's running a little slow, here's an approach you can take to make it quick

A rev counter in a car
Photo by Chris Liverani on Unsplash

Optimising SQL Server queries can be both painful and rewarding. On one hand, making a change that gets your query running in a fraction of the time is a great feeling. The other side is having a query that refuses to get any faster, no matter what you change.

In all my time optimising queries, there are a handful of things I always check first, and a lot of the time it's one of these that's the main culprit of the slow-down and gets the query running quickly again.

In this article, I'll talk through how I'd approach optimizing a couple of slow-running queries, along with some tips from my experience on how to approach this kind of optimization in general.

In this article, I'll be using a copy of the Stack Overflow database from 2013, hosted on SQL Server 2019 Developer Edition. This database has a good structure, and there's a lot of data in there. If you want to follow along on your machine, the database can be found in this article by Brent Ozar. Don't worry if you don't have these databases though; I'll include the schema definitions as we go along so you can see what's going on.

The first thing I always check when I come across a slow query is the execution plan for that query.

This is the plan that SQL Server generates for a given query, and it shows all the different operations SQL Server performs when executing it.

These plans can be a little daunting at first because there's so much information shown. But they're incredibly useful in pointing out why something is running slowly.

A tool I use to view these execution plans is the excellent (and free) SQL Sentry Plan Explorer. By running your queries in this tool, you get a really nice UI to show and explore the plans. There are other ways to view these plans (including viewing them in plain text if you're that way inclined), as well as using SQL Server Management Studio directly, but I find Plan Explorer the easiest to use, and I'll be including screenshots of this tool throughout this article.

Another tool I frequently use is the SQL Server Profiler, which is installed alongside SQL Server Management Studio. I find being able to profile queries locally (or on test environments) helpful in tracking down the exact queries that are running slowly. However, there are a load of different ways of doing this, including using Extended Events, SQL monitoring tools like Redgate's SQL Monitor, or, within your infrastructure, things like Azure's Query Performance Insight.

For this article, we'll assume we've already identified which queries are the slow ones and need to be worked on.

Okay, let's get started with a slow query and see what we can do about it.

Say we have a table of posts, each of which has a number of answers:
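A cut-down version of that Posts table looks something like this (a sketch using the Stack Overflow column names; the real table has more columns than we need here):

```sql
CREATE TABLE dbo.Posts (
    Id INT NOT NULL PRIMARY KEY CLUSTERED, -- the Clustered Index
    PostTypeId INT NOT NULL,
    AnswerCount INT NULL,
    CreationDate DATETIME NOT NULL,
    LastActivityDate DATETIME NOT NULL,
    ClosedDate DATETIME NULL,
    Body NVARCHAR(MAX) NOT NULL
    -- ...plus the remaining Stack Overflow columns
);
```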

We have a query that finds the top 10 posts with the most answers, and we want the Id and Body of each.

We can write the query like this:
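Something along these lines (a sketch; the columns match the schema above):

```sql
SELECT TOP 10
       p.Id,
       p.Body
FROM dbo.Posts p
ORDER BY p.AnswerCount DESC;
```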

It looks pretty simple. Let's give this query a run and see how it performs:

A screenshot of the query being executed in SQL Server Management Studio. The query is shown at the top, with the result set at the bottom. The timing shows that the query took 50 seconds to run.
The results of executing the query, coming in at 50 seconds

Uh-oh. That took 50 seconds to return those 10 rows. That's not good. Let's execute this in SQL Sentry Plan Explorer and look at the execution plan, to see what's going on.

A screenshot of the generated execution plan. It shows a Clustered Index Scan reading 17 million rows.
The generated execution plan

It's a fairly simple plan, and we can see where the problem is. That Clustered Index Scan at the end is reading 17 million rows, and when it's combined with the Sort step, that's almost the entire cost of the query.

A Clustered Index Scan is when the Clustered Index (which, in our case, is just the Primary Key Id column) is read from start to finish. And on a table as large as this Posts table, that's always going to be slow.

A good starting point for fixing issues like this is to see if there's a new index that can be added and used instead of relying on this Clustered Index. We know that we're ordering by the AnswerCount column, so let's add a new Non-Clustered Index to that table for that column:
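The index only needs AnswerCount as its key column (a sketch; the IX_AnswerCount name is referred to again later on):

```sql
CREATE NONCLUSTERED INDEX IX_AnswerCount
    ON dbo.Posts (AnswerCount);
```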

Now that's added, let's re-run our query:

A screenshot of the query being executed in SQL Server Management Studio. The query is shown at the top, with the result set at the bottom. The timing shows that the query took 0 seconds to run.
The results of executing the query, coming in at 0 seconds

That's much better, returning in '0' seconds. Let's check the execution plan now:

A screenshot of the generated execution plan. It shows an Index Scan reading 225 rows and a Key Lookup.
The generated execution plan

Great. We've now got an Index Scan instead of the Clustered Index Scan and Sort, and because this index targets the column we're primarily sorting by, we're only reading 225 rows (instead of 17 million). We've also now got a new Key Lookup step. When we hover over this, we see that it's looking up the 'Body' column:

A screenshot of the generated execution plan, showing the details of the Key Lookup. It shows that the Output List contains the Body column.
The details of the Key Lookup

Given that this query is now returning in 68ms, and reading a reasonable number of rows, I'd suggest that this is as far as this optimisation needs to go. It's massively better than the original 50 seconds it took to run, and we've dramatically reduced the number of reads, which will ease the load on the server. Plus, this Body column is pretty big, so we don't necessarily want to include it in an index.

Now we want to get the 10 newest wiki posts that have no answers, and return each post's Id, CreationDate and LastActivityDate. We have a PostTypes table with this schema:
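A sketch of that schema (the exact column length may differ):

```sql
CREATE TABLE dbo.PostTypes (
    Id INT NOT NULL PRIMARY KEY CLUSTERED,
    Type NVARCHAR(50) NOT NULL
);
```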

And so the query we can write is:
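Something like this (a sketch; the 'Wiki' type name comes from the Stack Overflow PostTypes data):

```sql
SELECT TOP 10
       p.Id,
       p.CreationDate,
       p.LastActivityDate
FROM dbo.Posts p
INNER JOIN dbo.PostTypes pt ON pt.Id = p.PostTypeId
WHERE pt.Type = 'Wiki'
  AND p.AnswerCount = 0
ORDER BY p.CreationDate DESC;
```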

Let’s run it:

A screenshot of the query being executed in SQL Server Management Studio. The query is shown at the top, with the result set at the bottom. The timing shows that the query took 49 seconds to run.
The results of executing the query, coming in at 49 seconds

Okay, we're back at 49 seconds. But that's not surprising, since our new index probably isn't being used for this. Let's check the plan:

A screenshot of the generated execution plan. It shows a Clustered Index Scan, but the number of rows shown is only 171. There is a warning icon on the Clustered Index Scan.
The generated execution plan

This time, we're not seeing any really thick arrows (indicating a lot of rows being read), but something is definitely not right. We can see a little warning icon on the Clustered Index Scan, which suggests this is where the problem is again. Hovering over this gives us this message:

A screenshot of the generated execution plan, showing the details of the Clustered Index Scan. There is a warning shown.
The details of the Clustered Index Scan

Look at that warning:

Operation caused residual I/O. The actual number of rows read was 4,169,204, but the number of rows returned was 171.

Yikes.

That's pretty inefficient: it's reading over 4 million rows, just to return 171.

Looking at the predicate for this Clustered Index Scan, we can see that it's using AnswerCount and PostTypeId. We've already got an index for AnswerCount, so let's edit that index to include PostTypeId, and see if that makes a difference:
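One way to 'edit' an index is to re-create it with the extra column included (a sketch; DROP_EXISTING rebuilds the index in place):

```sql
CREATE NONCLUSTERED INDEX IX_AnswerCount
    ON dbo.Posts (AnswerCount)
    INCLUDE (PostTypeId)
    WITH (DROP_EXISTING = ON);
```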

Now let's run the query again:

A screenshot of the generated execution plan. It still shows the Clustered Index Scan with a warning icon.
The generated execution plan

Nope, that new index isn't being used. The Clustered Index Scan is still there. Let's create a new index, but this time make PostTypeId the first column in the index and include AnswerCount, and see what difference that makes:
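Something like this (a sketch; the IX_PostTypeId name is referred to again later on):

```sql
CREATE NONCLUSTERED INDEX IX_PostTypeId
    ON dbo.Posts (PostTypeId)
    INCLUDE (AnswerCount);
```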

When we try that query again, we see that it still hasn't made a difference. SQL Server is still preferring to use that Clustered Index Scan over any of the new indexes we've created.

Let's update this last index to also include CreationDate and LastActivityDate, since they're involved in the query too:
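Rebuilding it with the two extra included columns (a sketch):

```sql
CREATE NONCLUSTERED INDEX IX_PostTypeId
    ON dbo.Posts (PostTypeId)
    INCLUDE (AnswerCount, CreationDate, LastActivityDate)
    WITH (DROP_EXISTING = ON);
```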

Running this query now is much better:

A screenshot of the generated execution plan. The Clustered Index Scan has been replaced by an Index Seek.
The generated execution plan

This comes in at 3ms. Much better than the original 50 seconds. The Clustered Index Scan has been replaced by an Index Seek, which is far more efficient: SQL Server can use this index to specifically look up the rows it's interested in, rather than having to go through the entire index.

However, what happens if you try to select another column in that query? Say we add ClosedDate to the select list, like this:
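The same sketch as before, with ClosedDate added:

```sql
SELECT TOP 10
       p.Id,
       p.CreationDate,
       p.LastActivityDate,
       p.ClosedDate
FROM dbo.Posts p
INNER JOIN dbo.PostTypes pt ON pt.Id = p.PostTypeId
WHERE pt.Type = 'Wiki'
  AND p.AnswerCount = 0
ORDER BY p.CreationDate DESC;
```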

When we run this query, we get this plan:

A screenshot of the generated execution plan. It again shows the Clustered Index Scan with a warning icon.
The generated execution plan

And yes, we're back at 49 seconds. Not good.

Now, we could just include this new column in the index, and that might be an acceptable solution for some scenarios, but really, we want a plan that can cope with different columns being returned.

If we don't, this is a future performance bug just waiting to happen: imagine if, 6 months down the line, a new column is added to this query, and the developer forgets to check the plan.

The query might return quickly on their development machine (which only has limited data), and depending on how much attention is given to performance issues in your test environment, this change might make it all the way to production without anyone noticing the effect it would have, and then suddenly you've got a major performance bug.

Let's see if we can make this better.

I generally try to make everything as simple as possible. Looking at the plan we've got, there's a Bitmap, Parallelism, and Clustered Index Scan just for finding the post type. Let's index the PostTypes table now, and see what difference that makes:
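A simple index on the Type column will do (a sketch; the IX_Type name is my own choice):

```sql
CREATE NONCLUSTERED INDEX IX_Type
    ON dbo.PostTypes (Type);
```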

Let's run the query again, and check the plan:

A screenshot of the generated execution plan. The Clustered Index Scan on the PostTypes table has been replaced with an Index Seek.
The generated execution plan

So we've replaced the Clustered Index Scan on PostTypes with an Index Seek, but it hasn't helped anything else: it's still coming in at 49 seconds. Let's try changing the query itself.

We know that we're only looking for posts of a specific type, so let's look that type up into a variable first, and avoid the join to PostTypes entirely:
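Something like this (a sketch; the @PostTypeId variable holds the Id we previously joined on):

```sql
DECLARE @PostTypeId INT;

-- First statement: look up the type's Id once
SELECT @PostTypeId = Id
FROM dbo.PostTypes
WHERE Type = 'Wiki';

-- Second statement: query Posts directly, with no join
SELECT TOP 10
       p.Id,
       p.CreationDate,
       p.LastActivityDate,
       p.ClosedDate
FROM dbo.Posts p
WHERE p.PostTypeId = @PostTypeId
  AND p.AnswerCount = 0
ORDER BY p.CreationDate DESC;
```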

Let's run it. We now get 2 plans, because we've got 2 individual statements being executed. First, we have the setting of the @PostTypeId variable:

A screenshot of the generated execution plan. It shows an Index Seek for finding the row in the PostTypes table
The generated execution plan

This is looking fine; it's using an Index Seek, and comes in at 2ms. The next statement has improved slightly, in that it now comes in at 42 seconds (down from 49), but it isn't good enough yet:

A screenshot of the generated execution plan. It shows a Clustered Index Scan with a warning icon.
The generated execution plan

Okay, this is a fairly simple statement now on the Posts table, so let's review what indexes we've got. We've got IX_AnswerCount, which has AnswerCount as its key column. We've also got IX_PostTypeId, which has PostTypeId as its key column and also includes AnswerCount, CreationDate and LastActivityDate.

Looking at the plan above, that Sort step is a bit suspicious. Let's try to eliminate it by creating a different index with CreationDate as its first key, to hopefully avoid that sort:
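A sketch, keyed only on CreationDate (the index name is my own choice):

```sql
CREATE NONCLUSTERED INDEX IX_CreationDate
    ON dbo.Posts (CreationDate);
```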

Let's check the plan:

A screenshot of the generated execution plan. It shows an Index Scan reading 12 million rows, and a Key Lookup with a warning icon.
The generated execution plan

Oops. That now takes 2.5 minutes to run, reading 12 million rows. It looks like it's reading all those rows, then doing a Key Lookup to filter by PostTypeId and AnswerCount. Not an improvement at all.

Let's create an index keyed first on PostTypeId, then AnswerCount, then finally CreationDate:
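That is (a sketch; this index name is referred to again further down):

```sql
CREATE NONCLUSTERED INDEX IX_PostTypeId_AnswerCount_CreationDate
    ON dbo.Posts (PostTypeId, AnswerCount, CreationDate);
```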

Now the plan comes out as:

A screenshot of the generated execution plan. It shows an Index Seek reading 167 rows and a Key Lookup reading 10 rows. Neither have a warning icon.
The generated execution plan

Excellent. The timing comes in at 1ms, and that plan looks nice and clean.

This optimisation should be fine to leave there, as it's very quick, and even if extra columns are added, the query plan shouldn't change dramatically.

Removing Key Lookups

One option we do have is to tidy up that Key Lookup. Looking at the details, we can see that this step is looking up the columns that are being returned but weren't included in the IX_PostTypeId_AnswerCount_CreationDate index:

A screenshot of the generated execution plan, showing the details of the Key Lookup. It lists ClosedDate and LastActivityDate in the Output List area.
The details of the Key Lookup

So we can include those columns in the index, not as key columns, but just as included columns:
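Rebuilding the index with the two columns from the Key Lookup's Output List included (a sketch):

```sql
CREATE NONCLUSTERED INDEX IX_PostTypeId_AnswerCount_CreationDate
    ON dbo.Posts (PostTypeId, AnswerCount, CreationDate)
    INCLUDE (LastActivityDate, ClosedDate)
    WITH (DROP_EXISTING = ON);
```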

The plan is now super simple:

A screenshot of the generated execution plan. It shows an Index Seek reading 10 rows.
The generated execution plan

This is looking great now. Our index is being used, and the plan would cope with extra columns that may or may not be added in the future.

A note for when you finish optimisation

When you've reached your goal of optimising something, it's always a good idea to review what changes you've made. If you've followed through what we've done above, you'll see that we've created more indexes than we actually need, so they should be dropped.
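For example, the earlier experimental indexes on Posts can go (using the IX_AnswerCount and IX_PostTypeId names from above):

```sql
DROP INDEX IX_AnswerCount ON dbo.Posts;
DROP INDEX IX_PostTypeId ON dbo.Posts;
-- ...plus any other indexes created along the way that are no longer used
```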

It's also worth reviewing the optimisations you've done, especially if you've followed an incremental approach like I have above. For example, now that we've got the IX_PostTypeId_AnswerCount_CreationDate index on the Posts table, is it still necessary for the query to have been rewritten to use a variable for PostTypeId?

You might need to do some more experimentation to double-check that each of your changes is still needed, and perhaps revert any changes that no longer give you any benefit.

A warning for adding new indexes and removing Key Lookups

Just be wary of trying to remove all Key Lookups as we did above. Sometimes it is indeed good to include columns like this in indexes to avoid the Key Lookup, but you don't want to make your indexes too heavy. Sometimes it's preferable to keep the Key Lookup in the plan, since overall that can be the better trade-off.

Adding too many indexes, or including too many columns in them, can introduce new problems. While indexes help queries find their data faster, there's a cost to maintaining those indexes whenever data is added, removed, or changed in the index's table. So when you're making index changes, you also need to consider what happens when the data changes: having a table with a thousand indexes might sound like your queries will run super fast, but it's going to take a very long time for new data to be inserted. And if you've got a column that's updated frequently, you might decide that cost is too high, and that it's better to leave the Key Lookup in place.

For instance, take our PostTypes table. Say we want to insert a new row into that table:
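For example (the value here is made up for illustration):

```sql
INSERT INTO dbo.PostTypes (Type)
VALUES ('SomeNewType'); -- hypothetical new post type
```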

If we have no indexes on that table (other than the Clustered Index for the primary key), inserting this new row generates a simple plan:

A screenshot of the generated execution plan. It shows a Clustered Index Insert.
The generated execution plan

We've just got the Clustered Index Insert, which was set up for the primary key. However, if we add 10 indexes, all on the Type column:
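Along these lines (index names made up for illustration; ten identical indexes is deliberately silly):

```sql
CREATE NONCLUSTERED INDEX IX_Type_01 ON dbo.PostTypes (Type);
CREATE NONCLUSTERED INDEX IX_Type_02 ON dbo.PostTypes (Type);
-- ...and so on, up to...
CREATE NONCLUSTERED INDEX IX_Type_10 ON dbo.PostTypes (Type);
```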

The plan shows that these 10 new indexes are inserted into as well:

A screenshot of the generated execution plan. It shows a Clustered Index Insert, plus 10 non-clustered index inserts.
The generated execution plan

It's probably not a big deal if this table is only inserted into, say, once a day. But if it's being updated several times a second, then that cost is going to mount up.

It's a careful balance that you need to be aware of. Indexes can speed up your queries, but they come at a cost when inserting, modifying or deleting that data. It all comes down to how the data is used, and whether the cost is acceptable.

When is the optimisation done?

Before getting stuck into making a query faster, it's worth considering for a moment what your end goal is. At some point, you'll need to draw a line under the improvements you've made and say 'it's done, it's good enough'. A common risk when doing optimization work is aiming for the near-impossible: you'll eventually reach a point of diminishing returns, as it gets harder and harder to improve the performance even by a little bit.

In some cases, you might be optimizing something because you're experiencing a time-out: the limit your application sets on how long it will wait for your query to complete before it throws an error. In cases like this, your first goal will obviously be to get the query to consistently complete within that timeout limit. However, if you leave it at that, you might find yourself having to revisit the query again the next time the timeout gets breached. You'll ideally want your query to run comfortably within your set timeout limits.

Sometimes you'll be optimising something to improve response times, to make something feel a little less sluggish. In these cases, it may be that any improvement is good enough, or maybe you'd aim for something like a 50% reduction in the query time.

In other cases, you might be optimising a query to reduce the overall load on the database. The query might be running in a reasonable time, but it's putting a huge amount of pressure on the database server itself, which is slightly slowing all queries down. In situations like this, you've got to think about what your target optimisation is: are you reducing the number of reads on a specific table, reducing how much tempdb is used, etc.? And if so, how are you going to measure whether the changes you've made are good enough?

There's no fixed rule for how well a given query should run; it all depends on what the query itself is doing, how frequently that query is going to be executed, the resources you have available, and how much time you want to invest in it.

So, say you've got a query that takes 10 seconds to complete, and you want it quicker. If you can get it running in 1 second, that's a pretty decent improvement. That may or may not be your target though; in a lot of situations, a 1-second query will still be deemed too slow. So you spend more time on it and get it running in 500 milliseconds. Again, a good improvement. There's still room to improve though. You'll need to decide how much more time you want to invest in this: is it worth spending more time to get it down to 10 milliseconds? Or, given how frequently this query runs, or how it fits in with the wider solution, maybe 500 milliseconds is acceptable for now.

This is likely a decision you need to involve the wider team in, especially if you've got a product team who are responsible for the deliverability of your solution and have defined non-functional requirements relating to speed and response times.

Hopefully, they'll have a good idea of the needs of your users, so they'll be able to weigh the benefits of further optimization against developing other features that might be on the backlog/board. This can also be a good case for raising technical debt: if you know there's a better solution that would make something run a lot faster, but you can only afford an interim solution right now, then you can consider raising this as debt to be dealt with at some point in the future. See my other article about technical debt for a deeper discussion of how technical debt can work.

In this article, we've talked about what an optimization process looks like, and highlighted a typical thought process that a developer might go through to optimize a query.

We've also talked about how it's important to decide how far you want your optimization to go. Obviously, you want everything to run as quickly as possible, but remember that at some point it's going to take an exponential amount of effort to shave a few more milliseconds off that query, so you need to learn to draw a line under optimization and say that it's good enough for now.

Optimization work can be incredibly rewarding, and personally, it's one of my favourite aspects of being a developer. Even in just coming up with the trivial examples above, I've really enjoyed going through the motions of making a slow query run in a fraction of the time. It's not something everyone enjoys, but if you get in the zone for doing optimization work, you'll quickly find yourself always on the lookout for other things to optimize and make quicker!
