For Answers, see/post comments

Is a Temporary Table Really Necessary?

You will find articles on this site as well as others advising you to avoid the use of temporary tables to maximize the performance of your queries. I agree with the articles, but would like to add that sometimes you cannot avoid the use of a temporary table. I work with some very large SQL Server databases (the biggest being over 2.1 terabytes) and have found that I can avoid the use of temporary tables in most cases but sometimes they come in handy. This article will discuss the use and the alternatives to temporary tables from a query performance and maintenance standpoint.

Most of the literature that advises against the use of temporary correctly states that they may cause performance issues due to the locking of the tempdb while the temporary table is being created, the I/O activity involved during the use of the temporary table, and the potential locking of the tempdb if a transaction is used for the creation and the subsequent operations against the temporary table, not to mention the numerous problems SQL Server has with operations against temporary tables - see the list of Knowledge Base articles below. While these issues are true, I'm going to provide some reason to use a temporary table. I will admit that I do not use or have found a reason to use a global temporary table so you will find discussions on global temporary tables absent in this article.

Why would you use a temporary table?
There are several reasons that I use temporary tables in my work; to hold the results of a called stored procedure, to reduce the number of rows for joins, to aggregate data from different sources, or to replace cursors.

As your query become more complex you will find yourself repeating blocks of code within a query or between different queries. This reuse of code makes the case for a creating a stored procedure with that code and calling that stored procedure. This may make for a large amount of stored procedures in your database, but it does greatly reduce the maintenance when the functionality needs to be changed and you have to change code to meet that functionality, only one query to change now and not multiple queries that have to researched and changed. I use this technique quite often and it often forces me to use a temporary table to hold the results of those stored procedures since Transact-SQL does not allow the results of a stored procedure to be used as a table. This is probably the number one reason in my code for the use of temporary tables.

I quite often find myself having to join a 10 million plus row table to a 100 million plus row table to a 20 million plus row table and then ordering the results so only the most recent activity displays first. Even with proper indexes and using WHERE clauses to filter and force the use of an index, the performance of the query is unacceptable (since the application I work on is used by call centers, acceptable performance for a query is measured in seconds) and often the sorting produces huge performance losses as well as huge tempdb activity. I have quite often found that using corresponding temporary tables for each of the permanent tables to hold data from filtered by the WHERE clauses before I join and sort the data will increase performance on the query to such a large degree that I can actually place it into production without worrying about its performance or the impact on the tempdb database. Below is a very simple query to show how I do this.

Original Query to find details on a particular customer's phone call

SELECT table1.numCustID, table2.strPhoneNumber, table3.strPhoneNumberCalled
FROM dbo.table1 table1INNER JOIN dbo.table2 table2
ON table1.numBillID = table2.numBillID
INNER JOIN dbo.table3 table3
ON table2.numBillDtlID = table3.numBillDtlID
WHERE table1.numCustID = '5555'
AND table2.strPhoneNumber = '5555555555'
AND table3.strPhoneNumberCalled = '1234561234'
ORDER BY table3.dtmCalled DESC

(This query does not match the schema or an existing query at Verizon. It has been created to show a particular problem with a hypothetical telecommunications database.)

New Query

(I usually name my temporary table after the stored procedure that created it so I can troubleshoot any problems in tempdb from the use of temporary tables faster.)

CREATE TABLE #tquery2a
(multiplecolumns DATATYPES)

CREATE TABLE #tquery2b
(mulitplecolumns DATATYPES)

INSERT INTO #tquery2a
SELECT columns FROM dbo.table2 WHERE table2.strPhoneNumber = '5555555555'

INSERT INTO #tquery2bSELECT columns FROM dbo.table3 WHERE table3.strPhoneNumberCalled = '1234561234'

SELECT table1.numCustID, #tquery2a.strPhoneNumber, #tquery2b.strPhoneNumberCalled
FROM dbo.table1 table1
INNER JOIN #tquery2a #tquery2a
ON table1.numBillID = #tquery2a.numBillID
INNER JOIN #tquery2b #tquery2b
ON #tquery2a.numBillDtlID = #tquery2b.numBillDtlID
WHERE table1.numCustID = '5555'
ORDER BY #tquery2b.dtmCalled DESC

Believe it or not this method works, especially with the ORDER BY statement and its performance is vastly better than the original query.

Reporting off an OLTP designed databases is not always the easiest thing to do. The database is just built to maximize reports that executives want. Using temporary tables to stage the results from numerous SELECT statements, aggregate those results before displaying them is sometimes the only way to can get reports out of an OLTP database. Working in a call center application you are usually asked to produce reports that summarize what the call center reps are doing on a time filtered basis. Working your way through all the tables to gather the data and then summarizing it in multiple ways can only be accomplished with the use of temporary tables. Before any comes up with this argument: I know I work in a multi-billion company but that doesn't mean that executives are willing to listen to our arguments that they need a data warehouse or a simple reporting database if it means they have to spend money to get one when they can just as easy piggy-back off of the OLTP database and blame me if the queries are too slow and cause a performance headache for the servers. Sorry, that was for the theoretical guys out there who have magically gotten everything they wanted no matter the cost or the size of the companies they worked for.

The last argument for the use of a temporary table is to replace a cursor. I am not fond of cursors and advocate doing anything possible to replace the cursor (performance of your solution needs to be tested against the performance of the cursor though). One of the tricks I use is to mimic the main reason a cursor is usually built for, looping through a result set one row at a time and performing an action based on the data in that row. Below is a short query that displays this logic by obtaining all the user table names and executing sp_spaceused on each table.

SET NOCOUNT ON
DECLARE @lngTabCount INTEGER
DECLARE @lngLoopCount INTEGER
DECLARE @strTabName SYSNAME

CREATE TABLE #tTables(
numID INTEGER IDENTITY(1,1),
strTableName SYSNAME
)

INSERT INTO #tTables (strTableName)
SELECT name FROM dbo.sysobjects WHERE xtype = 'u'

SET @lngTabCount = @@ROWCOUNT
SET @lngLoopCount = @lngTabCount

WHILE @lngLoopCount <> 0
BEGIN
SET @strTabName = (SELECT strTableName FROM #tTables WHERE numID = @lngLoopCount)
EXEC sp_spaceused @strTabName
SET @lngLoopCount = @lngLoopCount - 1
END

DROP TABLE #tTables
GO

Cursor-like actions without cursor overhead and performance related problems

How can you work around using a temporary table?
Now that I shown you several situations when you consider using a temporary table, lets talk about what you can do to avoid using a temporary table if all possible.

There is a nice thing in the SQL world called a derived table that can be used to replace temporary tables in most cases. Once again I'll get on my performance soapbox and say that sometimes with very large data sets, derived tables performance is considerably less than using a temporary table with an index. But for most cases simply using a derived table on a join will cut the need for your temporary table. You can find several articles on the use of derived table at WWW.SQLServerCentral.Com so I will not go into detail on their use in this article. If you are using a temporary table to stage data from several different sources either replace the temporary table with a UNION or create a permanent table to mimic the temporary one, both will usually satisfy your needs with reduced overhead. If you are operating on SQL Server 2000 and are using small data sets, try using the new table data type. This will create a temporary table like object in memory rather than on the tempdb and improve the performance of your query. Explore the use of a correlated sub-query and see if it can replace your temporary table. Sometimes just restating where your data is coming from will replace the need for temporary tables.

Any one of these ways has been discussed as possible alternative solutions to the use of a temporary table. The main key is for you to test alternative ways to determine if you can replace the use of a temporary table before you settle in a create one out of habit. As you create your bag or tricks you will find yourself using temporary tables less and less and even find yourself disgusted at your coding abilities when you actually have to use a temporary table when you truly believe there is another way out there.

If you use temporary tables optimize their use.

If the situation mandates a temporary table then there are several things you can do to maximize their performance. First, just because it is a temporary table do not be tempted to put all the columns and all the rows from your permanent table into the temporary table if you do not need them. Filter the data going into your temporary table to include the minimum number of columns and rows actually needed. Second, do not use the SELECT INTO statement to create your temp table. The SELECT INTO should be avoided at all costs in your coding due to the locking it places on system objects while it determines how to build the table. Take the time to script the temporary table out and use a separate INSERT INTO to populate the table. I will qualify this with that you can use a SELECT INTO if it includes WHERE 1=0 to create a table in the quickest way possible, but don't do this just to save a few keystrokes. Third, watch how you use temporary tables to avoid recompiles on the stored procedure. I explain this in getter detail in my article Optimizing Stored Procedure Recompiles available on my website. Fourth, test the need for a clustered-index on your temporary table. If the data set is large a cluster-index will speed the operations against the temporary table, but you have to weigh in the performance needs of creating that index and inserting into the table with a clustered-index. This is one of those methods that needs to be tested both ways with the largest data set you think will be placed into the temporary table before deciding on the index. And last, I know that when the stored procedure completes and the connection ends the temporary table will be dropped but why keep it around if you are done with it. If you code creates and uses a temporary table and then goes on to do other things that do not involve that table - drop the table when you are done. This frees up tempdb resources for other objects. I will even drop the table at the end of a stored procedure even though the connection is about to finish just to avoid any issues that may arise with unknown bugs.

Summary

While temporary tables (in my opinion) are far better than cursors, they do have a performance hit when being used. This article has briefly discussed several reasons to use a temporary table and several methods to use as alternatives to temporary tables. The key remains you and your situation. Test your query with alternatives before you create a temporary table and test your performance hogs created with temporary tables before you decide on what you can and can't do. I strongly believe that even though I am writing this article, it is my opinion based on my history and before I even jump in with something I read in a book or web site I will test it several different ways. Do this and your Transact-SQL skills will continue to grow to the point that you always have several different paths to take to create a query.

No comments: