(somewhat) Social

Entries in #TSQL2sDay (4)

Tuesday
May102011

T-SQL Tuesday #18–CTE A Simpler Form Of Recursion

It’s T-SQL Tuesday time folks. Bob Pusateri (blog|twitter) is the host and has chosen Common Table Expressions (CTEs) as the topic of the month.

Several ideas came to mind for writing about CTEs, one of the best uses I’ve seen for one recently was to grab the name of the most recent backup file for a database. You’ll have to ask Aaron Nelson (blog|twitter) to hook you up with that one though.

I thought I’d write about an interesting problem that was posed to me in an interview a couple of months ago.

 

Here’s a table

I was given a table with two columns; ManagerID, EmployeeID.

This table was populated with a few values thusly:

USE TempDB
GO
create table #ManagersEmployees (ManagerID int, EmployeeID int)
insert into #ManagersEmployees 
values(1,2), (2,3), (2,4), (2,5), (3,6), (3,7), (3,8)
    , (4,10),(5,11),(5,12), (12,13), (12,14)
GO

I was asked to write a recursive procedure to pull out the manager, employee tree for a given ManagerID.

 

CTEs to the rescue

Having done a little work with CTEs and understanding that I could easily write a recursive query using them I was able to quite quickly put together a script to pull the information needed. By throwing it into a procedure it could quickly and easily be executed.

CREATE PROCEDURE ManagerRecursion_CTE @ManagerID INT
AS
SET NOCOUNT ON
;WITH Managers_CTE (ManagerID, EmployeeID )
AS ( SELECT ManagerID, EmployeeID FROM #ManagersEmployees  
        WHERE ManagerID = @ManagerID
UNION ALL
    SELECT e.ManagerID, e.EmployeeID 
        FROM #ManagersEmployees e 
            INNER JOIN Managers_CTE c on e.ManagerID = c.EmployeeID)
SELECT * FROM Managers_CTE ORDER BY ManagerID, EmployeeID
GO

 

I tested and this worked nicely, it was a simple solution and provided the requested results.

 

That’s not recursion

The trouble is that while the results were not correct I was advised that this was not recursive and did not meet the criteria. Back to the drawing board then.

After a lot more work I came up with the following:

CREATE PROCEDURE ManagerRecursion_NonCTE @ManagerID INT
AS
SET NOCOUNT ON
DECLARE @rowcnt INT, @lastrow INT
DECLARE @Tbl_Results TABLE (rowid INT IDENTITY(1,1), ManagerID INT, EmployeeID INT)
 
INSERT INTO @Tbl_Results (ManagerID, EmployeeID)
SELECT ManagerID, EmployeeID
FROM #ManagersEmployees
WHERE ManagerID = @ManagerID
 
SET @rowcnt = @@ROWCOUNT
SET @lastrow = 0
WHILE @rowcnt > 0
BEGIN
INSERT INTO @Tbl_Results (ManagerID, EmployeeID)
SELECT m.ManagerID, m.EmployeeID
FROM #ManagersEmployees m
    INNER JOIN @Tbl_Results t
        ON m.ManagerID = t.EmployeeID
WHERE rowid > @lastrow
 
SELECT @rowcnt = @@ROWCOUNT
 
SELECT @lastrow = @@IDENTITY - @rowcnt
END
SELECT ManagerID, EmployeeID FROM @Tbl_Results ORDER BY ManagerID, EmployeeID
GO

 

I tested this and got back the same results as with the first procedure with all the values I passed in. Deep breath on this one as it was pushing the limits of what I could produce on the spot in an interview.

 

That’s still not recursion

Again, while the results we correct this was not recursive. It was back to the drawing board once more. This time I had to admit defeat, however did tell the interviewer that I would work on a solution at home and email it in. He gave me his contact information, we completed the rest of the interview and I went home determined to get the right data in the manner that the interviewer wanted.

After a whole bunch of reading and a lot of work I finally came up with correct results in a recursive procedure which I emailed in to get feedback.

CREATE PROC ManagerRecursion @EmpID INT, @InnerLoop INT = 0
AS
BEGIN
    IF @InnerLoop = 0
        BEGIN
        CREATE TABLE #Tbl_Results (ManagerID INT, EmployeeID INT)
        END
            INSERT INTO #Tbl_Results (ManagerID, EmployeeID)
            SELECT ManagerID, EmployeeID FROM #ManagersEmployees WHERE ManagerID = @EmpID
 
            SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE ManagerID = @EmpID)
 
            WHILE @EmpID IS NOT NULL
            BEGIN
                EXEC ManagerRecursion @EmpID, 1
                SET @EmpID = (SELECT MIN(EmployeeID) FROM #ManagersEmployees WHERE EmployeeID > @EmpID 
                    AND EmployeeID NOT IN (SELECT ManagerID FROM #Tbl_Results)
                    AND EmployeeID IN (SELECT EmployeeID FROM #Tbl_Results))
            END
    IF @InnerLoop = 0   
        BEGIN
        SELECT * FROM #Tbl_Results order by ManagerID, EmployeeID
        DROP TABLE #Tbl_Results
        END      
END
 
GO

 

Unfortunately no feedback was forthcoming. I felt good about providing this solution despite that. I enjoy a challenge and this was certainly one of those.

 

So what was the right way?

That depends on who you ask. For me the first way was the right was. It performed well, the code was clean and easy and required a minimum amount of development. I feel that the solution here was exactly the reason that CTEs were created in the first place.

The second solution was a lot more work, the query got more complex and it does not perform as well as the first.

The final procedure was true recursion in that the procedure calls itself over and over again until all of the results are returned. It’s a loop that makes cursors look like they perform well. It was easily the worst performing of the three.

 

It all goes to show there’s more than one way to get the results you need. It’s also interesting how an example like this shows just how much work the SQL development team have done to help reduce complexity and improve performance.

Monday
Sep132010

T-SQL Tuesday #10 - Applying Indexes To Replicated Tables

It's T-SQL Tuesday time again! This month Michael J. Swart (blog|twitter) is hosting and has given us the topic of indexes.

I've previously written about how you can find indexes that are not being used and get their sizes. Not being allowed to reuse that meant that I had to come up with something else.

In my previous post on calculating replication schema options I documented a way of being able to change article options in transactional replication so that you could apply all the non-clustered indexes on your subscribing table that exist on the publishing one. But what if you are using replication to offload reporting and you need to create indexes that don't exist on the publishing table?

You could always manually apply these indexes any time that you perform a snapshot but this requires manual work and can easily be forgotten.

Fortunately there is another option, you can have replication automatically apply a post-snapshot SQL script which will create those indexes for you. Here's how...

 

Create a basic publication (snapshot or transactional) and add a table article. Perform the snapshot and compare indexes between the tables (in this example I am using the person.address table from AdventureWorks2008).

After the snapshot here's how the indexes stand

 

For this example we're going to create a new index on Admindb.person.address. To do this first we need to generate the index creation script

USE AdminDB
GO
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[person].[address]') AND name = N'NCIX_Zip')
CREATE NONCLUSTERED INDEX [NCIX_Zip] ON [person].[address]
(
    [PostalCode] ASC
) WITH (ONLINE = ON, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

 

Save this script to C:\ReplicationScripts\CreateReportingIndex.sql

 

Now we change the existing publication so that whenever a snapshot is taken it will apply this script when applied to the subscriber. This change can be done in a couple of ways.

The quick and easy way is to use sp_changepublication and just pass in the publication name, property change and location of the script.

USE AdventureWorks2008
GO
EXEC sp_changepublication
    @publication = 'Post Snapshot Test'
  , @property = N'post_snapshot_script'
  , @value = 'C:\ReplicationScripts\CreateReportingIndex.sql'
  , @force_invalidate_snapshot = 1 --must invalidate existing snapshots for this to take effect
GO

 

This can also be done using the GUI (right click on the publication and select Properties)

 

 

When you next perform a snapshot then the script will be applied. We can check that this actually worked first by looking at replication monitor where it will tell you that the script was applied.

 

And then you can check the table itself and confirm that the index exists

 

 

There are a couple of final things to note.

Usually you will specify a unc path for the SQL script so that you just have to maintain a single copy. You just need to ensure that the subscriber is able to access the post-snapshot script location.

Post-snapshots are not limited to creating indexes, in fact you can do pretty much anything just so long as the Distribution Agent has the access;

  • Send yourself an email to let you know that the snapshot has been applied on the subscriber
  • Apply table permissions
  • Create views

 

If you use replication and find yourself manually reapplying indexes it's worth evaluating whether using post-snapshot scripts might save you some work.

Monday
Jun072010

Estimating Data Compression

 It's T-SQL Tuesday #007 time and we have Jorge Segarra (blog | twitter) the one and only SQLChicken running things.

I'm hoping he's going to have so many posts to go through that he won't notice that I am going to cheat a little.

When I first got the chance to play with SQL 2008 my attention was instantly captured by the opportunity to perform data compression. With an environment growing out of control and a SAN having reached it's physical limits I really needed something to give me a little breathing room. Data compression provided me that option and I worked towards seeing exactly how much space it could save me. A consequence of this work was that last year I wrote an article for SQLServerCentral.com about Data Compression in SQL Server 2008 in which I gave an overview of compression and how it works. As a part of the article I also provided a script which could be used to estimate space savings on every table in a database (extending sp_estimate_data_compression_savings which works on an individual table only).

One thing missing from that article and the attached proc was a final rollup script that would take the data you'd collected and provide it to you in a somewhat more useful format. So in the interests of being helpful here one is...

SELECT
      Database_Name
    , OBJECT_NAME
    , sum(current_size_kb)/1024 as existing_server_size_MB
    , sum(estimated_size_page_kb)/1024 as estimated_server_size_page_compressed_MB
    , sum(estimated_page_savings_kb)/1024 as estimated_server_size_page_savings_MB
FROM
    estimated_data_compression_savings
WHERE
    database_name in ( 'AdventureWorks2008', 'AdventureWorksDW2008', 'teststatsdb')
GROUP BY
      Database_Name
    , object_name
WITH ROLLUP
HAVING sum(current_size_kb)/1024 > 0
ORDER BY
      Database_Name
    , estimated_server_size_page_savings_MB desc

Note: the WITH ROLLUP allows us to quickly get a summary of the total space savings we could get at the server and database level without having to perform other subqueries

So there you go, a quick and easy view of where you might be able to save yourself some disk space on your SQL Servers. Remember though, it's only a guide and you need to thoroughly test performance before implementing any kind of compression.

 

 

Monday
Mar082010

TSQL Tuesday #004 - Who's Using My IO?

This blog post is in the response to the T-SQL Tuesday #004: IO post by Mike Walsh.

 

I ran into a problem recently whereby I was seeing some disk queuing and it was impacting performance. Nothing much new there, these things happen. However I was fortunate in that some new disk had just been added to the server and there was an opportunity to offload some of the i/o on to some new spindles.

Rather than just throw a couple of databases or logs onto the new disk I decided to take a look and see where the majority of my i/o was coming from. I thought that I knew where it was and knew what I was going to move. I was surprised by what I actually found out.

There's a new function in SQL 2005/2008 called sys.dm_io_virtual_file_stats. You pass in the database id and file id to get a list of statistics including i/o bytes read and written, number of i/o reads and writes and any i/o stalls or waits. What's great is that if you pass in null for the two function paramters you get back information for every file on the SQL Instance.

SELECT * FROM sys.dm_io_virtual_file_stats(null, null)

 




This information would be very useful if you could make head or tail of half the information. This is where the system view sys.master_files comes in very useful. This view contains information about every file on the SQL Instance; what it is, where it is, it's state, size, growth, max size and more.

SELECT * FROM sys.master_files

 

 



If we join the two together we can easily see what files are showing the heaviest i/o on reads, writes or combined.

SELECT DB_NAME(svs.database_id) AS DatabaseName
, smf.name AS InternalName
, smf.physical_name AS PhysicalFile
, smf.type_desc AS FileType
, CONVERT(NUMERIC(15,2), CONVERT(NUMERIC(15,2),io_stall_read_ms) / num_of_reads) AS AverageReadWait_ms
, CONVERT(NUMERIC(15,2), CONVERT(NUMERIC(15,2),io_stall_write_ms) / num_of_writes) AS AverageWriteWait_ms
, CONVERT(NUMERIC(15,2), ((CONVERT(NUMERIC(15,2),io_stall_read_ms) / num_of_reads)
       +  (CONVERT(NUMERIC(15,2),io_stall_write_ms) / num_of_writes)) / 2) AS AverageWait_ms
  FROM sys.dm_io_virtual_file_stats(NULL, NULL) svs
INNER JOIN sys.master_files smf
ON svs.database_id = smf.database_id
AND svs.file_id = smf.file_id
ORDER BY AverageWait_ms DESC

 

 


Using this I was able to quickly identify the worst performing database, what file within that database was being hardest hit and move it out to it's own set of spindles and greatly improve read/write times across the board.


One thing to note, this is based upon the understanding that the indexes are correctly set and utilized on the database in question and that files and filegroups allocations are adjusted already for the best performance. This is far from a catchall and simply adding a covering index to an existing table could provide significant improvements. This is just something to help along the road.