(somewhat) Social

Entries in Indexes (2)

Monday
Sep132010

T-SQL Tuesday #10 - Applying Indexes To Replicated Tables

It's T-SQL Tuesday time again! This month Michael J. Swart (blog|twitter) is hosting and has given us the topic of indexes.

I've previously written about how you can find indexes that are not being used and get their sizes. Not being allowed to reuse that meant that I had to come up with something else.

In my previous post on calculating replication schema options I documented a way of being able to change article options in transactional replication so that you could apply all the non-clustered indexes on your subscribing table that exist on the publishing one. But what if you are using replication to offload reporting and you need to create indexes that don't exist on the publishing table?

You could always manually apply these indexes any time that you perform a snapshot but this requires manual work and can easily be forgotten.

Fortunately there is another option, you can have replication automatically apply a post-snapshot SQL script which will create those indexes for you. Here's how...

 

Create a basic publication (snapshot or transactional) and add a table article. Perform the snapshot and compare indexes between the tables (in this example I am using the person.address table from AdventureWorks2008).

After the snapshot here's how the indexes stand

 

For this example we're going to create a new index on Admindb.person.address. To do this first we need to generate the index creation script

USE AdminDB
GO
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[person].[address]') AND name = N'NCIX_Zip')
CREATE NONCLUSTERED INDEX [NCIX_Zip] ON [person].[address]
(
    [PostalCode] ASC
) WITH (ONLINE = ON, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

 

Save this script to C:\ReplicationScripts\CreateReportingIndex.sql

 

Now we change the existing publication so that whenever a snapshot is taken it will apply this script when applied to the subscriber. This change can be done in a couple of ways.

The quick and easy way is to use sp_changepublication and just pass in the publication name, property change and location of the script.

USE AdventureWorks2008
GO
EXEC sp_changepublication
    @publication = 'Post Snapshot Test'
  , @property = N'post_snapshot_script'
  , @value = 'C:\ReplicationScripts\CreateReportingIndex.sql'
  , @force_invalidate_snapshot = 1 --must invalidate existing snapshots for this to take effect
GO

 

This can also be done using the GUI (right click on the publication and select Properties)

 

 

When you next perform a snapshot then the script will be applied. We can check that this actually worked first by looking at replication monitor where it will tell you that the script was applied.

 

And then you can check the table itself and confirm that the index exists

 

 

There are a couple of final things to note.

Usually you will specify a unc path for the SQL script so that you just have to maintain a single copy. You just need to ensure that the subscriber is able to access the post-snapshot script location.

Post-snapshots are not limited to creating indexes, in fact you can do pretty much anything just so long as the Distribution Agent has the access;

  • Send yourself an email to let you know that the snapshot has been applied on the subscriber
  • Apply table permissions
  • Create views

 

If you use replication and find yourself manually reapplying indexes it's worth evaluating whether using post-snapshot scripts might save you some work.

Thursday
Jul082010

What Indexes Are Not Being Used And How Big Are They?

Leafs Stanley Cup Cabinet (or Cubs World Series Trophy Cabinet)Indexes...they are your friends. They order your data nice and neatly, they can dramatically improve your query performance and can have cool names like IX_UseMeToRunFasterLikeATurboButton.

Indexes...they are your enemy. They slow down your inserts, updates and deletes, and they take up valuable disk space.

So having indexes that are used is a great thing.  You'll see query performance improve by an order of magnitude and some freeing up of your server resources. However, if not used, an index just sits there an takes up space kind of like the Toronto Maple Leafs Stanely Cup cabinet. Let's not forget those unused indexes are still being kept up to date and slowing down your DML statements.

 

A couple of months ago I was invited to add a large number of indexes to a database in production that is used for reporting. Being a good DBA I asked, quite politely, why they were needed. "They are so the queries run faster" was the response. Much banter flowed back and forth between teams and in the end all of the indexes were applied (this included 20 indexes on a 534 column table, yes, I know).

I felt these indexes were not going to be used, but as in a lot of cases it comes down to a please just do it situation. So I did.

Three months on and I had a couple of hours to take a look and see how these indexes were doing. A great post by Robert Davis (blog|twitter) entitled "How Do I ... Get a List of Tables With Data Modifications in a Specific Timeframe?" got me started with the idea of using sys.dm_db_index_usage_stats to check for this.

When a DML statement gets executed the user_updates column in this DMV gets incremented by one. Likewise, when a query is executed and an index used, be it for a lookup, scan or seek then then user_lookup, user_scan or user_seek value gets incremented.

Armed with this information I was quickly and easily able to identify indexes in the database that had not been used since the last Instance restart (joining against sysindexes provides the index name)

SELECT
   object_name(us.object_id) as objname
  , si.[name] as IndexName
FROM
  sys.dm_db_index_usage_stats us
    INNER JOIN sysindexes si
      ON us.[object_id] = si.id
      AND us.index_id = si.indid
WHERE
  us.database_id = db_id()
  AND us.user_lookups = 0
  AND us.user_scans = 0
  AND us.user_seeks = 0

 

This gave me a nice list of unused indexes that I could contemplate removing. However I really felt that I needed to put some numbers behind those indexes. Managers love numbers. By pulling the list of indexes into a table and then using sys.dm_db_index_physical_stats I was able to get the size of each index. Adding the user_updates information from sys.dm_db_index_usage_stats provided me with a long list of indexes along with how many times they have been updated since SQL was last started and the size of each index.

 

--CHECKS TO SEE IF INDEXES ARE BEING USED IN A DATABASE
--IF UNUSED THEN GETS THE SIZE OF THE INDEX ALONG WITH UPDATES SINCE LAST SQL START


if object_id('tempdb..#indexinfo') is not null
drop table #indexinfo

--Load up a list of unused indexes for the current database
SELECT
  ROW_NUMBER() OVER (ORDER BY object_name(us.object_id), us.index_id) as ROWNUM
  , object_name(us.object_id) as objname
  , us.[object_id]
  , us.index_id
  , si.[name] as IndexName
  , us.user_updates
  , 0 as IndexSizeKB
INTO
  #indexinfo
FROM
  sys.dm_db_index_usage_stats us
    INNER JOIN sysindexes si
      ON us.[object_id] = si.id
      AND us.index_id = si.indid
WHERE
  us.database_id = db_id()
  AND us.user_lookups = 0
  AND us.user_scans = 0
  AND us.user_seeks = 0


--Going to use a while loop and get the physical info for each index for sizing purposes
DECLARE @object_id bigint
      , @index_id int
      , @rownum int
      , @index_size int
      
SELECT @rownum = min(ROWNUM) from #indexinfo where ROWNUM IS NOT NULL
WHILE @rownum IS NOT NULL
BEGIN

    SELECT @object_id = object_id from #indexinfo WHERE ROWNUM = @rownum
    SELECT @index_id = index_id from #indexinfo WHERE ROWNUM = @rownum

    SELECT @index_size = convert(int, (avg_record_size_in_bytes * record_count) / 1024)
    FROM sys.dm_db_index_physical_stats(db_id(), @object_id, @index_id,  null, 'detailed')
    WHERE index_level = 0

    UPDATE #indexinfo
    SET IndexSizeKB = @index_size
    WHERE ROWNUM = @rownum

    SELECT @rownum = min(ROWNUM) from #indexinfo where ROWNUM > @rownum and ROWNUM IS NOT NULL
END

--Give us the results
SELECTobjname as TableName, IndexName, user_updates, IndexSizeKB FROM #indexinfo
SELECTsum(user_updates) as Updates, sum(indexSizeKB) / 1024 as TotalUnusedIndexesMB
FROM #indexinfo


I ended up finding 245 indexes that were not being used. Those indexes had been updated over 15 million times in the last two weeks and they totalled up to over 12GB of utilized disk.

Some nice numbers to add to my request to drop those indexes.