Loading MS Connect To A DB–Part 1: DB & Powershell Setup
Thursday, January 6, 2011 at 7:07PM Database Setup
Step one in loading the data into the database was to create the schema. Already knowing what was going to be stored it was quite simple to create a couple of tables. One to hold the permanent data and another staging table for working with the data.
For the purposes of this I created a new database “DenaliConnect” and used the following script to create the schema:
1/11/2011 - due to extremely long URLs causing data truncation errors I've increased the URL column from varchar(255) to varchar(1000)
1: USE [DenaliConnect]
2: GO
3: SET ANSI_NULLS ON
4: GO
5: SET QUOTED_IDENTIFIER ON
6: GO
7: SET ANSI_PADDING ON
8: GO
9: CREATE TABLE [dbo].[ConnectItems_Staging](
10: [ID] [int] NOT NULL,
11: [Title] [varchar](1000) NULL,
12: [URL] [varchar](1000) NOT NULL,
13: [ItemStatus] [varchar](1000) NOT NULL,
14: [UpVoteCount] [int] NOT NULL,
15: [DownVoteCount] [int] NOT NULL,
16: [ItemDescription] [nvarchar](max) NOT NULL,
17: [Author] [varchar](100) NULL,
18: [Modified] [datetime] NOT NULL,
19: PRIMARY KEY CLUSTERED
20: (
21: [ID] ASC
22: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
23: ) ON [PRIMARY]
24: GO
25: SET ANSI_PADDING OFF
26: GO
27:
28: SET ANSI_NULLS ON
29: GO
30: SET QUOTED_IDENTIFIER ON
31: GO
32: SET ANSI_PADDING ON
33: GO
34: CREATE TABLE [dbo].[ConnectItems](
35: [ID] [int] NOT NULL,
36: [Title] [varchar](1000) NULL,
37: [URL] [varchar](1000) NOT NULL,
38: [ItemStatus] [varchar](1000) NOT NULL,
39: [UpVoteCount] [int] NOT NULL,
40: [DownVoteCount] [int] NOT NULL,
41: [ItemDescription] [nvarchar](max) NOT NULL,
42: [Author] [varchar](100) NULL,
43: [Modified] [datetime] NOT NULL,
44: [ValidationCount] [int] NOT NULL,
45: [WorkAroundCount] [int] NOT NULL,
46: [OpenedDate] [datetime] NOT NULL,
47: PRIMARY KEY CLUSTERED
48: (
49: [ID] ASC
50: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
51: ) ON [PRIMARY]
52: GO
53: SET ANSI_PADDING OFF
54: GO
55:
56: ALTER TABLE [dbo].[ConnectItems_Staging] ADD DEFAULT ((0)) FOR [UpVoteCount]
57: GO
58: ALTER TABLE [dbo].[ConnectItems_Staging] ADD DEFAULT ((0)) FOR [DownVoteCount]
59: GO
60: ALTER TABLE [dbo].[ConnectItems_Staging] ADD DEFAULT (getdate()) FOR [Modified]
61: GO
62: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT ((0)) FOR [UpVoteCount]
63: GO
64: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT ((0)) FOR [DownVoteCount]
65: GO
66: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT (getdate()) FOR [Modified]
67: GO
68: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT ((0)) FOR [ValidationCount]
69: GO
70: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT ((0)) FOR [WorkAroundCount]
71: GO
72: ALTER TABLE [dbo].[ConnectItems] ADD DEFAULT ('01/01/1901') FOR [OpenedDate]
73: GO
This gives us a couple of tables to work with. To get the data from the ConnectItems_Staging to the ConnectItems table I took advantage of the MERGE statement in SQL 2008 and created a stored procedure to perform the load:
1: SET ANSI_NULLS ON
2: GO
3: SET QUOTED_IDENTIFIER ON
4: GO
5: CREATE procedure [dbo].[MergeItems]
6: AS
7: MERGE dbo.ConnectItems as Target
8: USING (SELECT ID, Title, URL, ItemStatus, UpVoteCount, DownVoteCount, ItemDescription, Author, Modified FROM dbo.ConnectItems_Staging) AS Source
9: ON (Target.ID = Source.ID)
10: WHEN MATCHED THEN
11: UPDATE SET Target.UpVoteCount = Source.UpVoteCount, Target.DownVoteCount = Source.DownVoteCount, Target.Modified = Source.Modified, Target.ItemStatus = Source.ItemStatus
12: WHEN NOT MATCHED BY TARGET THEN
13: INSERT (ID, Title, URL, ItemStatus, UpVoteCount, DownVoteCount, ItemDescription, Author, Modified)
14: VALUES (Source.ID, Source.Title, Source.URL, Source.ItemStatus, Source.UpVoteCount, Source.DownVoteCount, Source.ItemDescription, Source.Author, Source.Modified)
15: ;
16: GO
Unfortunately when it came to the online database which was going to be the backend for the published data we were stuck with running SQL 2005 in which MERGE is not supported. As such I ended up creating another proc to perform this task
1: SET ANSI_NULLS ON
2: GO
3: SET QUOTED_IDENTIFIER ON
4: GO
5: CREATE PROCEDURE [dbo].[InsertUpdateItems]
6: AS
7: --Update existing records
8: UPDATE C
9: SET UpVoteCount = S.UpVoteCount
10: , DownVoteCount = S.DownVoteCount
11: , Modified = S.Modified
12: , ItemStatus = S.ItemStatus
13: FROM dbo.ConnectItems C
14: INNER JOIN dbo.ConnectItems_Staging S
15: ON C.ID = S.ID
16: --Insert new ones
17: INSERT INTO dbo.ConnectItems (ID, Title, URL, ItemStatus, UpVoteCount, DownVoteCount, ItemDescription, Author, Modified)
18: SELECT S.ID, S.Title, S.URL, S.ItemStatus, S.UpVoteCount, S.DownVoteCount, S.ItemDescription, S.Author, S.Modified
19: FROM dbo.ConnectItems_Staging S
20: LEFT OUTER JOIN dbo.ConnectItems C
21: ON S.ID = C.ID
22: WHERE C.ID IS NULL
23: GO
Powershell Setup
The data pull is performed using Powershell V2 and the PowershellPack (which contains 10 modules to help you get the most out of Powershell). Included in the PowershellPack is the PSRSS module to manage RSS feeds.
I originally tried using the WebClient functionality of Powershell (New-Object System.Net.WebClient) which I use in my MCM Video Download script however I ran into encoding errors on the page. PSRSS eliminated this as a problem and allowed me to move forward.
Next Up
In Part 2 I’ll cover pulling the RSS feed from MS Connect and parsing the data for loading into the database.
- Intro
- Part 1 – Database & Powershell setup
- Part 2 – Parsing the RSS & loading the data
- Part 3 – Keeping the data current
- Part 4 – Loading old data


Reader Comments