(somewhat) Social
« Loading MS Connect To A DB–Part 4: Loading Old Data | Main | Loading MS Connect To A DB–Part 2: Parsing & Loading RSS Data »
Tuesday
Jan112011

Loading MS Connect To A DB–Part 3: Keeping The Data Current

In Part 2 of this series we finally got some data in the database. Now that there’s data we need to be able to keep it up to date. In particular we want to look for status and vote count changes.

We’re going to work with the same Connect items as was shown in the previous post (634019).

The question is, given that there is no RSS feed how are we going to keep this data current? This is where our friend the Net.WebClient comes to the rescue.

First things first, let’s grab a record to work with.

Pulling Data From the Database

We’re going to be using the Net.WebClient to open the URL associated with the Connect item, then parsing that information with Powershell and updating the database with the relevant details.

To pull in a record we use invoke-sqlcmd:

$workingURLs = invoke-sqlcmd -ServerInstance "."  -Database "DenaliConnect" -Query "select ID, URL from dbo.ConnectItems where ID = 634019"

Now we take this recordset and dump the results to variables

$URL = $workingURLs.URL
$ID = $workingURLs.ID

The result is two variables:

$ID = 634019
$URL = http://connect.microsoft.com/SQLServer/feedback/details/634019/enable-controls-to-be-found-by-name-in-report-builder

 

Looking At The Web Page

With the URL captured it’s time to take a look at the data on the page to see what information is available and how we might get to it. The best way to do this is to open it up in the browser and then look at the source (click image to enbiggen)

Good news here, all of the information I want to grab to stored in meta tags. This makes like so much simpler when it comes to parsing the data.

 

Parsing The Data

First thing first we’ll pull the page

$PulledURL = (new-object Net.WebClient).DownloadString("$URL")

Now we have the page stored in a variable we can use our old friend REGEX to parse it and pull what we require. Let’s start with the Up-Votes.

UpVoteMeta

Using regex we can pull just this data:

$regexUp = "<meta name=`"Search.MSConnect.Feedback.UpVoteCount[^>]*?/>"
$up = [regex]::Matches($PulledURL, "$regexup") | % { $_.Value } | Select -First 1
$up

which gives us: <meta name="Search.MSConnect.Feedback.UpVoteCount" content="4" />

Hey wow, that’s just what we needed. Using regex again we can pull just the numerical value and get the actual count.

$UpCount = [regex]::Matches($up, "\d{1,5}") | Select Value
$UpCountVal = [int]$UpCount.Value
$UpCountVal

We run this and we get 4. The right number!

Repeating this for the down-votes, validations, workarounds, opened date and status loads up a bunch of variables ready for writing to an update statement.

 

Updating The Database

We just use the variables to build an update statement

$SQLUpdate = @"
UPDATE dbo.ConnectItems
SET UpVoteCount = $UpCountVal, 
DownVoteCount = $downCountVal,
ValidationCount = $validationCountVal,
WorkAroundCount = $WorkAroundCountVal,
OpenedDate = '$OpenedDate',
ItemStatus = '$ActualStatus'
WHERE ID = $ID
"@
$SQLUpdate

UpVoteUpdateStatement

A quick invoke-sqlcmd and the database is updated. The voting information is current as are the other counts and the status.

invoke-sqlcmd -ServerInstance "."  -Database "DenaliConnect" -Query "$SQLUpdate"

 

Final Steps

Now that this works fine for a single record it’s easy to turn the pull of the web page and the parsing into a function, then loop through all the currently active items and update their data accordingly.

You can download the full script here.

In the final step we’ll go over loading old data which wasn’t captured in the RSS feed.

 

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>