July 14, 2008 at 9:13 am
Hi,
I'm building a SharePoint site where the customer list is obtained from the BDC, which needs a data source that collects data from six different databases across several sites. Some of these are connected by very slow WAN links, or run on very old servers. Politics are involved, so all I can do is query the databases using a read-only account, from my own server.
The fastest way I've found is to use an SP to grab each customer's ID, name and postcode from a remote table using OPENQUERY and store the results in a local table; repeat for each database, and then finally do some clever reformatting to generate a table which looks like this:
[font="Courier New"]
CUSTNAME POSTCODE TABLE1ID TABLE2ID TABLE3ID TABLE4ID TABLE5ID TABLE6ID
CUSTOMER Limited SO123ME Customer1 CUST1 C100001
CUSTOMER LTD SO123ME Customer1 CUST1 C100001 [/font]
It isn't very fast, and as you can guess it's not perfect, because the different databases may have the same customer but store the name differently. (Yes, I know this is a bad example, because I could use a replace function, and in fact I do where possible, but it can't catch everything.)
At least I can use the CUSTNAME column as a key column in this setup, and users can do a partial name search to obtain a list of records that match.
If you guys were doing this, how would you go about it?
July 14, 2008 at 9:35 am
If the names "look" different on different servers, then CustName cannot be your key, since it won't help you identify cust1 on server1 and cust1a on server2 as being the same customer. That's your first challenge: making sure you can cross-reference customers among your data sources. If there isn't one readily available, then I'd build one (mapping the customers to their matches on the different servers).
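For what it's worth - a minimal sketch of the kind of cross-reference table I mean (all the names here are made up, not anything from your setup):

-- Hypothetical mapping of each source system's customer ID to one master customer
CREATE TABLE dbo.CustomerXRef
(
    MasterCustomerID int NOT NULL,
    SourceSystem     varchar(16) NOT NULL,  -- e.g. 'Server1', 'Server2'
    SourceCustomerID varchar(16) NOT NULL,
    PRIMARY KEY (SourceSystem, SourceCustomerID)
)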
Once you have that - then I might roughly use your approach: use OPENQUERY to pull back FILTERED rows from each data set, pulling only the key info I need and the info required to aggregate, and then perform the final aggregation locally. If possible - I'd aggregate on the remote server FIRST, and bring back pre-aggregated data (if you can come up with a way to make the calculation work with aggregating semi-aggregated data).
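Just to illustrate the filtering point - a rough sketch with made-up server, database and column names. The WHERE clause lives inside the pass-through query, so the remote server only sends back the rows you actually need over the slow link:

-- The filter runs remotely, so only matching rows cross the WAN
SELECT *
FROM OPENQUERY([RemoteServer],
    'SELECT REFERENCE, NAME, POSTCODE
     FROM CustomerDB.dbo.Customers
     WHERE POSTCODE LIKE ''SO%''')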
If that's still slow, then perhaps look at your slowest links and figure out how to get the data from them asynchronously (something like replication or mirroring might work for that), so that the data is retrieved quickly even if it's not the freshest (your replication could fall behind, but you would still get a report out fairly quickly). Depends of course on what the report requirements are, etc...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 14, 2008 at 9:36 am
I'm assuming that Customer IDs aren't shared between databases. Is that correct? If so, then best-match on name is possibly all you can do. You might match against phone, e-mail, or some such, as well, if that's an option.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 14, 2008 at 10:12 am
Hi,
What would be the advantage of pulling back filtered rowsets?
I need to build ID Enumerator, Finder and SpecificFinder methods; surely if I'm going to aggregate the customer tables from all six systems properly then I can't do that without grabbing all records from all tables.
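For reference, the three methods boil down to SPs over the aggregated local table, roughly like this (object names simplified for illustration, not my real ones):

-- Illustrative only: simplified SPs behind the three BDC method types
CREATE PROCEDURE dbo.Customer_IDEnumerator AS
    SELECT CUSTNAME FROM dbo.Customers
GO
CREATE PROCEDURE dbo.Customer_Finder @Search varchar(120) AS
    SELECT * FROM dbo.Customers WHERE CUSTNAME LIKE @Search + '%'
GO
CREATE PROCEDURE dbo.Customer_SpecificFinder @CustName varchar(120) AS
    SELECT * FROM dbo.Customers WHERE CUSTNAME = @CustName
GO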
Quite right about being unable to correlate the data properly; however, the idea of doing it this way is that we can see how many systems have different names for the same customer, and then we can just do a clean-up.
The long term plan is for it to show only one row for each customer, but we're happy for it to show
Customer 1 Ltd
Customer1 ltd
CUSTOMER ONE LIMITED
etc. while that sanity check takes place. Either way, there will still only be one unique instance of each customer name, which is why I think we'll be able to use it as the primary key.
I have a function called Neaten which strips out white space, capitalises the name, changes "Ltd." to "LIMITED", replaces ampersands with "AND", etc. Getting a list of distinct customer names and matching codes was dead easy (SELECT DISTINCT...).
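In outline, Neaten does something like this (the real replacement list is longer - this is just the idea):

CREATE FUNCTION dbo.Neaten (@Name varchar(120))
RETURNS varchar(120)
AS
BEGIN
    DECLARE @Clean varchar(120)
    -- Trim and capitalise
    SET @Clean = UPPER(LTRIM(RTRIM(@Name)))
    -- Standardise common variations
    SET @Clean = REPLACE(@Clean, '&', 'AND')
    SET @Clean = REPLACE(@Clean, 'LTD.', 'LIMITED')
    SET @Clean = REPLACE(@Clean, 'LTD', 'LIMITED')
    -- Collapse any double spaces left behind by the replacements
    WHILE CHARINDEX('  ', @Clean) > 0
        SET @Clean = REPLACE(@Clean, '  ', ' ')
    RETURN @Clean
END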
Writing the pivot table was extremely tedious but not especially difficult.
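The reformatting itself is just a manual cross-tab over the staging rows - something along these lines (the staging table and column names are simplified here):

SELECT CUSTNAME,
    MAX(POSTCODE) AS POSTCODE,
    -- CASE without ELSE yields NULL, which MAX ignores, so each
    -- customer's row picks up the ID from whichever systems have one
    MAX(CASE WHEN SourceDB = 'DB1' THEN CustID END) AS TABLE1ID,
    MAX(CASE WHEN SourceDB = 'DB2' THEN CustID END) AS TABLE2ID,
    MAX(CASE WHEN SourceDB = 'DB3' THEN CustID END) AS TABLE3ID,
    MAX(CASE WHEN SourceDB = 'DB4' THEN CustID END) AS TABLE4ID,
    MAX(CASE WHEN SourceDB = 'DB5' THEN CustID END) AS TABLE5ID,
    MAX(CASE WHEN SourceDB = 'DB6' THEN CustID END) AS TABLE6ID
FROM dbo.CustomerStaging
GROUP BY CUSTNAME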
July 15, 2008 at 10:28 am
Hi,
I've got it working rather nicely with SPs that query the remote tables, then update local tables with the query results.
This is blisteringly fast from a client PC, but I do need to drop the table every night; the next time someone or something executes the SP, the table is recreated and then populated from the dynamic SQL outside of core hours, so SharePoint never realises it's going on.
Is there an efficient way to do this without dropping the table though?
I looked at SQL 2005's crippled alternative to MERGE, and I have my reservations about using it in the SPs called by MOSS 2007.
Not that I'm a cynic, but SharePoint has time and again demonstrated a habit of throwing its teddy out of the pram as soon as I try to do anything vaguely complex in an SP.
July 16, 2008 at 8:31 am
Instead of referencing the tables directly - perhaps set up a view to act as your data source in this case. This way you can build the new table while the existing one is in place, then switch the view's definition (sub-millisecond type of speed) and drop the old table.
That way - the SP is always referencing something valid.
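In sketch form, with made-up object names (this assumes the view dbo.vwCustomers already exists and currently points at Customers_A):

-- 1. Build the replacement table while the current one stays in service
SELECT CUSTNAME, POSTCODE, TABLE1ID, TABLE2ID, TABLE3ID, TABLE4ID, TABLE5ID, TABLE6ID
INTO dbo.Customers_B
FROM dbo.CustomerStaging
GO
-- 2. Repoint the view; near-instant, so callers never see a gap
ALTER VIEW dbo.vwCustomers
AS
SELECT CUSTNAME, POSTCODE, TABLE1ID, TABLE2ID, TABLE3ID, TABLE4ID, TABLE5ID, TABLE6ID
FROM dbo.Customers_B
GO
-- 3. Drop the table the view used to point at
DROP TABLE dbo.Customers_A
GO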
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 16, 2008 at 9:17 am
Here's what I need to run just to get a list of support calls from our support system:
create table #SupportSystem_Calls
(
[CallRef] varchar(8),
[CustomerRef] varchar(16),
[NAME] varchar(120),
[Opened] datetime,
[Closed] datetime,
[Actioned] datetime,
[PRODUCTREF] varchar(16),
[ProductName] varchar(40),
[VERSIONREF] varchar(16),
[Type] varchar(32),
[Status] varchar(8000),
[Priority] varchar(32),
[TITLE] varchar(250),
[CALLER] varchar(40),
[HANDLERTAG] varchar(8),
[CUSTSOWNREF] varchar(16),
[SERIALNO] varchar(40),
[SLA] varchar(32),
[ActionCategory] varchar(32),
[szemail] varchar(120),
[szphone] varchar(20),
[szline] varchar(16),
[szresolvedby] varchar(8),
[SourceTable] varchar(8),
[Customer Name] varchar(120)
)
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,
HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)
AS [Customer Name]
FROM OPENQUERY([SupportServer],
'SELECT SupportDB.Archived_Calls.REFERENCE AS CallRef, SupportDB.Archived_Calls.CUSTOMERREF AS CustomerRef, CUSTOMERTABLE_1.NAME, DATEADD(hh,
SupportDB.Archived_Calls.OPENHOURS, DATEADD(n, SupportDB.Archived_Calls.OPENMINUTES, SupportDB.Archived_Calls.OPENDATE))
AS Opened, DATEADD(hh, SupportDB.Archived_Calls.CLOSEHOURS, DATEADD(n, SupportDB.Archived_Calls.CLOSEMINUTES,
SupportDB.Archived_Calls.CLOSEDATE)) AS Closed, DATEADD(hh, SupportDB.Archived_Calls.ACTIONHOURS, DATEADD(n,
SupportDB.Archived_Calls.ACTIONMINUTES, SupportDB.Archived_Calls.ACTIONDATE)) AS Actioned, SupportDB.Archived_Calls.PRODUCTREF, SupportDB.SUPPROD.NAME AS ProductName,
SupportDB.Archived_Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '', '''') AS Status,
SUPCATEG_4.CATEGORY AS Priority, SupportDB.Archived_Calls.TITLE, SupportDB.Archived_Calls.CALLER,
SupportDB.Archived_Calls.HANDLERTAG, SupportDB.Archived_Calls.CUSTSOWNREF, SupportDB.Archived_Calls.SERIALNO,
SUPCATEG_1.CATEGORY AS SLA, SUPCATEG_5.CATEGORY AS ActionCategory, SupportDB.Archived_Calls.szemail,
SupportDB.Archived_Calls.szphone, SupportDB.Archived_Calls.szline, SupportDB.Archived_Calls.szresolvedby,
''Archived'' AS SourceTable
FROM SupportDB.Archived_Calls INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_5 ON SupportDB.Archived_Calls.ACTION = SUPCATEG_5.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Archived_Calls.CATAREA = SUPCATEG_1.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Archived_Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Archived_Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Archived_Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN
SupportDB.CUSTOMERTABLE AS CUSTOMERTABLE_1 ON SupportDB.Archived_Calls.CUSTOMERREF = CUSTOMERTABLE_1.REFERENCE INNER JOIN
SupportDB.SUPPROD ON SupportDB.Archived_Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Closed DESC')
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,
HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)
AS [Customer Name]
FROM OPENQUERY([SupportServer],
'SELECT SupportDB.Calls.REFERENCE AS CallRef, SupportDB.Calls.CUSTOMERREF AS CustomerRef, SupportDB.CUSTOMERTABLE.NAME, DATEADD(hh,
SupportDB.Calls.OPENHOURS, DATEADD(n, SupportDB.Calls.OPENMINUTES, SupportDB.Calls.OPENDATE)) AS Opened, DATEADD(hh,
SupportDB.Calls.CLOSEHOURS, DATEADD(n, SupportDB.Calls.CLOSEMINUTES, SupportDB.Calls.CLOSEDATE)) AS Closed, DATEADD(hh,
SupportDB.Calls.ACTIONHOURS, DATEADD(n, SupportDB.Calls.ACTIONMINUTES, SupportDB.Calls.ACTIONDATE)) AS Actioned,
SupportDB.Calls.PRODUCTREF,SupportDB.SUPPROD.NAME AS ProductName, SupportDB.Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '',
'''') AS Status, SUPCATEG_4.CATEGORY AS Priority, SupportDB.Calls.TITLE, SupportDB.Calls.CALLER, SupportDB.Calls.HANDLERTAG,
SupportDB.Calls.CUSTSOWNREF, SupportDB.Calls.SERIALNO, SUPCATEG_1.CATEGORY AS SLA,
SupportDB.SUPCATEG.CATEGORY AS ActionCategory, SupportDB.Calls.szemail, SupportDB.Calls.szphone, SupportDB.Calls.szline,
SupportDB.Calls.szresolvedby,
''Live'' AS SourceTable
FROM SupportDB.Calls INNER JOIN
SupportDB.SUPCATEG ON SupportDB.Calls.ACTION = SupportDB.SUPCATEG.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Calls.CATAREA = SUPCATEG_1.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN
SupportDB.CUSTOMERTABLE ON SupportDB.Calls.CUSTOMERREF = SupportDB.CUSTOMERTABLE.REFERENCE INNER JOIN
SupportDB.SUPPROD ON SupportDB.Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Actioned DESC')
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, 'Unspecified' AS ProductName, VERSIONREF, Type, Status, Priority, TITLE,
CALLER, HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, 'Unknown' AS SourceTable,
dbo.Neaten(NAME) AS [Customer Name]
FROM SupportDB.Archived_Calls
WHERE (PRODUCTREF = '')
SELECT * FROM #SupportSystem_Calls
ORDER BY CallRef ASC
DROP TABLE #SupportSystem_Calls
I tried configuring a remote view to retrieve this dataset. It took 8 seconds to retrieve 3,600 records from a SQL Server instance hosted on the server below mine in the rack; both servers are connected to the same gigabit switch.
I tried using an SP, and that is faster: 3 seconds to retrieve the dataset. But considering the TINY size of the dataset, it's still too slow.
About the only way that does seem to return the dataset without latency is if I create a local table and then use an SP to retrieve data from it; this reduces the lag considerably.
But then it doesn't update itself, so I need to be able to run an upsert within the SP, for instance like this:
-- Pull the recent transactions into a temp table (SELECT ... INTO creates #TMP)
SELECT * INTO #TMP FROM dbo.[Recent HOTH Transactions]
-- Update the calls we already have...
UPDATE SupportCalls
SET [Opened] = #TMP.[Opened],
[Closed] = #TMP.[Closed],
[Actioned] = #TMP.[Actioned],
[Status] = #TMP.[Status],
[TITLE] = #TMP.[TITLE],
[SLA] = #TMP.[SLA],
[ActionCategory] = #TMP.[ActionCategory],
[Customer Name] = #TMP.[Customer Name],
[HANDLERTAG] = #TMP.[HANDLERTAG],
[szresolvedby] = #TMP.[szresolvedby],
[SourceTable] = #TMP.[SourceTable],
[szline] = #TMP.[szline]
FROM #TMP WHERE SupportCalls.CallRef = #TMP.CallRef
-- ...then insert the ones we haven't seen before
INSERT into SupportCalls
SELECT * FROM #TMP WHERE CallRef NOT IN (SELECT DISTINCT CallRef FROM SupportCalls)
SELECT * FROM SupportCalls
DROP TABLE #TMP
When I restricted the contents of the #TMP table to the last 50 modified records from the live table and the last 5 updated records from the archived table, I cut the execution time of the UPSERT by 80%, and it still ensures that the Business Data List in SharePoint is up to date.
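For reference, the restriction is along these lines (column choices simplified; TOP needs its own derived table in each branch to carry an ORDER BY):

SELECT u.* INTO #TMP
FROM (
    -- Last 50 changed calls from the live table
    SELECT * FROM (SELECT TOP 50 * FROM dbo.[Recent HOTH Transactions]
                   WHERE SourceTable = 'Live' ORDER BY Actioned DESC) AS LiveRecent
    UNION ALL
    -- Last 5 changed calls from the archived table
    SELECT * FROM (SELECT TOP 5 * FROM dbo.[Recent HOTH Transactions]
                   WHERE SourceTable = 'Archived' ORDER BY Actioned DESC) AS ArchRecent
) AS u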
However, when I execute the final SELECT statement, every record that wasn't updated shows NULL in the twelve columns used by the update routine.
If I can get the UPSERT to work and then return the complete updated recordset in the final statement of the SP, I'd expect that to be the most efficient solution for SharePoint.
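For what it's worth, this is the shape I'm aiming for - a minimal sketch using the same table names, with an explicit transaction and EXISTS rather than NOT IN (NOT IN matches nothing if the subquery ever returns a NULL CallRef):

BEGIN TRANSACTION
-- Update matched calls in place
UPDATE sc
SET sc.[Status] = t.[Status],
    sc.[Actioned] = t.[Actioned]  -- ...plus the other tracked columns
FROM SupportCalls sc
INNER JOIN #TMP t ON t.CallRef = sc.CallRef
-- Insert calls that don't exist yet
INSERT INTO SupportCalls
SELECT t.*
FROM #TMP t
WHERE NOT EXISTS (SELECT 1 FROM SupportCalls sc WHERE sc.CallRef = t.CallRef)
COMMIT TRANSACTION
-- Return the complete, freshly upserted recordset
SELECT * FROM SupportCalls ORDER BY CallRef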