July 14, 2008 at 9:13 am
Hi,
I'm building a SharePoint site where the customer list is obtained from the BDC, which needs a data source that collects data from six different databases across several sites. Some of these are connected by very slow WAN links, or run on very old servers. Politics are involved, so all I can do is query the databases using a read-only account, from my own server.
The fastest way I've found is to use an SP to grab each customer's ID, name and postcode from a remote table using OPENQUERY and store the results in a local table; repeat for each database, and then finally do some clever reformatting to generate a table which looks like this:
[font="Courier New"]
CUSTNAME POSTCODE TABLE1ID TABLE2ID TABLE3ID TABLE4ID TABLE5ID TABLE6ID
CUSTOMER Limited SO123ME Customer1 CUST1 C100001
CUSTOMER LTD SO123ME Customer1 CUST1 C100001 [/font]
It isn't very fast, and as you can guess it's not perfect, because the different databases may have the same customer but store the name differently. (Yes, I know this is a bad example, because I could use a replace function, and in fact I do where possible, but it can't catch everything.)
At least I can use the CUSTNAME column as a key column in this setup, and users can do a partial name search to obtain a list of records that match.
If you guys were doing this, how would you go about it?
July 14, 2008 at 9:35 am
If the names "look" different on different servers, then CustName cannot be your key, since it won't help you identify cust1 on server1 and cust1a on server2 as being the same customer. That's your first challenge: making sure you can cross-reference customers among your data sources. If there isn't one readily available, then I'd build one (mapping the customers to their matches on the different servers).
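For what it's worth - a minimal sketch of the kind of cross-reference table I mean (all the names here are made up, not anything from your setup):

-- Hypothetical mapping of each source system's customer ID to one master customer
CREATE TABLE dbo.CustomerXRef
(
    MasterCustomerID int NOT NULL,
    SourceSystem     varchar(16) NOT NULL,  -- e.g. 'Server1', 'Server2'
    SourceCustomerID varchar(16) NOT NULL,
    PRIMARY KEY (SourceSystem, SourceCustomerID)
)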
Once you have that - then I might roughly use your approach: use OPENQUERY to pull back FILTERED rows from each data set, pulling only the key info I need and the info required to aggregate, and then perform the final aggregation locally. If possible - I'd aggregate on the remote server FIRST, and bring back pre-aggregated data (if you can come up with a way to make the calculation work with aggregating semi-aggregated data).
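Just to illustrate the filtering point - a rough sketch with made-up server, database and column names. The WHERE clause lives inside the pass-through query, so the remote server only sends back the rows you actually need over the slow link:

-- The filter runs remotely, so only matching rows cross the WAN
SELECT *
FROM OPENQUERY([RemoteServer],
    'SELECT REFERENCE, NAME, POSTCODE
     FROM CustomerDB.dbo.Customers
     WHERE POSTCODE LIKE ''SO%''')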
If that's still slow, then perhaps look at your slowest links and figure out how to get the data from them asynchronously (something like replication or mirroring might work for that), so that the data is retrieved quickly even if it's not the freshest (your replication could fall behind, but you would still get a report out fairly quickly). Depends of course on what the report requirements are, etc...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 14, 2008 at 9:36 am
I'm assuming that Customer IDs aren't shared between databases. Is that correct? If so, then best-match on name is possibly all you can do. You might match against phone, e-mail, or some such, as well, if that's an option.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 14, 2008 at 10:12 am
Hi,
What would be the advantage of pulling back filtered rowsets?
I need to build ID Enumerator, Finder and SpecificFinder methods; surely if I'm going to aggregate the customer tables from all six systems properly then I can't do that without grabbing all records from all tables.
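For reference, the three methods boil down to SPs over the aggregated local table, roughly like this (object names simplified for illustration, not my real ones):

-- Illustrative only: simplified SPs behind the three BDC method types
CREATE PROCEDURE dbo.Customer_IDEnumerator AS
    SELECT CUSTNAME FROM dbo.Customers
GO
CREATE PROCEDURE dbo.Customer_Finder @Search varchar(120) AS
    SELECT * FROM dbo.Customers WHERE CUSTNAME LIKE @Search + '%'
GO
CREATE PROCEDURE dbo.Customer_SpecificFinder @CustName varchar(120) AS
    SELECT * FROM dbo.Customers WHERE CUSTNAME = @CustName
GO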
Quite right about being unable to correlate the data properly; however, the idea of doing it this way is that we can see how many systems have different names for the same customer, and then we can just do a clean-up.
The long term plan is for it to show only one row for each customer, but we're happy for it to show
Customer 1 Ltd
Customer1 ltd
CUSTOMER ONE LIMITED
etc. while that sanity check takes place. Either way, there will still only be one unique instance of each customer name, which is why I think we'll be able to use it as the primary key.
I have a function called Neaten which strips out white space, capitalises the name, changes "Ltd." to "LIMITED", replaces ampersands with "AND", etc. Getting a list of distinct customer names and matching codes was dead easy (SELECT DISTINCT...).
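In outline, Neaten does something like this (the real replacement list is longer - this is just the idea):

CREATE FUNCTION dbo.Neaten (@Name varchar(120))
RETURNS varchar(120)
AS
BEGIN
    DECLARE @Clean varchar(120)
    -- Trim and capitalise
    SET @Clean = UPPER(LTRIM(RTRIM(@Name)))
    -- Standardise common variations
    SET @Clean = REPLACE(@Clean, '&', 'AND')
    SET @Clean = REPLACE(@Clean, 'LTD.', 'LIMITED')
    SET @Clean = REPLACE(@Clean, 'LTD', 'LIMITED')
    -- Collapse any double spaces left behind by the replacements
    WHILE CHARINDEX('  ', @Clean) > 0
        SET @Clean = REPLACE(@Clean, '  ', ' ')
    RETURN @Clean
END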
Writing the pivot table was extremely tedious but not especially difficult.
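The reformatting itself is just a manual cross-tab over the staging rows - something along these lines (the staging table and column names are simplified here):

SELECT CUSTNAME,
    MAX(POSTCODE) AS POSTCODE,
    -- CASE without ELSE yields NULL, which MAX ignores, so each
    -- customer's row picks up the ID from whichever systems have one
    MAX(CASE WHEN SourceDB = 'DB1' THEN CustID END) AS TABLE1ID,
    MAX(CASE WHEN SourceDB = 'DB2' THEN CustID END) AS TABLE2ID,
    MAX(CASE WHEN SourceDB = 'DB3' THEN CustID END) AS TABLE3ID,
    MAX(CASE WHEN SourceDB = 'DB4' THEN CustID END) AS TABLE4ID,
    MAX(CASE WHEN SourceDB = 'DB5' THEN CustID END) AS TABLE5ID,
    MAX(CASE WHEN SourceDB = 'DB6' THEN CustID END) AS TABLE6ID
FROM dbo.CustomerStaging
GROUP BY CUSTNAME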
July 15, 2008 at 10:28 am
Hi,
I've got it working rather nicely with SPs that query the remote tables, then update local tables with the query results.
This is blisteringly fast from a client PC, but I do need to drop the table every night; the next time someone or something executes the SP, the table is recreated and then populated from the dynamic SQL outside of core hours, so SharePoint never realises it's going on.
Is there an efficient way to do this without dropping the table though?
I looked at SQL 2005's crippled alternative to MERGE, and I have my reservations about using it in the SPs called by MOSS 2007.
Not that I'm a cynic, but SharePoint has time and again demonstrated a habit of throwing its teddy out of the pram as soon as I try to do anything vaguely complex in an SP.
July 16, 2008 at 8:31 am
Instead of referencing the tables directly - perhaps set up a view to act as your data source in this case. This way you can build the new table while the existing one is in place, then switch the view's definition (sub-millisecond type of speed) and drop the old table.
That way - the SP is always referencing something valid.
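In sketch form, with made-up object names (this assumes the view dbo.vwCustomers already exists and currently points at Customers_A):

-- 1. Build the replacement table while the current one stays in service
SELECT CUSTNAME, POSTCODE, TABLE1ID, TABLE2ID, TABLE3ID, TABLE4ID, TABLE5ID, TABLE6ID
INTO dbo.Customers_B
FROM dbo.CustomerStaging
GO
-- 2. Repoint the view; near-instant, so callers never see a gap
ALTER VIEW dbo.vwCustomers
AS
SELECT CUSTNAME, POSTCODE, TABLE1ID, TABLE2ID, TABLE3ID, TABLE4ID, TABLE5ID, TABLE6ID
FROM dbo.Customers_B
GO
-- 3. Drop the table the view used to point at
DROP TABLE dbo.Customers_A
GO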
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 16, 2008 at 9:17 am
Here's what I need to run just to get a list of support calls from our support system:
create table #SupportSystem_Calls
(
[CallRef] varchar(8),
[CustomerRef] varchar(16),
[NAME] varchar(120),
[Opened] datetime,
[Closed] datetime,
[Actioned] datetime,
[PRODUCTREF] varchar(16),
[ProductName] varchar(40),
[VERSIONREF] varchar(16),
[Type] varchar(32),
[Status] varchar(8000),
[Priority] varchar(32),
[TITLE] varchar(250),
[CALLER] varchar(40),
[HANDLERTAG] varchar(8),
[CUSTSOWNREF] varchar(16),
[SERIALNO] varchar(40),
[SLA] varchar(32),
[ActionCategory] varchar(32),
[szemail] varchar(120),
[szphone] varchar(20),
[szline] varchar(16),
[szresolvedby] varchar(8),
[SourceTable] varchar(8),
[Customer Name] varchar(120)
)
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,
HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)
AS [Customer Name]
FROM OPENQUERY([SupportServer],
'SELECT SupportDB.Archived_Calls.REFERENCE AS CallRef, SupportDB.Archived_Calls.CUSTOMERREF AS CustomerRef, CUSTOMERTABLE_1.NAME, DATEADD(hh,
SupportDB.Archived_Calls.OPENHOURS, DATEADD(n, SupportDB.Archived_Calls.OPENMINUTES, SupportDB.Archived_Calls.OPENDATE))
AS Opened, DATEADD(hh, SupportDB.Archived_Calls.CLOSEHOURS, DATEADD(n, SupportDB.Archived_Calls.CLOSEMINUTES,
SupportDB.Archived_Calls.CLOSEDATE)) AS Closed, DATEADD(hh, SupportDB.Archived_Calls.ACTIONHOURS, DATEADD(n,
SupportDB.Archived_Calls.ACTIONMINUTES, SupportDB.Archived_Calls.ACTIONDATE)) AS Actioned, SupportDB.Archived_Calls.PRODUCTREF, SupportDB.SUPPROD.NAME AS ProductName,
SupportDB.Archived_Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '', '''') AS Status,
SUPCATEG_4.CATEGORY AS Priority, SupportDB.Archived_Calls.TITLE, SupportDB.Archived_Calls.CALLER,
SupportDB.Archived_Calls.HANDLERTAG, SupportDB.Archived_Calls.CUSTSOWNREF, SupportDB.Archived_Calls.SERIALNO,
SUPCATEG_1.CATEGORY AS SLA, SUPCATEG_5.CATEGORY AS ActionCategory, SupportDB.Archived_Calls.szemail,
SupportDB.Archived_Calls.szphone, SupportDB.Archived_Calls.szline, SupportDB.Archived_Calls.szresolvedby,
''Archived'' AS SourceTable
FROM SupportDB.Archived_Calls INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_5 ON SupportDB.Archived_Calls.ACTION = SUPCATEG_5.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Archived_Calls.CATAREA = SUPCATEG_1.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Archived_Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Archived_Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Archived_Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN
SupportDB.CUSTOMERTABLE AS CUSTOMERTABLE_1 ON SupportDB.Archived_Calls.CUSTOMERREF = CUSTOMERTABLE_1.REFERENCE INNER JOIN
SupportDB.SUPPROD ON SupportDB.Archived_Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Closed DESC')
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, ProductName, VERSIONREF, Type, Status, Priority, TITLE, CALLER,
HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, SourceTable, dbo.Neaten(NAME)
AS [Customer Name]
FROM OPENQUERY([SupportServer],
'SELECT SupportDB.Calls.REFERENCE AS CallRef, SupportDB.Calls.CUSTOMERREF AS CustomerRef, SupportDB.CUSTOMERTABLE.NAME, DATEADD(hh,
SupportDB.Calls.OPENHOURS, DATEADD(n, SupportDB.Calls.OPENMINUTES, SupportDB.Calls.OPENDATE)) AS Opened, DATEADD(hh,
SupportDB.Calls.CLOSEHOURS, DATEADD(n, SupportDB.Calls.CLOSEMINUTES, SupportDB.Calls.CLOSEDATE)) AS Closed, DATEADD(hh,
SupportDB.Calls.ACTIONHOURS, DATEADD(n, SupportDB.Calls.ACTIONMINUTES, SupportDB.Calls.ACTIONDATE)) AS Actioned,
SupportDB.Calls.PRODUCTREF,SupportDB.SUPPROD.NAME AS ProductName, SupportDB.Calls.VERSIONREF, SUPCATEG_2.CATEGORY AS Type, REPLACE(SUPCATEG_3.CATEGORY, ''£ '',
'''') AS Status, SUPCATEG_4.CATEGORY AS Priority, SupportDB.Calls.TITLE, SupportDB.Calls.CALLER, SupportDB.Calls.HANDLERTAG,
SupportDB.Calls.CUSTSOWNREF, SupportDB.Calls.SERIALNO, SUPCATEG_1.CATEGORY AS SLA,
SupportDB.SUPCATEG.CATEGORY AS ActionCategory, SupportDB.Calls.szemail, SupportDB.Calls.szphone, SupportDB.Calls.szline,
SupportDB.Calls.szresolvedby,
''Live'' AS SourceTable
FROM SupportDB.Calls INNER JOIN
SupportDB.SUPCATEG ON SupportDB.Calls.ACTION = SupportDB.SUPCATEG.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_1 ON SupportDB.Calls.CATAREA = SUPCATEG_1.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_2 ON SupportDB.Calls.CATTYPE = SUPCATEG_2.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_3 ON SupportDB.Calls.CATSTATUS = SUPCATEG_3.NO INNER JOIN
SupportDB.SUPCATEG AS SUPCATEG_4 ON SupportDB.Calls.CATPRIORITY = SUPCATEG_4.NO INNER JOIN
SupportDB.CUSTOMERTABLE ON SupportDB.Calls.CUSTOMERREF = SupportDB.CUSTOMERTABLE.REFERENCE INNER JOIN
SupportDB.SUPPROD ON SupportDB.Calls.PRODUCTREF = SupportDB.SUPPROD.REFERENCE ORDER BY Actioned DESC')
INSERT into #SupportSystem_Calls
SELECT CallRef, CustomerRef, NAME, Opened, Closed, Actioned, PRODUCTREF, 'Unspecified' AS ProductName, VERSIONREF, Type, Status, Priority, TITLE,
CALLER, HANDLERTAG, CUSTSOWNREF, SERIALNO, SLA, ActionCategory, szemail, szphone, szline, szresolvedby, 'Unknown' AS SourceTable,
dbo.Neaten(NAME) AS [Customer Name]
FROM SupportDB.Archived_Calls
WHERE (PRODUCTREF = '')
SELECT * FROM #SupportSystem_Calls
ORDER BY CallRef ASC
DROP TABLE #SupportSystem_Calls
I tried configuring a remote view to retrieve this dataset. It took 8 seconds to retrieve 3,600 records from a SQL Server instance hosted on the server below mine in the rack; both servers are connected to the same gigabit switch.
I tried using an SP, and that is faster: 3 seconds to retrieve the dataset. But considering the TINY size of the dataset, it's still too slow.
About the only way that does seem to return the dataset without latency is if I create a local table and then use an SP to retrieve data from it; this reduces the lag considerably.
But then it doesn't update itself, so I need to be able to run an upsert within the SP, for instance like this:
-- Pull the recent transactions into a temp table (SELECT ... INTO creates #TMP)
SELECT * INTO #TMP FROM dbo.[Recent HOTH Transactions]
-- Update the calls we already have...
UPDATE SupportCalls
SET [Opened] = #TMP.[Opened],
[Closed] = #TMP.[Closed],
[Actioned] = #TMP.[Actioned],
[Status] = #TMP.[Status],
[TITLE] = #TMP.[TITLE],
[SLA] = #TMP.[SLA],
[ActionCategory] = #TMP.[ActionCategory],
[Customer Name] = #TMP.[Customer Name],
[HANDLERTAG] = #TMP.[HANDLERTAG],
[szresolvedby] = #TMP.[szresolvedby],
[SourceTable] = #TMP.[SourceTable],
[szline] = #TMP.[szline]
FROM #TMP WHERE SupportCalls.CallRef = #TMP.CallRef
-- ...then insert the ones we haven't seen before
INSERT into SupportCalls
SELECT * FROM #TMP WHERE CallRef NOT IN (SELECT DISTINCT CallRef FROM SupportCalls)
SELECT * FROM SupportCalls
DROP TABLE #TMP
When I restricted the contents of the #TMP table to the last 50 modified records from the live table and the last 5 updated records from the archived table, I cut the execution time of the UPSERT by 80%, and it still ensures that the Business Data List in SharePoint is up to date.
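For reference, the restriction is along these lines (column choices simplified; TOP needs its own derived table in each branch to carry an ORDER BY):

SELECT u.* INTO #TMP
FROM (
    -- Last 50 changed calls from the live table
    SELECT * FROM (SELECT TOP 50 * FROM dbo.[Recent HOTH Transactions]
                   WHERE SourceTable = 'Live' ORDER BY Actioned DESC) AS LiveRecent
    UNION ALL
    -- Last 5 changed calls from the archived table
    SELECT * FROM (SELECT TOP 5 * FROM dbo.[Recent HOTH Transactions]
                   WHERE SourceTable = 'Archived' ORDER BY Actioned DESC) AS ArchRecent
) AS u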
However, when I execute the final SELECT statement, every record that wasn't updated shows NULL in the twelve columns used by the update routine.
If I can get the UPSERT to work and then return the complete updated recordset in the final statement of the SP, I'd expect that to be the most efficient solution for SharePoint.
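For what it's worth, this is the shape I'm aiming for - a minimal sketch using the same table names, with an explicit transaction and EXISTS rather than NOT IN (NOT IN matches nothing if the subquery ever returns a NULL CallRef):

BEGIN TRANSACTION
-- Update matched calls in place
UPDATE sc
SET sc.[Status] = t.[Status],
    sc.[Actioned] = t.[Actioned]  -- ...plus the other tracked columns
FROM SupportCalls sc
INNER JOIN #TMP t ON t.CallRef = sc.CallRef
-- Insert calls that don't exist yet
INSERT INTO SupportCalls
SELECT t.*
FROM #TMP t
WHERE NOT EXISTS (SELECT 1 FROM SupportCalls sc WHERE sc.CallRef = t.CallRef)
COMMIT TRANSACTION
-- Return the complete, freshly upserted recordset
SELECT * FROM SupportCalls ORDER BY CallRef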