September 4, 2012 at 12:42 pm
Here are the real-world limitations:
1. The Data Dictionary is 7 years old and parts are obsolete.
2. There are 1623 tables, which 934 being empty of rows.
3. All GUIDs (here called OIDS) are Varchar(32)
4. Referential integrity is loosely enforced, with most tables not having FK relationships described.
5. There are scores of many-to-many tables, small with only 4 columns, called 'ThisTable_ToThatTable_Helper' or 'ThisTable_ToThatTable_Collection'. They have incoming and outgoing OIDS.
6. The varchar(32) fields are called OID, OID_Link, or {field_name}Moniker, about 70% of the time.
Here are my actions so far:
1. I selected all table.column object, twice, into two table variables, then multiplied for a cartesian product.
2. I used the output, 'tblDataCartesian' to build dynamic SQL strings.
3. No table is looking at itself
4. All rows of the cartesian, which I am calling "Efforts", which came from an empty source table were removed.
5. There are over 31.4 Million 'Efforts'.
6. Each test for a Top 1 Inner join between SourceTable, TargetTable on SourceColA = TargetColB takes about 1 second. Any efforts with '1' row are posted to 'tblDataDictionary' [intDictionaryID],[vcrSourceTable],[vcrOnField],vcrTargetTable],[bolEffortDone]
7. Uninterrupted, it will take 363 days to perform all 31.4 million efforts. (Efforts marked [bolEffortDone] = 1 are not re-tried.)
So... you see my problem.
Goal: To have a 3 column data dictionary that is queryable via a recursive CTE to perform traveling-salesman type questions like, 'Starting at ThisTable.ThisColumn, what is the |shortest|longest|must-go-through-TheOtherTable| path to ThatTable.Column, where all OIDS successfully map?'
Your Task:
Looking at this SQL scrip, how can we modify it to be more efficient that 1 INNER JOIN per second?
===========
[font="Courier New"]
SET NOCOUNT ON;
Declare @tblShort TABLE (tblNameA varchar(255),colNameA varchar(255));
Declare @tblLong TABLE (tblNameB varchar(255),colNameB varchar(255));
Declare @tblCartesian TABLE(tblNameA varchar(255),colNameA varchar(255),tblNameB varchar(255),colNameB varchar(255))
Declare @strSQLs NVarchar(4000);
Declare @strSQLc NVarchar(4000);
Declare @strSQLu NVarchar(4000);
Declare @bigCartProduct BIGINT;
Declare @bigCounter BIGINT;
Declare @tblCounts TABLE (rowCnt BIGINT);
Declare @tblDictionary table (SourceTable Varchar(255), OnField Varchar(255), TargetTable Varchar(255));
Declare @tblSrcA Varchar(255)
Declare @colSrcA Varchar(255)
Declare @tblSrcB Varchar(255)
Declare @colSrcB Varchar(255)
Declare @startTime date
Declare @endTime date
Declare @minsElap INT
Set @startTime = GetDate()
SET @bigCounter = 0;
SET @strSQLs = '';
SET @strSQLc = '';
SET @strSQLu = '';
/* -- REM: Disable re-build of the 'tblDataCartesian', having already built it once.
-- Make CTE1 & 2 for the cartesian join
WITH CTE1 (TableNameA,ColumnNameA) as (
SELECT
OBJECT_NAME(c.OBJECT_ID) as TableNameA
, c.name as ColumnNameA
FROM sys.columns AS c
INNER JOIN sys.types AS t ON c.user_type_id=t.user_type_id
INNER JOIN sys.objects AS o ON C.Object_ID = O.Object_ID
WHERE O.[Type] = 'U'
AND t.name = 'Varchar'
AND C.Max_Length = 32
AND C.Name is not null
AND CHARINDEX('-',OBJECT_NAME(c.OBJECT_ID),1) = 0
)
INSERT INTO @tblShort
SELECT * FROM CTE1
ORDER BY TableNameA, ColumnNameA;
WITH CTE2 (TableNameB,ColumnNameB) as (
SELECT
OBJECT_NAME(c.OBJECT_ID) TableNameB
, c.name ColumnNameB
FROM sys.columns AS c
INNER JOIN sys.types AS t ON c.user_type_id=t.user_type_id
INNER JOIN sys.objects AS o ON C.Object_ID = O.Object_ID
WHERE O.[Type] = 'U'
AND t.name = 'Varchar'
AND C.Max_Length = 32
AND C.Name is not null
AND CHARINDEX('-',OBJECT_NAME(c.OBJECT_ID),1) = 0
)
INSERT INTO @tblLong
SELECT * FROM CTE2;
INSERT INTO tblDataCartesian (vcrSourceTable,vcrSourceCol,vcrTargetTable,vcrTargetCol,bolEffortDone)
SELECT tblNameA, colNameA, tblNameB, colNameB,'0'
FROM @tblShort, @tblLong
WHERE tblNameA <> tblNameB
SELECT @bigCartProduct = COUNT(*) FROM tblDataCartesian WHERE bolEffortDone = 0
PRINT 'Total Cartesian Records Remaining: ' + cast(@bigCartProduct as varchar(100))
-- End of Disabling
*/
Declare myCursor Cursor READ_ONLY FOR
(SELECT vcrSourceTable, vcrSourceCol, vcrTargetTable, vcrTargetCol
FROM tblDataCartesian
WHERE bolEffortDone = 0
AND vcrSourceTable = 'Client')
OPEN myCursor
FETCH NEXT FROM myCursor INTO @tblSrcA, @colSrcA, @tblSrcB, @colSrcB
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE @tblCounts SET rowcnt = 0
SET @bigCounter = @bigCounter + 1
SET @strSQLs = ''
SET @strSQLc = ''
SET @strSQLu = ''
SET @strSQLs = 'SELECT TOP 1 ''[' + @tblSrcA + ']'' AS SourceTable, '
SET @strSQLS = @strSQLs + '''' + @colSrcA + ''' as OnField, '
SET @strSQLS = @strSQLs + '''[' + @tblSrcB + ']'' AS TargetTable '
SET @strSQLS = @strSQLs + 'FROM ' + @tblSrcA + ' X (nolock) '
SET @strSQLS = @strSQLs + 'INNER JOIN [' + @tblSrcB + '] Y (nolock) ON X.' + @colSrcA + ' = Y.' + @colSrcB + ' '
SET @strSQLc = N'SELECT COUNT(*) as RowCnt FROM (' + @strSQLs + ') Z;'
--print @strSQLC
--INSERT INTO @tblCounts
--exec SP_ExecuteSQL @strSQLc
--IF (SELECT * from @tblCounts WHERE rowcnt > 0) > 0
Begin
INSERT INTO @tblDictionary
exec SP_ExecuteSQL @strSQLs;
INSERT INTO tblDataDictionary
SELECT *, '1' FROM @tblDictionary;
end
-- Don't reuse your last success !
DELETE FROM @tblDictionary;
-- Now, having done the quad, don't do it again
SET @strSQLu = N'UPDATE tblDataCartesian SET bolEffortDone = 1 '
SET @strSQLu = @strSQLu + 'WHERE vcrSourceTable = ''' + @tblSrcA + ''' '
SET @strSQLu = @strSQLu + 'AND vcrSourceCol = ''' + @colSrcA + ''' '
SET @strSQLu = @strSQLu + 'AND vcrTargetTable = ''' + @tblSrcB + ''' '
SET @strSQLu = @strSQLu + 'AND vcrTargetCol = ''' + @colSrcB + ''';'
Exec SP_ExecuteSQL @strSQLu
-- Tell 'Em Watcha Done.
IF (@bigCounter % 100) = 0
Begin
set @minsELAP = Cast(datediff(minute,@startTime,GetDate()) as Varchar(100))
print 'At Mins: ' + Cast(@minsELAP as Varchar(100)) + ' _ ' + 'Effort # ' + Cast(@bigCounter as varchar(100)) + ' Checking: ' + @tblSrcA + ', ' + @colSrcA + ', ' + @tblSrcB + ', ' + @colSrcB
end
FETCH NEXT FROM myCursor INTO @tblSrcA, @colSrcA, @tblSrcB, @colSrcB
IF @bigCounter = 100000 BREAK
END-- End of WHILE LOOP
SET NOCOUNT OFF;
Set @EndTime = GetDate()
set @minsELAP = Cast(datediff(minute,@startTime,@endTime) as Varchar(100))
print 'Minutes Elapsed: ' + Cast(@minsELAP as Varchar(100))
Print 'Records Per Minute: ' + Cast(Cast(@bigCounter as Float)/ Cast(@minsElap as Float) as Varchar(100))
SET @bigCounter = 0
Close myCursor
Deallocate myCursor
--================== TESTING SCRIPTS ====================
/*
SELECT
Case boleffortDone
When 0 then 'To Do'
When 1 Then 'Done'
End as StatusIs
, vcrSourceTable
, Count(*)
FROM tblDataCartesian (nolock)
WHERE vcrSourceTable = 'SupportType'
Group By
Case boleffortDone
When 0 then 'To Do'
When 1 Then 'Done'
End ,
vcrSourceTable
ORDER BY Case boleffortDone
When 0 then 'To Do'
When 1 Then 'Done'
End
, vcrSourceTable
, Count(*) DESC
GO
SELECT * FROM tblDataCartesian (nolock)
WHERE vcrSourceTable = 'Client'
SELECT * FROM tblDataDictionary (nolock)
*/
=============[/font]
September 4, 2012 at 7:11 pm
Some points:
1. A recursive CTE is likely to be slower, not faster.
2. It really is not at all clear what you are trying to do here. You may want to post a simplified example, with some sample data and table definitions.
3. It is even less clear why you would want to do something like this. If you could explain that, we may be able to suggest a better way to approach it.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply