June 14, 2017 at 6:48 am
Goodmorning,
Ihave a table that stores parent and child record in same column (key_txt).Table has over 35 million records. I need to identify which child belongs towhich parent. Column key_txt can increase by 16 char at the time. Parent childrelationship identified by
parent.mpbi_seq_key_txt= SUBSTRING(child.mpbi_seq_key_txt, 1,LEN(parent.mpbi_seq_key_txt)) AND LEN(parent.mpbi_seq_key_txt) +16= LEN(child.mpbi_seq_key_txt)
See examplebelow:
IF (SELECT OBJECT_ID('tempdb..#test'))is not null
DROP TABLE #test
CREATE TABLE #test(
Key_txt varchar(124),
ParentID int,
ParentIDFK int )
INSERT INTO #test(key_txt)
SELECT '0000112500000001'
UNION ALL
SELECT '00001125000000010000107800000005'
UNION ALL
SELECT '000011250000000100001078000000050000107600000001'
UNION ALL
SELECT '1000112500000006'
UNION ALL
SELECT '10001125000000060000107800000005'
UNION ALL
SELECT '10001125000000060000107800000006'
UNION ALL
SELECT '10001125000000060000107800000007'
UNION ALL
SELECT '2000112500000007'
UNION ALL
SELECT '8000888800000009'
UNION ALL
SELECT '00001125000000010000107800000006'
UNION ALL
SELECT '2000112500000007'
UNION ALL
SELECT '20001125000000071112223334445555'
UPDATE Parent SET Parent.ParentID=A.ParentKey
FROM #test Parent
INNER JOIN (
SELECT Key_txt, DENSE_RANK() OVER (ORDER BY Key_txt ASC) AS ParentKey
FROM #test
)A
ON A.Key_txt=Parent.Key_txt
UPDATE Parent SET ParentIDFK=child.ParentID
FROM #test Parent
INNER JOIN #test child ON child.Key_txt= SUBSTRING(Parent.Key_txt, 1,LEN(child.Key_txt)) AND LEN(child.Key_txt) +16= LEN(Parent.Key_txt)
SELECT * from #test
Due to size of the table and kex_txt column data type (varchar) update takes long time to run. I wonder if there better way to do it
Thank you
June 14, 2017 at 7:26 am
The reason you are struggling here is because you have violated 1NF by stuffing parent and child keys into the same tuple. You need to redesign how you are handling your hierarchy and don't store the parent and the child keys all crammed together in one big mess of a string. I know the reason you posted a question here is to ask if there is a faster way to accomplish your goal. Given the structure of your table you are kind of stuck. If instead you fix the data structure the performance issue will fix itself.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 14, 2017 at 7:46 am
You say the key_txt field is a varchar(124), and yet the parent/child key thing expands by 16 characters each time... 16 isn't an integer multiple of 124...
Having said that, and agreeing with Sean's comment about violating 1NF, and the performance is going to suck...
...but, if you have to use this structure, are you able to put some persisted computed columns on the table to store the intermediate values? That, at least, might help a little with the performance issues, by saving you from having to do all the string splitting stuff every time you want to do this sort of selection from the table...
Erm. No, really, this is horrible.
Thomas Rushton
blog: https://thelonedba.wordpress.com
June 14, 2017 at 7:55 am
ThomasRushton - Wednesday, June 14, 2017 7:46 AMYou say the key_txt field is a varchar(124), and yet the parent/child key thing expands by 16 characters each time... 16 isn't an integer multiple of 124...Having said that, and agreeing with Sean's comment about violating 1NF, and the performance is going to suck...
...but, if you have to use this structure, are you able to put some persisted computed columns on the table to store the intermediate values? That, at least, might help a little with the performance issues, by saving you from having to do all the string splitting stuff every time you want to do this sort of selection from the table...
Erm. No, really, this is horrible.
Thomas,
I have to use the structure as is. To clarify what you saying if I split string by 16 char and insert values into separate columns (Parent1, Parent2, Parent3,…) I might see speed improvement?
June 14, 2017 at 8:02 am
legeboka - Wednesday, June 14, 2017 7:55 AMThomasRushton - Wednesday, June 14, 2017 7:46 AMYou say the key_txt field is a varchar(124), and yet the parent/child key thing expands by 16 characters each time... 16 isn't an integer multiple of 124...Having said that, and agreeing with Sean's comment about violating 1NF, and the performance is going to suck...
...but, if you have to use this structure, are you able to put some persisted computed columns on the table to store the intermediate values? That, at least, might help a little with the performance issues, by saving you from having to do all the string splitting stuff every time you want to do this sort of selection from the table...
Erm. No, really, this is horrible.
Thomas,
I have to use the structure as is. To clarify what you saying if I split string by 16 char and insert values into separate columns (Parent1, Parent2, Parent3,…) I might see speed improvement?
Don't do that. You just said you have to use the structure as it is. Adding columns Parent1, Parent2 is just awful. You should only ever need the next level, not the whole tree. Then you can use a recursive cte to get the whole tree. Your structure is such that I don't think there is much you can do. You seriously need to redesign this.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 14, 2017 at 8:04 am
legeboka - Wednesday, June 14, 2017 7:55 AMThomasRushton - Wednesday, June 14, 2017 7:46 AMYou say the key_txt field is a varchar(124), and yet the parent/child key thing expands by 16 characters each time... 16 isn't an integer multiple of 124...Having said that, and agreeing with Sean's comment about violating 1NF, and the performance is going to suck...
...but, if you have to use this structure, are you able to put some persisted computed columns on the table to store the intermediate values? That, at least, might help a little with the performance issues, by saving you from having to do all the string splitting stuff every time you want to do this sort of selection from the table...
Erm. No, really, this is horrible.
Thomas,
I have to use the structure as is. To clarify what you saying if I split string by 16 char and insert values into separate columns (Parent1, Parent2, Parent3,…) I might see speed improvement?
Looking at it again, I don't know if you would... 🙁
What might be better is running that first update in one statement, then rewriting the second update so it's updating in batches, eg:
DECLARE @rc AS INTEGER = 1 ;
WHILE @rc <> 0
BEGIN
UPDATE TOP ( 2 ) -- change this to something more appropriate to the size of your real dataset - this is just to show the concept
Parent
SET
ParentIDFK = child.ParentID
FROM
@test-2 Parent
INNER JOIN
@test-2 child ON child.Key_txt = SUBSTRING(Parent.Key_txt, 1, LEN(child.Key_txt))
AND LEN(child.Key_txt) + 16 = LEN(Parent.Key_txt)
WHERE
LEN(Parent.Key_txt) > 16
AND Parent.ParentIDFK IS NULL ;
SELECT @rc = @@ROWCOUNT ;
SELECT * FROM @test-2 ; -- remove this when you do it for real.
END ;
Thomas Rushton
blog: https://thelonedba.wordpress.com
June 14, 2017 at 12:04 pm
Try changing your join to the following
INNER JOIN #test child ON child.Key_txt LIKE Parent.Key_txt + '[0-9]%'
AND LEN(child.Key_txt) - 16=LEN(Parent.Key_txt) /* I think you had these backwards. */
LIKE is SARGable as long as there is no leading wildcard, whereas SUBSTRING is never SARGable.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 15, 2017 at 10:38 am
>> I have a table that stores parent [sic] and child [sic] records [sic] in same column ( key_txt).Table has over 35 million records [sic]. I need to identify which child [sic] belongs to which parent [sic]. Column key_txt can increase by 16 char at the time. Parent child relationship identified by
parent.mpbi_seq_ key_txt= SUBSTRING(child.mpbi_seq_ key_txt, 1,LEN(parent.mpbi_seq_ key_txt)) AND LEN(parent.mpbi_seq_ key_txt) +16= LEN(child.mpbi_seq_ key_txt)<<
You need to get a book on basic relational databases. The terms child and parent refer to pointer structures in the old network databases, and have nothing to do with SQL. We have "referenced" and "referencing" tables. Rows are nothing like records. If you would read ISO 11179 or any book on data modeling, you would know not to mix data and metadata in the schema. Labeling something with the word "key" tells us how it has been used, and not what the data element is by its nature.
Identifiers cannot be integers because we don't use them for computation. What's a square root of your credit card number? A table must have a key, by definition, but your attempt at DDL doesn't have anything like that.
You also don't know the correct syntax for an insertion statement. Even SQL Server has had the values list construction for years; there is no reason to use the original Sybase "select – union" syntax. Finally, the old Sybase "update – from" syntax does not work. I don't mean it's slow. I mean, it does not work. If the source table has multiple rows that qualify, this syntax will not give you any warning as it picks the last physically stored row of data. In short, you've done the wrong thing badly. Then on top of that I believe it's already been pointed out that you have normalization problems in this data.
Please read Dr. Codd's 12 rules of RDBMS. It used to be required for any database course. In order to be in first normal form, each column must hold a scalar value. That means your compound string. That has to be parsed is a normal form violation.
>> Due to size of the table and kex_txt column data type (varchar) update takes long time to run. I wonder if there better way to do it <<
I have a whole book devoted to trees and hierarchies in SQL. Get a copy of it and pick one of the many methods that I discuss in the book. My favorite happens to be "the nested set model", but there might be something better for your purposes.
Please post DDL and follow ANSI/ISO standards when asking for help.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply