November 26, 2019 at 7:55 pm
I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.
There is no PK, no Clustered or Unique index on the table. There are a bunch of non-unique indexes only.
I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)
that will show me what combination of columns marks a record in this table unique. I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?
Likes to play Chess
November 26, 2019 at 8:01 pm
There could (and probably will be) be multiple combinations of columns which are unique, based on existing data. That does not tell you which of the combinations, if any, you should choose for a clustered index. You need to find a way of getting to grips with the data enough to understand roughly what is going on, and that will help you choose the right combination.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 26, 2019 at 8:45 pm
If the number of candidate columns is small you could exhaustively test the possible combinations using a cursor and dynamic sql.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 26, 2019 at 9:11 pm
I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.
I would think you have to have at least some idea of what data is stored in that table - else you cannot even begin to write any queries, let alone *rather complex queries* against that table.
With that in mind - I would use SQL Search by Redgate to search that database (and any other related databases) for reference to that table. This table sounds like it is built from a combination of other tables and I suspect that there is some code that is used to build that table. If that isn't the case - then I would hope there is other code that references that table, and examining that code should help in identifying how the table has been previously utilized.
If I still couldn't find out - then I would start asking associates and/or the application owner - see if they have an idea on what the data represents and what could possibly be utilized to uniquely identify the rows. Once you have an idea on what actually sources this data, then you can start looking at individual columns to determine which ones might possibly be candidate keys.
Finally - look at the non-clustered indexes...there might just be one (or more) that were not created as unique, but are actually meant to be the PK or unique.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
November 26, 2019 at 9:17 pm
May be Identifying Relationship Between Columns In SQL can be of use?
November 27, 2019 at 2:53 am
Have you asked anyone???
--Jeff Moden
Change is inevitable... Change for the better is not.
November 27, 2019 at 11:27 am
I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.
There is no PK, no Clustered or Unique index on the table. There are a bunch of non-unique indexes only.
I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)
that will show me what combination of columns marks a record in this table unique. I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?
Try exploring statistics.
November 27, 2019 at 2:02 pm
Best way I can think of is to do a little data exploration so you better understand the table (500+ columns... <sigh>), then, on a TEST system, try creating a unique constraint (don't do a clustered index yet, just nonclustered for testing, avoids data movement, makes dropping the index easy). Use the combination of columns that makes the most sense based on the structure. That's how I'd do it.
Of course, then, I'd get to work normalizing the data structure and/or fixing the star schema.
500 columns...
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 27, 2019 at 2:10 pm
It's difficult for me to believe that such a table exists and no one knows anything about it. Find who does and ask them.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 27, 2019 at 2:21 pm
It's difficult for me to believe that such a table exists and no one knows anything about it. Find who does and ask them.
Very good point, as usual.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 27, 2019 at 2:28 pm
With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:
3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073
I think Jeff has the right suggestion of finding someone who might know.
Or maybe you could add an identity column and make it the primary key?
November 27, 2019 at 2:34 pm
With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:
3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073
I think Jeff has the right suggestion of finding someone who might know.
That's a great calculator you've got there ... must be the size of a TV 🙂
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 27, 2019 at 3:36 pm
With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:
3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073
For sure having some insight, any insight, into the meaning of the columns is essential. Anyway, if the OP could boil it down to 10 candidate columns it's only 1,023 combinations (including single columns, excluding those it's 1,013). Maybe that's doable. If it's 12 columns that's 4,095 combinations which could take a while.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
November 27, 2019 at 3:45 pm
Thanks everyone for your input.
I really appreciate it.
Will take all of it into account. Happy Thanksgiving!
Likes to play Chess
November 27, 2019 at 3:46 pm
Thanks! Very helpful suggestion.
Likes to play Chess
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply