DBA vs. Developer Question / Issue

Question

DBA vs. Developer Question / Issue

Viewing 15 posts - 16 through 30 (of 63 total)

You must be logged in to reply to this topic. Login to reply

Eric M Russell SSC Guru Points: 125562 More actions · Answer 1

Grant Fritchey (2/19/2015)
shane.green 1227 (2/19/2015)
There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.
So to stick up for the senior developer (even if the supplied information is a little lacking :-))
A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.
...
...
Welcome to the party and thanks for posting.
No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.

In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.

Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 2

Eric M Russell (2/19/2015)
Grant Fritchey (2/19/2015)
shane.green 1227 (2/19/2015)
There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.
So to stick up for the senior developer (even if the supplied information is a little lacking :-))
A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.
...
...
Welcome to the party and thanks for posting.
No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.
In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.
Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.

Why?

Not all tables have a logically primary key, and throwing an identity column on a table just for the sake of making it have a primary key doesn't necessarily help anything.

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
Grant Fritchey (2/19/2015)
shane.green 1227 (2/19/2015)
There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.
So to stick up for the senior developer (even if the supplied information is a little lacking :-))
A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.
...
...
Welcome to the party and thanks for posting.
No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.
In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.
Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.
Why?
Not all tables have a logically primary key, and throwing an identity column on a table just for the sake of making it have a primary key doesn't necessarily help anything.

I agree. There are always exceptions.

But... While I do believe in exceptions, since they're exceptions and should therefore be exceptional, I want proof positive that deviating from the standard is being done for good reason and not just on a whim or some internet rumor.

We're not there and we can't know for sure what's going on since we haven't seen the structure or the code. Based on what's been said so far... I'm inclined to believe that a PK and a cluster (possibly one and the same) are needed.

Second but.... Test it to be sure.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Eric M Russell SSC Guru Points: 125562 More actions · Answer 4

ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
Grant Fritchey (2/19/2015)
shane.green 1227 (2/19/2015)
There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.
So to stick up for the senior developer (even if the supplied information is a little lacking :-))
A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.
...
...
Welcome to the party and thanks for posting.
No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.
In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.
Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.
Why?
Not all tables have a logically primary key, and throwing an identity column on a table just for the sake of making it have a primary key doesn't necessarily help anything.

A table without a logical unique key; that's the problem.

Here is how the table was originally described:

She is creating several stored procedures that create reports based off single heaps. Each heap is a different customer, different data and has customer account numbers, but multiple entries so no primary key.

Any table that contains a customer account number and is the data source for reporting should have some column combination that could be a primary key.

For example, a table for a weather station that records measurements should have a primary key (also clustered in this case) on time recorded.

Even EAV (entity-attribute-value) modeled tables can have a primary key placed on entity-attribute, or perhaps entity-attribute-time if it's versioned.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 5

Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
Grant Fritchey (2/19/2015)
shane.green 1227 (2/19/2015)
There are exceptions to every rule, so while "All tables should have at least one index" is good rule - it doesn't mean that should not be any exceptions.
So to stick up for the senior developer (even if the supplied information is a little lacking :-))
A reasonable example would be where some large character or binary data is stored (and is not based on a language) and the search criteria changes frequently.
...
...
Welcome to the party and thanks for posting.
No disagreements. You'll note my first response said that the heap might work better. They need to experiment. I still stand by that. As more information has come out though, I'm leaning towards the idea that they need a clustered index, but I'd still say test it to be sure.
In the original post, it was stated that table contains no primary key. Regardless of what the table contains (datatypes or internal value structure), all tables at least need a primary key. Whether that key is also clustered or whether other non-unique indexes are included is optional.
Ideally the table would have a narrow primary key that is also clustered and with the leading column sequentially incrmenting. For example on a table that contains a count of sales by Period and Region it would be: PeriodKey, SalesRegionKey.
Why?
Not all tables have a logically primary key, and throwing an identity column on a table just for the sake of making it have a primary key doesn't necessarily help anything.
A table without a logical unique key; that's the problem.
Here is how the table was originally described:
She is creating several stored procedures that create reports based off single heaps. Each heap is a different customer, different data and has customer account numbers, but multiple entries so no primary key.
Any table that contains a customer account number and is the data source for reporting should have some column combination that could be a primary key.
For example, a table for a weather station that records measurements should have a primary key (also clustered in this case) on time recorded.
Even EAV (entity-attribute-value) modeled tables can have a primary key placed on entity-attribute, or perhaps entity-attribute-time if it's versioned.

I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.

To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 6

ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.

In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.

Two Towers - Tolkien

Wizzards First Rule - Goodkind

Game Of Thrones - Martin

But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.

The Title table would contain:

ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.

The Book table would contain:

BookID, ISBN, ShelfID and BookID would be the primary key.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

SqlServerDBA318 SSCrazy Points: 2353 More actions · Answer 7

Okay, so I cannot post the code due to the issues / privacy regarding what we do. I will clarify though as much as possible.

We run analysis for casinos based off casino play and have formulas that put players in a behavior / segment based off past 13 months of play that creates a mail file to send off to players that determines what offers they receive. There is more than one calculation that makes this happen, but included is number of plays, win, loss, whether it is slots or table, day of week and distance they live within casino.

Any one here that has a player card for any casino will know what I'm talking about when you receive an offer in the mail.

So, each Heap has 16 columns that will include player info such as Account#, first, last, dob, address, and then the rest that are regarding their play that includes number of pulls, money in, money won, jackpot won, etc.

Because of this not sure what could be used as a primary key since there are multiple entries involved for players.

The number of rows when this happens goes from 150,000 up to 3million depending on the casino.

Hope this can clarify and thanks again for all the suggestions.

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 8

Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.
In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.
Two Towers - Tolkien
Wizzards First Rule - Goodkind
Game Of Thrones - Martin
Game Of Thrones - Martin
But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.
The Title table would contain:
ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.
The Book table would contain:
BookID, ISBN, ShelfID and BookID would be the primary key.

But that expands on the initial description with information that may or may not even be available just to create a primary key which for storing which books are on a bookshelf isn't really relavant anyways.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 9

Depending on the level of detail in the record, an event can be uniquely identified by a combination of time, place, actor, and action. In this case the Account number is probably the actor, the Casino is the place, jackput Win or Hotel stay would be the action, and although not mentioned, there is probably a Date column involved here as well.

Let's assume the ETL load, application submission form, or whatever inserts this table were accidentally performed twice, just eye balling the records, how would you determine one set of records are actually a duplicate of another set? That will give you clues about what the constitute the primary key here.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Eric M Russell SSC Guru Points: 125562 More actions · Answer 10

ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.
In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.
Two Towers - Tolkien
Wizzards First Rule - Goodkind
Game Of Thrones - Martin
Game Of Thrones - Martin
But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.
The Title table would contain:
ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.
The Book table would contain:
BookID, ISBN, ShelfID and BookID would be the primary key.
But that expands on the initial description with information that may or may not even be available just to create a primary key which for storing which books are on a bookshelf isn't really relavant anyways.

A table containing nothing but Title and Author, with no primary key, makes no sense from a normalized data modeling perspective. Even if it's nothing but a one-off lookup table intended to populate an application listbox, it still needs a primary key (unique index) on Title and Author. But based on how the OP describes the actual table in quesiton, it's used for reporting purposes, so preventing duplicate records matters.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 11

Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.
In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.
Two Towers - Tolkien
Wizzards First Rule - Goodkind
Game Of Thrones - Martin
Game Of Thrones - Martin
But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.
The Title table would contain:
ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.
The Book table would contain:
BookID, ISBN, ShelfID and BookID would be the primary key.
But that expands on the initial description with information that may or may not even be available just to create a primary key which for storing which books are on a bookshelf isn't really relavant anyways.
A table containing nothing but Title and Author, with no primary key, makes no sense from a normalized data modeling perspective. Even if it's nothing but a one-off lookup table intended to populate an application listbox, it still needs a primary key (unique index) on Title and Author. But based on how the OP describes the actual table in quesiton, it's used for reporting purposes, so preventing duplicate records matters.

I didn't really intend this to turn into a debate on data models, but not all data models are normalized perfectly. In the example I gave a primary key on author and title would simply not make sense because there's no reason you can't put two books with the same title/author on a bookshelf and while creating some kind of identity column would create a primary key it wouldn't serve much purpose.

It sounds like what the OP is describing is a list table that could very potentially contain duplicates because it's used for the purposes of generating aggregates which makes the results of those aggregates unique.

x SSC-Insane Points: 23660 More actions · Answer 12

Eric M Russell (2/18/2015)
butcherking13 (2/18/2015)
I was hoping the tenured DBA’s would chime in on this. I work with a developer who has more experience as I am a very junior DBA. She is creating several stored procedures that create reports based off single heaps. Each heap is a different customer, different data and has customer account numbers, but multiple entries so no primary key.
I brought up creating a clustered index on the heaps to help performance since some have a few million rows, but she is against it and not sure why. In this situation would you just create the indexes without communication or her agreement? Is this normal?
Thank you for any response!
The real problem is that she's creating reports from a flat denormalized table with no primary key, which means there is a potential for duplicates and inaccurate reporting. A table should always have a primary key, but the question about whether the table should be clustered is more subjective. One advantage of a clustered key is that it will benefit non-clustered indexes, because the clustered key is used as the bookmark for faster lookups.

Ok I'll bite, why is the row lookup slower on heaps? There is still an (invisible) key to the row.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 13

ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.
In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.
Two Towers - Tolkien
Wizzards First Rule - Goodkind
Game Of Thrones - Martin
Game Of Thrones - Martin
But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.
The Title table would contain:
ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.
The Book table would contain:
BookID, ISBN, ShelfID and BookID would be the primary key.
But that expands on the initial description with information that may or may not even be available just to create a primary key which for storing which books are on a bookshelf isn't really relavant anyways.
A table containing nothing but Title and Author, with no primary key, makes no sense from a normalized data modeling perspective. Even if it's nothing but a one-off lookup table intended to populate an application listbox, it still needs a primary key (unique index) on Title and Author. But based on how the OP describes the actual table in quesiton, it's used for reporting purposes, so preventing duplicate records matters.
I didn't really intend this to turn into a debate on data models, but not all data models are normalized perfectly. In the example I gave a primary key on author and title would simply not make sense because there's no reason you can't put two books with the same title/author on a bookshelf and while creating some kind of identity column would create a primary key it wouldn't serve much purpose.
It sounds like what the OP is describing is a list table that could very potentially contain duplicates because it's used for the purposes of generating aggregates which makes the results of those aggregates unique.

It's unclear how a table containing only Title + Author, and duplicates allowed, would be used by an application. If someone wanted to know what titles are available for purchase, they want to see a distinct list, and once a specific book has been purchased, there needs to be a ID to distinguigh WHICH book was purchased. There either needs to be one record per distinct book (primary key on BookID) or one record per Title + Author (primary key) with an additional inventory count column that can be decremented.

If we're talking about non-relational lists that have been dumped into SQL Server, and nobody cares about duplicates, then as far as I'm concerned it's not even a real table, and the DBA shouldn't fret over it. It's like debating the nutritional value of a ball game hot dog.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 14

Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
Eric M Russell (2/19/2015)
ZZartin (2/19/2015)
I don't disagree that most tables probably will and should but there's certainly cases where you might not have a need for or a possible primary key.
To make a very simple example what about a table representing a bookshelf? It might only have two columns book name and author, there's no possible primary key on it nor does there need to be.
In a table that contains only (2) columns, BookName and AuthorName, it seems that BookName + AuthorName would be the natural primary key. A primary key can be composite of attributes, and it's primary function is to prevent duplicates. For example, in the table below, it makes no sense to record Game Of Thrones - Martin twice.
Two Towers - Tolkien
Wizzards First Rule - Goodkind
Game Of Thrones - Martin
Game Of Thrones - Martin
But my concept of a "bookshelf" is a container of books, so each book would have unique BookID (in addition is ISBN) for inventory purposes, and then each book would be assigned a ShelfID, because a unique physical book can only be contained on one shelf at a time. Another table called Title would contain BookName and AuthorName.
The Title table would contain:
ISBN, TitleName, AuthorName and ISBN would be the primary key. Perhaps TitleName + AuthorName would be an additional unique index.
The Book table would contain:
BookID, ISBN, ShelfID and BookID would be the primary key.
But that expands on the initial description with information that may or may not even be available just to create a primary key which for storing which books are on a bookshelf isn't really relavant anyways.
A table containing nothing but Title and Author, with no primary key, makes no sense from a normalized data modeling perspective. Even if it's nothing but a one-off lookup table intended to populate an application listbox, it still needs a primary key (unique index) on Title and Author. But based on how the OP describes the actual table in quesiton, it's used for reporting purposes, so preventing duplicate records matters.
I didn't really intend this to turn into a debate on data models, but not all data models are normalized perfectly. In the example I gave a primary key on author and title would simply not make sense because there's no reason you can't put two books with the same title/author on a bookshelf and while creating some kind of identity column would create a primary key it wouldn't serve much purpose.
It sounds like what the OP is describing is a list table that could very potentially contain duplicates because it's used for the purposes of generating aggregates which makes the results of those aggregates unique.
It's unclear how a table containing only Title + Author, and duplicates allowed, would be used by an application. If someone wanted to know what titles are available for purchase, they want to see a distinct list, and once a specific book has been purchased, there needs to be a ID to distinguigh WHICH book was purchased. There either needs to be one record per distinct book (primary key on BookID) or one record per Title + Author (primary key) with an additional inventory count column that can be decremented.
If we're talking about non-relational lists that have been dumped into SQL Server, and nobody cares about duplicates, then as far as I'm concerned it's not even a real table, and the DBA shouldn't fret over it. It's like debating the nutritional value of a ball game hot dog.

/shrug this got way too involved for what I thought was a simple theoretical.

Title + Author (primary key) with an additional inventory count column

Functionally this is the same result as what I originally described with more complicated modification functions.

Eric M Russell SSC Guru Points: 125562 More actions · Answer 15

ZZartin (2/19/2015)
Eric M Russell (2/19/2015)It's unclear how a table containing only Title + Author, and duplicates allowed, would be used by an application. If someone wanted to know what titles are available for purchase, they want to see a distinct list, and once a specific book has been purchased, there needs to be a ID to distinguigh WHICH book was purchased. There either needs to be one record per distinct book (primary key on BookID) or one record per Title + Author (primary key) with an additional inventory count column that can be decremented.
If we're talking about non-relational lists that have been dumped into SQL Server, and nobody cares about duplicates, then as far as I'm concerned it's not even a real table, and the DBA shouldn't fret over it. It's like debating the nutritional value of a ball game hot dog.
/shrug this got way too involved for what I thought was a simple theoretical.
Title + Author (primary key) with an additional inventory count column
Functionally this is the same result as what I originally described with more complicated modification functions.

Without a primary key, the table isn't functional for anything, not even display in a listbox.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho