Help me in designing table and its indexes

Question

Help me in designing table and its indexes

SSC Eights!

Points: 885

March 2, 2018 at 3:07 pm

CREATE TABLE FileMeta(Id [Some Data Type] NOT NULL, ParentId [Some Data Type] NOT NULL, Type CHAR(1) NOT NULL /* F = File or D = Directory */, AccountId [Some Data Type] NOT NULL, Name VARCHAR(255))

SELECT …FROM FileMetaWHERE AccountId = [Foo]AND Id = [Bar]

SELECT …FROM FileMetaWHERE AccountId = [Foo]AND Name = ‘[Some Name]’

trying to understand and design indexes
how would you choose indexes?

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

https://www.sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 2

SQLAddict01 - Friday, March 2, 2018 3:07 PM

CREATE TABLE FileMeta(Id [Some Data Type] NOT NULL, ParentId [Some Data Type] NOT NULL, Type CHAR(1) NOT NULL /* F = File or D = Directory */, AccountId [Some Data Type] NOT NULL, Name VARCHAR(255))

SELECT â€¦FROM FileMetaWHERE AccountId = [Foo]AND Id = [Bar]

SELECT â€¦FROM FileMetaWHERE AccountId = [Foo]AND Name = â€˜[Some Name]â€™

trying to understand and design indexes
how would you choose indexes?

You're trying to build a "Parent/Child" table, which is frequently referred to as an "Adjacency List". I see by your comments that it's going to be used for file directory structures in some way.

I also see that you're setting yourself up to perpetuate the need for RBAR to traverse the directory structure. That means that you're actually missing a few columns if you want whatever project you're working on to result in a very high performance solution and so we can't yet recommend any indexes because the columns you actually need indexes on are missing.

Please see the following articles on a very high performance hybrid method not only for making it so queries against the hierarchy run insanely fast, but maintenance of the hierarchy is easy and even more insanely fast.

Hierarchies on Steroids #1: Convert an Adjacency List to Nested Sets[/url]
Hierarchies on Steroids #2: A Replacement for Nested Sets Calculations[/url]

Also and just in case you don't know (many don't), be careful where you use the tokens of "foo" and "bar" in any of your discussions or code because you might not know what the source of those names are but someone you speak to may. They're a derivative the pronunciation guide (foÍžobÃ¤r) of the military slang term "FUBAR", which is a NSFW (Not Safe For Work) term especially in the presence of people that are sensitive to swear words. I won't even use the terms in code and especially not as examples of code (and, yes, I was in the military). 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DesNorton SSC-Insane Points: 24681 More actions · Answer 3

I would also consider separating Directories and Files into their own tables.

How to post data/code on a forum to get the best help.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

DesNorton - Saturday, March 3, 2018 1:00 PM
I would also consider separating Directories and Files into their own tables.

I'll suggest against putting those into their own tables. It would be like putting managers and employees in their own tables, which pretty much screws the whole notion of the hierarchy.

If you have the files and directories in the same table, then you can also use the method in the article at the second link to get byte totals for all levels of all directories.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DesNorton SSC-Insane Points: 24681 More actions · Answer 5

Jeff Moden - Saturday, March 3, 2018 5:52 PM
DesNorton - Saturday, March 3, 2018 1:00 PM
I would also consider separating Directories and Files into their own tables.
I'll suggest against putting those into their own tables. It would be like putting managers and employees in their own tables, which pretty much screws the whole notion of the hierarchy.
If you have the files and directories in the same table, then you can also use the method in the article at the second link to get byte totals for all levels of all directories.

I see Managers and Employees as both being Employees, therefor keeping 1 table, with an EmployeeType field.
However, I see Directories and Files as completely separate entities, hence the normalisation.

That said, the ability to get to the byte totals would be a compelling reason to demormalise in this case.

How to post data/code on a forum to get the best help.

DesNorton SSC-Insane Points: 24681 More actions · Answer 6

DELETED - Somehow managed to double post

How to post data/code on a forum to get the best help.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

DesNorton - Saturday, March 3, 2018 10:32 PM
Jeff Moden - Saturday, March 3, 2018 5:52 PM
DesNorton - Saturday, March 3, 2018 1:00 PM
I would also consider separating Directories and Files into their own tables.
I'll suggest against putting those into their own tables. It would be like putting managers and employees in their own tables, which pretty much screws the whole notion of the hierarchy.
If you have the files and directories in the same table, then you can also use the method in the article at the second link to get byte totals for all levels of all directories.
I see Managers and Employees as both being Employees, therefor keeping 1 table, with an EmployeeType field.
However, I see Directories and Files as completely separate entities, hence the normalisation.
That said, the ability to get to the byte totals would be a compelling reason to demormalise in this case.

I wouldn't call it a denormalization but can understand why some might. I just look at directory names and files names as manager and employee names in that they just named parts of the same hierarchy. Some can have children (directories) and some cannot (files). Directories have a name with zero length and files have names with typically a non-zero length. They're just attributes for a general entity. Putting directories and files in different tables would be like putting vehicles in different tables just based on what color they are or whether or not they have 4 wheel drive. Look at sys.objects. They're all objects and they have different attributes such as being a view or a table, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RonKyle SSC-Dedicated Points: 31578 More actions · Answer 8

This might be a case where it depends what you want to do with the data and no obvious clear answer at the outset. An example to compare it to would be street address. In most cases the number and street would be in the same field. But that might not give someone who sells houses the needed granularity. In this case there would be one field for the street, and another for the address. Two tables rather than columns are being discussed, but the intent is the same. It seems to me that generally these would be combined. But if the requirement is to take certain measurements that this combination would make more difficult, then better to separate.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

SQLAddict01 - Friday, March 2, 2018 3:07 PM

CREATE TABLE FileMeta(Id [Some Data Type] NOT NULL, ParentId [Some Data Type] NOT NULL, Type CHAR(1) NOT NULL /* F = File or D = Directory */, AccountId [Some Data Type] NOT NULL, Name VARCHAR(255))

SELECT â€¦FROM FileMetaWHERE AccountId = [Foo]AND Id = [Bar]

SELECT â€¦FROM FileMetaWHERE AccountId = [Foo]AND Name = â€˜[Some Name]â€™

trying to understand and design indexes
how would you choose indexes?

So, back to you, SQLAddict01 . What are you actually going to use this table for? What is the business logic that you will attempt that this table is supposed to help solve? Of course, the reason for me asking is because you haven't actually defined the problem that the table will be used to solve and so all of our recommendations (and yours) may be out in left field.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 10

RonKyle - Monday, March 5, 2018 5:06 AM
This might be a case where it depends what you want to do with the data and no obvious clear answer at the outset. An example to compare it to would be street address. In most cases the number and street would be in the same field. But that might not give someone who sells houses the needed granularity. In this case there would be one field for the street, and another for the address. Two tables rather than columns are being discussed, but the intent is the same. It seems to me that generally these would be combined. But if the requirement is to take certain measurements that this combination would make more difficult, then better to separate.

How would you lose granularity for a directory structure as a single Adjacency List (Parent/Child) table? In your example, the number and street address would, in fact, be in the same table, would they not?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RonKyle SSC-Dedicated Points: 31578 More actions · Answer 11

RonKyle - Monday, March 5, 2018 5:06 AM

This might be a case where it depends what you want to do with the data and no obvious clear answer at the outset. An example to compare it to would be street address. In most cases the number and street would be in the same field. But that might not give someone who sells houses the needed granularity. In this case there would be one field for the street, and another for the address. Two tables rather than columns are being discussed, but the intent is the same. It seems to me that generally these would be combined. But if the requirement is to take certain measurements that this combination would make more difficult, then better to separate.

How would you lose granularity for a directory structure as a single Adjacency List (Parent/Child) table? In your example, the number and street address would, in fact, be in the same table, would they not?

Yes, as I pointed out in the answer, I was discussing columns rather than tables. Nevertheless, the point was that you can't automatically say how something is supposed to be broken out, or not, without knowing what information is needed from the data. You assumed that the structure had to be so, but without knowing what was needed. Your follow up with the poster was the correct question: what do you intend to do with the data, or maybe more clearly what information do you need from the data you're storing.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

RonKyle - Tuesday, March 6, 2018 4:25 AM
RonKyle - Monday, March 5, 2018 5:06 AM
This might be a case where it depends what you want to do with the data and no obvious clear answer at the outset. An example to compare it to would be street address. In most cases the number and street would be in the same field. But that might not give someone who sells houses the needed granularity. In this case there would be one field for the street, and another for the address. Two tables rather than columns are being discussed, but the intent is the same. It seems to me that generally these would be combined. But if the requirement is to take certain measurements that this combination would make more difficult, then better to separate.

How would you lose granularity for a directory structure as a single Adjacency List (Parent/Child) table? In your example, the number and street address would, in fact, be in the same table, would they not?

Yes, as I pointed out in the answer, I was discussing columns rather than tables. Nevertheless, the point was that you can't automatically say how something is supposed to be broken out, or not, without knowing what information is needed from the data. You assumed that the structure had to be so, but without knowing what was needed. Your follow up with the poster was the correct question: what do you intend to do with the data, or maybe more clearly what information do you need from the data you're storing.

Heh... understood. My recommendations are based on all of the problems that I've seen in the past with people splitting things out to multiple tables when all they really need is a single, simple Adjacency List for ease of maintenance and "flash" it to a Nested Sets hierarchy when a change occurs. While there's always an exception, I've not yet seen where separation to multiple tables actually solves anything such as what the OP has portrayed except to justify the need for additional disk space, CPUs, and Developer time. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 13

Speaking of the OP...

@ SQLAddict01 , your absence from your own request for help is a bit conspicuous. Any feedback on the questions I've posed to try to help you figure out what's best for your project?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RonKyle SSC-Dedicated Points: 31578 More actions · Answer 14

While there's always an exception, I've not yet seen where separation to multiple tables actually solves anything such as what the OP has portrayed except to justify the need for additional disk space, CPUs, and Developer time

Wasn't saying you were wrong, was just saying more information was needed. I ran into the street number/street name issue when I was first learning SSAS and created a data warehouse for my daughter's Girl Scout cookie sales. To determine the percentage of houses sold to versus the overall number I had to break out the addresses further. But it made me realize that a realtor or the USPS might also have the same need. But of course generally these are in a single column because in most databases they can be considered atomic. But since then I've always kept in mind the reporting goal, and sometimes that changes the way the database is designed.