December 19, 2008 at 9:26 am
Hi
I use sql server 2005 and I wanted to ask an opinion for the creation of a table with millions of rows...
The table has a primary key and therefore indexes.
improve performance by dividing the table for years? :hehe: :exclamation:
ie
it is better to have a table with 24 million of rows or 12 tables with 2 million of rows
Thank you
December 19, 2008 at 9:30 am
That all depends on table design and access patterns. Being that you are on 2000 you could use a partitioned view to bring the multiple table scenario into one "table" and you can reap some benefits through that but again access patterns and designs are the critical component to the answer. So, if you want further input it would be good to post the table create script and some of the frequently used queries for consideration.
Hopefully that makes sense.
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
December 19, 2008 at 9:42 am
This is the table
CREATE TABLE [dbo].[Vendite](
[Data] [datetime] NOT NULL,
[Mese] [int] NOT NULL,
[Anno] [int] NOT NULL,
[Societa] [varchar](3) NOT NULL,
[Codice] [varchar](9) NOT NULL,
[TipologiaCliente] [varchar](1) NOT NULL,
[Classe] [int] NOT NULL,
[Importo1] [decimal](19, 2) NOT NULL,
[Importo2] [decimal](19, 2) NOT NULL,
[Importo3] [decimal](19, 2) NULL,
CONSTRAINT [PK_Vendite] PRIMARY KEY CLUSTERED
(
[Mese] ASC,
[Anno] ASC,
[Societa] ASC,
[Codice] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
queries may require the join over two years
some ideas?
Thank you
December 19, 2008 at 9:49 am
Making some great assumptions as I still don't have queries but if you are going to be retrieving 2 years of data (is that what you were stating?) then you would be able to gain some benefit from breaking the tables down physically and creating a partitioned view over the top of them. You should review the information in BOL regarding restrictions associated with partitioned views and do some testing before jumping in and making the change. Part of that testing should be with running standard queries in a side-by-side scenario to see if you are going to reap any benefit.
Again, some assumptions being made...
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
December 19, 2008 at 10:01 am
Example query
solution with single table
SELECT DISTINCT M.Mese,M.Anno,M.Societa,M.Codice FROM
dbo.VenditeAnno(@M,@Anno) M
LEFT OUTER JOIN dbo.VenditeAnno(@P,@AnnoPrec) P
ON M.Societa = P.Societa AND M.Codice = P.Codice
WHERE M.Classe = xxx
VenditeAnno is a function that returns the data a year
Thank you
December 19, 2008 at 10:19 am
I'm a little confused by the sql statement as it doesn't appear to be referencing the table at all but I am guessing that you are within the function. What is the function actually returning? Very curious.
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
December 19, 2008 at 10:29 am
Move whatever the function is doing into a derived table. The millions of call to that function is what is killing the performance.. no amount of indexing an overcome that.
We need to see the function's code to go any further.
December 22, 2008 at 2:29 am
FUNCTION [dbo].[VenditeAnno]
(
@Mese Int,
@Anno Int
)
RETURNS TABLE
AS
RETURN
(SELECT Vendite.Data, Vendite.Mese As Mese,Vendite.Anno as Anno,
Vendite.Societa, Vendite.Codice, Vendite.TipologiaCliente, Vendite.Classe,
Vendite.Importo1, Vendite.Importo2, Vendite.Importo3
FROM Vendite
WHERE Vendite.Mese=@Mese AND Vendite.Anno=@Anno
)
December 22, 2008 at 5:25 am
What's the point of the left join in the query? I don't understand what you are trying to do there.
December 22, 2008 at 6:11 am
Looks to be a year over year comparison, which if this is the case the physical separation of the data into multiple tables by year and using a partitioned view should help you get some better results. You should be able to see that in the execution plan when you do some testing. The only caveat to that would be that you may not be able to get the detail of the table access due to the use of the function. Hard to say until you try it.
I haven't used a function in the way you are using it but it seems odd to me. Is this efficient?
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
December 22, 2008 at 6:11 am
I think you might get much better performance if you make that concatenated PK a non-clustered index and change the clustered to just those 2 columns in your query's WHERE clause.
December 22, 2008 at 6:22 am
...because the way the function is written, and the columns contained in the function, the current indexing is not going to be sufficient and there will be some lookups. I haven't played with the table but it sure appears that way to me.
Just my thoughts.
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
December 22, 2008 at 9:22 am
I have tried with SET STATISTICS IO
Time is almost identical
The difference is that the solution with a single table run 1 physics reading and then all logic
while The solution with multiple tables running a 1 physical reading and then 1 logical reading for each year...
the keys and the structure of the tables is always the same
who benefits if I share the table??
Thank you
December 22, 2008 at 9:33 am
Can you include the statistics IO output? Also, can you attach the two query plans?
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply