September 15, 2003 at 7:03 am
I need Help!
I have one very large table (12 million rows) with the following columns:
(DID, datetime, origination, servicetype, serviceloc, servicecon, servicename, host, value)
I would like to separate the table into two tables with the following columns:
Table1
(DID,origination, servicetype, serviceloc, servicecon, servicename, host)
Table2
(DID,datetime, value1)
Currently the data is in one table, and even with indexes the performance is very poor. I believe that by doing this I can drastically improve the performance of the reports hitting the tables, since the number of rows in Table1 would be less than 1000 and the number of rows in Table2 would be around 12 million.
I appreciate any help I can get with this.
Thanks,
Jamey
- JH
September 15, 2003 at 8:08 am
You are almost there already. Now that you have created the tables, you can
INSERT INTO Table1
SELECT DISTINCT DID,origination, servicetype, serviceloc, servicecon, servicename, host
FROM <OriginalTable> (NOLOCK)
INSERT INTO Table2
SELECT DID, datetime, value1
FROM <OriginalTable> (NOLOCK)
I would recommend leaving the indexes off Table1 and Table2 until after they have data...it will speed up the insert.
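For example, once the loads finish you could add something like this (the index names and key columns below are only guesses; use whatever your reports actually join and filter on):
--index names and columns here are examples only
CREATE UNIQUE CLUSTERED INDEX IX_Table1_DID ON Table1 (DID)
CREATE CLUSTERED INDEX IX_Table2_DID_datetime ON Table2 (DID, [datetime])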
Good luck
Guarddata-
September 15, 2003 at 8:11 am
According to your analysis, currently the fields DID, origination, servicetype, serviceloc, servicecon, servicename, host are duplicated and you want to normalize the table. DID will be the primary key for the new Table1 (referenced in Table2).
Create a new table called Table1 with the fields DID, origination, servicetype, serviceloc, servicecon, servicename, host and then run the query:
Insert into Table1 Select distinct DID,origination, servicetype, serviceloc, servicecon, servicename, host from <CurrentTable>
Now rename the <CurrentTable> to Table2 and remove the unnecessary fields.
Create the relationship properly.
I suggest you create a view <CurrentName> which is a selection from both tables.
You can have INSTEAD OF INSERT, INSTEAD OF UPDATE and INSTEAD OF DELETE triggers on it to handle any code referring to the old table.
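For example, a rough sketch only (I am assuming DID links the two tables, and <CurrentName> stands for your old table name):
--sketch only; replace <CurrentName> and adjust columns to your schema
CREATE VIEW <CurrentName>
AS
SELECT t2.DID, t2.[datetime], t1.origination, t1.servicetype, t1.serviceloc,
       t1.servicecon, t1.servicename, t1.host, t2.value1 AS [value]
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.DID = t1.DID
GO
CREATE TRIGGER tr_CurrentName_Insert ON <CurrentName>
INSTEAD OF INSERT
AS
BEGIN
  --add any new description rows first
  INSERT INTO Table1 (DID, origination, servicetype, serviceloc, servicecon, servicename, host)
  SELECT DISTINCT i.DID, i.origination, i.servicetype, i.serviceloc, i.servicecon, i.servicename, i.host
  FROM inserted i
  WHERE NOT EXISTS (SELECT 1 FROM Table1 t WHERE t.DID = i.DID)
  --then the detail rows
  INSERT INTO Table2 (DID, [datetime], value1)
  SELECT i.DID, i.[datetime], i.[value]
  FROM inserted i
END
GO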
This is only a suggestion.
G.R. Preethiviraj Kulasingham
Chief Technology Officer.
Softlogic Information Systems Limited,
14 De Fonseka Place,
Colombo 05.
Sri Lanka.
Cheers,
Prithiviraj Kulasingham
http://preethiviraj.blogspot.com/
September 15, 2003 at 9:03 am
Thanks for the responses!
I forgot to mention that the DID column is an identity field. Therefore, if I do a SELECT DISTINCT on it, it will return all 12 million rows.
I built the following code and it seems to work; however, when running it against the 12 million rows it runs for roughly 24 hours (which is a problem).
Here's what I have so far:
--Updates the Description table with any new data
INSERT INTO tblDescription (origination, servicetype, serviceloc, host, servicecon)
SELECT DISTINCT 'MailStats', service, param, server, 'Residential'
FROM tblAll a
WHERE NOT EXISTS (SELECT 1 FROM tblDescription d WHERE a.service = d.servicetype and a.param = d.serviceloc and a.server = d.host)
--Scans through the description table, inserts values into the values table, and links the description table to the values table by the DID column.
INSERT INTO tblResults (DID,[datetime], datamonth, value1)
SELECT d.DID, a.date, DATEPART(Month,a.date), a.value
FROM tblAll a, tblDescription d
WHERE a.service = d.servicetype and a.param = d.serviceloc and a.server = d.host
Sorry for the elementary coding here. I am VERY new to this!
- JH
September 15, 2003 at 1:47 pm
12 million rows is a lot of data to cut your teeth on.
Remove the WHERE NOT EXISTS clause from your statements. Since nothing is in the destination table to begin with, this condition should always be true.
HOWEVER, the real problem is the duplication of the key fields. If all 6 fields define the key as you seem to indicate, try this
CREATE TABLE tblDescription (
DescID INT IDENTITY(1,1),
origination ....add the proper data types
servicetype
serviceloc
servicecon
servicename
host
)
CREATE TABLE tblResults (
DescID INT,
datetime .... not a good idea to use a reserved word like this...try something like ResultTime
value ....
)
INSERT INTO tblDescription (origination, servicetype, serviceloc, servicecon, servicename, host)
SELECT DISTINCT origination, servicetype, serviceloc, servicecon, servicename, host
from tblAll
INSERT INTO tblResults ( DescID, ResultTime, Value )
SELECT D.DescID, A.ResultTime, A.Value
FROM tblAll A
INNER JOIN tblDescription D ON
D.origination = A.origination
AND D.servicetype = A.servicetype
AND D.serviceloc = A.serviceloc
AND D.servicecon = A.servicecon
AND D.servicename = A.servicename
AND D.host = A.host
Good luck. I would recommend picking up some good books or tutorials dealing with database structure and design.
Guarddata-
September 16, 2003 at 12:36 am
24 hours to run a report on a 12 million row table sounds a bit excessive if you ask me. Have you defined any indexes on the tables referenced in this report? Look at your JOIN columns and WHERE criteria columns and make some intelligent decisions on whether to build indexes on them. And if the answer to the above question is yes, have you checked to see whether they are fragmented?
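For example (the index names and columns below are only guesses based on the joins you posted; pick the columns your queries really use):
--examples only; adjust to the columns your queries actually join and filter on
CREATE INDEX IX_tblAll_join ON tblAll (service, param, server)
CREATE INDEX IX_tblDescription_join ON tblDescription (servicetype, serviceloc, host)
--check fragmentation (SQL Server 2000)
DBCC SHOWCONTIG ('tblAll')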
Nikki Pratt
Development DBA
September 16, 2003 at 3:05 am
Jamey,
guarddata has given the right answer.
I would extend his version for the Table2 insert with the following:
INSERT INTO Table2
SELECT DID, datetime, value1
FROM <OriginalTable> (NOLOCK)
WHERE datetime is not null
As you have written, you would have around 1000 records in table2.
As I don't know your data, I assumed that the datetime column is the selective one (either it has a value or it has not). Of course you should modify this query to fit your environment (check the value1 column for NOT NULL, or whatever else fits your requirements).
By creating the indexes after the data load, the whole insert process should not take too long for 12M records.
Bye
Gabor
September 16, 2003 at 3:10 am
Or there is another possibility, which is what I do when I have a large table to remodel (over 100M rows):
- I do a bcp out into a file from a view corresponding to the table1 definition, followed by a second bcp out corresponding to the table2 definition.
- Create table1 and table2 without indexes.
- bcp in into table1 and table2.
- Create the required indexes.
Assuming you have set the recovery model to bulk-logged, there will be a minimum of logging, so this is the fastest solution overall.
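Something like this is all it takes (the database, server, view and file names below are made up, and -T assumes a trusted connection):
REM the names below are made up; substitute your own
bcp MyDb.dbo.vwTable1 out c:\table1.dat -n -S MyServer -T
bcp MyDb.dbo.vwTable2 out c:\table2.dat -n -S MyServer -T
bcp MyDb.dbo.Table1 in c:\table1.dat -n -S MyServer -T -b 50000
bcp MyDb.dbo.Table2 in c:\table2.dat -n -S MyServer -T -b 50000 -h "TABLOCK"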
Bye
Gabor
September 16, 2003 at 7:18 am
Thanks everyone for all of your help!
Just to give some more detail on my setup: Table1 should only have 1000 or so rows; however, during each insert I must check whether the data from the large table is already in Table1. If it is, I just grab the DID and insert it along with the datetime and value into Table2. If it is not in Table1, I insert it into Table1 first, then grab the new DID number along with the datetime and value and insert those into Table2.
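Something like this is what I am picturing for each 30 minute load (just a sketch; tblStage is a made-up name for wherever the new poll data lands, and the columns follow my earlier post):
--add any systems that are not already in the description table
INSERT INTO tblDescription (origination, servicetype, serviceloc, host, servicecon)
SELECT DISTINCT 'MailStats', s.service, s.param, s.server, 'Residential'
FROM tblStage s
WHERE NOT EXISTS (SELECT 1 FROM tblDescription d
                  WHERE d.servicetype = s.service
                    AND d.serviceloc = s.param
                    AND d.host = s.server)
--then grab the DID and insert the new values
INSERT INTO tblResults (DID, [datetime], datamonth, value1)
SELECT d.DID, s.date, DATEPART(Month, s.date), s.value
FROM tblStage s
INNER JOIN tblDescription d ON d.servicetype = s.service
                           AND d.serviceloc = s.param
                           AND d.host = s.server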
The data is generated like so: we have approx. 1000 systems being monitored (the data in Table1). Each system is polled approx. 500 times a day (the data in Table2). Currently there is only one table, and all of the data is being dumped into it every 30 minutes. This is causing very slow performance on the system, since I don't have time to reindex the table between updates. I have already ordered another server to set up a data warehouse on in order to report from, which will give some relief to the primary server.
Hopefully this gives you all an idea as to what I need and also an idea of what the tables are used for. Again, I want to thank everyone for their help with this matter.
Thanks,
Jamey
- JH