March 29, 2006 at 9:18 am
So I'm developing survey stats software that needs to run fast, but my main table has over 3 million rows. My solution: create an insert trigger that divides the 3 million rows into separate tables (around 400 of them, one table for each question in the particular survey).
So when the insert trigger fires, I need to insert the row into the proper table. This time there are 400 tables; in the future there could be 50, or 1,000.
Now the original inserted row has a column that tells me which table I need to insert the record into, so I store that in a variable, e.g.:
DECLARE @tableName varchar(50)
SELECT @tableName = Question FROM Inserted
For this example, let's say "Q10" is stored in @tableName. I would need to execute this
INSERT INTO Q10 (.......
But because I can't hardcode every potential table, I'd like to be able to execute
INSERT INTO @TABLENAME (.....
But because @TABLENAME is a variable, not a table name, SQL Server can't execute this.
Is there a way for me to do that?
Or would I need to create a trigger with a huge SELECT/CASE statement?
March 29, 2006 at 9:24 am
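Yes - use dynamic SQL. Build the INSERT statement as a string and run it with EXEC(). One catch: the Inserted pseudo-table isn't visible inside EXEC(), so stage the rows in a temp table first. An untested sketch, with placeholder table and column names:
CREATE TRIGGER trg_Answer_Insert ON Answer
FOR INSERT
AS
BEGIN
    DECLARE @tableName sysname
    DECLARE @sql varchar(500)

    -- Assumes one row per INSERT statement, like your snippet above
    SELECT @tableName = Question FROM Inserted

    -- EXEC() can't reference Inserted directly, but it CAN see a
    -- temp table created in the same session
    SELECT * INTO #NewRows FROM Inserted

    -- Assumes each Qxx table has the same column layout as the source;
    -- QUOTENAME guards against odd characters in the table name
    SET @sql = 'INSERT INTO ' + QUOTENAME(@tableName)
             + ' SELECT * FROM #NewRows'
    EXEC (@sql)
END
Just remember that triggers fire once per statement, not once per row, so a multi-row insert spanning several questions would need more work than this sketch does.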
March 29, 2006 at 9:49 am
Wonderful! Thanks....
March 29, 2006 at 9:57 am
Hi Dan
A word of warning. This is not a design you should take lightly. It could become a nightmare to work with.
You should only choose it after spending a long time thinking about the alternatives and the implications.
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
March 29, 2006 at 10:12 am
My nightmare is well on its way already, RyanRandall. Here's how I got here. I posted another topic in Performance and Tuning too...
- We have a survey software that is SQL Based (Custom Designed, outsourced)
- In order to provide clients with fast access, we created a replication server, solely for their use.
- Because each survey has its own set of tables, I had to automate transactional replication and the subscription. (First nightmare.)
- Now that replication was set up, it was time to point my web stats software at the replicated data.
- Because our 'answer' table has 3+ million rows, when I was trying to pull stats off the replicated copy it ended up crashing the survey software that runs off the main server. I assume the replication got backed up, and then the software couldn't access the database quickly enough. And this was only me trying things like "SELECT TOP 100000 FROM...".
- Now I need to provide clients with real-time survey stats. But how can I do calculations quickly on this large table, without causing delays and crashes?
MY SOLUTION:
When I automate my replication, if I create triggers on our 'answer' table I can
a) determine which question was answered
b) insert that new answer into a table that stores all the answers for that question
c) when a client requests data on 5 questions, I just need to query five small tables, each one only once, as opposed to querying the same huge table 5 times, or once with a complex query. (A sketch of the kind of per-question query I mean is below.)
If I'm overly complicating my life, I'd love to know how to simplify it. But since I don't know the names of the survey questions that will pop up in the future, I don't know how I can avoid dynamically building SQL queries.
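Just to illustrate, with made-up column names, each per-question query would then be as simple as this:
-- Q10 is one of the ~400 per-question tables
SELECT AnswerText, COUNT(*) AS Mentions
FROM Q10
GROUP BY AnswerText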
March 29, 2006 at 10:20 am
Can't you achieve the same thing by placing a clustered index on a questionID column in your answers table? If you know which question you are working with, you can search for the answers by questionID. A clustered index seek on a table with 3M rows will still net great performance. I think what Ryan was concerned with was the overhead you would be placing on your insert trigger.
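For example (made-up names, but the idea is that an equality filter on the leading column of the clustered index gets you a seek):
CREATE CLUSTERED INDEX IX_Answer_QuestionID ON Answer (QuestionID)
GO
-- Equality predicate on the clustered key's leading column = index seek
SELECT AnswerText
FROM Answer
WHERE QuestionID = 'Q10'
GO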
March 29, 2006 at 10:30 am
John - No, I was concerned about the whole design. (I'm with you on the solution suggestion if that's going to be possible for Dan, though.)
Dan - Create a new thread asking for a discussion on the design. You won't regret it!
Ryan Randall
Solutions are easy. Understanding the problem, now, that's the hard part.
March 29, 2006 at 10:31 am
There's already a clustered index, and it's already using it.
A lot of my initial queries seem to need an index scan, not a seek. SELECT DISTINCT(ANSWERTEXT) FROM ... WHERE QUESTION = 'Q10', for example.
My indexes are working, and just basic aggregate queries are crushing me. I tried adding additional indexes as well, and it really did not help convert index scans to index seeks....
March 29, 2006 at 10:36 am
FYI... I also placed this same request in your performance tuning thread...
You need to figure out how to get your queries to use an index seek. On a table with 3M rows, an index scan is not going to perform nearly as well as an index seek. Can you post your table DDL along with a sample query and maybe some data?
March 29, 2006 at 11:07 am
Here's my table creation script... For some reason the replication didn't copy over the PK, but it does create the PK clustered index. Plus the query runs just as slowly on the other server, where the PKs are created by the survey software.
CREATE TABLE [dbo].[Response_1000] (
[RpsRespondent] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[RpsQuestion] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[RpsMatrix] [int] NOT NULL ,
[RpsMention] [int] NOT NULL ,
[RpsContent] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RpsOpened] [ntext] COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[RpsType] [int] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [PK_Response_1000] ON [dbo].[Response_1000]([RpsRespondent], [RpsQuestion], [RpsMatrix], [RpsMention], [RpsType]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IxRpsRespondent_1000] ON [dbo].[Response_1000]([RpsRespondent]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
---------
As for sample queries, I can't run them now because they would crash our live server. But here are some I remember testing in Query Analyzer to see the execution plan.
Oh, here's a good one. When I created my stats software, I would grab the above table and make a copy of it containing only the answers from completed surveys, because that sped up the stats. So in the test, I'd start with 300,000 records and end up with a table of 10,000 records.
When I first crashed our survey software, it was because this specific query was timing out. Depending on the solution it may no longer be needed, but here it is...
SELECT * INTO NEW_1000 FROM Response_1000 WHERE RPSRESPONDENT IN (SELECT ResRespondent FROM Respondent_1000 WHERE (RESCODERESULT = '100'))
CREATE CLUSTERED INDEX TESTING ON NEW_1000 (RpsRespondent, RpsQuestion, RpsContent) WITH FILLFACTOR = 100
CREATE NONCLUSTERED INDEX TESTING2 ON NEW_1000 (RpsQuestion, RpsContent) WITH FILLFACTOR = 100
March 29, 2006 at 11:22 am
How about a few sample rows from Response_1000, and maybe the same for Respondent_1000 (along with its DDL)?
March 29, 2006 at 11:30 am
I can't post any of this data unfortunately...
The respondent table is a list of people who took part, and their completion code.
I just join to that table to import only the answers that are part of actual completed surveys.
How do I get a DDL generated in SQL Server?
March 29, 2006 at 11:34 am
In EM, right-click on the table and go to All Tasks>>Generate SQL Script.
Can you post test data or create a couple of examples?
March 29, 2006 at 12:15 pm
Your lead column on your clustered index is Respondent. I doubt that is a terribly useful column in your statistical analysis. On your reporting database, try reclustering with a different column order, leading with Question and whatever else you tend to filter on. A non-clustered index on respondent might be useful if you also query on it.
Even with a non-clustered index on Question, SQL is apt to scan the clustered index to find particular questions if the cardinality is low (and 400 distinct values across 3M rows is low cardinality). Does the execution plan bear this out?
I agree with the earlier answers that your proposed solution is going to be a maintenance nightmare. Don't do it! But if you do, consider a partitioned view instead of a trigger solution. Oh, and don't forget - we warned you not to do it.
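A minimal sketch of the partitioned view idea (column list cut down here; take the real one from your DDL):
-- One member table per question, each with a CHECK constraint
-- on the partitioning column
CREATE TABLE Response_Q10 (
    RpsRespondent varchar(10) NOT NULL,
    RpsQuestion nvarchar(10) NOT NULL CHECK (RpsQuestion = 'Q10'),
    RpsContent nvarchar(100) NULL,
    CONSTRAINT PK_Response_Q10 PRIMARY KEY (RpsRespondent, RpsQuestion)
)
GO
-- ...repeat for each question, then union the members in a view
CREATE VIEW ResponseAll
AS
SELECT RpsRespondent, RpsQuestion, RpsContent FROM Response_Q10
UNION ALL
SELECT RpsRespondent, RpsQuestion, RpsContent FROM Response_Q11
GO
-- The CHECK constraints let the optimizer touch only Response_Q10 here,
-- and an INSERT through the view is routed to the right member table,
-- no trigger needed
SELECT RpsContent FROM ResponseAll WHERE RpsQuestion = 'Q10'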
March 30, 2006 at 9:40 am
A few questions and a couple of comments. I'm going to focus on indexes and tuning.
CREATE UNIQUE CLUSTERED INDEX [PK_Response_1000] ON [dbo].[Response_1000]([RpsRespondent], [RpsQuestion], [RpsMatrix], [RpsMention], [RpsType]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
CREATE INDEX [IxRpsRespondent_1000] ON [dbo].[Response_1000]([RpsRespondent]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO
Your clustered index is too wide, and your nonclustered index is redundant, since its column is the first column of the clustered index.
Does the question have to be nvarchar, or can it be just varchar? If it can, then change it. That will narrow the indexes a little. (A narrower index column means a shallower index tree, which means SQL is more likely to use it.)
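For example (RpsQuestion is part of the clustered index, so you'd have to drop that index first, alter the column, then recreate the index):
ALTER TABLE Response_1000 ALTER COLUMN RpsQuestion varchar(10) NOT NULL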
What are the queries that you normally run?
How selective are the various fields that you filter on?
What does the Respondent_1000 table look like? It should have an index on RESCODERESULT if you're using that to filter out uncompleted questionnaires (see the sketch at the end of this post).
As a first suggestion, without actually seeing the queries:
CREATE CLUSTERED INDEX idx_Response1000_RpsRespondentRpsQuestion
ON Response_1000 (RpsRespondent, RpsQuestion) -- this assumes you're going to be doing queries, ordering, and grouping by these 2. Swap the order if you query more on question than on respondent.
GO
CREATE NONCLUSTERED INDEX idx_Response1000_RpsMatrix
ON Response_1000 (RpsMatrix) -- If you never do queries on Matrix, with or without the question and respondent, ignore this.
GO
CREATE NONCLUSTERED INDEX idx_Response1000_RpsMention
ON Response_1000 (RpsMention) -- If you never do queries on RpsMention, with or without the question and respondent, ignore this.
GO
CREATE NONCLUSTERED INDEX idx_Response1000_RpsType
ON Response_1000 (RpsType)
GO
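And on Respondent_1000, something like this for the completed-survey filter (column names taken from your SELECT INTO query):
CREATE NONCLUSTERED INDEX idx_Respondent1000_ResCodeResult
ON Respondent_1000 (ResCodeResult, ResRespondent) -- covers the IN subquery that filters on RESCODERESULT
GO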
If you can post the queries that you need to run, I can give you a lot more useful suggestions.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability