July 21, 2003 at 9:00 am
The project that I'm involved is to FTP flat file data. Once detail data is up, its next summarise into Hourly, Daily,... all time bounds. The whole idea of the project is to create Reports of all kinds. The data is basically calls comming into a CallCenter. The detail data is updated using BCP, and summarisation is done using user defined SP. Now that the reports are generated a bit slower (by Crystal Reports 9). We have to optemize the database to apply proper Index on the tables. Right now we have Composite PK's for all such tables. So, in brief if there is a single Detailed Table, there are 3 different Tables that summarises this detailed database, Maybe one summary teable is for reports about CallAgent's Summary while other is for CallGroup's Summary. And all these Summaries have Time break-up starting from Daily,...Yearly. My only question is 1.) Is Indexing needed for Summary data while updating these database tables. <br> 2.) Is indexing is based on the types of report that you create or its based on aggression used before summarisation. Or create them for both. Finally I need help in creating these Indexes, as we are creating views also maybe indexed Views. I'll post the table's and SP. If you need so.
TakeCare,
Bue!!
TakeCare,
Bue!!
July 21, 2003 at 12:43 pm
Actually, sounds like you need an OLAP solution since the data is for reporting only and will be read-only after the initial load. Check out MS OLAP services or the forum on this site for Analysis Services.
--
If you don't want to go that route, I'd post the schema and a little clearer explanation as to who is using what data, how often, using what filters/criteria, and the schema of the tables.
HTH,
jay
July 21, 2003 at 2:14 pm
I agree with the above post. OLAP or we need more information to suggest something.
Steve Jones
July 22, 2003 at 7:23 am
Thx, and OLAP or AS cannot be used as per Specs. So we have to do it in a traditional way. So, I would like to tell more about the dataTables. The data is classified into 3 types, AgentDetails, CallsDetails, and TrunkUsageDetails.
where,
AgentCallDetails - talks about how the Call was handled.
CallsDetails - talks about life cycle of the call, and
TrunkUsageDetails - talks about facility usage.
AgentCallDetails [ACD]/table has (we could not create any PK/ CPK)
- Login/Logout report.
We have created a view as follows
----
CREATE VIEW VW_ACD_LOGIN_LOGOUT (
AGENT_ID, AGENT_NAME, QUEUE_ID, LOGIN_DATE, LOGIN_TIME, LOGOUT_DATE, LOGOUT_TIME, SITE_ID)
AS SELECT login.AGENT_ID, login.AGENT_NAME, login.QUEUE_ID, --queue_id is department id
login.DATE_ID, login.TIME_ID,
logout.DATE_ID, logout.TIME_ID, login.SITE_ID --site_id is location of the contact Office
FROM ACD login CROSS JOIN ACD logout
WHERE (login.EVENT_ID IN (9, 10)) --login, logout events
GROUP BY login.SITE_ID, login.QUEUE_ID, login.AGENT_NAME, login.AGENT_ID, login.EVENT_ID, login.DATE_ID, login.TIME_ID, logout.AGENT_NAME, logout.AGENT_ID, logout.EVENT_ID, logout.DATE_ID, logout.TIME_ID
HAVING (login.EVENT_ID = 9) AND (logout.EVENT_ID = 10) AND (login.TIME_ID < logout.TIME_ID) AND (login.DATE_ID = logout.DATE_ID) AND (login.AGENT_NAME = logout.AGENT_NAME)
----
Since we dont have any Index created, do we need to create view or indexed view.
Should indexed view have Index as a base...
Please guide me thro' creating Index or Indexed View.
CallsDetails [CDR] table, had different types of calls. Like Inbound, OutBound, Transfered, VoIP. While designing database we created tables CDR_INBOUND, CDR_OUTBOUND, CDR_OTHER, CDR_OTHER. This normilization design was made based on (report) RPT requirement. Though many tables they have same schema.
The RPT, would be based on CDR RPT's of (Inbound, Outbound, Frequently Dailed Numbers) across either (Geography, or Time, or Department) across Location/Site_id
The Schema of CDR in general is
----
CDR_IDCPK
SITE_IDCPK
CALL_DURATION
END_DATE_TIME
CALL_ID
CALLING_PARTY
CALLED_PARTY
FINAL_PARTY
DATE_ID
TIME_ID
USERNAME
DEPARTMENT
...
---
There are also Excetion (a type of RPT). Thats based on many selection criterion. There are many condition that has to select to get his RPT. Ex: "Get RPT when CALL_DURATION was more than x sec for more than n times by the CAll_id = y". So this would also need Indexing.
So, though there are so many issues. We have started tunning database after creating UI, and Reports. I know this is sad. But it would be nice even if we get thro' this stage ... If we could get performance help from database. And from y'll. We are concentrating only on Indexing. And we have no DBA to estimate a plan.
Please, request any requirment we could update the same. We have started targetting SP and UDF to get us resultSets for the UI ... things have imporved. But we need good help on Indexing that's the issue. So thats clear in my first request. "Should Indexing be based on aggrestion, or type of RPT of both should have as many Indexes"
TakeCare,
Bue!!
TakeCare,
Bue!!
July 22, 2003 at 11:12 am
Buell, please don't take my post the wrong way; I'm only trying to help. I believe that if you have numerous queries like the one you posted, you are heading for disaster.
--
To begin with, have you denormalized the data set on purpose? The AGENT_NAME, QUEUE_ID, and SITE_ID are fields which should be stored in the CALL Table, not the CallDetail Table.
--
Secondly, you are doing a CROSS JOIN from a detail table to itself, with barely any WHERE expression to filter the data out. The HAVING clause occurs AFTER the data has been spooled out into tempdb to create the worktable for your GROUP BY expression. The more expressions you can take out of the HAVING clause and into the WHERE clause, the less data will be initially retrieved from the database and used in the worktable. Secondly, what exactly is the point of the cross join? Please explain why this won't work. It's an example of putting only the relevant data in the select statement, NOT having a CROSS JOIN, building a useful WHERE clause, and stripping out the HAVING clause:
SELECT
login.AGENT_ID
, login.DATE_ID
, login.TIME_ID
, logout.DATE_ID
, logout.TIME_ID
FROM ACD login
INNER JOIN ACD logout
ON login.AGENT_ID = logout.AGENT_ID
AND login.DATE_ID = logout.DATE_ID
WHERE
login.EVENT_ID = 9
ANDlogout.EVENT_ID = 10
GROUP BY login.AGENT_ID, login.DATE_ID, login.TIME_ID, logout.DATE_ID, logout.TIME_ID
July 22, 2003 at 11:18 am
I guess the point is that the indexing strategy you take is inconsequential if the queries and tables are poorly designed to begin with. If you normalize the table structures and do things like I wrote above, then the indexing strategy becomes much more clear: index first on those fields which will be most frequently accessed via WHERE clauses (and JOIN expressions), and consider different strategies for clustering based on how many queries are GROUPING on a certain set of data, or are doing RANGE queries (for instance, on your DATE_ID field?), etc.
--
Also, why aren't you using DATETIME fields?
July 22, 2003 at 3:44 pm
To latch on jpipes, maybe moving the where into the on clause. Do pickup some problems with it sometimes.
July 23, 2003 at 3:29 am
Thx for this info, and a neat query. And we have Normalised the database. The Date_Interval table keeps complete track of the Date(date, weekDay, yearDate, yearWeek...) and similarly the Time_Interval table. As report request can be to see data between days, on day, Only a day, or Only Monday.... and many more queries. More over the ACD is sample of important fields that's Composite PKs. So otherwise they are fine.
I've tried both y'r queries, they were fine and better than mine (thanx). Both could get data of 20Million records in 5sec.
And thax for the Indexing needs, and info.
TakeCare,
Bue!!
TakeCare,
Bue!!
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply