How do I decide, why index here and not there?

Question

How do I decide, why index here and not there?

Buell

SSC Journeyman

Points: 79
More actions
July 21, 2003 at 9:00 am

#101579

The project that I'm involved is to FTP flat file data. Once detail data is up, its next summarise into Hourly, Daily,... all time bounds. The whole idea of the project is to create Reports of all kinds. The data is basically calls comming into a CallCenter. The detail data is updated using BCP, and summarisation is done using user defined SP. Now that the reports are generated a bit slower (by Crystal Reports 9). We have to optemize the database to apply proper Index on the tables. Right now we have Composite PK's for all such tables. So, in brief if there is a single Detailed Table, there are 3 different Tables that summarises this detailed database, Maybe one summary teable is for reports about CallAgent's Summary while other is for CallGroup's Summary. And all these Summaries have Time break-up starting from Daily,...Yearly. My only question is 1.) Is Indexing needed for Summary data while updating these database tables. <br> 2.) Is indexing is based on the types of report that you create or its based on aggression used before summarisation. Or create them for both. Finally I need help in creating these Indexes, as we are creating views also maybe indexed Views. I'll post the table's and SP. If you need so.
TakeCare,
Bue!!

TakeCare,
Bue!!

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

jpipes SSChampion Points: 12230 More actions · Answer 1

Actually, sounds like you need an OLAP solution since the data is for reporting only and will be read-only after the initial load. Check out MS OLAP services or the forum on this site for Analysis Services.

--

If you don't want to go that route, I'd post the schema and a little clearer explanation as to who is using what data, how often, using what filters/criteria, and the schema of the tables.

HTH,

jay

Steve Jones - SSC Editor SSC Guru Points: 741028 More actions · Answer 2

I agree with the above post. OLAP or we need more information to suggest something.

Steve Jones

sjones@sqlservercentral.com

http://www.sqlservercentral.com/columnists/sjones

http://www.dkranch.net

Buell SSC Journeyman Points: 79 More actions · Answer 3

Thx, and OLAP or AS cannot be used as per Specs. So we have to do it in a traditional way. So, I would like to tell more about the dataTables. The data is classified into 3 types, AgentDetails, CallsDetails, and TrunkUsageDetails.

where,

AgentCallDetails - talks about how the Call was handled.

CallsDetails - talks about life cycle of the call, and

TrunkUsageDetails - talks about facility usage.

AgentCallDetails [ACD]/table has (we could not create any PK/ CPK)

- Login/Logout report.

We have created a view as follows

----

CREATE VIEW VW_ACD_LOGIN_LOGOUT (

AGENT_ID, AGENT_NAME, QUEUE_ID, LOGIN_DATE, LOGIN_TIME, LOGOUT_DATE, LOGOUT_TIME, SITE_ID)

AS SELECT login.AGENT_ID, login.AGENT_NAME, login.QUEUE_ID, --queue_id is department id

login.DATE_ID, login.TIME_ID,

logout.DATE_ID, logout.TIME_ID, login.SITE_ID --site_id is location of the contact Office

FROM ACD login CROSS JOIN ACD logout

WHERE (login.EVENT_ID IN (9, 10)) --login, logout events

GROUP BY login.SITE_ID, login.QUEUE_ID, login.AGENT_NAME, login.AGENT_ID, login.EVENT_ID, login.DATE_ID, login.TIME_ID, logout.AGENT_NAME, logout.AGENT_ID, logout.EVENT_ID, logout.DATE_ID, logout.TIME_ID

HAVING (login.EVENT_ID = 9) AND (logout.EVENT_ID = 10) AND (login.TIME_ID < logout.TIME_ID) AND (login.DATE_ID = logout.DATE_ID) AND (login.AGENT_NAME = logout.AGENT_NAME)

----

Since we dont have any Index created, do we need to create view or indexed view.

Should indexed view have Index as a base...

Please guide me thro' creating Index or Indexed View.

CallsDetails [CDR] table, had different types of calls. Like Inbound, OutBound, Transfered, VoIP. While designing database we created tables CDR_INBOUND, CDR_OUTBOUND, CDR_OTHER, CDR_OTHER. This normilization design was made based on (report) RPT requirement. Though many tables they have same schema.

The RPT, would be based on CDR RPT's of (Inbound, Outbound, Frequently Dailed Numbers) across either (Geography, or Time, or Department) across Location/Site_id

The Schema of CDR in general is

----

CDR_IDCPK

SITE_IDCPK

CALL_DURATION

END_DATE_TIME

CALL_ID

CALLING_PARTY

CALLED_PARTY

FINAL_PARTY

DATE_ID

TIME_ID

USERNAME

DEPARTMENT

...

---

There are also Excetion (a type of RPT). Thats based on many selection criterion. There are many condition that has to select to get his RPT. Ex: "Get RPT when CALL_DURATION was more than x sec for more than n times by the CAll_id = y". So this would also need Indexing.

So, though there are so many issues. We have started tunning database after creating UI, and Reports. I know this is sad. But it would be nice even if we get thro' this stage ... If we could get performance help from database. And from y'll. We are concentrating only on Indexing. And we have no DBA to estimate a plan.

Please, request any requirment we could update the same. We have started targetting SP and UDF to get us resultSets for the UI ... things have imporved. But we need good help on Indexing that's the issue. So thats clear in my first request. "Should Indexing be based on aggrestion, or type of RPT of both should have as many Indexes"

TakeCare,

Bue!!

TakeCare,
Bue!!

jpipes SSChampion Points: 12230 More actions · Answer 4

Buell, please don't take my post the wrong way; I'm only trying to help. I believe that if you have numerous queries like the one you posted, you are heading for disaster.

--

To begin with, have you denormalized the data set on purpose? The AGENT_NAME, QUEUE_ID, and SITE_ID are fields which should be stored in the CALL Table, not the CallDetail Table.

--

Secondly, you are doing a CROSS JOIN from a detail table to itself, with barely any WHERE expression to filter the data out. The HAVING clause occurs AFTER the data has been spooled out into tempdb to create the worktable for your GROUP BY expression. The more expressions you can take out of the HAVING clause and into the WHERE clause, the less data will be initially retrieved from the database and used in the worktable. Secondly, what exactly is the point of the cross join? Please explain why this won't work. It's an example of putting only the relevant data in the select statement, NOT having a CROSS JOIN, building a useful WHERE clause, and stripping out the HAVING clause:



SELECT 

    login.AGENT_ID

  , login.DATE_ID

  , login.TIME_ID

  , logout.DATE_ID

  , logout.TIME_ID

FROM ACD login 

INNER JOIN ACD logout 

ON  login.AGENT_ID = logout.AGENT_ID

AND login.DATE_ID = logout.DATE_ID

WHERE 

login.EVENT_ID = 9

ANDlogout.EVENT_ID = 10

GROUP BY login.AGENT_ID, login.DATE_ID, login.TIME_ID, logout.DATE_ID, logout.TIME_ID

jpipes SSChampion Points: 12230 More actions · Answer 5

I guess the point is that the indexing strategy you take is inconsequential if the queries and tables are poorly designed to begin with. If you normalize the table structures and do things like I wrote above, then the indexing strategy becomes much more clear: index first on those fields which will be most frequently accessed via WHERE clauses (and JOIN expressions), and consider different strategies for clustering based on how many queries are GROUPING on a certain set of data, or are doing RANGE queries (for instance, on your DATE_ID field?), etc.

--

Also, why aren't you using DATETIME fields?

5409045121009-7368 SSCoach Points: 15740 More actions · Answer 6

To latch on jpipes, maybe moving the where into the on clause. Do pickup some problems with it sometimes.

Buell SSC Journeyman Points: 79 More actions · Answer 7

Thx for this info, and a neat query. And we have Normalised the database. The Date_Interval table keeps complete track of the Date(date, weekDay, yearDate, yearWeek...) and similarly the Time_Interval table. As report request can be to see data between days, on day, Only a day, or Only Monday.... and many more queries. More over the ACD is sample of important fields that's Composite PKs. So otherwise they are fine.

I've tried both y'r queries, they were fine and better than mine (thanx). Both could get data of 20Million records in 5sec.

And thax for the Indexing needs, and info.

TakeCare,

Bue!!

TakeCare,
Bue!!