How to optimise the table performanace (90 lacs records)

  • Hi,

    I have a table with 5000 users along with 6 years of attendance data. If I want to retrieve user data from the table it takes time to load.

    Table Structure

    Usercode, Rosterdate, Shift code, AttendanceCode

    Case 1

    If i want to retrieve the shift information from 1-jun-2020 to 10-jun-2020 for 5000 users,

    would take time to retrieve the data in SQL server.

    what kind of index should i update in above table?

    Regards

    Sethu

  • This is a very general question, so I'm going to give a very general answer.

    If you intend to filter your information based on a particular column, one presumes Rosterdate from the description, then the index key goes on that column.

    However, there is more to it than that. Is this a clustered table or a heap? What's the clustered index key? With just an index on Rosterdate, you're looking at either an key lookup or an RID lookup, which could negatively affect performance. So, you might want to consider adding INCLUDE columns to your index on Rosterdate to cover the ShiftCode column (since you said that's what you want). However, guessing, you also want to see the Usercode and AttendeanceCode? Then those should also be included in the INCLUDE. At this point, you've just recreated a clustered index.

    So, is the table a heap? If so, then, create a clustered index on the Rosterdate column and your done.

    However, there's so much more to picking indexes and defining data storage. I'd strongly suggest doing a lot more studying on the topic.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi Grant,

     

    Thanks for the reply. My table is not a heap and we created a index on rosterdate, usercode and rosterdate, usercode & Shiftcode.

    Most of the reports based on these two combinations. For example , clients wants to show the list of employees with shift and attendance information between last 2 months and shift belongs to "Working day". My query like "select <<list of fields>> from LeaveRoster where rosterdate between <<from>> and <<to>> and shift = 'workingday'

    Total No employees: 5000

    Total No.Records : 80Lac approximately(both active & inactive employee attendance records)

     

     

    • This reply was modified 4 years, 6 months ago by  Grant Fritchey. Reason: Fixed some typos
  • what's "lac"?

  • @sterling3721 - lac is (forgive the dialect - I believe Indian/Gujarati) for thousand

     

    @sethumail2000-2 - those two indexes, are essentially left key subsets, I would very much recommend removing the index with 2 columns if these are to stay, duplicated indexes like this actually hurt rather then improve as they require more storage, IO, CPU, RAM etc to keep up to date with changes, so no point having two indexes where the key columns are in the same order.

    rosterdate, usercode

    rosterdate, usercode & Shiftcode

     

    Without seeing statistics and execution plans its tricky to see what is the best indexing strategy.

    But if your query is on ROSTERDATE and SHIFTCODE, then neither of the above indexes fully satisfy the need as it can seek on ROSTERDATE but then cant use the rest of the index to get to SHIFTCODE as your missing USERCODE.

    It they also comes down to distribution and which is more selective, 90,000 records but only 10 unique SHIFTCODES then put ROSTERDATE on the left of the index.

    But for the query

    select <<list of fields>> from LeaveRoster where rosterdate between <<from>> and <<to>> and shift = 'workingday'

    I would say there should be an index on RosterDate & ShiftCode

  • 1 lakh = 100,000

  • Anthony has nailed it. Those indexes are basically duplicates. Potentially, in the event of scans, you may, sometimes, see that smaller indexed used because scanning fewer pages is better. However, except for that, you'll see that the second index will work for queries that only refer to the two columns.

    Are you looking at the execution plans to understand how your indexes are being used?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Lakh is derived from the Indo-European word for "salmon" and is related to the words "lox" and 'lacquer".  It should be obvious how "lox" is related to "salmon".  "Lacquer" was derived from the similarity in red color to salmon.  "Lakh" has a lengthier derivation.  When the Indic people moved away from the rivers where salmon were plentiful, the meaning became abstracted from specifically salmon to an attribute of salmon - their great numbers.  Eventually, this became a specific great number - 100,000.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply