understanding reindexing

  • this is just a question to help me understand what is going on in my reindexing process.

    I have a table with 31mil rows. There are three indexes defined on this table 1)PK_Lead - on int column, and two non-clustered indexes 2) lead_A - datetime,varchar(15) , and 3) LEAD_B - varchar(10), datetime. When reindexing occurs on the non-clustered indexes i get very different performance between them. Lead_A takes less then a min to rebuild but Lead_B averages over 30 min. I dont understand why there is such a difference between the rebuild time when the date in the indexes is similar (the datetime field is the same coulmn). My only theory is that Lead_A is less fragmented because the datetime column is first that the value in that column is added sequentially (insert audit column). In Lead_B the varchar(10) column is first and this is a random alphanumeric value so there are more page splits thus more fragmentation. Or if not fragmentation of index is it just that since the datetime is sequential in the table it is much easier to create index then non-sequential values? Are there any other reasons that one reindex takes so much longer then another one?

    I am running standard so i can not do the reindexing online. The indexes currently do not have a fill factor but I assume that if I added one to Lead_B this would just help in not having to reindex as often. Am I correct in this assumption?

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • Just a shot in the dark here, since no one has replied to you yet... Check out sys.dm_db_index_physical_stats using the DETAILED scanning mode and see if the fragmentation, or page count is different for the indexes. DETAILED will report on each level of the index (both 1 and 2 in your case) so you can compare them against each other.

    Hope this helps!

    Chad

  • Thanks Chad. There appears to be more fragmentation, pages and records in the non-leaf levels of Lead_B index which is taking longer. Could this account for Lead_B index taking approx 30 times more time to rebuild?

    291NONCLUSTERED INDEXIN_ROW_DATA4035.7540996327143601342.5717231516280315464877.990462070669631708457

    291NONCLUSTERED INDEXIN_ROW_DATA4199.72577696526515861.86689419795222109464.4332962688411154648

    291NONCLUSTERED INDEXIN_ROW_DATA427581862.40733876945891094

    291NONCLUSTERED INDEXIN_ROW_DATA4301113.47170743760818

    571NONCLUSTERED INDEXIN_ROW_DATA400.339272638515987616613.28900421667218194099.716382505559731708467

    571NONCLUSTERED INDEXIN_ROW_DATA418.30769230769231873.7356321839080532588.195206325673381940

    571NONCLUSTERED INDEXIN_ROW_DATA42021261.0946380034594325

    571NONCLUSTERED INDEXIN_ROW_DATA4301110.5683222139856682

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • So... the first index has two kinds of fragmentation. 1st - some of the pages and/or extents are out of order, 2nd- not all the pages are full of data. When you reindex, it shuffles the pages around to be in the right order and fills them to the fillfactor you specify (or the original fill factor). So with greater fragmentation there is more work to do, but it probably won't account for a 30x difference...

    When you say that the datetime is sequential, do you mean that there is a clustered index on the column? If so, maybe it's actually using the clustered index to build the non-clustered index faster.

    And now I'm really starting to dig... do the two varchar columns have different numbers of unique values? Are all the datetimes pretty much unique or are there duplicate datetime values?

    Chad

  • there is not a clustered index on the datetime column. it is just a time stamp, thus all distinct, of when the row was inserted into the table, the clustered index is an incrementing int identity, so i would assume that the Lead_A (which has datetime first) would not need to do many, if any, page splits as the order the values are inserted into the table is also the order of the index.

    As for the two varchar values, in lead_A (quick one) the varchar value is our vendor names so a potential of only approx 100 distinct values. In Lead_D (slow one) the varchar is an identification values thus everyone is distinct.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • Are you doing a rebuild of the indexes or a re-org? Are autostats turned on or off for the database?

  • I do both depending on the fragmentation. If its more then 30% I will do a rebuild. the reorganization is quite quick on both tables. auto stats is on.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • kind of off topic but when you said

    it is just a time stamp, thus all distinct

    are time stamps guaranteed unique or 'just as long as you don't do 2+ successive inserts really fast'?

  • I never bothered to check the uniqueness of the timestamp, and it turns out that there are numerous (200000) values that are the same.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • I'm starting to shoot a little wildly here, but do a distinct count on the columns for both indexes. It sounds like the fast index is composed of two columns with lots of duplicates (so there would be fewer rows), while the slow index has almost unique values for the leading index column (so there would be many more rows). From the stats you posted, I would guess about a 2-to-1 ratio, but maybe there's something being hidden there and the index is just that much larger.

    Chad

  • Unfortunately there is not that much of a discrepancy between the counts of the leading columns on the indexes (Lead_A, datetime, 32million and Lead_B, varchar, 27 million). Now five million rows is quite a bit but in my opinion not enough to account for the reindex time difference. the second column in Lead_A has only 53 distinct values and the second column in Lead_B is the same column as the Leading column in Lead_A so 32 million distinct values.

    I am sorry but I just realized (I didn't check it earlier) that Lead_A has the Primary key/clustered index column included in the index and there is no included column in Lead_B. Could this be the reason for the quick reindex?

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • I don't think that would make too much of a difference. At the leaf level the index is either storing a pointer to the page containing the rows data or the clustered index key, so I wonder if including the key in the index is storing the data twice? Hmmm...

    I was actually thinking of all the index columns in conjunction with each other. Lets say that for each datetime there is exactly 1 varchar(15) record such that select distinct datetime, varchar(15) returns 32 million rows (32m x 1, the Lead_A index). If for each varchar(10) column there are 4 distinct datetime records, then select varchar(10), datetime would return over 100 million rows (27m X 4, the Lead_B index). If the difference were dramatic, it might explain the extra time. However, I doubt this is the case since it sounds like your datetime is almost unique (putting both indexes at similar counts), the varchars have similar average length (I'm guessing), and the output from sys.dm_index_physical_stats seemed to show just a 2-1 ratio, not a 10-1 or 30-1 ratio.

    I'm afraid I've dug to the end of my ideas. I can't think of anywhere else to look, but if I do, I'll sure post again. Sorry, and good luck!

    Chad

  • Thanks Chad, for the effort. This has, at least, given me some extra troubleshooting abilities/ideas I did not have before so effort was not futile.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • so I wonder if including the key in the index is storing the data twice? Hmmm...

    Tested:

    set nocount on;

    go

    create table IndexTest1 (

    ID int identity primary key,

    Val1 int,

    Val2 int,

    Val3 int);

    go

    insert into dbo.indextest1 (val1, val2, val3)

    select top 1000000 checksum(newid()), checksum(newid()), checksum(newid())

    from dbo.numbers n1

    cross join dbo.numbers n2;

    go

    create index IDX_IndexTest1_NotIncluded

    on dbo.indextest1 (val1, val2, val3)

    go

    create table IndexTest2 (

    ID int identity primary key,

    Val1 int,

    Val2 int,

    Val3 int);

    go

    insert into dbo.indextest2 (val1, val2, val3)

    select val1, val2, val3

    from dbo.indextest1;

    go

    create index IDX_IndexTest2_Include

    on dbo.indextest2(val1, val2, val3, id)

    include (ID);go

    create table IndexTest3 (

    ID int identity primary key,

    Val1 int,

    Val2 int,

    Val3 int);

    go

    insert into dbo.indextest3 (val1, val2, val3)

    select val1, val2, val3

    from dbo.indextest1;

    go

    create index IDX_IndexTest3_Leaf

    on dbo.indextest3 (val1, val2, val3, id);

    Index on table 1 ended up at 21.469 Meg, and on each of the other two ended up at 21.461 Meg. I tested it three times, same result each time. So, either there's something wrong with my test, or including the ID column in the index cut the size by 8 kb per million rows, or that's the margin for error.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks GSquared. It looks like the margin of error. I got exactly the same number of pages in all 3 indexes, but the second had one more fragment than the others, which would have made made it exactly 8K smaller had it not been fragmented. For some reason it didn't even cross my mind to run a test like that, but sure enough there it is, plain as day.

    Chad

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply