How to fix a non-indexed huge table?

  • Heh... I missed THAT!  Not only does the table need some fixin', but you do need to add some memory!  My desktop box has a gig... a server running standard edition should have at least 2.  For enterprise edition, you really need to scale memory to the number of connections hitting the server but, certainly, no less that 2 gig.  Memory is pretty cheap... buy as much memory as the server and the edition of SQL can handle.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I totally aggree on Jeff, except that the "rename" of the new well designed table won't be a "snap". Rename is just not possible with SQL. It might look like in Management Studio. But what it does is create another new table (the one with the new name) an then copy data from the old -and this will take quite a while again and you need double space for a while. Then it'll drop the old one.

    Just keep that in mind when using it in production.

    Andreas

    ---------------------------------------------------
    MVP SQL Server
    Microsoft Certified Master SQL Server 2008
    Microsoft Certified Solutions Master Data Platform, SQL Server 2012
    www.insidesql.org/blogs/andreaswolter
    www.andreas-wolter.com

  • This is obviously a log file from a web server, so the table setup is probably the default log created by IIS when you choose ODBC logging. I've done a lot of analysis of web traffic and I think your stuck with the schema - referrers can be very long, queries can be very long (particulary with asp/jsp/cgi sites). I would instead suggest you look at the content and attempt to ditch some data. The problem is IIS is a slut when it comes to logging it just logs everything. So hits to your site from developers, search engine bots, hits to images on the page ... I typically tried to use the default log as a staging area, then do an etl process into new tables (obviuosly you need to talk to your management about what they want). I would get rid of hits to the site from our own domain (I don't care how many hits I got from my people reviewing the site, just outsiders coming in). Get rid of the bot hits, Recode the user agent, ditch all the hits to gif, jpg ... and keep only the asp, html and real "pages". I found I could reduce the web site by a qtr by doing all the above. Then I could apply indexes that acutally worked. The downside is your traffic will appear less, but it will be closer to truth and your queries will be faster.

  • or you could just exec sp_rename...

  • Jeff, in our shop, it's 2500% certain and nothing less. We had problems once when we were only 2100% certain. That day is now known as "The day of which will not be spoken." 
     
    Oh, and wangkhar (and Jeff) is correct. We use sp_rename on tables with billions of records, and it happens in no time.
  • Sorry Andreas... That's absolutely incorrect and you really need to check your facts before making such an unqualified statement... Try this... insert a column in the middle of an existing table using EM... now, look at the code that action generates... yes, it's creating a new table (like I am in my suggestion) and that will take some time especially for 60 million rows... but, near the end of the code, you'll find that even EM uses sp_Rename and it only takes milli-seconds.

    But, no sense arguing about the speed... a demonstration is in order... 

    The following creates a million row table and then measures the amount of time it takes to rename the table using sp_Rename... do read the comments in the code before you post back, eh?

    --===== Identify the environment to run the test in

        USE TempDB

        SET NOCOUNT ON -- Suppress the autodisplay of rowcounts for speed and appearance

    --===== Declare local variables

    DECLARE @bar       CHAR(78) --A line of "=" to separate sections on the report device

    DECLARE @StartDate DATETIME --Remembers when a section of code started

        SET @bar = REPLICATE('=',78)

      PRINT @bar

    --===== Create and populate a million row test table

         -- (~37 seconds to execute).

         -- Column "SomeDate" has a range of  >=01/01/2000  <01/01/2010

         -- That's ten years worth of dates.

         -- Identify the section and start the section timer

      PRINT 'Building the million row table...'

        SET @StartDate = GETDATE()

         -- Build and populate the table on the fly

     SELECT TOP 1000000

            RowNum     = IDENTITY(INT,1,1),

            SomeID     = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),

            SomeValue  = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))

                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),

            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),

            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)

       INTO dbo.jbmBigTest

       FROM Master.dbo.SysColumns sc1 WITH (NOLOCK),

            Master.dbo.SysColumns sc2 WITH (NOLOCK)

         -- Display the rowcount and the duration for this section

      PRINT STR(@@ROWCOUNT,12) + ' Rows Inserted'

      PRINT CONVERT(CHAR(12),GETDATE()-@StartDate,114) + ' (hh:mi:ss.mil)'

      PRINT @bar

    --===== Rename the table to show how long it takes

         -- (~0 to 220 milliseconds to execute with an average less than 5 milliseconds)

         -- Identify the section and start the section timer

      PRINT 'Renaming the table from jbmBigTest to jbmBigTestRenamed...'

        SET @StartDate = GETDATE()

         -- Rename the table

       EXEC dbo.sp_Rename 'jbmBigTest', 'jbmBigTestRenamed'

         -- Display the duration for this section

      PRINT CONVERT(CHAR(12),GETDATE()-@StartDate,114) + ' (hh:mi:ss.mil)'

      PRINT @bar

    --===== Display the first 10 rows just to show the table has been renamed

      PRINT 'Selecting 10 rows from jbmBigTestRenamed to prove it''s there...'

     SELECT TOP 10 * FROM dbo.jbmBigTestRenamed

    --===== Housekeeping (and further proof that the table has been renamed)

       DROP TABLE jbmBigTestRenamed

       DROP TABLE jbmBigTest

      PRINT 'Test complete.'

    ... and here's the output from the run...

    ==============================================================================
    Building the million row table...
         1000000 Rows Inserted
    00:00:36:830 (hh:mi:ss.mil)
    ==============================================================================
    Renaming the table from jbmBigTest to jbmBigTestRenamed...
    Caution: Changing any part of an object name could break scripts and stored procedures.
    The object was renamed to 'jbmBigTestRenamed'.
    00:00:00:000 (hh:mi:ss.mil)
    ==============================================================================
    Selecting 10 rows from jbmBigTestRenamed to prove it's there...
    RowNum      SomeID      SomeValue SomeNumber            SomeDate                                               
    ----------- ----------- --------- --------------------- ------------------------------------------------------ 
    1           407         SH        45.0266               2009-08-02 18:07:35.383
    2           31297       YI        57.4772               2007-07-03 19:22:33.887
    3           19774       EK        9.5429                2007-08-13 19:13:12.447
    4           11851       FL        51.3669               2000-03-01 02:02:53.813
    5           24085       MQ        59.8272               2004-11-20 11:34:09.050
    6           47946       RD        33.5944               2000-02-25 23:35:55.947
    7           46825       EH        23.0357               2002-12-06 18:48:11.760
    8           10240       FI        56.5999               2004-03-14 21:07:18.460
    9           2467        SF        62.5958               2000-03-02 19:40:05.820
    10          486         EH        56.5331               2006-09-22 00:19:23.660

    Server: Msg 3701, Level 11, State 5, Line 56

    Cannot drop the table 'jbmBigTest', because it does not exist in the system catalog.

    Test complete.

    Like I said... it's a "snap" and it's fast.  Any questions, Andreas?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... point well taken, David... we have a day or two just like that.  Our "big one" is affectionately known as "Cycle 28".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Halifaxdal...

    I just noticed that the size of the MDF table doubled when you added the column... you've done something wrong... very wrong... stop modifying the original table!!!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Unless I'm missing something, his original table was about the same size, as he had three columns at 4096 each. While I realize that you questioned his need for that large of a varchar on two of them, it looks like he kept the same basic structure he originally had.

  • David,

    What I was looking at were the numbers from one of the previous posts (reprinted here for convenience...)

    Warning: The table 'IISLog' has been created but its maximum row size (13625) exceeds the maximum number of bytes per row (8060). INSERT or UPDATE of a row in this table will fail if the resulting row length exceeds 8060 bytes.

    (63008970 row(s) affected)

    Original files' size:

    mdf: 16,860,096

    ldf: 1,008,000

    New files' size:

    mdf:37,697,856

    ldf: 28,333,632

    Notice that the size of the MDF file more than doubled... adding a simple INT IDENTITY column probably would not account for that.  I'm thinking that an incorrect padding option was used and all the varchar columns have been right padded with spaces to the full extent of the data type... we can find that out by comparing DATALENGTH to LEN on each variable length column.  If true, the transfer is going to need an RTRIM on each variable length column... that's going to slow things down even more not to mention causing additional and unwanted logging.

    And, because someone is operating without the proverbial "net", they also managed to blow the log file way out of proportion and that someone (Halifaxdal) has even more work to do, now.  THIS is the reason we needed to use SELECT/INTO to create the IDENTITY column and to do other things... when done correctly, logging will be minimized.

    The tape backup boys are gonna be pissed about that log file!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Halifaxdal,

    I/we are trying to help... in order to do that, I need answers to the questions I posed, earlier.  Of course, if you don't want the help or think you can do it on your own, no problem... but, considering that your little experiment caused the overall database storage requirements to grow from a piddley 18 gig to a whopping 66 gig just to add a single column to a medium sized table, I'm thinking you really need the help. 

    No wonder it took so long... those poor files had to grow many, many times.

    And, considering bnordberg's observations about this being a system log file (in which case, I wouldn't need those answers I asked for), it might be a bad thing to change anything but some indexes on the table.  We can repair all of this damage and an archive may be in order instead of modifying the table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, and other gurus,

    Thank you so much for the analysis. I apologize I reply late because of the holiday reason. (Happy Easterday! )

    Yes like bnordberg said this is from a system generated log file. I have no idea how the field is populated and what will be populated in the future, before I left company on Thursday I checked the max size of those varchar(4096) fields, and the maximum size I got is definitely less than 1000.

    While looking for fundamental database optimization, I am also thinking if I can do anything from the front-end, seriously! I used dotLucene in one of my other project, dotLucene is an excellent search engine that is usually used to index/search files. But it can be used to search database content too! I will try if it helps to sort it out. IT IS THUNDERING FAST! (probably I have to wait until Tuesday)

    Here is a link to dotLucene:

    http://www.codeproject.com/aspnet/DotLuceneSearch.asp

  • I have a local copy of the original data files, so it't not a problem to use your suggestion (SELECT/INTO) to create a new well-designed schema on the server.

  • FYI, here is some data from the table:

    select top 10 * from iislog

    date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent) cs(Referer) IISLog_id

    ------------------------------------------------------ ------------------------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -----------

    2006-11-01 00:00:00.000 1900-01-01 00:10:18.000 201.141.179.251 - 168.165.10.18 443 GET /Imagenes/movmis.gif - 200 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) https://inverweb.somecompany.com/InverWeb/MovtosCheques/MovChqsMisDia.asp?lang=0 1

    2006-11-01 00:00:00.000 1900-01-01 00:10:18.000 201.141.179.251 - 168.165.10.18 443 GET /Imagenes/b_acept.gif - 200 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) https://inverweb.somecompany.com/InverWeb/MovtosCheques/MovChqsMisDia.asp?lang=0 2

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 189.145.60.86 - 168.165.10.18 443 GET /Imagenes/toppage.gif - 304 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) https://inverweb.somecompany.com/inverweb/Default.asp 3

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 189.145.60.86 - 168.165.10.18 443 GET /inverweb/rutinaJS.js - 304 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322) https://inverweb.somecompany.com/inverweb/Default.asp 4

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 189.158.178.21 - 168.165.10.18 443 GET /inverweb/ConsultaMovimientos/default.asp id=99999%2C160%2C0250%2C2%2C1%2C&lang=0 200 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.0.3705;+.NET+CLR+1.1.4322;+Media+Center+PC+4.0) https://inverweb.somecompany.com/inverweb/Default.asp 5

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 189.172.89.195 - 168.165.10.18 443 GET /Imagenes/movant_c.gif - 200 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US;+rv:1.7.2)+Gecko/20040804+Netscape/7.2+(ax) https://inverweb.somecompany.com/inverweb/ConsultaMovimientos/default_tc.asp?id=99999,238,0273,3,1,&lang=0 6

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 148.233.37.34 - 168.165.10.18 443 GET /inverweb/blanco.asp - 200 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0) https://inverweb.somecompany.com/inverweb/ 7

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 63.68.228.197 - 168.165.10.18 443 GET /Inverweb/Inverweb_imagenes/fondo.gif - 404 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+InfoPath.1;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727) https://inverweb.somecompany.com/Inverweb/entrada.asp 8

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 200.34.165.26 - 168.165.10.18 443 GET /Inverweb/salir.asp - 302 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322;+.NET+CLR+2.0.50727) https://inverweb.somecompany.com/inverweb/ 9

    2006-11-01 00:00:00.000 1900-01-01 00:10:19.000 189.172.89.195 - 168.165.10.18 443 GET /Imagenes/movtc.gif - 200 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+en-US;+rv:1.7.2)+Gecko/20040804+Netscape/7.2+(ax) https://inverweb.somecompany.com/inverweb/ConsultaMovimientos/default_tc.asp?id=99999,238,0273,3,1,&lang=0 10

    (10 row(s) affected)

    I also have a very bad news to report:

    On Thursday just after I finished adding the index to the table using David's code, I was able to get the max length of a field in 9 mins, and then I check all the three varchar(4096) fields and get result in about 7 mins, but now, it's been 18 mins, result is still not there.... I'll check tomorrow.

  • >>or you could just exec sp_rename...

     

    which is not the point I was trying to make - I was trying to stress was the GUI (which USES sp_rename) actually DOES

    Andreas

    ---------------------------------------------------
    MVP SQL Server
    Microsoft Certified Master SQL Server 2008
    Microsoft Certified Solutions Master Data Platform, SQL Server 2012
    www.insidesql.org/blogs/andreaswolter
    www.andreas-wolter.com

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply