tempdb filled from ETL process

  • Hello,

    I have a SQL job that runs an ETL process, pumping and transforming data from one database into another on the same instance.

    It pumps roughly 500k rows into this database and transforms them. The problem is that it filled up tempdb to a whopping 160 GB, which is not good.

    I'm wondering if there is a way to pump the data in without it using tempdb. I'm using an INSERT INTO from a SELECT statement; I use it because the existing table it inserts into has indexes on it to make the reads faster.
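    For reference, the statement is basically this shape (table and column names here are made up just to show the pattern, not my real schema):

    INSERT INTO dbo.TargetTable (ProjectID, PartID, PartDescription)
    SELECT s.ProjectID, s.PartID, s.PartDescription
    FROM dbo.SourceTable AS s
    WHERE s.ProjectID IN (SELECT ProjectID FROM dbo.SelectedProjects);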

    Any advice would help greatly.

    Thanks in advance.

  • Siten0308 (4/5/2016)


    Hello,

    I have a SQL job that runs an ETL process, pumping and transforming data from one database into another on the same instance.

    It pumps roughly 500k rows into this database and transforms them. The problem is that it filled up tempdb to a whopping 160 GB, which is not good.

    I'm wondering if there is a way to pump the data in without it using tempdb. I'm using an INSERT INTO from a SELECT statement; I use it because the existing table it inserts into has indexes on it to make the reads faster.

    Any advice would help greatly.

    Thanks in advance.

    You really need to dig into your tempdb usage. 500K rows taking 160GB means some VERY VERY FAT rows!! Maybe you built the indexes on the new data in the same transaction as the insert, or something like that? There could be other things going on that consumed it too, such as a cross product by mistake, etc.
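    One way to see what is actually chewing up tempdb is the space-usage DMVs. Something like this (just a sketch - run it while the ETL job is executing) shows internal-object allocations (worktables, sorts, hash spills) per session:

    SELECT session_id,
           SUM(internal_objects_alloc_page_count) * 8 / 1024 AS internal_objects_mb, -- worktables, sorts, hash spills
           SUM(user_objects_alloc_page_count)     * 8 / 1024 AS user_objects_mb      -- #temp tables, table variables
    FROM sys.dm_db_task_space_usage
    GROUP BY session_id
    ORDER BY internal_objects_mb DESC;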

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hello,

    Sorry, but here is an update: I am inserting 1,014,193 rows, and it made the tempdb data files grow to 175 GB. I am trying to figure out if there is any way to insert that much data without having to use tempdb.

    I tried using a WHILE loop per project (the load is broken down by project, though I'm sure one project is bigger than the others), and I even tried - can't believe it - a cursor. Nope, that wasn't a good idea. Any other ideas?

    Thanks in advance.

    What is the table definition?? Even for 1M rows, 175GB means incredibly fat rows!! Got BLOBs by any chance?!?

    I would look at tuning the ETL process. The first thing I would do is look for Cartesian join scenarios that can explode your actual row counts.
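    A quick sanity check (made-up names, just to show the idea): take the SELECT that feeds the INSERT and count it on its own first. If the row count is wildly bigger than the number of distinct keys, some join is multiplying rows on you.

    SELECT COUNT(*) AS rows_produced,
           COUNT(DISTINCT s.PartID) AS distinct_parts
    FROM dbo.SourceTable AS s
    JOIN dbo.ProjectParts AS pp
           ON pp.PartID = s.PartID;  -- a missing or too-loose ON clause here is what multiplies rows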

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    You are correct, SQLGuru. Sorry, a little background: this is an exploded-BOM T-SQL statement that I loop through, following the parent/child relationships for all parts related to the selected project(s). There are no BLOBs in this, just big varchar columns.

    One thing I can think of that might help, and I'm wondering if you agree, is to shrink the column size limit: right now I think it's varchar(5000), and there are 12 columns like that. I could drop them all to varchar(1000).
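    Before doing that I could check how big the data in those columns actually gets, something along these lines (table/column names are just examples, not my real ones):

    SELECT MAX(DATALENGTH(SomeBigCol)) AS max_bytes,
           AVG(CAST(DATALENGTH(SomeBigCol) AS bigint)) AS avg_bytes
    FROM dbo.ExplodedBom;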

    Any other ideas?

    As long as the varchar(5000) fields don't actually have much in them, they won't take up space. And if they DO have data in them, then clearly it can't be right to lop them all off at 1,000 characters, because you'll be losing data.

    Do you need to repeat all the information for each level of the exploded BOM? That would be my first thought if you are doing that. The next would be to figure out how to do this directly into your table so you avoid touching tempdb at all.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Hmm, I think I'm catching what you're throwing: I can loop through just the IDs, then add the descriptions to the result afterwards, after the explode part... will removing the descriptions make that much of a difference?
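    Something along these lines is what I have in mind - completely made-up table/column names, just to show the shape of it. Whether it's done with a loop or a recursive CTE, the point is that only the keys travel through the explode and the descriptions get joined on once, at the very end:

    DECLARE @ProjectRootPartID int = 12345; -- whichever project root is being exploded

    WITH bom AS
    (
        SELECT b.ParentPartID, b.ChildPartID, 1 AS BomLevel
        FROM dbo.BomStructure AS b
        WHERE b.ParentPartID = @ProjectRootPartID
        UNION ALL
        SELECT b.ParentPartID, b.ChildPartID, bom.BomLevel + 1
        FROM dbo.BomStructure AS b
        JOIN bom
            ON bom.ChildPartID = b.ParentPartID
    )
    INSERT INTO dbo.ExplodedBom (ParentPartID, ChildPartID, BomLevel, PartDescription)
    SELECT bom.ParentPartID, bom.ChildPartID, bom.BomLevel, p.PartDescription
    FROM bom
    JOIN dbo.Parts AS p
        ON p.PartID = bom.ChildPartID; -- descriptions joined only here, not carried through the recursion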

    Well, if you are duplicating them in a temp table unnecessarily, then they could be an issue. Of course, updating rows isn't free either, especially when you make them larger in the process. My point was directed at unnecessary duplication of fields in the exploded BOM that really only belong with a parent, say.

    I still don't understand why tempdb is involved in any case. Is there no way you can just go direct to the target here? No need to respond to the question. Helping you fix an ETL process goes WAY beyond freebie forum support. :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Siten0308 (4/7/2016)


    I am inserting 1,014,193 rows, and it made the tempdb data files grow to 175 GB

    Every record must fit into a single data page.

    A data page is 8K.

    1 million rows without BLOBs cannot occupy more than 8GB of space.

    OK, 10 GB considering overheads. Indexing and fragmentation could increase the number to 20 GB.

    But that's it.

    What you need to look at is how you process the data.

    Hash joins and Cartesian products on million-row tables - the "worktables" created by SQL Server won't fit into memory and will take up space in tempdb.

    A lot of space.

    1 million rows joined to every one of another 1 million rows will produce a trillion rows in the worktable.

    Check how you process the data.
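    An easy way to see it: run the statement with STATISTICS IO on and look for the "Worktable" lines in the messages output - those are the tempdb spills.

    SET STATISTICS IO ON;
    -- run the INSERT ... SELECT (or the problem SELECT) here
    -- then look for lines like: Table 'Worktable'. Scan count ..., logical reads ...
    SET STATISTICS IO OFF;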

    just big varchar columns.

    How big?

    varchar(MAX) by any chance?

    _____________
    Code for TallyGenerator

    To add to what Sergiy posted, the data in the VARCHAR columns may have a serious but easy-to-fix problem. They may be filled with trailing spaces, and that might account for the insane size of the table.

    Run the following code on each of the columns (changing the column name for each run, of course).

    SELECT COUNT(*)
    FROM dbo.YourTable --< Change this to match your table!
    WHERE LEN(SomeColA) <> DATALENGTH(SomeColA) --< Change the column name in two places for each column
    ;

    If you have a count larger than just one or two (zero is obviously preferable), then trailing spaces are your problem.

    How does that happen? Of course, "It Depends", but one way is when someone copies data from a fixed-length datatype into a VARCHAR. That's what happened to me in the past.

    The "fix" for the existing data is to do an RTRIM on the data using an UPDATE. The fix for whatever process is building the data is to do an RTRIM on the data on the way in.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sergiy (4/7/2016)


    Siten0308 (4/7/2016)


    I am inserting 1,014,193 rows, and it made the tempdb data files grow to 175 GB

    Every record must fit into a single data page.

    A data page is 8K.

    1 million rows without BLOBs cannot occupy more than 8GB of space.

    OK, 10 GB considering overheads. Indexing and fragmentation could increase the number to 20 GB.

    But that's it.

    just big varchar columns.

    How big?

    varchar(MAX) by any chance?

    Sorry Sergiy, but that is not true, and hasn't been for some time now. Row-overflow pages can allow you to have massive row sizes. The OP's (12, was it??) varchar(5000) fields could be big trouble here if they are filled up for a large number of those 1M rows.
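    If you want to see whether that is happening, something like this (placeholder table name, same as in Jeff's example) breaks the table's pages down by allocation unit type - ROW_OVERFLOW_DATA showing up with a big page count is the giveaway:

    SELECT index_id, alloc_unit_type_desc, page_count, record_count, avg_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'DETAILED');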

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
