November 6, 2007 at 10:40 am
By the way, this back and forth over the use of "Over()" is kind of silly, when there are a ton of things I like about SQL Server 2005 that make it far from useless. That's what prompted this whole thing, and Over() was just something that was tossed out.
An example that just bit someone here in the butt today (they had spent hours trying to figure out the problem until I pointed out one of my main beefs with DTS) is the way DTS displays workflow with transformations. That wouldn't happen in SSIS, because it handles workflow in a more logical manner.
In DTS, you have a green line going from an Execute SQL task to a data source object. The data source object, on the other hand, is connected by 10 transformations (black lines) to a data destination object. In what order do the Execute SQL task and the transformations execute?
There are a ton more, both big and small, many of which are only going to be useful in certain environments. I also stated that there are things that I preferred in 2000. Again, that wasn't the argument. The argument was that 2005 is useless. It's not.
November 6, 2007 at 10:49 am
We don't have a single report that takes longer than 30 seconds to retrieve (typically they're in the 0-2 second range), thanks to some careful marting...
...Once I've gotten the data out of those into various marted tables, everything down the line is a piece of cake.
Come on, David... you just got done complaining about all of the extra terabytes of space it takes for covering indexes, yet you have the room for "careful marting" and all the preprocessing it takes? Go for it...
Also, I think you meant to say "covering index", not "index", correct? Those are very different beasts in this conversation.
Yeah, David... I meant "covering index"... that's what we've been talking about for the last couple of pages of this thread and that's what I meant by "in the presence of the proper index".
The bottom line here is that the tests were for performance, nothing else... not a test for the shortest code. And the Inner Join in the presence of the "proper covering index" blows the doors off of the OVER method by a factor of 2 or 3. There was no previous mention of "careful marting" or anything else. It was a straight-up test of which method was the fastest, and Sergiy's Inner Join won hands down over OVER.
"Careful marting" is a whole different subject... and guess what? If you use the Inner Join method with a "proper covering index", that'll probably go quicker, as well 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2007 at 11:08 am
Jeff Moden (11/6/2007)
Come on, David... you just got done complaining about all of the extra terabytes of space it takes for covering indexes, yet you have the room for "careful marting" and all the preprocessing it takes? Go for it...
You think that a covering index would replace our need for marting? Ooookay.
A typical covering index on our source data would be hundreds of gigabytes, if not terabytes, in size all by itself, so yes, I have room for careful marting, but not covering indexes there. Where we do use covering indexes is when we have marted tables that serve as the source for other marted tables. In those cases, the covering indexes make plenty of sense, as the data we need out of the source is a tiny subset, both vertically and horizontally, of some rather wide and tall tables. If you think a covering index (or twenty) on our original two tables is going to replace the need for the marted tables, you're very sorely mistaken. I'll let you do the math: say an average of 500 million rows per day with 25 months of data. That's why we have marted tables.
"Careful marting" is a whole different subject... and guess what? If you use the Inner Join method with a "proper covering index", that'll probably go quicker, as well 😛
I've stated several times above that I often use covering indexes, just not in our reporting environment, as, again, they're not needed there.
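For readers unfamiliar with the pattern, here's a rough sketch of the kind of end-of-day marting step being described. The table names, columns, and metrics are invented for illustration; the real schema isn't shown anywhere in the thread.

-- Hypothetical end-of-day marting step: roll one day's detail rows up to
-- one summary row per product line. Names are illustrative only.
INSERT INTO dbo.DailyProductMart (ProductLine, SummaryDate, RowCnt, TotalAmount)
SELECT  ProductLine,
        CONVERT(CHAR(8), EventDate, 112) AS SummaryDate,   -- yyyymmdd
        COUNT(*)                         AS RowCnt,
        SUM(Amount)                      AS TotalAmount
FROM    dbo.SourceDetail
WHERE   EventDate >= '20071106' AND EventDate < '20071107'
GROUP BY ProductLine, CONVERT(CHAR(8), EventDate, 112);

-- Where one marted table feeds other marted tables, a covering index on the
-- columns the downstream marts need is cheap relative to the table's size.
CREATE INDEX IX_DailyProductMart_Line_Date
    ON dbo.DailyProductMart (ProductLine, SummaryDate) INCLUDE (TotalAmount);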
November 6, 2007 at 11:52 am
That's 381.25 billion records in the table with an insert rate of 5800 records per second.
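For reference, the arithmetic behind those figures, using the 500 million rows per day and 25 months from the post above (the ~30.5 days per month is an assumption):

-- Rough check of the numbers above; 30.5 days per month is assumed.
SELECT  500000000.0 * 25 * 30.5 AS TotalRows,        -- 381,250,000,000 (381.25 billion)
        500000000  / 86400      AS RowsPerSecond;    -- 5,787 (~5,800) sustained over 24 hours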
N 56°04'39.16"
E 12°55'05.25"
November 6, 2007 at 12:52 pm
David McFarland (11/6/2007)
Sergiy (11/5/2007)
David, what did you prove?
That you were wrong. No big deal, as I've done it before, and I'll do it again (based on your "maths" skills, probably sooner rather than later).
In your dreams, poor boy.
In real life, everybody can see what you did prove, but I'm not gonna mention it, because you're gonna scream again: "Mama, this bad uncle said a bad word about me!" 😀
And if you mention math, I will remind you about your shameful flight when you've proven yourself id... (oops, almost did it) talking about the real weight of a thing when the scales show 0.00 kg.
So, go do your favourite dull tasks and dream about terabytes of data you can never handle.
_____________
Code for TallyGenerator
November 6, 2007 at 2:16 pm
Peter Larsson (11/6/2007)
That's 381.25 billion records in the table with an insert rate of 5800 records per second.
In case someone tries to imagine the horror of running queries against hundreds of billions of rows (my average must have been off due to recent increases, because we're closer to 250 billion rows), keep in mind that this data is stored as partitioned views (it's actually very manageable this way), starting out as physical tables with a single day's data and eventually rolling up to a table of a week's data. This makes ad hoc stuff (which we still have to do way too much of) tolerable, as in almost all cases, even the ad hoc stuff is typically for no more than a month's data at a time, and most often just a week. Because of the volume, we mart like crazy, so that the billions of rows become 10k rows for this particular mart, 100k for another, etc.
Even though it's large, it's nowhere near the warehouse of some other places I've consulted for, and I doubt that it's as large as the warehouses handled by many here on the board; I've never really looked at database size as some sort of e-peen contest anyway, especially since I don't really have a lot of control over how much data gets spit out to me. While there are a few things one has to keep in mind when working with VLDBs, it's not really hard, and in most cases, there's not a big difference in coding for a million-row table and a billion-row table. One of the places that I consulted for worked with the data of most of the Fortune 500 companies. Walmart, for example, makes our warehouse look like a little MS Access project.
Petabytes, terabytes, gigabytes. They're all just collections of bytes.
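A minimal sketch of the daily-table partitioned-view layout described above; the table names, partitioning column, and the rest of the schema are invented for illustration and aren't the poster's actual design.

-- One physical table per day, with a CHECK constraint telling SQL Server
-- which dates each table can hold.
CREATE TABLE dbo.Detail_20071105
(
    EventDate   DATETIME NOT NULL
        CONSTRAINT CK_Detail_20071105
        CHECK (EventDate >= '20071105' AND EventDate < '20071106'),
    ProductLine INT      NOT NULL,
    Amount      MONEY    NOT NULL
);

CREATE TABLE dbo.Detail_20071106
(
    EventDate   DATETIME NOT NULL
        CONSTRAINT CK_Detail_20071106
        CHECK (EventDate >= '20071106' AND EventDate < '20071107'),
    ProductLine INT      NOT NULL,
    Amount      MONEY    NOT NULL
);
GO

-- The partitioned view unions the daily (and, after rollup, weekly) tables.
-- With trusted CHECK constraints, a query filtered on EventDate only touches
-- the member tables that can contain qualifying rows.
CREATE VIEW dbo.Detail
AS
SELECT EventDate, ProductLine, Amount FROM dbo.Detail_20071105
UNION ALL
SELECT EventDate, ProductLine, Amount FROM dbo.Detail_20071106;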
November 6, 2007 at 2:47 pm
Heh... no question that with that many rows of data you need to do some data marting and that you're not actually trying to create reports from nearly 400 billion rows of unsummarized data... but you aren't doing nearly 400 billion rows per day... nah... at worst, you're marting those 500 million rows each day and, hopefully, in more than one chunk... if you did it in 50 or 100 million row chunks 5 or 10 times a day, the covering index would prove its worth quite nicely.
So far as OVER being useless... I think it's "useless" if I have to take a 200% to 300% hit on performance in the face of marting 500 million rows per day. If your import rate is as Peter suggests, it's taking you almost 24 (total) hours just to do the import of 500 million rows... even at the record rate of 83K rows per second that you can sometimes eke out with Bulk Insert, that's still gonna take about 1.6 hours. I'm thinking you need all the speed you can get on the entire process.
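By way of illustration, a rough sketch of the kind of chunked loading being suggested here, using BULK INSERT; the file path, terminators, and batch size are placeholders, not anything from the actual process under discussion.

-- Hypothetical chunked load: one file per chunk, several chunks per day,
-- rather than one 500-million-row load at the end of the day.
BULK INSERT dbo.SourceDetail
FROM 'D:\feeds\detail_20071106_chunk01.dat'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n',
      BATCHSIZE = 1000000,   -- commit every million rows
      TABLOCK);              -- helps get faster, minimally logged loads
-- ...repeat for chunk02, chunk03, etc. as each extract arrives during the day...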
--Jeff Moden
Change is inevitable... Change for the better is not.
November 7, 2007 at 8:23 am
Jeff Moden (11/6/2007)
Heh... no question that with that many rows of data you need to do some data marting and that you're not actually trying to create reports from nearly 400 billion rows of unsummarized data... but you aren't doing nearly 400 billion rows per day... nah... at worst, you're marting those 500 million rows each day and, hopefully, in more than one chunk... if you did it in 50 or 100 million row chunks 5 or 10 times a day, the covering index would prove its worth quite nicely.
We actually don't break down the single day's data (well, we do, but across product lines, the largest of which is still a few hundred million rows per day), and we don't have a single marting procedure that uses Over(). We do pull the data all day long, but only mart at the end of the day. Most of our marting is to create metrics based on various granularity levels, not to create data for a specific report. Where we use Over, and will continue to use Over, is in report-generating stored procedures that use the marted tables as their source. The reports are typically subsecond, the code is easily modifiable, and the performance matches other methods in cases where we're not going to add covering indexes, such as this.
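A minimal sketch of the kind of report query being described, run against a small marted table rather than the raw detail; the procedure, table, and column names are invented (reusing the hypothetical mart names from the sketch earlier in the thread).

-- Hypothetical report proc: the marted table is only thousands of rows,
-- so the OVER() form is plenty fast and keeps the query easy to modify.
CREATE PROCEDURE dbo.rpt_ProductLineShare
    @SummaryDate CHAR(8)
AS
SELECT  ProductLine,
        TotalAmount,
        TotalAmount * 100.0
            / SUM(TotalAmount) OVER () AS PctOfDay   -- each line's share of the day's total
FROM    dbo.DailyProductMart
WHERE   SummaryDate = @SummaryDate;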
It won't hurt my feelings if I'm the only person in the world who uses Over(), but I won't be. It's still the ideal option in our particular scenario.