February 27, 2011 at 8:58 am
daveriya (2/25/2011)
But I am getting a result. What's wrong with this? Tell me.
How can you tell which row is for registered, and which is not registered?
daveriya (2/27/2011)
group by of what? registered?
Yes.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
February 27, 2011 at 9:43 am
Craig Farrell (2/25/2011)
My question, and yes it's a serious one, not just a rant:
2) At the end of each month, a new table is created for each bank that contains monthly metrics consolidated at the account level.
The table naming convention is bankX_YYYYMM where X represents the numeric designation of the bank and YYYYMM indicates the 4 digit year and 2 digit month.
Are they STILL teaching this horrible mechanic to people?! Is this seriously what the book is teaching? If so, please give me the ISBN, I'd like to write a few scorching reviews. That technique should only be taught in advanced courses under the header of "Break only in emergency".
Interesting... why do you think this is such a bad practice, especially if you don't have the Enterprise Edition? What would you use instead of this practice? And, no... I'm not taking a pot shot at you or anything of the like. I'm really curious as to what people would use as an alternative, especially since using a single table over time could result in a very, very large number of rows, indeed.
Yes, I know that Table Partitioning is the way to go but only if you have the Enterprise Edition and not everyone has that luxury.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 10:06 am
For someone who "needs to solve [this] on his own", you sure got the newsgroup to do a lot of your homework. The problem with that is: who is going to sit next to you and take the exam? Just about every database exam I have ever taken has had a substantial section on paper... mappings, ERDs, and all that jazz. Hope you find your book soon, and have the nerve to actually open it.
The problem with cheating on homework is that it's where you learn, and having someone else give you the answer completely defeats the purpose.
February 27, 2011 at 10:13 am
This is not my homework, and this forum is about helping each other; I am learning so many things.
And I am learning through this blog.
February 27, 2011 at 12:40 pm
One way to show whether it's registered or not:
SELECT SUM(num_trans) AS 'No of Transactions', SUM(spend) AS 'Total Spend', 'Registered'
FROM (
select num_trans, spend from bank1_201001 where registered = 1
union all
select num_trans, spend from bank2_201001 where registered = 1
) registered
union all
SELECT SUM(num_trans) AS 'No of Transactions', SUM(spend) AS 'Total Spend', 'Unregistered'
FROM (
select num_trans, spend from bank1_201001 where registered = 0
union all
select num_trans, spend from bank2_201001 where registered = 0
) nonregistered
A GROUP BY clause doesn't fit anywhere here.
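For what it's worth, a single grouped query can produce the same result if the two banks are combined first. This is only a sketch: it assumes the same bank1_201001/bank2_201001 tables as above and that registered is stored as 0/1.

```sql
-- Sketch: combine both banks, then group by the registered flag.
-- Assumes bank1_201001 and bank2_201001 each have registered, num_trans, spend.
SELECT CASE registered WHEN 1 THEN 'Registered' ELSE 'Unregistered' END AS [Status],
       SUM(num_trans) AS [No of Transactions],
       SUM(spend)     AS [Total Spend]
FROM (
    SELECT registered, num_trans, spend FROM bank1_201001
    UNION ALL
    SELECT registered, num_trans, spend FROM bank2_201001
) AS banks
GROUP BY registered;
```

This trades the four table scans of the two-branch version for two, at the cost of carrying the registered column through the derived table.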
February 27, 2011 at 1:08 pm
daveriya (2/27/2011)
This is not my homework, and this forum is about helping each other; I am learning so many things, and I am learning through this blog.
You have to pardon us, daveriya ... it DOES look like homework. 🙂 Although it's true that this forum is about helping each other, it's patently not about someone doing someone else's homework or helping them get a job even though they know nothing. And, no, I'm not accusing you of any of that. I'm just letting you know that it has happened in the past, and so people are a whole lot leery of someone who asks too many of the wrong kind of questions.
As you said, you're new at SQL and you're trying to learn. None of us has a problem with that, and my hat is off to you, especially if you're trying to self-study. You could, however, alleviate some of those fears by telling people where you got the questions you posted. It's just a suggestion. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 1:17 pm
daveriya (2/27/2011)
This is not my homework, and this forum is about helping each other; I am learning so many things, and I am learning through this blog.
Please clarify the contradiction between your aforementioned statement and your comment from one of your other posts
i have a deadline on Monday and if I don't complete it, I will fail.
@jeff: a post history check might help to put this thread into a different perspective... 😉
February 27, 2011 at 2:10 pm
daveriya (2/27/2011)
@jeff, well dear, if you don't want to help me, just stay out of this matter, I don't mind.
I am not begging you to help me.
Thank you.
Heh... first, don't call me "dear". You don't know me well enough.
Second, I WAS trying to help you but you just bit the hand that was trying to feed you. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 2:28 pm
LutzM (2/27/2011)
Please clarify the contradiction between your aforementioned statement and your comment from one of your other posts:
i have a deadline on Monday and if I don't complete it, I will fail.
@jeff: a post history check might help to put this thread into a different perspective... 😉
I usually don't make those types of checks. 😉 Thanks for pointing it out. Heh... maybe it's not homework but it sure is a life test. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 2:30 pm
Jeff Moden (2/27/2011)
Interesting... why do you think this is such a bad practice, especially if you don't have the Enterprise Edition? What would you use instead of this practice? And, no... I'm not taking a pot shot at you or anything of the like. I'm really curious as to what people would use as an alternative, especially since using a single table over time could result in a very, very large number of rows, indeed.
Yes, I know that Table Partitioning is the way to go but only if you have the Enterprise Edition and not everyone has that luxury.
Sorry, Jeff, I missed this comment amongst the other items occurring in this thread.
Heh, nah, if you were taking a pot shot at me you'd use much stronger terms. 🙂 No worries there.
Most circumstances I've found in manual partitioning end up with a structure that doesn't require dynamic SQL, but it does require an understanding of what you're building for. The structure I take particular umbrage at is the 'one table per month' structure, an automated method that also gets out of hand eventually. It requires constant dynamic SQL as well as making optimization painful, at best.
In my personal experimentations, I've found one of two methods to be most useful, but it depends on usage of the data. If it's rarely used (say once a day), a historic table with a temp table structure to pop the necessary data from date to date and then used from there can work rather effectively against the entire history.
The other option is data redundancy, which I'm aware is against standard best practice in general, but for cases like this it at least allows you to control optimization more efficiently. In most cases, after a review of the data for companies that need histories like this, they've got an area of 80% heavy usage, about 10-15% of last-year/this-year marketing usage, and ~5% of real historical data requirements. I've been in a few scenarios where the heavy-usage category goes up to 95%.
Either way, in this case, a simple maintenance structure hooked to a daily data removal/archiving helps. For example, you'd set up three tables, RecentData, MarketCompare, and Archive. Recent Data would be used for most things, MarketCompare for the majority of your marketing items, and Archive if you needed to yank out some historicals. An IF control on a wrapper proc then would execute the necessary query against whichever three tables actually needed your data from after evaluating the parameters. These tables will usually have completely different indexing mechanics as well, since they're typically used for completely different functions.
To keep something like this intact, you would, by either trigger or sproc, insert any new row to all three tables, thus causing the replication. Run a daily job to delete any row past a certain date/matching certain criteria, to keep your RecentData and your MarketCompare on track.
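A minimal sketch of that fan-out, assuming hypothetical Sales columns; the table names RecentData, MarketCompare, and Archive come from the description above, and everything else here is invented for illustration:

```sql
-- Hypothetical wrapper proc: a single insert is replicated to all three copies.
CREATE PROCEDURE dbo.AddSale
    @SaleID   INT,
    @SaleDate DATETIME,
    @Amount   DECIMAL(9,2)
AS
BEGIN
    INSERT INTO dbo.RecentData    (SaleID, SaleDate, Amount) VALUES (@SaleID, @SaleDate, @Amount);
    INSERT INTO dbo.MarketCompare (SaleID, SaleDate, Amount) VALUES (@SaleID, @SaleDate, @Amount);
    INSERT INTO dbo.Archive       (SaleID, SaleDate, Amount) VALUES (@SaleID, @SaleDate, @Amount);
END;
GO
-- Hypothetical daily job: trim the two "hot" copies back to their windows.
-- (Window lengths are placeholders; the real cutoffs depend on the business rules.)
DELETE FROM dbo.RecentData    WHERE SaleDate < DATEADD(dd, -90, GETDATE());
DELETE FROM dbo.MarketCompare WHERE SaleDate < DATEADD(yy, -2,  GETDATE());
```

The Archive copy is never trimmed, which is where the deliberate redundancy (and the database bloat mentioned below) comes from.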
It's obviously a workaround to proper enterprise partitioning, but it's usable all the way down to lowly MS Access levels, and keeps your maintenance of the data to the data layer, not the schema layer.
This causes some database bloat, obviously, but it can be kept to a reasonable level.
However, this does skip the crux of your comment. When data has reached this volume, the proper course is to move to an Enterprise level of whatever your software choice is (MS SQL we hope!), allowing for proper data partitioning and similar items.
The entire idea of constantly regenerating new schema via maintenance bugs me at a visceral level as well, so perhaps my initial reaction was a tad strong. My apologies if anyone took offense; I was annoyed at the book, not the OP. I do hope that was clear.
EDIT: I should mention again, this requires the developer to understand the data and the intended business usage, research we might not have time to invest in.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
February 27, 2011 at 3:26 pm
Craig Farrell (2/27/2011)
Jeff Moden (2/27/2011)
Interesting... why do you think this is such a bad practice, especially if you don't have the Enterprise Edition? What would you use instead of this practice? And, no... I'm not taking a pot shot at you or anything of the like. I'm really curious as to what people would use as an alternative, especially since using a single table over time could result in a very, very large number of rows, indeed.
Yes, I know that Table Partitioning is the way to go but only if you have the Enterprise Edition and not everyone has that luxury.
Sorry, Jeff, I missed this comment amongst the other items occurring in this thread.
Heh, nah, if you were taking a pot shot at me you'd use much stronger terms. 🙂 No worries there.
I never know how someone will take my rather direct questions because it's difficult to write the correct "tone" into such posts. Thanks for taking it the right way.
Here's my humble opinion on the subject and, yes, I agree... "It Depends".
This requires constant dynamic SQL as well as making optimization painful, at best.
We had a system which not only made monthly tables, but daily tables (1 for each bill cycle), and each table was in a separate daily database to boot. Yeah, I know... that's a bit over the top as far as I was concerned (and, no, I didn't design the system). The existence of multiple tables didn't require "constant" dynamic SQL for the queries any more than monthly table partitioning would. We simply rebuilt a small handful of "pass through" views once a month to look 30, 60, 90, and 180 days back, as well as a year back (all required for certain "aging" processes). Anything older than 1 year 4 months (so we could have time to do year-end runs) would go through a final backup, be added to a summary table, and dropped.
We didn't just do this in a willy-nilly fashion, either. Each table had the required constraint so that it would work as a real "partitioned view" which doesn't quite live up to being a partitioned table but does have a huge number of advantages such as not having to reindex or reorg all rows because the older tables never got updated (thus, neither did their separate indexes) and the fact that no matter what the query was, if it only addressed the most recent data, it would only use the most recent table.
I guess the key for us was making the monthly views and we had a job that did that automatically.
It also made for MUCH quicker restores if we ever had a problem with the main database because we didn't have to restore so much legacy data (about 4 million rows of data per day). 😀 Of course, backups were a relative breeze. As a good friend of mine said, "When you do the laundry, do you wash the clean clothes?". Of course not. That's why we only backed up databases that had changed or added data, and having them split up, like we did, saved a huge amount of tape and time. 🙂
On the flip side, I've also seen where people very effectively use a single large table of millions and millions of rows. However, they still have major problems when they need to do something like add a new column or index. Also, unless they have the Enterprise Edition, I've also seen how long it takes to rebuild the indexes on such a large table when needed. Having the data split up into monthly or even daily databases or tables sure makes life a lot easier for something like that especially in a high transaction 24/7 application.
To summarize, I'm not so much against monthly or even daily tables and databases IF they're done in a proper fashion.
Thanks for the feedback, Craig. I sure do appreciate it.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 6:12 pm
Jeff Moden (2/27/2011)
I never know how someone will take my rather direct questions because it's difficult to write the correct "tone" into such posts. Thanks for taking it the right way.
Here's my humble opinion on the subject and, yes, I agree... "It Depends".
Humble, perhaps, but wise and experienced. So, I'll digress this thread a bit more, if you're willing, to discuss this further.
We had a system which not only made monthly tables, but daily tables (1 for each bill cycle), and each table was in a separate daily database to boot. Yeah, I know... that's a bit over the top as far as I was concerned (and, no, I didn't design the system). The existence of multiple tables didn't require "constant" dynamic SQL for the queries any more than monthly table partitioning would. We simply rebuilt a small handful of "pass through" views once a month to look 30, 60, 90, and 180 days back, as well as a year back (all required for certain "aging" processes). Anything older than 1 year 4 months (so we could have time to do year-end runs) would go through a final backup, be added to a summary table, and dropped.
So, instead of having the procedure add UNION ALLs for only the tables that would be required on a per-call basis, you standardized certain views for certain distances against the tables? Out of curiosity, what mechanic did you use to decide which views should be used, and were these indexed views? It sounds somewhat similar to my intent, though you had a few more of them; if they were unindexed, as it sounds from the next section, at least you didn't have data bloat.
We didn't just do this in a willy-nilly fashion, either. Each table had the required constraint so that it would work as a real "partitioned view" which doesn't quite live up to being a partitioned table but does have a huge number of advantages such as not having to reindex or reorg all rows because the older tables never got updated (thus, neither did their separate indexes) and the fact that no matter what the query was, if it only addressed the most recent data, it would only use the most recent table.
This is functionality that, when I read this, I'd never seen SQL do. So, I decided a quick test was in order. Unfortunately, I can't seem to reproduce what I understand you to be saying: that if you union your 30 days of the month, and only want data from one day, you won't hit all the tables. Perhaps you can help me? I've shown the code below from my brief test, and attached the .sqlplan I get from both the full UNION ALL and the specific SELECT for a range from the data.
Because of the necessity to still touch all the tables, I'm not sure of the gain.
IF OBJECT_ID ( 'tblUnionTest_a') IS NOT NULL
DROP TABLE tblUnionTest_a
IF OBJECT_ID ( 'tblUnionTest_b') IS NOT NULL
DROP TABLE tblUnionTest_b
IF OBJECT_ID ( 'tblUnionTest_c') IS NOT NULL
DROP TABLE tblUnionTest_c
CREATE TABLE tblUnionTest_a (txt VARCHAR(100), dtTxt DATETIME)
CREATE TABLE tblUnionTest_b (txt VARCHAR(100), dtTxt DATETIME)
CREATE TABLE tblUnionTest_c (txt VARCHAR(100), dtTxt DATETIME)
INSERT INTO tblUnionTest_a
SELECT
NEWID(),
DATEADD( dd, -N, getdate())
FROM
tempdb..Tally
WHERE
N<= 10
INSERT INTO tblUnionTest_b
SELECT
NEWID(),
DATEADD( dd, -( N + 30), getdate())
FROM
tempdb..Tally
WHERE
N<= 10
INSERT INTO tblUnionTest_c
SELECT
NEWID(),
DATEADD( dd, - ( N + 60), getdate())
FROM
tempdb..Tally
WHERE
N<= 10
select * from tblUnionTest_a UNION ALL
select * from tblUnionTest_b UNION ALL
select * from tblUnionTest_c
GO
create view vwUnionTest
AS
select * from tblUnionTest_a UNION ALL
select * from tblUnionTest_b UNION ALL
select * from tblUnionTest_c
GO
SELECT * from vwUnionTest
WHERE dtTxt BETWEEN DATEADD( dd, -45, getdate()) and dateadd( dd, -30, getdate())
CREATE CLUSTERED INDEX idx_tblUnionTest_a ON tblUnionTest_a (dtTxt)
CREATE CLUSTERED INDEX idx_tblUnionTest_b ON tblUnionTest_b (dtTxt)
CREATE CLUSTERED INDEX idx_tblUnionTest_c ON tblUnionTest_c (dtTxt)
SELECT * from vwUnionTest
WHERE dtTxt BETWEEN DATEADD( dd, -45, getdate()) and dateadd( dd, -30, getdate())
I guess the key for us was making the monthly views and we had a job that did that automatically.
It also made for MUCH quicker restores if we ever had a problem with the main database because we didn't have to restore so much legacy data (about 4 million rows of data per day). 😀 Of course, backups were a relative breeze. As a good friend of mine said, "When you do the laundry, do you wash the clean clothes?". Of course not. That's why we only backed up databases that had changed or added data and having them split up, like we did, save a huge amount of tape and time. 🙂
Well, I could certainly see that as a value to having every table in its own database, but wouldn't it have been simpler to filegroup them, if you were generating separate mdf/ldf combos anyway? Well, I guess it's a moot point, as you mentioned, a bit of overkill. I have to say that would certainly cut down on "ooops, can you restore x table" time, though... 🙂
Also, personally, 4 million inserts per day isn't what I'd expect a kid coming straight out of school to know how to deal with anyway. Your method and needs, in my (not always so humble, but I do try) opinion, would fall under the "in case of emergency" level, because it's a comprehensive method, not just a workaround to too much data, and you would take an end-to-end approach to using it. I've seen way too many of these implemented poorly.
On the flip side, I've also seen where people very effectively use a single large table of millions and millions of rows. However, they still have major problems when they need to do something like add a new column or index. Also, unless they have the Enterprise Edition, I've also seen how long it takes to rebuild the indexes on such a large table when needed. Having the data split up into monthly or even daily databases or tables sure makes life a lot easier for something like that especially in a high transaction 24/7 application.
To summarize, I'm not so much against monthly or even daily tables and databases IF they're done in a proper fashion.
Thanks for the feedback, Craig. I sure do appreciate it.
You're correct, and index rebuilds on your historicals certainly would take a while, but to compare the methods, it's why you would typically keep your RecentlyAdded table nice and tight. Even so, 4M rows * 30 days = 120 million rows is rarely quick to reindex, and if you're a three- or four-nines shop, you'd probably have to do some manipulation to stay within your allowances unless you're running a beast. For that, I agree with you, and there's value there. Just teaching it to students as a standard method of dividing data bugs me.
Thank you for your feedback as well, Jeff. I hope you're able to assist with my conundrum regarding the view usage. I like chewy conversations like this one. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
February 27, 2011 at 7:42 pm
So, instead of having the procedure add UNION ALLs for only the tables that would be required on a per-call basis, you standardized certain views for certain distances against the tables? Out of curiosity, what mechanic did you use to decide which views should be used, and were these indexed views?
Correct. We had standardized views that would "look back" using UNION ALLs. As to the mechanism, the databases had a date component in their name in the ISO format which made them sortable and easily selectable from (at the time) dbo.sysdatabases (it was SQL Server 2000). We had a scheduled job that ran in the first few instants of the first day of the month that would rebuild the views on the fly.
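A rough sketch of such a rebuild job, assuming dated tables named UnionTest_YYYYMM in the current database; the original worked from dated database names in dbo.sysdatabases instead, and on SQL Server 2000 you would DROP and re-CREATE the view rather than use CREATE OR ALTER (2016+):

```sql
-- Sketch: rebuild a "look back" view over the dated tables via dynamic SQL.
-- View and table names here are illustrative, not the original system's.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
            + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT RowID, SomeDate, SomeAmount FROM dbo.' + QUOTENAME(name)
FROM sys.tables
WHERE name LIKE N'UnionTest[_][0-9][0-9][0-9][0-9][0-9][0-9]'   -- e.g. UnionTest_201101
ORDER BY name;

SET @sql = N'CREATE OR ALTER VIEW dbo.UnionTest_LookBack AS ' + @sql + N';';
EXEC sys.sp_executesql @sql;
```

Scheduled in the first instants of the month, a job like this keeps the view's table list current without any dynamic SQL in the queries that use it.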
Also, personally, 4 million inserts per day isn't what I'd expect a kid coming straight out of school to know how to deal with anyway. Your method and needs, in my (not always so humble, but I do try) opinion, would fall under the "in case of emergency" level, because it's a comprehensive method, not just a workaround to too much data, and you would take an end-to-end approach to using it. I've seen way too many of these implemented poorly.
Thank you for the kind words and, yes, I agree on all fronts. I do tend to "over-think" these types of things, considering such off-the-wall topics as time to restore, tape footprint, nightly maintenance, etc, etc, ad infinitum. And I also agree that I've seen way too many of these types of things implemented very poorly. If it IS going to be taught, it needs to be taught correctly, and the option of table partitioning should be taught in parallel so people know what they're getting into.
This is a functionality that when I read this I'd never seen SQL do. So, I decided a quick test was in order. Unfortunately, I can't seem to reproduce what I'm understanding you to be saying, that if you union your 30 days of the month, and only want data from one day, you won't hit all the tables. Perhaps you can help me? I've shown the code below from my brief test, and attached the .sqlplan I get from both the full UNION ALL and the specific SELECT for a range from the data.
Because of the necessity to still touch all the tables, I'm not sure of the gain.
It's not necessary to touch all of the tables BUT... you do have to build the underlying tables and the partitioned view correctly. You MUST have a non-overlapping check constraint on the "partitioning column" and the partitioned column MUST (I've never been able to get it to work any other way) be a part of the PK (something they don't come right out and tell you in BOL).
Here's the code that demos it all including inserts that automatically partition themselves to the correct table(s).
--===== Do these tests in a nice safe place that everyone has.
USE TempDB
;
--===== Conditionally drop test tables/views to make reruns easier.
IF OBJECT_ID ( 'dbo.UnionTest_201101') IS NOT NULL DROP TABLE UnionTest_201101;
IF OBJECT_ID ( 'dbo.UnionTest_201102') IS NOT NULL DROP TABLE UnionTest_201102;
IF OBJECT_ID ( 'dbo.UnionTest_201103') IS NOT NULL DROP TABLE UnionTest_201103;
IF OBJECT_ID ( 'dbo.UnionTest_3Month') IS NOT NULL DROP VIEW UnionTest_3Month;
--===== Recreate the test tables.
-- DO notice the CHECK constraint on the SomeDate column, which
-- will serve as the PARTITIONING COLUMN. Also notice that
-- it IS included in the PK and that it's a different constraint
-- on each table.
CREATE TABLE dbo.UnionTest_201101
(
RowID UNIQUEIDENTIFIER,
SomeDate DATETIME NOT NULL CHECK (SomeDate >= '20110101' AND SomeDate < '20110201'),
SomeAmount DECIMAL(9,2) NOT NULL,
CONSTRAINT PK_UnionTest_201101 PRIMARY KEY CLUSTERED (SomeDate, RowID)
)
;
CREATE TABLE dbo.UnionTest_201102
(
RowID UNIQUEIDENTIFIER,
SomeDate DATETIME NOT NULL CHECK (SomeDate >= '20110201' AND SomeDate < '20110301'),
SomeAmount DECIMAL(9,2) NOT NULL,
CONSTRAINT PK_UnionTest_201102 PRIMARY KEY CLUSTERED (SomeDate, RowID)
)
;
CREATE TABLE dbo.UnionTest_201103
(
RowID UNIQUEIDENTIFIER,
SomeDate DATETIME NOT NULL CHECK (SomeDate >= '20110301' AND SomeDate < '20110401'),
SomeAmount DECIMAL(9,2) NOT NULL,
CONSTRAINT PK_UnionTest_201103 PRIMARY KEY CLUSTERED (SomeDate,RowID)
)
;
GO
--===== Recreate the test view
CREATE VIEW dbo.UnionTest_3Month AS
SELECT RowID, SomeDate, SomeAmount FROM dbo.UnionTest_201101 UNION ALL
SELECT RowID, SomeDate, SomeAmount FROM dbo.UnionTest_201102 UNION ALL
SELECT RowID, SomeDate, SomeAmount FROM dbo.UnionTest_201103
;
GO
--===== Define the limits of our test
DECLARE @StartDate DATETIME,
@EndDate DATETIME,
@Days INT
;
SELECT @StartDate = '20110101',
@EndDate = '20110331',
@Days = DATEDIFF(dd,@StartDate,@EndDate)
;
--===== Populate all 3 tables through the updateable view.
-- NOTICE THAT THE CORRECT DATED ROWS EACH WENT INTO
-- THE CORRECT UNDERLYING TABLE.
INSERT INTO dbo.UnionTest_3Month
(RowID, SomeDate, SomeAmount)
SELECT TOP 10000
RowID = NEWID(),
SomeDate = RAND(CHECKSUM(NEWID())) * @Days + @StartDate,
SomeAmount = RAND(CHECKSUM(NEWID())) * 100 + .01
FROM sys.All_Columns ac1
CROSS JOIN sys.All_Columns ac2
;
--===== Demonstrate that there is no misplaced data...
SELECT * FROM dbo.UnionTest_201101 WHERE MONTH(SomeDate) <> 1;
SELECT * FROM dbo.UnionTest_201102 WHERE MONTH(SomeDate) <> 2;
SELECT * FROM dbo.UnionTest_201103 WHERE MONTH(SomeDate) <> 3;
--===== ... and that each table was correctly filled.
SELECT * FROM dbo.UnionTest_201101;
SELECT * FROM dbo.UnionTest_201102;
SELECT * FROM dbo.UnionTest_201103;
--===== Now, demonstrate the view. Do this with the actual execution plan turned on.
-- You'll see that it selects from just one table even though the view has 3 tables.
SELECT * FROM dbo.UnionTest_3Month WHERE SomeDate >= '2011-03-15' AND SomeDate < '2011-03-16'
;
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2011 at 7:44 pm
Craig,
Look up Partitioned Views in Books Online.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi