May 3, 2012 at 7:38 am
blandry (5/3/2012)
Hi Cadavre, SUCCESS! Your first block of code works - no errors.
Do I need to try the second block if the first one works? (For now, I won't - but if you want me to, just say so.)
Thanks very very much.
Awesome. I'd try both as a performance test.
OK, assuming the results are correct (you'll have to verify that on your end), can you post the actual execution plan for me?
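For anyone following along, one way to capture what's being asked for here is to turn on the runtime statistics before running the query (a minimal sketch; the "Include Actual Execution Plan" toggle in SSMS, Ctrl+M, does the same job graphically):
-- SET STATISTICS XML ON returns the actual execution plan as an extra
-- XML result set; IO and TIME report the reads and CPU/elapsed times
SET STATISTICS IO, TIME ON;
SET STATISTICS XML ON;
-- ... run the query being tested here ...
SET STATISTICS XML OFF;
SET STATISTICS IO, TIME OFF;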
May 3, 2012 at 7:42 am
blandry (5/3/2012)
Lynn, I read the link below your signature with Jeff's recommendations - that's not going to work. First, we are not talking about one table - I would have many. Second, the "posting" (i.e., visible to the world) of data is what is heavily frowned upon here - I would really be risking my job on that. Among our clients are some of the largest banks and investment houses, and though we do some obfuscating of data, we got into trouble for publicly posting some of that stuff, so I simply cannot do that. It's silly, it's ridiculous - after all, it's not even financial data - but recently one big client saw just a filename and knew it was theirs, and that created a stir here bigger than Occupy Wall Street.
You could, you know... mock up fake data that's entirely made up off the cuff. Then, based on the output, use that same data to show what the output should look like. 🙂
May 3, 2012 at 7:45 am
mtassin (5/3/2012)
You could, you know... mock up fake data that's entirely made up off the cuff. Then, based on the output, use that same data to show what the output should look like. 🙂
+1
If you can do what mtassin suggests, that would be awesome. As I said when I posted my "solution", it's ugly and needs plenty of tidying up. The actual execution plan will help some with that, but readily consumable sample data, DDL scripts, and the expected outcome based on the sample data would be much more helpful.
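For reference, something like the following is all that "readily consumable" means - a cut-down table and a handful of rows (a sketch only; the column names are lifted from the query posted earlier in the thread, and the data types are assumptions):
-- Hypothetical, trimmed-down stand-in for dbo.Files; types are guesses
CREATE TABLE #Files
(
FileName VARCHAR(255),
FileType VARCHAR(10),
LastModified DATETIME,
GroupID INT
);
INSERT INTO #Files (FileName, FileType, LastModified, GroupID)
SELECT 'ReportA.xls', '.XLS', '2011-10-03', 1 UNION ALL
SELECT 'MemoB.doc', '.DOC', '2011-10-15', 1 UNION ALL
SELECT 'DeckC.ppt', '.PPT', '2011-11-02', 2;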
May 3, 2012 at 8:17 am
A couple things...
Cadavre, the code produces no errors, but it also produces no results. Or, better said, all the counts are NULL. Running the same data through the older, drawn-out proc, I get results; your code produces all NULLs, no counts.
It would also be nice to have the data ordered by the ModDate - but I get errors each time I try to add an ORDER BY clause.
As far as mtassin's idea with data - this would be at least a half-day, if not all-day, project. Again, we are not talking about one table - we are talking about many of them, and each one has to have corresponding key-dependent data elsewhere. Our apps handle this nicely, but putting together data manually always winds up being a massive project - this is why we obfuscate. I am offering a BAK file with good test data; this is a no-brainer. If I have to go through manually, you are looking at days to build it, days to test it, and in the end we wind up in a debugging loop just to get test data that is already being offered. If we can't do that, I accept it - I just don't understand it.
Please also bear in mind that I am trying to get my head around this code you've sent me. It's very impressive and there is a lot for me to learn there. But the bigger problem is that the code does not work - so trying to learn it and debug it as I go is absorbing quite a bit of time, and honestly, I don't think I am keeping up. I know the Boss will eventually pull the plug on the entire idea, saying, "It's not perfect, but it works," and I will probably also get, "Forget about it - we will worry about it later."
I will keep plugging away on this end to see if I can break this code down, get my head around it, and try to ease off pestering you folks so much.
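On the ORDER BY errors mentioned above: in a grouped query like the one posted, the ORDER BY has to reference either the grouped expression itself or the alias given to it in the outer SELECT, not the underlying column. A minimal, hypothetical sketch of the pattern:
-- Simplified example: group to month level, then order by the output alias
SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, LastModified), 0) AS ModMonth,
COUNT(*) AS FileCount
FROM dbo.Files
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, LastModified), 0)
ORDER BY ModMonth;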
May 3, 2012 at 8:34 am
Where are you hosting your .bak? I'll take a look at it sometime tonight if I can.
May 3, 2012 at 8:58 am
blandry (5/3/2012)
Lynn, I read the link below your signature with Jeff's recommendations - that's not going to work. First, we are not talking about one table - I would have many. Second, the "posting" (i.e., visible to the world) of data is what is heavily frowned upon here - I would really be risking my job on that. Among our clients are some of the largest banks and investment houses, and though we do some obfuscating of data, we got into trouble for publicly posting some of that stuff, so I simply cannot do that. It's silly, it's ridiculous - after all, it's not even financial data - but recently one big client saw just a filename and knew it was theirs, and that created a stir here bigger than Occupy Wall Street.
I would be okay sharing some sample data with you, even give you a copy of the DB - but I cant post it - I can only leave it for you. If we cant pull that off, I understand, but thats my only option.
I know others have already said this, but I'll stress it again. Jeff does show how to pull actual data, but when we ask for sample data, that is what we want: sample data - data that isn't live production data. We understand the sensitive nature of a company's data and DON'T want you to post that. When you pull real data, the next step (probably not discussed) is sanitizing it so that it isn't identifiable. Hmm, perhaps that would make a good companion article to Jeff's.
To really help you, we do need the tables, views, functions, stored procedures, and data - everything that would help us work with you to solve the problem.
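As a rough illustration of that sanitizing step (a sketch only, assuming a FileName column like the one in the query earlier in the thread - not anyone's actual cleansing routine):
-- Purely illustrative: overwrite identifiable file names with sequential
-- ones, keeping the extension so per-type counts behave exactly as before
WITH numbered AS
(
SELECT FileName, FileType,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM dbo.Files
)
UPDATE numbered
SET FileName = 'File_' + CAST(rn AS VARCHAR(10)) + FileType;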
May 3, 2012 at 9:00 am
blandry (5/3/2012)
A couple things... Cadavre, the code produces no errors, but it also produces no results. Or, better said, all the counts are NULL. Running the same data through the older, drawn-out proc, I get results; your code produces all NULLs, no counts.
It would also be nice to have the data ordered by the ModDate - but I get errors each time I try to add an ORDER BY clause.
As far as mtassin's idea with data - this would be at least a half-day, if not all-day, project. Again, we are not talking about one table - we are talking about many of them, and each one has to have corresponding key-dependent data elsewhere. Our apps handle this nicely, but putting together data manually always winds up being a massive project - this is why we obfuscate. I am offering a BAK file with good test data; this is a no-brainer. If I have to go through manually, you are looking at days to build it, days to test it, and in the end we wind up in a debugging loop just to get test data that is already being offered. If we can't do that, I accept it - I just don't understand it.
Please also bear in mind that I am trying to get my head around this code you've sent me. It's very impressive and there is a lot for me to learn there. But the bigger problem is that the code does not work - so trying to learn it and debug it as I go is absorbing quite a bit of time, and honestly, I don't think I am keeping up. I know the Boss will eventually pull the plug on the entire idea, saying, "It's not perfect, but it works," and I will probably also get, "Forget about it - we will worry about it later."
I will keep plugging away on this end to see if I can break this code down, get my head around it, and try to ease off pestering you folks so much.
How big is the .bak file? Can you zip it, say using 7zip?
May 3, 2012 at 9:05 am
Lynn Pettis (5/3/2012)
I know others have already said this, but I'll stress it again. Jeff does show how to pull actual data, but when we ask for sample data, that is what we want: sample data - data that isn't live production data. We understand the sensitive nature of a company's data and DON'T want you to post that. When you pull real data, the next step (probably not discussed) is sanitizing it so that it isn't identifiable. Hmm, perhaps that would make a good companion article to Jeff's.
To really help you, we do need the tables, views, functions, stored procedures, and data - everything that would help us work with you to solve the problem.
Exactly as Lynn has said here. From looking at some of these queries, we're talking about 5 or 6 tables.
We don't need 5 million rows in each table... 3 or 4 should suffice.
May 3, 2012 at 10:33 am
Cadavre / Lynn,
Please check your private message - link and info are in there. BAK file is < 50MB, let me know if you have any trouble getting it.
As always, many thanks - I realize we are not paying you guys for your time and expertise (yet?), so please, get to this when you can - it is very much appreciated, and I respect that you probably have lives outside of SSC (you do, right???)
Let me know if you have any needs or problems.
Again, many thanks!
May 3, 2012 at 2:30 pm
blandry (5/3/2012)
Cadavre / Lynn, Please check your private message - link and info are in there. BAK file is < 50MB, let me know if you have any trouble getting it.
As always, many thanks - I realize we are not paying you guys for your time and expertise (yet?), so please, get to this when you can - it is very much appreciated, and I respect that you probably have lives outside of SSC (you do, right???)
Let me know if you have any needs or problems.
Again, many thanks!
I've not received a message, so I've not been able to look at it this evening. I'll happily have a look at the weekend, though, if you can send it through (bank holiday on Monday, so I have a whole day with nothing to do 😀 )
May 4, 2012 at 8:57 am
Cadavre...
Just sent you a private msg. Pick it up when you can and let me know if there are problems.
Thanks!
May 4, 2012 at 9:15 am
Downloaded and restored your DB last night. Went to bed right after - long day.
May 4, 2012 at 9:22 am
Just started the download now; will take a look on Sunday or Monday, depending on time.
May 4, 2012 at 10:08 am
Lynn / Cadavre,
No problem - if you can look stuff over in a couple days, that is just fine - this is not an emergency issue.
Thanks for the heads up though, and as well, for your efforts on our behalf.
May 4, 2012 at 2:24 pm
blandry (5/4/2012)
Lynn / Cadavre, No problem - if you can look stuff over in a couple days, that is just fine - this is not an emergency issue.
Thanks for the heads up though, and as well, for your efforts on our behalf.
Straight off the bat, I can fix the code I gave you.
DECLARE @ImportType INT = 7001, @ImportID CHAR(2) = '03',
@ImpDateFrom DATETIME = '2011-12-01', @ImpDateTo DATETIME = '2011-12-31',
@DateFrom DATETIME = '2010-01-01', @DateTo DATETIME = '2011-12-31',
@FileType VARCHAR(25) = '.XLS,.DOT,.DOC,.PPT,.MBD';
-- Inline tally (numbers) table: CTE has 2 rows and each cross join squares
-- the row count, so CTE5 yields month offsets 0 through 65,536
WITH CTE(n) AS(SELECT 1 UNION ALL SELECT 1),
CTE2(n) AS (SELECT 1 FROM CTE x, CTE y), -- 4 rows
CTE3(n) AS (SELECT 1 FROM CTE2 x, CTE2 y), -- 16 rows
CTE4(n) AS (SELECT 1 FROM CTE3 x, CTE3 y), -- 256 rows
CTE5(n) AS (SELECT 0 UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM CTE4 x, CTE4 y) -- 256 * 256 rows, plus the 0
SELECT
-- floor each tally offset to the first day of its month
DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0) AS modifiedMonth,
SUM(IsAccess) AS AccessCount,
SUM(IsExcel) AS ExcelCount,
SUM(IsWord) AS WordCount,
SUM(IsPowerPoint) AS PowerPointCount
FROM CTE5 a
LEFT OUTER JOIN (SELECT
f.FileName,
f.LastModified,
DATEADD(MONTH, DATEDIFF(MONTH, 0, f.LastModified), 0) AS monthModified,
DATEADD(MONTH, DATEDIFF(MONTH, 0, MIN(f.LastModified) OVER()), 0) AS minMonthModified,
DATEADD(MONTH, DATEDIFF(MONTH, 0, MAX(f.LastModified) OVER()), 0) AS maxMonthModified,
f.FileType,
sf.ImportType,
sf.ImportID,
ft.Application,
CASE WHEN ft.Application = 'Access' THEN 1 ELSE 0 END AS IsAccess,
CASE WHEN ft.Application = 'Excel' THEN 1 ELSE 0 END AS IsExcel,
CASE WHEN ft.Application = 'Word' THEN 1 ELSE 0 END AS IsWord,
CASE WHEN ft.Application = 'PowerPoint' THEN 1 ELSE 0 END AS IsPowerPoint
FROM dbo.Files f
INNER JOIN dbo.Groups g ON f.GroupID = g.GroupID
INNER JOIN dbo.SourceFiles sf ON g.BatchID = sf.BatchID
INNER JOIN dbo.Import i ON sf.ImportID = i.ImportID
INNER JOIN dbo.FileTypes ft ON f.FileType = ft.FileExtension
WHERE f.FileType IN (SELECT * FROM dbo.fn_Split(@FileType, ',')) AND
f.LastModified > DATEADD(DAY, -1, CONVERT(DATETIME, @DateFrom, 102)) AND
f.LastModified < DATEADD(DAY, 1, CONVERT(DATETIME, @DateTo, 102)) AND
sf.ImportDate > DATEADD(DAY, -1, CONVERT(DATETIME, @ImpDateFrom, 102)) AND
sf.ImportDate < DATEADD(DAY, 1, CONVERT(DATETIME, @ImpDateTo, 102)) AND
CHARINDEX(dbo.PADL(sf.ImportId, 2, '0'), @ImportID) > 0 AND
sf.ImportType = @ImportType) b ON DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0) = b.monthModified
WHERE DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0) >= minMonthModified AND
DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0) <= maxMonthModified
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0)
ORDER BY DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', DATEADD(M,a.n,'1900-01-01')), 0);
Produces: -
modifiedMonth AccessCount ExcelCount WordCount PowerPointCount
----------------------- ----------- ----------- ----------- ---------------
2010-01-01 00:00:00.000 0 27 1 0
2010-02-01 00:00:00.000 0 16 5 0
2010-03-01 00:00:00.000 0 86 1 0
2010-04-01 00:00:00.000 0 6 12 14
2010-05-01 00:00:00.000 0 1 0 0
2010-06-01 00:00:00.000 0 0 0 2
2010-07-01 00:00:00.000 0 1 0 0
2010-09-01 00:00:00.000 0 701 0 0
2010-10-01 00:00:00.000 0 34 0 0
2010-11-01 00:00:00.000 0 14 10 3
2011-03-01 00:00:00.000 0 9 706 704
2011-05-01 00:00:00.000 0 21 1 2
2011-06-01 00:00:00.000 0 8 3 0
2011-07-01 00:00:00.000 0 0 1 0
2011-08-01 00:00:00.000 0 6 0 0
2011-09-01 00:00:00.000 0 13 6 7
2011-10-01 00:00:00.000 0 1054 18 1
2011-11-01 00:00:00.000 0 0 2 0
Table 'Worktable'. Scan count 3, logical reads 7143, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FileTypes'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Files'. Scan count 14, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Groups'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Import'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SourceFiles'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4CF5691D'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 62 ms, elapsed time = 66 ms.
Will do some performance improvements before I'm done, but that should give you some indication of a set-based solution over a cursor or while loop 😀
What exactly do we want to return? I remember we wanted the counts of the file types, what else?
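One note for anyone replaying this against their own schema: the query leans on two helper functions, dbo.fn_Split and dbo.PADL, that never appear in the thread. A minimal sketch of what they might look like (assumptions only - the originals may well differ):
-- Assumed behaviour: returns one row per delimited item in @List
CREATE FUNCTION dbo.fn_Split (@List VARCHAR(8000), @Delim CHAR(1))
RETURNS TABLE
AS
RETURN
(
SELECT SUBSTRING(@List, n, CHARINDEX(@Delim, @List + @Delim, n) - n) AS Item
FROM (SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.all_columns a CROSS JOIN sys.all_columns b) tally
WHERE n <= LEN(@List) AND SUBSTRING(@Delim + @List, n, 1) = @Delim
);
GO
-- Assumed behaviour: left-pads @Value with @PadChar to @Len characters,
-- e.g. dbo.PADL('3', 2, '0') = '03'
CREATE FUNCTION dbo.PADL (@Value VARCHAR(100), @Len INT, @PadChar CHAR(1))
RETURNS VARCHAR(100)
AS
BEGIN
RETURN RIGHT(REPLICATE(@PadChar, @Len) + @Value, @Len);
END;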