November 16, 2011 at 7:04 am
I’m under a big deadline for a customer, up most of the night, and I have a T-SQL script that aggregates the departments (from 0 to 49) a customer has shopped into a BIGINT value which is meant to be a mask of all the departments the customer has shopped. There’s a lot more going on, but I created this failing script and was hoping you could look at this quickly. Why does the final value in the table show 8 instead of 15???? Any help would be most appreciated. Thanks, Don Bouchard
DECLARE @Result TABLE (CustomerNum DECIMAL(10,0), Mask BIGINT)
DECLARE @Depts TABLE (CustomerNum DECIMAL(10,0), Dept INT)
DECLARE @Big2 BIGINT
SET NOCOUNT ON
SET @Big2 = 2
-- Initialize the row for this customer to be 0
INSERT @Result ( CustomerNum, Mask )
VALUES ( 12345, 0 )
INSERT @Depts (CustomerNum, Dept) VALUES (12345, 0)
INSERT @Depts (CustomerNum, Dept) VALUES (12345, 1)
INSERT @Depts (CustomerNum, Dept) VALUES (12345, 2)
INSERT @Depts (CustomerNum, Dept) VALUES (12345, 3)
SELECT * FROM @Result
SELECT * FROM @Depts
-- I was hoping that the following statement would loop through the rows in the @Depts table and OR the values there with the current mask value in the @Results table
UPDATE @Result SET Mask = R.Mask | POWER(@Big2, D.Dept) FROM @Result R INNER JOIN @Depts D ON R.CustomerNum = D.CustomerNum
SELECT * FROM @Result
November 16, 2011 at 7:08 am
Donald Bouchard (11/16/2011)
I’m under a big deadline for a customer, up most of the night, and I have a T-SQL script that aggregates the departments (from 0 to 49) a customer has shopped into a BIGINT value which is meant to be a mask of all the departments the customer has shopped. There’s a lot more going on, but I created this failing script and was hoping you could look at this quickly. Why does the final value in the table show 8 instead of 15???? Any help would be most appreciated. Thanks, Don Bouchard
This is what I get when I run your query.
CustomerNum Mask
--------------------------------------- --------------------
12345 0
CustomerNum Dept
--------------------------------------- -----------
12345 0
12345 1
12345 2
12345 3
CustomerNum Mask
--------------------------------------- --------------------
12345 1
Please clarify.
November 16, 2011 at 7:11 am
The result you got is strange because on my sql server, I get 8 which is the last row that was inserted into the @Depts table. The bottom line is that the value I'm getting returned is just one of the department values, not all of them or'ed together which is what I would expect.
November 16, 2011 at 7:15 am
R.Mask | POWER(@Big2, D.Dept)
Why are you using Bitwise OR here? Is it required for your logic? Remove it & verify the results.
November 16, 2011 at 7:17 am
The Bitwise OR is the entire reason for the logic! I am producing a bit flag of all the departments a customer has shopped at with each bit for each department.
November 16, 2011 at 7:23 am
Donald Bouchard (11/16/2011)
The Bitwise OR is the entire reason for the logic! I am producing a bit flag of all the departments a customer has shopped at with each bit for each department.
Out of my league... Never understood BIT operations... 😀
November 16, 2011 at 7:40 am
Are you trying to update the @Result table with an aggregate from the @Depts table? If so, you need to change your update since you have multiple rows for the CustNum. The final result is just going to be one of the records from the @Depts table. To see what I'm talking about, just try
UPDATE @Result
SET Mask = D.Dept
FROM @Result R
INNER JOIN @Depts D
ON R.CustomerNum = D.CustomerNum
And you'll most likely get 0 as the Mask value. But change the order of the inserts into the @Depts table and you'll get another result. You need to do your aggregate from the @Depts table first (maybe a subquery) so that you have one row for each CustomerNum and then apply your update to the @Results table. IIRC, most other SQL flavors will error if you don't have a 1 to 1 match on an update, but that T-sql lets you get away with it.
MWise
November 16, 2011 at 7:42 am
Bitwise ORing powers of 2 is just like adding them. You can try this:
update R
set Mask=(select sum(POWER(@Big2, D.Dept)) from @Depts D where D.CustomerNum = R.CustomerNum)
from @Result R
November 16, 2011 at 7:44 am
No, they are not just like adding, close but not exact. If I had multiple rows with the same department number, it would not work. Two records with a department number of 2 would give the result of 4 instead of the desired result of 2.
November 16, 2011 at 7:47 am
In this case you can use "distinct":
update R
set Mask=(select sum(distinct POWER(@Big2, D.Dept)) from @Depts D where D.CustomerNum = R.CustomerNum)
from @Result R
November 16, 2011 at 7:53 am
That UPDATE statement won't loop through the rows in @Dept - it will use the first row in @Depts that it finds and then stop. That explains why the value you got is different from the one Cadavre got. If you want to loop, you have to use a loop - and that means a cursor or a WHILE loop. I'm sure someone will correct me if there's a way of aggregating ORs that would make a loop unnecessary. Maybe a recursive CTE would do the trick, although I couldn't guarantee that would perform any better than a loop.
Having said all that, you may wish to reconsider your use of bitmasks, since it breaks the rules of normalisation by storing more than one value in the same column, as well as getting you into situations like this one!
John
November 16, 2011 at 7:53 am
Is the Dept field also supposed to be a mask? Meaning if the customer shopped in both departments 1 and 2 then he should have a value of 3 (or in binary 11). If Dept is a mask then why would you have 4 records for the same customer? I haven't gotten to thinking of a solution, still trying to understand the issue.
November 16, 2011 at 8:18 am
The department number field is not a bit mask but simply a number between 0 and 49 (actually 99 in the real world). My actual problem is the fact that I have millions of sales rows with customernum and departmentnum fields that I need to aggregate in some way. My actual goal is to have a table called 'DeptsShopped' with a single record for each customer that will show which departments that customer has shopped in ever. I understand the other poster's comments about normalization, but sometimes the real world demands exceptions. My actual problem is that I have departments that go from 0 to 99 thus requiring me to have 2 64bit mask values with the first containing a bit mask for departments numbered 0 thru 49 and the second for departments numbered 50 thru 99. With my approach, once the DeptsShopped table is created (that's where I'm having my current problems!), asking questions like 'Show me all customers that have shopped in departments 1 and (2 or 3)?' is easy.
SELECT CustomerNum FROM DeptsShopped WHERE Mask & 2 <> 0 AND (Mask & 4 <> 0 OR Mask & 8 <> 0)
versus the following code if I didn't have the aggregated bit mask records:
SELECT DISTINCT CustomerNum FROM RawSalesData RSD
WHERE (SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 1 AND CustomerNum = RSD.CustomerNum) > 0) AND
((SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 2 AND CustomerNum = RSD.CustomerNum) > 0) OR
(SELECT COUNT(*) FROM RawSalesData WHERE DepartmentNum = 3 AND CustomerNum = RSD.CustomerNum) > 0))
Seems to me that my first query above will perform a heck of a lot faster...(assuming I ignore the normalization rule)
November 16, 2011 at 8:24 am
Why not have one row in the DeptShopped table for each customer who has shopped in each department? That's the tried and trusted way of doing it, and it means no need for bitmasks! I appreciate that you can't always change the design and you sometimes have to work with what you've got, but if you can change it, it's certainly worth considering.
John
November 16, 2011 at 8:31 am
I understand and agree with all your points, but if I have a million customers that have shopped on average in about 10 departments each, I'm going to be doing scans of 10,000,000 records to get my results versus a scan of 1,000,000 records. I realize I could index the DepartmentNum field but still I'm (actually SQL Server is) going to be doing a lot of data reading using the standard approach to answer the question I posed in my earlier post. I always fell conflicted about these kind of problems. On one hand, there is the desire to follow standard db design rules and on the other hand, there is the need to ensure you're not following a rule that makes the user experience go so very much slower...
Thanks for all of your thoughts.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply