March 18, 2014 at 9:56 am
ChrisM@Work (3/18/2014)
Although just about every reference on the first page returned by Google specifies two operands, the TSQL BITWISE OR actually works with a chain like this:
-- on a single row
SELECT 001 | 000 | 010
Result
11
Which I think can be emulated like this:
-- try row-wise
DECLARE @LastValue INT
SELECT @LastValue = 001
SELECT @LastValue = @LastValue | d.MyValue
FROM (SELECT MyValue = 000 UNION ALL SELECT 010) d
-- eyeball check
SELECT @LastValue
Result
11
- which can be converted into a msTVF.
The output should be 011, although thats is something I can handle (converting 11 to 011).
The query needs to be used like the built in aggregate function (with the possibility to group by)
March 18, 2014 at 9:56 am
joakim.fenno (3/18/2014)
ChrisM@Work (3/18/2014)
joakim.fenno (3/18/2014)
OK
CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)
INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('A', '001');
expected output from query :
A 101
B 011
The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example
You have three rows where col1 = 'A' and one row where col1 = 'B'. You have two rows in your result set, one for 'A' and one for 'B', and yet the result for 'B' is different to the value in the one row in the sample data. What is the third row in the sample data set, the row where col1 = 'B', OR'd with?
sorry, should be:
CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)
INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('B', '001');
hhmmm....that looks surprisingly like what I posted.
And the results seem to match too.
Compare my last post and this.
select 100|001, 010|001
This seems to produce the exact same output.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 18, 2014 at 9:59 am
Sean Lange (3/18/2014)
joakim.fenno (3/18/2014)
ChrisM@Work (3/18/2014)
joakim.fenno (3/18/2014)
OK
CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)
INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('A', '001');
expected output from query :
A 101
B 011
The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example
You have three rows where col1 = 'A' and one row where col1 = 'B'. You have two rows in your result set, one for 'A' and one for 'B', and yet the result for 'B' is different to the value in the one row in the sample data. What is the third row in the sample data set, the row where col1 = 'B', OR'd with?
sorry, should be:
CREATE TABLE my_table (col1 nvarchar(25) NULL, bit_flag nvarchar(3) NULL)
INSERT INTO my_table VALUES('A', '100'), ('A', '001'),('B', '010'),('B', '001');
hhmmm....that looks surprisingly like what I posted.
And the results seem to match too.
Compare my last post and this.
select 100|001, 010|001
This seems to produce the exact same output.
Yes, I made a typo.
See my comments about your query in my previous post.
March 18, 2014 at 10:10 am
Here's a solution which will work with many values. First the UDF:
ALTER FUNCTION fn_BITWISE_OR
(
@col1 VARCHAR(3)
)
RETURNS VARCHAR(3)
AS
BEGIN
-- try row-wise
DECLARE @LastValue INT
SELECT @LastValue = 000
SELECT @LastValue = @LastValue | d.bit_flag
-- substitute this for your table
FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )
-- substitute this for your table
WHERE col1 = @col1 -- filter
RETURN RIGHT('000'+CAST(@LastValue AS VARCHAR(3)),3)
END
Then, usage like this:
SELECT d.col1, dbo.fn_BITWISE_OR(d.col1)
FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )
GROUP BY col1
In both cases you will need to switch the sample data set for your actual table name, and you will also need to play with the data type, currently set to VARCHAR(3).
Note that this is a scalar udf (!). It will prevent parallelism in all or part of the plan and will be executed for each row, hence performance could become a problem.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 18, 2014 at 12:12 pm
ChrisM@Work (3/18/2014)
Here's a solution which will work with many values. First the UDF:
ALTER FUNCTION fn_BITWISE_OR
(
@col1 VARCHAR(3)
)
RETURNS VARCHAR(3)
AS
BEGIN
-- try row-wise
DECLARE @LastValue INT
SELECT @LastValue = 000
SELECT @LastValue = @LastValue | d.bit_flag
-- substitute this for your table
FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )
-- substitute this for your table
WHERE col1 = @col1 -- filter
RETURN RIGHT('000'+CAST(@LastValue AS VARCHAR(3)),3)
END
Then, usage like this:
SELECT d.col1, dbo.fn_BITWISE_OR(d.col1)
FROM (VALUES('A', '100'), ('A', '001'),('A', '010'),('B', '001')) d (col1, bit_flag )
GROUP BY col1
In both cases you will need to switch the sample data set for your actual table name, and you will also need to play with the data type, currently set to VARCHAR(3).
Note that this is a scalar udf (!). It will prevent parallelism in all or part of the plan and will be executed for each row, hence performance could become a problem.
Thanks!
I'm aware that scalar functions prevents parallel execution plans but thats somethign that I will have to deal with later on.
I will check later if it works but I doubt that its optimal in my case.
It is important that I can groupo by a number of columns and that the group by is dynamic (set of columns might differ and might be none as well).
It is good that you implement it as a function because it will be used in many different places.
However the function will be used for different tables (at least three different) so I guess it has to be more generic (table name cant be hard coded in the function).
March 18, 2014 at 2:13 pm
joakim.fenno (3/18/2014)
The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real example
Gosh. That's 24+2 or 26 bytes to store what could be stored in less than 2 (8 bits per byte). Is there some reason why the designers of this table decided to store binary data in NVARCHAR instead of BINARY(2) or even separate bit columns which allows for 8 separate bit column per byte. It would also make bit-wise coding a whole lot easier if BINARY(2) were used.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2014 at 2:46 pm
Jeff Moden (3/18/2014)
joakim.fenno (3/18/2014)
The bit_flag column contains 12 flag (and is therefore a nvarchar(12)) in the real exampleGosh. That's 24+2 or 26 bytes to store what could be stored in less than 2 (8 bits per byte). Is there some reason why the designers of this table decided to store binary data in NVARCHAR instead of BINARY(2) or even separate bit columns which allows for 8 separate bit column per byte. It would also make bit-wise coding a whole lot easier if BINARY(2) were used.
I know that and I am willing to use a binary data type instead if that makes the implementation easier.
Each bit flag represent a month and some queries might be executed using the more user friendly string field, at least in the short run.
But let’s assume that a binary data type can be used
March 18, 2014 at 5:39 pm
Here are three ways you could tackle this. I prefer number three, which is to store the month number as a tinyint.
They all produce the same result, but query 2 seems to produce the simplest plan.
use tempdb
--== the "nvarchar" version
begin try
create table my_table (col1 nvarchar(25) null, bit_flag nvarchar(3) null)
insert into my_table values('a', '100'), ('a', '001'),('b', '010'),('b', '001');
end try
begin catch
end catch;
with split_data as
(
select col1, x.n, max(x.v ) as bit_value
from my_table
cross apply (
select n, case substring(bit_flag,n,1)
when '1' then 1
else 0
end
from (values(1),(2),(3)) t(n)
) x(n,v)
group by col1, x.n
)
select col1,right('000'+cast(sum( power(10,3-n)*bit_value ) as varchar(3)),3) as col_value
from split_data
group by col1;
--== the "int" flag version
begin try
create table my_table2 (col1 nvarchar(25) null, int_flag int null)
insert into my_table2 values('a', 100), ('a', 001),('b', 10),('b', 1);
end try
begin catch
end catch;
with split_data as
(
select col1, int_flag
from my_table2
group by col1, int_flag
)
select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value
from split_data
group by col1;
--== the "tinyint" meaningful month number version
begin try
create table my_table3 (col1 nvarchar(25) null, month_number tinyint null)
insert into my_table3 values('a', 1), ('a', 3),('b', 2),('b', 1);
end try
begin catch
end catch;
with split_data as
(
select col1, month_number
from my_table3
group by col1, month_number
)
select col1,right('000'+cast(sum( power(10,month_number-1) ) as varchar(3)),3) as col_value
from split_data
group by col1;
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
March 19, 2014 at 1:32 am
mister.magoo (3/18/2014)
Here are three ways you could tackle this. I prefer number three, which is to store the month number as a tinyint.They all produce the same result, but query 2 seems to produce the simplest plan.
use tempdb
--== the "nvarchar" version
begin try
create table my_table (col1 nvarchar(25) null, bit_flag nvarchar(3) null)
insert into my_table values('a', '100'), ('a', '001'),('b', '010'),('b', '001');
end try
begin catch
end catch;
with split_data as
(
select col1, x.n, max(x.v ) as bit_value
from my_table
cross apply (
select n, case substring(bit_flag,n,1)
when '1' then 1
else 0
end
from (values(1),(2),(3)) t(n)
) x(n,v)
group by col1, x.n
)
select col1,right('000'+cast(sum( power(10,3-n)*bit_value ) as varchar(3)),3) as col_value
from split_data
group by col1;
--== the "int" flag version
begin try
create table my_table2 (col1 nvarchar(25) null, int_flag int null)
insert into my_table2 values('a', 100), ('a', 001),('b', 10),('b', 1);
end try
begin catch
end catch;
with split_data as
(
select col1, int_flag
from my_table2
group by col1, int_flag
)
select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value
from split_data
group by col1;
--== the "tinyint" meaningful month number version
begin try
create table my_table3 (col1 nvarchar(25) null, month_number tinyint null)
insert into my_table3 values('a', 1), ('a', 3),('b', 2),('b', 1);
end try
begin catch
end catch;
with split_data as
(
select col1, month_number
from my_table3
group by col1, month_number
)
select col1,right('000'+cast(sum( power(10,month_number-1) ) as varchar(3)),3) as col_value
from split_data
group by col1;
Thanks!
Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:
begin try
drop table my_table2
create table my_table2 (col1 nvarchar(25) null, int_flag int null)
insert into my_table2 values('a', 101), ('a', 1),('a', 1),('b', 1),('a', 10),('b', 1);
end try
begin catch
end catch;
select * from my_table2
order by col1
go
with split_data as
(
select col1, int_flag
from my_table2
group by col1, int_flag
)
select col1,right('000'+cast(sum( int_flag ) as varchar(3)),3) as col_value
from split_data
group by col1;
Or did I misunderstand your implementation?
1 = 001?
10 = 010?
100 = 100?
It needs to support all combination of 0 and 1
It seems like the "nvarchar" version works better.
I dont really understand the "tinyint" flag version, what do the values for the flag mean?
March 19, 2014 at 3:33 am
joakim.fenno (3/19/2014)
Thanks!
You're welcome 🙂
Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:
...
It needs to support all combination of 0 and 1
Well, if you have other requirements than the one stated - where each value only contained a flag in one position it would help if you disclosed ALL the requirements rather than drip feeding them.
It seems like the "nvarchar" version works better.
I dont really understand the "tinyint" flag version, what do the values for the flag mean?
Ignore it, now that we know you can have more than one flag in a single data item, that one is useless.
As you said, the nvarchar version will work for that.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
March 19, 2014 at 3:41 am
mister.magoo (3/19/2014)
joakim.fenno (3/19/2014)
Thanks!You're welcome 🙂
Lets start with the "int" flag version, its does not produce the correct output if you add a value 101 (menas true for month 1, false for month 2, true for month 3) for the int_flag:
...
It needs to support all combination of 0 and 1
Well, if you have other requirements than the one stated - where each value only contained a flag in one position it would help if you disclosed ALL the requirements rather than drip feeding them.
It seems like the "nvarchar" version works better.
I dont really understand the "tinyint" flag version, what do the values for the flag mean?
Ignore it, now that we know you can have more than one flag in a single data item, that one is useless.
As you said, the nvarchar version will work for that.
Thanks!
Sorry for being unclear about the requirements.
I will see if I can change the int-version to support my needs - I will, at least in the long run, have to use other data types for prestanda and storage reasons.
One thing though, as mentioned above I would really like to implement this as a function since it will be used in multiple places on multiple tables. And also if I want to change the columns I would like to group on I would prefer not change the code in more than one place. That’s why I thought about an aggregate CLR in the first place.
March 20, 2014 at 8:30 am
From what it sounds like, you're not going to be indexing these values, just doing things based on the results. If there really are only 12 bits (months), then you might also want to consider 12 BIT columns with month names.
Jan BIT
,Feb BIT
,Mar BIT
...
,Dec BIT
and normal logical OR statements.
You get serious gains in code readability, at the expense of a little tedious (or macro-driven 🙂 ) code generation. Performance against the other techniques you'd have to test, of course.
Alternately, in this case, a simple lookup table with 2^24 (16777216) rows containing three BINARY(2) columns and a clustered index on the "input" values would also work. If you have enough RAM to buffer 96MB of data plus the usual overhead, that might be very interesting to benchmark against the other solutions.
March 20, 2014 at 9:56 am
Here's a sandpit version very similar to what you currently have. The only downside is the ugly aggregate - but it's a very simple ugly aggregate. I guess it depends upon your priorities. You do get your 12 months saved as a smallint representation of a bit string.
DROP TABLE my_table2
CREATE TABLE my_table2 (col1 nvarchar(25) null, int_flag smallint NULL)
G0
INSERT INTO my_table2 values
('a', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000100'))),
('a', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000001'))),
('b', (SELECT DecimalOut FROM dbo.CHARBitsToDecimal('000000000010'))),
('b', 2),
('b', 1),
('b', 4);
CREATE CLUSTERED INDEX cx_col1 ON my_table2 (col1)
-- Stare & compare
SELECT * FROM my_table2
-- BITWISE OR aggregate emulation
SELECT col1,
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),1,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),2,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),3,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),4,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),5,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),6,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),7,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),8,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),9,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),10,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),11,1)) AS CHAR(1))+
CAST(MAX(SUBSTRING(RIGHT(x.CHARBitsOut,12),12,1)) AS CHAR(1))
FROM my_table2 t
CROSS APPLY dbo.DecimalToCHARBits (t.int_flag) x
GROUP BY col1;
---------------------------------------------------------------------
-- check the functions
SELECT DecimalOut FROM [dbo].[CHARBitsToDecimal] ('111011110111')
-- 3831 -- SMALLINT, two bytes
SELECT RIGHT(CHARBitsOut,12) FROM [dbo].[DecimalToCHARBits] (3831) -- SMALLINT, two bytes
-- '111011110111'
CREATE FUNCTION [dbo].[CHARBitsToDecimal]
(@BinVal VARCHAR(32))
RETURNS TABLE WITH SCHEMABINDING AS
/*
ChrisM March 2014
CHARBitsToDecimal could output SMALLINT.
*/
RETURN
SELECT DecimalOut = SUM(POWER(2,iTally.n-1))
FROM (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)
) iTally
WHERE SUBSTRING(REVERSE(@Binval),iTally.n,1) = 1;
CREATE FUNCTION [dbo].[DecimalToCHARBits]
(@DecVal BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
/*
ChrisM March 2014
DecimalToCHARBits() could very easily be converted to work with 12 characters,
removing the present requirement for extracting the rightmost 12.
*/
RETURN
SELECT CHARBitsOut = CAST(REVERSE(
(SELECT (@DecVal/POWER(CAST(2 AS BIGINT),iTally.n-1)) % 2 AS [text()]
FROM (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n)
) iTally
FOR XML PATH(''))
) AS VARCHAR(32));
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply