October 30, 2012 at 5:23 am
webtekkie (10/30/2012)
Dwain - that's impressive! I'm going through it now to see understand what's going on here.Jeff - I've checked with the data sourcing team - they have told me that there will ONLY be 7 chars for Value1 and 10 chars for Value2 - never more and never less.
I think Jeff may have been trying to determine whether lower case letters are allowed in either or both strings.
If you thought that was impressive (not really, just foolin' around) then here's one for the math geeks out there. Might even run slightly faster.
Same set up except change to the length of v (got it down to 34 bytes!):
DECLARE @Values TABLE
(ID INT IDENTITY, Value1 VARCHAR(7)COLLATE SQL_Latin1_General_CP1_CS_AS
, Value2 VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CS_AS
,v DECIMAL(34,20))
DECLARE @Alphanumerics CHAR(62) =
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
-- Scramble the encryption key
;WITH Tally (n) AS (
SELECT n=number
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND Number BETWEEN 1 AND 62)
SELECT @Alphanumerics = (
SELECT SUBSTRING(@Alphanumerics, n, 1)
FROM Tally a
ORDER BY NEWID()
FOR XML PATH(''))
SELECT EncryptionKey=@Alphanumerics
-- Sample data
INSERT INTO @Values (Value1, Value2)
SELECT 'AXH32CT','22124587'
UNION ALL SELECT '544DX88','21457751'
UNION ALL SELECT '43sAb2', '32m9SDSA09'
First the encode:
-- Encode
;WITH Tally (n) AS (
SELECT n=number
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND Number BETWEEN 1 AND 10),
Encode1 AS (
SELECT ID, Value1, Value2, a.v1, n=POWER(CAST(10 AS DECIMAL(14,0)), 2*(LEN(Value1)-n))
FROM @Values
CROSS APPLY (
SELECT n, RIGHT('0' +
CAST(CHARINDEX(SUBSTRING(Value1,n,1), @Alphanumerics) AS VARCHAR),2)
FROM Tally
WHERE n BETWEEN 1 AND LEN(Value1)) a(n, v1)),
Catenate1 AS (
SELECT ID, Value1, Value2
,v1=( SELECT SUM(n * CAST(v1 AS DECIMAL(14,0)))
FROM Encode1 b
WHERE a.ID = b.ID)
FROM Encode1 a
GROUP BY ID, Value1, Value2),
Encode2 AS (
SELECT ID, Value1, Value2, v1, a.v2, n=POWER(CAST(10 AS DECIMAL(20,0)), 2*(LEN(Value2)-n))
FROM Catenate1
CROSS APPLY (
SELECT n, RIGHT('0' +
CAST(CHARINDEX(SUBSTRING(Value2,n,1), @Alphanumerics) AS VARCHAR),2)
FROM Tally
WHERE n BETWEEN 1 AND LEN(Value2)) a(n, v2)),
Catenate2 AS (
SELECT ID, Value1, Value2, v1
,v2=( SELECT SUM(n * CAST(v2 AS DECIMAL(20,0)))
FROM Encode2 b
WHERE a.ID = b.ID)
FROM Encode2 a
GROUP BY ID, Value1, Value2, v1)
UPDATE a
SET v=CAST(v1 AS VARCHAR(14)) + '.' +
CASE LEN(CAST(v2 AS VARCHAR(20)))%2 WHEN 1 THEN '0' ELSE '' END +
CAST(v2 AS VARCHAR(20))
-- Display the encoded values
OUTPUT INSERTED.ID, INSERTED.Value1, INSERTED.Value2, INSERTED.v
FROM @Values a
INNER JOIN Catenate2 b
ON a.ID = b.ID
Now the decode:
-- Decode
;WITH Tally (n) AS (
SELECT n=number
FROM [master].dbo.spt_values Tally
WHERE [Type] = 'P' AND Number BETWEEN 1 AND 37),
Decode1 AS (
SELECT ID, Value1, Value2, v, n, v1
FROM @Values
CROSS APPLY (
SELECT n, SUBSTRING(@Alphanumerics
,CAST(SUBSTRING(CASE LEN(CAST(FLOOR(v) AS VARCHAR(14)))%2 WHEN 1 THEN '0' ELSE '' END +
CAST(FLOOR(v) AS VARCHAR(14)), 2*n-1, 2) AS INT), 1)
FROM Tally
WHERE n BETWEEN 1 AND 1+LEN(CAST(FLOOR(v) AS VARCHAR(14)))/2) a(n, v1)),
Decode2 AS (
SELECT ID, Value1, Value2, v, n, v2
FROM @Values
CROSS APPLY (
SELECT n, SUBSTRING(@Alphanumerics
,CAST(SUBSTRING(CAST(v-CAST(FLOOR(v) AS DECIMAL(18,0)) AS VARCHAR(22)), 3+2*(n-1), 2) AS INT), 1)
FROM Tally
WHERE n BETWEEN 1 AND 10) a(n, v2)),
SELECT ID, Value1, Value2, v
,v1=( SELECT v1 + ''
FROM Decode1 b
WHERE a.ID = b.ID
ORDER BY n
FOR XML PATH(''))
,v2=( SELECT v2 + ''
FROM Decode2 b
WHERE a.ID = b.ID
ORDER BY n
FOR XML PATH(''))
FROM @Values a
GROUP BY ID, Value1, Value2, v
There's probably a few unnecessary CASTs running around in this one but taking them out would be a chore. Did manage to remove a couple of the CTEs on the decode though. Depending on which approach you choose, you should be able to do the same (remove 2 CTEs) from the first approach the same way I did on the second.
Both solutions use the same basic idea. Use a tally table to convert each string one character at a time to get an index into @Alphanuerics. The second solution avoids needing to store the length of each string in the encoded string itself - that's how I reduced the size by 3 miserly bytes. It also uses sums against powers of 10 to put the strings back together instead of doing a character concatenate.
Definitely can't wait to see what Jeff comes up with. The man is in a class all by himself.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 30, 2012 at 7:12 am
And here is my version:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[f_JoinToOne]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[f_JoinToOne]
GO
CREATE FUNCTION [dbo].[f_JoinToOne] (@value1 VARCHAR(7),@value2 DECIMAL(10))
RETURNS DECIMAL(22) WITH SCHEMABINDING
AS
BEGIN
-- some variables
DECLARE @characters CHAR(36),
@iterator BIGINT = LEN(@value1),
@multiplier BIGINT = 1,
@value10 BIGINT = 0
-- encoding string and the default result
SET @characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
IF @value1 IS NULL OR @value2 IS NULL RETURN NULL;
--convert value to decimal
WHILE @iterator > 0
SELECT @value10 = @value10 + (( CHARINDEX(SUBSTRING(@value1, @iterator, 1 ), @characters)-1) * @multiplier )
,@multiplier = @multiplier * 36
,@iterator = @iterator -1;
RETURN CAST(@value10 * POWER(10.0,10) AS DECIMAL(22)) + @value2;
END
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[f_SplitToTwo]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[f_SplitToTwo]
GO
CREATE FUNCTION [dbo].[f_SplitToTwo] (@value DECIMAL(22))
RETURNS @t TABLE (value1 VARCHAR(7), value2 DECIMAL(10)) WITH SCHEMABINDING
AS
BEGIN
-- some variables
DECLARE @value1 VARCHAR(7) = '',
@value2 DECIMAL(10),
@characters CHAR(36),
@value10 BIGINT = 0
IF @value IS NULL RETURN;
-- encoding string and the default result
SELECT @characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
SET @value2 = CAST(RIGHT(CAST(@value AS VARCHAR(22)),10) AS DECIMAL(10));
SET @value10 = (@value - @value2) / POWER(10.0,10)
WHILE @value10 > 0
SELECT @value1 = SUBSTRING(@characters, @value10 % 36 + 1, 1) + @value1,
@value10 = @value10 / 36;
INSERT @t SELECT @value1, @value2
RETURN;
END
GO
select [dbo].[f_JoinToOne]('ZZZZZZZ',9999999999)
select * from [dbo].[f_SplitToTwo](783641640959999999999)
select [dbo].[f_JoinToOne]('1',0)
select * from [dbo].[f_SplitToTwo](10000000000)
select [dbo].[f_JoinToOne]('A',1)
select * from [dbo].[f_SplitToTwo](100000000001)
select [dbo].[f_JoinToOne]('',1)
select * from [dbo].[f_SplitToTwo](1)
Cannot be sure for performance, but will not be surprised that [f_JoinToOne] (loop-based scalar UDF) will outperform CTE/tally table based.
October 30, 2012 at 7:17 am
webtekkie (10/30/2012)
Dwain - that's impressive! I'm going through it now to see understand what's going on here.Jeff - I've checked with the data sourcing team - they have told me that there will ONLY be 7 chars for Value1 and 10 chars for Value2 - never more and never less.
Even better! But I still need to know what the actual content possibilites will be. To re-ask the question, only 0-9 and A-Z (upper case only) characters for Value 1 and only 0-9 for Value 2?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 7:18 am
Thanks again, Jeff - I'm even more impressed now.
Jeff, I think that Dwain is setting some pretty high expecatations here! Personally, I don't think that there is a solution better than his but are you up for proving me wrong? 😛
The alpha characters are all uppercase.
October 30, 2012 at 7:25 am
Sorry Jeff - yes, exactly as you stated - 0-9 and A-Z (upper case only) characters for Value 1 and only 0-9 for Value 2?
October 30, 2012 at 7:38 am
dwain.c (10/30/2012)
Definitely can't wait to see what Jeff comes up with. The man is in a class all by himself.
Thanks for the compliment :blush: but seriously not true. I haven't checked your code for performance but it looks like something that I might write. Well done!
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 7:43 am
webtekkie (10/30/2012)
Sorry Jeff - yes, exactly as you stated - 0-9 and A-Z (upper case only) characters for Value 1 and only 0-9 for Value 2?
Perfect! I'll take a look at this tonight after work. I'm not sure that I can beat any of the fine solutions posted so far but it'll sure be fun trying. I love these kinds of problems.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 8:38 am
I hate to throw cold water all over your efforts Dwain, but it appears the output does have to be an int and not a decimal. I think that means that your solution isn't going to work?
October 30, 2012 at 8:48 am
Another take on it. Simple... but I think Eugene's might be faster. It's a lot closer with a physical tally table, but I think his still wins.
CREATE FUNCTION dbo.TallyCombine(
@Value1varchar(7),
@Value2varchar(10)
)
RETURNS decimal(24,10)
AS
BEGIN
DECLARE @holding varchar(25) = ''
;WITH
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N FROM t2 x, t2 y)
SELECT @holding = @holding + CAST(ASCII(SUBSTRING(@Value1,N,N+1)) AS varchar(30))
FROM Tally
WHERE N < LEN(@Value1)
SELECT @holding = @holding + '.' + @Value2
RETURN @holding
END
GO
CREATE FUNCTION dbo.TallySplitNums(
@Value3varchar(25)
)
RETURNS @t TABLE(Value1 VARCHAR(7), Value2 int)
AS
BEGIN
DECLARE @Value1 varchar(7) = ''
;WITH
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),
t2 AS (SELECT 1 N FROM t1 x, t1 y),
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N FROM t2 x, t2 y)
SELECT @Value1 = @Value1 + CAST(CHAR(SUBSTRING(@Value3,N-1,2)) AS varchar(30))
FROM Tally
WHERE N < CHARINDEX('.',@Value3)
AND N%2 = 0
INSERT INTO @t(value1,value2)
SELECT @Value1, RIGHT(@Value3, LEN(@Value3) - CHARINDEX('.',@Value3))
RETURN;
END
GO
SELECT dbo.TallyCombine('ABCD12Z','1234567890')
SELECT * FROM dbo.TallySplitNums('65666768495090.1234567890')
October 30, 2012 at 9:01 am
webtekkie (10/30/2012)
I hate to throw cold water all over your efforts Dwain, but it appears the output does have to be an int and not a decimal. I think that means that your solution isn't going to work?
7-digits in 36-based numeric system will take upto 12 digits in 10-based (decimal) numeric system, plus your 10 digits of second value... So, it cannot be INT or even BIGINT! 4 and 8 bytes are not enough.
Combination of your two values in decimal numeric system will need upto 22 digits.
If you just don't want to have decimal point, then you can use my code, it does produce DECIMAL(22).
BTW, I've tested my scalar UDF against Dwain method on 1,000,000 records. As I thought, my UDF is faster. Stopped Dwain one after 3 min of execution but my one did complete update in 52 seconds.
October 30, 2012 at 9:48 am
It has to be an "integer value", but does it have to be an "integer datatype"?
A varchar(22) can hold things like "1234567890123456789012". Looks like a number, is a number to a human user, allows more digits than Int/BigInt. Only problem is you'll get overflow errors if you try to do mathematical operations on it as an integer instead of a floating point number. (And you'll get the usual rounding issues with floating point numerics if you do math on it that way, just not error messages.)
If the intent is to store a numeric representation of the alphanumeric codes, then varchar() should be fine. If the intent is to do math on it, then it won't work, or will require custom code in order to work.
Usually, this kind of data doesn't end up having math done on it.
Store it that way, and you can convert to/from without error. Will that do what you need?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 30, 2012 at 10:56 am
Eugene - I didn't even notice your post earlier as I was posting at the same time. It's an excellent solution but sadly fails for the same reason as Dwain's - it returns a decimal.
GSquared - full marks for thinking "outside the box" - unfortunately it does have to be an integer datatype as the third party solution requires it to be so.
So, it does look like I'm asking the impossible - it isn't mathematically possible to turn that many characters into an integer datatype which will fit into a bigint.
October 30, 2012 at 11:27 am
webtekkie (10/30/2012)
Eugene - I didn't even notice your post earlier as I was posting at the same time. It's an excellent solution but sadly fails for the same reason as Dwain's - it returns a decimal.GSquared - full marks for thinking "outside the box" - unfortunately it does have to be an integer datatype as the third party solution requires it to be so.
So, it does look like I'm asking the impossible - it isn't mathematically possible to turn that many characters into an integer datatype which will fit into a bigint.
Not in a way that can be recomposed back to the original data. Checksum() will get you an integer from pretty much any input, and it's deterministic so far as I know, but there's no way to compose back to the original value. Plus, of course, you can end up with collisions on checksums, which means it's not guaranteed unique for different inputs.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 30, 2012 at 11:29 am
...
So, it does look like I'm asking the impossible - it isn't mathematically possible to turn that many characters into an integer datatype which will fit into a bigint.
Yes, exactly right! It is not possible mathematically and I've explained already why is that.
But...
Practically, you can do it. I don't believe that number of rows in your table is larger than upper limit of INT (or BIGINT).
So, as it was discussed somewhere at the beginning, create a mapping table with IDENTITY (int or bigint whatever appropriate). Simple and very fast.;-)
October 30, 2012 at 4:04 pm
Eugene Elutin (10/30/2012)
...
So, it does look like I'm asking the impossible - it isn't mathematically possible to turn that many characters into an integer datatype which will fit into a bigint.
Yes, exactly right! It is not possible mathematically and I've explained already why is that.
But...
Practically, you can do it. I don't believe that number of rows in your table is larger than upper limit of INT (or BIGINT).
So, as it was discussed somewhere at the beginning, create a mapping table with IDENTITY (int or bigint whatever appropriate). Simple and very fast.;-)
Not that it matters now but I agree. Not mathematically possible with BIGINT given 367 * 1010.
Shifting gears, my recommendation would be to stop messing around with such a conversions and do something very similar to what Eugene suggested except, instead of a mapping table, just add an IDENTITY column to the existing table and call it a day. If you can't modify the table by adding an identity column to it, then a mapping table, as Eugene suggested would certainly do the trick.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply