January 3, 2013 at 1:19 pm
Hi all,
How do I find out if the field value is of a number that has the same digit.
For instance I need to find all rows where column_1 is all 0s or all 1s or all 9s as 00000, 111, 999999999999
Here is some value that are in the table right now:
000
0000000
00000000000
000001
0001000
00-0000000-01
Thanks,
January 3, 2013 at 1:45 pm
Does this do what you want?
SELECT my_column
FROM dbo.my_table myt
WHERE my_column not like '%[^0]%'
OR my_column not like '%[^1]%'
OR my_column not like '%[^9]%'
If not, can you post back an example of where it falls down?
Cheers
...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell
January 3, 2013 at 1:46 pm
You mean to say you want to count the occurence of that number right?
If so then below is the query for that:
SELECT (LEN(columnname) - LEN(REPLACE(columnname, FindSubString, '' ))) Cntrepeated
:-)I am a newbie to this site:-)
January 3, 2013 at 1:48 pm
Here's one approach. You didn't provide ddl for your test data so I made some assumptions about data types. This approach is based on using the first character to define what "all the same" looks like. It does not account for punctuation so it may not be handling the last example correctly.
Create Table #test
(
num varchar(20)
)
;
Insert #test (num)
Values
('000'),
('0000000'),
('00000000000'),
('000001'),
('0001000'),
('00-0000000-01'),
('00-0000000-00')
;
Select
num,
Case When Replicate(LEFT(num,1),LEN(num)) = num Then 'all the same' Else 'different' End As compare
From
#test
;
Drop table #test;
January 3, 2013 at 2:00 pm
I know I didn't ask for it in the original post, but is there a way to account for any digit?
This work fine, just looking for a shortcut.
SELECT *
FROM tab
WHERE col not like '%[^0]%'
OR col not like '%[^1]%'
OR col not like '%[^2]%'
OR col not like '%[^3]%'
OR col not like '%[^4]%'
OR col not like '%[^5]%'
OR col not like '%[^6]%'
OR col not like '%[^7]%'
OR col not like '%[^8]%'
OR col not like '%[^9]%'
something like SELECT * FROM tab WHERE col not like '%[^0 - ^9]%'
Thanks,
January 3, 2013 at 2:08 pm
If I understand your requirements, the approach I suggested will work for any number. I placed the comparison test in a CTE and then queried for the final results. You'd still have to parse out punctuation if punctuation should be ignored in the comparison. Using an expanded set of data:
Create Table #test
(
num varchar(20)
)
;
Insert #test (num)
Values
('000'),
('0000000'),
('00000000000'),
('000001'),
('0001000'),
('00-0000000-01'),
('00-0000000-00'),
('88848'),
('44444444'),
('62135444'),
('222222222'),
('55')
;
With same_check
As
(
Select
num,
Case When Replicate(LEFT(num,1),LEN(num)) = num Then 'same' Else 'different' End As compare
From
#test
)
Select num
From same_check
Where compare = 'same'
;
Drop table #test;
January 3, 2013 at 2:33 pm
Thanks you all for your replies. I have some options now.
January 3, 2013 at 7:37 pm
Be good to know what the underlying business need is. Maybe there's a solution we haven't given you because we don't have enough information about the underlying problem. What's the significance of '00000' or '111' or '999999999999999999999' or '9'? What do they mean, and why do you need to filter them out?
...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell
January 3, 2013 at 8:45 pm
Could you do something like this?
;WITH Strings (s) AS (
SELECT '000' UNION ALL SELECT '0000000' UNION ALL SELECT '00000000000'
UNION ALL SELECT '000001' UNION ALL SELECT '0001000' UNION ALL SELECT '00-0000000-01'
UNION ALL SELECT '00-0000000-00' UNION ALL SELECT '88848' UNION ALL SELECT '44444444'
UNION ALL SELECT '62135444' UNION ALL SELECT '222222222' UNION ALL SELECT '55')
SELECT s
FROM Strings
WHERE PATINDEX('%[^' + LEFT(s, 1) + ']%', s) = 0
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 3, 2013 at 11:26 pm
I was asked to clean up data. The column that I am looking at (nvarchar(20)) contains tax id values.
I needed to identify rows where tax id value contains same digits (as of all zeros, threes, nines ..... or others).
I really liked SSC-Enthusiastic's approach as it helps me to identify values of the same characters (....., $$$$$, 1111, ## and so on), not just numbers.
As it appears, the data that we've got has a lot of junk in it.
January 3, 2013 at 11:31 pm
eugene.pipko (1/3/2013)
I was asked to clean up data. The column that I am looking at (nvarchar(20)) contains tax id values.I needed to identify rows where tax id value contains same digits (as of all zeros, threes, nines ..... or others).
I really liked SSC-Enthusiastic's approach as it helps me to identify values of the same characters (....., $$$$$, 1111, ## and so on), not just numbers.
As it appears, the data that we've got has a lot of junk in it.
Hmmm... thought mine did that too.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 3, 2013 at 11:33 pm
Sorry Dwain for not mentioning you. I just didn't test yours yet.
January 3, 2013 at 11:39 pm
eugene.pipko (1/3/2013)
Sorry Dwain for not mentioning you. I just didn't test yours yet.
No problem... I'm used to toiling in obscurity. :hehe:
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 4, 2013 at 8:37 am
In terms of finding rows for your cleanup, Dwain's approach and mine will give you the same result so it comes down to speed. How many rows are you looking at cleaning up? I didn't do any performance testing so I'm not sure which approach is faster, REPLICATE or PATINDEX. If it's only a few rows, it probably doesn't matter. If you're talking about millions of rows, performance testing might be helpful.
I haven't used PATINDEX before--thanks for the new thought Dwain! I suspect that in performance testing that will be the faster approach but testing would show for sure.
January 4, 2013 at 2:38 pm
I wonder whether your friendly tax office (yah right!) might be able to supply you with some business rules around validating "tax ids". For instance can it be only X characters wide? Do any of the digits have intrinsic meaning? Some of the government identifiers I've dealt with in the past have had all sorts of little easter eggs embedded in them that help you weed out the junk. Some have had a particular "check digit" that is based on the values of the other digits. In some the first digit could only be a number between 1 and 6... and so on. Just thinking that if your ultimate aim is to identify junk, there are going to be lots of other type of junk than just consecutive digits.
...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply