March 24, 2015 at 3:05 pm
Hello all,
I am trying to figure out how I can find the names of people who have a double occurrence or more of characters either in their first name or last name.
This is for a data cleanup.
Thanks!! for any suggestions
CREATE TABLE #Names ( FIRSTNAME VARCHAR (20), LASTNAME VARCHAR (20))
INSERT INTO #Names VALUES
('Thomas', 'Hohner')
,('Mike', 'SSmith')
,('TtTony', 'Hawk')
,('Jeff', 'Smith')
,('ZZSara', 'Jones')
,('Luke', 'HendersonZZ')
,('Lily', 'PPadZZZ')
SELECT * FROM #Names
DROP TABLE #Names
***SQL born on date Spring 2013:-)
March 24, 2015 at 3:20 pm
This is a quick idea.
WITH Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
)
SELECT DISTINCT n.*
FROM #Names n
JOIN Letters l ON n.FIRSTNAME LIKE '%' + letter + letter+ '%'
OR n.LASTNAME LIKE '%' + letter + letter+ '%'
I'm thinking on something different but I need some more time.
March 24, 2015 at 3:35 pm
Here are my 2 options. Just wanted to note that I added an id to force order and that Jeff could be considered a false positive for a cleanup process.
CREATE TABLE #Names ( id int, FIRSTNAME VARCHAR (20), LASTNAME VARCHAR (20))
INSERT INTO #Names VALUES
(1,'Thomas', 'Hohner')
,(2,'Mike', 'SSmith')
,(3,'TtTony', 'Hawk')
,(4,'Jeff', 'Smith')
,(5,'ZZSara', 'Jones')
,(6,'Luke', 'HendersonZZ')
,(7,'Lily', 'PPadZZZ');
WITH E(n) AS(SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
,E36(n) AS (SELECT E.n FROM E, E x)
,Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
)
SELECT DISTINCT n.*
FROM #Names n
JOIN Letters l ON n.FIRSTNAME LIKE '%' + letter + letter+ '%'
OR n.LASTNAME LIKE '%' + letter + letter+ '%'
ORDER BY id;
WITH E(n) AS(SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0)
,E25(n) AS (SELECT E.n FROM E, E x)
,cteTally AS(
SELECT TOP 20 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n
FROM E25
)
SELECT DISTINCT n.*
FROM #Names n
CROSS
JOIN cteTally t
WHERE (LEN( n.FIRSTNAME) >= t.n AND SUBSTRING(n.FIRSTNAME, n, 1) = SUBSTRING(n.FIRSTNAME, n + 1, 1))
OR (LEN( n.LASTNAME) >= t.n AND SUBSTRING(n.LASTNAME, n, 1) = SUBSTRING(n.LASTNAME, n + 1, 1))
ORDER BY n.id
DROP TABLE #Names
Feel free to ask any questions about the code.
March 24, 2015 at 4:05 pm
Luis Cazares (3/24/2015)
This is a quick idea.
WITH Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
)
SELECT DISTINCT n.*
FROM #Names n
JOIN Letters l ON n.FIRSTNAME LIKE '%' + letter + letter+ '%'
OR n.LASTNAME LIKE '%' + letter + letter+ '%'
ORDER BY id
I'm thinking on something different but I need some more time.
FYI... the "ORDER BY id" is breaking it:pinch: (copy/paste error?)
-- Itzik Ben-Gan 2001
March 24, 2015 at 4:37 pm
Alan.B (3/24/2015)
Luis Cazares (3/24/2015)
This is a quick idea.
WITH Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
)
SELECT DISTINCT n.*
FROM #Names n
JOIN Letters l ON n.FIRSTNAME LIKE '%' + letter + letter+ '%'
OR n.LASTNAME LIKE '%' + letter + letter+ '%'
ORDER BY id
I'm thinking on something different but I need some more time.
FYI... the "ORDER BY id" is breaking it:pinch: (copy/paste error?)
You're right, I changed the sample data and posted full code in my second post.
March 24, 2015 at 4:41 pm
Luis Cazares (3/24/2015)
Alan.B (3/24/2015)
Luis Cazares (3/24/2015)
This is a quick idea.
WITH Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
)
SELECT DISTINCT n.*
FROM #Names n
JOIN Letters l ON n.FIRSTNAME LIKE '%' + letter + letter+ '%'
OR n.LASTNAME LIKE '%' + letter + letter+ '%'
ORDER BY id
I'm thinking on something different but I need some more time.
FYI... the "ORDER BY id" is breaking it:pinch: (copy/paste error?)
You're right, I changed the sample data and posted full code in my second post.
For the record, I spent about 20 minutes trying to come up with something better than what you posted and failed.:hehe: At least I could do some successful debugging :hehe:
-- Itzik Ben-Gan 2001
March 25, 2015 at 2:39 am
Just for fun, we'll get a little fancy here. 😛
My lizard-brain recalls a really neat trick I once saw in an article by Jeff Moden.
WITH Letters (letter) AS
(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64)
FROM sys.all_columns
UNION ALL
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 96)
FROM sys.all_columns
)
SELECT FIRSTNAME, LASTNAME, letters
FROM #Names a
CROSS APPLY
(
SELECT STUFF(
(
SELECT ',' + letter
FROM #Names b
CROSS JOIN Letters
WHERE a.FIRSTNAME = b.FIRSTNAME and a.LASTNAME = b.LASTNAME AND (
LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(FIRSTNAME COLLATE LATIN1_GENERAL_BIN,REPLICATE(letter,2),letter+CHAR(1)),CHAR(1)+letter,''),CHAR(1),'')
)) <> FIRSTNAME OR
LTRIM(RTRIM(
REPLACE(REPLACE(REPLACE(LASTNAME COLLATE LATIN1_GENERAL_BIN,REPLICATE(letter,2),letter+CHAR(1)),CHAR(1)+letter,''),CHAR(1),'')
)) <> LASTNAME)
FOR XML PATH('')
), 1, 1, '')
) b (letters)
WHERE letters IS NOT NULL;
The triple REPLACE converts duplicated characters into a single character using Jeff's method:
REPLACE Multiple Spaces with One[/url]
I speeded up the REPLACE, since we know it is much slower without a binary collation sequence (bug identified by Paul White: REPLACE function extremely slow with (non-binary) Windows collation), but then needed to use both upper and lower case characters in the Letters CTE.
Of course, we'd also like to see all duplicated characters, so I did the FOR XML PATH concatenation thingy ala Wayne Sheffield: Creating a comma-separated list (SQL Spackle)[/url]
Of course, this may not be faster than Luis's solution but it was kinda fun combining the works of those three great SQL-Jedi!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
March 25, 2015 at 8:01 am
Great solutions! Thank you gentlemen. I am still playing with it and testing them out.
I have seen similar things with a Windows function but I am trying to dissect it to see what your doing here Luis. I'm so use to doing plain Jane windows functions.
But I know windows functions can be pretty awesome. I really like creative solutions it helps me learn.
So if I'm reading this right
You are selection the TOP 26 Characters/Letters from the names of all System columns. That should find all 26 letters.
You are also Converting Row_Number to Char to display letters rather than BigInt (WOW this is so cool right here never thought of doing that)
I kind of get lost on OVER() is that windowing over the whole set?
+ 64 is converting to letters right?
This is the part (Order by (SELECT Null)) I am completely lost on this but really want to understand this.
Is this just tricking the window on the ORDER Clause??
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
***SQL born on date Spring 2013:-)
March 25, 2015 at 9:41 am
You're really close Thomas.
Ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) need the OVER clause with ORDER BY. As I don't care about the order and I can't use a constant, the (SELECT NULL) is "tricking" the ORDER BY. Although, it realizes that it doesn't need to order.
CHAR() function returns a character based on the ascii code introduced. ASCII codes from 65 to 90 represent A to Z (for Spanish, I would need an additional row for Ñ). That's why I need to add 64, so I can start from 65.
A little help to explain myself:
WITH E(n) AS(SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0)
,E36(n) AS (SELECT E.n FROM E, E x)
,Letters AS(
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64 number
FROM sys.all_columns
)
SELECT *
FROM Letters
March 25, 2015 at 9:46 am
thomashohner (3/25/2015)
SELECT TOP 26 CHAR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 64) letter
FROM sys.all_columns
If you remove the CHAR function and the +64, you'll see it produces a set of numbers between 1 and 26: -
SELECT TOP 26
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
FROM sys.all_columns;
The reason for the order by select null is that we don't care what order it grabs the top 26 rows from sys.all_columns, we're not using the data we just want a table with numbers. He could've just as easily done: -
SELECT TOP 26
ROW_NUMBER() OVER ( ORDER BY @@VERSION )
FROM sys.all_columns;
Next, Dwain is adding 64 to each of those numbers, so now it's a set between 65 and 90.
If you run the CHAR function on that set, it just happens to produce the letters A-Z 🙂
SELECT TOP 26
CHAR( ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) + 64 )
FROM sys.all_columns;
The UNION ALL is to create the letters a-z, so that you have both the upper and lower case variants, since he is using a binary collation on the replace. Another option would've been to change the common table expression like this: -
WITH Letters (letter) AS
(
SELECT CHAR(N + CASE WHEN a.N <= 26 THEN 64
ELSE 70
END)
FROM ( SELECT TOP 52
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
FROM sys.all_columns
) a ( N )
)
Bear in mind that I'm not saying it'd be better, just different.
March 25, 2015 at 10:35 am
So awesome guys. I love this stuff!
Very, Very cool.
Thanks for the explanation. Great learning experience.:-D
***SQL born on date Spring 2013:-)
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply