December 8, 2009 at 5:27 am
I have a simple holding table thats populated with filenames from a directory and I'm wanting to clean up the names so I can then undertake a data comparison with another table.
Each filename will either contain 4 numerics '####', 5 numerics '#####' or 'CYP####' and the words 'Floor Plan'. I'm wanting to remove each of these to result in just the property name.
In the example given,
'Abbey Lane Cemetery 03022 Floor Plan' would become 'Abbey Lane Cemetery',
'Abbey Lane Primary 4-11 2001 Floor Plan' would become 'Abbey Lane Primary 4-11',
'Abbeydale Grange Secondary 11-16 4254 Floor Plan' would become 'Abbeydale Grange Secondary 11-16',
and
'Aldine House CYP0048 Floor Plan' would become 'Aldine House'
and so on.
USE test;
go
------
IF OBJECT_ID('CADlist','U') IS NOT NULL
DROP TABLE CADlist
------
CREATE TABLE [dbo].[CADlist](
[FileID] [int] IDENTITY(1,1) NOT NULL,
[FileNM] [nvarchar](1000) NOT NULL,
PRIMARY KEY CLUSTERED
(
[FileID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
-------
SET IDENTITY_INSERT CADlist ON
------
INSERT INTO CADlist
(FileID, FileNM)
SELECT '1','Abbey Lane Cemetery 03022 Floor Plan' UNION ALL
SELECT '2','Abbey Lane Primary 4-11 2001 Floor Plan' UNION ALL
SELECT '3','Abbeydale Grange Secondary 11-16 4254 Floor Plan' UNION ALL
SELECT '4','Abbeydale Industrial Hamlet 00433 Floor Plan' UNION ALL
SELECT '5','Abbeydale Primary 3-11 2002 Floor Plan' UNION ALL
SELECT '6','Abbeyfield Park 00460 Floor Plan' UNION ALL
SELECT '7','Acres Hill Primary 3-11 2318 Floor Plan' UNION ALL
SELECT '8','Aldine House CYP0048 Floor Plan' UNION ALL
SELECT '9','All Saints C Secondary VA 11-18 5401 Floor Plan' UNION ALL
SELECT '10','All Saints Youth Club CYP0002 Floor Plan' UNION ALL
SELECT '11','Angram Bank Pavilion 03029 Floor Plan' UNION ALL
SELECT '12','Angram Bank Primary 3-11 2342 Floor Plan' UNION ALL
SELECT '13','Anns Grove Primary 3-11 2343 Floor Plan' UNION ALL
SELECT '14','Anns Road Youth Club CYP0003 Floor Plan' UNION ALL
SELECT '15','Arbourthorne Primary 3-11 3429 Floor Plan' UNION ALL
SELECT '16','Athelstan Primary 4-11 2340 Floor Plan' UNION ALL
SELECT '17','Audio Visual Enterprise Centre (Brown Str) CYP0075 Floor Plan' UNION ALL
SELECT '18','Audio Visual Enterprise Centre (Sidney Str) CYP0076 Floor Plan' UNION ALL
SELECT '19','Ballifield Primary 3-11 2281 Floor Plan' UNION ALL
SELECT '20','Bankwood Primary 4-11 2322 Floor Plan'
SET IDENTITY_INSERT CADlist OFF
I'm not sure how best to go about it. To remove the 'Floor Plan' part it's simply a case of:
UPDATE CADlist
SET FileNM = Replace(FileNM, ' Floor Plan', '')
Is it worth splitting the data into it's various types and then working on each set individually:
SELECT FileNM
FROM CADlist
WHERE patindex('% [0-9][0-9][0-9][0-9][0-9] %', FileNM) > 0
----
SELECT FileNM
FROM CADlist
WHERE patindex('% [0-9][0-9][0-9][0-9] %', FileNM) > 0
----
SELECT FileNM
FROM CADlist
WHERE patindex('% CYP[0-9][0-9][0-9][0-9] %', FileNM) > 0
But, I'm not sure how best to alter each record, do I need to creat a cursor and iterate through each record?
I can use PATINDEX and STUFF to update strings but not sure how to use this for each record
DECLARE@strTest varchar(8000)
DECLARE @pos1 int --find position of 5 numerics
SELECT @strTest = 'Abbey Lane Cemetery 03022 Floor Plan'
SELECT @strTest as [BEFORE]
-----
SET @pos1 =PATINDEX('% [0-9][0-9][0-9][0-9][0-9] %', @strTest ) --find position of 5 numerics
SET @strTest= STUFF(@strTest,@pos1,6,'') --set string null for 5 numerics and blank space
SELECT @strTest as [AFTER]
I'd really appreciate some pointers in the right direction!
many thanks,
Dom Horton
December 8, 2009 at 6:34 am
There is no need to do a cursor. All you have to do is specify the operation(s) you want to perform to the entire set of rows. For example:
-- removes 'Floor Plan' from all rows
update cadlist
set fileNm = replace(fileNm,'Floor Plan','')
To perform multiple operations, you nest the functions or expression that are doing the string manipulation.
update cadlist
set fileNm = replace(replace(fileNm,'Floor Plan',''),'CYP','')
Looking at the pattern of your data, after using the above update, all your rows should end with a string of 4-5 digits. Instead of doing multiple passes with different WHERE conditions, you can use nested CASE expressions with your PATINDEX searches, or perhaps you could REVERSE() the string and look for the first blank then use LEFT, RIGHT, or SUBSTRING to get only the characters you want to keep.
It helps if you can generalize your observations, seeing the pattern as a string of 4 or more digits or seeing that they will always fall at the end. That should be enough to get you started. Let us know if you have any questions.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
December 8, 2009 at 7:40 am
Thanks for the response
...I'll have a play...
December 8, 2009 at 8:14 am
Hi
Is that...
space - [word] - space - "floor" - space - "plan"
where [word] = 4 numerics '####', 5 numerics '#####' or 'CYP####'
Or to put it another way, you want to truncate the string at the FIRST space of
space - [word] - space - "floor" - space - "plan"
is this correct?
Reading from RIGHT to LEFT, the third space in the string?
If so, then here's how:
RTRIM the string
REVERSE it
Find the position of the third space using CHARINDEX repeatedly
Pluck the piece you want.
Would you like some help with this?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 8, 2009 at 8:25 am
SELECT FileNM FROM CADlist
-----
UPDATE CADlist
SET FileNm = REPLACE(REPLACE(fileNm,'Floor Plan',''),'CYP','')
-----
SELECT FileNM FROM CADlist
-----
UPDATE CADlist
SET FileNM = REVERSE(FileNM)
SELECT FileNM FROM CADlist
-----
If I remove 'CYP' and 'Floor Plan' as suggested, reverse the string so each entry will start with either 4 or 5 digits. How do I apply the below logic to all the records:
DECLARE@strTest varchar(1000)
SELECT @strTest = '22030 yretemeC enaL yebbA'
SELECT @strtest
SELECT @strTest=SUBSTRING ( @strtest ,(charindex(' ',@strtest)),1000 )
SELECT @strTest=reverse(@strtest)
SELECT @strtest
thanks
December 8, 2009 at 8:45 am
Here you go...
-- This is a crude numbers table and is not part of the solution
;WITH Numbers AS (SELECT (N10 + N1) + 1 AS n
FROM (SELECT CAST(0 AS INT) AS N10 UNION ALL SELECT 10 UNION ALL SELECT 20 UNION ALL SELECT 30 UNION ALL SELECT 40 UNION ALL
SELECT 50 UNION ALL SELECT 60) Tens
CROSS JOIN
(SELECT CAST(0 AS INT) AS N1 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) Units )
SELECT c.*, d.[Output]
FROM #CADlist c
CROSS APPLY (SELECT LEFT(c.FileNM, LEN(c.FileNM) - MAX(x.n)) AS [Output]
FROM (SELECT TOP 3 n
FROM Numbers n
WHERE SUBSTRING(LTRIM(REVERSE(c.FileNM)), n, 1) = ' '
AND n < LEN(c.FileNM)
ORDER BY n) x
) d
You will need to change the table name.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 8, 2009 at 9:03 am
I'm sorry Chris but I don't follow how your code works.
regards,
Dom
December 8, 2009 at 9:25 am
DECLARE @FileNM VARCHAR(70)
SET @FileNM = 'Audio Visual Enterprise Centre (Sidney Str) CYP0076 Floor Plan'
;WITH Numbers AS (SELECT (N10 + N1) + 1 AS n
FROM (SELECT CAST(0 AS INT) AS N10 UNION ALL SELECT 10 UNION ALL SELECT 20 UNION ALL SELECT 30 UNION ALL SELECT 40 UNION ALL
SELECT 50 UNION ALL SELECT 60) Tens
CROSS JOIN
(SELECT CAST(0 AS INT) AS N1 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) Units )
-- look at the result of this, then
SELECT n, SUBSTRING(LTRIM(REVERSE(@FileNM)), n, 1)
FROM Numbers n
ORDER BY n
-- comment it out and look at the result of this
SELECT TOP(3) n
FROM Numbers n
WHERE SUBSTRING(LTRIM(REVERSE(@FileNM)), n, 1) = ' '
AND n < LEN(@FileNM)
ORDER BY n
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 8, 2009 at 2:45 pm
Hi Chris,
thanks for the help, I've fathomed it out now! I'm not that familar with CTE's, so it'll be something I'll read up on in the next day or two.
regards,
Dom
December 9, 2009 at 2:11 am
Top work, Dom.
A CTE is like a derived table defined outside the main body of the query, with the advantage that you can refer to it multiple times in the query, including as an aggregated derived table. They're probably best for making complex queries more readable and testable but have some specialised uses too - resolving hierarchical structures springs to mind. There's a recent[/url] SSC article to get you started.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 9, 2009 at 7:18 am
Thanks for the info Chris, it's much appreciated,
cheers,
dom
December 9, 2009 at 7:34 am
Here's a short article on CTE's you could read when you want to start investigating them.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply