May 15, 2012 at 10:13 am
I am working on streamlining some work that another group here is doing. They have a series of several hundred documents (entered into a table). The document naming structure is not really "standard" meaning that there is a set pattern. Other than the first two characters are Alpha (ex: SE).
I have stripped out those characters as the first level of my order by gets them all together. The problem comes when I try and sort the following:
11e
11EOa
11f
15c
1a
1a
1b
After these strings there is a space and then the rest of the file name. I just need to get these in order. I thought about trying to split the numeric from the alpha, but have not been very successful.
my order by is based on:
substring(eso.documentTitle, 3, CHARINDEX(' ', eso.documentTitle)-2)
As you can see there are some that have a single digit, and some that have multiple characters after the numbers. Any thoughts on the best way to attack this? I am trying to do this in my TSQL, as I am grabbing the data and throwing it into a temp table to redo the sort order (IE: update the original table as it is based on the order in which these files where entered into the DB.)
Hope this makes sense....
TIA.
May 15, 2012 at 10:17 am
...
Hope this makes sense....
...
It would make full sense, if you could show the exact expected result based on your sample. It would help to understand how you want your data be sorted...
May 15, 2012 at 10:20 am
Here is a subset of the data showing the current problem. notice that 2a comes after 27n. there unfortunately is no pattern or set number of files that can or will be present. Some may have OO27a-g or they may go to a-n. Hope that is clearer.
OO27k Informed Consent from Persons with Limited
OO27l Expedited Review
OO27m IRB Review of Modifications
OO27n Recruiting Potential Research Participants
OO2a Tim Parker CV 2011 10 25
OO3a Nursing Quality Plan 2011
OO3b 2011 Safety Mgmt Plan
OO3c Health System 2010 Community Benefit
May 15, 2012 at 10:24 am
in a similar situation, i've used a function dbo.StripNonNumeric, which leaves just the string 0-9, which i could convert to an integer;
then i could do
ORDER BY
CONVERT(int,dbo.StripNonNumeric(SomeColumn)),,
SomeColumn
As long as you understand the sorting ramifications, like
that "11AB23" will get converted to 1123 for the order,
that's one way to tackle it.
Lowell
May 15, 2012 at 10:30 am
Removing all digits may not help for cases like:
OO1a 2011 Zero
OO1a 2012 Non-Zero
as OP, if I understand correctly, would want to sort them as above, and stripping digits would sort them as
OO1a 2012 Non-Zero
OO1a 2011 Zero
Question: what is the maximum number of digits between your OO prefix and the first Alpha-character you want to be significant for sorting?
May 15, 2012 at 10:31 am
YUCK!!! Can you go to the to other group and tell him how horrible their naming convention is? Add an underscore, space, anything...
OO_27_a
Fixing this at the source will be far better than trying to unravel it later.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 15, 2012 at 10:37 am
Sean - I couldn't agree more...I am going to be working on that! I just started here 3 weeks ago and this is the second step of a cleanup process that they are spending quite a bit of time on. these file names are output to an HTML page and they a sort order that is when they were placed in the project.
Step one was cleaning up files that were associated improperly with the 5 categories. That was easy.
As for the max number of digits. So far 2. and I am really only worried about sorting on anything up to that first space in the file name. The rest of the document name isn't a problem (at least currently).
May 15, 2012 at 10:42 am
I love it!
declare @t table (id int identity(1,1), val varchar(200))
insert @t
select 'OO27k Informed Consent from Persons with Limited'
union select 'OO27l Expedited Review'
union select 'OO27m IRB Review of Modifications'
union select 'OO27n Recruiting Potential Research Participants'
union select 'OO2a Tim Parker CV 2011 10 25'
union select 'OO3a Nursing Quality Plan 2011'
union select 'OO3b 2011 Safety Mgmt Plan'
union select 'OO3c Health System 2010 Community Benefit'
;with remd
as
(
select id, LEN(MAX(rn)) rd
from
(
select id, substring(val,3,200) v, rn
from @t
join (select top (999) cast(ROW_NUMBER() over (order by (select null)) as varchar) rn
from sys.columns) n
on n.rn = LEFT(substring(val,3,200),LEN(rn))
) q
group by id
)
select t.val
from @t t
join remd r on r.id = t.id
order by substring(val,3 + rd,200)
May 15, 2012 at 10:47 am
I'm sure someone may find a more elegant way, but this is forcing the sort.
with c (doc_name) as (
select 'OO27k' union all
select 'OO27l' union all
select 'OO27m' union all
select 'OO27n' union all
select 'OO2a' union all
select 'OO3a' union all
select 'OO3b' union all
select 'OO3c'
)
select *
from c
order by
case when isnumeric(substring(doc_name, 3, 1))=0 then ASCII(substring(doc_name, 3, 1)) else ASCII(substring(doc_name, 3, 1))+200 end,
case when isnumeric(substring(doc_name, 4, 1))=0 then ASCII(substring(doc_name, 4, 1)) else ASCII(substring(doc_name, 4, 1))+200 end,
case when isnumeric(substring(doc_name, 5, 1))=0 then ASCII(substring(doc_name, 5, 1)) else ASCII(substring(doc_name, 5, 1))+200 end
______________________________________________________________________________
How I want a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.
May 15, 2012 at 10:51 am
..., but this is forcing the sort.
Yeah, but it will only work for known values.
It's as well as doing it manually...
May 15, 2012 at 11:08 am
Can you provide the DDL (CREATE TABLE statement) where this data is stored (and there by selected from). It wold help to know more about the data.
May 15, 2012 at 12:32 pm
Eugene Elutin (5/15/2012)
..., but this is forcing the sort.
Yeah, but it will only work for known values.
It's as well as doing it manually...
Not at all. It's sorting by column 3, then 4, then 5. It answers the example given.
______________________________________________________________________________
How I want a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.
May 15, 2012 at 7:27 pm
TimParker (5/15/2012)
Here is a subset of the data showing the current problem. notice that 2a comes after 27n. there unfortunately is no pattern or set number of files that can or will be present. Some may have OO27a-g or they may go to a-n. Hope that is clearer.OO27k Informed Consent from Persons with Limited
OO27l Expedited Review
OO27m IRB Review of Modifications
OO27n Recruiting Potential Research Participants
OO2a Tim Parker CV 2011 10 25
OO3a Nursing Quality Plan 2011
OO3b 2011 Safety Mgmt Plan
OO3c Health System 2010 Community Benefit
You say there is no set pattern but, let me ask, will that first blob of text always contain at least 1 alpha character followed by at least 1 numeric digit followed by at least 1 alpha character follwed by at least 1 space?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2012 at 12:51 am
Assuming Jeff's assumptions are correct (except that the first blob is always 2 alpha characters), you can construct an ugly, nasty ORDER BY clause:
DECLARE @t TABLE ([name] VARCHAR(100))
INSERT INTO @t
SELECT 'OO27k Informed Consent from Persons with Limited'
UNION ALL SELECT 'OO27l Expedited Review'
UNION ALL SELECT 'OO27 Another Review'
UNION ALL SELECT 'OO27m IRB Review of Modifications'
UNION ALL SELECT 'OO27n Recruiting Potential Research Participants'
UNION ALL SELECT 'OO2a Tim Parker CV 2011 10 25'
UNION ALL SELECT 'OO3a Nursing Quality Plan 2011'
UNION ALL SELECT 'OO3b 2011 Safety Mgmt Plan'
UNION ALL SELECT 'OO3c Health System 2010 Community Benefit'
SELECT [name]
FROM (
SELECT [name], PATINDEX('%[0-9]%', [name]) AS FirstNo
,PATINDEX('%[a-zA-Z]%', SUBSTRING([name],3,LEN([name]))) As Suffix
FROM @t) x
ORDER BY SUBSTRING([name], 1, 2)
,CAST(SUBSTRING([name], 3, 2+Suffix-FirstNo) AS INT)
,SUBSTRING([name], 2+Suffix, CHARINDEX(' ', [name]))
Resulting in this:
OO2a Tim Parker CV 2011 10 25
OO3a Nursing Quality Plan 2011
OO3b 2011 Safety Mgmt Plan
OO3c Health System 2010 Community Benefit
OO27 Another Review
OO27k Informed Consent from Persons with Limited
OO27l Expedited Review
OO27m IRB Review of Modifications
OO27n Recruiting Potential Research Participants
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply