October 19, 2010 at 1:40 am
Hi All,
I have a table with lakhs of records which can probably be grouped into some 25-30 groups based on the names of the objects.
My objects column will have a fairly lengthy name with first 20 characters of it always being constant.
Ex: "Microsoft SQL Server" has 20 characters totally. Now my data would be like:
"Microsoft SQL Server 2005"
"Microsoft SQL Server 2008"
"Microsoft SQL Server 2008R2"
I am trying to make groups so that my data will look like:
2005 2008 2008R2
x y z
for this, since i know first 20 char are constant always, can i start with 21st char and take it till end?
(after 21st letter the length varies.)
In that case how can i consider the end point?
Please suggest me something.
Thanks in advance.
October 19, 2010 at 2:30 am
You can do something like:
SELECT RTRIM ( LTRIM ( REPLACE (ColA, 'Microsoft SQL Server','') ) ) TrimmedString
FROM
(
SELECT 'Microsoft SQL Server 2005' UNION ALL
SELECT 'Microsoft SQL Server 2008' UNION ALL
SELECT 'Microsoft SQL Server 2008R2'
) Tab (ColA)
October 19, 2010 at 2:34 am
If u insist in SUBSTRING funtion, then take a look at this:
SELECT
RTRIM ( LTRIM ( REPLACE (ColA, 'Microsoft SQL Server','') ) ) TrimmedString_ReplaceFn,
RTRIM ( LTRIM ( SUBSTRING( ColA , LEN ('Microsoft SQL Server') + 1 , (LEN(ColA) - LEN('Microsoft SQL Server')) ) )) TrimmedString_SUBFn
FROM
(
SELECT 'Microsoft SQL Server 2005' UNION ALL
SELECT 'Microsoft SQL Server 2008' UNION ALL
SELECT 'Microsoft SQL Server 2008R2'
) Tab (ColA)
October 19, 2010 at 2:38 am
SUBSTRING (Transact-SQL)
Returns part of a character, binary, text, or image expression
SUBSTRING (value_expression ,start_expression ,length_expression )
length_expression
Is a positive integer or bigint expression that specifies how many characters of the value_expression will be returned. If length_expression is negative, an error is generated and the statement is terminated. If the sum of start_expression and length_expression is greater than the number of characters in value_expression, the whole value expression beginning at start_expression is returned.
I Have Nine Lives You Have One Only
THINK!
October 19, 2010 at 2:53 am
Hi All,
Thanks for the reply.
What am i trying to do here is to make groups based on names. Currently it is done using case like:
case when name like 'Microsoft SQL Server 2005' then 2005
when name like 'Microsoft SQL Server 2008' then 2008
when name like 'Microsoft SQL Server 2008R2' then 200R2
like this i have more than 30 groups. Now the query is taking a long time. So i thought of tuning this query.
To start, i thought of avoiding the repetition of the first 20 characters.
Is my approach correct? from query tuning point of view, is there anything i can do?
October 19, 2010 at 4:51 am
Hi,
for lakhs of records, sql server is well scaled to perform, but given the problem, it is very open that using such a case statement is bad enough to hit the app's performance. If there is probably another column which can segregate the data, numerically, then it would be of great use.
From the tuning point, you can try to get in the RANK() function and use it to group the data.
Regards,
Prajey
October 19, 2010 at 5:47 am
Is it worthwhile creating a new column that removes the repeated portion of data?
Is it even worthwhile storing the repeated data if its the same for every row?
October 19, 2010 at 5:53 am
grahamc (10/19/2010)
Is it worthwhile creating a new column that removes the repeated portion of data?Is it even worthwhile storing the repeated data if its the same for every row?
Exactly what I was thinking. With 25-30 repeating values these could be moved into another table and the repeating column changed to a foreign key reference.
October 19, 2010 at 5:57 am
Malavika, i suggested a solution using REPLACE... that will remove the repeated data leaving only the data you require.. if there are multiple values to be replace, you can use CROSS APPLY, REPLACE, GROUP data....
October 19, 2010 at 6:03 am
Thanks for all your replies.
@Cold coffee,
i am thinking of moving these groups into a temp table. how about this?
if this is not wrong , then how to proceed on this?
October 19, 2010 at 6:17 am
I did not quite get your full requirement, 'cuz of the fact that, you did not
1. Provide sample data to work with
2. Give a clear visual example of your desired result
Assuming from your initial post, i am posting a code piece that may interest you. I am not sure how this will perform on your multi-lakh table.
; WITH CTE0 (ColA) AS
(
SELECT 'Microsoft SQL Server 2005'
UNION ALL SELECT 'Microsoft SQL Server 2008'
UNION ALL SELECT 'Microsoft SQL Server 2008R2'
UNION ALL SELECT 'Microsoft Visual Studio 2005'
UNION ALL SELECT 'Microsoft Visual Studio 2008'
UNION ALL SELECT 'Microsoft Visual Studio 2010'
UNION ALL SELECT 'Windows 95'
UNION ALL SELECT 'Windows 98'
UNION ALL SELECT 'Windows XP'
UNION ALL SELECT 'Windows Vista'
UNION ALL SELECT 'Windows 7'
UNION ALL SELECT 'Windows 7 Home Premium'
),
CTE1 (TrimmedString) AS
(
SELECT
--Replace the pattern from CROSS APPLY with a blank space and RTRIM and LTRIM it
--
RTRIM ( LTRIM ( REPLACE (ColA, Vals,'') ) ) TrimmedString
FROM
CTE0
CROSS APPLY -- here is where we input the patterns we need to remove
-- add them as a serious of SELECT clauses
(
SELECT 'Microsoft SQL Server'
UNION ALL SELECT 'Microsoft Visual Studio'
UNION ALL SELECT 'Windows'
) Trimmer(Vals)
WHERE CHARINDEX(Vals ,ColA ) > 0 -- This will ensure that only those rows that has a match is processed
)
SELECT
TrimmedString , COUNT(*) CNT
-- INTO TempTable - If you need a persisted table for the grouped count, you can uncomment this
FROM
CTE1
GROUP BY
TrimmedString
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply