August 10, 2018 at 6:01 am
I've imported a large flat file that I need to clean up. Each record should begin with a 4 digit integer and then a string following the integer. A new line would start with another 4 digit integer and string. Currently the records look like this
1375 Order includes
All of the items you ordered plus
a few
other items that you did not order
1376 There are no orders in this entry
1377 We always shop for apples and
oranges, but we do not like grapes
I would like to get it to be 3 individual records starting with the integer followed by the string.
1375 Order includes All of the items you ordered plus a few other items that you did not order
1376 There are no orders in this entry
1377 We always shop for apples and oranges, but we do not like grapes
August 10, 2018 at 6:13 am
jon.wilson - Friday, August 10, 2018 6:01 AMI've imported a large flat file that I need to clean up. Each record should begin with a 4 digit integer and then a string following the integer. A new line would start with another 4 digit integer and string. Currently the records look like this
1375 Order includes
All of the items you ordered plus
a few
other items that you did not order
1376 There are no orders in this entry
1377 We always shop for apples and
oranges, but we do not like grapesI would like to get it to be 3 individual records starting with the integer followed by the string.
1375 Order includes All of the items you ordered plus a few other items that you did not order
1376 There are no orders in this entry
1377 We always shop for apples and oranges, but we do not like grapes
OK, so is your question: "Can you assist me in writing a query to remove all rows which do not begin with "nnnn "? If not, what is your question?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 10, 2018 at 6:23 am
Sorry, I would like to have the rows that don't begin with nnnn to be attached to the previous row. The data source I recieved has a Item number (nnnn) and then a description. However, most of the descriptions after the item number start on a new row or rows. The next item starts with a new 4 digit integer. So, the final output would be
1234 String with the description
August 10, 2018 at 6:41 am
jon.wilson - Friday, August 10, 2018 6:23 AMSorry, I would like to have the rows that don't begin with nnnn to be attached to the previous row. The data source I recieved has a Item number (nnnn) and then a description. However, most of the descriptions after the item number start on a new row or rows. The next item starts with a new 4 digit integer. So, the final output would be
1234 String with the description
If I had read your first post more carefully, I would have noticed that, sorry!
How have you guaranteed that the rows are in the correct order after performing the import? Is there some identity column in your import table which can be used to ensure order?
Or are you looking to modify the import itself, such that the rows are imported 'correctly' without requiring a subsequent clean-up?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 10, 2018 at 6:43 am
Hopefully this will help, right now there are 7 records that need to be combined into 2 records each beginning with the 4 digit integer and the corresponding string.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].
(
[Column 0] [varchar](5000) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
INSERT [dbo].
([Column 0]) VALUES (N'1376 Beginning of string ''A''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'The rest of string ''A''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'1377 Part of string ''B''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'more of string ''B''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'even more of string ''B''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'still more of string ''B''')
GO
INSERT [dbo].
([Column 0]) VALUES (N'the last part of string ''B''')
GO
August 10, 2018 at 6:46 am
The rows are in the correct order from what I can tell
August 10, 2018 at 6:50 am
The flat file has the data in the correct order, every record does begin with the 4 digit integer and then the description, but there are carrage returns at the end of many of the strings creating essentially a new record.
August 10, 2018 at 6:51 am
jon.wilson - Friday, August 10, 2018 6:46 AMThe rows are in the correct order from what I can tell
Tables in SQL Server have no natural order. If you have the time/inclination, you may find this interesting.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 10, 2018 at 6:54 am
jon.wilson - Friday, August 10, 2018 6:50 AMThe flat file has the data in the correct order, every record does begin with the 4 digit integer and then the description, but there are carrage returns at the end of many of the strings creating essentially a new record.
Understood. Here are two ways to fix this:
1) Pre-process the file (use PoSh / C# / Python or whatever language you feel comfortable with) to put it in the desired format prior to import.
2) Use SSIS to perform the import. A Script Component source would be able to read the file in its current form and output the file in its tidied form (after writing some code).
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 10, 2018 at 6:55 am
Ideally, I would like to get this into 2 columns. One for the 4 digit integer (which would actually be my key) and then the description
August 10, 2018 at 6:56 am
I'll take a look at these two paths, thanks for your suggestions
August 10, 2018 at 7:01 am
jon.wilson - Friday, August 10, 2018 6:56 AMI'll take a look at these two paths, thanks for your suggestions
If you're not already familiar with SSIS, option 1 is likely to be the faster option, as there's a bit of a learning curve.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 10, 2018 at 7:27 am
There is a third option if you can add an identity column to your staging table. Is that possible?
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 10, 2018 at 8:09 am
Yes, the staging table can be anything right now
August 10, 2018 at 8:59 am
I've updated the staging table definition. I used a temp table here, but you should use your regular table.
CREATE TABLE #table(
id INT IDENTITY,
[Column 0] [varchar](5000) NULL
) ON [PRIMARY]
INSERT #table ([Column 0])
VALUES
(N'1376 Beginning of string ''A''')
, (N'The rest of string ''A''')
, (N'1377 Part of string ''B''')
, (N'more of string ''B''')
, (N'even more of string ''B''')
, (N'still more of string ''B''')
, (N'the last part of string ''B''')
;
WITH grouped AS
(
SELECT t.id, t.[Column 0], i.is_start, SUM(i.is_start)OVER(ORDER BY t.id ROWS UNBOUNDED PRECEDING) grp
FROM #table t
CROSS APPLY (VALUES(CASE WHEN t.[Column 0] LIKE '[0-9][0-9][0-9][0-9]%' THEN 1 ELSE 0 END)) i(is_start)
)
SELECT
(
SELECT sub.[Column 0] + ' '
FROM grouped sub
WHERE sub.grp = g.grp
ORDER BY sub.id
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(5000)') AS Column0
FROM grouped g
WHERE g.is_start = 1
ORDER BY g.id
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply