loading and reformatting text file data into into SQL table

Question

loading and reformatting text file data into into SQL table

olibbhq

SSCarpal Tunnel

Points: 4034
More actions
May 29, 2015 at 4:27 am

#298628

Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.
Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247224 More actions · Answer 1

You could do the whole thing (parsing, splitting, outputting one row with multiple columns per @ group) using an asynchronous Script Component, but it would take a fair chunk of C# to implement.

If that doesn't scare you, do some research to get yourself up to speed with what this might entail & then come back here with any follow-up questions.

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 2

Hi, thanks for the advice, I will try with Python as I am more familiar with python than c# and see how I get on.

Many Thanks,

Oliver

Phil Parkin SSC Guru Points: 247224 More actions · Answer 3

As far as I know, you cannot use Python in a Script Component, so I think you are limited by that.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.
Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}

It might seem obvious but I want to make sure...

1. Do you want to capture the fact that the items after the @ symbol change?

2. In the case of things like "hicks2001" "conference:06", what do you want done with THAT data?

3. What do you want done with the braces and double quotes around the items?

Maybe it would be easier to just ask you what the data from above should look like in a table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 5

oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.
Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
I'm curious about the Data format.
Who are you getting the data from?
Can't you ask they to provided you a different format?
Is this a homework assignment?
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 6

oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.
Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
This article bears strong resemblance to your issue:;-)
http://en.wikipedia.org/wiki/BibTeX
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}

Perhaps I'm missing something?

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Well, at least it's not as noisy and bloated as XML. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

So, now that we've established what it is, my question remains... how do you want to see that in a table? In this case and based on the Wikipedia article, I'd recommend 1 table per unique @category.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 9

Hi,

Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.

The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.

It seems to work and at the moment I don't need to put it in a SSIS package.

Thanks again for your help,

Oliver

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 10

olibbhq (5/31/2015)
Hi,
Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver

My advice to you is to request the data feed in a specific format that is Conducive to an easy load.

You need to tell them what you want.

If this is a mandate by an instructor then unfortunately that is not going to work. Good luck. 🙂

I would KISS.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 11

Why do you think you need that you do not need to use SSIS?

If you get your date in a simplified format you can use the Import Wizard To load your data without any code

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 12

olibbhq (5/31/2015)
Hi,
Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver

I would not accept that file format.

Specify the file format and make life easier on yourself.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 13

Thanks for the advice and your help have a good sunday afternoon!

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 14

olibbhq (5/31/2015)
Hi,
Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver

Congrats on that. I just hate to do such things because it lends to the proverbial "Tower of Babel".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)