May 29, 2015 at 4:27 am
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.
Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}
May 29, 2015 at 6:55 am
You could do the whole thing (parsing, splitting, outputting one row with multiple columns per @ group) using an asynchronous Script Component, but it would take a fair chunk of C# to implement.
If that doesn't scare you, do some research to get yourself up to speed with what this might entail & then come back here with any follow-up questions.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 29, 2015 at 7:40 am
Hi, thanks for the advice, I will try with Python as I am more familiar with python than c# and see how I get on.
Many Thanks,
Oliver
May 30, 2015 at 3:47 am
As far as I know, you cannot use Python in a Script Component, so I think you are limited by that.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 30, 2015 at 1:21 pm
oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}
It might seem obvious but I want to make sure...
1. Do you want to capture the fact that the items after the @ symbol change?
2. In the case of things like "hicks2001" "conference:06", what do you want done with THAT data?
3. What do you want done with the braces and double quotes around the items?
Maybe it would be easier to just ask you what the data from above should look like in a table?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2015 at 1:32 pm
oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
I'm curious about the Data format.
Who are you getting the data from?
Can't you ask they to provided you a different format?
Is this a homework assignment?
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
May 30, 2015 at 2:04 pm
oliver.morris (5/29/2015)
Hi and many thanks for your help in advance. I am looking for a way to convert the following format into a sql table. For those of you familar with the format it is Bib Tex.Essentially a new row in the table would be for each entry, denoted by an @ logo and each column is denoted by an =, as you can see from the example data no one contains all the possible columns and some fields can be over two lines long.
To load this I was considering loading it into a table as each line being a row. Adding a row number, then a column counting the @ signs in order and essentially grouping each record, then for each group running through and looking for the column keywords 'author' , 'title' etc then splitting the data out into those constituent parts using substring and charindex.
Does this sound like the right approach.
Many Thanks for your help and suggestions,
Oli
This article bears strong resemblance to your issue:;-)
http://en.wikipedia.org/wiki/BibTeX">
http://en.wikipedia.org/wiki/BibTeX
@Book{hicks2001,
author = "von Hicks, III, Michael",
title = "Design of a Carbon Fiber Composite Grid Structure for the GLAST
Spacecraft Using a Novel Manufacturing Technique",
publisher = "Stanford Press",
year = 2001,
address = "Palo Alto",
edition = "1st",
}
@Book{Torre2008,
author = "Joe Torre and Tom Verducci",
publisher = "Doubleday",
title = "The Yankee Years",
year = 2008,
}
@INPROCEEDINGS {author:06,
title = {Some publication title},
author = {First Author and Second Author},
crossref = {conference:06},
pages = {330--331},
}
@PROCEEDINGS {conference:06,
editor = {First Editor and Second Editor},
title = {Proceedings of the Xth Conference on XYZ},
booktitle = {Proceedings of the Xth Conference on XYZ},
year = 2006,
month = oct,
}
Perhaps I'm missing something?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
May 30, 2015 at 2:18 pm
Well, at least it's not as noisy and bloated as XML. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 30, 2015 at 2:20 pm
So, now that we've established what it is, my question remains... how do you want to see that in a table? In this case and based on the Wikipedia article, I'd recommend 1 table per unique @category.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2015 at 3:11 am
Hi,
Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver
May 31, 2015 at 3:33 am
olibbhq (5/31/2015)
Hi,Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver
My advice to you is to request the data feed in a specific format that is Conducive to an easy load.
You need to tell them what you want.
If this is a mandate by an instructor then unfortunately that is not going to work. Good luck. 🙂
I would KISS.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
May 31, 2015 at 3:47 am
Why do you think you need that you do not need to use SSIS?
If you get your date in a simplified format you can use the Import Wizard To load your data without any code
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
May 31, 2015 at 4:03 am
olibbhq (5/31/2015)
Hi,Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver
I would not accept that file format.
Specify the file format and make life easier on yourself.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
May 31, 2015 at 6:21 am
Thanks for the advice and your help have a good sunday afternoon!
May 31, 2015 at 9:39 am
olibbhq (5/31/2015)
Hi,Thanks for all your help, I took the example content from wikipedia but all the content I want to put into a table is in the format. Essentially I am trying to move the data into a single standardised table from this bib tex format.
The way I have achieved this is to use python to change the text file so that each new line starts at an @ symbol. Then import row by row into an existing table using python pyodbc. Then using charindex and substring to pull the content from the large string into their relevant tables.
It seems to work and at the moment I don't need to put it in a SSIS package.
Thanks again for your help,
Oliver
Congrats on that. I just hate to do such things because it lends to the proverbial "Tower of Babel".
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply