Exif data into a database.

  • Hello All,

    With the ExifTool by Phil Harvey I make a number of .CSV files from different media (harddisk, USB-stick, CD-rom's etc.).

    How can I import these .CSV files into a database?

    Goal is to get all my digital photo's meta data in a single database, so that I can query the total collection.
    (To study meta data of the pictures and to organise all my backup efforts, backups, backup from backups etc, copies).

    Thanks for your time and attention
    Ben

    -- The tool:
    -- https://www.sno.phy.queensu.ca/~phil/exiftool/

    -- Example for calls to the exiftool
    exiftool -csv -r -ext jpg J:\ >exif1_jpg.csv
    exiftool -csv -r -ext jpg C:\ >exif2_jpg.csv

    The structure of the CSV files does vary enormously.
    The number of fields varies ( sometimes >> 1000 )
    Which fields vary. (<always present> <most of the time>, <rare>) This depends on camerabrands and camera's in the set.
    The structure of the CSV is not predictable. If a picture of a friend camera is added, new fields can be introduced. 
    Total number of pictures to import the exif data from is around 100 000 to 150 000.

    The exiftool can handle only a limited amount of data at the time, so there are multiple files.
    (So when I import pictures from 2010, 2011, 2012 each CSV has a different structure).

  • Have you tried the import wizard?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • ben.brugman - Tuesday, November 20, 2018 7:41 AM

    Hello All,

    With the ExifTool by Phil Harvey I make a number of .CSV files from different media (harddisk, USB-stick, CD-rom's etc.).

    How can I import these .CSV files into a database?

    Goal is to get all my digital photo's meta data in a single database, so that I can query the total collection.
    (To study meta data of the pictures and to organise all my backup efforts, backups, backup from backups etc, copies).

    Thanks for your time and attention
    Ben

    -- The tool:
    -- https://www.sno.phy.queensu.ca/~phil/exiftool/

    -- Example for calls to the exiftool
    exiftool -csv -r -ext jpg J:\ >exif1_jpg.csv
    exiftool -csv -r -ext jpg C:\ >exif2_jpg.csv

    The structure of the CSV files does vary enormously.
    The number of fields varies ( sometimes >> 1000 )
    Which fields vary. (<always present> <most of the time>, <rare>) This depends on camerabrands and camera's in the set.
    The structure of the CSV is not predictable. If a picture of a friend camera is added, new fields can be introduced. 
    Total number of pictures to import the exif data from is around 100 000 to 150 000.

    The exiftool can handle only a limited amount of data at the time, so there are multiple files.
    (So when I import pictures from 2010, 2011, 2012 each CSV has a different structure).

    Do the CSV files come with column headers?  Also, with > 1000 unpredictable "fields", what on Earth will you want to do with the data once it's in a table?  I mean, do the column headers (if present) identify what the field may contain?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, November 20, 2018 8:10 AM

    Do the CSV files come with column headers?  Also, with > 1000 unpredictable "fields", what on Earth will you want to do with the data once it's in a table?  I mean, do the column headers (if present) identify what the field may contain?

    Yes the CSV files come with column headers.

    The Exif system is an extendable system, manufacturers of camera's come up with different extensions. What they do is unpredictable, at least I can not predict what manufacturers have come up with. But when you look into the data, there are large numbers of fields containing sensible data. Fields for the same purpose can have different names, different formats and other subtleties. A number of fields relates to the camera, a number of fields relate to the used lens, a number of fields relates to the setting. There is date and time information, sometimes the timezone. Sometimes there is GPS information, sometimes there is even named location information. Which faces are recognized, where the focus point is in the picture. And the list goes on.

    So there is a lot of information and not all is useful to everybody and there is also information I do not get yet. But making the information visible is step one.

    Questions to be answered could be :
    What focal length's are used ?
    Which combinations of ISO settings/shutter speeds/aperture are used ? (And how often) ?
    Which pictures are not yet in my 'main set' and are lurking in some corner of some media ?
    How do I take pictures dispersed over a day, or over a complete 'vacation' ?
    How often do I use a flash ?

    And a number of questions, depending on what data is available.

    Having the exif data in a single database for 'all' pictures could be useful.

    Thanks.
    Ben

  • Phil Parkin - Tuesday, November 20, 2018 8:02 AM

    Have you tried the import wizard?

    In the past I used the import wizard. But I would like to have an automated process and the format **) of the CSV files is not predictable. So using the wizard I could not archive that the wizard picked up the changing format.

    Thanks Ben
    **) (The delimitor, textqualifier etc. is always the same. The number of columns and the names are not.).

  • ben.brugman - Wednesday, November 21, 2018 5:45 AM

    Phil Parkin - Tuesday, November 20, 2018 8:02 AM

    Have you tried the import wizard?

    In the past I used the import wizard. But I would like to have an automated process and the format **) of the CSV files is not predictable. So using the wizard I could not archive that the wizard picked up the changing format.

    Thanks Ben
    **) (The delimitor, textqualifier etc. is always the same. The number of columns and the names are not.).

    Do you have a consistent output format? Presumably a single wide table with every possible column on it? If not, how are you hoping to accommodate this data?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • You could import each line into a single row/record. Then you could parse the single row into the multiple columns you want to use/keep... using SUBSTRING() and the offset you would get from the first column-headings row.

  • I think you might find it easiest to write a small application (C#, VB) to pull in the columns of interest based on the headings in each file and write the results to the database. That way, you can code in the fact that the same thing can have different names in two files - you could even code in conversions if some files use imperial measurements while others use metric.

  • Phil Parkin - Wednesday, November 21, 2018 5:51 AM

    Do you have a consistent output format? Presumably a single wide table with every possible column on it? If not, how are you hoping to accommodate this data?

    Yes and No.
    Starting this project I noticed that the number of columns was over the number of columns allowed. (Columns per nonwide table)
    Just now I learned that there is also a 'Columns per wide table', so I learned something new just now.🙂

    There are a large number of columns that I do understand and are planning to use. (date/time columns, lens parameters, setting parameters).
    There are a number of columns which contain data (example focus points), where I know which information is contained, but I have not yet discovered the name of those columns and the format the data is in. I have to search for these column and study the format.
    There are probably columns I suspect must be present but I am not sure of that they actually exist.
    And there are a number of columns which I do not know off and do not understand 'yet'.

    So after importing the data into SQL-server I need to study the column names (header names) and the content.
    And make a list which columns to use in the endresult. Probably have a table with far less than a 1000 columns and have the remainder of the information in a named, value table.

    But I am starting of with a table for each .CSV file. (Each file has a distinct set of columns).

    Ben

  • Ricky Lively - Wednesday, November 21, 2018 6:14 AM

    You could import each line into a single row/record. Then you could parse the single row into the multiple columns you want to use/keep... using SUBSTRING() and the offset you would get from the first column-headings row.

    Some fields contain <EOL> (End of Line) symbols, I have to accomodate for that.

    At the moment I am importing the file, using 'DelimitedSplit_Long' **) function to split the file into lines. Repair the splits which are within a field. Then use 'DelimitedSplit_Long', to split the lines into items. Repair the splits within a field. And then build a table from that.

    But this is such a generic request. (Importing a .CSV file where when importing the columns are not known). That I am still supprised that there is no Generic solution for this. I have encountered this problem ***) before, college's of me have encountered this problem before. So there must be hundreds of people having encountered this problem before.

    Thanks for your time and attention.
    Ben

    **) Adapted DelimitedSplit8K

    ***) At my work I have several examples where we want to import data from a .CSV. (I can not go into this).
    Privately I have a number off situations where I like to import data from a CSV.
    Exif data from picture files. (As in this thread).
    Banking information of downloads of bank accounts. (Different banks different formats).
    Medical data of devices I use. (Different devices, different brands, but even within a brand there are variations of .CSV files for the 'same' device).

  • ben.brugman - Thursday, November 22, 2018 3:07 AM

    Ricky Lively - Wednesday, November 21, 2018 6:14 AM

    You could import each line into a single row/record. Then you could parse the single row into the multiple columns you want to use/keep... using SUBSTRING() and the offset you would get from the first column-headings row.

    Some fields contain <EOL> (End of Line) symbols, I have to accomodate for that.

    At the moment I am importing the file, using 'DelimitedSplit_Long' **) function to split the file into lines. Repair the splits which are within a field. Then use 'DelimitedSplit_Long', to split the lines into items. Repair the splits within a field. And then build a table from that.

    But this is such a generic request. (Importing a .CSV file where when importing the columns are not known). That I am still supprised that there is no Generic solution for this. I have encountered this problem ***) before, college's of me have encountered this problem before. So there must be hundreds of people having encountered this problem before.

    Thanks for your time and attention.
    Ben

    **) Adapted DelimitedSplit8K

    ***) At my work I have several examples where we want to import data from a .CSV. (I can not go into this).
    Privately I have a number off situations where I like to import data from a CSV.
    Exif data from picture files. (As in this thread).
    Banking information of downloads of bank accounts. (Different banks different formats).
    Medical data of devices I use. (Different devices, different brands, but even within a brand there are variations of .CSV files for the 'same' device).

    There are generic solutions for such a thing.  I'm just wondering what the heck it is that you'll do with the "generic" column names that apparently vary so greatly. 

    Also, you seem to indicate that a comma may be embedded in a "field" that is not intended to be a delimiter.  Do you have such an example and is the field then embedded in double-quotes in the file itself?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, November 22, 2018 1:04 PM

    There are generic solutions for such a thing.  I'm just wondering what the heck it is that you'll do with the "generic" column names that apparently vary so greatly. 

    Also, you seem to indicate that a comma may be embedded in a "field" that is not intended to be a delimiter.  Do you have such an example and is the field then embedded in double-quotes in the file itself?

    Yes the fields with a comma or a <EOL> marker are double qouted.
    For examples of fieldnames and of a commas embedded in a (double-quote) "Field", see below.


    There are generic solutions for such a thing.

    Could you point them out to me?

    The field names are not Generic as such, but I am seeking a solution which can handle 'names' in general.

    Thanks for you time and attention.
    Ben

    Fieldname examples (most of them fairly self explaining): 

    1. FocalLength, FocalLength35efl,FocalLengthln35mmFormat, MaxFocalLength, MinFocalLength), very clear what these are.
    2. BabyAge and Babyname    (Self describing, used in some compact camera's).
    3. AEBBracketValue, AEBSequence, AEBShotCount,AELock,AELockButten. (For a DSLR user clear what that means, although I would have to inspect which camera uses AELock and which camera uses AELockButten (or both). Same for Sequence and ShotCount.).
    4. GPSAltitude    GPSAltitudeRef    GPSAreaInformation    GPSDateStamp    GPSDateTime    GPSDestBearing    GPSDestBearingRef    GPSDOP    GPSHPositioningError    GPSImgDirection    GPSImgDirectionRef    GPSLatitude    GPSLatitudeRef    GPSLongitude    GPSLongitudeRef    GPSMapDatum    GPSMeasureMode    GPSPosition    GPSProcessingMethod    GPSSpeed    GPSSpeedRef    GPSStatus    GPSTimeStamp    GPSTrack    GPSTrackRef    GPSVersionID
      It's clear that they all have to do with location data, but there is 'overlap' in the fields and for different brands differents combinations are used.

    There are also a number of more obscure fieldnames. Or fields where you have to check the content to see what's actually in those fields.

    Content

    1. Adobe XMP Core 5.6-c014 79.156797, 2014/08/20-09:53:02 
    2. Baseline DCT, Huffman coding
    3. CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 80
    4. Y, Cb, Cr, -
    5. Little-endian (Intel, II)
    6. Auto, Fired
    7. xx deg x' 41.40" N, x deg xx' 28.80" E    (numbers are replaced by xxx, to protect the innocent)
    8. Not Embedded, Independent
    9. Reflective, Glossy, Positive, Color
    10. Copyright International Color Consortium, 2009
    Both the fieldnames and content are taken from a set containing 1050 'columns'. This is from pictures from a large set of camera's. Scanned from a general harddisk, not my 'photodisc'.

  • ben.brugman - Tuesday, November 20, 2018 7:41 AM

    Hello All,

    With the ExifTool by Phil Harvey I make a number of .CSV files from different media (harddisk, USB-stick, CD-rom's etc.).

    How can I import these .CSV files into a database?

    Goal is to get all my digital photo's meta data in a single database, so that I can query the total collection.
    (To study meta data of the pictures and to organise all my backup efforts, backups, backup from backups etc, copies).

    Thanks for your time and attention
    Ben

    -- The tool:
    -- https://www.sno.phy.queensu.ca/~phil/exiftool/

    -- Example for calls to the exiftool
    exiftool -csv -r -ext jpg J:\ >exif1_jpg.csv
    exiftool -csv -r -ext jpg C:\ >exif2_jpg.csv

    The structure of the CSV files does vary enormously.
    The number of fields varies ( sometimes >> 1000 )
    Which fields vary. (<always present> <most of the time>, <rare>) This depends on camerabrands and camera's in the set.
    The structure of the CSV is not predictable. If a picture of a friend camera is added, new fields can be introduced. 
    Total number of pictures to import the exif data from is around 100 000 to 150 000.

    The exiftool can handle only a limited amount of data at the time, so there are multiple files.
    (So when I import pictures from 2010, 2011, 2012 each CSV has a different structure).

    I must ask why you want to do this?
    😎
    The EXIF specs are 148 pages, you might have to spend the next few days (and nights) modelling it to fit into a relational database:pinch:

  • ben.brugman - Friday, November 23, 2018 7:01 AM

    Jeff Moden - Thursday, November 22, 2018 1:04 PM

    There are generic solutions for such a thing.  I'm just wondering what the heck it is that you'll do with the "generic" column names that apparently vary so greatly. 

    Also, you seem to indicate that a comma may be embedded in a "field" that is not intended to be a delimiter.  Do you have such an example and is the field then embedded in double-quotes in the file itself?

    Yes the fields with a comma or a <EOL> marker are double qouted.
    For examples of fieldnames and of a commas embedded in a (double-quote) "Field", see below.


    There are generic solutions for such a thing.

    Could you point them out to me?

    The field names are not Generic as such, but I am seeking a solution which can handle 'names' in general.

    Thanks for you time and attention.
    Ben

    Fieldname examples (most of them fairly self explaining): 

    1. FocalLength, FocalLength35efl,FocalLengthln35mmFormat, MaxFocalLength, MinFocalLength), very clear what these are.
    2. BabyAge and Babyname    (Self describing, used in some compact camera's).
    3. AEBBracketValue, AEBSequence, AEBShotCount,AELock,AELockButten. (For a DSLR user clear what that means, although I would have to inspect which camera uses AELock and which camera uses AELockButten (or both). Same for Sequence and ShotCount.).
    4. GPSAltitude    GPSAltitudeRef    GPSAreaInformation    GPSDateStamp    GPSDateTime    GPSDestBearing    GPSDestBearingRef    GPSDOP    GPSHPositioningError    GPSImgDirection    GPSImgDirectionRef    GPSLatitude    GPSLatitudeRef    GPSLongitude    GPSLongitudeRef    GPSMapDatum    GPSMeasureMode    GPSPosition    GPSProcessingMethod    GPSSpeed    GPSSpeedRef    GPSStatus    GPSTimeStamp    GPSTrack    GPSTrackRef    GPSVersionID
      It's clear that they all have to do with location data, but there is 'overlap' in the fields and for different brands differents combinations are used.

    There are also a number of more obscure fieldnames. Or fields where you have to check the content to see what's actually in those fields.

    Content

    1. Adobe XMP Core 5.6-c014 79.156797, 2014/08/20-09:53:02 
    2. Baseline DCT, Huffman coding
    3. CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 80
    4. Y, Cb, Cr, -
    5. Little-endian (Intel, II)
    6. Auto, Fired
    7. xx deg x' 41.40" N, x deg xx' 28.80" E    (numbers are replaced by xxx, to protect the innocent)
    8. Not Embedded, Independent
    9. Reflective, Glossy, Positive, Color
    10. Copyright International Color Consortium, 2009
    Both the fieldnames and content are taken from a set containing 1050 'columns'. This is from pictures from a large set of camera's. Scanned from a general harddisk, not my 'photodisc'.

    Sorry, Ben.  I've been head down into a hurricane on other things.

    You can do this... have a stored procedure that reads the first row as a blob (the one with the field names) and split it so you get a numbered list of the field names.  Use that as a source of dynamic SQL to build your target table and then import the file to that new target table.

    Since you have some clues given in the column name by the datatype, you might even go the extra mile by having the dynamic SQL determine the correct datatype based on the clues in the column name.

    As for providing a link to some of the generic methods I've seen goes, I can't.  It was a bad recommendation on my part because I'm normally disappointed in them for one reason or another and end up writing my own using the method I describe above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Friday, November 23, 2018 7:58 AM

    I must ask why you want to do this?
    😎
    The EXIF specs are 148 pages, you might have to spend the next few days (and nights) modelling it to fit into a relational database:pinch:

    Thanks for the document, did go through it. This is very interresting, but not needed for using the exif information. (But still interesting, thank you).

    Why?
    There is a 'shit'-load of information available in the Exif data. Having photography as a hobby, this is interresting information too me. Using a (SQL-server) database makes this information more accesseble. With lightroom you can make filters on the data to see the pictures (and how many) are 'left' after the filtering. With a database you can do so much more, combinations of the 'parameters' for example, to study the settings of large sets of pictures.

    One important goal is to collect the data of a large number of media and combine them in a single database. So that I can explore which pictures are not allready in my masterset of pictures. And then 'discard' the media which I do not have need anymore for. (I have a collection of CD-roms, flash drives and others containing backups of pictures and other pictures.) Going through them informs me if my master set is complete.

    Other questions are what are the lenses and settings which get used and how much. And how do I (and others) take pictures over time.

    Remark: Photography is a hobby, both the pictures and the meta (exif) data is totaly mostly insignificant. (But it's a hobby).

    My question to you is:
    Why the obsession with the WHY question? Very often this question is asked. No problem, but af the WHY is anwsered, the original question keeps hanging in the air. The impression I sometimes get is the the 'WHY' question is asked to avoid giving an anwser to the question. Or as 'a suggestion' that the you should not want what you asked for.

    Ben

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply