September 12, 2009 at 12:06 am
Hi friends
In one of our projects there is a requirement to find duplicates among enrolled photo and biometric (fingerprint) data. Both the photos and the fingerprints are available as SQL Server tables.
The volume is pretty large (around 1,000,000 × 4 = 4,000,000).
Could some of you help me find a solution to this deduplication issue?
Thanking you
regards
john
September 12, 2009 at 5:52 am
Are they real duplicates, as in the same image is in the database twice, or two different images of the same print?
If they are the same image twice, then you could use BINARY_CHECKSUM to check for possible duplicates.
If you then group by the BINARY_CHECKSUM having COUNT(*) > 1, the rows that appear in the SELECT might be duplicates... but it's not guaranteed, because two different images can produce the same checksum.
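-- Option 1: number the rows inside each checksum group;
-- every row with RW > 1 shares a checksum with an earlier row.
-- BINARY_CHECKSUM ignores IMAGE columns, so cast to varbinary(max) first.
-- Windowed functions reject ORDER BY 1, so order by (select null) instead.
select * from (
select row_number() over(partition by binary_checksum(cast(yourimage as varbinary(max)))
                         order by (select null)) AS RW, *
from yourtable) MyAlias
where RW > 1

-- Option 2: return every row whose checksum occurs more than once.
SELECT * FROM yourtable
WHERE binary_checksum(cast(yourimage as varbinary(max))) in
(select binary_checksum(cast(yourimage as varbinary(max)))
from yourtable
group by binary_checksum(cast(yourimage as varbinary(max)))
having count(*) > 1)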
I've written an outside program that gets the file size and the CRC of any file/image on disk and then puts that info into a database in order to find duplicates. That has worked for me and been 100% accurate in the past; the combination of size and CRC seems to be a very reliable indicator of duplicates.
Maybe the DATALENGTH() of the image plus its checksum would be an excellent indicator for IMAGE datatypes.
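Lowell's program is not shown in the thread; the Python below is only a sketch of the same idea, assuming the images have been exported as files under one folder. It computes each file's size and CRC32 (via zlib.crc32) and reports groups of files that share both. The function names are hypothetical.

```python
import os
import zlib
from collections import defaultdict


def file_fingerprint(path, chunk_size=65536):
    """Return (size_in_bytes, crc32) for one file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over the whole file
    return os.path.getsize(path), crc


def find_candidate_duplicates(root):
    """Group every file under root by (size, crc32); keep groups of 2+ files."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_fingerprint(path)].append(path)
    # Only keys shared by more than one file are possible duplicates.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

For example, find_candidate_duplicates("/path/to/exported/images") returns a dict mapping (size, crc) to the matching paths; the folder name is a placeholder. A shared key is still a candidate, not proof.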
If it is two different images that might be of the same print, then you'd have to use some kind of algorithm that identifies the significant points of each print, store that in the same row as each image, and then group by it, similar to the checksum method above... not 100% accurate, but a good indicator.
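A minimal Python sketch of that grouping idea. It assumes some recognition step (hypothetical here, not a real library) has already reduced each print to a pattern class and a count of significant points, and builds a coarse key from them to store alongside the image. Prints sharing a key are only candidates for a closer look.

```python
from collections import defaultdict
from typing import NamedTuple


class PrintFeatures(NamedTuple):
    """Output of a hypothetical recognition step; not a real library type."""
    record_id: int
    pattern_class: str   # e.g. "arch", "loop", "whorl"
    minutiae_count: int  # number of significant points found


def coarse_key(f: PrintFeatures, bucket_size: int = 5):
    """Coarse value to store with the image: class plus bucketed point count."""
    return (f.pattern_class, f.minutiae_count // bucket_size)


def group_similar(features, bucket_size=5):
    """Return lists of record ids that share a coarse key (possible dupes)."""
    groups = defaultdict(list)
    for f in features:
        groups[coarse_key(f, bucket_size)].append(f.record_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Bucketing the point count tolerates small differences between two scans, but a print near a bucket boundary can land in the neighboring bucket, which is one reason this kind of grouping is not 100% accurate.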
Lowell
September 13, 2009 at 7:35 pm
I think Lowell has some good ideas. I'd use the checksum to get an initial list, but then I'd have some program run through and compare the binary data to be sure it's really the same.
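A sketch of that two-step check in Python, assuming the image column has already been read into (row_id, bytes) pairs; the function name is hypothetical. Length and CRC32 narrow the list cheaply, and the byte comparison then rules out checksum collisions.

```python
import zlib
from collections import defaultdict


def confirmed_duplicates(rows):
    """rows: iterable of (row_id, image_bytes) pairs.

    Stage 1 buckets rows by (length, CRC32), which is cheap but can collide.
    Stage 2 compares the actual bytes inside each bucket, so every returned
    group holds rows whose images are byte-for-byte identical.
    """
    buckets = defaultdict(list)
    for row_id, data in rows:
        buckets[(len(data), zlib.crc32(data))].append((row_id, data))

    result = []
    for candidates in buckets.values():
        if len(candidates) < 2:
            continue  # unique checksum: cannot be a duplicate
        exact = defaultdict(list)  # dict lookup compares the bytes exactly
        for row_id, data in candidates:
            exact[data].append(row_id)
        result.extend(ids for ids in exact.values() if len(ids) > 1)
    return result
```

With millions of rows you would probably not hold every image in memory; fetching only ids and checksums first, then pulling the blobs for the colliding groups, keeps the expensive part small.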
September 15, 2009 at 4:50 am
Hi
Friends, the checksum comparison is a good idea,
but my problem is different, and that is why I posted it under the NOT In SQL section.
The data is coming through an enrollment program
that captures a photo as well as fingerprints.
My requirement is to identify accidental or intentional duplicate enrollments. I think there are some third-party engines for this; those engines are supposed to match facial and fingerprint images.
I am a pure database guy, so I am not familiar with those ideas.
regards
john
September 15, 2009 at 5:11 am
I'm guessing you are just coming on board with a shop that does this? I would say they must have that software in their shop already; I would lean more towards calling sister or similar shops and asking what they used... learning from others' experience.
Both types of recognition software are going to do this: analyze the image, and come up with a value, or number of values, which can be placed in the database.
Once that value is in the database, it is what gets used to check for the best matches...
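As a generic illustration of "checking for best matches", here is a brute-force Python sketch that ranks stored numeric feature vectors by Euclidean distance to a new one. The vectors, the distance measure, and the threshold are assumptions; commercial engines use their own matching algorithms, so treat this only as the shape of the lookup.

```python
import math


def best_matches(query, stored, k=3, max_distance=None):
    """Rank stored feature vectors by Euclidean distance to the query.

    query: sequence of floats from the (hypothetical) recognition software.
    stored: dict mapping record_id -> feature vector of the same length.
    Returns up to k (record_id, distance) pairs, closest first.
    """
    scored = sorted(
        ((rid, math.dist(query, vec)) for rid, vec in stored.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None:
        # Drop anything too far away to count as a plausible match.
        scored = [pair for pair in scored if pair[1] <= max_distance]
    return scored[:k]
```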
I googled "fingerprint recognition software" and found lots of software, one of which claims to scan 40K fingerprints a second. You could google similarly for facial recognition software.
Here's one of the first links to free versions:
http://www.freedownloadmanager.org/downloads/fingerprint_recognition_software/
So, for example, say some software produces a hash for my fingerprint, along with some other high-level data, such as the pattern type (arch vs. whorl, for example). Saving that alongside a digital image of the print, and repeating for all the prints in your database, would give you the raw data you could use to check for possible dupes whenever a new record is inserted.
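A sketch of the insert-time check in Python, using an in-memory dictionary as a stand-in for an indexed column. pattern_type and template_hash are placeholders for whatever values the chosen software actually emits; nothing here is a real product's API.

```python
from collections import defaultdict


class EnrollmentIndex:
    """In-memory stand-in for an indexed table of recognition output.

    pattern_type and template_hash are hypothetical placeholders for the
    values produced by whatever recognition software is used.
    """

    def __init__(self):
        self._by_key = defaultdict(list)

    def enroll(self, record_id, pattern_type, template_hash):
        """Return ids already stored under the same key, then store this one."""
        key = (pattern_type, template_hash)
        possible_dupes = list(self._by_key[key])  # copy before appending
        self._by_key[key].append(record_id)
        return possible_dupes
```

In SQL Server the same idea would be an index on those columns and a lookup before the insert. An exact hash match only catches prints the software encodes identically; near matches need a similarity search instead.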
Lowell