May 28, 2020 at 1:49 pm
The problem: I have two data sources, Google Analytics and our own internal Application Form. Each source records the exact same goal/event, the conclusion of the application process. Unfortunately they record the datetime at slightly different times. For example Google may record the time as 3:05:52 while our app records it as 3:06:13. I know by looking at the Previous Step field in the Google data that the URL was for Location1 and then can see our app also recorded Location1 so I know the two records are a match.
The Question: How do I properly and reliably compare two datetimes that are usually within seconds or even milliseconds of each other so I can say Record A from Google is the same as Record Q from our own application?
May 28, 2020 at 6:07 pm
The solution will depend on a few different factors.
An example check would be:
SELECT ...
FROM ...
WHERE InternalDateTime >= GoogleDateTime
AND InternalDateTime < dateadd(minute, 1, GoogleDateTime);
This may not perform as well as needed - if that is the case then adding a persisted derived column to the Google Analytics table might improve performance.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 28, 2020 at 6:19 pm
Your reply prompted my thoughts, would it also work in the where clause like the sample below? This assumes I have manually checked several examples and found the average difference is about a minute either way. Also, there could be multiple entries within seconds of one another but the probability of them being applications for the same location is slim. Even if it were to line up location and time the error is unavoidable and therefore acceptable.
Select ...
FROM ...
WHERE InternalDateTime between DATEADD(seconds, -60, GoogleDateTime) and DATEADD(seconds, 60, GoogleDatetime);
May 28, 2020 at 7:09 pm
That would probably work - you need to test it to be sure though. I prefer not using BETWEEN for date/times - but in this case it might be the better option.
That is why the question about which one will be first...if the GoogleDateTime will always be before the InternalDateTime then it simplifies the check to what I previously posted.
My guess is that GoogleDateTime will always be before (or equal) to the InternalDateTime - due to the order of processing where Google Analytics is updated/inserted prior to your internal repository.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply