April 9, 2022 at 7:48 pm
OMG. I think CostCo doesn't want anyone to do anything with their own purchase data. If you've ever tried to read a Costco receipt, it's some brutal horrible shorthand. If you try to import it, it's even worse. So I asked Costco to e-mail a PDF of my last 6 months of purchases. Either that's a PowerQuery puzzle that's way above my pay grade, or Costco has zero interest in you using your own data for something useful (like importing all the invoice details from your purchase into a table, so you can analyze it.
Sorry about that.
Has anyone ever successfully parsed a Costco PDF of purchases? If so, how did you do it?? I get two columns of data on top of one another (second value is on next line).
April 10, 2022 at 6:36 am
I don't know what these look like. Can you perhaps provide an example?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 11, 2022 at 7:30 pm
Sure. Just a minute. I attached a PDF. (I changed the extension to txt because this place doesn't like PDFs, so you'll have to change it back). The date and item columns wrap in the PDF, so they import as separate lines. Ideally, I'd like to merge them back together, and then import. I can get the data to read into PowerBI or Excel using PowerQuery, but I don't know how to clean up the data so that it's usable. Since the bakery only uses a limited number of items, I could copy and paste the mangled data into Excel, fix it manually, and then import just the {[PurchaseDate],[Item number],[Qty], and [Amount]} columns.
Might well be just overthinking it. He has less than 100 unique products he buys there, so maybe it's a half hour of boring, and then just import the 3 columns after that. If all else fails, he could do it manually, but I'm lazy (maybe too lazy?)
Thanks!
Pieter
April 12, 2022 at 11:16 am
I see what you mean about the format of that.
Based on that data, I am struggling to find a unique key for the results. You would think something like
PurchasedAt (DateTime), ProductCode
Would do the trick.
To get PurchasedAt, I assume there's something which can be derived from WHSDATE ITEM? Row 2 looks like MM-DD or DD-MM, and row 1 looks like YYYY-, optionally followed by a number. What is this number? Why is it not always present?
ProductCode ... is that the final row in the ITEM description column, for each item? Does not look like a common format if it is!
This would require a PQ masterpiece!
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 12, 2022 at 1:32 pm
Phil,
thanks for looking at it. The good thing is that since it's a bakery, he only buys a small number of different ingredients, so entering them by hand won't be that hard, so it's not the end of the world.
Pieter
April 14, 2022 at 9:03 am
I was half looking forward to giving this a proper go, as I'm sure it would be good for my own PQ education, but it seems that it's probably not worth the effort. Should you decide that you want to revisit, I'm happy to have a go.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 14, 2022 at 5:14 pm
Phil,
If you want, I can DM you a few receipts in PDF form (I'll have to change the extension, but ...), and you can if you wish, if only for a challenge. He usually goes shopping there once a week, so he could pretty easily just enter the stuff in a table if he has to. I was hoping the data would be easier to just import, but that turned out to be a hot mess.
Pieter
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply