This series looks at the Advent of Code challenges.
As one of my goals, I’m working through the challenges. This post looks at day 4.
Part 1
This is an interesting data set. It’s ugly and inconsistent, and each record spans multiple lines. In fact, to determine what any particular “row” is, you need to read line by line and accumulate the data until you find a blank line. An SSIS or ADF exercise indeed.
However, I decided to try this with Python first. I thought the loading and splitting of items on consecutive lines would be easier. I opened the file and then scanned it for values like this:
for row in passports:
    # a blank line ends the current passport; any other line is appended to it
    if row not in ['\n', '\r\n']:
        currpassport += row.replace('\n', ' ')
This let me look for a blank row. If I didn’t find one, I added the row to my current passport value. If I did find a blank row, I split the accumulated passport string into a dictionary:
currdict = dict(x.split(":") for x in currpassport.split(" ") if x)
From here, I could check the number of entries: a passport is valid with all 8 fields, or with 7 if the only one missing is cid. I counted these up to get the answer.
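As a rough sketch of the whole Part 1 flow, the read-until-blank-line logic and the count could be put together as below. This is not the code I actually ran. The function names, the sample data, and the list of required field names (taken from the puzzle statement as I remember it) are assumptions for illustration. It also checks the field names as a set rather than checking the length, so a 7-entry passport only counts when cid is the missing one.

```python
# Assumed from the puzzle statement: every field except cid is required.
REQUIRED = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}

def parse_passports(text):
    """Split the raw text on blank lines and turn each record into a dict."""
    passports = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        fields = block.split()  # splits on spaces and newlines alike
        if fields:
            passports.append(dict(f.split(":", 1) for f in fields))
    return passports

def has_required_fields(passport):
    """True when every required field is present; cid may be missing."""
    return REQUIRED.issubset(passport)

# Made-up sample: the second record lacks hgt, the third lacks only cid.
sample = (
    "byr:1980 iyr:2015 eyr:2025 hgt:170cm\n"
    "hcl:#123abc ecl:brn pid:000000001 cid:100\n"
    "\n"
    "byr:1990 iyr:2012 eyr:2022\n"
    "hcl:#abcdef ecl:blu pid:000000002 cid:200\n"
    "\n"
    "byr:1975 iyr:2018 eyr:2028 hgt:65in\n"
    "hcl:#654321 ecl:grn pid:000000003\n"
)

passports = parse_passports(sample)
print(sum(has_required_fields(p) for p in passports))  # prints 2
```

Splitting on the blank line up front avoids tracking a current-passport variable, at the cost of reading the whole file into memory, which is fine for an input this size.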
Part 2
This was annoying: each entry has to be validated against a range of years, a set of allowed values, or a format. I knew I could build a number of validation functions, which would be the better way, but I ended up just using a series of if statements to check values and set a validation variable. A sample of them is here:
if currdict["hgt"][-2:] == "cm" and (int(currdict["hgt"][:-2]) < 150 or int(currdict["hgt"][:-2]) > 193):
    valid = 0
if currdict["hgt"][-2:] == "in" and (int(currdict["hgt"][:-2]) < 59 or int(currdict["hgt"][:-2]) > 76):
    valid = 0
if currdict["ecl"] not in ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]:
    valid = 0
if currdict["hcl"][0] != "#" or len(currdict["hcl"]) != 7:
    valid = 0
This let me tally up the valid passports.
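For comparison, the validation-function approach I mentioned might look something like this. It is only a sketch covering the three checks shown above. The function names and the VALIDATORS table are made up for illustration, and the year and pid rules would need their own entries. One deliberate difference from my if statements: a height with no cm or in unit is rejected here.

```python
def valid_hgt(value):
    """Height: 150-193 cm or 59-76 in; anything else is rejected."""
    unit, number = value[-2:], value[:-2]
    if not number.isdigit():
        return False
    if unit == "cm":
        return 150 <= int(number) <= 193
    if unit == "in":
        return 59 <= int(number) <= 76
    return False  # no recognised unit (an assumption; the if version lets this pass)

def valid_ecl(value):
    """Eye colour must be one of the allowed codes."""
    return value in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"}

def valid_hcl(value):
    """Same check as the if statement: a leading # and 7 characters in total."""
    return len(value) == 7 and value[0] == "#"

# Hypothetical lookup table: field name -> function that validates its value.
VALIDATORS = {"hgt": valid_hgt, "ecl": valid_ecl, "hcl": valid_hcl}

def is_valid(passport):
    """A passport passes when every listed field is present and valid."""
    return all(
        field in passport and check(passport[field])
        for field, check in VALIDATORS.items()
    )
```

Adding a rule then means writing one small function and adding one entry to the table, and each function can be tested on its own.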
I still need to work on these in PoSh and SQL, but life has gotten in the way of things outside of work.