Month: April 2017

Improving code with RegEx

Background

Because of the way the prerequisites were entered into the course websites, I needed to write a little function to parse strings such as “INFS1200 + GEOM1100 + 1200 + 1300”. In English, we know that 1200 and 1300 refer to GEOM1200 and GEOM1300; the prefix was assumed and so left off. However, I needed to create a List object containing the full course codes. So after calling split() on the string, I iterated over the list and prepended the letters of the previous element to any element that consisted only of digits.
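As a rough illustration of that first approach, here is a minimal sketch in Python. The function name expand_course_codes and the separator parameter are made up for this example, and it assumes every full code is letters followed by digits; it is not the original function.

```python
def expand_course_codes(prerequisites, separator=" + "):
    """Split on a fixed separator and fill in omitted subject prefixes."""
    codes = []
    prefix = ""  # letters of the most recent full course code
    for item in prerequisites.split(separator):
        item = item.strip()
        if item.isdigit():
            # Bare number: borrow the letters from the previous full code.
            item = prefix + item
        else:
            # Full code: remember its leading letters for later bare numbers.
            prefix = item.rstrip("0123456789")
        codes.append(item)
    return codes


print(expand_course_codes("INFS1200 + GEOM1100 + 1200 + 1300"))
# ['INFS1200', 'GEOM1100', 'GEOM1200', 'GEOM1300']
```

This only works while every string uses exactly the same separator, which is the weakness of the approach.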

My little function was fine while I could use list = string.split(“ + ”), but this breaks down when you discover that sometimes the prerequisites are entered as “INFS1200 or GEOM1100 and 1200 & 1300” or something equally inconsistent.

Improvements

So it was time to learn RegEx and rewrite the function in a more generic way! I needed to find every substring that consisted of either 4 uppercase letters followed by 4 digits or 2 uppercase letters followed by 4 digits, as well as any bare group of 4 digits, putting each result into the list.

After some searching and experimentation, I found that (simple) RegEx wasn’t as difficult as I thought, and was actually kind of fun. I used the following:

[A-Z]{4}[0-9]{4}|[A-Z]{2}[0-9]{4}|[0-9]{4}

This works well and means I don’t have to worry about how the course codes are separated. After adding the matches to the list, I then iterate over it to fill in the missing subject letters (INFS or GEOM etc).
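Put together, the RegEx version might look something like the sketch below. It assumes Python's re module, and the names COURSE_CODE and extract_course_codes are invented for illustration. The pattern is written without spaces around the | characters, because in Python's default mode a space inside a pattern is matched literally.

```python
import re

# 4 letters + 4 digits, or 2 letters + 4 digits, or a bare 4-digit number.
COURSE_CODE = re.compile(r"[A-Z]{4}[0-9]{4}|[A-Z]{2}[0-9]{4}|[0-9]{4}")


def extract_course_codes(prerequisites):
    """Find course codes regardless of separator, then fill in prefixes."""
    codes = []
    prefix = ""  # letters of the most recent full course code
    for match in COURSE_CODE.findall(prerequisites):
        if match.isdigit():
            # Bare number: reuse the letters of the previous full code.
            match = prefix + match
        else:
            # Full code: everything before the last 4 digits is the prefix.
            prefix = match[:-4]
        codes.append(match)
    return codes


print(extract_course_codes("INFS1200 or GEOM1100 and 1200 & 1300"))
# ['INFS1200', 'GEOM1100', 'GEOM1200', 'GEOM1300']
```

Because findall() only returns the parts of the string that match, the words and symbols between the codes are simply ignored.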

Posted by Anthony