Introducing the projects!

I’ve set up this place to document the histories, learnings and progress of my university and personal projects. I’m going to start with pages for the university projects I completed in 2016, then continue with the ongoing technology projects I’m working on.

In my spare time, I am also working on a personal database and Python project to help students choose subjects at UQ.

Posted by Anthony

Website updates from Google Chrome Lighthouse

Recently I heard about Chrome’s Lighthouse extension for auditing websites and web apps. Lighthouse analyses a website against a few key categories and suggests ways it can be improved if needed. These include:

  • Progressive Web App
  • Performance
  • Accessibility
  • Best Practices

While I don’t currently build web applications, I think it’s still a good test for finding where improvements could be made; at the very least, I’ll learn something for the future.

[Screenshot: acarrick.com]

Of course, since I’m not developing a progressive web app, I get a lowish score. But there are things I can improve.

[Screenshot: acarrick.com Lighthouse scores]

Progressive Web App

There’s not too much I can do to improve the PWA score, since its complaints include:

  • Does not register a Service Worker
  • Does not respond with a 200 when offline
  • User will not be prompted to Install the Web App
  • Is not configured for a custom splash screen
  • Address bar does not match brand colors
  • Page load is not fast enough on 3G

I can add the Chrome address-bar branding, which is pretty easy: it’s just a meta tag. The page load I can try to improve, but I may be limited by my hosting.

Performance

The major performance improvements it suggested included:

  • Reduce render-blocking scripts
  • Reduce render-blocking stylesheets
  • Offscreen images
  • Enable text compression

I’ll try to rearrange my scripts and CSS to reduce the render blockages…

Accessibility

The only complaint it raised regarding accessibility was: “<html> element does not have a [lang] attribute.” I guess this is for screen readers. It will be easy to fix.

Best Practices

There were 3 issues that Lighthouse raised regarding general best practices:

  • Uses document.write()
  • Does not open external anchors using rel="noopener"
  • Manifest’s short_name will be truncated when displayed on homescreen

I use document.write() to generate my email address for spam prevention, and Statcounter uses it too; I’ll look into alternatives for my own knowledge. Opening external links with rel="noopener" is about tab process isolation: without it, the opened page keeps a window.opener reference back to mine. I don’t have a manifest to define a short_name in, but anyway.

Improvements

I added a <meta name="theme-color" content="#551A8B"> to the <head>.

Next, I simply moved the scripts down to the bottom of the <body> element and marked the jQuery import as async, as directed.

According to the guide, the render-blocking CSS seems difficult to fix at this time, and inlining it may not be ideal given the imagery used.

The language fix was super easy – just update the root element to <html lang="en">.

For the second Best Practices suggestion, I just added rel="noopener" to each external link.

According to the documentation, the short_name warning is also triggered if the page title is longer than 12 characters. My name is 15 characters including the space, so I’m not changing that.

Result

[Screenshot: acarrick.com Lighthouse scores after the changes]

Ta Da!

It turns out, though, that even though I added the branding meta tag, Lighthouse looks for a manifest too. So although I now have purple branding in mobile browser address bars, I don’t get the points yet!

Posted by Anthony

Short Project: Chat Log Parser

I’m using Viber to communicate with someone, and we have many chats, so I looked into Viber’s chat backup capability. I found that Viber has two kinds of backup: one that you can restore, and one that you can email. It turns out that the email-able backup is actually CSV, so I realised I could parse it very easily with Python and, using a templating module such as Jinja2 or Mako, format it into an easy-to-read HTML page.

Coding

Reading the CSVs was simple enough: there’s a csv module built into the Python standard library that reads them as an iterable. By reading over the log, I could see the structure easily:

Date, Time, Sender Name, Sender Phone Number, Message
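
A minimal first pass over the file looks something like this ("backup.csv" is a hypothetical file name; only the column order above comes from the log):

    import csv

    # Read the emailed backup row by row; the csv module handles the
    # iteration for us.
    with open("backup.csv", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # (This naive unpacking is exactly what breaks on the
            # multi-line messages described below.)
            date, time, sender, number, message = row
            print(f"{date} {time} {sender}: {message}")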

I had two major problems parsing the data:

  1. Parsing the date and time.
  2. Handling messages that included a new line. A message containing a new line actually saved the newline character into the CSV, breaking that line of the file. This is probably invalid CSV, given the fields aren’t quoted.

Date/time parsing

I wanted a way to parse the two separate strings that make up the date-time: one string is the date and the other is the time. I started out using datetime.combine(), but later realised I could create the whole object in one hit; datetime.strptime() is more powerful than I thought.
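
For example (the format string here is a guess at the export’s layout, not Viber’s documented format):

    from datetime import datetime

    date_part = "12/07/2017"   # hypothetical values in the export's shape
    time_part = "9:41:07 PM"

    # Build the whole datetime in one hit instead of datetime.combine():
    timestamp = datetime.strptime(f"{date_part} {time_part}",
                                  "%d/%m/%Y %I:%M:%S %p")
    print(timestamp)  # 2017-07-12 21:41:07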


Paragraphed Message Handling

Handling the new line was a fun challenge, but I’m disappointed it has to be this way. I took advantage of the fact that a message line should have a date string in the first column; if it doesn’t, it must be a continuation of the previous message. So if I couldn’t parse the first column into a date object, I’d treat that line as a continuation and append it to the most recently found message. (I also needed to handle actual commas in the message, so I assumed that a comma in the text would have been written as ", ", and recreated it in the internal data structure.) I know that this whole thing is kludgy, but I can’t do much else when the CSV data isn’t properly quoted. (As I discovered when updating this project for Facebook Messenger; I’ll talk about that in another post though.)
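
A sketch of that logic, reusing the assumed column layout and date format from the earlier examples:

    from datetime import datetime

    def parse_rows(rows):
        # Fold continuation lines back into their messages (sketch only).
        messages = []
        for row in rows:
            try:
                timestamp = datetime.strptime(f"{row[0]} {row[1]}",
                                              "%d/%m/%Y %I:%M:%S %p")
            except (ValueError, IndexError):
                # No parsable date: this row is the tail of the previous
                # message, split off by an embedded newline. Rejoin it,
                # restoring the ", " the csv reader consumed as delimiters.
                if messages:
                    messages[-1]["text"] += "\n" + ", ".join(row)
                continue
            # Commas inside the message also split it across extra columns.
            messages.append({"when": timestamp,
                             "sender": row[2],
                             "text": ", ".join(row[4:])})
        return messages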

Structure

I was introduced to object-oriented programming through using classes as data structures, and I think that makes sense for this kind of application. It gives me the flexibility to let a Message object and a Chatlog object do things to themselves, over and above just exposing properties.

[Screenshot: my data structure]

Currently, though, I still have a mix of “private” variables with getter methods (Python doesn’t have truly private variables) and Python properties, which I started to use once I discovered them. Most of the class is set in the constructor, with only the contents, the is_user flag and the timestamp exposed as properties, in case they need to be changed later.
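
A cut-down sketch of the Message class in that style (attribute names other than contents, is_user and timestamp are guesses):

    class Message:
        def __init__(self, timestamp, sender_name, contents, is_user=False):
            self.sender_name = sender_name   # fixed at construction
            self._timestamp = timestamp
            self._contents = contents
            self._is_user = is_user

        # contents is a property so it can be changed later; is_user and
        # timestamp follow the same getter/setter pattern.
        @property
        def contents(self):
            return self._contents

        @contents.setter
        def contents(self, value):
            self._contents = value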

Templating

The basic Viber template I created is as follows:
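
(What follows is a minimal Jinja2 reconstruction rather than the original file; chatlog.messages and the message attribute names are assumptions based on the description underneath.)

    from types import SimpleNamespace
    from jinja2 import Template

    TEMPLATE = Template("""
    <ul class="chatlog">
      {% for message in chatlog.messages %}
      <li class="{{ 'me' if message.sender_name == 'Me' else 'them' }}">
        <p class="timestamp">{{ message.timestamp }}</p>
        <p class="contents">{{ message.contents }}</p>
      </li>
      {% endfor %}
    </ul>
    """)

    # Stand-in for the real Chatlog object produced by the parser:
    chatlog = SimpleNamespace(messages=[
        SimpleNamespace(sender_name="Me", timestamp="2017-07-12 21:41",
                        contents="Hi!"),
    ])
    print(TEMPLATE.render(chatlog=chatlog))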


It’s pretty simple, but not very pretty. I pass the whole Chatlog instance to Jinja2 and then use it directly within the template. In this case, I iterate over each message, displaying the date and time as one <p> element and the message itself as another paragraph element. If the message sender’s name is “Me”, I give it a class which floats it right.

[Screenshot: the chat log rendered in HTML]

Conclusion

I’ve actually done a lot more than this, but I think this is enough for an introduction and basic history. The latest version actually supports Facebook chat and bundles messages together from the same contact if they are in the same minute.

It can be found on GitHub as Chat Log Viewer. Feel free to fork it and send me a PR; there are issues I need to fix. In a future post, I’ll talk about the Facebook code additions and the learnings related to that.

Posted by Anthony in chat log parser

Improving code with RegEx

Background

Because of the way the prerequisites were entered into the course websites, I needed to write a little function to parse strings such as "INFS1200 + GEOM1100 + 1200 + 1300". In English, we know that 1200 and 1300 refer to GEOM1200 and GEOM1300, but that was assumed and so left off. However, I needed to create a list with the full course codes themselves. So after splitting the string, I iterated over the list, prepending the letters of the previous element to any element that consists only of digits.

My little function was fine while I could write list = string.split(" + "), but this breaks down when you discover that sometimes they are entered as "INFS1200 or GEOM1100 and 1200 & 1300" or something equally inconsistent.

Improvements

So it was time to learn RegEx, and to rewrite the function in a more generic way! I needed to find any string consisting of four uppercase letters followed by four digits, or two uppercase letters followed by four digits (or a bare four-digit code with its letters left off), putting each result into the list.

After some searching and experimentation, I found that (simple) RegEx wasn’t as difficult as I thought, and was actually kind of fun. I used the following:
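
(The sketch below is a reconstruction from the description above, not the exact pattern and helper I used.)

    import re

    # Two-to-four uppercase letters followed by four digits, or a bare
    # four-digit code whose subject letters were left off.
    CODE_PATTERN = re.compile(r"[A-Z]{2,4}\d{4}|\b\d{4}\b")

    def parse_course_codes(prerequisites):
        codes = CODE_PATTERN.findall(prerequisites)
        for i, code in enumerate(codes):
            if code.isdigit() and i > 0:
                # Restore the subject letters from the previous full code.
                letters = "".join(ch for ch in codes[i - 1] if ch.isalpha())
                codes[i] = letters + code
        return codes

    print(parse_course_codes("INFS1200 or GEOM1100 and 1200 & 1300"))
    # ['INFS1200', 'GEOM1100', 'GEOM1200', 'GEOM1300']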

This works well and means I don’t have to worry about how they separate the course codes. After adding the course codes to the list, I then iterate over it to fix the missing subject-matter letters (INFS or GEOM etc.).

Posted by Anthony

Database Correction Interlude

This is the third post in a series, documenting the progression of the project and the challenges I faced at each stage. I’m intending these posts to be almost reflective in nature, rather than very technical, though I will lightly justify some technical choices also. Once I feel that I can move the project out of a prototype stage, I’ll put up a more formal page documenting the technology and some overall reflections.

Because the state of the database kept bugging me, I went back to the original ER diagrams with my friend, and we came to quite a simple solution.

Simply remove the Faculty relation, adding instead a ‘faculty’ attribute to both the School relation and the Plan/Program relation.

A few reasons this is better and still valid:

  • I wouldn’t be saving any space with the dedicated relation anyway (unless I used ID numbers, but then there’s more querying to join the Faculty name with its ID and then to the School).
  • There are only around 20–30 unique Schools at UQ, so querying with DISTINCT isn’t really an issue.
  • Because Plan/Program would be in a 1:M relationship with Faculty anyway, you’d still have to search every row in the Plan/Program table to determine all the study plans offered by a Faculty.
  • Because I’m not actually storing any further data about the faculty, there’s no point devoting a whole table to it.

This really simplifies the data creation process, keeping the faculty against both the school and the plan whilst still allowing me to run interesting queries.
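
For illustration, here’s the shape of those queries (sqlite3 keeps the sketch self-contained; the real project uses MySQL, and the table names, column names and sample rows are assumptions):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE school  (name TEXT PRIMARY KEY, faculty TEXT);
        CREATE TABLE program (name TEXT PRIMARY KEY, faculty TEXT);
        INSERT INTO school VALUES
            ('School of ITEE', 'EAIT'),
            ('School of Civil Engineering', 'EAIT');
        INSERT INTO program VALUES
            ('Graduate Certificate in Information Technology', 'EAIT');
    """)

    # The faculty list falls out of a DISTINCT over the ~20-30 school rows...
    print(conn.execute("SELECT DISTINCT faculty FROM school").fetchall())

    # ...and plans per faculty need no join at all.
    print(conn.execute("SELECT name FROM program WHERE faculty = ?",
                       ("EAIT",)).fetchall())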

I also took the opportunity to update the database schema so the columns were of a decent length.


Posted by Anthony in Course Chooser

UQ Course Chooser Stage 2 – Database Planning and Evolution.

This is the second post in a series, documenting the progression of the project and the challenges I faced at each stage. I’m intending these posts to be almost reflective in nature, rather than very technical, though I will lightly justify some technical choices also. Once I feel that I can move the project out of a prototype stage, I’ll put up a more formal page documenting the technology and some overall reflections.

Following on from part one, I have a Python script to scrape UQ’s course and program websites and now need to design and build a database to store the scraped data.

ER Diagrams and Database planning

I used Draw.io for the initial diagramming, both in Chen notation, as I was taught at university, and in Martin notation, as is more commonly used in industry. Many of us used Draw.io during our university studies for its collaborative functionality as part of the team-based database design project: we could discuss the diagram and easily move objects around in real time, either completely remotely or just on separate computers sitting next to each other. I kept using Draw.io for my own project since I was already experienced with it; and because it uses Google Drive, my work was automatically saved and versioned, as well as being accessible from any Internet-connected PC.

I like that Chen notation shows the relationships between entities without needing to include the tables created during the mapping steps.

Design Rationale and Mapping

I needed to store information about the following entities:

  • Course
  • Plan/Program
  • School
  • Faculty

A Course entity contains attributes such as CourseCode (the primary key), Title and Coordinator, plus the important multi-valued attributes Semesters and Prerequisites. Semesters and Prerequisites would be mapped to new tables; Prerequisites ends up being a correlation table just mapping CourseCodes to other CourseCodes.

I figured that a Course is related to a Plan/Program since Courses make up Programs (of study). A Course is offered by a School. Schools have a title and a Faculty (entity) that manages that School. A Faculty also offers a Plan so it needs to be an entire entity by itself so both relationships are available.

My original Chen Notation diagram was complex and had many relations and relationships:

Unfortunately, I lost some detail when I decided to simplify it during the mapping stage. Even after diagramming it in Martin notation, it seemed quite hard to comprehend.

Originally, I wanted to be able to correlate faculties against schools and programs; and schools and faculties against individual courses. This way, as well as having a database to help choose subjects, I could also run interesting statistical analysis against the relationship between courses/schools/faculties/programs to determine things like… what faculty has the most programs, or what school has the most individual courses.

My colleague suggested (separately from my unplanned redesigns) that I use the ER diagramming tool built into MySQL Workbench. It uses Martin notation and has the nice feature of being able to generate an SQL script from the diagram.

MySQL Workbench

Following on from my Martin notation diagrams in Draw.io, here is one of the early ER diagrams I made in MySQL Workbench:

After discussions with my friend, I further simplified the ER diagram and model:

Clearly, the model and diagram are now much simpler and easier to understand, with the Course table still taking centre stage. However, Program is no longer linked to Faculty. (Having said that, an equijoin with a nested query would likely work and still keep the database simple.)

Recently I removed the Faculty table, since it didn’t really need to exist; I figured I should just store the faculty in a column against each School instance (row), given that each school only has one faculty.

Final Thoughts

Of course, while it’s now very easy to insert the course data into the database, I have also lost all the faculty-school-program relationships by removing that table. Now, more sub-queries would be required to cross-reference it. I decided that since there would only be 20-30 Schools across five Faculties, the time saved in the queries didn’t warrant the extra complexity. There are only about 3000 courses in total and probably a few hundred programs so I didn’t think speed was really an issue.

Also, I really wanted to get the database created so I could at least prototype the Python database saving code.

Now that I actually have the Python script saving things to the database, I’ll probably go back and either change the database again, or scrape the data first and then reprocess the database to rebuild the Faculty table once the entire dataset is scraped. Having redesigned the database a few times, I feel I’m in a better place to go back and create a database structure similar to what I wanted originally.

Building the database itself presented further technical challenges, as I was also on holiday at the time…

Posted by Anthony in Course Chooser

Project: UQ Course Chooser

A project I’m currently working on in my spare time is a Python script/database/web app to help UQ students choose their subjects for the next semester.

This is the first in a series of blog posts almost in a diary format, so I can document the progression of the project and the challenges I faced. I’m intending these posts to be almost reflective in nature, rather than very technical, though I will lightly justify some technical choices also. Once I feel that I can move the project out of a prototype stage, I’ll put up a more formal page documenting the technology and some overall reflections.

Rationale

The idea stemmed from my own frustrations with choosing subjects for my Graduate Certificate in Information Technology. We have a list of courses (at UQ, a subject is referred to as a “course”; I’ll use the terms interchangeably) we can choose from for a particular Program (qualification). Unfortunately, this list just divides the subjects into program-related categories such as “Part A – Compulsory” and “Part B – Introductory Electives”. So when I was choosing my own subjects, I would open up each course in the list in a new browser tab, then remove the subjects (close the browser tab) that are only offered in a different semester or that I don’t have the prerequisites for.

I devised three main “stages” and components for my project: a Python script to scrape UQ’s course websites; a database to store the course and program information from the scrape; and a web app to provide the actual interface and UI for the course information. (I envisage a page where the user can select their program of study and semester, and enter any other subjects they’ve studied; the app then shows a list of available subjects, highlighting the ones that meet the criteria.)

Stage one – Scraping script

The first course of action was to determine if it was even possible to programmatically extract the course information (semester, prerequisites, course title etc.). A quick look at the HTML source of the course websites proved that it was. I pleasantly discovered that the web pages were nicely formatted in a consistent way, with HTML elements given sensible ids and classes, such as <h1 id="course-title">Introduction to Software Engineering (CSSE7030)</h1> for the course title, or <p id="course-incompatible">COMP1502 or CSSE1001</p> for the courses that another course is incompatible with. (You wouldn’t get credit for a course if you have passed those other courses.) I also found a list of all programs offered by the university. These facts proved that it would be possible (maybe even not-too-difficult) to scrape UQ’s course and program websites to build up a database of subjects and programs.
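
As a sketch of how straightforward those ids make the scraping (requests and BeautifulSoup are assumed tooling here, not necessarily what my script ended up using):

    import requests
    from bs4 import BeautifulSoup

    def scrape_course(url):
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        incompatible = soup.find(id="course-incompatible")
        return {
            # The element ids are the ones observed on UQ's course pages.
            "title": soup.find(id="course-title").get_text(strip=True),
            "incompatible": (incompatible.get_text(strip=True)
                             if incompatible else None),
        }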

Finally, I was able to get back to the project after the semester had finished, so I set about finishing the Python script.

My script is made up of three main components so far:

  • A Course() class (object) to store various attributes about each course using appropriate data types (though Python is dynamically typed).
  • A find_links(url) function, which searches through a web page and returns a list of all the courses listed, based on URLs in UQ’s course-URL format.
  • A database_access.py module, which contains some helper methods for database querying and writing, as well as the main course-writer method, which receives a Course object and writes its attributes to the database.

In my main script, I loop over the result of find_links() on a given URL, using each link to create a new Course() object and appending it to a list. Then I loop over that list, passing each Course object to the database_access.course_create(course) method, as sketched below.
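
In outline (the component names are the ones above; the module layout and exact signatures are illustrative):

    # Illustrative wiring of the components described above; the module
    # name "course" and the list-page URL are assumptions.
    from course import Course, find_links
    import database_access

    course_list_url = "https://my.uq.edu.au/..."  # a program's course list

    courses = [Course(url) for url in find_links(course_list_url)]
    for course in courses:
        database_access.course_create(course)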


Once I had enough working that I could scrape individual course data, I peeled away from this and set about creating the database to actually store the scraped information. I’ll discuss my experiences with MySQL and Python in a later post.

Posted by Anthony in Course Chooser