Urban Dictionary Analysis Tool

A fun little exercise I’ve been working on is a statistical and language analysis tool for Urban Dictionary. The idea for the project came about when it was pointed out that my own name was on UD, and I realised that many of the definitions were of a sexual nature or offered praise to the holder of the name. I suspect that people are adding definitions of either their own name or the names of their partners or relatives. I thought it would be fun to programmatically analyse the various definitions and group them by content, maybe also ranking the most popular keywords or other interesting statistics.

The finished product (though I’ll add to it over time) is available here: https://www.acarrick.com/urban_stats

[Screenshot of the Urban Stats tool]

Continue reading for some technical details…


Posted by Anthony

Quick Project: Movies Database/Google Form

For a long time, I’ve wanted to create a database of the movies I have. Because my movies are scattered across various media, I’d like to be able to point friends to a site where they can see everything I have, and where I can look up which medium each movie is available on. In my case, I have a PVR, a PVR on the Mac, some DVDs, and movies I’ve bought off Google Play.

Eventually, I realised that Google Forms just saves to a Google Sheets spreadsheet and I could use this spreadsheet to find the location, or share a link to it so my friends can pick a movie. I also figured that I could somehow script the act of responding to the Google Form or otherwise populate the data, reading the list of recordings on my Mac PVR from the filesystem.

After a bit of searching around, I came across a Reddit thread, “How can I use Python to submit a Google Form (or write to a response spreadsheet)?”, which suggested the easiest way to submit the Google Form. With a bit of Python magic, I created a simple script to read the files and folders in a directory, and submit them straight to the Google Form!
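As a rough idea of what such a script could look like, here is a minimal sketch using only the standard library. It is not the script from the post: the form URL and the two entry.… field names are hypothetical placeholders (the real ones would have to be read from your own form), and it assumes the form accepts a plain POST to its formResponse address.

```python
import os
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace with the values from your own form.
FORM_URL = "https://docs.google.com/forms/d/e/YOUR_FORM_ID/formResponse"
TITLE_FIELD = "entry.1111111111"   # assumed field for the movie title
MEDIUM_FIELD = "entry.2222222222"  # assumed field for where the movie lives


def list_titles(directory):
    """Return the sorted names of files and folders in a directory.

    Hidden entries are skipped, and file extensions are dropped so that
    "Alien.mkv" becomes "Alien". Folder names are kept unchanged.
    """
    titles = []
    for name in os.listdir(directory):
        if name.startswith("."):
            continue
        if os.path.isfile(os.path.join(directory, name)):
            name = os.path.splitext(name)[0]
        titles.append(name)
    return sorted(titles)


def encode_response(title, medium):
    """Build the URL-encoded body for one form response."""
    fields = {TITLE_FIELD: title, MEDIUM_FIELD: medium}
    return urllib.parse.urlencode(fields).encode("utf-8")


def submit(title, medium):
    """POST one response to the form and return the HTTP status code."""
    request = urllib.request.Request(FORM_URL, data=encode_response(title, medium))
    with urllib.request.urlopen(request) as response:
        return response.status


def main(directory, medium="Mac PVR"):
    """Submit every title found in the directory as a separate response."""
    for title in list_titles(directory):
        submit(title, medium)
```

Calling main() with the path to the recordings folder would submit one response per file or folder. Nothing is sent until main() or submit() is actually called.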

Read on to find out how!


Posted by Anthony in Short Projects

The Right Tools for the Job

I’ve been involved in a few projects lately, for both work and personal life. During these projects, I’ve been thinking about which tools and technologies I use, and which to choose while I’m still in the planning process. I’d like to share some of these.

Updated: 29th March 2018 – Git Bash for Mac


Posted by Anthony

Website updates from Google Chrome Lighthouse

Recently I heard about Chrome’s Lighthouse extension for auditing websites and web apps. Lighthouse analyses websites against a few key metrics and suggests ways they can be improved if needed. These include:

  • Progressive Web App improvements
  • Performance
  • Accessibility
  • Best Practices

While I don’t currently build web applications, I think it’s still a good test for finding where improvements could be made, and either way I’ll learn something for the future.


Posted by Anthony

Short Project: Chat Log Parser

I’m using Viber to communicate with someone, and we have many chats, so I looked into Viber’s chat backup capability. I found that Viber has two backups: one that you can restore, and one that you can email. It turns out that the email-able backup is actually a CSV file, so I realised I could parse it very easily with Python and, using a templating module such as Jinja2 or Mako, format it into an easy-to-read HTML page.
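To give a feel for the idea, here is a minimal sketch. Two things are assumptions rather than facts from the post: the column order (date, time, sender, phone number, message) is only a guess at how the export is laid out, and plain str.format from the standard library stands in for Jinja2 or Mako to keep the example dependency-free.

```python
import csv
import html
import io

# One chat message rendered as a paragraph; a stand-in for a real template.
ROW_TEMPLATE = "<p><b>{sender}</b> <small>{date} {time}</small><br>{message}</p>"


def render_chat(csv_text):
    """Turn the CSV text of a chat backup into a simple HTML page.

    Assumed column order: date, time, sender, phone number, message.
    Rows with fewer than five columns are skipped.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 5:
            continue
        date, time, sender, _phone, message = record[:5]
        rows.append(
            ROW_TEMPLATE.format(
                sender=html.escape(sender),
                date=html.escape(date),
                time=html.escape(time),
                message=html.escape(message),  # stop message text being read as HTML
            )
        )
    return "<html><body>\n" + "\n".join(rows) + "\n</body></html>"
```

Reading the exported file and passing its text to render_chat() gives a page that can be saved and opened in a browser. A real templating module becomes worthwhile once the page needs loops, conditionals, or styling per sender.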


Posted by Anthony in chat log parser

Improving code with RegEx

Background

Because of the way the prerequisites were entered into the course websites, I needed to write a little function to parse strings such as “INFS1200 + GEOM1100 + 1200 + 1300”. In English, we know that 1200 and 1300 refer to GEOM1200 and GEOM1300, but that was assumed and so left off. However, I needed to create a list object containing the full course codes. So after splitting the string with split(), I iterated over the list and prepended the letters of the previous element to any element that consisted only of digits.

My little function was fine while I could rely on list = string.split(" + "), but this breaks down when you discover that sometimes they are entered as “INFS1200 or GEOM1100 and 1200 & 1300” or something equally inconsistent.
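A minimal sketch of that first approach might look like the following. The function name is made up for illustration and this is a reconstruction from the description above, not the original code.

```python
def expand_course_codes(prerequisites):
    """Split on " + " and restore the subject letters on bare numbers."""
    codes = prerequisites.split(" + ")
    for i, code in enumerate(codes):
        if i > 0 and code.isdigit():
            # Borrow the letters from the previous (already fixed) element.
            prefix = "".join(ch for ch in codes[i - 1] if ch.isalpha())
            codes[i] = prefix + code
    return codes
```

With the consistent input, expand_course_codes("INFS1200 + GEOM1100 + 1200 + 1300") gives ['INFS1200', 'GEOM1100', 'GEOM1200', 'GEOM1300']. With the inconsistent one there is no " + " to split on, so the whole string comes back as a single element.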

Improvements

So it was time to learn RegEx and rewrite the function in a more generic way. I needed to find any string that consisted of 4 uppercase letters followed by 4 digits, 2 uppercase letters followed by 4 digits, or 4 digits on their own, putting each result into the list.

After some searching and experimentation, I found that (simple) RegEx wasn’t as difficult as I thought and kind of fun. I used the following:

[A-Z]{4}[0-9]{4}|[A-Z]{2}[0-9]{4}|[0-9]{4}

This works well and means I don’t have to worry about how the course codes are separated. After adding the course codes to the list, I then iterate over it to fill in the missing subject letters (INFS or GEOM etc).
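Put together, the RegEx version could be sketched like this. The function and constant names are hypothetical, and the pattern is written without spaces around the | characters, since Python’s re module would otherwise treat them as literal spaces.

```python
import re

# 4 letters + 4 digits, 2 letters + 4 digits, or a bare 4-digit number.
COURSE_CODE = re.compile(r"[A-Z]{4}[0-9]{4}|[A-Z]{2}[0-9]{4}|[0-9]{4}")


def extract_course_codes(prerequisites):
    """Find every course code, whatever separates them, and fix bare numbers."""
    codes = COURSE_CODE.findall(prerequisites)
    prefix = ""
    for i, code in enumerate(codes):
        if code.isdigit():
            # Bare number: reuse the letters of the most recent full code.
            codes[i] = prefix + code
        else:
            # Full code: remember its letters (everything before the 4 digits).
            prefix = code[:-4]
    return codes
```

For “INFS1200 or GEOM1100 and 1200 & 1300” this returns ['INFS1200', 'GEOM1100', 'GEOM1200', 'GEOM1300'], the same result the split-based function gave for the consistently formatted string.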

Posted by Anthony

Database Correction Interlude

This is the third post in a series, documenting the progression of the project and the challenges I faced at each stage. I’m intending these posts to be almost reflective in nature, rather than very technical, though I will lightly justify some technical choices also. Once I feel that I can move the project out of a prototype stage, I’ll put up a more formal page documenting the technology and some overall reflections.

Because the state of the database kept bugging me, I went back to the original ER diagrams with my friend, and we came to a quite simple solution.


Posted by Anthony in Course Chooser

UQ Course Chooser Stage 2 – Database Planning and Evolution.

This is the second post in a series, documenting the progression of the project and the challenges I faced at each stage. I’m intending these posts to be almost reflective in nature, rather than very technical, though I will lightly justify some technical choices also. Once I feel that I can move the project out of a prototype stage, I’ll put up a more formal page documenting the technology and some overall reflections.

Following on from part one, I have a Python script to scrape UQ’s course and program websites and now need to design and build a database to store the scraped data.


Posted by Anthony in Course Chooser