Short Project: Chat Log Parser

I’m using Viber to communicate with someone, and we have many chats, so I looked into Viber’s chat backup capability. I found that Viber has two kinds of backup: one that you can restore, and one that you can email. It turns out that the emailable backup is actually in CSV, so I realised I could parse it easily with Python and use a templating module such as Jinja2 or Mako to format it into an easy-to-read HTML page.

Coding

Reading the CSVs was simple enough: the csv module built into the Python Standard Library reads them as an iterable of rows. Then, by reading over the log, I could easily see the structure, which has five columns:

Date, Time, Sender Name, Sender Phone Number, Message
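For illustration, here is a minimal sketch of reading rows with the standard csv module and giving the five columns names. The sample rows, their date and time formats, and the `Row` and `read_rows` names are hypothetical, not taken from a real Viber export. The sketch also assumes every record has exactly five columns, which is precisely the assumption the problems below break.

```python
import csv
import io
from typing import NamedTuple

# Hypothetical sample in the five-column layout; a real export's
# date and time formats may differ.
SAMPLE = """01/03/2017,09:15:02,Me,+61400000000,Morning!
01/03/2017,09:16:45,Sam,+61411111111,Hi there
"""


class Row(NamedTuple):
    date: str
    time: str
    sender_name: str
    sender_phone: str
    message: str


def read_rows(f):
    """Yield one Row per CSV record, assuming exactly five columns."""
    for fields in csv.reader(f):
        if not fields:
            continue  # csv.reader yields [] for a blank line
        yield Row(*fields)


rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row.sender_name, "->", row.message)
```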

I had two major problems parsing the data:

  1. Parsing the date and time.
  2. Handling messages that included a newline. Viber saved the newline character straight into the CSV without quoting the field, so the record was broken across two lines. This is almost certainly invalid CSV: RFC 4180 says a field containing a line break should be enclosed in double quotes.
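The second problem is easy to reproduce with the standard csv module. This sketch, using made-up sample data, parses the same two-line message once with the Message field quoted and once without:

```python
import csv
import io

# The same two-line message, once quoted (as RFC 4180 recommends) and once not.
quoted = '01/03/2017,09:15:02,Me,+61400000000,"Line one\nLine two"\n'
unquoted = '01/03/2017,09:15:02,Me,+61400000000,Line one\nLine two\n'

quoted_rows = list(csv.reader(io.StringIO(quoted)))
unquoted_rows = list(csv.reader(io.StringIO(unquoted)))

# Quoted: one record, and the newline stays inside the Message field.
print(len(quoted_rows), repr(quoted_rows[0][4]))
# Unquoted: the record splits, and the second "row" holds only the tail text.
print(len(unquoted_rows), unquoted_rows[1])
```

Only the quoted version survives as a single record, which is why an unquoted export needs a workaround.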

Date/time parsing

I wanted a way to parse the two separate strings that make up the date-time: one string for the date and the other for the time. I started to use datetime.combine() but later realised I could create the whole object in one hit; datetime.strptime() is more powerful than I thought.
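Both approaches can be sketched as follows. The date and time formats (`%d/%m/%Y` and `%H:%M:%S`) and the two function names are assumptions for illustration; check the formats against a real export before relying on them.

```python
from datetime import datetime

# Assumed formats; verify against an actual export.
DATE_FORMAT = "%d/%m/%Y"
TIME_FORMAT = "%H:%M:%S"


def parse_with_combine(date_str, time_str):
    """Parse each half separately, then join them with datetime.combine()."""
    d = datetime.strptime(date_str, DATE_FORMAT).date()
    t = datetime.strptime(time_str, TIME_FORMAT).time()
    return datetime.combine(d, t)


def parse_in_one_hit(date_str, time_str):
    """Join the two strings and let a single strptime() call parse both."""
    return datetime.strptime(f"{date_str} {time_str}", f"{DATE_FORMAT} {TIME_FORMAT}")


a = parse_with_combine("01/03/2017", "09:15:02")
b = parse_in_one_hit("01/03/2017", "09:15:02")
print(a == b, b.isoformat())  # True 2017-03-01T09:15:02
```

The one-hit version also gives a handy test for the next problem: if strptime() raises ValueError, the line probably isn't the start of a new message.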

 

Paragraphed Message Handling

Handling the newline was a fun challenge, but I’m disappointed it has to be this way. I took advantage of the fact that a message line should have a date string in the first column; if it doesn’t, it must be a continuation of the previous message. If I couldn’t parse the date string into a date object, I treated that line as a continuation and appended it to the most recently found message. (I also needed to handle actual commas in the message, which split the text into extra columns. I assumed that a comma in the text would be written as “, ”, so I rejoined the extra columns with “, ” when rebuilding the message in the internal data structure.) I know that this whole thing is kludgy, but I can’t do much else when the CSV data isn’t properly quoted. (I discovered this when updating the project for Facebook Messenger; I’ll talk about that in another post.)
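A minimal sketch of that heuristic follows, assuming a `%d/%m/%Y %H:%M:%S` timestamp and using plain dicts rather than the project’s classes; the `parse_log` name and the sample data are hypothetical. Passing `skipinitialspace=True` drops the space after each comma, so rejoining the extra columns with ", " restores the original text. As with the original assumption, a comma that was not followed by a space comes back with one.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp format; a real export may use another.
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_log(f):
    """Return a list of message dicts, folding broken lines back in.

    A row whose first columns don't parse as a date and time is treated
    as a continuation of the previous message. Columns beyond the fifth
    are assumed to come from commas inside the text and are rejoined.
    """
    messages = []
    for fields in csv.reader(f, skipinitialspace=True):
        if not fields:
            continue
        try:
            stamp = datetime.strptime(f"{fields[0]} {fields[1]}", STAMP_FORMAT)
        except (ValueError, IndexError):
            # Not a new message: append the whole line to the last one.
            if messages:
                messages[-1]["text"] += "\n" + ", ".join(fields)
            continue
        messages.append({
            "timestamp": stamp,
            "sender": fields[2],
            "phone": fields[3],
            "text": ", ".join(fields[4:]),
        })
    return messages


SAMPLE = (
    "01/03/2017,09:15:02,Me,+61400000000,Hello, how are you?\n"
    "01/03/2017,09:16:45,Sam,+61411111111,Good thanks\n"
    "Second paragraph here\n"
)

messages = parse_log(io.StringIO(SAMPLE))
for m in messages:
    print(m["timestamp"], m["sender"], repr(m["text"]))
```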

Structure

I was introduced to object-oriented programming by using classes as data structures, and I think that approach makes sense for this kind of application. It gives me the flexibility to let a Message object and a Chatlog object do things to themselves, over and above just holding properties.

[Screenshot of my data structure]

Currently, though, I still have a mix of “private” variables (in quotes because Python doesn’t enforce private variables; a leading underscore is only a convention) with getter methods, and Python properties, which I started using once I discovered them. Most attributes are set in the constructor; only the contents, the is_user flag and the timestamp are properties, in case they need to be changed later.
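The hypothetical `Message` class below shows the two styles side by side: an underscore-prefixed attribute read through a getter method, and properties with setters for `contents`, `is_user` and `timestamp`. The attribute and method names, including the `append` helper that lets a message extend itself, are my assumptions, not the project’s actual code.

```python
from datetime import datetime


class Message:
    """One chat message; attribute names are illustrative."""

    def __init__(self, timestamp, sender_name, sender_phone, contents, is_user=False):
        # "Private" by convention only: Python won't stop outside access.
        self._sender_name = sender_name
        self._sender_phone = sender_phone
        self._timestamp = timestamp
        self._contents = contents
        self._is_user = is_user

    # Older style: a plain getter method.
    def get_sender_name(self):
        return self._sender_name

    # Newer style: properties, with setters so values can change later.
    @property
    def contents(self):
        return self._contents

    @contents.setter
    def contents(self, value):
        self._contents = value

    @property
    def is_user(self):
        return self._is_user

    @is_user.setter
    def is_user(self, value):
        self._is_user = bool(value)

    @property
    def timestamp(self):
        return self._timestamp

    @timestamp.setter
    def timestamp(self, value):
        self._timestamp = value

    def append(self, text):
        """Let the message extend itself, e.g. with a continuation line."""
        self._contents += "\n" + text


msg = Message(datetime(2017, 3, 1, 9, 15, 2), "Me", "+61400000000", "Hello", is_user=True)
msg.append("Second line")
print(msg.get_sender_name(), repr(msg.contents))
```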

Templating

The basic Viber template I created is as follows:

 

This is pretty simple, but not very pretty. I pass the whole Chatlog instance to Jinja2 and then use it directly within the template. In this case, I iterate over each message, displaying the date and time as one <p> element and the message itself as another. If the sender name is “Me”, I give the element a class that floats it right.
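As a hypothetical sketch of a template with that layout (not the original), the following renders a chat log with Jinja2’s `Environment.from_string()`. `SimpleNamespace` objects stand in for the real Chatlog and Message classes, and the attribute names (`messages`, `timestamp`, `sender_name`, `contents`) and the `me` class are assumptions. It needs the third-party `jinja2` package.

```python
from datetime import datetime
from types import SimpleNamespace

from jinja2 import Environment

# Hypothetical template: date/time in one <p>, the message in another,
# and a "me" class that floats the user's own messages to the right.
TEMPLATE = """<html>
<head><style>.me { float: right; clear: both; }</style></head>
<body>
{% for message in chatlog.messages %}
  <div{% if message.sender_name == "Me" %} class="me"{% endif %}>
    <p>{{ message.timestamp.strftime("%d/%m/%Y %H:%M") }}</p>
    <p>{{ message.contents }}</p>
  </div>
{% endfor %}
</body>
</html>"""

# Stand-ins for the real Chatlog and Message objects.
chatlog = SimpleNamespace(messages=[
    SimpleNamespace(timestamp=datetime(2017, 3, 1, 9, 15), sender_name="Me", contents="Hello"),
    SimpleNamespace(timestamp=datetime(2017, 3, 1, 9, 16), sender_name="Sam", contents="Hi <3"),
])

# autoescape=True turns characters such as "<" in messages into HTML entities.
env = Environment(autoescape=True)
html = env.from_string(TEMPLATE).render(chatlog=chatlog)
print(html)
```

Turning on autoescaping matters here because chat messages are arbitrary user text; without it, a message containing `<` could break the page.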

[Screenshot of the chat log in HTML]

Conclusion

I’ve done a lot more than this, but I think this is enough for an introduction and a basic history. The latest version also supports Facebook chat and bundles messages from the same contact together if they were sent within the same minute.
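One way to do that kind of bundling is `itertools.groupby` keyed on the sender and the timestamp truncated to the minute. This is a sketch, not the project’s code: it assumes messages are time-ordered `(timestamp, sender, text)` tuples, and because `groupby` only merges adjacent items, it bundles consecutive runs, so a message from someone else in between starts a new bundle. The `bundle` name and sample data are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def bundle(messages):
    """Group consecutive messages from the same sender in the same minute.

    `messages` is assumed to be a time-ordered list of
    (timestamp, sender, text) tuples.
    """
    def key(m):
        timestamp, sender, _ = m
        # Truncate to the minute so 09:15:02 and 09:15:48 share a key.
        return sender, timestamp.replace(second=0, microsecond=0)

    bundles = []
    for (sender, minute), group in groupby(messages, key=key):
        bundles.append((minute, sender, [text for _, _, text in group]))
    return bundles


msgs = [
    (datetime(2017, 3, 1, 9, 15, 2), "Me", "Hi"),
    (datetime(2017, 3, 1, 9, 15, 48), "Me", "Are you there?"),
    (datetime(2017, 3, 1, 9, 16, 5), "Me", "Hello?"),
    (datetime(2017, 3, 1, 9, 16, 30), "Sam", "Yes!"),
]

for minute, sender, texts in bundle(msgs):
    print(minute.strftime("%H:%M"), sender, texts)
```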

It can be found on GitHub at Chat Log Viewer. Feel free to fork it and send me a PR; there are issues I need to fix. In a future post, I’ll talk about the Facebook code additions and what I learned from them.

Posted by Anthony