I’m using Viber to communicate with someone, and we have many chats, so I looked into Viber’s chat backup capability. I found that Viber has two backups: one that you can restore, and one that you can email. It turns out that the email-able backup is actually a CSV file. So I realised I could parse it very easily with Python and, using a templating module such as Jinja2 or Mako, format it into an easy-to-read HTML page.
Coding
Reading the CSVs was simple enough: there’s a module built into the Python Standard Library (csv) that reads them as an iterable. Then, by reading over the log, I could see the structure easily:
Date | Time | Sender Name | Sender Phone Number | Message |
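As a rough sketch of what reading that layout looks like with the standard csv module (the sample rows, phone numbers and the read_rows helper are made up for illustration, not taken from a real export):

```python
import csv
import io

# Hypothetical sample in the column layout shown above.
SAMPLE = (
    "01/02/2016,18:30:05,Me,+61400000000,Hello there\n"
    "01/02/2016,18:31:10,Alex,+61400000001,Hi!\n"
)


def read_rows(handle):
    """Yield (date, time, sender_name, sender_phone, message) tuples."""
    for row in csv.reader(handle, delimiter=","):
        if len(row) < 5:
            continue  # this simple sketch ignores blank or short lines
        date, time, name, phone, message = row[:5]
        yield date, time, name, phone, message


rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

With a real backup you would pass an open file handle instead of the io.StringIO wrapper.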
I had two major problems parsing the data:
- Parsing the date and time.
- Handling messages that included a new line. A message that contained a new line was saved with the raw newline character in the CSV, so the record broke across several lines of the file. Because the message field isn’t quoted, a CSV parser reads each fragment as a separate row.
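The second problem is easy to reproduce. This is a small sketch with made-up sample data, comparing an unquoted multi-line message (the shape I saw in the export) with the same message properly quoted:

```python
import csv
import io

# Hypothetical records: the same two-line message, unquoted and quoted.
unquoted = "01/02/2016,18:30:05,Me,+61400000000,first line\nsecond line\n"
quoted = '01/02/2016,18:30:05,Me,+61400000000,"first line\nsecond line"\n'

unquoted_rows = list(csv.reader(io.StringIO(unquoted)))
quoted_rows = list(csv.reader(io.StringIO(quoted)))

# Without quotes the reader sees two rows; the second holds only the
# tail of the message and has no date, time or sender columns.
print(len(unquoted_rows), unquoted_rows[1])

# With quotes the embedded newline stays inside the fifth field.
print(len(quoted_rows), repr(quoted_rows[0][4]))
```

So the parser itself copes with embedded newlines when the field is quoted; the breakage comes from the missing quotes in the file.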
Date/time parsing
I wanted a way to parse the two separate strings that make up the date-time: one string for the date and the other for the time. I started to use datetime.combine() but later realised I could create the whole object in one hit. datetime.strptime() is more powerful than I thought.
    from datetime import datetime


    def datetime_parser(this_date: str, this_time: str, date_mask="dd/mm/yyyy") -> datetime:
        """
        Parses input date and time strings and outputs a datetime object

        :param this_date: date string in dd/mm/yyyy form
        :param this_time: time string in HH:MM:SS form
        :param date_mask: currently unused; the format is fixed below
        :return: the combined datetime
        """
        # Parse both strings in a single strptime() call by concatenating
        # the values and their format masks.
        dt = datetime.strptime(this_date + this_time, "%d/%m/%Y" + "%H:%M:%S")
        return dt
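For comparison, here is a minimal sketch of the two approaches side by side, using a made-up date and time and assuming the dd/mm/yyyy and HH:MM:SS formats. It also shows the ValueError that strptime() raises on text that is not a date, which matters for the next section:

```python
from datetime import datetime

date_text, time_text = "01/02/2016", "18:30:05"  # hypothetical values

# Two steps: parse each part separately, then combine them.
d = datetime.strptime(date_text, "%d/%m/%Y").date()
t = datetime.strptime(time_text, "%H:%M:%S").time()
two_step = datetime.combine(d, t)

# One step: join the strings and the format masks.
one_step = datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %H:%M:%S")

print(one_step == two_step)

# Text that does not match the mask raises ValueError.
try:
    datetime.strptime("second line", "%d/%m/%Y")
    looks_like_date = True
except ValueError:
    looks_like_date = False
```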
Paragraphed Message Handling
Handling the new line was a fun challenge, but I’m disappointed it has to be this way. I took advantage of the fact that a message line should have a date string in the first column; if it doesn’t, the line must be a continuation of the previous message. If I couldn’t parse the date string into a date object, I’d treat that line as a continuation and append it to the most recently found message. (I also needed to handle actual commas in the message, which the CSV reader splits into separate fields. I assumed that a comma in the text would have been written as “, ”, so I recreated it that way in the internal data structure.) I know that this whole thing is kludgy, but I can’t do much else when the CSV data isn’t properly quoted. (As I discovered when updating this project for Facebook Messenger. I’ll talk about that in another post though.)
    import codecs
    import csv


    def viber(filename, viber_chats):
        # utf-8-sig strips the byte order mark at the start of the file
        with codecs.open(filename, "r", encoding='utf-8-sig') as chatfile:
            chat = csv.reader(chatfile, delimiter=",")
            for line in chat:
                if len(line) > 0:
                    try:
                        timestamp = datetime_parser(line[0], line[1])
                        # the reader split the message on its commas,
                        # so rejoin the fragments with ", "
                        content = ", ".join(line[4:])
                        m = Message(line[2], line[3], timestamp, content)
                        if m.get_sender_name() == 'Me':
                            m.is_user = True
                        viber_chats.add_message(m)
                    except (ValueError, IndexError):
                        # this must be a continuation of the previous message
                        rest_content = "\n" + ", ".join(line)
                        viber_chats.get_most_recently_found_msg().contents += rest_content
                else:
                    # an empty line: this must be a continuation of the
                    # previous message and a para space
                    viber_chats.get_most_recently_found_msg().contents += "\n"
Structure
I was introduced to object oriented programming by using classes as data structures, and I think for this kind of application it makes sense. This gives me the flexibility to allow a Message object and a Chatlog object to do things to themselves, over-and-above just using properties.
Currently, though, I still have a mix of “private” variables with getter methods (in quotes because Python doesn’t enforce private variables) and Python properties, which I started to use once I discovered them. Most of the class is set in the constructor, with only the contents, is_user flag and timestamp exposed as properties in case they need to be changed later.
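To give an idea of the shape, here is a cut-down sketch of the two classes. The method names match the ones used in the parsing code, but the bodies, the integer dictionary keys and the _last_key attribute are assumptions for illustration rather than the project's actual code:

```python
from datetime import datetime


class Message:
    def __init__(self, sender_name, sender_number, timestamp, contents):
        # "Private" by convention only: set once in the constructor.
        self._sender_name = sender_name
        self._sender_number = sender_number
        self._contents = contents
        self.timestamp = timestamp
        self.is_user = False

    def get_sender_name(self):
        # Older getter-method style.
        return self._sender_name

    @property
    def contents(self):
        # Newer property style: reads like an attribute.
        return self._contents

    @contents.setter
    def contents(self, value):
        self._contents = value


class Chatlog:
    def __init__(self):
        self._messages = {}    # assumed: keyed by insertion order
        self._last_key = None  # hypothetical bookkeeping attribute

    def add_message(self, message):
        key = len(self._messages)
        self._messages[key] = message
        self._last_key = key

    def get_most_recently_found_msg(self):
        return self._messages[self._last_key]


log = Chatlog()
log.add_message(Message("Me", "+61400000000", datetime(2016, 2, 1, 18, 30, 5), "first line"))
# A continuation line is appended through the property setter.
log.get_most_recently_found_msg().contents += "\nsecond line"
```

Because contents is a property, the += in the last line works exactly as it would on a plain attribute, which is what the continuation handling relies on.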
Templating
The basic Viber template I created is as follows:
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Chat Logs</title>
        <style type="text/css">
            body {
                font-family: 'Arial', 'Helvetica', sans-serif;
            }
            .message {
                /*width: 40%;*/
                margin: 0;
                border: black thin solid;
                border-radius: 5pt;
                clear: both;
                padding: 5pt;
                float: left;
                max-width: 40%;
            }
            .me {
                float: right;
                text-align: right;
                clear: both;
            }
            .message p {
                margin: 0;
            }
        </style>
    </head>
    <body>
        <h1>Number of messages: {{chat._messages|length}}</h1>
        {% for key, msg in chat._messages.items() %}
        <div class="message {% if msg.get_sender_name() == 'Me' %}me{% endif %}">
            <p>{{msg.timestamp.date()}} {{msg.timestamp.time()}}</p>
            <p>{{msg.contents}}</p>
        </div>
        {% endfor %}
    </body>
    </html>
This is pretty simple, but not very pretty. I pass the whole chatlog object instance to Jinja2 and then use it directly within the template. In this case, I iterate over each message, displaying the date and time as a <p> element and the message itself as another paragraph element. If the message sender name is “Me”, I give it a class which floats it right.
Conclusion
I’ve actually done a lot more than this, but I think this is enough for an introduction and basic history. The latest version actually supports Facebook chat and bundles messages together from the same contact if they are in the same minute.
It can be found on GitHub at Chat Log Viewer. Feel free to fork it and send me a PR; there are issues I need to fix. In a future post, I’ll talk about the Facebook code additions and what I learned from them.