COMP7506 – Algorithms and Data Structures

Algorithms and Data Structures was one of my last two subjects in my Graduate Diploma in Information Technology. It served as a broad look at the design and implementation of data structures and computer algorithms for the processing of large datasets. 

The course covered many topics both common in general purpose software development and more common only in high-performance computing (or choosing an existing implementation)

  • Abstract vs Concrete Data types in Java (interfaces and the impleting class)
  • Arrays vs Linked Lists – when to use one or the other and linked list implementation
  • Time and Space (memory/storage) Asymptotic Analysis:
    • Big O Notation + Big Theta Notation / Big Omega Notation
    • Describing an algorithm in psuedo-code for easier discussion.
  • Stacks (first in, last out) and Queues (first in, first out). Usage and implementation.
  • List ADT (like Java’s ArrayList)
  • Trees 
    • traversal orders (pre-, post-, in-order)
    • Binary Trees
    • Arithemetic Expression Trees
    • Implementation (linked or array-based)
  • Heaps and Priority Queue implementation
  • Maps, HashTables, Sets usage and implementation
    • Hashing, Compression of keys to a fixed size table
    • Linear probing, double hashing, separate chaining
  • Text Processing and Searching.
    • Boyer-Moore Heuristics
    • Knuth-Morris-Prat Algorithm
    • Tries (the tree-based data structure for storing text occurrences)
    • Huffman Encoding and pre-fix coding.
  • Graphs and graph traversal.
  • Sorting and Selection
    • Merge Sort, Bubble Sort, Insertion Sort, Quick Sort
    • Quick Select, Partition

Assignment 1

The first assignment’s backstory was a simulation of airspace in Australia. The actual task was to implement a Queue and a Cube data structure together with appropriate unit tests attaining 80% code coverage for the most marks in this area.

        /**
	 * Removes an element from the queue
	 * Time: O(n)
	 * @param element
	 * @return
	 */
	boolean remove(T element) {
		for (int i = 0; i < queue.length; i++) {
			if (queue[i] == element) {
				queue[i] = null;
				if (frontOfQueue == i) {
					frontOfQueue++;
				}
				size--;
				return true;
			}
		}
		return false;
	}

Assignment 2

The second assignment was about text processing. We were given a large volume of text and needed to implement various search functions, some were:

  • Word occurence count
  • Phrase search
  • Prefix search
  • word  search

Trie Class (for storing word occurrences)

package comp3506.assn2.application;

import comp3506.assn2.utils.Pair;

public class Trie {

	TrieNode root = new TrieNode();

	public Trie() {

	}
	/**
	 * adds a new word to the Trie. If word is already in Trie, only add its position
	 * Time Complexity: O(n) (including a worst case constant for creating new objects 
	 * - where n is the letters in the word. 
	 * Worst case where no prefix of that word has been added so need to add all new objects.
	 * already. Because it loops over each letter in the word. Doesn't care about alphabet
	 * size like in lecture notes because this
	 * implementation uses a fixed size array to store the child nodes and indexes into 
	 * them directly. 
	 * @param word the word to add
	 * @param lineNumber the line number where it was found
	 * @param characterPosition the column number where it was found
	 * 
	 */
	public void addWord(String word, int lineNumber, int characterPosition) {
		TrieNode checkOutNode = root;

		for (int i = 0; i < word.length(); i++) {
			int index = word.toLowerCase().charAt(i) - 'a';

			if (checkOutNode.children[index] != null) {
				checkOutNode = checkOutNode.children[index];
			}
			else {
				checkOutNode.children[index] = new TrieNode(word.toLowerCase().charAt(i));
				checkOutNode.hasChildren = true;
				checkOutNode = checkOutNode.children[index];
			}
		}

		Pair<Integer, Integer> occurence = new Pair<Integer, Integer>(lineNumber, characterPosition);
		checkOutNode.isEnd();

		if (checkOutNode.occurences == null) {
			checkOutNode.occurences = new LinkedList<>();
		}
		checkOutNode.occurences.add(occurence);
	}

	/**
	 * Returns the TrieNode whose path forms the word given
	 * Time Complexity: O(n) where n is number of letters. 
	 * 
	 * @param word word to search for
	 * @return the end node
	 */
	TrieNode findNode(String word) {
		TrieNode result;
		word = comp3506.assn2.utils.Misc.trimNonLetters(word);
		TrieNode checkOutNode = root;
		for (int i = 0; i < word.length(); i++) {
			int index = word.toLowerCase().charAt(i) - 'a';
			if (checkOutNode.children[index] != null) {
				checkOutNode = checkOutNode.children[index];
			} else {
				// Not found
				return null;
			}
		}
		result = checkOutNode;
		return result;
	}
	/**
	 * Recursively finds children's occurrences of nodes that are also words.
	 * Time: O(n*M) where n is the length of suffixes and M is the number of children of this Node.
	 * @param node Node: starting node
	 * @param resultSet 
	 * @return
	 */
	TrieNode traverseChildren(TrieNode node, LinkedList<Pair<Integer, Integer>> resultSet) {
		if (!node.hasChildren) {
			resultSet.append(node.occurences);
		}
		else {
			for (TrieNode child : node.children) {
				if (child != null) {
					traverseChildren(child, resultSet);
				}
			}
		}
		return null;
	}
	/**
	 * Finds words and the words that the word given is a prefix of.
	 * O(n + (N *M)) basically time complexity of findNode() + traverseChildren() 
	 * because it just uses those two.
	 * @param word
	 * @return
	 */
	public LinkedList<Pair<Integer, Integer>> findPrefixes(String word) {
		LinkedList<Pair<Integer, Integer>> result = new LinkedList<>();
		TrieNode prefix = findNode(word);
		if (prefix == null) {
			return new LinkedList<>();
		}
		result.append(prefix.occurences);
		traverseChildren(prefix, result);
		return result;
	}
	
	/**
	 * Searches for a word given and returns the occurrences. 
	 * Time Complexity: O(n + n) where n is number of letters plus trimNonLetters (regex blackbox). 
	 * @param word
	 * @return
	 */
	public LinkedList<Pair<Integer, Integer>> search(String word) {
		LinkedList<Pair<Integer, Integer>> result = null;
		word = comp3506.assn2.utils.Misc.trimNonLetters(word);
		TrieNode checkOutNode = findNode(word);
		result = checkOutNode.occurences;
		return result;
	}
}

//Reference
// basic Trie structure and search and add from Geeks For Geeks. Modified for Java, this assignment, and to make more sense to me. 
//V. Rao Sanaka, "Trie | (Insert and Search)", Geeks For Geeks. [Online]. Available: https://www.geeksforgeeks.org/trie-insert-and-search/. [Accessed: 15- Sep- 2018].

Trie Node

package comp3506.assn2.application;

import comp3506.assn2.utils.Pair;

/**
 * Space Complexity: O(2 bits + 1 char + array of Node pointers)
 * @author anthony
 *
 */
public class TrieNode {
	TrieNode[] children = new TrieNode[26];
	LinkedList<Pair<Integer, Integer>> occurences;
	String value;
	char character;
	boolean hasChildren;
	
	boolean isEnd;
	
	/**
	 * Constructor
	 * @param c character it represents to help debugging.
	 */
	public TrieNode(char c) {
		isEnd = false;
		character = c;
		hasChildren = false;
	}
	
	/**
	 * Constructor
	 */
	public TrieNode() {
		hasChildren = false;
		isEnd = false;
	}
	
	void isEnd() {
		isEnd = true;
	}
	
}

As well as the actual search method implementation we had to write a readme justifying our decisions and choices in algorithms; and the post-graduate students (which I was) needed to write a research essay on optimising the loading time of the text and data structures by saving it to disk.

report_redacted

Overall thoughts

During the assignments, particularly the second one, I discovered that I work well by breaking up a large task into smaller parts that I can complete. I found if I break it up before I start much actual development and write the sub-tasks down, I can more easily focus on each task.

This course was rather hard, though probably not the hardest course I’ve done during this degree. (The hardest course I did was CSSE7023 – Advanced Software Engineering.) To be honest (and I think it’s good to be) I actually wanted to drop this course because I didn’t see the relevance, nor did I enjoy it in the slightest. For several reasons, I ended up staying in the course and I also ended up passing it! I still don’t actually see the relevance of all the different types of tree data structures, but I am glad I did the course and learnt the concepts. Also, doing the assignments with their abstract data types, concrete data types, and search functionality implementation did give me more practical experience in object-oriented programming which helped in my work as a junior software developer at Canopy Tools.

I would however recommend the book A Common-Sense Guide to Data Structures and Algorithms by Jay Wengrow. It is quite accessible and frankly explained the concepts better and with less words than the prescribed textbook.

Taken from https://pragprog.com/book/jwdsal/a-common-sense-guide-to-data-structures-and-algorithms. Retrieved on 3rd January 2019