
Report for assignment 3

Group 17

Project

Name: Java : Algorithms and Data Structure

URL: https://github.com/phishman3579/java-algorithms-implementation

A collection of algorithms and data structures implemented by Justin Wetherell.

Onboarding experience

Initially, we had trouble even building the project. This was due to the complete lack of build instructions. It is not documented anywhere how you can run the tests, or even what build tool to use. In the end, we found a configuration file for Travis CI, which allowed us to deduce that the project uses the Apache Ant build system. After installing this tool, we were finally able to build and run the project without further issues. We submitted a Pull Request with updated build instructions in the README.

Complexity

1. What are your results for ten complex functions?

| Function | nLOC | lizard CCN | Manual CCN |
| --- | --- | --- | --- |
| BinaryHeapArray.heapDown | 62 | 41 | 38 |
| IntervalTree.Interval.query | 44 | 27 | 27 |
| BinarySearchTree.replaceNodeWithNode | 45 | 22 | 22 |
| BinarySearchTree.getDFS | 60 | 23 | 23 |
| BTree.validateNode | 59 | 22 | 6 |
| Multiplication.multiplyUsingFFT | 68 | 21 | 18 |
| RedBlackTree.balanceAfterDelete | 72 | 22 | 21 |
| RedBlackTree.balanceAfterInsert | 46 | 18 | 15 |
| SegmentTree.FlatSegmentTree.NonOverlappingSegment.query | 62 | 28 | 18 |
| SegmentTree.DynamicSegmentTree.OverlappingSegment.createFromList | 44 | 17 | 17 |

lizard seems to count ternary operators (condition ? if_true : if_false) as having cyclomatic complexity 1. I did the same in the manual results above. The difference between lizard's and my manual results is a bit harder to explain. In my manual version I subtracted 1 from the CCN for each exit point (return) in the code, using the following formula (source):

CCN = d - e + 2

where d is the number of decision points (ifs, fors, whiles, &&, ||, etc.) and e the number of exit points (i.e. returns). If we ignore the term e and pretend that there is a single exit point (i.e. e = 1), we end up with the same answer as lizard.
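As a worked example of the two conventions (a hypothetical method, not taken from the project), the function below has d = 2 decision points and e = 3 exit points, so the manual formula gives CCN = 2 - 3 + 2 = 1, while lizard's convention (e fixed at 1) gives 2 - 1 + 2 = 3:

```java
// Hypothetical method illustrating the two CCN counting conventions.
public class CcnExample {
    // d = 2 decision points (the two ifs), e = 3 exit points (the three returns)
    // manual: CCN = d - e + 2 = 2 - 3 + 2 = 1
    // lizard: treats e as 1, so CCN = d - 1 + 2 = 3
    public static int sign(int x) {
        if (x > 0) return 1;   // decision 1, exit 1
        if (x < 0) return -1;  // decision 2, exit 2
        return 0;              // exit 3
    }

    public static void main(String[] args) {
        System.out.println(sign(42));  // prints 1
        System.out.println(sign(-7));  // prints -1
        System.out.println(sign(0));   // prints 0
    }
}
```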

2. Are the functions just complex, or also long?

The complexity of the functions seems to be correlated with their length.

3. What is the purpose of the functions?

Much of the functions' complexity stems from handling multiple different cases in the same function. For example, some functions allow their behaviour to be configured (the order in which to visit nodes in DFS, and whether elements should be ordered largest-to-smallest or smallest-to-largest).

Some unnecessary complexity arose from poor case management, i.e. the same condition being checked multiple times in nested if/else branches.

4. Are exceptions taken into account in the given measurements?

We did not encounter code that dealt with exceptions (throwing or catching), so we could not determine whether lizard takes them into account. Since the opportunity did not present itself, we did not have to consider the impact of exceptions.

5. Is the documentation clear w.r.t. all the possible outcomes?

Functions were often not properly documented, especially regarding how inputs map to outputs in the different cases. When functions had comments, they were often either very shallow or simply restated what could be trivially inferred from the function name.

Refactoring

Below you'll find a number of functions we refactored in order to reduce their cyclomatic complexity.

@psalqvist (Philip Salqvist): BinarySearchTree.replaceNodeWithNode

The refactoring mainly reduces duplication of if statements and && operations. To provide an example:

// Remove link from replacementNode's parent to replacement
Node<T> replacementParent = replacementNode.parent;
if (replacementParent != null && replacementParent != nodeToRemoved) {
    Node<T> replacementParentLesser = replacementParent.lesser;
    Node<T> replacementParentGreater = replacementParent.greater;
    if (replacementParentLesser != null && replacementParentLesser == replacementNode) {
        replacementParent.lesser = replacementNodeGreater;
        if (replacementNodeGreater != null)
            replacementNodeGreater.parent = replacementParent;
    } else if (replacementParentGreater != null && replacementParentGreater == replacementNode) {
        replacementParent.greater = replacementNodeLesser;
        if (replacementNodeLesser != null)
            replacementNodeLesser.parent = replacementParent;
    }
}

is reduced to:

public void removeParent(Node<T> removed, Node<T> replacement, Node<T> replacementLesser, Node<T> replacementGreater) {
    if (replacement.parent == null || replacement.parent == removed) return;

    if (replacement.parent.lesser == replacement) {
        replacement.parent.lesser = replacementGreater;
        if (replacementGreater != null) replacementGreater.parent = replacement.parent;
        return;
    }
    replacement.parent.greater = replacementLesser;
    if (replacementLesser != null) replacementLesser.parent = replacement.parent;
}

replaceNodeWithNode originally had a CCN of 30. Now the function is broken down into 4 functions with CCNs of 2, 5, 6 and 6.

For the full changes, see:

git show 901117ab917522fbe1bce573726f2a26c5446634

@nolanderc (Christofer Nolander): BinaryHeapArray.heapDown

The main goal of refactoring should be to reduce the amount of duplication in the if statements. For example, there is one that looks like this:

if ((type == Type.MIN && left != null && right != null && value.compareTo(left) > 0 && value.compareTo(right) > 0)
 || (type == Type.MAX && left != null && right != null && value.compareTo(left) < 0 && value.compareTo(right) < 0)) {
    ...
}

Here the checks for left != null and right != null are each duplicated, as are the comparisons against value, only with different operators (< and >). This pattern is repeated an additional 2-4 times, depending on how you count.

We can reduce the cyclomatic complexity by restructuring the code so that each check for null happens only once, and the comparison to perform against the value is determined by the type only a single time. Applying these changes results in something like the following:

// determine the order we want nodes in once
int desiredOrder = (type == Type.MIN) ? -1 : 1;
int undesiredOrder = -desiredOrder;

// Do checks against null and perform comparisons against the parent value.
// If their order does not match the desired, we need to swap them.
boolean leftShouldSwap = left != null && Integer.signum(value.compareTo(left)) == undesiredOrder;
boolean rightShouldSwap = right != null && Integer.signum(value.compareTo(right)) == undesiredOrder;

// handle different cases depending on which child needs to swap with the parent
if (leftShouldSwap && rightShouldSwap) {
    // if both need to swap, make sure that swapping preserves the desired order
    return (Integer.signum(left.compareTo(right)) == desiredOrder) ? leftIndex : rightIndex;
} else if (leftShouldSwap) {
    return leftIndex;
} else if (rightShouldSwap) {
    return rightIndex;
} else {
    return -1;
}

This is essentially everything the old 46 nLOC function did, but without the multiple levels of nested if statements and convoluted logic. In the end, the new version uses two functions with a cyclomatic complexity of 2 and 10, respectively. Compare this to the old version, which had a cyclomatic complexity of 41.
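A self-contained sketch of this child-selection logic, simplified to an Integer[] array heap (the method name childToSwap and the index arithmetic are ours, not the project's):

```java
// Simplified sketch of the refactored child selection, on an Integer[] heap.
public class HeapDownSketch {
    enum Type { MIN, MAX }

    // Returns the index of the child that should be swapped with the parent
    // at parentIndex, or -1 if the heap property already holds there.
    static int childToSwap(Integer[] heap, int size, int parentIndex, Type type) {
        int leftIndex = 2 * parentIndex + 1;
        int rightIndex = 2 * parentIndex + 2;
        Integer value = heap[parentIndex];
        Integer left = leftIndex < size ? heap[leftIndex] : null;
        Integer right = rightIndex < size ? heap[rightIndex] : null;

        // determine the order we want nodes in once
        int desiredOrder = (type == Type.MIN) ? -1 : 1;
        int undesiredOrder = -desiredOrder;

        // null checks and comparisons against the parent value happen only once
        boolean leftShouldSwap = left != null && Integer.signum(value.compareTo(left)) == undesiredOrder;
        boolean rightShouldSwap = right != null && Integer.signum(value.compareTo(right)) == undesiredOrder;

        if (leftShouldSwap && rightShouldSwap) {
            // if both need to swap, pick the child that preserves the desired order
            return (Integer.signum(left.compareTo(right)) == desiredOrder) ? leftIndex : rightIndex;
        } else if (leftShouldSwap) {
            return leftIndex;
        } else if (rightShouldSwap) {
            return rightIndex;
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] heap = {5, 3, 8}; // min-heap property violated: 3 < 5
        System.out.println(childToSwap(heap, 3, 0, Type.MIN)); // prints 1 (left child)
    }
}
```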

For a full diff, run the following:

git show 9b4599289de7c734e4cd7364ce8535fc7d32be90

@nolanderc (Christofer Nolander): IntervalTree.query

Here we use the same techniques as before: we remove duplicated code by recognizing that the only differences between two branches are a few variables. This means we can move everything but assignment of these variables outside the ifs, reducing the amount of duplicated code. For example, this was originally in the code:

if (index < center) {
    if (left != null) {
        IntervalData<O> temp = left.query(index);
        if (results == null && temp != null)
            results = temp;
        else if (results != null && temp != null)
            results.combined(temp);
    }
} else if (index >= center) {
    if (right != null) {
        IntervalData<O> temp = right.query(index);
        if (results == null && temp != null)
            results = temp;
        else if (results != null && temp != null)
            results.combined(temp);
    }
}

Notice that the only difference between these branches is left and right. Thus, we can replace the above with:

IntervalData<O> next = index < center ? left : right;
if (next != null) {
    IntervalData<O> temp = next.query(index);
    if (temp != null) results = appendToInterval(results, temp);
}

where appendToInterval is a helper performing the same task as the if statements above.
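A minimal sketch of such a helper, using a stand-in class since the project's IntervalData is generic (the class Data below is ours, just enough to make the example run):

```java
import java.util.ArrayList;
import java.util.List;

public class AppendToIntervalSketch {
    // Replaces the two if/else-if branches: either adopt temp as the first
    // result, or merge it into the accumulated results.
    static Data appendToInterval(Data results, Data temp) {
        if (results == null) return temp;
        results.combined(temp);
        return results;
    }

    public static void main(String[] args) {
        Data r = appendToInterval(null, new Data("a"));
        r = appendToInterval(r, new Data("b"));
        System.out.println(r.values); // prints [a, b]
    }
}

// Stand-in for the project's IntervalData.
class Data {
    final List<String> values = new ArrayList<>();
    Data(String v) { values.add(v); }
    void combined(Data other) { values.addAll(other.values); }
}
```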

We can then reuse these principles again to further reduce the complexity of the function down from 27 to one function with CCN 7 and another with 2.

git show 8246a9edbbf6aafb92b3c7e8cf7eb1b33ffb7ab5

@ekorre1001 (Jiayi Guo): Multiplication.multiplyUsingFFT

To improve the cyclomatic complexity we want to either remove or reduce the use of if, for, while, else if, && and ||. In multiplyUsingFFT we find many if statements used to inspect both input numbers, since they are strings. For instance, the following is used to check whether the product is negative:

if ((a.charAt(0) == '-' && b.charAt(0) != '-') || (a.charAt(0) != '-' && b.charAt(0) == '-')) {
      negative = true;
}

This can be improved by converting both strings to integers and checking whether the product is negative:

int x = Integer.parseInt(a);
int y = Integer.parseInt(b);
if(x*y < 0) negative = true;

By doing this we have reduced the cyclomatic complexity, but the drawback is that we introduce new variables and depend on another library. Note also that Integer.parseInt only works while the operands fit in an int, so this shortcut is not safe for the very large inputs FFT multiplication is meant for.
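For operands that do fit in an int, the sign check can be exercised as below (a sketch of the approach above, with our method name; the product is widened to long so the check itself cannot overflow):

```java
public class SignCheck {
    // Determines whether the product a*b is negative by parsing the operands.
    // Only valid while both operands fit in an int.
    static boolean productIsNegative(String a, String b) {
        int x = Integer.parseInt(a);
        int y = Integer.parseInt(b);
        return (long) x * y < 0; // widen to long so the product cannot overflow
    }

    public static void main(String[] args) {
        System.out.println(productIsNegative("-12", "34"));  // prints true
        System.out.println(productIsNegative("-12", "-34")); // prints false
    }
}
```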

Lastly, we can remove some commonly repeated operations by promoting them to a function. For instance, since both strings have the minus symbol stripped (the following code), we can reduce the complexity by adding a helper function that is called for each string. The drawback is, of course, the additional functions.

if (a.charAt(0) == '-') {
    a = a.substring(1);
}
if (b.charAt(0) == '-') {
    b = b.substring(1);
}
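A possible shape for that helper (the name stripMinus is ours, not the project's):

```java
public class StripSign {
    // Removes a leading minus sign, if present; replaces the two duplicated
    // if blocks with a single call per operand.
    static String stripMinus(String s) {
        return (!s.isEmpty() && s.charAt(0) == '-') ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(stripMinus("-123")); // prints 123
        System.out.println(stripMinus("456"));  // prints 456
    }
}
```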

In the end, we managed to reduce the CCN from 21 to 13.

Git diff: Check the refactor section here.

@ekorre1001 (Jiayi Guo): BTree.validateNode

As with the previous function, we aim to reduce the use of if, for, while, else if, && and ||. In this case many if statements checked a specific condition and were directly followed by a return. For example:

if (keySize < minKeySize) {
    return false;
} else if (keySize > maxKeySize) {
    return false;
} else if (childrenSize == 0) {
    return true;
} else if (keySize != (childrenSize - 1)) {
    return false;
} else if (childrenSize < minChildrenSize) {
    return false;
} else if (childrenSize > maxChildrenSize) {
    return false;
}

This can be reduced by creating a helper function that does the same checks; we then only need to inspect the value it returns. The code above can be replaced with the following:

// make the check in another function and save the result
int checkNonRoot = validateNonRootHelper(keySize, childrenSize);
// return the corresponding boolean
if (checkNonRoot == 0) {
    return false;
} else if (checkNonRoot == 1) {
    return true;
}

The drawback is that we need to add a few additional functions and variables.
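One way the helper could look (the signature and result codes are our guess at the design: 0 = invalid node, 1 = valid leaf, any other value = continue validating the children):

```java
public class ValidateSketch {
    // Result codes (our convention): 0 = invalid node, 1 = valid leaf,
    // 2 = sizes are consistent, keep validating the children.
    static int validateNonRootHelper(int keySize, int childrenSize,
                                     int minKeySize, int maxKeySize,
                                     int minChildrenSize, int maxChildrenSize) {
        if (keySize < minKeySize || keySize > maxKeySize) return 0;
        if (childrenSize == 0) return 1;
        if (keySize != childrenSize - 1) return 0;
        if (childrenSize < minChildrenSize || childrenSize > maxChildrenSize) return 0;
        return 2;
    }

    public static void main(String[] args) {
        // 2 keys, 3 children, bounds (keys 1..4, children 2..5): consistent
        System.out.println(validateNonRootHelper(2, 3, 1, 4, 2, 5)); // prints 2
        // leaf node: 2 keys, no children
        System.out.println(validateNonRootHelper(2, 0, 1, 4, 2, 5)); // prints 1
    }
}
```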

In the end, we managed to reduce the CCN from 22 to 14.

Git diff: Check the refactor section here.

@Kubha_99 (Kunal Bhatnagar)

I made refactoring plans for the following two functions:

RedBlackTree: balanceAfterInsert

It initially had a CC of 18 and was quite a long and complex function. By splitting its logic into two helper functions, the complexity was reduced to 7. The logic that was extracted concerns the operations to perform on a specific node, called the uncle, which can be either black or red. I separated this logic into two helper functions, one for each of the two cases.

This plan was also implemented, which can be seen with:

git show a6f5a82600dd192f69d845da782bc6f2225bb943

RedBlackTree: balanceAfterDelete

This was a very large and complex function, entangled in many logical ifs that could be simplified. It originally had a CC of 22. After studying the function for a while, I noticed that many of the conditions checked by the if statements are repeated, and that they tend to arrive at either the exact same state or its opposite. Hence they can be combined and restructured into smarter, aggregated if statements to reduce the complexity. Another thing I found is that one very specific case contains logic that should be separated out: it checks whether a specific node is RED and, if so, performs a set of actions.

After this plan was implemented the complexity fell down to 13.

git show a6f5a82600dd192f69d845da782bc6f2225bb943

Coverage

Tools

The code coverage tool we used was OpenClover which works with the Ant build tool. We followed the provided quick start guide and managed to integrate it with Ant. It was quite easy to do the setup since only a few steps were required, although a few things did not work out initially. However, we were able to troubleshoot things and finally got it up and running.

Your own coverage tool

Show a patch (or link to a branch) that shows the instrumented code to gather coverage measurements.

The patch is probably too long to be copied here, so please add the git command that is used to obtain the patch instead:

git diff ...

What kinds of constructs does your tool support, and how accurate is its output?

See this table to look at the instrumented code:

| Function Name | Show Diff |
| --- | --- |
| BinaryHeapArray.heapDown | `git show 9ef482092816b3367f4d4b45214821ee11019fe3` |
| BinarySearchTree.replaceNodeWithNode | `git show 7c8a37bd8f21ab6c6f370eb2ed62eb8bd30c27bf` |
| BinarySearchTree.TreePrinter.getString | `git show 4fb7392b531f9b68a7443b606655a55aed228918` |
| Prime.isPrime | `git show 4fb7392b531f9b68a7443b606655a55aed228918` |

Our tool supports if statements, while loops and for loops. The output shows exactly how many times a code block was executed, and also flags code blocks that were never executed. For example:

BranchCoverageTest.branchCoverageWithNotReached:
  - entry point: 1
  - not reached: 0 <-- NOT REACHED

Evaluation

1. How detailed is your coverage measurement?

Our tool handles any language construct that contains a block of code. That includes ifs, try-catch, and most loops, but excludes ternary expressions (boolean ? true : false) and uncaught exceptions. This is because branch execution is recorded by making a function call on the instrumentation class object.
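The instrumentation described above could be sketched as a small counter class (all names here are ours; the actual tool lives in the commits linked earlier):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of manual branch instrumentation: each instrumented branch
// calls reach("name"), and report() prints the counts per branch.
public class BranchCoverage {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Pre-register a branch so unreached branches still show up as 0.
    public void register(String branch) { counts.putIfAbsent(branch, 0); }

    public void reach(String branch) { counts.merge(branch, 1, Integer::sum); }

    public int count(String branch) { return counts.getOrDefault(branch, 0); }

    public void report(String function) {
        System.out.println(function + ":");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String mark = e.getValue() == 0 ? " <-- NOT REACHED" : "";
            System.out.println("  - " + e.getKey() + ": " + e.getValue() + mark);
        }
    }

    public static void main(String[] args) {
        BranchCoverage cov = new BranchCoverage();
        cov.register("entry point");
        cov.register("not reached");
        cov.reach("entry point"); // a call like this is inserted into each branch
        cov.report("BranchCoverageTest.branchCoverageWithNotReached");
    }
}
```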

2. What are the limitations of your own tool?

If you want to measure a given function, you have to write all the instrumentation statements yourself, which can be a lot of effort. This became immediately obvious when trying to find a function with low branch coverage, as it takes about 5-10 minutes of effort just to instrument and investigate a single function.

3. Are the results of your tool consistent with existing coverage tools?

Comparing the output of OpenClover to our own tool, we see the same results (i.e. after running the test suite, we see the same number of executions for any given branch).

Coverage improvement

Everyone added at least 4 additional test cases to the test suite.

@ekorre1001 (Jiayi Guo)

I changed to other functions for the following reasons:

BTree::validateNode: barely any tests exist for the data structure, and none for this specific function, which makes it harder to understand.

Multiplication::multiplyUsingFFT: already above 98% code coverage.

The requirements + report can be found here

5 test cases were added; they can be found in PrimeTest.java.

@psalqvist (Philip Salqvist)

BinarySearchTree.replaceNodeWithNode before: 0%, after: 96.9%

Previously, no explicit test cases existed for this function, so branch coverage was 0%. I added 8 test cases to improve branch coverage to 96.9%. For the branch coverage report I only ran the explicit tests for replaceNodeWithNode; otherwise, implicit tests would have covered many branches unintentionally. This way, the progression of the new explicit tests is easier to follow.

Branch coverage report: https://docs.google.com/document/d/1n01CraeifGIsWPX0nC88pzylQhwQNforvvrhc141y6M/edit

New test cases:

git show cf3a7a4078b8bd8208ffc5c74101e9dd44c1ad69

@nolanderc (Christofer Nolander)

BinaryHeapArray.heapDown before: 94.1%, after: 100.0%.

Branch coverage report: https://docs.google.com/document/d/1IVLQTIkl8IiemVskQ_JFw1sNe5EBXXEWpTxYIQ4-NV0/edit?usp=sharing

For the 2 new test cases, see:

git show 395749c08a93e8b8a063e87849e44543fe1189ac

Since the heapDown function already had such high branch coverage I also wrote tests for BinaryHeapArray.validateNode:

BinaryHeapArray.validateNode before: 57.7%, after: 100.0%.

For the 2 new test cases, see:

git show 8e6d8c7d8d36417133fe66f2c8d80a153d23b3f9 

While writing these tests, I also found a bug in the source code. The fix can be seen here:

git show 12d42662a86bc770915c15f5d2d24d877b107c06

@Kubha_99 (Kunal Bhatnagar)

RedBlackTree.balanceAfterInsert and RedBlackTree.balanceAfterDelete

All information about coverage percentages, before and after, can be found at https://docs.google.com/document/d/1hvrr56Q9QfOv9k4Ixlnah9XqeZsu_Dx5NCEKpLBFNVk/edit?usp=sharing

git show 818905e88c856edab56d6efef6ac415597f7543b

@mantaur (Mark Bergrahm)

BinarySearchTree.TreePrinter.getString before: 0.0%, after: 96.1%.

Branch coverage report: https://docs.google.com/document/d/1DI1iBl6Sr4eEB6ruFHFqF14WtQgAkcGSYj0F7vOc_HU/edit?usp=sharing

4 new test cases to improve coverage can be seen:

git show 0b4093c79e6ae3ad6f61a6f9e3d22b7e6f285b66

Self-assessment: Way of working

Current state according to the Essence standard

We evaluate that we have continued developing our way of working since the last assignment and have so far fulfilled the "In Use" and "In Place" states where the checks are applicable. For example, "Procedures are in place to handle feedback on the team's way of working" seems more suited to a team working on a long-running project. Meanwhile, we start and end assignments weekly, which does not give us much room to apply feedback while working. Thus we gather feedback during the assignment presentations and the subsequent team meeting, and incorporate it into our way of working for the next assignment. We are currently starting to look at entering the "Working well" state for the coming assignments, as we already fulfill some of the items in this stage (such as "the team naturally applies the practices without thinking about them" and "the tools naturally support the way that the team works").

Was the self-assessment unanimous? Any doubts about certain items?

Self-assessment was unanimous amongst the group members. When doubts occurred, we discussed and resolved them together.

How have you improved so far?

The team naturally applies tools and practices, especially regarding the use of git and GitHub: branch naming conventions, creating issues and pull requests, and reviewing pull requests on a regular basis.

Where is potential for improvement?

As mentioned above in "Current state...", the team is aiming to enter the "Working well" state of the Essence standard. One key area for improvement is thus to continually tune our use of practices and tools, which we can achieve by learning more about them and using more of their functionality.

Overall experience

What are your main take-aways from this project? What did you learn?

  • Branch coverage can be a good way to identify bugs in the source code (for example, see the bug in BinaryHeapArray.validateNode above).
  • High cyclomatic complexity is not necessarily tied to high algorithmic or inherent complexity. We found many cases of poorly written code increasing the complexity of quite simple functions. Sometimes a thorough code review can greatly reduce the CCN while preserving functionality and making the code easier to understand.
  • Some group members had not explicitly reflected upon cyclomatic complexity before this assignment. It is evidently a great way to keep code easy to read and understand.