Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caw/improve array comparison performance #1

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mansidhall-CAW
Copy link

Sparse arrays and arrays containing undefined are now compared faster
using assert.deepStrictEqual() or util.isDeepStrictEqual().
This improves the performance to compare unequal numbers while doing
a deep equal comparison. Comparing for NaN is faster by checking
`variable !== variable` than by using `Number.isNaN()`.
@mansidhall-CAW
Copy link
Author

📝 OVERVIEW

  • Main purpose: Enhance sparse array handling in assertions and benchmarks, address potential integer overflow in the Myers diff algorithm, and optimize NaN detection and equivalence checks.
  • Type of changes: Feature addition (sparse array benchmarks), refactoring (code simplification in equivalence checks), and bug mitigation (TODO for overflow prevention).

🛠️ TECHNICAL DETAILS

  • benchmark/assert/deepequal-simple-array-and-set.js:

    • Added deepEqual_sparseArray/notDeepEqual_sparseArray benchmarks to test sparse array comparisons.
    • Refactored array initialization using Array.from({ length: len }, (_, i) => i % 2 ? undefined : i) to create sparse arrays with explicit undefined elements.
    • Modified test logic to handle sparse arrays via sliced copies (e.g., arr.slice(0)) and adjusted indices for inequality checks to account for sparse indices.
    • Retained existing array/Set tests but updated setup code to integrate sparse array scenarios.
  • lib/internal/assert/myers_diff.js:

    • Added a TODO comment in myersDiff to cap input lengths (e.g., Math.min(maxAllowedLength, left.length)) to prevent Int32Array overflow beyond 2^31 - 1.
    • Highlights a potential safeguard for large input sizes exceeding 32-bit integer limits in the dynamic programming array used for diff computation.
  • lib/internal/util/comparisons.js:

    • Replaced NumberIsNaN(val) with val !== val for NaN detection in innerDeepEqual, findLooseMatchingPrimitives, and related functions.
    • Streamlined sparse array equivalence checks in sparseArrayEquiv and objEquiv by consolidating conditional checks (e.g., if (aIsSparse && bIsSparse)).
    • Restructured keyCheck to unify strict mode symbol key comparisons within a conditional block, reducing duplicated logic.
  • Dependencies/APIs: No new dependencies or external services introduced. Internal APIs remain unchanged, though the Myers diff TODO implies a future modification to input validation.


🏗️ ARCHITECTURAL IMPACT

  • Refactoring in comparisons.js: Reduced code duplication by consolidating sparse array and NaN checks, improving maintainability.
  • Myers diff algorithm: The TODO introduces a potential future architectural change to enforce input size limits, preventing overflow in Int32Array allocations.
  • Benchmark architecture: Added sparse array test cases without altering existing test structures, maintaining separation of concerns.

🚀 IMPLEMENTATION HIGHLIGHTS

  • Algorithms/Data Structures:

    • Myers diff algorithm uses Int32Array for diagonal trace storage, which is now flagged for input size capping to avoid overflow.
    • Sparse array comparisons leverage Array.slice() to create dense copies for equivalence checks, ensuring index alignment.
  • Performance:

    • NaN detection via val !== val may offer marginal performance gains over NumberIsNaN, as it avoids function call overhead.
    • Sparse array handling optimizations reduce redundant checks in sparseArrayEquiv, improving efficiency for large sparse structures.
  • Security:

    • The Myers diff TODO addresses potential integer overflow vulnerabilities in Int32Array allocations, mitigating risks of buffer overflow or undefined behavior.
    • NaN detection via val !== val is semantically equivalent to NumberIsNaN but avoids edge cases with non-number values (though already guarded in context).
  • Error Handling:

    • Benchmark tests explicitly validate sparse array indices and inequality cases, ensuring robustness against unexpected behavior.
    • The Myers diff TODO ensures future-proofing against oversized inputs that could destabilize the diff algorithm.

📜 Changes

File Change Type Changes Summary
benchmark/assert/deepequal-simple-array-and-set.js Modified +20/-6 The changes introduce sparse array benchmarks by adding deepEqual_sparseArray/notDeepEqual_sparseArray methods, refactoring array initialization to use Array.from with undefined elements and sparse index assignment, and modifying test logic to handle sparse array comparisons via sliced copies and adjusted indices for inequality checks. Key updates include creating sparse arrays with new Array(len) and interval assignments, while retaining existing array/Set tests with modified setup code.
lib/internal/assert/myers_diff.js Modified +1/-0 The change adds a TODO comment in the myersDiff function to address capping input lengths preventing overflow beyond 2^31 - 1, specifically referencing the Int32Array initialization for diagonal trace storage. The modification highlights a potential safeguard for large input sizes exceeding 32-bit integer limits in the diff algorithm's dynamic programming array.
lib/internal/util/comparisons.js Modified +20/-23 The changes replace NumberIsNaN with direct val !== val checks for NaN detection in innerDeepEqual, findLooseMatchingPrimitives, and related functions, streamline sparse array equivalence handling in sparseArrayEquiv and objEquiv by consolidating conditional checks and loop logic, and restructure keyCheck to align strict mode symbol key comparisons within a conditional block.

const expected = [];
let actual = Array.from({ length: len }, (_, i) => i);
// Contain one undefined value to trigger a specific code path
actual[0] = undefined;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting actual[0] to undefined here affects all test cases, including non-sparse ones. This changes the behavior of existing tests like 'deepEqual_Array' which should test dense arrays without undefined elements. Move this setup into the sparse test cases only.

SUGGESTION:

- actual[0] = undefined;

@@ -29,6 +29,7 @@ function myersDiff(actual, expected, checkCommaDisparity = false) {
const actualLength = actual.length;
const expectedLength = expected.length;
const max = actualLength + expectedLength;
// TODO(BridgeAR): Cap the input in case the values go beyond the limit of 2^31 - 1.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO about capping input lengths is valid but unimplemented. Failing to cap could cause Int32Array overflow when actualLength + expectedLength exceeds ~2^31, leading to incorrect diffs or crashes. Must implement input length validation before proceeding.

SUGGESTION:

+ const MAX_SAFE_LENGTH = 0x7FFFFFF0; // ~2^31-16
+ if (actualLength > MAX_SAFE_LENGTH || expectedLength > MAX_SAFE_LENGTH) {
+   throw new Error('Input arrays too large for Myers diff');
+ }
  const actualLength = actual.length;
  const expectedLength = expected.length;
  const max = actualLength + expectedLength;
- const v = new Int32Array(2 * max + 1);
+ if (max > MAX_SAFE_LENGTH) {
+   throw new Error('Combined lengths exceed safe limit');
+ }
+ const v = new Int32Array(2 * max + 1);

Comment on lines +761 to 764
for (; i < keysB.length; i++) {
const key = keysB[i];
if ((a[key] === undefined && !hasOwn(a, key)) || !innerDeepEqual(a[key], b[key], mode, memos)) {
return false;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looping over only keysB fails to check keys present in a but not in b, leading to false positives. Both objects' keys must be fully compared to ensure equivalence.

SUGGESTION:

- for (; i < keysB.length; i++) {
+ for (const key of [...keysA, ...keysB].filter((key, index, self) => self.indexOf(key) === index))) {

@mansidhall-CAW
Copy link
Author

{
  "lastReviewedCommitId": "f46895a01ed5842c222eec9d8d9e0456c8f816d7",
  "reviewedCommitIds": [
    "f46895a01ed5842c222eec9d8d9e0456c8f816d7"
  ]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants