Skip to content

Add erlang:crc32/1,2 and erlang:crc32_combine/3#2209

Closed
bettio wants to merge 0 commit intoatomvm:mainfrom
bettio:add-crc32
Closed

Add erlang:crc32/1,2 and erlang:crc32_combine/3#2209
bettio wants to merge 0 commit intoatomvm:mainfrom
bettio:add-crc32

Conversation

@bettio
Copy link
Collaborator

@bettio bettio commented Mar 16, 2026

Add CRC32 (IEEE 802.3) checksum support to the erlang module,
matching OTP's API. When zlib is available the implementation
delegates to it, otherwise a standalone software fallback is used.

Also fix term_is_int32 and term_is_uint32 which returned true
unconditionally for unboxed integers on 64-bit builds, skipping
the range check needed for values above 32-bit limits.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

* by caller using \c term_is_int()
* @note Only extracts from unboxed integers (28-bit on 32-bit builds,
* 60-bit on 64-bit builds)
* @note Safe conversions: \c size_t s = term_to_int(t) is always valid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is true. size_t is unsigned while term_to_int(t) returns a signed value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

ok = expect_badarg(fun() -> erlang:crc32_combine(?MODULE:id(16#100000000), 0, 0) end),
ok = expect_badarg(fun() -> erlang:crc32_combine(0, ?MODULE:id(16#100000000), 0) end),
ok = expect_badarg(fun() -> erlang:crc32_combine(0, 0, ?MODULE:id(16#100000000)) end),
ok.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also test crc32/2 with second element being invalid ([-1], [256], etc.).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

.base.type = NIFFunctionType,
.nif_ptr = nif_erlang_crc32_1
};
static const struct Nif crc32_old_nif = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name crc32_old_nif is a little bit confusing. crc32_2_nif? Any reason not to merge it with nif_erlang_crc32_1 and check for argc?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


const void *data_ptr;
size_t data_size;
char *alloc_ptr = NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this could be factorized if both nifs would be combined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@petermm
Copy link
Contributor

petermm commented Mar 17, 2026

https://ampcode.com/threads/T-019cfc3d-3f9e-73c3-8602-d78da75e5b70

PR Review: Add erlang:crc32/1, erlang:crc32/2 and erlang:crc32_combine/3

Commit: bbc54a56Add erlang:crc32/1, erlang:crc32/2 and erlang:crc32_combine/3
Author: Davide Bettio
Reviewer: AI-assisted (Oracle + manual inspection)


Summary

Implements CRC32 (IEEE 802.3) checksum NIFs matching OTP's erlang module API. When zlib is available the implementation delegates to it; otherwise a standalone software fallback is used. Includes Erlang stubs with edoc, gperf registration, test module, and CHANGELOG entry.

Verdict: Approve with changes — no memory leaks or symbol collisions found; the software fallback math is correct. Two issues should be addressed before merge.


Issues Found

🔴 Issue 1: zlib path silently truncates length for large inputs

File: src/libAtomVM/nifs.c (around line 6767)

uLong crc = crc32(old_crc_val, data_ptr, data_size);

zlib's crc32() takes a uInt length parameter, not size_t. On 64-bit builds, if data_size > UINT_MAX, this silently truncates and computes the CRC of only the low 32-bit portion of the length. The software fallback does not have this bug since it takes size_t len.

Suggested fix — chunk the zlib call:

#ifdef WITH_ZLIB
    uLong zcrc = old_crc_val;
    const Bytef *p = (const Bytef *) data_ptr;
    size_t remaining = data_size;

    while (remaining > UINT_MAX) {
        zcrc = crc32(zcrc, p, UINT_MAX);
        p += UINT_MAX;
        remaining -= UINT_MAX;
    }
    uint32_t crc = (uint32_t) crc32(zcrc, p, (uInt) remaining);
#else
    uint32_t crc = crc32_uint(old_crc_val, (const uint8_t *) data_ptr, data_size);
#endif

Severity: Medium — unlikely on embedded targets but a real correctness bug on host/64-bit builds.


🟡 Issue 2: Spec/implementation mismatch on argument range

Files: libs/estdlib/src/erlang.erl, src/libAtomVM/nifs.c

The Erlang typespecs declare non_neg_integer() for CRC and size parameters (matching OTP docs), but the C implementation validates with term_is_uint32, rejecting values ≥ 16#100000000. This is reasonable for CRC values (which are always 32-bit) but creates a documented vs. actual behavior mismatch, especially for SecondSize in crc32_combine/3.

Recommendation: Add an @doc note in erlang.erl stating AtomVM limits these arguments to the 32-bit range, or widen validation for SecondSize if full OTP parity is desired.


What Looks Good

Area Assessment
Memory safety ✅ No leaks — alloc_ptr is freed before make_maybe_boxed_int64; iolist_to_buffer cleans up on failure; empty data handled safely
Input validation term_is_uint32 + VALIDATE_VALUE correctly rejects negatives, floats, atoms, tuples, oversized ints
Name collision ✅ No conflict — crc32_uint and nif_erlang_crc32 don't collide with zlib's crc32 C symbol
Software fallback ✅ Standard reflected polynomial 0xEDB88320 with proper init/final inversion; combine uses correct GF(2) matrix-squaring (matches zlib's algorithm)
argc dispatch ✅ Single nif_erlang_crc32 handles both arities cleanly via argc == 2 check
Test coverage ✅ Good: covers binary, iolist, empty data, incremental update, combine, and extensive badarg cases
CHANGELOG / docs ✅ Present and accurate

Test Coverage Gaps

The following additional test cases would strengthen confidence:

  1. No-zlib build — Run the same test module with WITH_ZLIB=off in CI. This is the biggest gap since the software fallback is otherwise unexercised.

  2. Boundary values — Test max uint32 CRC:

    16#FFFFFFFF = erlang:crc32(16#FFFFFFFF, <<>>).
  3. Empty update/combine identity — Verify CRC is unchanged when combined with empty data:

    Old = erlang:crc32(<<"abc">>), Old = erlang:crc32(Old, <<>>).
    C1 = erlang:crc32(<<"abc">>), C1 = erlang:crc32_combine(C1, erlang:crc32(<<>>), 0).
  4. Nested iodata — e.g. [<<"He">>, [$l, $l], [<<"o">>]]

  5. Float badargs — Add 1.0 / 1.5 as CRC and size arguments

  6. Cross-check property — Verify incremental CRC matches combined CRC for more input shapes:

    erlang:crc32([A, B]) =:= erlang:crc32(erlang:crc32(A), B).
    erlang:crc32_combine(erlang:crc32(A), erlang:crc32(B), byte_size(B)) =:= erlang:crc32(<<A/binary, B/binary>>).

Minor Observations

  • iolist flattening allocates O(n): nif_erlang_crc32 flattens non-binary iodata into a temporary malloc'd buffer via iolist_to_buffer. This matches existing AtomVM patterns but can be expensive on constrained targets for large iolists. A streaming approach over iodata segments would avoid the copy but is more complex — fine as a future optimization.

  • make_maybe_boxed_int64 for a uint32 result: Semantically broader than needed, but safe — CRC32 values fit comfortably and boxing support is useful when values exceed the immediate-int range on some targets.

  • crc32_combine/3 cannot validate SecondSize: If the caller passes a size that doesn't match the data that produced SecondCrc, the result will be wrong but well-defined. This matches OTP/zlib semantics — the function has no way to verify it.

bettio added a commit that referenced this pull request Mar 18, 2026
Add `erlang:crc32/1,2` and `erlang:crc32_combine/3`

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
@bettio bettio closed this Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants