[BUG]: HEAP_CORRUPTION/ACCESS_VIOLATION During ColumnReader_HasNext call #549

@IRailean97

Issue Description

After upgrading ParquetSharp from 10.0.1 to 19.0.1, my application began crashing intermittently with STATUS_HEAP_CORRUPTION inside ParquetSharpNative.dll.

WinDbgX shows the corruption being detected inside native code, specifically during the ColumnReader_HasNext call.

The crash is a heap corruption (0xC0000374) detected inside RtlpHeapHandleError.
The failure occurs during a call to ParquetSharpNative!ColumnReader_HasNext invoked from managed code via P/Invoke.
The heap corruption is likely due to reading/writing freed or overwritten native memory inside the native Parquet library before this point.
The native stack trace shows memory allocation/free logic inside ntdll triggered during HasNext.
Before HasNext, the stack shows a long sequence of KmsConnectionConfig_SetKmsInstanceUrl frames. This suggests that native code is building or handling a URL string, possibly reallocating heap memory.

Image

Environment Information

  • ParquetSharp Version: 19.0.1 (upgraded from 10.0.1)
  • .NET Framework/SDK Version: .NET 8
  • Operating System: Windows 10

Steps To Reproduce

Code using ColumnReader:

private IEnumerator<ColumnElement> GetColumnElements(ColumnReader columnReader, IParquetColumnBuffer<T> buffer)
{
    while (columnReader.HasNext)
    {
        buffer.Clear();
        buffer.Read(columnReader);
        int columnValuesOffset = 0;

        for (int columnLevelsOffset = 0; columnLevelsOffset < buffer.LevelsCount; columnLevelsOffset++)
        {
            ColumnElement columnElement;
            try
            {
                short definitionLevel = buffer.DefinitionLevels == null ? (short)0 : buffer.DefinitionLevels[columnLevelsOffset];
                short repetitionLevel = buffer.RepetitionLevels == null ? (short)0 : buffer.RepetitionLevels[columnLevelsOffset];

                if (definitionLevel < buffer.MaxDefinitionLevel)
                {
                    columnElement = new ColumnElement(null, definitionLevel, repetitionLevel);
                }
                else
                {
                    string str = ParquetDataConverter.ConvertParquetValueToString(buffer.Values[columnValuesOffset], this.schemaElement);
                    columnElement = new ColumnElement(str, definitionLevel, repetitionLevel);
                    columnValuesOffset++;
                }
            }
            catch (Exception ex)
            {
                throw new ParquetReaderException($"Failed to read data from Parquet Column - {columnReader.ColumnDescriptor?.Name.MarkAsPrivate()} and of parquet physical type = {columnReader.Type.ToString().MarkAsPrivate()} and logical type = {columnReader.ColumnDescriptor?.LogicalType?.Type.ToString().MarkAsPrivate()}. Inner Exception from ParquetSharp: {ex}");
            }

            yield return columnElement;
        }
    }
}

Inside ConvertParquetValueToString, I switch on the physical type and convert each value to a string.

I've also noticed in the stack dumps that the crash often occurs while parsing Int96 values. This may be related, but I have found no direct connection:

case PhysicalType.Int96:
    if (valueToConvert is Int96 intDate)
    {
        return Int96ToTimestampNanos(intDate)?.ToString();
    }

    return valueToConvert.ToString();

internal static DateTime? Int96ToTimestampNanos(Int96 value)
{
    try
    {
        long time;
        int julianDay;
        unsafe
        {
            time = ReadInt64LittleEndian((byte*)&value);
            julianDay = ReadInt32LittleEndian((byte*)&value + 8);
        }

        // INT96 stores the Date part as the number of Julian Days (days since the start of Julian calendar) which can also include BCE dates
        if (julianDay < JulianDayMinValue)
        {
            return null;
        }

        int days = julianDay - UnixEpochJulianDay;
        return UnixEpoch + new TimeSpan(days, 0, 0, 0) + new TimeSpanNanos(time).TimeSpan;
    }
    catch (Exception e)
    {
        throw new ParquetReaderException($"Parquet data convert Error - Failed to convert Int96 to DateTime {value.ToString()}. ", e);
    }
}
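
For comparison, here is a minimal sketch of the same INT96 decoding without pointers, which might help rule out the unsafe block as a factor. It assumes the layout implied by the reads above: the first 8 bytes hold the nanoseconds within the day and the last 4 bytes hold the Julian day, passed here as three little-endian 32-bit words. `Int96Sketch` and `FromWords` are hypothetical names, not ParquetSharp API. The value 2440588 is the Julian day number of 1970-01-01, and I am assuming it matches UnixEpochJulianDay above. The sketch omits the JulianDayMinValue check, so an out-of-range day throws ArgumentOutOfRangeException rather than returning null.

```csharp
using System;

public static class Int96Sketch
{
    // Julian day number of 1970-01-01 (assumed equal to UnixEpochJulianDay above).
    private const int UnixEpochJulianDay = 2440588;

    // a = low 32 bits of nanoseconds-of-day, b = high 32 bits, c = Julian day.
    // Hypothetical helper for illustration only.
    public static DateTime FromWords(int a, int b, int c)
    {
        // Reassemble the 64-bit nanosecond count; (uint)a avoids sign extension.
        long nanosOfDay = ((long)b << 32) | (uint)a;

        // Whole days since the Unix epoch (negative for earlier dates).
        long days = (long)c - UnixEpochJulianDay;

        // One DateTime tick is 100 ns, so sub-tick precision is truncated.
        return DateTime.UnixEpoch.AddDays(days).AddTicks(nanosOfDay / 100);
    }
}
```

If this pointer-free version were swapped in and the crash still occurred, that would suggest the Int96 conversion is only where the corruption is noticed, not where it originates.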

Expected Behavior

The application should not crash. This is a sporadic issue: with ParquetSharp 10.0.1 no such crash occurred, and I started to observe this behavior only after upgrading to 19.0.1.

Additional Context (Optional)

No response
