Skip to content

Conversation

@mattjala
Copy link
Contributor

@mattjala mattjala commented Oct 24, 2025

This PR should fix issues the Cache VOL runs into during gent_longlinks() within h5dumpgentest from the main HDF5 repo.

  • The use of long link names would overrun the fixed-size path buffers used by the Cache VOL. These buffers are now dynamically allocated and shouldn't ever be overrun.
  • Closing a parent group before a child group would lead to the child group retaining a pointer to freed memory, which it would attempt to access during its own close later. Cache objects are now ref-counted in a way that should lead to each object being kept open until all references to it are closed, preventing the unsafe access.
  • Fix a use-after-free during file open

Related to HDFGroup/hdf5#5935

@mattjala mattjala changed the title Fix unsafe memory access issues Fix unsafe memory access during gent_longlinks Oct 24, 2025
@brtnfld
Copy link
Collaborator

brtnfld commented Nov 3, 2025

For the basename() changes, we have a const violation and a thread safety issue (static storage is used).

You could use a thead-safe copy

char name_copy[PATH_MAX];
strncpy(name_copy, name, PATH_MAX - 1);
name_copy[PATH_MAX - 1] = '\0';
char *base = basename(name_copy);

Or add a helper function:

/*-------------------------------------------------------------------------
 * Function:    safe_basename
 *
 * Purpose:     Thread-safe basename alternative that doesn't modify input
 *
 * Return:      Pointer to the basename portion of path (not a copy)
 *
 *-------------------------------------------------------------------------
 */
static const char* safe_basename(const char *path) {
    const char *base;
    
    if (path == NULL || path[0] == '\0')
        return ".";
    
    /* Find the last '/' in the path */
    base = strrchr(path, '/');
    
    if (base == NULL) {
        /* No '/' found, entire path is the basename */
        return path;
    }
    
    /* Skip past the '/' */
    base++;
    
    /* Handle trailing slashes: "/path/to/dir/" -> "dir" */
    if (*base == '\0') {
        /* Path ends with '/', need to find previous component */
        const char *end = base - 1;  /* Point to the '/' */
        const char *start = path;
        
        /* Skip all trailing slashes */
        while (end > path && *end == '/')
            end--;
        
        if (end == path)
            return "/";  /* Path is all slashes */
        
        /* Find the previous '/' */
        start = end;
        while (start > path && *(start - 1) != '/')
            start--;
        
        /* Return pointer to the component (still in original string) */
        /* NOTE: This won't be null-terminated, but strlen() will work
         * because we're calculating length before calling this function */
        return start;
    }
    
    return base;
}

and then,

const char *base = safe_basename(name);
size_t path_len =
    strlen(file->H5LS->path) + strlen(base) + strlen("-cache") + 2;

So you have:

// Build cache path: <storage_path>/<filename>-cache/
char rnd[255];
sprintf(rnd, "%d", file->H5DWMM->mpi->rank);
const char *base = safe_basename(name);  // ← CHANGED

size_t path_len =
    strlen(file->H5LS->path) + strlen(base) + strlen("-cache") + 2;
file->H5DWMM->cache->path = (char *)malloc(path_len);
if (file->H5DWMM->cache->path == NULL) {
  LOG_ERROR(-1, "Failed to allocate cache path");
  free(file->H5DWMM->cache);
  return FAIL;
}
snprintf(file->H5DWMM->cache->path, path_len, "%s/%s-cache",
         file->H5LS->path, base);

Two other locations would need updating:
~5563
char *base = basename((char *)name);
size_t path_len = strlen(file->H5LS->path) + strlen(base) + 2;

and
~6184
char *base = basename((char *)name);
size_t path_len = strlen(file->H5LS->path) + strlen(base) +
strlen("-global-cache/") + 2;

@brtnfld
Copy link
Collaborator

brtnfld commented Nov 3, 2025

Or do, (does not handle trailing slashes):

const char *base = strrchr(name, '/');
base = (base == NULL) ? name : base + 1;

@jhendersonHDF jhendersonHDF merged commit df4df2e into HDFGroup:develop Nov 11, 2025
3 checks passed
brtnfld pushed a commit to brtnfld/vol-cache that referenced this pull request Nov 13, 2025
* Fix overflow on long link names

* Reference count cache objects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants