A few minor, non-blocking cleanups noticed while reviewing HERD (src/hdmf/common/resources.py). None change behavior for correct usage; grouping them here rather than filing separately.
1. Dead assignment in add_ref (resources.py:679-682)
if entity_uri is not None:
entity_uri = entity.entity_uri # value is never read afterward
msg = 'This entity already exists. Ignoring new entity uri'
warn(msg, stacklevel=3)
In this branch the entity already exists, so add_entity is False and _add_entity is never called. The reassigned entity_uri is never used again. The line can be removed (keep the warning).
2. to_dataframe rebuilds the files DataFrame inside a loop (resources.py:954-955)
for idx in result_df['files_idx']:
file_id_val = self.files.to_dataframe().iloc[int(idx)]['file_object_id']
file_id_col.append(file_id_val)
self.files.to_dataframe() is recomputed on every iteration. Hoist it out of the loop (or map via the already-available files DataFrame).
3. O(n^2) population via repeated linear which() scans
add_ref / _check_object_field / to_dataframe rely on repeated linear Table.which() scans (e.g. _check_object_field scans the objects table up to three times per call). Building a large HERD is therefore roughly O(n^2). Not a bug and fine for small tables, but worth noting for scalability if HERD is used at scale (e.g. an index/cache keyed by object_id/key would remove the repeated scans).
Code of Conduct
A few minor, non-blocking cleanups noticed while reviewing
HERD(src/hdmf/common/resources.py). None change behavior for correct usage; grouping them here rather than filing separately.1. Dead assignment in
add_ref(resources.py:679-682)In this branch the entity already exists, so
add_entityisFalseand_add_entityis never called. The reassignedentity_uriis never used again. The line can be removed (keep the warning).2.
to_dataframerebuilds the files DataFrame inside a loop (resources.py:954-955)self.files.to_dataframe()is recomputed on every iteration. Hoist it out of the loop (or map via the already-available files DataFrame).3. O(n^2) population via repeated linear
which()scansadd_ref/_check_object_field/to_dataframerely on repeated linearTable.which()scans (e.g._check_object_fieldscans the objects table up to three times per call). Building a largeHERDis therefore roughly O(n^2). Not a bug and fine for small tables, but worth noting for scalability ifHERDis used at scale (e.g. an index/cache keyed by object_id/key would remove the repeated scans).Code of Conduct