HDFS-17890. Avoid slow disks datanode when reading data by junjie1233 · Pull Request #8338 · apache/hadoop

junjie1233 · 2026-03-10T12:56:33Z

Summary

When a client requests to read a block, the NameNode returns a list of DataNodes holding the replicas of that block.

The current logic sorts these DataNodes based on network topology (rack awareness, distance), without considering the performance of the underlying storage (disk/volume).

If a block replica resides on a slow or overloaded disk (for example, a hot disk with high latency), its DataNode may still be placed at the top of the sorted list and selected first by the client.

Example

Suppose a block has three replicas located on:

dn1: storage1 (slow disk)

dn2: storage1 (normal disk)

dn3: storage1 (normal disk)

Even if storage1 on dn1 is known to be slow, the client still reads replicas in the order returned by the NameNode, so it may try dn1 first.

Fix

This PR adds disk-level slow storage tracking and deprioritization during block location sorting.

1. Disk-Level Tracking (`SlowDiskTracker.java`)

- Track slow storage at StorageID granularity (not DataNode level).
- Copy-on-write cache `cachedSlowDisksForRead` to avoid lock contention on the read path.
- Cache key: `IP:PORT:StorageID` (e.g., `127.0.0.1:50010:DS-xxx`).
- Dual key modes:
  - `CACHE_KEY`: `IP:PORT:StorageID` for read deprioritization.
  - `LEGACY_KEY`: `IP:PORT:volumeName` for backward compatibility with Top-N reports.
- Background cache rebuild with configurable interval (default 30s).
- Automatic expiration of stale entries (`reportValidityMs`).
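
To illustrate the copy-on-write idea, here is a minimal sketch, not the PR's actual `SlowDiskTracker` code. The class `SlowDiskCacheSketch`, its methods, and the `Report` holder are hypothetical names chosen for illustration; only the `IP:PORT:StorageID` key shape, the `cachedSlowDisksForRead` name, and the `reportValidityMs` expiry come from the description above. Readers see an immutable snapshot through a `volatile` field, so lookups take no lock, while a background rebuild drops stale reports and swaps in a fresh map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a copy-on-write slow-disk cache. */
final class SlowDiskCacheSketch {
  /** Latest report for one disk: its latency and when the report arrived. */
  private static final class Report {
    final double latencyMs;
    final long receivedAtMs;

    Report(double latencyMs, long receivedAtMs) {
      this.latencyMs = latencyMs;
      this.receivedAtMs = receivedAtMs;
    }
  }

  private final long reportValidityMs;
  private final Map<String, Report> reports = new ConcurrentHashMap<>();

  // Immutable snapshot used by the read path; replaced wholesale on rebuild.
  private volatile Map<String, Double> cachedSlowDisksForRead = Collections.emptyMap();

  SlowDiskCacheSketch(long reportValidityMs) {
    this.reportValidityMs = reportValidityMs;
  }

  /** Builds the IP:PORT:StorageID key, e.g. 127.0.0.1:50010:DS-xxx. */
  static String cacheKey(String ipPort, String storageId) {
    return ipPort + ":" + storageId;
  }

  /** Records a slow-disk report; it becomes visible to readers after rebuild(). */
  void addReport(String ipPort, String storageId, double latencyMs, long nowMs) {
    reports.put(cacheKey(ipPort, storageId), new Report(latencyMs, nowMs));
  }

  /** Meant for a background thread: expire stale reports, publish a new snapshot. */
  void rebuild(long nowMs) {
    reports.values().removeIf(r -> nowMs - r.receivedAtMs > reportValidityMs);
    Map<String, Double> next = new HashMap<>();
    reports.forEach((key, report) -> next.put(key, report.latencyMs));
    cachedSlowDisksForRead = Collections.unmodifiableMap(next);
  }

  /** Lock-free lookup; returns null when the disk is not currently marked slow. */
  Double slowLatency(String key) {
    return cachedSlowDisksForRead.get(key);
  }
}
```

In this sketch a report only affects reads after the next `rebuild()`, which mirrors the trade-off of a periodic rebuild interval: cheaper reads in exchange for slightly delayed visibility.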

2. Block Location Sorting (`FSNamesystem.java`)

- New method `sortLocatedBlocksBySlowDisk()` reorders replicas after topology-based sorting.
- Pre-compute slow keys before sorting to avoid string concatenation in the comparator hot path.
- Stable sort: preserves network topology order for non-slow replicas.
- Slow replicas sorted by latency (higher latency → lower priority).
- Controlled by config: `dfs.namenode.deprioritize.slow.disk.datanode.for.read` (default: false).
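
The reordering described above can be sketched as follows. This is an illustration under stated assumptions, not the body of `sortLocatedBlocksBySlowDisk()`: the `Replica` holder, the `SlowDiskSortSketch` class, and the latency map are hypothetical stand-ins for the NameNode's real types. Each replica carries a key computed once before sorting, and a stable sort moves slow replicas behind healthy ones while keeping the topology order among the healthy ones.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of slow-disk deprioritization after topology sorting. */
final class SlowDiskSortSketch {
  /** Stand-in for a replica location with its key precomputed before sorting. */
  static final class Replica {
    final String name;
    final String slowKey; // IP:PORT:StorageID, built once, not inside the comparator

    Replica(String name, String slowKey) {
      this.name = name;
      this.slowKey = slowKey;
    }
  }

  /**
   * Returns a copy of the topology-sorted list with slow replicas moved to the end.
   * Healthy replicas compare equal, so the stable sort keeps their topology order.
   */
  static List<Replica> sortBySlowDisk(List<Replica> topologySorted,
                                      Map<String, Double> slowLatencyMs) {
    Comparator<Replica> bySlowness = Comparator
        // false sorts before true: healthy replicas first.
        .comparing((Replica r) -> slowLatencyMs.containsKey(r.slowKey))
        // Among slow replicas, higher latency means lower priority.
        .thenComparingDouble(r -> slowLatencyMs.getOrDefault(r.slowKey, 0.0));
    List<Replica> sorted = new ArrayList<>(topologySorted);
    sorted.sort(bySlowness);
    return sorted;
  }
}
```

Sorting a copy rather than the input keeps the sketch side-effect free; whether the real method sorts in place is not stated in the description.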

3. Configuration (`DFSConfigKeys.java`)

- `dfs.namenode.slow.disk.cache.rebuild.interval` (default 30s).
  - Decouples cache rebuild frequency from Top-N report generation interval.
  - Allows independent tuning for large clusters.

4. Disk Key Format (`DataNodeDiskMetrics.java`)

- Use `volumeName|storageID` format for slow disk reports.
- Enables SlowDiskTracker to extract both legacy key (WebUI) and cache key (read path).
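
As a rough illustration of how one reported string could yield both keys, here is a sketch. The class `SlowDiskKeySketch` and its helper names are hypothetical, and two behaviors are assumptions not stated in the PR: the split happens at the first `|`, and a report without `|` is treated as a bare volume name with no cache key.

```java
/** Hypothetical sketch of deriving both keys from a volumeName|storageID report. */
final class SlowDiskKeySketch {
  private static final char SEPARATOR = '|';

  /** Legacy key IP:PORT:volumeName, as used for Top-N reports. */
  static String legacyKey(String ipPort, String reportedDisk) {
    int sep = reportedDisk.indexOf(SEPARATOR);
    // Assumption: without a separator, the whole string is the volume name.
    String volumeName = sep < 0 ? reportedDisk : reportedDisk.substring(0, sep);
    return ipPort + ":" + volumeName;
  }

  /** Cache key IP:PORT:StorageID for the read path, or null if no storage ID was reported. */
  static String cacheKey(String ipPort, String reportedDisk) {
    int sep = reportedDisk.indexOf(SEPARATOR);
    return sep < 0 ? null : ipPort + ":" + reportedDisk.substring(sep + 1);
  }
}
```

For example, under these assumptions a report `disk1|DS-xxx` from `127.0.0.1:50010` maps to the legacy key `127.0.0.1:50010:disk1` and the cache key `127.0.0.1:50010:DS-xxx`.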

Test

Test class: TestSlowDiskBlockLocations.java

Test Coverage

✅ `testDeprioritizeSlowDiskDatanodeForReadEnabled`
- Verifies that slow disk replicas are moved to the end of location list.
- Checks block read path integration.

✅ `testSlowDiskCacheRebuild`
- Tests cache population after DataNode reports slow disk.
- Verifies cache refresh mechanism.

✅ `testSlowDiskExpiration`
- Validates expiration of stale slow disk entries.
- Confirms cache is cleaned after disk recovery.

✅ `testCacheIntegrationWithReadPath`
- End-to-end test: slow disk report → cache update → block location sorting.
- Verifies clients avoid slow replicas.

✅ `testIndependentCacheRebuildInterval`
- Tests independent cache rebuild interval configuration.
- Verifies decoupling from Top-N report generation.

✅ `testMultipleSlowDisks`
- Multiple slow disks across different DataNodes.
- Validates sorting by latency when all replicas are slow.

✅ `testNoSlowDiskReports`
- Baseline test: no sorting when no slow disks reported.
- Ensures feature is non-intrusive when disabled.

Test Configuration

- `DFS_HEARTBEAT_INTERVAL`: 1s (fast heartbeat for testing)
- `DFS_NAMENODE_SLOW_DISK_CACHE_REBUILD_INTERVAL`: 1s (quick cache rebuild)
- `OUTLIERS_REPORT_INTERVAL`: 1s (rapid slow disk detection)
- Uses `GenericTestUtils.waitFor()` for async operations.

hadoop-yetus · 2026-03-16T16:57:03Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	7m 23s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+0 🆗	xmllint	0m 0s		xmllint was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 1 new or modified test files.
			_ trunk Compile Tests _
+1 💚	mvninstall	27m 36s		trunk passed
+1 💚	compile	1m 3s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	compile	1m 4s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	checkstyle	1m 4s		trunk passed
+1 💚	mvnsite	1m 6s		trunk passed
+1 💚	javadoc	0m 56s		trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 57s		trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	2m 13s		trunk passed
+1 💚	shadedclient	17m 5s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+1 💚	mvninstall	0m 44s		the patch passed
+1 💚	compile	0m 40s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javac	0m 40s		the patch passed
+1 💚	compile	0m 44s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	javac	0m 44s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	0m 42s	/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt	hadoop-hdfs-project/hadoop-hdfs: The patch generated 26 new + 324 unchanged - 0 fixed = 350 total (was 324)
+1 💚	mvnsite	0m 47s		the patch passed
+1 💚	javadoc	0m 35s		the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚	javadoc	0m 36s		the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚	spotbugs	2m 0s		the patch passed
+1 💚	shadedclient	16m 28s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	181m 37s		hadoop-hdfs in the patch passed.
+1 💚	asflicense	0m 27s		The patch does not generate ASF License warnings.
		265m 11s

Subsystem	Report/Notes
Docker	ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8338/3/artifact/out/Dockerfile
GITHUB PR	#8338
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint
uname	Linux 27173926dac5 5.15.0-171-generic #181-Ubuntu SMP Fri Feb 6 22:44:50 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `f0abd71`
Default Java	Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions	/usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8338/3/testReport/
Max. process+thread count	4053 (vs. ulimit of 5500)
modules	C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8338/3/console
versions	git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by	Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

HDFS-17890. Avoid slow disks datanode when reading data

c997228

github-actions bot added HDFS trunk labels Mar 10, 2026

junjie1233 added 2 commits March 13, 2026 12:26

HDFS-17890. Add configuration properties for slow disk optimization to hdfs-default.xml

3788030

Trigger CI

f0abd71

junjie1233 wants to merge 3 commits into apache:trunk from junjie1233:HDFS-17890
