rate-limiting on bandwidth #336
Comments
I have thought about this being a potential issue but have never been able to confirm it. You would need to start a lot of Pods with new images, considering that image pulling is done from multiple nodes during Pod startup. Have you observed network traffic being an issue in your deployments? If so, I would want to try to replicate it.
I created a cluster where all images are on a single master node. Then, when I added three or more new nodes simultaneously, the new nodes only had the spegel images, and this greatly affected the master node's network.
Just to be clear, that means we are talking about HTTP rate limits, right? A socket-level rate limit would still hog all the TCP connections until most pods on all fresh nodes are done, whereas an HTTP 429 would have the clients retry, and over time spegel would at least start routing requests to the other nodes that managed to get layers from the master node.
@bittrance if we were to implement rate limiting it would be HTTP rate limits. Returning a 429 response would result in a fallback to the original registry. If the original registry is not accessible, the mirror would be attempted again on the next image pull attempt after the backoff. @wenchao5211 what type of VMs are you using for your control plane? They seem very slim if serving a couple of megabytes of data from disk greatly affects their networking. Or are the image layers that you are pulling very large? It would be great to see some metrics from this control plane node to understand what is going on.
I have both physical and virtual machines, both with gigabit links, and the images on our first master node are indeed very large, adding up to over 10 GB. What kind of metrics are you looking for? Machine network and compute metrics, or Pro Monitor data?
That may make sense in that case if the individual layers are very large. I am not sure right now which metrics are relevant; there are a lot of factors at play here, from disk IO to networking, so I need some time to think about this. I have had a look at other registries to see whether they have similar features. Zot seems to support only HTTP rate limiting on individual requests, while Harbor supports throttling IO during image replication. It would be interesting to see if any other projects do similar things. If there is a way forward, it would be to limit the io.Copy speed when writing blobs.
I think this is fine to implement after having a look at the configuration options in Dragonfly. https://d7y.io/docs/reference/configuration/dfdaemon @bittrance do you have any opinion about this? |
I think the best tool to address this problem is kernel traffic control, which would de-emphasize spegel traffic. However, there are various scenarios where you don't have control over kernel-level config, and the bottleneck may be upstream (think edge nodes). Thus, implementing a per-node max bandwidth consumption for spegel may indeed be a good idea. This should address network-interface saturation on the "large image" node. The docs should probably mention setting the limit so there is bandwidth left for gossiping. I'm more skeptical about a per-peer/client bandwidth limit, since in most cases it will be hard to estimate the number of concurrent clients. Crazy idea: perhaps a spegel node could be configured to 429 requests above some limit only if it sees that the layer is available from other peers?
Describe the problem to be solved
I use k3s. I want to limit spegel's bandwidth to a configurable range; if there is no upper limit, it may saturate the cluster network and affect cluster health.
Proposed solution to the problem
No response