Contributor

@djeebus djeebus commented Oct 10, 2025

This consolidates metrics into a single struct that does a few things:

  • exports metrics to a file called {pid}.json
  • watches for other processes' files and reads their metrics
  • checks full-host metrics when handling incoming requests

It also adds server.Limiter, which checks the limits on starting and running sandboxes.


Note

Introduces a shared-state manager that aggregates sandbox allocations across processes via PID JSON files and integrates a limiter to cap starting/running sandboxes, wiring both into server and service info paths.

  • Shared State (peer-to-peer metrics):
    • Add internal/sharedstate with Manager that writes self allocations to {pid}.json, watches directory via fsnotify, and aggregates Allocations across processes.
    • Integrate with sandboxes map via Subscribe; expose TotalAllocated() and TotalRunningCount().
    • Add tests in internal/sharedstate/tracker_test.go.
  • Sandbox start limiting:
    • Add server.Limiter to enforce max running (via featureflags.MaxSandboxesPerNode) and max starting per node; replace semaphore logic in server with sandboxLimiter and error handling.
  • Service info metrics:
    • internal/service/service_info.go: switch from iterating local sandboxes to using sharedstate.Manager for CPU/memory/disk and running counts.
  • Config:
    • Add SHARED_STATE_DIRECTORY, SHARED_STATE_WRITE_INTERVAL, MAX_STARTING_INSTANCES to cfg.Config.
  • Wiring:
    • main.go: instantiate and run sharedstate.Manager; pass to InfoService and server.New; create server.NewLimiter.
    • Minor: rename Google Storage limiter variable for clarity.
  • Dependencies:
    • Add github.com/fsnotify/fsnotify to go.mod/go.sum.

Written by Cursor Bugbot for commit 835ee09.


linear bot commented Oct 10, 2025

@e2b-request-same-site-reviewers e2b-request-same-site-reviewers bot requested review from ValentaTomas and removed request for ValentaTomas, dobrac and jakubno October 10, 2025 00:29
cursor[bot]

This comment was marked as outdated.


@djeebus djeebus marked this pull request as draft October 10, 2025 15:47

djeebus commented Oct 10, 2025

I'm going to split a piece out into its own PR and reopen this one once that's merged.

@ValentaTomas ValentaTomas changed the title ENG-3153 peer-to-peer metrics tracker Add peer-to-peer metrics tracker Oct 10, 2025
# Conflicts:
#	packages/orchestrator/internal/server/main.go
#	packages/orchestrator/internal/server/sandboxes_test.go
#	packages/orchestrator/internal/service/service_info.go
#	packages/orchestrator/main.go
@djeebus djeebus marked this pull request as ready for review October 15, 2025 23:56
cursor[bot]

This comment was marked as outdated.

# Conflicts:
#	packages/orchestrator/internal/cfg/model.go
cursor[bot]

This comment was marked as outdated.

@sitole sitole self-requested a review October 21, 2025 09:32
RedisClusterURL string `env:"REDIS_CLUSTER_URL"`
RedisURL string `env:"REDIS_URL"`
Services []string `env:"ORCHESTRATOR_SERVICES" envDefault:"orchestrator"`
MetricsDirectory string `env:"METRICS_DIRECTORY" envDefault:"/orchestrator/metrics"`

Member

What about calling it /orchestrator/state so we can put more here in the future?

Comment on lines -54 to -62
sandboxVCpuAllocated := uint32(0)
sandboxMemoryAllocated := uint64(0)
sandboxDiskAllocated := uint64(0)

for _, item := range s.sandboxes.Items() {
sandboxVCpuAllocated += uint32(item.Config.Vcpu)
sandboxMemoryAllocated += uint64(item.Config.RamMB) * 1024 * 1024
sandboxDiskAllocated += uint64(item.Config.TotalDiskSizeMB) * 1024 * 1024
}

Member

@sitole sitole Oct 21, 2025

Ah, this is what I misunderstood during our conversation about host shared metrics. I thought we were asking the OS to report allocated memory, CPU, and disk for the whole host, and that sandbox sums were not what this returns. Sorry for the miscommunication.

Contributor Author

No problem, glad we're on the same page!
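The removed loop summed only the sandboxes owned by the current process. A sketch of the two-step replacement, assuming each process publishes its own totals and the service info path sums every process's published values: SandboxConfig (with int64 fields), Allocations, localAllocations, and totalAllocated are hypothetical names, not the PR's code.

```go
package main

// SandboxConfig (hypothetical) mirrors only the fields the removed loop read;
// the real field types may differ.
type SandboxConfig struct {
	Vcpu            int64
	RamMB           int64
	TotalDiskSizeMB int64
}

// Allocations (hypothetical) is one process's published totals.
type Allocations struct {
	VCPUs        uint32
	MemoryBytes  uint64
	DiskBytes    uint64
	RunningCount int
}

// localAllocations reproduces the removed loop for this process's sandboxes,
// converting MB to bytes the same way the original code did.
func localAllocations(configs []SandboxConfig) Allocations {
	var a Allocations
	for _, c := range configs {
		a.VCPUs += uint32(c.Vcpu)
		a.MemoryBytes += uint64(c.RamMB) * 1024 * 1024
		a.DiskBytes += uint64(c.TotalDiskSizeMB) * 1024 * 1024
		a.RunningCount++
	}
	return a
}

// totalAllocated sums the published allocations of every process on the
// host (this one included), keyed by PID.
func totalAllocated(perProcess map[int]Allocations) Allocations {
	var total Allocations
	for _, a := range perProcess {
		total.VCPUs += a.VCPUs
		total.MemoryBytes += a.MemoryBytes
		total.DiskBytes += a.DiskBytes
		total.RunningCount += a.RunningCount
	}
	return total
}
```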

cursor[bot]

This comment was marked as outdated.


4 participants