memory regions #70257

mknyszek · 2024-11-08T18:21:27Z

Maintainer

I'm starting this discussion to collect early feedback on a draft design for a kind of region-based memory management in Go. There is no prototype yet, only a design and a preliminary evaluation.

Please read everything below before replying, especially the design discussion section.

(Feel free to skip the detailed design, unless you're interested.)

Background

The arena experiment adds a package consisting of a single type to the standard library: Arena. This type allows one to allocate data structures into it directly, and allows early release of the arena memory in bulk. In other words, it adds a form of region-based memory management to Go. The implementation is memory-safe insofar as use-after-frees will never result in memory corruption, only a potential crash. Arenas have achieved real performance wins, almost entirely due to earlier memory reuse and staving off GC execution.

Unfortunately, the proposal to add arenas to the standard library is on indefinite hold due to the fact that they compose poorly with the language and standard library.

For example, builtin types all need special cases in the implementation, and require explicit slice- and map-related methods. Additionally, choosing to arena-allocate a variable means that it can never be stack-allocated, not without more complexity in the compiler.

Furthermore, for an API to make use of arenas, it must accept an additional argument: the arena to allocate into. There are far too many APIs that would need to be updated to make this integrate well with the language, and it would make those APIs worse.

The text below proposes a composable replacement for arenas in the form of user-defined goroutine-local memory regions.

Goals

First and foremost, our main goal is to reduce resource costs associated with the GC. If we can't achieve that, then this proposal is pointless.

The second most important goal is composability. Specifically:

APIs should not need to be changed to take advantage of arena-like memory allocation patterns.
Regions must compose with standard library features, like sync.Pool and unique.Handle.
Regions must compose with existing optimizations, like stack allocation via escape analysis.

Finally, whatever we implement must be relatively easy to use and intuitive for intermediate-to-advanced Go developers. We must offer tools for discovering where regions might be worth it, and where they aren't working out.

Design

The core of this design revolves around a pair of functions that behave like annotations of function calls. It's useful to think of them as annotations, because crucially, they do not affect the correctness of code, bugs notwithstanding.

The annotations indicate whether the user expects most or all the memory allocated by some function call (and its callees) to stay local to that function (and its callees), and to be unreachable by the time that function returns. If these expectations hold, then that memory is eagerly reclaimed when the function returns, bypassing the garbage collector. If these expectations do not hold for some memory, then that memory is opted out of this early reclaim; management is passed on to the garbage collector as normal.

Below is the proposed new API which explains the semantics in more detail.

package region

// Do creates a new scope called a region, and calls f in
// that scope. The scope is destroyed when Do returns.
// 
// At the implementation's discretion, memory allocated by f
// and its callees may be implicitly bound to the region.
//
// Memory is automatically unbound from the region when it
// becomes reachable from another region, another goroutine,
// the caller of Do, its caller, or from any other memory not
// bound to this region.
//
// Any memory still bound to the region when it is destroyed is
// eagerly reclaimed by the runtime.
//
// This function exists to reduce resource costs by more
// effectively reusing memory, reducing pressure on the garbage
// collector.
// However, when used incorrectly, it may instead increase
// resource costs, as there is a cost to unbinding memory from
// the region.
// Always experiment and benchmark before committing to
// using a region.
//
// If Do is called within an active region, it creates a new one,
// and the old one is reinstated once f returns.
// However, memory cannot be rebound between regions.
// If memory created by an inner region is referenced by an outer
// region, it is not rebound to the outer region, but rather unbound
// completely.
// Memory created by an outer region and referenced by an inner region
// does not unbind anything, because the outer region always out-lives
// the inner region.
//
// Regions are local to the goroutines that create them,
// and do not propagate to newly created goroutines.
//
// Panics and calls to [runtime.Goexit] will destroy region
// scopes the same as if f returned, if they unwind past the
// call to Do.
func Do(f func())

// Ignore causes g and its callees to ignore the current
// region on the goroutine.
//
// Calling Ignore when not in an active region has no effect.
//
// The primary use-case for Ignore is to exclude memory that
// is known to outlive a region, to more effectively make use
// of regions. Using Ignore is less expensive than the unbinding
// process described in the documentation for [region.Do].
func Ignore(g func())
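
Because Do and Ignore are described as annotations that do not affect correctness, even a trivial implementation that only calls the function is semantically valid. The sketch below is a hypothetical stand-in, not the proposed runtime: it binds no memory, works on a single goroutine only, and its `depth` counter and `trace` helper are invented purely to make the scoping rules (nesting, Ignore, teardown via defer) visible.

```go
package main

import "fmt"

// depth is a hypothetical stand-in for the runtime's per-goroutine region
// state. This sketch is single-goroutine only and binds no memory.
var depth int

// Do mimics the scoping of the proposed region.Do: it enters a scope,
// calls f, and tears the scope down when f returns or unwinds past Do.
func Do(f func()) {
	depth++
	defer func() { depth-- }() // also runs if f panics past this frame
	f()
}

// Ignore mimics the proposed region.Ignore: g runs as if no region were
// active, and the previous state is reinstated afterwards.
func Ignore(g func()) {
	saved := depth
	depth = 0
	defer func() { depth = saved }()
	g()
}

// trace records the region depth observed at several points.
func trace() []int {
	var seen []int
	Do(func() {
		seen = append(seen, depth) // inside the outer region
		Do(func() {
			seen = append(seen, depth) // inside a nested region
		})
		Ignore(func() {
			seen = append(seen, depth) // region ignored
		})
		seen = append(seen, depth) // outer region reinstated
	})
	return append(seen, depth) // no region active
}

func main() {
	fmt.Println(trace()) // prints [1 2 0 1 0]
}
```

A real implementation would keep this state in the runtime and attach allocations to it; the point here is only that the API shape imposes no obligations on the code being called.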

For some very basic examples, see the detailed design doc, or the next section.

Comparison with arenas

Where an arena might be used like...

func myFunc(buf []byte) error {
	a := arena.New()
	defer a.Free()

	data := new(MyBigComplexProto)
	if err := proto.UnmarshalOptions{Arena: a}.Unmarshal(buf, data); err != nil {
		return err
	}
	use(data)
	return nil
}

... regions would be used like so:

func myFunc(buf []byte) error {
	var topLevelErr error
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err := proto.Unmarshal(buf, data); err != nil {
			topLevelErr = err
			return
		}
		use(data)
	})
	return topLevelErr
}
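
The captured-error pattern in the region version could be wrapped in a small helper so that callers keep an ordinary `error` return. This is only a sketch: `doErr` is a hypothetical helper that is not part of the proposal, and `regionDo` is a stand-in that simply calls its argument, since no `region` package exists yet.

```go
package main

import (
	"errors"
	"fmt"
)

// regionDo is a hypothetical stand-in for the proposed region.Do.
// It only calls f; it does not manage memory.
func regionDo(f func()) { f() }

// doErr is a hypothetical helper (not in the proposal): it runs f inside
// a region and hands f's error back to the caller.
func doErr(f func() error) (err error) {
	regionDo(func() { err = f() })
	return err
}

var errBadInput = errors.New("bad input")

// parse stands in for a function like myFunc above.
func parse(buf []byte) error {
	return doErr(func() error {
		if len(buf) == 0 {
			return errBadInput
		}
		return nil
	})
}

func main() {
	fmt.Println(parse(nil), parse([]byte("ok"))) // prints: bad input <nil>
}
```

Under the proposed semantics, an error value allocated inside the closure would become reachable from the caller and so would be unbound from the region rather than reclaimed.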

You can think of a region as an implicit goroutine-local arena that lives for the duration of some function call. That goroutine-local arena is used for allocating all the memory needed by that function call and its callees (including maps, slices, structs, etc.). Thanks to some compiler and runtime magic (see below), if any of that memory would cause a use-after-free issue, it is automatically removed from the arena and handed off to the garbage collector instead.

In practice, we've found that the vast majority of arena uses tightly limit the arena's lifetime to that of a particular function, usually the one they are created in, like the example above. This fact suggests that regions will most likely be usable in most of the same circumstances as arenas.
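
To make the "automatically removed from the arena" idea concrete, here is a minimal sketch of a function whose scratch memory stays local while one result leaves the closure. `regionDo` is again a hypothetical stand-in that just calls its argument, and the comments describe what the proposal says would happen, not anything this code actually does.

```go
package main

import (
	"fmt"
	"strings"
)

// regionDo is a hypothetical stand-in for the proposed region.Do.
func regionDo(f func()) { f() }

// longestUpper returns the longest whitespace-separated word, uppercased.
func longestUpper(text string) string {
	var best string // declared outside the closure, so it outlives the region
	regionDo(func() {
		// The slice from Fields and most uppercased copies are scratch:
		// nothing outside the closure ever refers to them, so under the
		// proposal they could be reclaimed when the region ends.
		for _, w := range strings.Fields(text) {
			u := strings.ToUpper(w)
			if len(u) > len(best) {
				// Storing u in best makes it reachable from outside,
				// so under the proposal it would be unbound instead.
				best = u
			}
		}
	})
	return best
}

func main() {
	fmt.Println(longestUpper("go memory regions")) // prints REGIONS
}
```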

Summary of benefits and costs

The core benefit is the potential for reduced GC overheads. An additional, more minor benefit is the potential for more efficient memory allocation. If the application code follows the region discipline, it makes much more sense to introduce a bump-pointer allocator for that memory (something like Immix; see the detailed design).

As alluded to in the previous section, some "magic" is required to dynamically escape memory from the region to the general heap. The magic is a goroutine-local write barrier (goroutine-local because it is only enabled on that goroutine, inside the region). We believe that we have a write barrier design that is cheap enough to make this worthwhile, incurring between 1–4% worst-case overhead when enabled globally, depending on the application (so it will be less in practice, limited to the goroutines that use it). We believe that this can be easily won back and then some in GC-heavy applications, provided their memory usage patterns line up with the region's assumptions.

However, this assumes that most or all memory in a region does not escape. The cost of promoting memory is higher, approximately the same cost as reallocating that memory on the heap (that is not how it would be implemented, but it gives you a sense of the cost).

Detailed design and implementation

For more details, please see the complete design document, which includes:

A preliminary performance evaluation and cost/benefit analysis.
Several proposed diagnostics to monitor use of regions.
Alternatives considered.
Prior art.

Detailed draft design.

(Note that the full design doc introduces a new term for memory 'escaping' a region ("fading") to avoid overloading with the compiler's 'escape analysis'. These mean the same thing.)

Design discussion

Below are a few discussion points that have come up often in early feedback, as well as my responses to those discussion points.

Goroutine-local region state seems problematic. Why is it OK?

Enabling region-based allocation for all variables created by a goroutine delivers a clear win if the vast majority of your memory allocated adheres to the region discipline. It's really OK if a small percentage (say, under 5%) of memory allocations escape from the region to the heap.

Also, note that the idea of implicitly opting-in memory was discarded for arenas, but that's because arenas can possibly introduce use-after-free crashes. If you use regions incorrectly, your program will not crash.

Will code owners need to consider applying `region.Ignore` everywhere?

One concern that was raised multiple times early in the design was whether region.Ignore would encourage tightly controlling allocations within regions so heavily that users would start pestering library owners to wrap certain portions of code in region.Ignore.

While this is something that could happen, I hope it would be rare, and I would encourage maintainers to push back on such requests if they occur. As mentioned in the previous discussion point, it's really OK if a small percentage of memory allocations escape from the region to the heap.

For example, I would explicitly advocate for not wrapping (*sync.Pool).New with region.Ignore in the standard library. Why? Because if you're using a Pool effectively, the number of steady-state allocations made should be quite small in practice, and easily overtaken by region allocations.
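
The claim about steady-state Pool allocations can be checked informally with a sketch like the one below, which counts how often a pool's New function runs across many Get/Put cycles. The helper name `countNew` is made up for illustration, and the exact count is not guaranteed: the runtime may drop pooled items (for example across garbage collections), so no specific number is promised.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// countNew runs n Get/Put cycles against a fresh pool and reports how
// many times the pool's New function had to allocate.
func countNew(n int) int {
	newCalls := 0
	pool := sync.Pool{New: func() any {
		newCalls++ // New runs on the goroutine calling Get, so no race here
		return new(bytes.Buffer)
	}}
	for i := 0; i < n; i++ {
		buf := pool.Get().(*bytes.Buffer)
		buf.Reset()
		buf.WriteString("request")
		pool.Put(buf)
	}
	return newCalls
}

func main() {
	fmt.Println("New calls for 10000 uses:", countNew(10000))
}
```

When the pool is reused effectively, New should account for a small fraction of the uses, which is the situation the argument above relies on.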

Given the concern however, perhaps we should remove region.Ignore from the design until we get more experience with it.

Possible extensions

Using PGO to automatically disable costly regions

If at compile time we see from a profile that the region slow paths are "hot" inside a particular region, the compiler can disable that region and potentially report that it did so. This technique has the potential to make monitoring more automatic.

Provide a `GOEXPERIMENT` to make every goroutine implicitly a region

This GOEXPERIMENT makes it easy to quickly turn regions on and off for an entire application. I suspect the majority of performance-sensitive Go applications, such as web services, would benefit from wrapping entire requests (usually captured by a single goroutine) in a region.

This idea is equivalent to enabling the request-oriented collector, an experimental garbage collector from early in Go's life, designed by Rick Hudson and Austin Clements. The difference between that design and this one is in the details: separately managed memory, and a much cheaper write barrier fast path.

This may also combine well with dynamically disabling regions with PGO.

Provide a `GODEBUG` to disable all regions

This allows for quicker rollback and experimentation. We can also extend this GODEBUG to work with compile-time hash bisection to identify costly regions efficiently. This is made possible due to the fact that regions do not change the semantics of the program.

Next steps

Although fairly fleshed out, this design does not yet have a prototype. Before making such an investment, we wish to gauge interest from the community.

Once we feel that broad interest exists, we may prioritize it. This would then involve building a prototype, available as a GOEXPERIMENT, which would then be used to steer the design, possibly enough toward approval. Note that we plan to remove arenas from the standard library once this prototype is created.

gsymons · 2024-11-08T18:52:14Z

Why not a go:region directive?

//go:region
func myFunc(buf []byte) error {
	data := new(MyBigComplexProto)
	if err := proto.Unmarshal(buf, data); err != nil {
		return err
	}
	use(data)
	return nil
}

7 replies

gsymons Nov 8, 2024

Hi @gsymons, that would make it effectively an attribute of the function definition?

As I understand, part of the goal of the design is to make it a caller decision rather than being an attribute of the function itself. In your case, annotating the function might then make it harder to then use that annotated function without a region?

OK, caller decision is important if the feature has side-effect. Nonetheless, if the region functionality becomes widespread, I suggest we investigate other design patterns to minimize the use of closures.

SerJaimeLannister Nov 9, 2024

Hey , I am really really sorry that I couldn't understand what this is supposed to bring to the table.
Could someone please explain it to me like I am 5 / rookie go developer (because that's what I currently am)

rferreira Nov 9, 2024

Hey , I am really really sorry that I couldn't understand what this is supposed to bring to the table. Could someone please explain it to me like I am 5 / rookie go developer (because that's what I currently am)

It provides deterministic memory control, so you can control deallocation and not the runtime GC. Not something you would use normally.

SerJaimeLannister Nov 12, 2024

Hey , I am really really sorry that I couldn't understand what this is supposed to bring to the table. Could someone please explain it to me like I am 5 / rookie go developer (because that's what I currently am)

It provides deterministic memory control, so you can control deallocation and not the runtime GC. Not something you would use normally.

Does this allow Go to be faster for some use cases? For example, if I write code in Go and it's not fast enough, can I use these memory regions to reduce memory usage? What are the side effects or downsides?

ianlancetaylor Nov 12, 2024
Maintainer

As the description says, the goal of this proposal is to allow programmers to reduce the resource costs associated with the garbage collector. In particular, if your program handles requests with clearly bounded memory requirements, as is true of many servers, and if your program has significant garbage collection costs, as is true of some servers, then the hope is that careful use of regions may reduce the garbage collection costs.

ydnar · 2024-11-08T19:07:57Z

Provide a GOEXPERIMENT to make every goroutine implicitly a region

I think this is worth considering in isolation.

0 replies

robaho · 2024-11-08T23:21:07Z

I don’t like it at all. Arena memory management works great in C++ because you are already manually managing memory. Bringing that to Go is a step backwards. Better to integrate different GC algorithms that better handle some workloads.

Adding more manual memory management to Go is a bad choice. I'd already guess that unsafe causes 90+% of all of Go's hard-to-debug bugs.

9 replies

robaho Nov 9, 2024

The difference is that these are GC backed native buffers with limited usefulness. I wouldn’t use new age Java as a great reference - a lot of the stewards have thrown in the towel and it’s become a kitchen sink language. It’s doubtful many of these features will survive and they’ll be deprecated for low adoption.

Still these are designed more for FFI than as a performance optimization. It is doubtful they’ll perform better than using heap memory.

SerJaimeLannister Nov 9, 2024

Still these are designed more for FFI than as a performance optimization. It is doubtful they’ll perform better than using heap memory.

Sure, I agree, but just because something is in Java doesn't make it bad, even if you hate it. It's a collection of those things combined at that specific time.

Java probably wasn't brought up here as a great reference in the sense that the language is the best, but rather because it's a mature language.

Hate it or love it (I am not a Java enjoyer myself), I have heard good things from many admirable people about modern Java being a beast of its own. You may dislike this opinion, and that's okay, but your criticism was rather shallow, and I think we should at least appreciate somebody putting effort into our favourite language instead of jumping to conclusions.

robaho Nov 9, 2024

Not sure what you’re referring to. I am a proponent of Java - I’m against some recent kitchen sink type changes - others like Virtual Threads have been game changers. It’s more the pace of change without enough review has been a problem.

As to Go, I’m also a fan, which is why I think this is a bad change as proposed. Better to add a generational collector and do this transparently.

robaho Nov 9, 2024

Better to integrate different GC algorithms that better handle some workloads.

That is effectively what this is.

I'd already guess that unsafe causes 90+% of all of Go's hard-to-debug bugs.

The proposed mechanism is safe.

Right, you are having to manually control the GC. Don't get me wrong, it is a novel way of getting around Go not having a moving/copying/generational collector, but it is a step backwards for Go in my opinion.

I also never said it was unsafe. I was comparing it to the manual effort required in using unsafe, and the problems that arise from that. I can see similar cases here where regions, and nested regions in libraries, cause a performance degradation. Maybe I am wrong.

Just seems to me that if the developer is "smart enough" to create a region, the compiler/runtime should be even smarter in this area, and do it automatically.

ianlancetaylor Nov 11, 2024
Maintainer

Doing automatic region allocation was the goal of the request oriented collector mentioned in the proposal. Unfortunately after considerable work it was not an improvement.

As the proposal says, we could turn regions on automatically using a GOEXPERIMENT if it proves worthwhile.

Go already provides considerable manual control in memory allocation. That is what makes it hard to do better automatically--the programmer has already taken care of the low-hanging fruit.

Danlock · 2024-11-09T04:24:48Z

I am in favor of removing Ignore(), as personally it seems so much simpler to reason about programs using this feature without it.

What happens when a CGo call happens in the middle of region.Do?

2 replies

aep Nov 9, 2024

I want to agree. The use case for Ignore seems to be optimization, since the proposal would already automatically unbind variables that have to escape to the heap. Code that wants to explicitly optimize the lifetime of memory on the heap should just use a pool instead.

But I'm not sure if I read the reasoning for Pool correctly. If the pool is also part of the region, it doesn't help.
Developers need some way of micromanaging explicit heap allocation, because there are definitely corner cases where the automatic escape is too expensive.

I just wish Ignore was less conveniently available. In most cases PGO should optimize this better than micromanaging.

// cache is assumed to be a package-level map[string][]byte.
func fetchPicture(id string) []byte {
	if _, ok := cache[id]; !ok {
		// this should not be bound to the lifetime of the caller
		cache[id] = largeAllocation
	}
	return cache[id]
}

I'd feel more comfortable with Ignore if it was explicit for just one allocation, like heap.new(BigThing) or something.
Or just don't. We already have closures, thanks to goroutines, that can be abused to do explicit lifetime binding:

func init() {
    go region.Do(cacher)
}
func cacher() {
   for range request {  allocate cached mem here }
}

func fetchPicture(id) []byte {
   make channel
   <- etc etc
}

mknyszek Nov 9, 2024
Maintainer Author

A cgo call in a region.Do that returns back to Go is fine from a correctness standpoint, but may have other costs in some specific cases. It's not like C cares -- it can't allocate Go memory anyway. Plus, you can't write Go pointers into C memory that stay there beyond the C call unless they're explicitly pinned, and so explicit pinning would require unbinding memory from a region.

However, we could probably make an exception for memory that is pinned for the duration of the cgo call (the way passing values through cgo is currently defined to work). That could, I think, safely stay inside the region, and the region just stays active across the cgo call.

One thing I haven't given a lot of thought to is what happens if Go calls into C, which then calls back into Go. Should the region still be active? I'm not sure, I'll give it more thought.

qiulaidongfeng · 2024-11-09T08:10:44Z

I wonder if this or a similar scheme could be used in the future for situations where objects from multiple goroutines have the same life cycle?
My use case is a compiler that I wrote myself for the purpose of researching and learning compilation principles.
Starting from https://gitee.com/u-language/u-language/blob/master/ucom/main.go#L258 , execution reaches https://gitee.com/u-language/u-language/blob/master/ucom/parser/parser.go#L51 and then https://gitee.com/u-language/u-language/blob/master/ucom/parser/buildmode2.go#L63
The abstract syntax trees of multiple files are parsed in different goroutines and have the same lifetime.
Having the compiler and GC manage memory for such use cases with known life cycles may not be a good solution. See, for example, problems such as #68815 and #68974.

0 replies

perj · 2024-11-09T08:47:08Z

I'm slightly concerned about the use of the stdlib package namespace. I already had the same reaction to the weak package, which also seems very small in scope. region.Do seems like it would work just as well as runtime.Region.

Identifiers already often conflict with stdlib package names, most commonly path. Region seems like another one that will conflict fairly often.

4 replies

mknyszek Nov 9, 2024
Maintainer Author

Personally, I don't care too much what the name is or where the API lives. Of course it matters, but I feel like that's something to iterate on in the actual proposal process if we make it there.

(The main reason this lives in its own package is because of Ignore. If Ignore is dropped, there isn't a very good reason to define a new package for a single function, IMO.)

dpifke Nov 13, 2024

Ignore could be runtime.IgnoreRegion or runtime.RegionIgnore.

Agree that the package name "region" is likely to conflict with existing code all over the place. There isn't a great abbreviation for it (rgn? reg? r?), so having to refactor code to deconflict is likely to make it less readable.

seankhliao Nov 13, 2024
Collaborator

from golab: region.Do and region.Dont

mikeschinkel Nov 14, 2024

@seankhliao — My inner Yoda would prefer region.DoNot, especially given that in Go there is no try. 😁

Splizard · 2024-11-09T12:25:17Z

Can't the compiler do escape analysis at the goroutine level and use this as a heuristic for guiding such an allocation optimisation? Wouldn't all self-contained goroutines benefit from this? I don't see any reason why a new API needs to be introduced for this.

It would be disappointing to see yet another interesting technical optimisation pushed unnecessarily into the language through some new magical standard library package.

25 replies

Merovius Nov 12, 2024

@Splizard I apologize that I didn't make clearer what I was trying to do.

When I called it "prohibitively expensive" I wasn't making a statement about the actual quantified cost, but I was trying to relay and clarify an existing consensus. All tradeoffs are ultimately design decisions. And the way Go is developed, such design decisions are ultimately made by the Go team. And (in case you are not aware) @randall77 is a member of the Go team and one of the longest and most prolific contributors to the compiler and runtime. So are @ianlancetaylor and @mknyszek. So their judgement of which tradeoffs are worth it and which are not has significant weight.

As I mentioned above, you can disagree with the tradeoffs the Go team makes for the language and implementation. Where to draw the line in these tradeoffs is, ultimately, a matter of opinion. But for the purposes of this discussion, the consensus of the Go team (in particular as it is reflected in the current implementation) is what matters. And while I'm not one of those, I have a lot of context from previous discussions and was trying to use that to explain my understanding of that consensus. Mostly by trying to achieve consensus on the facts (such as that it might not be "impossible", but that its cost is considered prohibitive, by correcting "the compiler knows all implementations of an interface" and converging on "the compiler has to use an imperfect heuristic to determine which arguments escape").

Again, I apologize for failing to make clear what I was trying to do. If @mknyszek's is the only explanation you accept, I'll let him speak for himself.

Splizard Nov 12, 2024

@ianlancetaylor

For example, slice data will always escape unless the compiler can somehow calculate an upper bound on the length of the slice

This reply thread is very particularly focused on allocating memory within goroutine-local memory arenas/regions, not the stack. Slices that do not escape their goroutine would be allocated here as they escape from the stack. As such, I do feel the need to reject your assessment that this is an entirely independent topic. The question I've raised here is, why does this need to be exposed through a new compiler-package versus the use of static analysis techniques?

We've been taking a closer look at interfaces and by extension closures here as they appear to be the primary hurdle here.

you are saying things like "the compiler is aware of all interface implementations at compile-time" which is simply incorrect.

This is intended to convey (setting aside plugins) that a Go compiler toolchain as a whole (including the linker, the type checker, etc.) has access to all named types within the source of a Go program as well as all interface types, and could compute the set of types within a program that implement each interface.

I don't see it being meaningful nor useful to interpret this as "the current standard Go compiler by the Go team knows the concrete type of every interface at every call site ahead of time for any Go program", which is obviously not correct, nor possible.

there simply isn't anything in the toolchain to support this kind of analysis

Since we've established that this is possible, and that the primary concern is that doing this entirely ahead of time is too difficult and/or would incur an unacceptable hit to compile speed, my next question would be: why not partially determine this at runtime?

If escape analysis were extended to goroutines (in the manner that @randall77 mentioned) and this information is recorded for each argument of each interface method and closure then this can be queried at runtime before dispatch to determine whether an argument needs to be unbound/moved-out from the region. In some cases the memory can be allocated in the appropriate location to begin with.

I would imagine that in real world Go code many arguments to functions can be proven not to escape their goroutine.

atdiar Nov 12, 2024

@Splizard I guess the question is what if a method conditionally assigns a value (allocated in the region) to a global object?

What if the interface being implemented has its implementing value depend itself on some codepath known at runtime?

What kind of conditional paths could someone be applying?

Whether it is setting something at the linker level via ldflags or even at runtime.

I'm wondering if a static analysis might not become too blurry due to compounding false positives.

That's when some help is typically required, often under the form of annotations or other mechanisms.

I believe you're right that whole program wise, we know some things. But escape analysis is still flow sensitive if we want to be somewhat precise.

That's why there are advantages in relying on nudges/intent from devs, explicitly declaring regions: it should be verifiable.

ianlancetaylor Nov 12, 2024
Maintainer

@Splizard Thanks for the note.

I apologize for misunderstanding what you meant by "compiler." That said, words do matter. When you say "compiler," I think you mean the compiler. I don't think you mean the entire toolchain.

This is particularly the case because doing whole-program analysis at link time, especially if we expect to use that analysis to change the generated code, is not something that we are going to do.

I take your point that we can do a kind of escape analysis that allocates goroutine-local memory for slices that do not escape.

I agree that in some cases it may be possible to get escape information dynamically, and generate different code paths based on that information. That said, some of this information is difficult to calculate, even dynamically. For example, arguments passed to fmt.Printf always escape, because fmt.Printf may call the String method, and that may cause the value to escape. To represent that even dynamically is difficult, because in general the call to fmt.Printf may itself be passing interface values whose static types are unknown. So while in general the compiler can in some cases determine that the arguments to fmt.Printf don't escape, it's not obvious to me that this can be done without writing special cases for fmt.Printf. And of course we don't want to write special cases for everything that comes up. For another example, does the buffer passed to io.CopyBuffer escape? It depends on the ReadFrom and WriteTo methods.
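
The fmt.Printf point can be illustrated with a small program in which a String method retains its receiver. All names here (`record`, `retained`, `describe`, `leaks`) are invented for the example; the only real API used is the fmt package's handling of fmt.Stringer.

```go
package main

import "fmt"

type record struct{ id int }

// retained is package-level, so anything stored here outlives any call.
var retained *record

// String satisfies fmt.Stringer and, as a side effect, keeps its receiver.
func (r *record) String() string {
	retained = r
	return fmt.Sprintf("record %d", r.id)
}

// describe sees only an interface value. Nothing at this call site says
// whether formatting v will cause it to be retained somewhere.
func describe(v any) string {
	return fmt.Sprint(v)
}

// leaks reports whether formatting a value made it globally reachable.
func leaks() bool {
	r := &record{id: 7}
	_ = describe(r)
	return retained == r
}

func main() {
	fmt.Println(leaks()) // prints true
}
```

Since `describe` cannot know which String method it will end up calling, a conservative analysis has to treat its argument as escaping.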

So I have come around to agreeing that there are some possibilities here. But I don't think it's clear that we should pursue those possibilities, which may not pan out, in favor of pursuing the memory region proposal.

Splizard Nov 15, 2024

@ianlancetaylor
Cheers, I appreciate your response. Here's the thing: I don't think you need to do whole-program analysis at link time. Let's say that for each argument provided to each local function, each compilation artifact records "Does argument X of function/method A escape the stack/arena?". Take the "fmt.Printf may call String" example: each (T) String() string method can report this information into the compilation artifact at a fixed addressable location. Then at link time, you AND these values together into a final global addressable value and use this result at runtime as part of the decision on where to allocate. So any place you call fmt.Printf, the arguments can be allocated on the stack/arena dependent on this global value. When a plugin is loaded, it ANDs its own values together into the host runtime.

fmt.Stringer and io.Writer interfaces are great examples of where, in practice, implementations don't let their arguments escape, and if they do, then this is something tooling should clearly report so that it can be resolved. I imagine most interfaces fit this profile. In other words, either each argument is expected to be able to escape due to the nature of their intended behaviour, or they should never escape. You could do the same thing with closures but perhaps to a less reliable effect.

For the io.CopyBuffer example, where buf is allocated can be decided upon by the logical AND of the addressable escape values recorded for buffer arguments for both the ReadFrom and WriteTo methods.
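
As a toy model of the AND-combining idea described in this comment (not of any real toolchain feature), the sketch below records one hypothetical "proven not to escape" bit per implementation and combines them the way the link step is imagined to. The type `escapeInfo`, the function `linkNoEscape`, and the implementation names are all invented for illustration.

```go
package main

import "fmt"

// escapeInfo is a hypothetical per-implementation record: noEscape is
// true when that implementation was proven not to retain its argument.
type escapeInfo struct {
	impl     string
	noEscape bool
}

// linkNoEscape models the imagined link-time step: an argument passed
// through the interface may stay local only if every implementation agrees.
func linkNoEscape(impls []escapeInfo) bool {
	ok := true
	for _, in := range impls {
		ok = ok && in.noEscape
	}
	return ok
}

func main() {
	writers := []escapeInfo{{"bufWriter", true}, {"fileWriter", true}}
	fmt.Println(linkNoEscape(writers)) // prints true

	// Linking in (or loading a plugin with) one retaining implementation
	// flips the combined bit for every call site through the interface.
	writers = append(writers, escapeInfo{"loggingWriter", false})
	fmt.Println(linkNoEscape(writers)) // prints false
}
```

The model makes the tradeoff visible: the combined bit is cheap to compute, but it is only as precise as the least well-behaved implementation in the program.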

Does this help to clarify whether or not this would be worth pursuing? I've never said don't do memory regions, I'm saying why not drop the API and use escape analysis (in this case supported at runtime to reduce compilation time and to support dynamic loading) to make these sorts of allocation decisions of whether to allocate on the stack, a goroutine-scoped arena or the heap.

zephyrtronium · 2024-11-09T14:27:26Z

zephyrtronium
Nov 9, 2024

What happens with variables that internally own memory? Are there any differences in what memory is bound to the region between these examples?

// Example A: No region.
x := big.NewInt(1)
for range rand.IntN(32) {
    x.Lsh(x, 64)
}
return x

// Example B: Region for the loop.
x := big.NewInt(1)
region.Do(func() {
    for range rand.IntN(32) {
        x.Lsh(x, 64)
    }
})
return x

// Example C: Region per iteration.
x := big.NewInt(1)
for range rand.IntN(32) {
    region.Do(func() {
        x.Lsh(x, 64)
    })
}
return x

// Example D: Explicit copy out of the region.
x := new(big.Int)
region.Do(func() {
    y := big.NewInt(1)
    for range rand.IntN(32) {
        y.Lsh(y, 64)
    }
    x.Set(y)
})
return x

2 replies

mknyszek Nov 9, 2024
Maintainer Author

Values that internally own memory created outside of a region are very likely to have their allocations unbound, since their internal memory needs to be reachable from outside the region (the original value, in this case, the big.Int).

More specifically, since B and C are mutating an outside-of-region structure inside the region, then any internal allocations would be unbound from the region immediately. In this way, A, B, and C are all approximately the same, but with B and C you're paying an extra cost to unbind from the region.

With D, you only unbind the last value, which means all the intermediate allocations get to be cleaned up.

timothy-king Nov 12, 2024
Maintainer

More specifically, since B and C are mutating an outside-of-region structure inside the region, then any internal allocations would be unbound from the region immediately.

@mknyszek Can you say more about how this is done/what "unbound from the region immediately" means? For simplicity let's assume x has type *struct{f []uint}. We come in with x and x.f allocated from some outer region Q from big.NewInt(1). In example B, we enter a new region R on region.Do, and then we call x.Lsh(...). That gets to some internal x.f = make([]uint, ...) line. Does something different to today's compiler/runtime happen at the point make is called due to regions? Does the unbinding happen when R is exited?

My questions were answered in the detailed Implementation section.

CannibalVox · 2024-11-09T20:59:04Z

CannibalVox
Nov 9, 2024

For arenas and now regions, the obvious use case is allocating request and response objects. This has far better ergonomics than arenas, so that's very good, but the design seems to be fooling itself a bit: obviously most Go programs in the wild would have to live entirely inside regions if this plan went forward, since generally speaking 100% of all running code will be an in-flight endpoint call that needs access to the request that initiated it.

Making all goroutines have regions isn't just an interesting experiment; that is an accurate description of how most production Go code will run, and if Go can't handle it, this proposal cannot succeed.

I've mentioned it in the past, and I know language changes are a serious issue, but one possibility not explored in the design doc is a type keyword that would prevent taking actions with a variable that would result in its data being promoted out of its region (enforced by compiler), thereby allowing region-allocated data to be safely used read-only on the stack within an ignore goroutine.

It's a big lift, but if it doesn't exist then everything is going to be in a region anyway and this proposal should probably acknowledge that.

2 replies

marcintustin Nov 10, 2024

In practice I think whatever proposal succeeds needs to allow memory to be shared with child goroutines, and so be scoped to a context rather than a goroutine. Or alternatively explicitly support spawning a goroutine attached to an existing region.

atdiar Nov 10, 2024

I believe the mechanism is taking this into account already. As soon as there is shared memory between goroutines, we can automatically opt-out of region-based memory handling in favor of traditional memory handling.

If this feature is stuck in PGO-land, that means that traditional is always the default and establishing a region is a runtime concern.
Not sure why leaving it to the runtime though.
On the other hand, is it valuable for every goroutine to automatically establish a region or is it better to leave it as a dev concern? The design doc hints toward the latter since fading is control-flow sensitive.
Otherwise, it could definitely be a compiler optimization pass on every goroutine of a program.

robaho · 2024-11-09T21:07:57Z

robaho
Nov 9, 2024

I am not certain why you couldn’t get most of the benefit transparently with a small-object, non-copying, per-goroutine generational collector/region. So objects that are recently allocated can be quickly collected - hopefully cleaning the region. The region could also support bump allocations if mostly empty - which is the expectation of a region-biased workload.

4 replies

mknyszek Nov 9, 2024
Maintainer Author

Connecting this proposal to generational approaches is reasonable. But generational garbage collection has substantial costs that often don't pan out for Go programs. We have tried it, in at least two different iterations.

In short, some applications benefit. Many do not.

At least part of the problem appears to be that, in practical terms, the weak generational hypothesis does not hold up well with respect to heap-allocated memory in many real-world Go programs (think 60-70% young object mortality vs. the 95% typically expected). If the weak generational hypothesis does not hold up well, we're not reaping enough memory each minor cycle to cover the global fixed costs: you will have more GC cycles, so those GC cycles better be a good bang for your buck. In particular, generational collectors need some way to track old->new pointers. Even a simple card marking write barrier is relatively expensive because it is always enabled.

So why does the weak generational hypothesis not hold up well? Compared to Java, many fewer objects get allocated on the heap in Go due to compile-time escape analysis. Empirically, for the same CPU burn, I've observed about a 5x difference. (From this perspective you could argue the weak generational hypothesis does hold, and you might be right. Let's just focus on heap memory though, because we're talking about generational garbage collection techniques.) Furthermore, many more objects are likely to out-live a GC cycle because GOGC=100 is somewhat aggressive as a default. JVMs typically use more memory by default and have longer cycles. And defaults matter.

We have put additional thought into this over the years. There may yet be a way to do it and make it widely profitable, but if so, it's not obvious. It is also possible we made mistakes in our previous attempts and they could be rectified, but it's hard to justify the complexity of a global generational mechanism with the risk of many real programs performing less efficiently out of the box. (Could it be a GOEXPERIMENT like the design discussion here suggests? Sure. But generational garbage collection is still global in a way that this isn't and so you lose some nice locality properties. These properties allow one to, for example, disable regions with PGO. But you can't just turn off the costs of generational garbage collection in such a localized way.)

Part of the reasoning behind this proposal, and the reason this proposal is an API, is to provide a localized mechanism for opting into something sort of generational-like where appropriate. The core observation here is that bounding lifetimes of most objects in most programs is not too hard for someone to do, and they can profit. We learned this from the arena experiment.

To respond to your other concern in a different thread about cognitive overhead: this API is somewhat niche. The point is to apply it to the performance-sensitive part of your code when you have something real to gain. Most Go programs likely do not care. But some will, namely the resource-sensitive and latency-sensitive ones. Once you're already incurring the cognitive overhead of this sensitivity, my guess is that the additional cognitive overhead of regions is a small delta. And given that it can be somewhat locally reasoned about, the hope is that the cognitive overhead is also localized, roughly the same way it would be if you're optimizing a particular codepath.

I hope this is helpful in providing some context on how this specific design came to be.

robaho Nov 10, 2024

I confess that I am unfamiliar with the proposed implementation, but I suspect that, since others have stated this cannot be done statically at compile time, at the moment an object must be allocated on the heap (which is determined at compile time in Go), the runtime must decide whether a region is in effect and, if so, allocate in the region. Then, using a mechanism I am not clear about, if that object is passed to another goroutine, or is made a reference of an object outside of the region, it must be moved from the region to the global heap and all pointers to the object updated. Is this correct?

edit: if the object is made an element of another object that escapes the region, then this object as well must be moved.

jellevandenhooff Nov 10, 2024

@robaho when a pointer to an object allocated in the region is written outside of the region, the object is not moved, but marked as “faded”. When the region is later released, the faded objects are kept alive. The region in that way is different from an arena: all the non-faded allocations are freed, but not the entire arena. This leads to (some) memory fragmentation, but the proposal argues that fragmentation cost will be limited.

Freeing the non-faded allocations is faster than a normal GC cycle because the runtime knows that any non-faded objects have no references outside the region and can be released without further marking and sweeping.
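A toy model of the fading behavior described in this reply, under heavy simplifying assumptions. The `region`, `alloc`, `publish`, and `release` names are invented for illustration and are not the proposed runtime implementation; the sketch also ignores objects reachable from a faded object.

```go
package main

import "fmt"

// object is a toy heap object. faded is set once a pointer to the object is
// stored somewhere outside the region.
type object struct {
	name  string
	faded bool
}

// region is a toy model that tracks the objects allocated while it is active.
type region struct {
	objects []*object
}

// alloc creates an object bound to the region.
func (r *region) alloc(name string) *object {
	o := &object{name: name}
	r.objects = append(r.objects, o)
	return o
}

// publish models the write barrier: storing a region pointer into memory
// outside the region fades the object instead of moving it.
func (r *region) publish(dst **object, o *object) {
	o.faded = true
	*dst = o
}

// release frees every non-faded object without tracing and reports which
// objects were freed and which were kept alive.
func (r *region) release() (freed, kept []string) {
	for _, o := range r.objects {
		if o.faded {
			kept = append(kept, o.name)
		} else {
			freed = append(freed, o.name)
		}
	}
	r.objects = nil
	return freed, kept
}

func main() {
	var global *object
	r := &region{}
	r.alloc("scratch1")
	result := r.alloc("result")
	r.alloc("scratch2")
	r.publish(&global, result)

	freed, kept := r.release()
	fmt.Println(freed, kept) // [scratch1 scratch2] [result]
}
```

The kept object stays where it was allocated, which is the source of the fragmentation mentioned above.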

robaho Nov 10, 2024

Interesting. Thanks for the clarification. Has this been tested? On the surface it seems the fragmentation would be excessive unless the runtime marks the variable/code as fading for all future allocations - which I assume is stored in a map in the region object? Doesn’t that mean that every allocation needs a check against the map - so it is not simply a bump allocation?

marcintustin · 2024-11-09T21:56:55Z

marcintustin
Nov 9, 2024

Why not tie the lifetime of the region to a context? Then escape analysis from the goroutine would be unnecessary AND this could be used across goroutines. Specifically this would make it usable for the whole lifetime of an http request.

If we were getting really fancy, we could add a new variant of new, newCtx(ctx, type, size…) that uses the allocator associated with a context, or the default global allocator if none is associated with the context.

9 replies

marcintustin Nov 10, 2024

Ah maybe I should make my own proposal :) I’m ok with an opt in mechanism that can cause crashes.

robaho Nov 10, 2024

The problem with the crash proposal is that it affects the stdlib and other libraries that aren’t aware they’re running in an arena.

marcintustin Nov 10, 2024

What’s a scenario where the Stdlib would fall foul of this if access is scoped to a context or goroutine tree, that the caller couldn’t anticipate when opting that part of their program into regionalised memory management? I view it as being the same as passing a nil pointer to the stdlib - you expect that to panic.

robaho Nov 10, 2024

It allocates an object and stores it in a preexisting cache map. When the arena is cleared the cache now contains an invalid pointer.

marcintustin Nov 10, 2024

Ah yes either the stdlib would have to use only regionalised pools, or programs would have to be able to request creation of memory in the global region to be able to send it to libraries using global pools.

Tbh I think being able to request allocation in the global region would probably be desirable, but an api that requires the programmer to keep track of which region objects are in would probably be more fiddly than we like in golang.

ivankorobkov · 2024-11-10T04:14:35Z

ivankorobkov
Nov 10, 2024

Hi, thank you for your detailed proposal and explanations.

Could you answer some questions, please?

Why not make every goroutine region-based by default?
What happens when a region-based object is passed to an outside function/closure, which can outlive the region?

func myFunc(buf []byte, fn handleFunc) (err error) {
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err = proto.Unmarshal(buf, data); err != nil {
			return
		}
		fn(data) // data passed to the function outside of the region
	})
	return
}

What happens when a region-based object is passed to another goroutine?

func myFunc(buf []byte, fn handleFunc) (err error) {
	region.Do(func() {
		data := new(MyBigComplexProto)
		if err = proto.Unmarshal(buf, data); err != nil {
			return
		}
		go process(data) // data passed to another goroutine
	})
	return
}

Thanks

1 reply

Merovius Nov 10, 2024

From other replies in this thread:

1. Because there is a cost, and it is likely not worth it in all cases. In particular, this would be basically equivalent to the "Request Oriented Collector" tried out a few years back, which sped up some programs, but slowed down others and was not a clear win.
2. Regions are stack-specific, so AIUI in your example fn is actually inside the region. If fn escapes data (e.g. by writing it to a global variable), then the underlying array of data would be marked as faded and not collected with the rest of the memory in the region.
3. Same.

glycerine · 2024-11-10T05:04:24Z

glycerine
Nov 10, 2024

Thanks for working on this. It sounds fantastic.

Allowing regions to span goroutines dramatically complicates their implementation, enough that it may not actually be a profitable feature anymore.

This comment (above) helps my mental picture of how regions would work.

Question: As a potential user of regions, one question in my mind is: do I need to concern myself with the (region or heap) origin of the memory (a pointer to struct of some kind) that I send on a channel? Is there an efficiency difference if something is coming from a region because it needs to be copied to the garbage collected "region" so to speak?

Comment: a nice side effect of any design would be to enable users to experiment readily with memory allocation strategies, to find those most appropriate for their code. I wrote an off-heap hash table for my Go code about 10 years ago, and the main pain-point in use was that I had to serialize and deserialize everything manually to move objects between the heap and the manually managed memory ( https://github.com/glycerine/offheap ). Ideally I imagine that region use could be like the net/http package having a default mux: Having region.Do() and region.Ignore() operate on a default memory manager, but a specific instance could also be instantiated and tweaked. I could then call Do() on my specific instance to get allocations to (in my example) my own previously malloc-ed region of memory that is off heap; writing into my hash table there; ideally without having to manually do the serialization, nor having to chase pointers to child objects to get transitively completely self-contained objects.

Besides the off-heap hash table, a second use case for user-customization would be when running Wasm code on a web-worker thread, and wanting to pass memory to a WasmGC implementation for other components to use. Go compiled to Wasm is probably never going to be able to use WasmGC's moving collector, because it is moving and lacks support for interior pointers, but we might well want to be able to inter-operate with other languages that do.

1 reply

ianlancetaylor Nov 11, 2024
Maintainer

In this proposal nothing gets copied into a region. Objects are allocated in a region, or they are not. Once they are allocated, they do not move.

When receiving an object on a channel, you don't need to worry about where it was allocated, except that 1) if it was allocated in a region, it will escape that region, so maybe don't do that; 2) I am ignoring NUMA issues.

andrewbaptist · 2024-11-11T20:30:14Z

andrewbaptist
Nov 11, 2024

Thank you for the work on this! I'm excited to see this completed.

Regarding the concern in the design related to allowing regions to cover multiple goroutines, this might dovetail really well with structured/scoped concurrency approaches like https://github.com/sourcegraph/conc. If the lifetime of a goroutine is entirely within a region.Do, then the API issue becomes tractable. Unfortunately that project hasn't had much activity lately.

In terms of the implementation issue, I agree that it might be better to leave this as a "potential future enhancement". It would be reasonable for this to be higher cost and possibly even have a variant which allowed the user to specify the "low cost single goroutine" region or the "higher cost including child goroutine" regions. Something like region.Do and region.DoWithChildren.

The paradigm I'm thinking about is something like a request/response microservice that kicks off goroutines to make additional requests along the way. Everything is bound to the top level request and when that is complete the entire request and all sub-requests will be destroyed. Within the requests there could be data sharing through channels or other mechanisms, but it would all be in a single region.

2 replies

robaho Nov 11, 2024

This doesn't seem to jibe with Ian's comments that Go programs don't adhere to the weak generational hypothesis - because it seems all that would entail is a generational collector, or the request oriented collector would have shown better results https://docs.google.com/document/d/1gCsFxXamW8RRvOe5hECz98Ftk-tcRRJcDFANj2VwCB0/edit?pli=1&tab=t.0

mknyszek Nov 12, 2024
Maintainer Author

The request-oriented collector would've still struggled with this, because it's goroutine-oriented. This may be a case where generational garbage collection helps if reclamation even works out. (For example, if your RPCs/tasks are long and exceed 1 GC cycle, you're likely to not benefit from generational collection.)

FTR, you are not the only one to suggest you would benefit from scoping regions across multiple goroutines. The costs would likely be global, and would likely be similar to generational garbage collection, so I think if we were to go down this road, thinking about that again might be worthwhile. But, a region-style thing still might be worthwhile over generational garbage collection if the semantics were more like the Yak GC (see the detailed design doc).

You're also right that it could (probably) be implemented as an extension to this proposal.

mvdan · 2024-11-11T21:17:05Z

mvdan
Nov 11, 2024
Collaborator

I think it will be relatively common to do some garbage-generating work inside a region.Do call and want to obtain a small result out of it, such as an error or a (T, error). Akin to how sync.Once gained value variants such as sync.OnceValue and sync.OnceValues, have you considered providing something similar?

For example, one of your original examples which uses a closure could be written as follows, saving two lines overall and IMHO being more readable and idiomatic:

func myFunc(buf []byte) error {
	return region.DoValue(func() error {
		data := new(MyBigComplexProto)
		if err := proto.Unmarshal(buf, data); err != nil {
			return err
		}
		use(data)
		return nil
	})
}

I am not particular about the naming; e.g. region.DoValue and region.DoValues could be simply called region.Value and region.Values.
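A minimal sketch of how such value-returning variants could be layered on top of a plain `Do`, in the same way `sync.OnceValue` and `sync.OnceValues` relate to `sync.Once`. The `do` function is a stand-in for the proposed `region.Do` (it only calls its argument), and `doValue`, `doValues`, and `parseLen` are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// do is a stand-in for the proposed region.Do; it only calls f.
func do(f func()) { f() }

// doValue runs f via do and returns its single result.
func doValue[T any](f func() T) T {
	var v T
	do(func() { v = f() })
	return v
}

// doValues runs f via do and returns both of its results.
func doValues[T1, T2 any](f func() (T1, T2)) (T1, T2) {
	var v1 T1
	var v2 T2
	do(func() { v1, v2 = f() })
	return v1, v2
}

// parseLen does some garbage-generating work and returns a small result.
func parseLen(buf []byte) (int, error) {
	return doValues(func() (int, error) {
		if len(buf) == 0 {
			return 0, errors.New("empty input")
		}
		scratch := make([]byte, len(buf)) // temporary, never leaves the call
		copy(scratch, buf)
		return len(scratch), nil
	})
}

func main() {
	n, err := parseLen([]byte("hello"))
	fmt.Println(n, err) // 5 <nil>

	fmt.Println(doValue(func() int { return 42 })) // 42
}
```

Written this way, the wrappers need no support beyond `Do` itself; whether a real implementation could avoid binding the returned value to the region is a separate question.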

4 replies

thepudds Nov 11, 2024
Collaborator

That's a very interesting idea and could make things tidier and feel more idiomatic.

One initial (and half-baked 😅) thought is I wonder (with sufficient "compiler and runtime magic") if a function returning a value could also serve effectively as the user annotating that the return value should never be bound to the region, thus avoiding the cost of that unbinding.

It might be plausible for at least simple examples, like if we update the first 'Basic' example from the detailed design doc:

var keep *int
keep = region.DoValue(func() *int {
	// NOTE: w is never bound to the region because escape analysis (or something)
	// is able to prove that w leaks to the return value.
	w := new(int) 
	x := new(MyStruct)
	y := make([]int, 10)
	z := make(map[string]string)
	*w = use(x, y, z)
	return w // w is returned here, never having been bound to the region.
}) // x, y, and z's memory is eagerly cleaned up, w is not.

thepudds Nov 11, 2024
Collaborator

Hmmm. Thinking about it for a few minutes more, I'm not sure that a return value would help the compiler's analysis compared to, say, a variable captured by the function literal.

That said, whether a return value or captured variable, perhaps the compiler might be able to prove at least in simple cases that something should not be bound, with the result being that w is not bound in either the original 'Basic' example or my modified example above. I think the current proposal is suggesting the compiler will still be proving that variables within a region can be stack allocated, and at least opportunistically identifying some heap allocations that should not be bound might be plausible as well as an optimization.

Danlock Nov 12, 2024

I like this better than region.Ignore() from a simplicity perspective.

Maybe the proposal could start with region.Do and region.DoValue, and only add Ignore later if those two aren't enough?

tandr Nov 15, 2024

region.Go() for goroutines, and region.Do() for what region.DoValue() from above proposes?

fluhus · 2024-11-12T07:48:21Z

fluhus
Nov 12, 2024

Looks interesting!

Some noob questions:

Does that mean that garbage created inside a Do() isn't collected until the function returns? If so, does it mean that an "irresponsible" use of Do() can potentially accumulate a lot of garbage, increasing the program's memory footprint significantly?
How would dynamically extending a slice/map behave inside a region? For example:
```
var mySlice []int
region.Do(func() {
  var s []int
  for i := range 100 {
    s = append(s, i)
  }
  mySlice = s
})
```
Would the old buffers used for s be cleared as a part of the region, or do they escape it?

1 reply

mknyszek Nov 12, 2024
Maintainer Author

Does that mean that garbage created inside a Do() isn't collected until the function returns? If so, does it mean that an "irresponsible" use of Do() can potentially accumulate a lot of garbage, increasing the program's memory footprint significantly?

No, this is not a risk. It is a problem with arenas, and is something I wanted to avoid.

The regular GC can also reclaim memory inside of a region if it happens to run while the region is outstanding. So, this particular footgun of arenas does not exist in this design. (This wasn't covered in the text above, but is covered in the detailed design.)

How would dynamically extending a slice/map behave inside a region? For example:
var mySlice []int
region.Do(func() {
  var s []int
  for i := range 100 {
    s = append(s, i)
  }
  mySlice = s
})
Would the old buffers used for s be cleared as a part of the region, or do they escape it?

In this example, any intermediate slice backing stores would be reclaimed immediately. The only thing that would escape the region in this example is the final slice backing store.

Maps are trickier to reason about because they're more complicated. For example, their internal growth systems are incremental, so you often have both the old table and the new table sticking around while actively growing the map. This becomes harder to avoid with the upcoming SwissMap implementation, because that is even less likely to discard tables, though some intermediate tables may still be discarded.

My advice for maps (and really any other more complicated data structure) would be to use them entirely in the region if you want any solid guarantees. Anything else is going to be fragile.

robaho · 2024-11-12T17:33:19Z

robaho
Nov 12, 2024

I have been following the conversation, and since my earlier question wasn't answered, I'd like to summarize my understanding and ask if I am understanding this correctly.

When a region is in effect, for every allocation the compiler has already determined if it can be 1) allocated on the stack, 2) allocated in the region, or 3) must be allocated in the global heap

Note, if this isn't the case, and for 2) it decides it can be allocated in the region but then later it can't be collected (which doesn't seem possible), the benefit of releasing the region in one step isn't possible, and it will lead (I think) to massive fragmentation.

So if 4) isn't possible, wouldn't it make more sense to have PGO decide the region boundaries to optimally manage memory? This might be expensive to compute, but it seems better than having the developer trying to decide the region boundaries but not fully being able to control what would actually be in the region (due to library calls, etc.).

The developer controlling the region boundaries feels very "un Go-like".

4 replies

ianlancetaylor Nov 12, 2024
Maintainer

That is not my understanding. My understanding is that for every allocation the compiler has determined if it can be 1) allocated on the stack, 2) allocated on the heap. For the compiler, there is no distinction between your choices 2 and 3.

This proposal says that if the goroutine is currently executing a function that was invoked by region.Do (but not region.Ignore), then when allocating on the heap it will allocate in the goroutine's region.

So, you are correct that there is no benefit of releasing the region in one step. That doesn't happen. But it doesn't follow that fragmentation is a big concern. The detailed design doc linked from the original comment argues that it is not.

I agree that having PGO enable regions sounds like a nice idea. But I don't see how to implement it. What is PGO going to key on?
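The allocation rule described in this reply can be modeled in a few lines. This is a toy, not runtime code: the `goroutine` and `scope` types are invented here, and the sketch assumes that the innermost `Do` or `Ignore` scope decides where a heap allocation lands. The compiler's stack-versus-heap decision is taken as already made.

```go
package main

import "fmt"

// scope is one entry on a goroutine's hypothetical region stack.
type scope int

const (
	scopeDo     scope = iota // entered via region.Do
	scopeIgnore              // entered via region.Ignore
)

// goroutine is a toy stand-in for per-goroutine runtime state.
type goroutine struct {
	scopes []scope
}

func (g *goroutine) enter(s scope) { g.scopes = append(g.scopes, s) }
func (g *goroutine) exit()         { g.scopes = g.scopes[:len(g.scopes)-1] }

// heapTarget reports where an allocation that escape analysis already sent to
// the heap would land. Assumption: the innermost scope decides.
func (g *goroutine) heapTarget() string {
	if n := len(g.scopes); n > 0 && g.scopes[n-1] == scopeDo {
		return "region"
	}
	return "global heap"
}

func main() {
	g := &goroutine{}
	fmt.Println(g.heapTarget()) // global heap
	g.enter(scopeDo)
	fmt.Println(g.heapTarget()) // region
	g.enter(scopeIgnore)
	fmt.Println(g.heapTarget()) // global heap
	g.exit()
	fmt.Println(g.heapTarget()) // region
}
```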

robaho Nov 12, 2024

If that's the case, then I don't quite understand the benefits - if you can't release the region in a single operation, don't you essentially need to perform a GC trace at every region exit - which sounds incredibly expensive. Fragmentation has to be an issue imo because otherwise on each successive call into a region, if it allocates the same amount of memory as the last, it has no choice but to expand the per-goroutine region area in order to allow bump allocation. You can't have a bump allocator with holes in the middle (i.e. items that were unable to be released when the region exited) - unless "bump allocator" in the design is being used more loosely than my definition.

As for how PGO would do this, I think I would approach it so that on method call exit, it records the total allocation size. Then, as the method entry point is hit again, and the total allocation size is in the top n% of methods, it creates a region, then performs the standard region clean-up and records its efficiency - if it isn't x%, that region is marked as not viable. It automatically handles the goroutine-based regions, as all goroutines have a top-level method.

mknyszek Nov 12, 2024
Maintainer Author

For the compiler, there is no distinction between your choices 2 and 3.

+1.

So, you are correct that there is no benefit of releasing the region in one step. That doesn't happen. But it doesn't follow that fragmentation is a big concern. The detailed design doc linked from the original comment argues that it is not.

I'm a bit lost on this point. What does "releasing the region in one step" mean?

Also, fragmentation might be a concern at the absolute lowest layers. The allocator proposed is a bump-pointer-style allocator, but not a pure bump-pointer allocator. It's an Immix-style allocator but without the defragmentation step, so worst case you could have 8 bytes keeping 128 bytes of heap alive (however many times). But I argue the defragmentation is less important if you're using regions effectively; you should already benefit from a lower peak heap size, because your live heap at any given time should be smaller, so the overall heap size should also be smaller. (This is admittedly an under-explored part of the design, but tweaking the allocator or tuning its parameters better are both decent options for making this work in practice. They're really practical details and not intractable problems.)
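The worst case quoted above works out as follows. This is only arithmetic on the two figures from the paragraph; treating 128 bytes as the allocator's smallest reclaimable unit (a "line" in Immix terms) is an assumption of this sketch, as are the function names.

```go
package main

import "fmt"

const (
	lineSize   = 128 // bytes kept alive per survivor, from the figure above
	objectSize = 8   // size of the surviving object in that example
)

// retainedBytes returns how much heap stays reserved when each of n surviving
// objects pins its own line.
func retainedBytes(n int) int { return n * lineSize }

// liveBytes returns how much of that heap is actually in use.
func liveBytes(n int) int { return n * objectSize }

func main() {
	n := 1000
	fmt.Println(retainedBytes(n), liveBytes(n), retainedBytes(n)/liveBytes(n))
	// prints: 128000 8000 16
}
```

So in that worst case the retained heap is 16 times the live data, independent of how many such survivors there are.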

I agree that having PGO enable regions sounds like a nice idea. But I don't see how to implement it. What is PGO going to key on?

This may be doable, but this seems like a research problem. I am not sure what the right PGO profile data to collect would be such that the compiler could be confident that placing a region in a certain place is a good idea. Here's an attempt.

To sketch this out, I think there are three things the compiler would need to know.

1. Is the memory allocated by a call stack relatively short-lived?
2. Does the memory escape the goroutine?
3. How high up the stack does a given piece of memory ever go?

(1) is straightforward, and I even proposed an extension to heap profiles in the detailed design for helping users find promising areas to apply regions.

(2) is less straightforward but doable. One would add a similar write barrier to the one proposed in the detailed design which might then annotate memory profiles to indicate that a sampled allocation escaped the goroutine. Though this may suffer from some of the same problems as (3) (read on...).

(3) is the really hard bit. I don't know how to do this without maybe adding a very high overhead profiling mechanism. You might need a stack-write barrier, which currently the compiler can't even do because the GC design avoids it. It's very, very expensive to have a write barrier on stack writes in the first place. The information we would need to record for each stack write would also be complicated to compute; something like a high watermark up the stack, but it can't be stack locations because we move stacks, so we would need to convert those locations to PCs on-the-fly. Or something.

The difficulty with expensive diagnostics is not just that users would be less likely to enable them, but they perturb the application in ways that make the diagnostics a poorer representation of reality. This is a big problem.

Disclaimer: this is just a sketch, so there are probably some other issues I'm not really considering. If someone could explain in detail how this would work, and how it would work effectively, that would be one thing. But as Ian says, I don't think we really know how to implement it.

ianlancetaylor Nov 12, 2024
Maintainer

don't you essentially need to perform a GC trace at every region exit

No, that's the point of the new write barrier, which tracks all memory that escapes the region. See the detailed design doc.

jellevandenhooff · 2024-11-13T01:35:12Z

jellevandenhooff
Nov 13, 2024

Bit of a crazy idea: What about a function region.NoUnbind(f func()) which guarantees memory allocated will not be unbound. Any memory allocated inside of f gets marked with a special bit, say "colorful", which is not allowed to be faded. The write barrier could check the "colorful" and panic when it attempts to fade the allocation.

Probably not worth adding, but curious how you imagine thinking about debugging and/or instrumenting code using regions.
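A toy model of the check suggested in this comment. Everything here is hypothetical: `NoUnbind` is only an idea from this thread, and the `object` type, `writeBarrier`, and `tryPublish` are invented to show where the panic would occur.

```go
package main

import "fmt"

// object is a toy heap object with the two bits discussed in this comment.
type object struct {
	faded    bool
	colorful bool // allocated under the hypothetical region.NoUnbind
}

// writeBarrier models a store of a region pointer into memory outside the
// region. Normally it fades the object; colorful objects panic instead.
func writeBarrier(dst **object, o *object) {
	if o.colorful {
		panic("region: attempt to unbind memory allocated under NoUnbind")
	}
	o.faded = true
	*dst = o
}

// tryPublish runs writeBarrier and reports whether it panicked.
func tryPublish(dst **object, o *object) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	writeBarrier(dst, o)
	return false
}

func main() {
	var global *object
	fmt.Println(tryPublish(&global, &object{}))               // false
	fmt.Println(tryPublish(&global, &object{colorful: true})) // true
}
```

A panic at the offending store would point directly at the line of code that leaked the memory, which is what makes the idea attractive as a debugging aid.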

24 replies

robaho Nov 13, 2024

I don’t know. Different languages are designed for different purposes, and if you try to design a language that works well for any task it is usually a failure. Java & Python give up performance for safety and ease of use. C++ and Rust give up ease of use for performance and small binaries.

Go sits in the middle of these and probably always will imo - and I don’t think that is a bad thing - it’s actually pretty great.

CannibalVox Nov 13, 2024

Java & Python gives up performance for safety and ease of use

These two languages are in totally different performance classes, and go is generally slower than one of them due to the exact issues that I'm describing.

robaho Nov 13, 2024

My testing doesn’t bear that out for many classes of applications. As I mentioned, check out the performance table at github.com/robaho/cpp_leveldb in real-world settings.

Even if you are correct, I believe this will actually make the problem worse because of the additional write barrier. It lowers the GC cost, not the allocation cost. I plan on reading the detailed design doc today.

thepudds Nov 13, 2024
Collaborator

> I also recall a brief mention of the possibility that arenas may be used internally at google even after the change was shelved though obviously I have no idea whether that happened.

FWIW, the detailed design doc does reference the use of the arena experiment at Google:

> In practice, all uses of an arena within Google tightly limit the arena's lifetime to that of a particular function, usually the one they are created in, like the example above. This fact suggests that regions will most likely be usable in most or all of the same circumstances as arenas.

In other words, the suggested region experiment is building upon learnings from prior experiments and not entirely just theoretical speculation. (And the fact that several of those prior experiments have been set aside might give some reassurance that the Go team would likely be willing to set this experiment aside as well if the performance results aren't good enough as happened with the Request Oriented Collector, or if the code impact is anticipated to be too large like with the arena experiment, or if otherwise determined to be not a good fit for Go.)

atdiar Nov 13, 2024

@CannibalVox @robaho I think it should be a wait and see.
The design doc explains that allocating can be expected to be faster.
In fact, that would make sense, since we are dealing with coarser spans of memory with potential patterns of recycling.

I can already think of ways this can be very useful, for instance when writing lock-free ring buffers (widely used in the Java world, in HFT for instance), mainly to communicate intraprocess (or is it interprocess? Basically bridges).
That could be a boon for wasm and/or interop with other runtimes (in my case I'm thinking Swift and Android).
These are just ideas. It might or might not pan out exactly. Still worth exploring.

But since this proposal appears not too intrusive language-wise, I personally have high hopes.

objectref · 2024-11-13T08:51:33Z

objectref
Nov 13, 2024

I am really excited about this proposal! Giving extra power to the language (when you really need it) with minimal programmer effort is something I would really like to have, and it fits really nicely with the spirit of Go.
I hope it will be accepted and move forward.

0 replies

glycerine · 2024-11-13T18:55:10Z

glycerine
Nov 13, 2024

I was surprised at the last word in this sentence from the detailed design. Was it meant to be "regular heap spaces"? Is there such a thing as a "regular heapArena"? It feels like it might be a typo.

> Address Space
> When these are created, we provide hints to mmap and the like to be contiguous and in a distinct part of the address space from the heap, though it's OK if these special region heapArenas interleave with regular heapArenas.

2 replies

thepudds Nov 13, 2024
Collaborator

FWIW, the runtime already uses heapArenas internally:

go/src/runtime/mheap.go

Line 238 in f1add18

// A heapArena stores metadata for a heap arena. heapArenas are stored

mknyszek Nov 13, 2024
Maintainer Author

A heapArena is an existing structure in the runtime. So what I mean here by "regular heapArena" is just one without its corresponding isRegionArena bit set.

glycerine · 2024-11-13T19:16:51Z

glycerine
Nov 13, 2024

> For example, the compiler‘s escape analysis may be able to deduce that some value may be stack allocated, but this analysis immediately fails if the value is allocated into an arena.

This (detailed design doc) sentence asserts a surprising claim without any backing rationale. If the traditional escape analysis says something can be stack allocated, how is it possible that, all of a sudden, it now cannot be stack allocated if it is under a region.Do() call?

5 replies

mknyszek Nov 13, 2024
Maintainer Author

That's talking about arenas (#51317), not region.Do.

glycerine Nov 13, 2024

> That's talking about arenas (#51317), not region.Do.

Ah yes, I missed that. Still, a short reason why arenas collide with escape analysis would help me understand why it doesn't apply to the implicit design as well.

I would like to gently suggest that clarity could be improved by adding a concluding, final sentence to that paragraph, along the lines of, "In contrast, our implicit region design is able to continue to leverage the existing compiler's escape analysis for efficient stack allocation."; if such a claim can be truthfully made.

Merovius Nov 13, 2024

@glycerine Stack allocation has to be reserved by the compiler. Arena allocation was an API call. So the reason that arenas don't play nice with escape analysis is that with arenas, the code explicitly says "please put this on (a special part of) the heap". It's the implicitness of the region design that makes it work with escape analysis.
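
The point about explicit heap requests can be illustrated with today's compiler, without arenas or regions. This is a minimal sketch, assuming the standard gc toolchain: `local` and `escaping` are hypothetical functions, and the package-level `sink` stands in for any explicit request to place a value on the heap, much as an arena allocation call would. `testing.AllocsPerRun` reports the average number of heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is a package-level variable; storing a pointer here forces the
// pointee to escape to the heap.
var sink *point

// local keeps its point private, so escape analysis can place it on the stack.
//
//go:noinline
func local() int {
	p := &point{1, 2}
	return p.x + p.y
}

// escaping publishes its point through sink, so it must be heap-allocated.
//
//go:noinline
func escaping() int {
	p := &point{1, 2}
	sink = p
	return p.x + p.y
}

func main() {
	fmt.Println("local allocs/run:   ", testing.AllocsPerRun(1000, func() { local() }))
	fmt.Println("escaping allocs/run:", testing.AllocsPerRun(1000, func() { escaping() }))
}
```

With the gc toolchain the first count is expected to be 0 and the second 1. Those numbers are a property of the compiler's escape analysis rather than of the language spec, which is why an implicit design can keep benefiting from it while an explicit allocation call cannot.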

glycerine Nov 13, 2024

Thanks Axel.

atdiar Nov 14, 2024

@glycerine and also because arenas require manual explicit freeing.
So a value allocated in an arena should be live as long as the arena exists. Escape analysis doesn't directly check the arena bounds.

This is unlike the region-based design, which uses function scopes to define lifetimes (hence the relation to escape analysis, although it's renamed fading to be more distinct).

glycerine · 2024-11-13T21:34:45Z

glycerine
Nov 13, 2024

The detailed design doc was very helpful in clarifying for me one main point.

This is not a region design.

I like it, and I think it's great. I just think it is mis-named.

Arenas and regions are synonyms, and this design is neither. It is an enhancement to the garbage collector based on a user annotation or hint.

So my primary feedback is about the writing and presentation, not the merits of the design.
The design seems fabulous and well worth doing.

But for the presentation: I think calling this design a region design does it a disservice. It creates confusion that could be avoided. I thought I caught the gist of what was going on from the title and the short description at the top. Only in the detailed design doc did I realize how wrong my assumptions had been.

This design is not doing region management where a block of memory does not need to be garbage collected (swept) and the entire block of memory is released at once. Instead, the GC still runs over these blocks, the blocks are swept for free space upon allocation, and the GC is run eagerly when the user annotation point is returned from.

So I would, foremost, suggest a rename.

Any of these names would be better...

immix.Do()
blue.Do()
unshared.Do()
collectimmediately.Do()
eagersweep.Do()
...
your idea here
...

This would also immediately allow the detailed design doc to distinguish the new implicit approach by name. This would help the narrative in many places where this design and an arena/region approach are to be contrasted.

0 replies

timothy-king · 2024-11-13T22:35:18Z

timothy-king
Nov 13, 2024
Maintainer

Would runtime diagnostics for fades be feasible? Particularly reporting the location of the pointer write that causes the fade.

I could imagine some users of regions wanting fairly rigorous enforcement of not accidentally copying memory allocated within a region. Runtime diagnostics seem like a plausible way to deliver on this.

1 reply

mknyszek Nov 14, 2024
Maintainer Author

Yup! There's a section on it in the doc. Reporting the location of the pointer write that caused the fade is the "Fade profile" sub-heading. This would be fairly easy to implement and could easily be sampled, so we can control overheads.
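
As a rough illustration of what a sampled fade profile might record, here is a user-level sketch. It is not the mechanism from the design doc: `fadeProfile`, `record`, and `publish` are hypothetical names, the sampling is a simple every-Nth counter rather than whatever the runtime would use, and the "fade" is only simulated by a write to a global.

```go
package main

import (
	"fmt"
	"runtime"
)

// fadeProfile counts sampled fade events by source location.
// It is a hypothetical user-level model, not a runtime API.
type fadeProfile struct {
	rate   int            // record one event out of every rate
	seen   int            // total events observed
	counts map[string]int // "file:line" -> sampled count
}

// record is called where the toy "write barrier" would fade memory.
// skip is the number of stack frames to skip to reach the caller of interest.
func (p *fadeProfile) record(skip int) {
	p.seen++
	if p.seen%p.rate != 0 {
		return // not sampled; keeps the common path cheap
	}
	_, file, line, ok := runtime.Caller(skip)
	if !ok {
		return
	}
	p.counts[fmt.Sprintf("%s:%d", file, line)]++
}

var global *int

// publish simulates a pointer write that would cause a fade.
func publish(p *fadeProfile, v *int) {
	global = v  // the escaping pointer write
	p.record(1) // attribute the event to this call site in publish
}

func main() {
	prof := &fadeProfile{rate: 10, counts: map[string]int{}}
	for i := 0; i < 100; i++ {
		v := i
		publish(prof, &v)
	}
	for loc, n := range prof.counts {
		fmt.Printf("%s: %d sampled fades (~%d estimated)\n", loc, n, n*prof.rate)
	}
}
```

Raising `rate` lowers the overhead at the cost of a coarser estimate, which is the trade-off sampling is meant to control.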

kelindar · 2024-11-13T22:49:04Z

kelindar
Nov 13, 2024

I read the detailed proposal, and the papers it's inspired by. Overall, I can see myself using this in a few places, which would bring some sanity back to the codebase.

I think shipping good tooling together with this proposal would actually be key to making it much more useful. If we have something like the proposed heap with lifetime analysis, it would be much easier to identify good candidates. Personally, pprof would be the preferred approach, since we have it live in production and already use it to identify bottlenecks.

0 replies

tandr · 2024-11-15T01:19:07Z

tandr
Nov 15, 2024

Do I understand the proposed spec properly: a region exists only for the enclosed function, and not for functions called or goroutines created from it? If so, why is the region not "inherited"?

0 replies


