NullPointerException when calling onError(StatusRuntimeException) #11973
Comments
Here is a Java test to reproduce the problem:
Note that the test hangs when this happens; the Also, I was not able to reproduce with
The code is racing but these are application threads that should synchronize their write to
These are writes to different StreamObserver instances, so this comment does not apply.
Yes. Agree that this breaks the documented behavior about synchronicity and needs fixing.
What version of gRPC-Java are you using?
v1.71.0
What is your environment?
Linux
What did you expect to see?
I was not expecting onError to throw NullPointerException.
What did you see instead?
It threw NPE.
Steps to reproduce the bug
We have been seeing some rare NullPointerException or ArrayIndexOutOfBoundsException when calling onError on an incoming server call. This happens several times per day across all of our services, which translates to maybe one in a few billion calls or so. The exception looks like this:
In order to debug this, we added some code to use reflection to dump the internal structure of the Metadata object that throws. Here are some examples:
Some of these look perfectly valid, while others have the expected content, but in an unusual order, with null elements at the beginning or between entries. The documentation for Metadata clearly says that it is mutable and not thread-safe, and this appears to be due to multi-threaded mutation of the Metadata object.
We initially found this surprising: we're not setting or mutating metadata anywhere.
After reviewing the grpc-java source code, we discovered the following sequence of events:
We call io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onError,
which calls Metadata metadata = Status.trailersFromThrowable(t);
and then calls io.grpc.internal.ServerCallImpl.close(ServerCallImpl.java:213),
which calls io.grpc.internal.ServerCallImpl.closeInternal(ServerCallImpl.java:227),
which calls io.grpc.internal.AbstractServerStream.close(AbstractServerStream.java:133),
which calls addStatusToTrailers,
which mutates the Metadata object.
The Metadata object mutated here is the Metadata object stored in the StatusRuntimeException that we passed to onError. Who owns that instance?
What actually happens in our code is that we make outgoing gRPC calls to other services which fail with a StatusRuntimeException, and then we use the same instance to reply to multiple incoming gRPC calls on different threads. When we see the concurrent mutation of the Metadata object, that's because the onError call unconditionally mutates the Metadata, which is now shared across multiple threads.
The code is racing with itself, and this happens even when the application code never sets or modifies Metadata objects. This is not great.
Personally, I think it's surprising and problematic to mutate the Metadata that's attached to a StatusRuntimeException (SRE) in the onError call. I would prefer StatusRuntimeException to be effectively immutable.
At a minimum, it would be good to explicitly document that StatusRuntimeException is not thread-safe and should not be shared across threads (unfortunately, we end up storing it in Future objects, which is difficult to fix on our side). This would, at least, have pointed us in the right direction earlier.
Other options:
- Make SRE immutable by copying the metadata whenever it's returned from getTrailers, but this would have performance implications.
- Make onError not mutate the passed-in Metadata object:
  - Copy the trailers if set (performance?).
  - Skip mutating the Metadata if the status fields match the corresponding Status (not a complete solution).
- Create SRE without a Metadata object; I believe that the data is currently duplicated via the Metadata and the Status, even if no other metadata is set (not a complete solution).