Skip to content

Conversation

adrian-cojocaru
Copy link
Collaborator

@adrian-cojocaru adrian-cojocaru commented Jun 25, 2025

Possible workarounds for some obscure Android ANRs during surface/context destruction (more info #3618).

MapLibreGLSurfaceView.WAIT_COMMANDS_ON_CLEANUP = true tries to wait the GPU commands since the opengl/egl implementation seems to hang on cleanup, so we're trying to enforce the order. When enabled it adds some wait time since it forces GPU task completion.

MapLibreGLSurfaceView.DEFER_CONTEXT_DESTRUCTION = true moves the blocking calls to a different thread. The main thread still needs to pause the render thread (and wait for it).

@louwers
Copy link
Member

louwers commented Jun 25, 2025

Can you provide some more context? When is this needed?

@sjg-wdw
Copy link
Collaborator

sjg-wdw commented Jul 3, 2025

We're seeing a case on specialized hardware where one render thread is going and another one gets stuck shutting down in the surface or context destruction. Specifically in the egl calls.

The thinking is flushing out the queue first might help with that.

Copy link
Member

@louwers louwers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this should not be configurable in my opinion.

Unless it's just intended for testing with the intention to decide on a possible other default behavior later. In that case please add a comment with an explanation and a tracking issue.

mEglSurface = null;
}
}

public void finish() {
if (DEFER_CONTEXT_DESTRUCTION) {
MapLibreGLSurfaceView view = mGLSurfaceViewWeakRef.get();
if (view != null && (mEglContext != null || mEglDisplay != null)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race condition here. To fix it, assign the "final" identifiers first, then use those in the "if" condition.

Copy link
Collaborator Author

@adrian-cojocaru adrian-cojocaru Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The surface/context are touched only by the render thread (and in synchronized blocks). Unless I'm missing something this should guarantee that the values are valid in that scope.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but that requires non-local knowledge. For example, I am not sure if your analysis is correct.

Meanwhile, if you adopt the suggestion, then we won't need to discuss it. It will locally-provable that there is no race condition. That is preferable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd like to see if this case is necessary. It's a weird way to work around a driver bug and, if we're lucky, we can get rid of it completely.

If it does have to be merged in, then we should look closer.

factory.destroyContext(mEgl, display, context);
}

if (display != null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code this runs whenever display != null. Here we might skip this based on the earlier "if" condition (line 313). I don't see why.

More generally, there are two ways of finishing, by duplication. What about combining these, e.g. always defining and using the "final" identifiers, so that we either do "the thing" in this thread or we do "the thing" in a cleanup thread, but where we always do the same thing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. You're right. The display cleanup will get skipped when there is no valid mGLSurfaceViewWeakRef, but at the same time MapLibreGLSurfaceView is the owner of the EglHelper (and passes itself as a weak reference) and I'm not sure if there is a scenario where the parent can go out of scope and the helper is still alive (this leads to other problematic leaks).

My main concern when adding the workarounds was to keep the existing surface lifecycle code the same since it has been problematic in the past. If it is confirmed that they solve the existing issues this needs to be extended to cover the texture view variant.

MapLibreGLSurfaceView view = mGLSurfaceViewWeakRef.get();
if (view != null) {
view.eglWindowSurfaceFactory.destroySurface(mEgl, mEglDisplay, mEglSurface);
if (DEFER_CONTEXT_DESTRUCTION) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: To keep these cases from looking more different than they are, than they are supposed to be, how about swapping the "final" identifier assignments with the "if" statement. We could always call factory.destroySurface(mEgl, display, surface); either in this thread or in the cleanup thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

4 participants