-
-
Notifications
You must be signed in to change notification settings - Fork 2k
Fix for vulkan_create_swapchain() not clearing flag (fixes random crashes with fast forwarding) #18295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…r and avoid crashes
Except that some drivers are known to break in the wild when being too clever with oldSwapchain. |
gfx/common/vulkan_common.c
Outdated
return false; | ||
} | ||
|
||
if (count > ctx_swap_images_cap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nonsense. Swapchain can still return indices for images that you didn't acquire, and will just crash then.
gfx/common/vulkan_common.c
Outdated
} | ||
|
||
old_swapchain = vk->swapchain; | ||
info.oldSwapchain = old_swapchain; /* !important can cause crash on some devices if destroyed before creating new is created! keep it alive until after creation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using oldSwapchain is far more unstable than not using it, and just blindly merging this will likely break way more than it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true, it completely resolved my crashes. You can clearly see it flickering when jumping between fast forward and normal speed and its not happening with my fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there perhaps a different explanation to that? Is there a missing QueueWaitIdle before the swapchain destruction perhaps? That's a more plausible explanation. Deferring oldSwapchain destruction is almost certainly not the real underlying issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can be demonstrated that QueueWaitIdle does not work for whatever reason, and basically you're dealing with an Android bug, then it is appropriate to use the oldSwapchain method, but only do that on Android to limit potential damage to e.g. Windows where WSI is known to be very temperamental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can be demonstrated that QueueWaitIdle does not work for whatever reason, and basically you're dealing with an Android bug, then it is appropriate to use the oldSwapchain method, but only do that on Android to limit potential damage to e.g. Windows where WSI is known to be very temperamental.
Well yeah its on 3 out of the 3 Android devices I have, but I can't say its a problem with ALL Android devices. But I will commit again with minimal change using oldSwapChain and will case it only on ANDROID. But honestly the WIndows and Linux builds with oldSwapChain work fine for me too. I don't see how removing the old swapchain only after the new one is created should be any problem. There's no such information about swapchains can be only 1 at a time.
In fact from every info I read its actually recommended to just remove the old one after the new one is created. I don't know why the Idle wait is implemented as therer's no explantion in the code whatsoever for this way of doing it except comments say yeah wait for new one. But if you just delete the old one after the new one is created, there's really no need to wait for it. The way I do it only simplifies things and makes it safer unless somehow there is indeed a requirement that there's only 1. But I can't find anything about that
gfx/common/vulkan_common.c
Outdated
break; | ||
default: | ||
break; | ||
case VK_PRESENT_MODE_IMMEDIATE_KHR: RARCH_DBG("[Vulkan] Creating swapchain: IMMEDIATE.\n"); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really have to reformat a ton of code that has nothing to do with the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only really reformatted part and its nog 98% as you say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
90% of the PR is just rewriting code (mostly reformatting?) that has nothing to do with your actual issue you're trying to solve. Good PRs are focused, and to the point, this PR is not. Based on your issue statement this sounds like it should be fixed with ~5 lines of code or so?
I dont agree at all yes I reformatted some parts of the code as it was just my prettifier doing that. But if you would look closely I actually did change a lot of stuff. I added extra checks and I made sure old swapchain is removed only after the new one is created. Just try out my fix as it worked for me and you clearly see that flickering from removing and creating the swapchain not happening anymore and its a much more smoother transition between fastforward and normal speed now |
Hi, perhaps it's best to leave the reformatting and stuff for another followup PR after this is in a mergeable state and reduce the surface area of this PR to only make the necessary functional changes? |
Then don't commit those hunks? Code reformats should generally not be done unless agreed with the maintainer.
Asking reviewers to wade through changes just to spot if there's a tiny functional change in there instead of reformat is very annoying. When I review stuff like that, I don't have patience left to consider the real functional changes. |
Hi @ro8inmorgan , just to reaffirm, I think if you just do another commit reverting some of the code formatting so that we only see the functional changes then @Themaister prob has an easier time reviewing this and we can get this in a mergeable state sooner. Thanks |
No worries, I will commit again with just the oldSwapChain change only. I can sometimes go a little overboard and rewrite the whole function instead of sticking to the original intention. But I understand for PR review its annoying, and just the minimal change is easyer to read. Will try to commit with minimal changes later today :) |
… function with only minimal changes made for this bug fix on Android only
…w one is created with minimal changes to original vulkan_create_swapchain() function
Ok I commited it now with only minimal changes and exclusive to Android. I tested it on multiple Android devices and fix still works and completely solves any crashes I had before when switching between fastforwarding and normal speed, it also stops the screen flickering when you switch between modes. Tested it using different swap interval settings, synchronisation settings and shaders |
Ok wait it actually seems the crashes are back now and just this change only does not resolve it. I need to check again into my previous version, what part of that code actually resolves the issue then. Leaving this PR open but if I find it's not about oldswapchain creation but something else I will submit a new PR instead |
Ok I reformatted my initial fix, but without all the mess. So now all my changes are clear without changing any existing code formatting etc. I'm still looking into which part is exactly the code change that avoids the crash. but honestly all these changes are in my opinion an improvement. But I will still investigate the actual needed improvement for Android specific so I rule out any chances of causing problems for other platforms again. My initial thoughts was the creation and destroying of the swapchain and the other parts I added for general more improvement. But now it seems it was actually something in the other parts that was fixing my crash and not the destroying and creating of the swapchain itself |
…ery call to this function can be handled as a new swapchain creation while it might not be the case, it fixes crashes on Android
Ok done! Found the problem and have updated this PR with a fix |
vulkan_emulated_mailbox_init(&vk->mailbox, vk->context.device, vk->swapchain); | ||
|
||
|
||
/* This flag needs to be cleared otherwise elsewhere it can be perceived as if there's a new swapchain created everytime its being called */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this helps, it might point to bugs elsewhere, but alone, this looks very wrong. I'm no longer familiar with this code, but further up there's stuff like:
vk->flags |= VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN;
if ( (vk->swapchain != VK_NULL_HANDLE)
&& (!(vk->context.flags & VK_CTX_FLAG_INVALID_SWAPCHAIN))
&& (vk->context.swapchain_width == width)
&& (vk->context.swapchain_height == height)
&& (vk->context.swap_interval == swap_interval))
{
/* Do not bother creating a swapchain redundantly. */
#ifdef VULKAN_DEBUG
RARCH_DBG("[Vulkan] Do not need to re-create swapchain.\n");
#endif
vulkan_create_wait_fences(vk);
if ( (vk->flags & VK_DATA_FLAG_EMULATING_MAILBOX)
&& (vk->mailbox.swapchain == VK_NULL_HANDLE))
{
vulkan_emulated_mailbox_init(
&vk->mailbox, vk->context.device, vk->swapchain);
vk->flags &= ~VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN;
return true;
}
If that code path is reached without a new swapchain having been created that seems like it should not be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the code figures out there is no need to recreate it's already clearing that flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this helps, it might point to bugs elsewhere, but alone, this looks very wrong. I'm no longer familiar with this code, but further up there's stuff like:
vk->flags |= VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN; if ( (vk->swapchain != VK_NULL_HANDLE) && (!(vk->context.flags & VK_CTX_FLAG_INVALID_SWAPCHAIN)) && (vk->context.swapchain_width == width) && (vk->context.swapchain_height == height) && (vk->context.swap_interval == swap_interval)) { /* Do not bother creating a swapchain redundantly. */ #ifdef VULKAN_DEBUG RARCH_DBG("[Vulkan] Do not need to re-create swapchain.\n"); #endif vulkan_create_wait_fences(vk); if ( (vk->flags & VK_DATA_FLAG_EMULATING_MAILBOX) && (vk->mailbox.swapchain == VK_NULL_HANDLE)) { vulkan_emulated_mailbox_init( &vk->mailbox, vk->context.device, vk->swapchain); vk->flags &= ~VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN; return true; }
If that code path is reached without a new swapchain having been created that seems like it should not be possible.
Thats actually not true, there are paths in this function that will return true without ever clearing the flag. This function is a one off so it should always be cleared after creation is success. Could you ellaborate on why it looks "very wrong"? Because IMO it should always be cleared before the function ends.
The flag is set at line 1960 with
vk->flags |= VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN;
and it def could reach the last return true without ever clearing the flag even tho its succesfull. In fact I actually see another path now in this function at line 2220 that doesn't clear the flag either, which could cause the same issue.
Leaving this flag set it will life beyond this function and can cause false positive on new swapchains being created. I personally feel nog clearing this flag here is the bug.
I searched issues here on github and I am definitly not the only one having the same problem specific with Vulkan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the code figures out there is no need to recreate it's already clearing that flag.
That is the problem, it isn't. There are atleast 2 paths where its creating it but not clearing the flag.
Description
I had random crashes on my Android device (Anbernic rg476hz) with 120hz screen when switching between fastforwarding and normal speed. After looking into it I narrowed it down to this happening in the creation and removing of swapchains.
Looking at the vulkan_create_swapchain() function I found out that the VK_DATA_FLAG_CREATED_NEW_SWAPCHAIN flag was never cleared and everytime this function is called it could return invalid that there is a new swapchain created even when there is not.
Not clearing this flag gave crashes on the Android version, but it could potentially be problematic on all platforms.