Ryujinx-uplift

mirror of https://github.com/GreemDev/Ryujinx.git synced 2024-12-21 06:15:54 +01:00

Author	SHA1	Message	Date
gdkchan	80519af67d	Update short cache textures if modified (#4586 )	2023-03-24 12:54:58 +01:00
riperiperi	9f1cf6458c	Vulkan: Migrate buffers between memory types to improve GPU performance (#4540 ) * Initial implementation of migration between memory heaps - Missing OOM handling - Missing `_map` data safety when remapping - Copy may not have completed yet (needs some kind of fence) - Map may be unmapped before it is done being used. (needs scoped access) - SSBO accesses are all "writes" - maybe pass info in another way. - Missing keeping map type when resizing buffers (should this be done?) * Ensure migrated data is in place before flushing. * Fix issue where old waitable would be signalled. - There is a real issue where existing Auto<> references need to be replaced. * Swap bound Auto<> instances when swapping buffer backing * Fix conversion buffers * Don't try move buffers if the host has shared memory. * Make GPU methods return PinnedSpan with scope * Storage Hint * Fix stupidity * Fix rebase * Tweak rules Attempt to sidestep BOTW slowdown * Remove line * Migrate only when command buffers flush * Change backing swap log to debug * Address some feedback * Disallow backing swap when the flush lock is held by the current thread * Make PinnedSpan from ReadOnlySpan explicitly unsafe * Fix some small issues - Index buffer swap fixed - Allocate DeviceLocal buffers using a separate block list to images. * Remove alternative flags * Address feedback	2023-03-19 17:56:48 -03:00
gdkchan	67b4e63cff	Remove MultiRange Min/MaxAddress and rename GetSlice to Slice (#4566 ) * Delete MinAddress and MaxAddress from MultiRange * Rename MultiRange.GetSlice to MultiRange.Slice	2023-03-19 17:31:35 +01:00
jhorv	5131b71437	Reducing memory allocations (#4537 ) * add RecyclableMemoryStream dependency and MemoryStreamManager * organize BinaryReader/BinaryWriter extensions * add StreamExtensions to reduce need for BinaryWriter * simple replacments of MemoryStream with RecyclableMemoryStream * add write ReadOnlySequence<byte> support to IVirtualMemoryManager * avoid 0-length array creation * rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible * reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue * use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources * add constants for nanosecond/millisecond conversions * code formatting * XML doc adjustments * fix: StreamExtension.WriteByte not writing non-zero values for lengths <= 16 * XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values. * add copyless path for StreamExtension.Write(ReadOnlySpan<int>) * add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence<byte>); remove previous explicit implementations * code style fixes * remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator	2023-03-17 13:14:50 +01:00
riperiperi	da073fce61	GPU: Fast path for adding one texture view to a group (#4528 ) * GPU: Fast path for adding one texture view to a group Texture group handles must store a list of their overlapping views, so they can be properly notified when a write is detected, and a few other things relating to texture readback. This is generally created when the group is established, with each handle looping over all views to find its overlaps. This whole process was also done when only a single view was added (and no handles were changed), however... Sonic Frontiers had a huge cubemap array with 7350 faces (175 cubemaps * 6 faces * 7 levels), so iterating over both handles and existing views added up very fast. Since we are only adding a single view, we only need to _add_ that view to the existing overlaps, rather than recalculate them all. This greatly improves performance during loading screens and a few seconds into gameplay on the "open zone" sections of Sonic Frontiers. May improve loading times or stutters on some other games. Note that the current texture cache rules will cause these views to fall out of the cache, as there are more than the hard cap, so the cost will be repaid when reloading the open zone. I also added some code to properly remove overlaps when texture views are removed, since it seems that was missing. This can be improved further by only iterating handles that overlap the view (filter by range), but so can a few places in TextureGroup, so better to do all at once. The full generation of overlaps could probably be improved in a similar way. I recommend testing a few games to make sure nothing breaks. * Address feedback	2023-03-14 17:33:44 -03:00
riperiperi	1fc90e57d2	Update range for remapped sparse textures instead of recreating them (#4442 ) * Update sparsely mapped texture ranges without recreating Important TODO in TexturePool. Smaller TODO: should I look into making textures with views also do this? It needs to be able to detect if the views can be instantly deleted without issue if they're now remapped. * Actually do partial updates * Signal group dirty after mappings changed * Fix various issues (should work now) * Further optimisation Should load a lot less data (16x) when partial updating 3d textures. * Improve stability * Allow granular uploads on large textures, improve rules * Actually avoid updating slices that aren't modified. * Address some feedback, minor optimisation * Small tweak * Refactor DereferenceRequest More specific initialization methods. * Improve code for resetting handles * Explain data loading a bit more * Add some safety for setting null from different threads. All texture sets come from the one thread, but null sets can come from multiple. Only decrement ref count if we succeeded the null set first. * Address feedback 1 * Make a bit safer	2023-03-14 17:08:44 -03:00
riperiperi	6e9bd4de13	GPU: Scale counter results before addition (#4471 ) * GPU: Scale counter results before addition Counter results were being scaled on ReportCounter, which meant that the _total_ value of the counter was being scaled. Not only could this result in very large numbers and weird overflows if the game doesn't clear the counter, but it also caused the result to change drastically. This PR changes scaling to be done when the value is added to the counter on the backend. This should evaluate the scale at the same time as before, on report counter, but avoiding the issue with scaling the total. Fixes scaling in Warioware, at least in the demo, where it seems to compare old/new counters and broke down when scaling was enabled. * Fix issues when result is partially uploaded. Drivers tend to write the low half first, then the high half. Retry if the high half is FFFFFFFF.	2023-03-12 18:01:15 +01:00
gdkchan	4f3af839be	Minor code formatting (#4498 )	2023-03-04 14:43:08 +01:00
gdkchan	cedd200745	Move gl_Layer to vertex shader if geometry is not supported (#4368 ) * Set gl_Layer on vertex shader if it's set on the geometry shader and it does nothing else * Shader cache version bump * PR feedback * Fix typo	2023-02-25 10:39:51 +00:00
gdkchan	095ad923ad	Account for multisample when calculating render target size hint (#4467 )	2023-02-23 10:08:54 +01:00
gdkchan	c3a5716a95	Add copy dependency for some incompatible texture formats (#4380 ) * Add copy dependency for some incompatible texture formats * Simplify compatibility check	2023-02-21 19:21:57 -03:00
gdkchan	58d7a1fe97	Mark texture as modified and sync on I2M fast path (#4449 )	2023-02-21 10:40:23 +01:00
gdkchan	7aa430f1a5	Add support for advanced blend (part 1/2) (#2801 ) * Add blend microcode registers * Add advanced blend support using host extension * Remove debug message * Use pre-generated table for blend functions * XML docs * Rename AdvancedBlendMode to AdvancedBlendOp for consistency * Remove redundant code * Fix some advanced blend related issues on Vulkan * Formatting	2023-02-19 22:37:37 -03:00
gdkchan	efb135b74c	Clear CPU side data on GPU buffer clears (#4125 ) * Clear CPU side data on GPU buffer clears * Implement tracked fill operation that can signal other resource types except buffer * Fix tests, add missing XML doc * PR feedback	2023-02-16 18:28:49 -03:00
gdkchan	a707842e14	Validate dimensions before creating texture (#4430 )	2023-02-16 11:16:31 -03:00
riperiperi	e4f68592c3	Fix partial updates for textures. (#4401 ) I was forcing some types of texture to partially update when investigating performance with games that stream in data, and noticed that partially loading texture data was really broken on both backends. Fixes Vulkan texture set by getting the correct expected size for the texture. Fixes partial upload on both backends for both Texture 2D Array and Cubemap using the wrong offset and uploading to the first layer/level for a handle. 3D might also be affected. This might fix textures randomly having incorrect data in games that render to it - jumbled in the case of OpenGL, and outdated/black in the case of Vulkan. This case typically happens in UE4 games.	2023-02-12 10:30:26 +01:00
gdkchan	61b1ce252f	Allow partially mapped textures with unmapped start (#4394 )	2023-02-10 11:47:59 -03:00
gdkchan	26bf13a65d	Limit texture cache based on total texture size (#4350 ) * Limit texture cache based on total texture size * Formatting	2023-02-08 14:19:43 +01:00
gdkchan	96cf242bcf	Handle mismatching texture size with copy dependencies (#4364 ) * Handle mismatching texture size with copy dependencies * Create copy and render textures with the minimum possible size * Only align width for comparisons, assume that height is always exact * Fix IsExactMatch size check * Allow sampler and copy textures to match textures with larger width * Delete texture ChangeSize related code * Move AdjustSize to TextureInfo and give it a better name, adjust usages * Fix GetMinimumWidthInGob when minimumWidth > width * Only update render targets that are actually cleared for clear Avoids creating textures with incorrect sizes * Delete UpdateRenderTargetState method that is not needed anymore Clears now only ever sets the render targets that will be cleared rather than all of them	2023-02-08 08:48:09 +01:00
gdkchan	43081c16c4	Insert bitcast for assignment of fragment integer outputs on GLSL (#4369 ) * Insert bitcast for assignment of fragment integer outputs on GLSL * Shader cache version bump	2023-02-05 18:52:57 -03:00
gdkchan	2fd819613f	SPIR-V: Change BitfieldExtract and BitfieldInsert for SPIRV-Cross (#4336 ) * SPIR-V: Change BitfieldExtract and BitfieldInsert types to make Metal MSL compiler happy * Shader cache version bump	2023-01-23 19:20:40 -03:00
riperiperi	e3d0ccf8d5	Allow setting texture data from 1x to fix some textures resetting randomly (#2860 ) * Allow setting texture data from 1x to fix some textures resetting randomly Expected targets: - Deltarune 1+2 - Crash Team Racing - Those new pokemon games idk * Allow scaling of MSAA textures, propagate scale on copy. * Fix Rebase Oops * Automatic disable * A bit more aggressive * Without the debug log * Actually decrement the score when writing.	2023-01-22 02:03:30 +00:00
gdkchan	6adf15e479	Implement CSET and CSETP shader instructions (#4318 ) * Implement CSET and CSETP shader instructions * Shader cache version bump * Fix CC.HI	2023-01-21 12:18:05 -03:00
gdkchan	86fd0643c2	Implement support for page sizes > 4KB (#4252 ) * Implement support for page sizes > 4KB * Check and work around more alignment issues * Was not meant to change this * Use MemoryBlock.GetPageSize() value for signal handler code * Do not take the path for private allocations if host supports 4KB pages * Add Flags attribute on MemoryMapFlags * Fix dirty region size with 16kb pages Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data) Co-authored-by: riperiperi <rhy3756547@hotmail.com>	2023-01-17 05:13:24 +01:00
riperiperi	f0e27a23a5	Add short duration texture cache (#3754 ) * Add short duration texture cache This texture cache takes textures that lose their last pool reference and keeps them alive until the next frame, or until an incompatible overlap removes it. This is done since under certain circumstances, a texture's reference can be wiped from a pool despite it still being in use - though typically the reference will return when rendering the next frame. While this may slightly increase texture memory usage when quickly going through a bunch of temporary textures, it's still bounded due to the overlap removal rule. This greatly increases performance in Hyrule Warriors: Age of Calamity. It may positively affect some UE4 games which dip framerate severely under certain circumstances. * Small optimization * Don't forget this. * Add short cache dictionary * Address feedback * Address some feedback	2023-01-17 04:39:46 +01:00
gdkchan	93df366b2c	Fix texture flush from CPU WaitSync regression on OpenGL (#4289 )	2023-01-14 11:23:57 -03:00
gdkchan	cd3a15aea5	Fix NRE when MemoryUnmappedHandler is called for a destroyed channel (#4285 )	2023-01-14 00:16:06 -03:00
gdkchan	070136b3f7	Fix texture modified on CPU from GPU thread after being modified on GPU not being updated (#4284 )	2023-01-13 23:46:45 -03:00
riperiperi	8fa248ceb4	Vulkan: Add workarounds for MoltenVK (#4202 ) * Add MVK basics. * Use appropriate output attribute types * 4kb vertex alignment, bunch of fixes * Add reduced shader precision mode for mvk. * Disable ASTC on MVK for now * Only request robustnes2 when it is available. * It's just the one feature actually * Add triangle fan conversion * Allow NullDescriptor on MVK for some reason. * Force safe blit on MoltenVK * Use ASTC only when formats are all available. * Disable multilevel 3d texture views * Filter duplicate render targets (on backend) * Add Automatic MoltenVK Configuration * Do not create color attachment views with formats that are not RT compatible * Make sure that the host format matches the vertex shader input types for invalid/unknown guest formats * FIx rebase for Vertex Attrib State * Fix 4b alignment for vertex * Use asynchronous queue submits for MVK * Ensure color clear shader has correct output type * Update MoltenVK config * Always use MoltenVK workarounds on MacOS * Make MVK supersede all vendors * Fix rebase * Various fixes on rebase * Get portability flags from extension * Fix some minor rebasing issues * Style change * Use LibraryImport for MVKConfiguration * Rename MoltenVK vendor to Apple Intel and AMD GPUs on moltenvk report with the those vendors - only apple silicon reports with vendor 0x106B. * Fix features2 rebase conflict * Rename fragment output type * Add missing check for fragment output types Might have caused the crash in MK8 * Only do fragment output specialization on MoltenVK * Avoid copy when passing capabilities * Self feedback * Address feedback Co-authored-by: gdk <gab.dark.100@gmail.com> Co-authored-by: nastys <nastys@users.noreply.github.com>	2023-01-13 01:31:21 +01:00
gdkchan	94a64f2aea	Remove textures from cache on unmap if not mapped and modified (#4211 )	2023-01-11 01:53:56 +00:00
gdkchan	9dfe81770a	Use vector outputs for texture operations (#3939 ) * Change AggregateType to include vector type counts * Replace VariableType uses with AggregateType and delete VariableType * Support new local vector types on SPIR-V and GLSL * Start using vector outputs for texture operations * Use vectors on more texture operations * Use vector output for ImageLoad operations * Replace all uses of single destination texture constructors with multi destination ones * Update textureGatherOffsets replacement to split vector operations * Shader cache version bump Co-authored-by: Ac_K <Acoustik666@gmail.com>	2022-12-29 16:09:34 +01:00
riperiperi	e20abbf9cc	Vulkan: Don't flush commands when creating most sync (#4087 ) * Vulkan: Don't flush commands when creating most sync When the WaitForIdle method is called, we create sync as some internal GPU method may read back written buffer data. Some games randomly intersperse compute dispatch into their render passes, which result in this happening an unbounded number of times depending on how many times they run compute. Creating sync in Vulkan is expensive, as we need to flush the current command buffer so that it can be waited on. We have a limited number of active command buffers due to how we track resource usage, so submitting too many command buffers will force us to wait for them to return to the pool. This PR allows less "important" sync (things which are less likely to be waited on) to wait on a command buffer's result without submitting it, instead relying on AutoFlush or another, more important sync to flush it later on. Because of the possibility of us waiting for a command buffer that hasn't submitted yet, any thread needs to be able to force the active command buffer to submit. The ability to do this has been added to the backend multithreading via an "Interrupt", though it is not supported without multithreading. OpenGL drivers should already be doing something similar so they don't blow up when creating lots of sync, which is why this hasn't been a problem for these games over there. Improves Vulkan performance on Xenoblade DE, Pokemon Scarlet/Violet, and Zelda BOTW (still another large issue here) * Add strict argument This is technically a separate concern from whether the sync is a host syncpoint. * Remove _interrupted variable * Actually wait for the invoke This is required by AMD GPUs, and also may have caused some issues on other GPUs. * Remove unused using. * I don't know why it added these ones. * Address Feedback * Fix typo	2022-12-29 15:39:04 +01:00
riperiperi	470be03c2f	GPU: Add fallback when 16-bit formats are not supported (#4108 ) * Add conversion for 16 bit RGBA formats (not supported in Rosetta) * Rebase fix Rebase fix * Forgot to remove this * Fix RGBA16 format conversion * Add RGBA4 -> RGBA8 conversion * Handle host stride alignment * Address Feedback Part 1 * Can't count * Don't zero out rgb when alpha is 0 * Separate RGBA4 and 5-bit component formats Not sure of a better way to name them... * Add A1B5G5R5 conversion * Put this in the right place. * Make format naming consistent for capabilities * Change method names	2022-12-26 15:50:27 -03:00
Hunter	c963b3c804	Added Generic Math to BitUtils (#3929 ) * Generic Math Update Updated Several functions in Ryujinx.Common/Utilities/BitUtils to use generic math * Updated BitUtil calls * Removed Whitespace * Switched decrement * Fixed changed method calls. The method calls were originally changed on accident due to me relying too much on intellisense doing stuff for me * Update Ryujinx.Common/Utilities/BitUtils.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-12-26 14:11:05 +00:00
gdkchan	f906eb06c2	Implement a software ETC2 texture decoder (#4121 ) * Implement a software ETC2 texture decoder * Fix output size calculation for non-2D textures * Address PR feedback	2022-12-21 20:39:58 -03:00
gdkchan	1cca3e99ab	GPU: Force rebind when pool changes (#4129 )	2022-12-21 17:35:28 -03:00
gdkchan	cb70e7bb30	Fix DrawArrays vertex buffer size (#4141 )	2022-12-21 19:08:12 +01:00
gdkchan	ec4cd57ccf	Implement another non-indexed draw method on GPU (#4123 )	2022-12-16 12:06:38 -03:00
riperiperi	5a085cba0f	GPU: Fix layered attachment write (#4131 ) Fixes a regression caused by #4003 where the code that writes `_vtgWritesRtLayer` was removed, breaking the crowd in mario strikers.	2022-12-16 09:40:01 -03:00
gdkchan	f4d731ae20	Fix NRE when loading Vulkan shader cache with Vertex A shaders (#4124 )	2022-12-15 17:52:12 +01:00
Isaac Marovitz	8ac53c66b4	Remove Half Conversion (#4106 ) * Remove HalfConversion * Update `CodeGenVersion`	2022-12-14 21:13:23 -03:00
Andrey Sukharev	edf7e628ca	Use method overloads that support trimming. Mark some types to be trimming friendly (#4083 ) * Use method overloads that support trimming. Mark some types to be trimming friendly * Use generic version of marshalling method	2022-12-12 15:10:05 +01:00
Isaac Marovitz	851d81d24a	Fix Redundant Qualifer Warnings (#4091 ) * Fix Redundant Qualifer Warnings * Remove unnecessary using	2022-12-10 21:21:13 +01:00
gdkchan	459c4caeba	Fix HasUnalignedStorageBuffers value when buffers are always unaligned (#4078 )	2022-12-09 17:41:40 -03:00
gdkchan	8428bb6541	Fix shader FSWZADD instruction (#4069 ) * Fix shader FSWZADD instruction * Shader cache version bump	2022-12-08 14:08:07 -03:00
gdkchan	9a0330f7f8	Shader: Implement PrimitiveID (#4067 ) * Shader: Implement PrimitiveID * Shader cache version bump	2022-12-08 10:55:03 +01:00
riperiperi	f23b2878cc	Shader: Add fallback for LDG from "ube" buffer ranges. (#4027 ) We have a conversion from LDG on the compute shader to a special constant buffer binding that's used to exceed hardware limits on compute, but it was only running if the byte offset could be identified. The fallback that checks all of the bindings at runtime only checks the storage buffers. This PR adds checking ube ranges to the LoadGlobal fallback. This extends the changes in #4011 to only check ube entries which are accessed by the shader. Fixes particles affected by the wind in The Legend of Zelda: Breath of the Wild. May fix other weird issues with compute shaders in some games. Try a bunch of games and drivers to make sure they don't blow up loading constants willynilly from searchable buffers.	2022-12-06 23:15:44 +00:00
gdkchan	dde9bb5c69	Fix storage buffer access when match fails (#4037 ) * Fix storage buffer access when match fails * Shader cache version bump	2022-12-06 03:36:54 +00:00
gdkchan	de06ffb0f7	Fix shaders with global memory access from unknown locations (#4029 ) * Fix shaders with global memory access from unknown locations * Shader cache version bump	2022-12-06 01:09:24 +00:00
gdkchan	bbb24d8c7e	Restrict shader storage buffer search when match fails (#4011 ) * Restrict storage buffer search when match fails * Shader cache version bump	2022-12-05 19:11:32 +00:00
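
Note on da073fce61 (#4528): the fast path described in that commit message can be illustrated with a minimal sketch. This is not the Ryujinx implementation; `Range`, `Handle`, `rebuild_all_overlaps` and `add_single_view` are hypothetical names invented for illustration, and address ranges are simplified to one flat span per handle and per view. The point it shows is that when a single view is added and no handles change, appending that view to the handles it overlaps gives the same result as recomputing every handle's overlap list, at a cost proportional to the number of handles rather than handles times views.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """A flat address span (hypothetical simplification of a texture range)."""
    start: int
    size: int

    def overlaps(self, other: "Range") -> bool:
        # Two half-open spans overlap when each starts before the other ends.
        return (self.start < other.start + other.size
                and other.start < self.start + self.size)


@dataclass
class Handle:
    """A group handle that remembers which views overlap it (hypothetical)."""
    span: Range
    overlapping_views: list = field(default_factory=list)


def rebuild_all_overlaps(handles, views):
    """Slow path: every handle re-scans every view, O(handles * views)."""
    for handle in handles:
        handle.overlapping_views = [v for v in views if handle.span.overlaps(v)]


def add_single_view(handles, view):
    """Fast path: only add the new view to the handles it overlaps, O(handles)."""
    for handle in handles:
        if handle.span.overlaps(view):
            handle.overlapping_views.append(view)
```

With the 7350 faces quoted in the commit message (175 cubemaps * 6 faces * 7 levels), the slow path scans every existing view once per handle each time a view is added, while the fast path touches each handle once.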

500 Commits