Ryujinx-uplift/src/ARMeilleure/Instructions/InstEmitSystem32.cs

352 lines
12 KiB
C#

Add most of the A32 instruction set to ARMeilleure (#897) * Implement TEQ and MOV (Imm16) * Initial work on A32 instructions + SVC. No tests yet, hangs in rtld. * Implement CLZ, fix BFI and BFC Now stops on SIMD initialization. * Exclusive access instructions, fix to mul, system instructions. Now gets to a break after SignalProcessWideKey64. * Better impl of UBFX, add UDIV and SDIV Now boots way further - now stuck on VMOV instruction. * Many more instructions, start on SIMD and testing framework. * Fix build issues * svc: Rework 32 bit codepath Fixing once and for all argument ordering issues. * Fix 32 bits stacktrace * hle debug: Add 32 bits dynamic section parsing * Fix highCq mode, add many tests, fix some instruction bugs Still suffers from critical malloc failure :weary: * Fix incorrect opcode decoders and a few more instructions. * Add a few instructions and fix others. re-disable highCq for now. Disabled the svc memory clear since i'm not sure about it. * Fix build * Fix typo in ordered/exclusive stores. * Implement some more instructions, fix others. Uxtab16/Sxtab16 are untested. * Begin impl of pairwise, some other instructions. * Add a few more instructions, a quick hack to fix svcs for now. * Add tests and fix issues with VTRN, VZIP, VUZP * Add a few more instructions, fix Vmul_1 encoding. * Fix way too many instruction bugs, add tests for some of the more important ones. * Fix HighCq, enable FastFP paths for some floating point instructions (not entirely sure why these were disabled, so important to note this commit exists) Branching has been removed in A32 shifts until I figure out if it's worth it * Cleanup Part 1 There should be no functional change between these next few commits. Should is the key word. (except for removing break handler) * Implement 32 bits syscalls Co-authored-by: riperiperi <rhy3756547@hotmail.com> Implement all 32 bits counterparts of the 64 bits syscalls we currently have. 
* Refactor part 2: Move index/subindex logic to Operand May have inadvertently fixed one (1) bug * Add FlushProcessDataCache32 * Address jd's comments * Remove 16 bit encodings from OpCodeTable Still need to catch some edge cases (operands that use the "F" flag) and make Q encodings with non-even indexes undefined. * Correct Fpscr handling for FP vector slow paths WIP * Add StandardFPSCRValue behaviour for all Arithmetic instructions * Add StandardFPSCRValue behaviour to compare instructions. * Force passing of fpcr to FPProcessException and FPUnpack. Reduces potential for code error significantly * OpCode cleanup * Remove urgency from DMB comment in MRRC DMB is currently a no-op via the instruction, so it should likely still be a no-op here. * Test Cleanup * Fix FPDefaultNaN on Ryzen CPUs * Improve some tests, fix some shift instructions, add slow path for Vadd * Fix Typo * More test cleanup * Flip order of Fx and index, to indicate that the operand's is the "base" * Remove Simd32 register type, use Int32 and Int64 for scalars like A64 does. * Reintroduce alignment to DecoderHelper (removed by accident) * One more realign as reading diffs is hard * Use I32 registers in A32 (part 2) Swap default integer register type based on current execution mode. * FPSCR flags as Registers (part 1) Still need to change NativeContext and ExecutionContext to allow getting/setting with the flag values. * Use I32 registers in A32 (part 1) * FPSCR flags as registers (part 2) Only CMP flags are on the registers right now. It could be useful to use more of the space in non-fast-float when implementing A32 flags accurately in the fast path. * Address Feedback * Correct FP->Int behaviour (should saturate) * Make branches made by writing to PC eligible for Rejit Greatly improves performance in most games. * Remove unused branching for Vtbl * RejitRequest as a class rather than a tuple Makes a lot more sense than storing tuples on a dictionary. 
* Add VMOVN, VSHR (imm), VSHRN (imm) and related tests * Re-order InstEmitSystem32 Alphabetical sorting. * Address Feedback Feedback from Ac_K, remove and sort usings. * Address Feedback 2 * Address Feedback from LDj3SNuD Opcode table reordered to have alphabetical sorting within groups, Vmaxnm and Vminnm have split names to be less ambiguous, SoftFloat nits, Test nits and Test simplification with ValueSource. * Add Debug Asserts to A32 helpers Mainly to prevent the shift ones from being used on I64 operands, as they expect I32 input for most operations (eg. carry flag setting), and expect I32 input for shift and boolean amounts. Most other helper functions don't take Operands, throw on out of range values, and take specific types of OpCode, so didn't need any asserts. * Use ConstF rather than creating an operand. (useful for pooling in future) * Move exclusive load to helper, reference call flag rather than literal 1. * Address LDj feedback (minus table flatten) one final look before it's all gone. the world is so beautiful. * Flatten OpCodeTable oh no * Address more table ordering * Call Flag as int on A32 Co-authored-by: Natalie C. <cyuubiapps@gmail.com> Co-authored-by: Thog <thog@protonmail.com>
2020-02-23 22:20:40 +01:00
using ARMeilleure.Decoders;
using ARMeilleure.IntermediateRepresentation;
using ARMeilleure.State;
using ARMeilleure.Translation;
using System;
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. 
* Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 20:28:02 +02:00
using System.Reflection;
using static ARMeilleure.Instructions.InstEmitHelper;
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. 
* Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 20:08:34 +02:00
using static ARMeilleure.IntermediateRepresentation.Operand.Factory;
namespace ARMeilleure.Instructions
{
    static partial class InstEmit32
    {
        public static void Mcr(ArmEmitterContext context)
        {
            OpCode32System op = (OpCode32System)context.CurrOp;

            // Anything other than a coprocessor 15 (CP15) access with opc1 == 0 is treated as undefined.
            if (op.Coproc != 15 || op.Opc1 != 0)
            {
                InstEmit.Und(context);
                return;
}
switch (op.CRn)
{
case 13: // Process and Thread Info.
if (op.CRm != 0)
{
throw new NotImplementedException($"Unknown MCR CRm 0x{op.CRm:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
}
switch (op.Opc2)
{
case 2:
EmitSetTpidrEl0(context); return;
default:
throw new NotImplementedException($"Unknown MCR Opc2 0x{op.Opc2:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
}
case 7:
switch (op.CRm) // Cache and Memory barrier.
{
case 10:
switch (op.Opc2)
{
case 5: // Data Memory Barrier Register.
return; // No-op.
default:
throw new NotImplementedException($"Unknown MRC Opc2 0x{op.Opc2:X16} at 0x{op.Address:X16} (0x{op.RawOpCode:X}).");
}
default:
throw new NotImplementedException($"Unknown MRC CRm 0x{op.CRm:X16} at 0x{op.Address:X16} (0x{op.RawOpCode:X}).");
}
default:
throw new NotImplementedException($"Unknown MRC 0x{op.RawOpCode:X8} at 0x{op.Address:X16}.");
}
}
public static void Mrc(ArmEmitterContext context)
{
OpCode32System op = (OpCode32System)context.CurrOp;
if (op.Coproc != 15 || op.Opc1 != 0)
{
InstEmit.Und(context);
return;
}
Operand result;
switch (op.CRn)
{
case 13: // Process and Thread Info.
if (op.CRm != 0)
{
throw new NotImplementedException($"Unknown MRC CRm 0x{op.CRm:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
}
switch (op.Opc2)
{
case 2:
result = EmitGetTpidrEl0(context); break;
case 3:
result = EmitGetTpidrroEl0(context); break;
default:
throw new NotImplementedException($"Unknown MRC Opc2 0x{op.Opc2:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
}
break;
default:
throw new NotImplementedException($"Unknown MRC 0x{op.RawOpCode:X} at 0x{op.Address:X}.");
}
if (op.Rt == RegisterAlias.Aarch32Pc)
{
// Special behavior: copy NZCV flags into APSR.
EmitSetNzcv(context, result);
return;
}
else
{
SetIntA32(context, op.Rt, result);
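            }
        }

        // MRRC: reads a 64-bit coprocessor register into two core registers.
        // Only coprocessor 15 is handled; any other coprocessor is treated as undefined.
        public static void Mrrc(ArmEmitterContext context)
        {
            OpCode32System op = (OpCode32System)context.CurrOp;

            if (op.Coproc != 15)
            {
                InstEmit.Und(context);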
                return;
            }
            int opc = op.MrrcOp;

            MethodInfo info;
            switch (op.CRm)
            {
                case 14: // Timer.
                    switch (opc)
                    {
                        case 0:
                            info = typeof(NativeInterface).GetMethod(nameof(NativeInterface.GetCntpctEl0)); break;
                        default:
                            throw new NotImplementedException($"Unknown MRRC Opc1 0x{opc:X} at 0x{op.Address:X} (0x{op.RawOpCode:X}).");
                    }
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. 
* Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 20:28:02 +02:00
                    break;
                default:
                    throw new NotImplementedException($"Unknown MRRC 0x{op.RawOpCode:X} at 0x{op.Address:X}.");
            }
            Operand result = context.Call(info);
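
            // The call result is 64 bits wide: the low half goes to Rt, and the
            // high half goes to the register named by the CRn field.
            SetIntA32(context, op.Rt, context.ConvertI64ToI32(result));
            SetIntA32(context, op.CRn, context.ConvertI64ToI32(context.ShiftRightUI(result, Const(32))));
        }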

        public static void Mrs(ArmEmitterContext context)
        {
            OpCode32Mrs op = (OpCode32Mrs)context.CurrOp;

            if (op.R)
            {
                throw new NotImplementedException("SPSR");
            }
            else
            {
Fpsr and Fpcr freed. (#3701) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Fpsr and Fpcr freed. Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this. Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS. Depends on shlreg. * Update InstEmitSimdHelper.cs * De-magic Masks. Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions. * Addressed PR feedback.
2022-09-20 23:55:13 +02:00
                // Assemble the status word by shifting each flag to its PState bit position.
                Operand spsr = context.ShiftLeft(GetFlag(PState.VFlag), Const((int)PState.VFlag));
                spsr = context.BitwiseOr(spsr, context.ShiftLeft(GetFlag(PState.CFlag), Const((int)PState.CFlag)));
                spsr = context.BitwiseOr(spsr, context.ShiftLeft(GetFlag(PState.ZFlag), Const((int)PState.ZFlag)));
                spsr = context.BitwiseOr(spsr, context.ShiftLeft(GetFlag(PState.NFlag), Const((int)PState.NFlag)));
                spsr = context.BitwiseOr(spsr, context.ShiftLeft(GetFlag(PState.QFlag), Const((int)PState.QFlag)));

                // TODO: Remaining flags.

                SetIntA32(context, op.Rd, spsr);
            }
        }

        public static void Msr(ArmEmitterContext context)
        {
            OpCode32MsrReg op = (OpCode32MsrReg)context.CurrOp;

            if (op.R)
            {
                throw new NotImplementedException("SPSR");
            }
            else
            {
                if ((op.Mask & 8) != 0)
                {
                    // Mask bit 3 selects the flags field (N, Z, C, V and Q).
                    Operand value = GetIntA32(context, op.Rn);

                    EmitSetNzcv(context, value);

                    Operand q = context.BitwiseAnd(context.ShiftRightUI(value, Const((int)PState.QFlag)), Const(1));
                    SetFlag(context, PState.QFlag, q);
                }

                if ((op.Mask & 4) != 0)
                {
                    throw new NotImplementedException("APSR_g");
                }

                if ((op.Mask & 2) != 0)
                {
                    throw new NotImplementedException("CPSR_x");
                }

                if ((op.Mask & 1) != 0)
                {
                    throw new NotImplementedException("CPSR_c");
                }
            }
        }
public static void Nop(ArmEmitterContext context) { }
public static void Vmrs(ArmEmitterContext context)
{
OpCode32SimdSpecial op = (OpCode32SimdSpecial)context.CurrOp;
if (op.Rt == RegisterAlias.Aarch32Pc && op.Sreg == 0b0001)
{
// Special behavior: copy NZCV flags into APSR.
SetFlag(context, PState.VFlag, GetFpFlag(FPState.VFlag));
SetFlag(context, PState.CFlag, GetFpFlag(FPState.CFlag));
SetFlag(context, PState.ZFlag, GetFpFlag(FPState.ZFlag));
SetFlag(context, PState.NFlag, GetFpFlag(FPState.NFlag));
return;
}
switch (op.Sreg)
{
case 0b0000: // FPSID
throw new NotImplementedException("Supervisor Only");
case 0b0001: // FPSCR
EmitGetFpscr(context);
return;
case 0b0101: // MVFR2
throw new NotImplementedException("MVFR2");
case 0b0110: // MVFR1
throw new NotImplementedException("MVFR1");
case 0b0111: // MVFR0
throw new NotImplementedException("MVFR0");
case 0b1000: // FPEXC
throw new NotImplementedException("Supervisor Only");
default:
throw new NotImplementedException($"Unknown VMRS 0x{op.RawOpCode:X} at 0x{op.Address:X}.");
}
}
public static void Vmsr(ArmEmitterContext context)
{
OpCode32SimdSpecial op = (OpCode32SimdSpecial)context.CurrOp;
switch (op.Sreg)
{
case 0b0000: // FPSID
throw new NotImplementedException("Supervisor Only");
case 0b0001: // FPSCR
EmitSetFpscr(context);
return;
                case 0b0101: // MVFR2
                    throw new NotImplementedException("MVFR2");
                case 0b0110: // MVFR1
                    throw new NotImplementedException("MVFR1");
                case 0b0111: // MVFR0
                    throw new NotImplementedException("MVFR0");
                case 0b1000: // FPEXC
                    throw new NotImplementedException("Supervisor Only");
                default:
                    throw new NotImplementedException($"Unknown VMSR 0x{op.RawOpCode:X} at 0x{op.Address:X}.");
            }
        }

        private static void EmitSetNzcv(ArmEmitterContext context, Operand t)
        {
            Operand v = context.BitwiseAnd(context.ShiftRightUI(t, Const((int)PState.VFlag)), Const(1));
            Operand c = context.BitwiseAnd(context.ShiftRightUI(t, Const((int)PState.CFlag)), Const(1));
            Operand z = context.BitwiseAnd(context.ShiftRightUI(t, Const((int)PState.ZFlag)), Const(1));
            Operand n = context.BitwiseAnd(context.ShiftRightUI(t, Const((int)PState.NFlag)), Const(1));
            SetFlag(context, PState.VFlag, v);
            SetFlag(context, PState.CFlag, c);
            SetFlag(context, PState.ZFlag, z);
            SetFlag(context, PState.NFlag, n);
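            // Illustrative note (not from the original source): each flag is taken
            // from t with a shift-and-mask, (t >> bit) & 1. Assuming the PState flag
            // values equal the architectural bit positions (N = 31, Z = 30, C = 29,
            // V = 28), an input of t = 0x60000000 would yield n = 0, z = 1, c = 1,
            // v = 0.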
        }

        private static void EmitGetFpscr(ArmEmitterContext context)
        {
            OpCode32SimdSpecial op = (OpCode32SimdSpecial)context.CurrOp;
            Operand fpscr = Const(0);
            for (int flag = 0; flag < RegisterConsts.FpFlagsCount; flag++)
            {
                if (FPSCR.Mask.HasFlag((FPSCR)(1u << flag)))
                {
                    fpscr = context.BitwiseOr(fpscr, context.ShiftLeft(GetFpFlag((FPState)flag), Const(flag)));
                }
            }
            SetIntA32(context, op.Rt, fpscr);
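            // Sketch of the packing loop above in plain integer terms (hypothetical
            // names, for illustration only; flagValue[bit] holds 0 or 1):
            //
            //     uint packed = 0;
            //     for (int bit = 0; bit < flagCount; bit++)
            //         if ((mask & (1u << bit)) != 0)
            //             packed |= flagValue[bit] << bit;
            //
            // Bits outside FPSCR.Mask are never ORed in, so they read back as zero.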
        }

        private static void EmitSetFpscr(ArmEmitterContext context)
        {
            OpCode32SimdSpecial op = (OpCode32SimdSpecial)context.CurrOp;
            Operand fpscr = GetIntA32(context, op.Rt);
            for (int flag = 0; flag < RegisterConsts.FpFlagsCount; flag++)
            {
                if (FPSCR.Mask.HasFlag((FPSCR)(1u << flag)))
                {
                    SetFpFlag(context, (FPState)flag, context.BitwiseAnd(context.ShiftRightUI(fpscr, Const(flag)), Const(1)));
                }
            }
            context.UpdateArmFpMode();
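            // Illustrative note (not from the original source): the loop above is the
            // inverse of EmitGetFpscr. For every bit selected by FPSCR.Mask it stores
            // (fpscr >> bit) & 1 into the matching flag; bits outside the mask are
            // dropped. The UpdateArmFpMode call presumably re-syncs the host
            // floating-point mode with the newly written flags; that is an assumption,
            // as its implementation is not shown here.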
        }

        private static Operand EmitGetTpidrEl0(ArmEmitterContext context)
        {
            OpCode32System op = (OpCode32System)context.CurrOp;

            Operand nativeContext = context.LoadArgument(OperandType.I64, 0);

            return context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())));
        }

        private static Operand EmitGetTpidrroEl0(ArmEmitterContext context)
        {
            OpCode32System op = (OpCode32System)context.CurrOp;

            Operand nativeContext = context.LoadArgument(OperandType.I64, 0);

            return context.Load(OperandType.I64, context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrroEl0Offset())));
        }

        private static void EmitSetTpidrEl0(ArmEmitterContext context)
        {
            OpCode32System op = (OpCode32System)context.CurrOp;

            Operand value = GetIntA32(context, op.Rt);

            Operand nativeContext = context.LoadArgument(OperandType.I64, 0);

            context.Store(context.Add(nativeContext, Const((ulong)NativeContext.GetTpidrEl0Offset())), context.ZeroExtend32(OperandType.I64, value));
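            // Illustrative note (not from the original source): the guest register is
            // 32 bits wide, while the getters in this file load this context slot as
            // I64, hence the zero-extension before the store. For example, a guest
            // value of 0xFFFFFFFF is stored as 0x00000000FFFFFFFF rather than being
            // sign-extended.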
        }
    }
}