* Skip repeated cache tests between same sync
* Skip some checks for regions where just one resource is resident
* Dehardcode residency page size
* Some cleanup
* Directly send host address to buffer data
* Cleanup OGLShader
* Directly copy vertex and index data too
* Revert shader bind "cache"
* Address feedback
* Attempt to support deswizzle of sparse tiled textures
* Use correct frame buffer and viewport sizes, start cleaning up the copy engine
* Correct texture width alignment
* Use Scale/Translate registers to calculate viewport rect
* Allow texture copy between frame buffers
* Query multiple pages at once with GetWriteWatch
* Allow multiple buffer types to share the same page, always use the physical address as cache key
* Remove a variable that is no longer needed
* Implement stencil testing
* Implement depth testing
* Implement face culling
* Implement front face
* Comparison functions now take OGL enums too
* Fix front facing when flipping was used
* Add depth and stencil clear values
* Add WIP support for Vertex Program A, add the FADD_I32 shader instruction, small fix on FFMA_I encoding, nits
* Add separate subroutines for program A/B, and copy attributes to a temp
* Move finalization code to main
* Add new line after flip uniform on the shader
* Handle possible case where VPB uses an output attribute written by VPA but not available on the vbo
* Address PR feedback
* Call OpenGL functions directly, remove the pfifo thread, some refactoring
* Fix PerformanceStatistics calculating the wrong host fps, remove wait event on PFIFO as this wasn't exactly what was causing the freezes (may replace with an exception later)
* Organized the Gpu folder a bit more, renamed a few things, address PR feedback
* Make PerformanceStatistics thread safe
* Remove unused constant
* Use unlimited update rate for better perf
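
Note on the "Use Scale/Translate registers to calculate viewport rect" entry: a minimal sketch of the arithmetic, assuming the usual viewport transform `window = scale * ndc + translate` with NDC coordinates in [-1, 1]. The function and parameter names are hypothetical and are not taken from the codebase. The scale can be negative when the image is flipped, hence the absolute value.

```python
def viewport_rect(scale_x, scale_y, translate_x, translate_y):
    """Return (x, y, width, height) for a viewport given scale/translate values.

    Assumes window = scale * ndc + translate, with ndc in [-1, 1], so the
    rectangle is centered on the translate value and extends |scale| each way.
    """
    half_w = abs(scale_x)  # a negative scale means the axis is flipped
    half_h = abs(scale_y)
    return (translate_x - half_w, translate_y - half_h, 2 * half_w, 2 * half_h)


# Example values (hypothetical): a 1280x720 viewport with the Y axis flipped.
rect = viewport_rect(640.0, -360.0, 640.0, 360.0)
```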