diff --git a/Documentation/accel/amdxdna/amdnpu.rst b/Documentation/accel/amdxdna/amdnpu.rst new file mode 100644 index 0000000000000000000000000000000000000000..fbe0a75853452f6067a0faec538bc67c5beaa69d --- /dev/null +++ b/Documentation/accel/amdxdna/amdnpu.rst @@ -0,0 +1,281 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +.. include:: + +========= + AMD NPU +========= + +:Copyright: |copy| 2024 Advanced Micro Devices, Inc. +:Author: Sonal Santan + +Overview +======== + +AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator +integrated into AMD client APU. NPU enables efficient execution of Machine +Learning applications like CNN, LLM, etc. NPU is based on +`AMD XDNA Architecture`_. NPU is managed by **amdxdna** driver. + + +Hardware Description +==================== + +AMD NPU consists of the following hardware components: + +AMD XDNA Array +-------------- + +AMD XDNA Array comprises of 2D array of compute and memory tiles built with +`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and 1 +row of memory tile. Each compute tile contains a VLIW processor with its own +dedicated program and data memory. The memory tile acts as L2 memory. The 2D +array can be partitioned at a column boundary creating a spatially isolated +partition which can be bound to a workload context. + +Each column also has dedicated DMA engines to move data between host DDR and +memory tile. + +AMD Phoenix and AMD Hawk Point client NPU have a 4x5 topology, i.e., 4 rows of +compute tiles arranged into 5 columns. AMD Strix Point client APU have 4x8 +topology, i.e., 4 rows of compute tiles arranged into 8 columns. + +Shared L2 Memory +---------------- + +The single row of memory tiles create a pool of software managed on chip L2 +memory. DMA engines are used to move data between host DDR and memory tiles. +AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2 memory. +AMD Strix Point NPU has a total of 4096 KB of L2 memory. + +Microcontroller +--------------- + +A microcontroller runs NPU Firmware which is responsible for command processing, +XDNA Array partition setup, XDNA Array configuration, workload context +management and workload orchestration. + +NPU Firmware uses a dedicated instance of an isolated non-privileged context +called ERT to service each workload context. ERT is also used to execute user +provided ``ctrlcode`` associated with the workload context. + +NPU Firmware uses a single isolated privileged context called MERT to service +management commands from the amdxdna driver. + +Mailboxes +--------- + +The microcontroller and amdxdna driver use a privileged channel for management +tasks like setting up of contexts, telemetry, query, error handling, setting up +user channel, etc. As mentioned before, privileged channel requests are +serviced by MERT. The privileged channel is bound to a single mailbox. + +The microcontroller and amdxdna driver use a dedicated user channel per +workload context. The user channel is primarily used for submitting work to +the NPU. As mentioned before, a user channel requests are serviced by an +instance of ERT. Each user channel is bound to its own dedicated mailbox. + +PCIe EP +------- + +NPU is visible to the x86 host CPU as a PCIe device with multiple BARs and some +MSI-X interrupt vectors. NPU uses a dedicated high bandwidth SoC level fabric +for reading or writing into host memory. Each instance of ERT gets its own +dedicated MSI-X interrupt. MERT gets a single instance of MSI-X interrupt. + +The number of PCIe BARs varies depending on the specific device. Based on their +functions, PCIe BARs can generally be categorized into the following types. + +* PSP BAR: Expose the AMD PSP (Platform Security Processor) function +* SMU BAR: Expose the AMD SMU (System Management Unit) function +* SRAM BAR: Expose ring buffers for the mailbox +* Mailbox BAR: Expose the mailbox control registers (head, tail and ISR + registers etc.) +* Public Register BAR: Expose public registers + +On specific devices, the above-mentioned BAR type might be combined into a +single physical PCIe BAR. Or a module might require two physical PCIe BARs to +be fully functional. For example, + +* On AMD Phoenix device, PSP, SMU, Public Register BARs are on PCIe BAR index 0. +* On AMD Strix Point device, Mailbox and Public Register BARs are on PCIe BAR + index 0. The PSP has some registers in PCIe BAR index 0 (Public Register BAR) + and PCIe BAR index 4 (PSP BAR). + +Process Isolation Hardware +-------------------------- + +As explained before, XDNA Array can be dynamically divided into isolated +spatial partitions, each of which may have one or more columns. The spatial +partition is setup by programming the column isolation registers by the +microcontroller. Each spatial partition is associated with a PASID which is +also programmed by the microcontroller. Hence multiple spatial partitions in +the NPU can make concurrent host access protected by PASID. + +The NPU FW itself uses microcontroller MMU enforced isolated contexts for +servicing user and privileged channel requests. + + +Mixed Spatial and Temporal Scheduling +===================================== + +AMD XDNA architecture supports mixed spatial and temporal (time sharing) +scheduling of 2D array. This means that spatial partitions may be setup and +torn down dynamically to accommodate various workloads. A *spatial* partition +may be *exclusively* bound to one workload context while another partition may +be *temporarily* bound to more than one workload contexts. The microcontroller +updates the PASID for a temporarily shared partition to match the context that +has been bound to the partition at any moment. + +Resource Solver +--------------- + +The Resource Solver component of the amdxdna driver manages the allocation +of 2D array among various workloads. Every workload describes the number +of columns required to run the NPU binary in its metadata. The Resource Solver +component uses hints passed by the workload and its own heuristics to +decide 2D array (re)partition strategy and mapping of workloads for spatial and +temporal sharing of columns. The FW enforces the context-to-column(s) resource +binding decisions made by the Resource Solver. + +AMD Phoenix and AMD Hawk Point client NPU can support 6 concurrent workload +contexts. AMD Strix Point can support 16 concurrent workload contexts. + + +Application Binaries +==================== + +A NPU application workload is comprised of two separate binaries which are +generated by the NPU compiler. + +1. AMD XDNA Array overlay, which is used to configure a NPU spatial partition. + The overlay contains instructions for setting up the stream switch + configuration and ELF for the compute tiles. The overlay is loaded on the + spatial partition bound to the workload by the associated ERT instance. + Refer to the + `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more details. + +2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial + partition. ``ctrlcode`` is executed by the ERT running in protected mode on + the microcontroller in the context of the workload. ``ctrlcode`` is made up + of a sequence of opcodes named ``XAie_TxnOpcode``. Refer to the + `AI Engine Run Time`_ for more details. + + +Special Host Buffers +==================== + +Per-context Instruction Buffer +------------------------------ + +Every workload context uses a host resident 64 MB buffer which is memory +mapped into the ERT instance created to service the workload. The ``ctrlcode`` +used by the workload is copied into this special memory. This buffer is +protected by PASID like all other input/output buffers used by that workload. +Instruction buffer is also mapped into the user space of the workload. + +Global Privileged Buffer +------------------------ + +In addition, the driver also allocates a single buffer for maintenance tasks +like recording errors from MERT. This global buffer uses the global IOMMU +domain and is only accessible by MERT. + + +High-level Use Flow +=================== + +Here are the steps to run a workload on AMD NPU: + +1. Compile the workload into an overlay and a ``ctrlcode`` binary. +2. Userspace opens a context in the driver and provides the overlay. +3. The driver checks with the Resource Solver for provisioning a set of columns + for the workload. +4. The driver then asks MERT to create a context on the device with the desired + columns. +5. MERT then creates an instance of ERT. MERT also maps the Instruction Buffer + into ERT memory. +6. The userspace then copies the ``ctrlcode`` to the Instruction Buffer. +7. Userspace then creates a command buffer with pointers to input, output, and + instruction buffer; it then submits command buffer with the driver and goes + to sleep waiting for completion. +8. The driver sends the command over the Mailbox to ERT. +9. ERT *executes* the ``ctrlcode`` in the instruction buffer. +10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR while + AMD XDNA Array is running. +11. When ERT reaches end of ``ctrlcode``, it raises an MSI-X to send completion + signal to the driver which then wakes up the waiting workload. + + +Boot Flow +========= + +amdxdna driver uses PSP to securely load signed NPU FW and kick off the boot +of the NPU microcontroller. amdxdna driver then waits for the alive signal in +a special location on BAR 0. The NPU is switched off during SoC suspend and +turned on after resume where the NPU FW is reloaded, and the handshake is +performed again. + + +Userspace components +==================== + +Compiler +-------- + +Peano is an LLVM based open-source compiler for AMD XDNA Array compute tile +available at: +https://github.com/Xilinx/llvm-aie + +The open-source IREE compiler supports graph compilation of ML models for AMD +NPU and uses Peano underneath. It is available at: +https://github.com/nod-ai/iree-amd-aie + +Usermode Driver (UMD) +--------------------- + +The open-source XRT runtime stack interfaces with amdxdna kernel driver. XRT +can be found at: +https://github.com/Xilinx/XRT + +The open-source XRT shim for NPU is can be found at: +https://github.com/amd/xdna-driver + + +DMA Operation +============= + +DMA operation instructions are encoded in the ``ctrlcode`` as +``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``, DMA +operations between host DDR and L2 memory are effected. + + +Error Handling +============== + +When MERT detects an error in AMD XDNA Array, it pauses execution for that +workload context and sends an asynchronous message to the driver over the +privileged channel. The driver then sends a buffer pointer to MERT to capture +the register states for the partition bound to faulting workload context. The +driver then decodes the error by reading the contents of the buffer pointer. + + +Telemetry +========= + +MERT can report various kinds of telemetry information like the following: + +* L1 interrupt counter +* DMA counter +* Deep Sleep counter +* etc. + + +References +========== + +- `AMD XDNA Architecture `_ +- `AMD AI Engine Technology `_ +- `Peano `_ +- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) `_ +- `AI Engine Run Time `_ diff --git a/Documentation/accel/amdxdna/index.rst b/Documentation/accel/amdxdna/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..38c16939f1fce2b6c23a2294134bdf0a95dd10d4 --- /dev/null +++ b/Documentation/accel/amdxdna/index.rst @@ -0,0 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0-only + +===================================== + accel/amdxdna NPU driver +===================================== + +The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit). + +.. toctree:: + + amdnpu diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst index e94a0160b6a04b5f1eaa7b9cd7926d47571a4abb..bc85f26533d88891dde482f91e26c99991b22869 100644 --- a/Documentation/accel/index.rst +++ b/Documentation/accel/index.rst @@ -8,6 +8,7 @@ Compute Accelerators :maxdepth: 1 introduction + amdxdna/index qaic/index .. only:: subproject and html diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index dc663c0ca67067d041cf9a3767117eec765ccca8..3872bc6ec49d63772755504966ae70113f24a1db 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4822,6 +4822,11 @@ can be preempted anytime. Tasks will also yield contended spinlocks (if the critical section isn't explicitly preempt disabled beyond the lock itself). + lazy - Scheduler controlled. Similar to full but instead + of preempting the task immediately, the task gets + one HZ tick time to yield itself before the + preemption will be forced. One preemption is when the + task returns to user space. print-fatal-signals= [KNL] debug: print fatal signals diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml index 5b35adf34c7bd0c993ced68a72aae30f8293526d..6d11f5955b51a3d6e58f3f55b6160e0e1b3896f0 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml @@ -14,6 +14,8 @@ properties: enum: - brcm,bcm2711-hdmi0 - brcm,bcm2711-hdmi1 + - brcm,bcm2712-hdmi0 + - brcm,bcm2712-hdmi1 reg: items: diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml index 2e8566f47e63c0ea30b7855c0fd8a163e940a935..f91c9dce2a44d30819a828d6ab856371255c4dc2 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml @@ -13,6 +13,7 @@ properties: compatible: enum: - brcm,bcm2711-hvs + - brcm,bcm2712-hvs - brcm,bcm2835-hvs reg: @@ -36,7 +37,9 @@ if: properties: compatible: contains: - const: brcm,bcm2711-hvs + enum: + - brcm,bcm2711-hvs + - brcm,bcm2712-hvs then: required: diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml index 4e1ba03f6477f450868f2c73673295ada5e4c160..6b5b1d3fbc0bebf56f43de9dbdd2bb1b92f31ebb 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml @@ -20,6 +20,9 @@ properties: - brcm,bcm2711-pixelvalve2 - brcm,bcm2711-pixelvalve3 - brcm,bcm2711-pixelvalve4 + - brcm,bcm2712-pixelvalve0 + - brcm,bcm2712-pixelvalve1 + - brcm,bcm2712-pixelvalve2 reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml index bb186197e471eb4ff7d8c76e2af48975f6f270fe..16f45afd2badce61a6c0cfce0d3f170ca0e6fd27 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml @@ -11,7 +11,10 @@ maintainers: properties: compatible: - const: brcm,bcm2835-txp + enum: + - brcm,bcm2712-mop + - brcm,bcm2712-moplet + - brcm,bcm2835-txp reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml b/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml index 49a5e041aa49369a70ee716d02d5dc2a868aff28..2aa9d5d2afff0452feca3da31fe4fcc598ad2d2c 100644 --- a/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml +++ b/Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml @@ -18,6 +18,7 @@ properties: compatible: enum: - brcm,bcm2711-vc5 + - brcm,bcm2712-vc6 - brcm,bcm2835-vc4 - brcm,cygnus-vc4 diff --git a/Documentation/devicetree/bindings/display/bridge/renesas,dsi-csi2-tx.yaml b/Documentation/devicetree/bindings/display/bridge/renesas,dsi-csi2-tx.yaml index d33026f85e19135544d2076b163681bacf2e4c99..c167795c63f6406c8d272291bdefde4d530c8c32 100644 --- a/Documentation/devicetree/bindings/display/bridge/renesas,dsi-csi2-tx.yaml +++ b/Documentation/devicetree/bindings/display/bridge/renesas,dsi-csi2-tx.yaml @@ -19,6 +19,7 @@ properties: enum: - renesas,r8a779a0-dsi-csi2-tx # for V3U - renesas,r8a779g0-dsi-csi2-tx # for V4H + - renesas,r8a779h0-dsi-csi2-tx # for V4M reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml index 48a97bb3e2e0dcad5d5275b74526ddb93c68ee25..bad6f5c81b06d938d637714023bb69e352467acb 100644 --- a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml +++ b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml @@ -80,12 +80,12 @@ properties: - const: 4 port@2: - $ref: /schemas/graph.yaml#/properties/port description: Video port for LVDS Channel-A output (panel or bridge). + $ref: '#/$defs/lvds-port' port@3: - $ref: /schemas/graph.yaml#/properties/port description: Video port for LVDS Channel-B output (panel or bridge). + $ref: '#/$defs/lvds-port' required: - port@0 @@ -96,6 +96,36 @@ required: - reg - ports +$defs: + lvds-port: + $ref: /schemas/graph.yaml#/$defs/port-base + unevaluatedProperties: false + + properties: + endpoint: + $ref: /schemas/media/video-interfaces.yaml# + unevaluatedProperties: false + + properties: + ti,lvds-termination-ohms: + description: The value of near end differential termination in ohms. + enum: [100, 200] + default: 200 + + ti,lvds-vod-swing-clock-microvolt: + description: LVDS diferential output voltage for clock + lanes in microvolts. + $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 2 + maxItems: 2 + + ti,lvds-vod-swing-data-microvolt: + description: LVDS diferential output voltage for data + lanes in microvolts. + $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 2 + maxItems: 2 + allOf: - if: properties: diff --git a/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml b/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml index 5af2d69300751a2bd9448efc0fe88e72304fe449..fcb5834f799a8afd00dbb6a053d206d689b3f932 100644 --- a/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml +++ b/Documentation/devicetree/bindings/display/panel/panel-lvds.yaml @@ -42,6 +42,8 @@ properties: # Admatec 9904379 10.1" 1024x600 LVDS panel - admatec,9904379 - auo,b101ew05 + # AUO G084SN05 V9 8.4" 800x600 LVDS panel + - auo,g084sn05 # Chunghwa Picture Tubes Ltd. 7" WXGA (800x1280) TFT LCD LVDS panel - chunghwa,claa070wp03xg # EDT ETML0700Z9NDHA 7.0" WSVGA (1024x600) color TFT LCD LVDS panel diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml index 18b63f356bb4bbf6d2c8e58b13ebb14c5f4004ad..7cdd69ba6db0d8572f532358836862304156b079 100644 --- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml +++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml @@ -206,12 +206,16 @@ properties: - mitsubishi,aa070mc01-ca1 # Mitsubishi AA084XE01 8.4" XGA TFT LCD panel - mitsubishi,aa084xe01 + # Multi-Inno Technology Co.,Ltd MI0700A2T-30 7" 800x480 TFT Resistive Touch Module + - multi-inno,mi0700a2t-30 # Multi-Inno Technology Co.,Ltd MI0700S4T-6 7" 800x480 TFT Resistive Touch Module - multi-inno,mi0700s4t-6 # Multi-Inno Technology Co.,Ltd MI0800FT-9 8" 800x600 TFT Resistive Touch Module - multi-inno,mi0800ft-9 # Multi-Inno Technology Co.,Ltd MI1010AIT-1CP 10.1" 1280x800 LVDS IPS Cap Touch Mod. - multi-inno,mi1010ait-1cp + # Multi-Inno Technology Co.,Ltd MI1010Z1T-1CP11 10.1" 1024x600 TFT Resistive Touch Module + - multi-inno,mi1010z1t-1cp11 # NEC LCD Technologies, Ltd. 12.1" WXGA (1280x800) LVDS TFT LCD panel - nec,nl12880bc20-05 # NEC LCD Technologies,Ltd. WQVGA TFT LCD panel @@ -280,6 +284,8 @@ properties: - team-source-display,tst043015cmhx # Tianma Micro-electronics TM070JDHG30 7.0" WXGA TFT LCD panel - tianma,tm070jdhg30 + # Tianma Micro-electronics TM070JDHG34-00 7.0" WXGA (1280x800) LVDS TFT LCD panel + - tianma,tm070jdhg34-00 # Tianma Micro-electronics TM070JVHG33 7.0" WXGA TFT LCD panel - tianma,tm070jvhg33 # Tianma Micro-electronics TM070RVHG71 7.0" WXGA TFT LCD panel diff --git a/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml b/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml index 032f783eefc4508df35da10e53ca20ff8b1b9bdf..684c2896d2387077cf2d91cc5a025e0838c0f536 100644 --- a/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml +++ b/Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml @@ -23,6 +23,8 @@ properties: - samsung,atna45af01 # Samsung 14.5" 3K (2944x1840 pixels) eDP AMOLED panel - samsung,atna45dc02 + # Samsung 15.6" 3K (2880x1620 pixels) eDP AMOLED panel + - samsung,atna56ac03 - const: samsung,atna33xc20 enable-gpios: true diff --git a/Documentation/devicetree/bindings/display/renesas,du.yaml b/Documentation/devicetree/bindings/display/renesas,du.yaml index c5b9e6812bceccc92b78d0e8a83f610f32e8360b..3880b4c2ea9ad44ed1e154340522074a3c7c1259 100644 --- a/Documentation/devicetree/bindings/display/renesas,du.yaml +++ b/Documentation/devicetree/bindings/display/renesas,du.yaml @@ -41,6 +41,7 @@ properties: - renesas,du-r8a77995 # for R-Car D3 compatible DU - renesas,du-r8a779a0 # for R-Car V3U compatible DU - renesas,du-r8a779g0 # for R-Car V4H compatible DU + - renesas,du-r8a779h0 # for R-Car V4M compatible DU reg: maxItems: 1 @@ -69,14 +70,12 @@ properties: $ref: /schemas/graph.yaml#/properties/port unevaluatedProperties: false - required: - - port@0 - - port@1 - unevaluatedProperties: false renesas,cmms: $ref: /schemas/types.yaml#/definitions/phandle-array + minItems: 1 + maxItems: 4 items: maxItems: 1 description: @@ -85,6 +84,8 @@ properties: renesas,vsps: $ref: /schemas/types.yaml#/definitions/phandle-array + minItems: 1 + maxItems: 4 items: items: - description: phandle to VSP instance that serves the DU channel @@ -489,9 +490,11 @@ allOf: renesas,cmms: minItems: 4 + maxItems: 4 renesas,vsps: minItems: 4 + maxItems: 4 required: - clock-names @@ -558,9 +561,11 @@ allOf: renesas,cmms: minItems: 3 + maxItems: 3 renesas,vsps: minItems: 3 + maxItems: 3 required: - clock-names @@ -627,9 +632,11 @@ allOf: renesas,cmms: minItems: 3 + maxItems: 3 renesas,vsps: minItems: 3 + maxItems: 3 required: - clock-names @@ -683,7 +690,7 @@ allOf: - port@1 renesas,vsps: - minItems: 1 + maxItems: 1 required: - clock-names @@ -746,9 +753,11 @@ allOf: renesas,cmms: minItems: 2 + maxItems: 2 renesas,vsps: minItems: 2 + maxItems: 2 required: - clock-names @@ -799,6 +808,54 @@ allOf: renesas,vsps: minItems: 2 + maxItems: 2 + + required: + - clock-names + - interrupts + - resets + - reset-names + - renesas,vsps + + - if: + properties: + compatible: + contains: + enum: + - renesas,du-r8a779h0 + then: + properties: + clocks: + items: + - description: Functional clock + + clock-names: + items: + - const: du.0 + + interrupts: + maxItems: 1 + + resets: + maxItems: 1 + + reset-names: + items: + - const: du.0 + + ports: + properties: + port@0: + description: DSI 0 + port@1: false + port@2: false + port@3: false + + required: + - port@0 + + renesas,vsps: + maxItems: 1 required: - clock-names diff --git a/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3588-mipi-dsi2.yaml b/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3588-mipi-dsi2.yaml new file mode 100644 index 0000000000000000000000000000000000000000..53384e47b507d4d1d882b2df630f5c3ca45cc303 --- /dev/null +++ b/Documentation/devicetree/bindings/display/rockchip/rockchip,rk3588-mipi-dsi2.yaml @@ -0,0 +1,120 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/rockchip/rockchip,rk3588-mipi-dsi2.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Rockchip specific extensions to the Synopsys Designware MIPI DSI2 + +maintainers: + - Heiko Stuebner + +properties: + compatible: + enum: + - rockchip,rk3588-mipi-dsi2 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + maxItems: 2 + + clock-names: + items: + - const: pclk + - const: sys + + rockchip,grf: + $ref: /schemas/types.yaml#/definitions/phandle + description: + This SoC uses GRF regs to switch between vopl/vopb. + + phys: + maxItems: 1 + + phy-names: + const: dcphy + + power-domains: + maxItems: 1 + + resets: + maxItems: 1 + + reset-names: + const: apb + + ports: + $ref: /schemas/graph.yaml#/properties/ports + + properties: + port@0: + $ref: /schemas/graph.yaml#/properties/port + description: Input node to receive pixel data. + + port@1: + $ref: /schemas/graph.yaml#/properties/port + description: DSI output node to panel. + + required: + - port@0 + - port@1 + +required: + - compatible + - clocks + - clock-names + - rockchip,grf + - phys + - phy-names + - ports + - reg + +allOf: + - $ref: /schemas/display/dsi-controller.yaml# + +unevaluatedProperties: false + +examples: + - | + #include + #include + #include + #include + #include + #include + + soc { + #address-cells = <2>; + #size-cells = <2>; + + dsi@fde20000 { + compatible = "rockchip,rk3588-mipi-dsi2"; + reg = <0x0 0xfde20000 0x0 0x10000>; + interrupts = ; + clocks = <&cru PCLK_DSIHOST0>, <&cru CLK_DSIHOST0>; + clock-names = "pclk", "sys"; + resets = <&cru SRST_P_DSIHOST0>; + reset-names = "apb"; + power-domains = <&power RK3588_PD_VOP>; + phys = <&mipidcphy0 PHY_TYPE_DPHY>; + phy-names = "dcphy"; + rockchip,grf = <&vop_grf>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + dsi0_in: port@0 { + reg = <0>; + }; + + dsi0_out: port@1 { + reg = <1>; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml b/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml index 554f9d5809d49ca3a5eba223248127c17d32107c..6b754d4f260e1b226db44ec8f59648a633f2fab0 100644 --- a/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml +++ b/Documentation/devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml @@ -100,12 +100,16 @@ properties: - description: Video layer, plane 1 (U/V or U) - description: Video layer, plane 2 (V) - description: Graphics layer + - description: Audio channel 0 + - description: Audio channel 1 dma-names: items: - const: vid0 - const: vid1 - const: vid2 - const: gfx0 + - const: aud0 + - const: aud1 phys: description: PHYs for the DP data lanes @@ -194,11 +198,13 @@ examples: power-domains = <&pd_dp>; resets = <&reset ZYNQMP_RESET_DP>; - dma-names = "vid0", "vid1", "vid2", "gfx0"; + dma-names = "vid0", "vid1", "vid2", "gfx0", "aud0", "aud1"; dmas = <&xlnx_dpdma 0>, <&xlnx_dpdma 1>, <&xlnx_dpdma 2>, - <&xlnx_dpdma 3>; + <&xlnx_dpdma 3>, + <&xlnx_dpdma 4>, + <&xlnx_dpdma 5>; phys = <&psgtr 1 PHY_TYPE_DP 0 3>, <&psgtr 0 PHY_TYPE_DP 1 3>; diff --git a/Documentation/devicetree/bindings/phy/fsl,imx8mq-usb-phy.yaml b/Documentation/devicetree/bindings/phy/fsl,imx8mq-usb-phy.yaml index 6d6d211883aee2fbd02f8afa65f75ac0b9f61377..daee0c0fc915392d73570d8c36cee503fb571bd7 100644 --- a/Documentation/devicetree/bindings/phy/fsl,imx8mq-usb-phy.yaml +++ b/Documentation/devicetree/bindings/phy/fsl,imx8mq-usb-phy.yaml @@ -113,11 +113,8 @@ allOf: maxItems: 1 - if: - properties: - compatible: - contains: - enum: - - fsl,imx95-usb-phy + required: + - orientation-switch then: $ref: /schemas/usb/usb-switch.yaml# diff --git a/Documentation/devicetree/bindings/regulator/qcom,qca6390-pmu.yaml b/Documentation/devicetree/bindings/regulator/qcom,qca6390-pmu.yaml index ca401a209cca7efda8b9adb0400e4098b24f0e94..47c425c9fff16f8c6a530723fdc3c3ac62b55db6 100644 --- a/Documentation/devicetree/bindings/regulator/qcom,qca6390-pmu.yaml +++ b/Documentation/devicetree/bindings/regulator/qcom,qca6390-pmu.yaml @@ -18,6 +18,7 @@ properties: compatible: enum: - qcom,qca6390-pmu + - qcom,wcn6750-pmu - qcom,wcn6855-pmu - qcom,wcn7850-pmu @@ -27,6 +28,9 @@ properties: vddaon-supply: description: VDD_AON supply regulator handle + vddasd-supply: + description: VDD_ASD supply regulator handle + vdddig-supply: description: VDD_DIG supply regulator handle @@ -42,6 +46,9 @@ properties: vddio1p2-supply: description: VDD_IO_1P2 supply regulator handle + vddrfa0p8-supply: + description: VDD_RFA_0P8 supply regulator handle + vddrfa0p95-supply: description: VDD_RFA_0P95 supply regulator handle @@ -51,12 +58,18 @@ properties: vddrfa1p3-supply: description: VDD_RFA_1P3 supply regulator handle + vddrfa1p7-supply: + description: VDD_RFA_1P7 supply regulator handle + vddrfa1p8-supply: description: VDD_RFA_1P8 supply regulator handle vddrfa1p9-supply: description: VDD_RFA_1P9 supply regulator handle + vddrfa2p2-supply: + description: VDD_RFA_2P2 supply regulator handle + vddpcie1p3-supply: description: VDD_PCIE_1P3 supply regulator handle @@ -119,6 +132,20 @@ allOf: - vddpcie1p3-supply - vddpcie1p9-supply - vddio-supply + - if: + properties: + compatible: + contains: + const: qcom,wcn6750-pmu + then: + required: + - vddaon-supply + - vddasd-supply + - vddpmu-supply + - vddrfa0p8-supply + - vddrfa1p2-supply + - vddrfa1p7-supply + - vddrfa2p2-supply - if: properties: compatible: diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst index 8cf2f041af4704875910ce8228ae04615d0f21bd..b4ee25af1702b0019e0de5f9ee66d2dbdac2c664 100644 --- a/Documentation/gpu/drm-kms-helpers.rst +++ b/Documentation/gpu/drm-kms-helpers.rst @@ -221,6 +221,9 @@ Panel Helper Reference .. kernel-doc:: drivers/gpu/drm/drm_panel_orientation_quirks.c :export: +.. kernel-doc:: drivers/gpu/drm/drm_panel_backlight_quirks.c + :export: + Panel Self Refresh Helper Reference =================================== diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst index 3f07aa3b54325d308aa93135909bdaf4dcf7b302..92cfb25e64d3279c724d4657d5e380a9aa61a1f3 100644 --- a/Documentation/gpu/xe/index.rst +++ b/Documentation/gpu/xe/index.rst @@ -23,4 +23,5 @@ DG2, etc is provided to prototype the driver. xe_firmware xe_tile xe_debugging + xe_devcoredump xe-drm-usage-stats.rst diff --git a/Documentation/gpu/xe/xe_devcoredump.rst b/Documentation/gpu/xe/xe_devcoredump.rst new file mode 100644 index 0000000000000000000000000000000000000000..ae4ec0e34dc0142aa316184ff069c68225b578bc --- /dev/null +++ b/Documentation/gpu/xe/xe_devcoredump.rst @@ -0,0 +1,14 @@ +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT) + +================== +Xe Device Coredump +================== + +.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c + :doc: Xe device coredump + +Internal API +============ + +.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c + :internal: diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index eacf8983e2307476895a8def7363375f2af36d9d..dcbb6f6caf6de35a035a331c0dd1b024f04978a5 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -2170,6 +2170,12 @@ nexthop_compat_mode - BOOLEAN understands the new API, this sysctl can be disabled to achieve full performance benefits of the new API by disabling the nexthop expansion and extraneous notifications. + + Note that as a backward-compatible mode, dumping of modern features + might be incomplete or wrong. For example, resilient groups will not be + shown as such, but rather as just a list of next hops. Also weights that + do not fit into 8 bits will show incorrectly. + Default: true (backward compat mode) fib_notify_on_flag_change - INTEGER diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst index 53d1996460abfca00fd1b4a1cf81bf4608ca1439..12f429359a823eeb2ed34eff169dab22924e2d98 100644 --- a/Documentation/power/runtime_pm.rst +++ b/Documentation/power/runtime_pm.rst @@ -347,7 +347,9 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: `int pm_runtime_resume_and_get(struct device *dev);` - run pm_runtime_resume(dev) and if successful, increment the device's - usage counter; return the result of pm_runtime_resume + usage counter; returns 0 on success (whether or not the device's + runtime PM status was already 'active') or the error code from + pm_runtime_resume() on failure. `int pm_request_idle(struct device *dev);` - submit a request to execute the subsystem-level idle callback for the diff --git a/MAINTAINERS b/MAINTAINERS index 17daa9ee9384509c1ef3f2a3825a4594eab88741..63118f36e76e9386fb8176806c76f088bb3b6b0e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1194,6 +1194,17 @@ L: linux-spi@vger.kernel.org S: Supported F: drivers/spi/spi-amd.c +AMD XDNA DRIVER +M: Min Ma +M: Lizhi Hou +L: dri-devel@lists.freedesktop.org +S: Supported +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git +F: Documentation/accel/amdxdna/ +F: drivers/accel/amdxdna/ +F: include/trace/events/amdxdna.h +F: include/uapi/drm/amdxdna_accel.h + AMD XGBE DRIVER M: "Shyam Sundar S K" L: netdev@vger.kernel.org @@ -3893,7 +3904,7 @@ W: http://www.baycom.org/~tom/ham/ham.html F: drivers/net/hamradio/baycom* BCACHE (BLOCK LAYER CACHE) -M: Coly Li +M: Coly Li M: Kent Overstreet L: linux-bcache@vger.kernel.org S: Maintained @@ -7385,7 +7396,7 @@ L: virtualization@lists.linux.dev S: Obsolete W: https://www.kraxel.org/blog/2014/10/qemu-using-cirrus-considered-harmful/ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git -F: drivers/gpu/drm/tiny/cirrus.c +F: drivers/gpu/drm/tiny/cirrus-qemu.c DRM DRIVER FOR QXL VIRTUAL GPU M: Dave Airlie @@ -7796,6 +7807,7 @@ F: drivers/gpu/drm/rockchip/ DRM DRIVERS FOR STI M: Alain Volmat +M: Raphael Gallais-Pou L: dri-devel@lists.freedesktop.org S: Maintained T: git https://gitlab.freedesktop.org/drm/misc/kernel.git @@ -15345,7 +15357,7 @@ M: Daniel Machon M: UNGLinuxDriver@microchip.com L: netdev@vger.kernel.org S: Maintained -F: drivers/net/ethernet/microchip/lan969x/* +F: drivers/net/ethernet/microchip/sparx5/lan969x/* MICROCHIP LCDFB DRIVER M: Nicolas Ferre @@ -16337,6 +16349,7 @@ F: Documentation/networking/ F: Documentation/networking/net_cachelines/ F: Documentation/process/maintainer-netdev.rst F: Documentation/userspace-api/netlink/ +F: include/linux/ethtool.h F: include/linux/framer/framer-provider.h F: include/linux/framer/framer.h F: include/linux/in.h @@ -16351,6 +16364,7 @@ F: include/linux/rtnetlink.h F: include/linux/seq_file_net.h F: include/linux/skbuff* F: include/net/ +F: include/uapi/linux/ethtool.h F: include/uapi/linux/genetlink.h F: include/uapi/linux/hsr_netlink.h F: include/uapi/linux/in.h diff --git a/Makefile b/Makefile index 64c594bd7ad030d7b79a4982f0a7b1cc8315bbd2..e5b8a8832c0caa16fa05159611dc2529066095c1 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 6 PATCHLEVEL = 13 SUBLEVEL = 0 -EXTRAVERSION = -rc2 +EXTRAVERSION = -rc3 NAME = Baby Opossum Posse # *DOCUMENTATION* diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 5b24881420414729de85b3e3c42e3c1215c62ba8..ea5a1dcb133b4f02f96a6d5ba30e5098770845d1 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -297,7 +297,6 @@ config ARC_PAGE_SIZE_16K config ARC_PAGE_SIZE_4K bool "4KB" select HAVE_PAGE_SIZE_4KB - depends on ARC_MMU_V3 || ARC_MMU_V4 endchoice @@ -474,7 +473,8 @@ config HIGHMEM config ARC_HAS_PAE40 bool "Support for the 40-bit Physical Address Extension" - depends on ISA_ARCV2 + depends on ARC_MMU_V4 + depends on !ARC_PAGE_SIZE_4K select HIGHMEM select PHYS_ADDR_T_64BIT help diff --git a/arch/arc/Makefile b/arch/arc/Makefile index 2390dd042e3636442bfd7b123209f1216095fecd..fb98478ed1ab09b903cda854bff0d2851568b979 100644 --- a/arch/arc/Makefile +++ b/arch/arc/Makefile @@ -6,7 +6,7 @@ KBUILD_DEFCONFIG := haps_hs_smp_defconfig ifeq ($(CROSS_COMPILE),) -CROSS_COMPILE := $(call cc-cross-prefix, arc-linux- arceb-linux-) +CROSS_COMPILE := $(call cc-cross-prefix, arc-linux- arceb-linux- arc-linux-gnu-) endif cflags-y += -fno-common -pipe -fno-builtin -mmedium-calls -D__linux__ diff --git a/arch/arc/boot/dts/axc001.dtsi b/arch/arc/boot/dts/axc001.dtsi index 2a151607b08057c515563472ed1129507225f9f8..88bcc7ab6f5af0d266774102d42c0a6e9eb1d938 100644 --- a/arch/arc/boot/dts/axc001.dtsi +++ b/arch/arc/boot/dts/axc001.dtsi @@ -54,7 +54,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <30>; + ngpios = <30>; reg = <0>; interrupt-controller; #interrupt-cells = <2>; diff --git a/arch/arc/boot/dts/axc003.dtsi b/arch/arc/boot/dts/axc003.dtsi index c0a812674ce9edac177c5ae98889766b5e052ad6..9a2dc39a5cff4df32035eec42e77e0b8d7e39d3e 100644 --- a/arch/arc/boot/dts/axc003.dtsi +++ b/arch/arc/boot/dts/axc003.dtsi @@ -62,7 +62,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <30>; + ngpios = <30>; reg = <0>; interrupt-controller; #interrupt-cells = <2>; diff --git a/arch/arc/boot/dts/axc003_idu.dtsi b/arch/arc/boot/dts/axc003_idu.dtsi index 67556f4b70574e6ca0556b5b20141d27eb6759e6..f31382cb8be4bbcffaf732ebfca9a5f418579094 100644 --- a/arch/arc/boot/dts/axc003_idu.dtsi +++ b/arch/arc/boot/dts/axc003_idu.dtsi @@ -69,7 +69,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <30>; + ngpios = <30>; reg = <0>; interrupt-controller; #interrupt-cells = <2>; diff --git a/arch/arc/boot/dts/axs10x_mb.dtsi b/arch/arc/boot/dts/axs10x_mb.dtsi index b644353853049579b07d0d8b5250128dccdff356..3add2fe257f8b8dc4f6cfe1f3dbd4f6a52302e62 100644 --- a/arch/arc/boot/dts/axs10x_mb.dtsi +++ b/arch/arc/boot/dts/axs10x_mb.dtsi @@ -250,7 +250,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <32>; + ngpios = <32>; reg = <0>; }; @@ -258,7 +258,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <8>; + ngpios = <8>; reg = <1>; }; @@ -266,7 +266,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <8>; + ngpios = <8>; reg = <2>; }; }; @@ -281,7 +281,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <30>; + ngpios = <30>; reg = <0>; }; @@ -289,7 +289,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <10>; + ngpios = <10>; reg = <1>; }; @@ -297,7 +297,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <8>; + ngpios = <8>; reg = <2>; }; }; diff --git a/arch/arc/boot/dts/hsdk.dts b/arch/arc/boot/dts/hsdk.dts index 41b980df862b14aa2a97867d05051f0f29434b65..98bb850722a4b1859cd646da517213bccdc223d8 100644 --- a/arch/arc/boot/dts/hsdk.dts +++ b/arch/arc/boot/dts/hsdk.dts @@ -308,7 +308,7 @@ compatible = "snps,dw-apb-gpio-port"; gpio-controller; #gpio-cells = <2>; - snps,nr-gpios = <24>; + ngpios = <24>; reg = <0>; }; }; diff --git a/arch/arc/include/asm/arcregs.h b/arch/arc/include/asm/arcregs.h index 4b13f60fe7cae8ac292c855d33ab74634606ec69..005d9e4d187a0440246b131c1ee0852110336dc4 100644 --- a/arch/arc/include/asm/arcregs.h +++ b/arch/arc/include/asm/arcregs.h @@ -146,7 +146,7 @@ #ifndef __ASSEMBLY__ -#include +#include /* Helpers */ #define TO_KB(bytes) ((bytes) >> 10) diff --git a/arch/arc/include/asm/cmpxchg.h b/arch/arc/include/asm/cmpxchg.h index 58045c898340458da6e8e02f653ace55a05253c9..76f43db0890fcd0caa79a0644731723b734a306a 100644 --- a/arch/arc/include/asm/cmpxchg.h +++ b/arch/arc/include/asm/cmpxchg.h @@ -48,7 +48,7 @@ \ switch(sizeof((_p_))) { \ case 1: \ - _prev_ = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *)_p_, (uintptr_t)_o_, (uintptr_t)_n_); \ + _prev_ = (__typeof__(*(ptr)))cmpxchg_emu_u8((volatile u8 *__force)_p_, (uintptr_t)_o_, (uintptr_t)_n_); \ break; \ case 4: \ _prev_ = __cmpxchg(_p_, _o_, _n_); \ diff --git a/arch/arc/include/asm/mmu-arcv2.h b/arch/arc/include/asm/mmu-arcv2.h index d85dc07219071f34cf5a0b6e84bc8f6ab14ef36b..41412642f27963f704b601896096737e689e27d5 100644 --- a/arch/arc/include/asm/mmu-arcv2.h +++ b/arch/arc/include/asm/mmu-arcv2.h @@ -9,7 +9,7 @@ #ifndef _ASM_ARC_MMU_ARCV2_H #define _ASM_ARC_MMU_ARCV2_H -#include +#include /* * TLB Management regs diff --git a/arch/arc/net/bpf_jit_arcv2.c b/arch/arc/net/bpf_jit_arcv2.c index 4458e409ca0a84dacfaee10423ad54eaa2752fa0..6d989b6d88c69b736a701b5cbd057b754806f0ee 100644 --- a/arch/arc/net/bpf_jit_arcv2.c +++ b/arch/arc/net/bpf_jit_arcv2.c @@ -2916,7 +2916,7 @@ bool check_jmp_32(u32 curr_off, u32 targ_off, u8 cond) addendum = (cond == ARC_CC_AL) ? 0 : INSN_len_normal; disp = get_displacement(curr_off + addendum, targ_off); - if (ARC_CC_AL) + if (cond == ARC_CC_AL) return is_valid_far_disp(disp); else return is_valid_near_disp(disp); diff --git a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi index 467f084c6469dab9ac952c45ac66ea599dc923ce..e11d282462bd3d65f577ca0ab4a31afc50e27a7c 100644 --- a/arch/arm64/boot/dts/xilinx/zynqmp.dtsi +++ b/arch/arm64/boot/dts/xilinx/zynqmp.dtsi @@ -1306,11 +1306,14 @@ "dp_vtc_pixel_clk_in"; power-domains = <&zynqmp_firmware PD_DP>; resets = <&zynqmp_reset ZYNQMP_RESET_DP>; - dma-names = "vid0", "vid1", "vid2", "gfx0"; + dma-names = "vid0", "vid1", "vid2", "gfx0", + "aud0", "aud1"; dmas = <&zynqmp_dpdma ZYNQMP_DPDMA_VIDEO0>, <&zynqmp_dpdma ZYNQMP_DPDMA_VIDEO1>, <&zynqmp_dpdma ZYNQMP_DPDMA_VIDEO2>, - <&zynqmp_dpdma ZYNQMP_DPDMA_GRAPHICS>; + <&zynqmp_dpdma ZYNQMP_DPDMA_GRAPHICS>, + <&zynqmp_dpdma ZYNQMP_DPDMA_AUDIO0>, + <&zynqmp_dpdma ZYNQMP_DPDMA_AUDIO1>; ports { #address-cells = <1>; diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h index 85ef966c08cd231a5872d2cf988470391ad2973e..4ef52d7245bbb76a2a6f37d2512caf6728377bcf 100644 --- a/arch/arm64/include/asm/el2_setup.h +++ b/arch/arm64/include/asm/el2_setup.h @@ -87,7 +87,7 @@ 1 << PMSCR_EL2_PA_SHIFT) msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter .Lskip_spe_el2_\@: - mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) + mov x0, #MDCR_EL2_E2PB_MASK orr x2, x2, x0 // If we don't have VHE, then // use EL1&0 translation. @@ -100,7 +100,7 @@ and x0, x0, TRBIDR_EL1_P cbnz x0, .Lskip_trace_\@ // If TRBE is available at EL2 - mov x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT) + mov x0, #MDCR_EL2_E2TB_MASK orr x2, x2, x0 // allow the EL1&0 translation // to own it. diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S index 65f76064c86b24db53795ea420efe56f4a21172d..ae990da1eae5a97221b87fdd4b839eccfeaa125c 100644 --- a/arch/arm64/kernel/hyp-stub.S +++ b/arch/arm64/kernel/hyp-stub.S @@ -114,8 +114,8 @@ SYM_CODE_START_LOCAL(__finalise_el2) // Use EL2 translations for SPE & TRBE and disable access from EL1 mrs x0, mdcr_el2 - bic x0, x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) - bic x0, x0, #(MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT) + bic x0, x0, #MDCR_EL2_E2PB_MASK + bic x0, x0, #MDCR_EL2_E2TB_MASK msr mdcr_el2, x0 // Transfer the MM state from EL1 to EL2 diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 14ac6fdb872b9672e4b16a097f1b577aae8dec50..37e24f1bd2279f541c38e4c84a030da7877691bd 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1462,10 +1462,33 @@ static int setup_return(struct pt_regs *regs, struct ksignal *ksig, struct rt_sigframe_user_layout *user, int usig) { __sigrestore_t sigtramp; + int err; + + if (ksig->ka.sa.sa_flags & SA_RESTORER) + sigtramp = ksig->ka.sa.sa_restorer; + else + sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp); + + err = gcs_signal_entry(sigtramp, ksig); + if (err) + return err; + + /* + * We must not fail from this point onwards. We are going to update + * registers, including SP, in order to invoke the signal handler. If + * we failed and attempted to deliver a nested SIGSEGV to a handler + * after that point, the subsequent sigreturn would end up restoring + * the (partial) state for the original signal handler. + */ regs->regs[0] = usig; + if (ksig->ka.sa.sa_flags & SA_SIGINFO) { + regs->regs[1] = (unsigned long)&user->sigframe->info; + regs->regs[2] = (unsigned long)&user->sigframe->uc; + } regs->sp = (unsigned long)user->sigframe; regs->regs[29] = (unsigned long)&user->next_frame->fp; + regs->regs[30] = (unsigned long)sigtramp; regs->pc = (unsigned long)ksig->ka.sa.sa_handler; /* @@ -1506,14 +1529,7 @@ static int setup_return(struct pt_regs *regs, struct ksignal *ksig, sme_smstop(); } - if (ksig->ka.sa.sa_flags & SA_RESTORER) - sigtramp = ksig->ka.sa.sa_restorer; - else - sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp); - - regs->regs[30] = (unsigned long)sigtramp; - - return gcs_signal_entry(sigtramp, ksig); + return 0; } static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, @@ -1537,14 +1553,16 @@ static int setup_rt_frame(int usig, struct ksignal *ksig, sigset_t *set, err |= __save_altstack(&frame->uc.uc_stack, regs->sp); err |= setup_sigframe(&user, regs, set, &ua_state); - if (err == 0) { + if (ksig->ka.sa.sa_flags & SA_SIGINFO) + err |= copy_siginfo_to_user(&frame->info, &ksig->info); + + if (err == 0) err = setup_return(regs, ksig, &user, usig); - if (ksig->ka.sa.sa_flags & SA_SIGINFO) { - err |= copy_siginfo_to_user(&frame->info, &ksig->info); - regs->regs[1] = (unsigned long)&frame->info; - regs->regs[2] = (unsigned long)&frame->uc; - } - } + + /* + * We must not fail if setup_return() succeeded - see comment at the + * beginning of setup_return(). + */ if (err == 0) set_handler_user_access_state(); diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index caef85462acb6f37c38f45246d08858728f1cc4f..1d9d51d7627fd48257fe7d7be00093d884041055 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -26,7 +26,6 @@ enum kunwind_source { KUNWIND_SOURCE_CALLER, KUNWIND_SOURCE_TASK, KUNWIND_SOURCE_REGS_PC, - KUNWIND_SOURCE_REGS_LR, }; union unwind_flags { @@ -138,8 +137,10 @@ kunwind_recover_return_address(struct kunwind_state *state) orig_pc = ftrace_graph_ret_addr(state->task, &state->graph_idx, state->common.pc, (void *)state->common.fp); - if (WARN_ON_ONCE(state->common.pc == orig_pc)) + if (state->common.pc == orig_pc) { + WARN_ON_ONCE(state->task == current); return -EINVAL; + } state->common.pc = orig_pc; state->flags.fgraph = 1; } @@ -178,23 +179,8 @@ int kunwind_next_regs_pc(struct kunwind_state *state) state->regs = regs; state->common.pc = regs->pc; state->common.fp = regs->regs[29]; - state->source = KUNWIND_SOURCE_REGS_PC; - return 0; -} - -static __always_inline int -kunwind_next_regs_lr(struct kunwind_state *state) -{ - /* - * The stack for the regs was consumed by kunwind_next_regs_pc(), so we - * cannot consume that again here, but we know the regs are safe to - * access. - */ - state->common.pc = state->regs->regs[30]; - state->common.fp = state->regs->regs[29]; state->regs = NULL; - state->source = KUNWIND_SOURCE_REGS_LR; - + state->source = KUNWIND_SOURCE_REGS_PC; return 0; } @@ -215,12 +201,12 @@ kunwind_next_frame_record_meta(struct kunwind_state *state) case FRAME_META_TYPE_FINAL: if (meta == &task_pt_regs(tsk)->stackframe) return -ENOENT; - WARN_ON_ONCE(1); + WARN_ON_ONCE(tsk == current); return -EINVAL; case FRAME_META_TYPE_PT_REGS: return kunwind_next_regs_pc(state); default: - WARN_ON_ONCE(1); + WARN_ON_ONCE(tsk == current); return -EINVAL; } } @@ -274,11 +260,8 @@ kunwind_next(struct kunwind_state *state) case KUNWIND_SOURCE_FRAME: case KUNWIND_SOURCE_CALLER: case KUNWIND_SOURCE_TASK: - case KUNWIND_SOURCE_REGS_LR: - err = kunwind_next_frame_record(state); - break; case KUNWIND_SOURCE_REGS_PC: - err = kunwind_next_regs_lr(state); + err = kunwind_next_frame_record(state); break; default: err = -EINVAL; @@ -436,7 +419,6 @@ static const char *state_source_string(const struct kunwind_state *state) case KUNWIND_SOURCE_CALLER: return "C"; case KUNWIND_SOURCE_TASK: return "T"; case KUNWIND_SOURCE_REGS_PC: return "P"; - case KUNWIND_SOURCE_REGS_LR: return "L"; default: return "U"; } } diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c index 8c5d7990e5b31207e9ded0da9319591d3c69e47f..3d7eb395e33ddb0b0a825e83d17881c9132f1fba 100644 --- a/arch/arm64/kvm/at.c +++ b/arch/arm64/kvm/at.c @@ -739,8 +739,15 @@ static u64 compute_par_s12(struct kvm_vcpu *vcpu, u64 s1_par, final_attr = s1_parattr; break; default: - /* MemAttr[2]=0, Device from S2 */ - final_attr = s2_memattr & GENMASK(1,0) << 2; + /* + * MemAttr[2]=0, Device from S2. + * + * FWB does not influence the way that stage 1 + * memory types and attributes are combined + * with stage 2 Device type and attributes. + */ + final_attr = min(s2_memattr_to_attr(s2_memattr), + s1_parattr); } } else { /* Combination of R_HMNDG, R_TNHFM and R_GQFSF */ diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c index 01616c39a810777a123b6a3a5da33a16b6768ad7..071993c16de81ca0b0181c56d0598b1b026ae018 100644 --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c @@ -126,7 +126,7 @@ static void pvm_init_traps_aa64dfr0(struct kvm_vcpu *vcpu) /* Trap SPE */ if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_PMSVer), feature_ids)) { mdcr_set |= MDCR_EL2_TPMS; - mdcr_clear |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT; + mdcr_clear |= MDCR_EL2_E2PB_MASK; } /* Trap Trace Filter */ @@ -143,7 +143,7 @@ static void pvm_init_traps_aa64dfr0(struct kvm_vcpu *vcpu) /* Trap External Trace */ if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_EL1_ExtTrcBuff), feature_ids)) - mdcr_clear |= MDCR_EL2_E2TB_MASK << MDCR_EL2_E2TB_SHIFT; + mdcr_clear |= MDCR_EL2_E2TB_MASK; vcpu->arch.mdcr_el2 |= mdcr_set; vcpu->arch.mdcr_el2 &= ~mdcr_clear; diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 83c6b4a07ef56cf0ed9c8751ec80686f45dca6b2..e2a5c2918d9e5af9ee8526dce221d6e80c292d03 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -2618,7 +2618,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_WRITABLE(ID_AA64MMFR0_EL1, ~(ID_AA64MMFR0_EL1_RES0 | ID_AA64MMFR0_EL1_TGRAN4_2 | ID_AA64MMFR0_EL1_TGRAN64_2 | - ID_AA64MMFR0_EL1_TGRAN16_2)), + ID_AA64MMFR0_EL1_TGRAN16_2 | + ID_AA64MMFR0_EL1_ASIDBITS)), ID_WRITABLE(ID_AA64MMFR1_EL1, ~(ID_AA64MMFR1_EL1_RES0 | ID_AA64MMFR1_EL1_HCX | ID_AA64MMFR1_EL1_TWED | diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index f4c4494645c34bf145baedf63267cbdee16cbe4c..fb96802799c6fedb5df3a5cbcf0b275fb86398a1 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -608,12 +608,22 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its, lockdep_assert_held(&its->its_lock); vgic_get_irq_kref(irq); + old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT); + + /* + * Put the reference taken on @irq if the store fails. Intentionally do + * not return the error as the translation cache is best effort. + */ + if (xa_is_err(old)) { + vgic_put_irq(kvm, irq); + return; + } + /* * We could have raced with another CPU caching the same * translation behind our back, ensure we don't leak a * reference if that is the case. */ - old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT); if (old) vgic_put_irq(kvm, old); } diff --git a/arch/openrisc/kernel/entry.S b/arch/openrisc/kernel/entry.S index 440711d7bf40e3538dcaf9a64c2451079253b9d7..ce6f2b08a35edc6cccd03937344db736031ee9ec 100644 --- a/arch/openrisc/kernel/entry.S +++ b/arch/openrisc/kernel/entry.S @@ -239,6 +239,8 @@ handler: ;\ /* =====================================================[ exceptions] === */ + __REF + /* ---[ 0x100: RESET exception ]----------------------------------------- */ EXCEPTION_ENTRY(_tng_kernel_start) diff --git a/arch/openrisc/kernel/head.S b/arch/openrisc/kernel/head.S index 439e00f81e5dde182244d7e90756157c8fc02b7f..bd760066f1cdc45045a430602c1379856cd71beb 100644 --- a/arch/openrisc/kernel/head.S +++ b/arch/openrisc/kernel/head.S @@ -26,15 +26,15 @@ #include #include -#define tophys(rd,rs) \ - l.movhi rd,hi(-KERNELBASE) ;\ +#define tophys(rd,rs) \ + l.movhi rd,hi(-KERNELBASE) ;\ l.add rd,rd,rs -#define CLEAR_GPR(gpr) \ +#define CLEAR_GPR(gpr) \ l.movhi gpr,0x0 -#define LOAD_SYMBOL_2_GPR(gpr,symbol) \ - l.movhi gpr,hi(symbol) ;\ +#define LOAD_SYMBOL_2_GPR(gpr,symbol) \ + l.movhi gpr,hi(symbol) ;\ l.ori gpr,gpr,lo(symbol) @@ -326,21 +326,21 @@ l.addi r1,r1,-(INT_FRAME_SIZE) ;\ /* r1 is KSP, r30 is __pa(KSP) */ ;\ tophys (r30,r1) ;\ - l.sw PT_GPR12(r30),r12 ;\ + l.sw PT_GPR12(r30),r12 ;\ l.mfspr r12,r0,SPR_EPCR_BASE ;\ l.sw PT_PC(r30),r12 ;\ l.mfspr r12,r0,SPR_ESR_BASE ;\ l.sw PT_SR(r30),r12 ;\ /* save r31 */ ;\ EXCEPTION_T_LOAD_GPR30(r12) ;\ - l.sw PT_GPR30(r30),r12 ;\ + l.sw PT_GPR30(r30),r12 ;\ /* save r10 as was prior to exception */ ;\ EXCEPTION_T_LOAD_GPR10(r12) ;\ - l.sw PT_GPR10(r30),r12 ;\ - /* save PT_SP as was prior to exception */ ;\ + l.sw PT_GPR10(r30),r12 ;\ + /* save PT_SP as was prior to exception */ ;\ EXCEPTION_T_LOAD_SP(r12) ;\ l.sw PT_SP(r30),r12 ;\ - l.sw PT_GPR13(r30),r13 ;\ + l.sw PT_GPR13(r30),r13 ;\ /* --> */ ;\ /* save exception r4, set r4 = EA */ ;\ l.sw PT_GPR4(r30),r4 ;\ @@ -357,6 +357,8 @@ /* =====================================================[ exceptions] === */ + __HEAD + /* ---[ 0x100: RESET exception ]----------------------------------------- */ .org 0x100 /* Jump to .init code at _start which lives in the .head section @@ -394,7 +396,7 @@ _dispatch_do_ipage_fault: .org 0x500 EXCEPTION_HANDLE(_timer_handler) -/* ---[ 0x600: Alignment exception ]-------------------------------------- */ +/* ---[ 0x600: Alignment exception ]------------------------------------- */ .org 0x600 EXCEPTION_HANDLE(_alignment_handler) @@ -424,7 +426,7 @@ _dispatch_do_ipage_fault: .org 0xc00 EXCEPTION_HANDLE(_sys_call_handler) -/* ---[ 0xd00: Floating point exception ]--------------------------------- */ +/* ---[ 0xd00: Floating point exception ]-------------------------------- */ .org 0xd00 EXCEPTION_HANDLE(_fpe_trap_handler) @@ -506,10 +508,10 @@ _dispatch_do_ipage_fault: /* .text*/ -/* This early stuff belongs in HEAD, but some of the functions below definitely +/* This early stuff belongs in the .init.text section, but some of the functions below definitely * don't... */ - __HEAD + __INIT .global _start _start: /* Init r0 to zero as per spec */ @@ -816,7 +818,7 @@ secondary_start: #endif -/* ========================================[ cache ]=== */ +/* ==========================================================[ cache ]=== */ /* alignment here so we don't change memory offsets with * memory controller defined diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S index bc13060478373d785f07305adc2b321e9ffb1ac3..049bff45f612658c3e5658b0ee1a9a86561fda2a 100644 --- a/arch/openrisc/kernel/vmlinux.lds.S +++ b/arch/openrisc/kernel/vmlinux.lds.S @@ -50,6 +50,7 @@ SECTIONS .text : AT(ADDR(.text) - LOAD_OFFSET) { _stext = .; + HEAD_TEXT TEXT_TEXT SCHED_TEXT LOCK_TEXT @@ -83,8 +84,6 @@ SECTIONS . = ALIGN(PAGE_SIZE); __init_begin = .; - HEAD_TEXT_SECTION - /* Page aligned */ INIT_TEXT_SECTION(PAGE_SIZE) diff --git a/arch/riscv/include/asm/kfence.h b/arch/riscv/include/asm/kfence.h index 7388edd88986f94b7167ba4b84f802ac5fba0b91..d08bf7fb3aee610fd5475144f8a092603558abd2 100644 --- a/arch/riscv/include/asm/kfence.h +++ b/arch/riscv/include/asm/kfence.h @@ -22,7 +22,9 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect) else set_pte(pte, __pte(pte_val(ptep_get(pte)) | _PAGE_PRESENT)); - flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + preempt_disable(); + local_flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + preempt_enable(); return true; } diff --git a/arch/riscv/kernel/jump_label.c b/arch/riscv/kernel/jump_label.c index 6eee6f736f68764c996ef3838f2671d9a1f4266f..654ed159c830b3d5e34ac58bf367107066eb73a1 100644 --- a/arch/riscv/kernel/jump_label.c +++ b/arch/riscv/kernel/jump_label.c @@ -36,9 +36,15 @@ bool arch_jump_label_transform_queue(struct jump_entry *entry, insn = RISCV_INSN_NOP; } - mutex_lock(&text_mutex); - patch_insn_write(addr, &insn, sizeof(insn)); - mutex_unlock(&text_mutex); + if (early_boot_irqs_disabled) { + riscv_patch_in_stop_machine = 1; + patch_insn_write(addr, &insn, sizeof(insn)); + riscv_patch_in_stop_machine = 0; + } else { + mutex_lock(&text_mutex); + patch_insn_write(addr, &insn, sizeof(insn)); + mutex_unlock(&text_mutex); + } return true; } diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c index 016b48fcd6f27ec6c129ad2bc51507ff148a0e13..45010e71df86ce40f99fabf03103a49ad1041f3d 100644 --- a/arch/riscv/kernel/setup.c +++ b/arch/riscv/kernel/setup.c @@ -227,7 +227,7 @@ static void __init init_resources(void) static void __init parse_dtb(void) { /* Early scan of device tree from init memory */ - if (early_init_dt_scan(dtb_early_va, __pa(dtb_early_va))) { + if (early_init_dt_scan(dtb_early_va, dtb_early_pa)) { const char *name = of_flat_dt_get_machine_name(); if (name) { diff --git a/arch/riscv/kvm/aia.c b/arch/riscv/kvm/aia.c index dcced4db7fe8c3b6e4d1813a07b346fafad9270a..19afd1f235377682488304d28900bfcbc93e4e61 100644 --- a/arch/riscv/kvm/aia.c +++ b/arch/riscv/kvm/aia.c @@ -590,7 +590,7 @@ void kvm_riscv_aia_enable(void) csr_set(CSR_HIE, BIT(IRQ_S_GEXT)); /* Enable IRQ filtering for overflow interrupt only if sscofpmf is present */ if (__riscv_isa_extension_available(NULL, RISCV_ISA_EXT_SSCOFPMF)) - csr_write(CSR_HVIEN, BIT(IRQ_PMU_OVF)); + csr_set(CSR_HVIEN, BIT(IRQ_PMU_OVF)); } void kvm_riscv_aia_disable(void) diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index 0e8c20adcd98df5874fecb70573f9375ca333407..fc53ce748c8049a6aa79c68de9b8526731540ea3 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1566,7 +1566,7 @@ static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd) pmd_clear(pmd); } -static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud) +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, bool is_vmemmap) { struct page *page = pud_page(*pud); struct ptdesc *ptdesc = page_ptdesc(page); @@ -1579,7 +1579,8 @@ static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud) return; } - pagetable_pmd_dtor(ptdesc); + if (!is_vmemmap) + pagetable_pmd_dtor(ptdesc); if (PageReserved(page)) free_reserved_page(page); else @@ -1703,7 +1704,7 @@ static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, un remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap); if (pgtable_l4_enabled) - free_pmd_table(pmd_base, pudp); + free_pmd_table(pmd_base, pudp, is_vmemmap); } } diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index bb284aff7bfd6d7474a7563f931d59d6300a1473..2e1e268460500a44aa5f62394c23014ab7a0088d 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -7135,6 +7135,7 @@ __init int intel_pmu_init(void) case INTEL_METEORLAKE: case INTEL_METEORLAKE_L: + case INTEL_ARROWLAKE_U: intel_pmu_init_hybrid(hybrid_big_small); x86_pmu.pebs_latency_data = cmt_latency_data; diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 8afc4ad3cd16eb8039c75f89e5e44ead7db1fdb1..1a4b326ca2ce1d8fd4ecc008dcf15c0f7bbf5bd6 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1489,7 +1489,7 @@ void intel_pmu_pebs_enable(struct perf_event *event) * hence we need to drain when changing said * size. */ - intel_pmu_drain_large_pebs(cpuc); + intel_pmu_drain_pebs_buffer(); adaptive_pebs_record_size_update(); wrmsrl(MSR_PEBS_DATA_CFG, pebs_data_cfg); cpuc->active_pebs_data_cfg = pebs_data_cfg; diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S index 1236f25fc8d1206854bc378301564701c1085e30..540443d699e3c243ede2e0df33b281d27c331918 100644 --- a/arch/x86/kernel/relocate_kernel_64.S +++ b/arch/x86/kernel/relocate_kernel_64.S @@ -13,6 +13,7 @@ #include #include #include +#include /* * Must be relocatable PIC code callable as a C function, in particular diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 097bdc022d0f476cb9e1f4d149f9a5a082f90898..ae0b438a2c991ceb6a28af84914808e4df913b5b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -36,6 +36,26 @@ u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly; EXPORT_SYMBOL_GPL(kvm_cpu_caps); +struct cpuid_xstate_sizes { + u32 eax; + u32 ebx; + u32 ecx; +}; + +static struct cpuid_xstate_sizes xstate_sizes[XFEATURE_MAX] __ro_after_init; + +void __init kvm_init_xstate_sizes(void) +{ + u32 ign; + int i; + + for (i = XFEATURE_YMM; i < ARRAY_SIZE(xstate_sizes); i++) { + struct cpuid_xstate_sizes *xs = &xstate_sizes[i]; + + cpuid_count(0xD, i, &xs->eax, &xs->ebx, &xs->ecx, &ign); + } +} + u32 xstate_required_size(u64 xstate_bv, bool compacted) { int feature_bit = 0; @@ -44,14 +64,15 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted) xstate_bv &= XFEATURE_MASK_EXTEND; while (xstate_bv) { if (xstate_bv & 0x1) { - u32 eax, ebx, ecx, edx, offset; - cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx); + struct cpuid_xstate_sizes *xs = &xstate_sizes[feature_bit]; + u32 offset; + /* ECX[1]: 64B alignment in compacted form */ if (compacted) - offset = (ecx & 0x2) ? ALIGN(ret, 64) : ret; + offset = (xs->ecx & 0x2) ? ALIGN(ret, 64) : ret; else - offset = ebx; - ret = max(ret, offset + eax); + offset = xs->ebx; + ret = max(ret, offset + xs->eax); } xstate_bv >>= 1; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index c8dc66eddefda7506fac15332bdc58cca263607d..f16a7b2c2adcf439d65b781d92e6ecbff43204a0 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -31,6 +31,7 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu, bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx, bool exact_only); +void __init kvm_init_xstate_sizes(void); u32 xstate_required_size(u64 xstate_bv, bool compacted); int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2e713480933a1378398d5cab50db8e40396502d6..c8160baf3838515caaf96b6d231c00242c93748a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13997,6 +13997,8 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault); static int __init kvm_x86_init(void) { + kvm_init_xstate_sizes(); + kvm_mmu_x86_module_init(); mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible(); return 0; diff --git a/block/bio.c b/block/bio.c index 699a78c85c75660fed2463759e2491bb12c63453..d5bdc31d88d327bac40535059080e3f571741811 100644 --- a/block/bio.c +++ b/block/bio.c @@ -1171,7 +1171,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty) } EXPORT_SYMBOL_GPL(__bio_release_pages); -void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter) +void bio_iov_bvec_set(struct bio *bio, const struct iov_iter *iter) { WARN_ON_ONCE(bio->bi_max_vecs); diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index e68c725cf8d975f53703ecf6e6c50594076204c8..45a395862fbc88f448fe281eeac620710bc1587d 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1324,10 +1324,14 @@ void blkcg_unpin_online(struct cgroup_subsys_state *blkcg_css) struct blkcg *blkcg = css_to_blkcg(blkcg_css); do { + struct blkcg *parent; + if (!refcount_dec_and_test(&blkcg->online_pin)) break; + + parent = blkcg_parent(blkcg); blkcg_destroy_blkgs(blkcg); - blkcg = blkcg_parent(blkcg); + blkcg = parent; } while (blkcg); } diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 384aa15e8260bd4d83b00f1c03efb87829950327..a5894ec9696e7e8c1011cbda1d849562e1732d31 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1098,7 +1098,14 @@ static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse, inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum, iocg->child_active_sum); } else { - inuse = clamp_t(u32, inuse, 1, active); + /* + * It may be tempting to turn this into a clamp expression with + * a lower limit of 1 but active may be 0, which cannot be used + * as an upper limit in that situation. This expression allows + * active to clamp inuse unless it is 0, in which case inuse + * becomes 1. + */ + inuse = min(inuse, active) ?: 1; } iocg->last_inuse = iocg->inuse; diff --git a/block/blk-map.c b/block/blk-map.c index b5fd1d8574615e28da206b6e12a1b81009279fff..894009b2d881be743fb46ad8b5282ef565ec8482 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -574,7 +574,7 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter) bio = blk_rq_map_bio_alloc(rq, 0, GFP_KERNEL); if (!bio) return -ENOMEM; - bio_iov_bvec_set(bio, (struct iov_iter *)iter); + bio_iov_bvec_set(bio, iter); /* check that the data layout matches the hardware restrictions */ ret = bio_split_rw_at(bio, lim, &nsegs, max_bytes); diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index 156e9bb07abf1a4801f35f7197674b723155a5d8..cd5ea6eaa76b098b7f713fbf3d42461232dc8cc9 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -275,15 +275,13 @@ void blk_mq_sysfs_unregister_hctxs(struct request_queue *q) struct blk_mq_hw_ctx *hctx; unsigned long i; - mutex_lock(&q->sysfs_dir_lock); + lockdep_assert_held(&q->sysfs_dir_lock); + if (!q->mq_sysfs_init_done) - goto unlock; + return; queue_for_each_hw_ctx(q, hctx, i) blk_mq_unregister_hctx(hctx); - -unlock: - mutex_unlock(&q->sysfs_dir_lock); } int blk_mq_sysfs_register_hctxs(struct request_queue *q) @@ -292,9 +290,10 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q) unsigned long i; int ret = 0; - mutex_lock(&q->sysfs_dir_lock); + lockdep_assert_held(&q->sysfs_dir_lock); + if (!q->mq_sysfs_init_done) - goto unlock; + return ret; queue_for_each_hw_ctx(q, hctx, i) { ret = blk_mq_register_hctx(hctx); @@ -302,8 +301,5 @@ int blk_mq_sysfs_register_hctxs(struct request_queue *q) break; } -unlock: - mutex_unlock(&q->sysfs_dir_lock); - return ret; } diff --git a/block/blk-mq.c b/block/blk-mq.c index aa340b097b6e45e25f52f3762e1e3b4b920e702b..6b6111513986f225d97430fafdfbc106ff8ba0b5 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1544,19 +1544,17 @@ static void blk_mq_requeue_work(struct work_struct *work) while (!list_empty(&rq_list)) { rq = list_entry(rq_list.next, struct request, queuelist); + list_del_init(&rq->queuelist); /* - * If RQF_DONTPREP ist set, the request has been started by the + * If RQF_DONTPREP is set, the request has been started by the * driver already and might have driver-specific data allocated * already. Insert it into the hctx dispatch list to avoid * block layer merges for the request. */ - if (rq->rq_flags & RQF_DONTPREP) { - list_del_init(&rq->queuelist); + if (rq->rq_flags & RQF_DONTPREP) blk_mq_request_bypass_insert(rq, 0); - } else { - list_del_init(&rq->queuelist); + else blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD); - } } while (!list_empty(&flush_list)) { @@ -4455,7 +4453,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, unsigned long i, j; /* protect against switching io scheduler */ - mutex_lock(&q->sysfs_lock); + lockdep_assert_held(&q->sysfs_lock); + for (i = 0; i < set->nr_hw_queues; i++) { int old_node; int node = blk_mq_get_hctx_node(set, i); @@ -4488,7 +4487,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, xa_for_each_start(&q->hctx_table, j, hctx, j) blk_mq_exit_hctx(q, set, hctx, j); - mutex_unlock(&q->sysfs_lock); /* unregister cpuhp callbacks for exited hctxs */ blk_mq_remove_hw_queues_cpuhp(q); @@ -4520,10 +4518,14 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, xa_init(&q->hctx_table); + mutex_lock(&q->sysfs_lock); + blk_mq_realloc_hw_ctxs(set, q); if (!q->nr_hw_queues) goto err_hctxs; + mutex_unlock(&q->sysfs_lock); + INIT_WORK(&q->timeout_work, blk_mq_timeout_work); blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ); @@ -4542,6 +4544,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, return 0; err_hctxs: + mutex_unlock(&q->sysfs_lock); blk_mq_release(q); err_exit: q->mq_ops = NULL; @@ -4922,12 +4925,12 @@ static bool blk_mq_elv_switch_none(struct list_head *head, return false; /* q->elevator needs protection from ->sysfs_lock */ - mutex_lock(&q->sysfs_lock); + lockdep_assert_held(&q->sysfs_lock); /* the check has to be done with holding sysfs_lock */ if (!q->elevator) { kfree(qe); - goto unlock; + goto out; } INIT_LIST_HEAD(&qe->node); @@ -4937,9 +4940,7 @@ static bool blk_mq_elv_switch_none(struct list_head *head, __elevator_get(qe->type); list_add(&qe->node, head); elevator_disable(q); -unlock: - mutex_unlock(&q->sysfs_lock); - +out: return true; } @@ -4968,11 +4969,9 @@ static void blk_mq_elv_switch_back(struct list_head *head, list_del(&qe->node); kfree(qe); - mutex_lock(&q->sysfs_lock); elevator_switch(q, t); /* drop the reference acquired in blk_mq_elv_switch_none */ elevator_put(t); - mutex_unlock(&q->sysfs_lock); } static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, @@ -4992,8 +4991,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues) return; - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { + mutex_lock(&q->sysfs_dir_lock); + mutex_lock(&q->sysfs_lock); blk_mq_freeze_queue(q); + } /* * Switch IO scheduler to 'none', cleaning up the data associated * with the previous scheduler. We will switch back once we are done @@ -5049,8 +5051,11 @@ switch_back: list_for_each_entry(q, &set->tag_list, tag_set_list) blk_mq_elv_switch_back(&head, q); - list_for_each_entry(q, &set->tag_list, tag_set_list) + list_for_each_entry(q, &set->tag_list, tag_set_list) { blk_mq_unfreeze_queue(q); + mutex_unlock(&q->sysfs_lock); + mutex_unlock(&q->sysfs_dir_lock); + } /* Free the excess tags when nr_hw_queues shrink. */ for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++) diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 4241aea84161ca7611bbdad70bd524ab79922898..64f70c713d2f92ca604fbebbb95f99de11197f6a 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -263,7 +263,7 @@ static ssize_t queue_nr_zones_show(struct gendisk *disk, char *page) static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page) { - return queue_var_show(blk_queue_passthrough_stat(disk->queue), page); + return queue_var_show(!!blk_queue_passthrough_stat(disk->queue), page); } static ssize_t queue_iostats_passthrough_store(struct gendisk *disk, @@ -706,11 +706,11 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr, if (entry->load_module) entry->load_module(disk, page, length); - blk_mq_freeze_queue(q); mutex_lock(&q->sysfs_lock); + blk_mq_freeze_queue(q); res = entry->store(disk, page, length); - mutex_unlock(&q->sysfs_lock); blk_mq_unfreeze_queue(q); + mutex_unlock(&q->sysfs_lock); return res; } diff --git a/block/blk-zoned.c b/block/blk-zoned.c index 263e28b720538f0d441f1584ef82539717437ac5..84da1eadff642ded541eb1bb1c3112a7fe445010 100644 --- a/block/blk-zoned.c +++ b/block/blk-zoned.c @@ -41,7 +41,6 @@ static const char *const zone_cond_name[] = { /* * Per-zone write plug. * @node: hlist_node structure for managing the plug using a hash table. - * @link: To list the plug in the zone write plug error list of the disk. * @ref: Zone write plug reference counter. A zone write plug reference is * always at least 1 when the plug is hashed in the disk plug hash table. * The reference is incremented whenever a new BIO needing plugging is @@ -63,7 +62,6 @@ static const char *const zone_cond_name[] = { */ struct blk_zone_wplug { struct hlist_node node; - struct list_head link; refcount_t ref; spinlock_t lock; unsigned int flags; @@ -80,8 +78,8 @@ struct blk_zone_wplug { * - BLK_ZONE_WPLUG_PLUGGED: Indicates that the zone write plug is plugged, * that is, that write BIOs are being throttled due to a write BIO already * being executed or the zone write plug bio list is not empty. - * - BLK_ZONE_WPLUG_ERROR: Indicates that a write error happened which will be - * recovered with a report zone to update the zone write pointer offset. + * - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone + * write pointer offset and need to update it. * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed * from the disk hash table and that the initial reference to the zone * write plug set when the plug was first added to the hash table has been @@ -91,11 +89,9 @@ struct blk_zone_wplug { * freed once all remaining references from BIOs or functions are dropped. */ #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) -#define BLK_ZONE_WPLUG_ERROR (1U << 1) +#define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) #define BLK_ZONE_WPLUG_UNHASHED (1U << 2) -#define BLK_ZONE_WPLUG_BUSY (BLK_ZONE_WPLUG_PLUGGED | BLK_ZONE_WPLUG_ERROR) - /** * blk_zone_cond_str - Return string XXX in BLK_ZONE_COND_XXX. * @zone_cond: BLK_ZONE_COND_XXX. @@ -115,6 +111,30 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond) } EXPORT_SYMBOL_GPL(blk_zone_cond_str); +struct disk_report_zones_cb_args { + struct gendisk *disk; + report_zones_cb user_cb; + void *user_data; +}; + +static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, + struct blk_zone *zone); + +static int disk_report_zones_cb(struct blk_zone *zone, unsigned int idx, + void *data) +{ + struct disk_report_zones_cb_args *args = data; + struct gendisk *disk = args->disk; + + if (disk->zone_wplugs_hash) + disk_zone_wplug_sync_wp_offset(disk, zone); + + if (!args->user_cb) + return 0; + + return args->user_cb(zone, idx, args->user_data); +} + /** * blkdev_report_zones - Get zones information * @bdev: Target block device @@ -139,6 +159,11 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector, { struct gendisk *disk = bdev->bd_disk; sector_t capacity = get_capacity(disk); + struct disk_report_zones_cb_args args = { + .disk = disk, + .user_cb = cb, + .user_data = data, + }; if (!bdev_is_zoned(bdev) || WARN_ON_ONCE(!disk->fops->report_zones)) return -EOPNOTSUPP; @@ -146,7 +171,8 @@ int blkdev_report_zones(struct block_device *bdev, sector_t sector, if (!nr_zones || sector >= capacity) return 0; - return disk->fops->report_zones(disk, sector, nr_zones, cb, data); + return disk->fops->report_zones(disk, sector, nr_zones, + disk_report_zones_cb, &args); } EXPORT_SYMBOL_GPL(blkdev_report_zones); @@ -427,7 +453,7 @@ static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) { if (refcount_dec_and_test(&zwplug->ref)) { WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); - WARN_ON_ONCE(!list_empty(&zwplug->link)); + WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)); call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); @@ -441,8 +467,8 @@ static inline bool disk_should_remove_zone_wplug(struct gendisk *disk, if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) return false; - /* If the zone write plug is still busy, it cannot be removed. */ - if (zwplug->flags & BLK_ZONE_WPLUG_BUSY) + /* If the zone write plug is still plugged, it cannot be removed. */ + if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) return false; /* @@ -525,12 +551,11 @@ again: return NULL; INIT_HLIST_NODE(&zwplug->node); - INIT_LIST_HEAD(&zwplug->link); refcount_set(&zwplug->ref, 2); spin_lock_init(&zwplug->lock); zwplug->flags = 0; zwplug->zone_no = zno; - zwplug->wp_offset = sector & (disk->queue->limits.chunk_sectors - 1); + zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); bio_list_init(&zwplug->bio_list); INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); zwplug->disk = disk; @@ -574,124 +599,81 @@ static void disk_zone_wplug_abort(struct blk_zone_wplug *zwplug) } /* - * Abort (fail) all plugged BIOs of a zone write plug that are not aligned - * with the assumed write pointer location of the zone when the BIO will - * be unplugged. + * Set a zone write plug write pointer offset to the specified value. + * This aborts all plugged BIOs, which is fine as this function is called for + * a zone reset operation, a zone finish operation or if the zone needs a wp + * update from a report zone after a write error. */ -static void disk_zone_wplug_abort_unaligned(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - unsigned int wp_offset = zwplug->wp_offset; - struct bio_list bl = BIO_EMPTY_LIST; - struct bio *bio; - - while ((bio = bio_list_pop(&zwplug->bio_list))) { - if (disk_zone_is_full(disk, zwplug->zone_no, wp_offset) || - (bio_op(bio) != REQ_OP_ZONE_APPEND && - bio_offset_from_zone_start(bio) != wp_offset)) { - blk_zone_wplug_bio_io_error(zwplug, bio); - continue; - } - - wp_offset += bio_sectors(bio); - bio_list_add(&bl, bio); - } - - bio_list_merge(&zwplug->bio_list, &bl); -} - -static inline void disk_zone_wplug_set_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) +static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, + struct blk_zone_wplug *zwplug, + unsigned int wp_offset) { - unsigned long flags; + lockdep_assert_held(&zwplug->lock); - if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) - return; + /* Update the zone write pointer and abort all plugged BIOs. */ + zwplug->flags &= ~BLK_ZONE_WPLUG_NEED_WP_UPDATE; + zwplug->wp_offset = wp_offset; + disk_zone_wplug_abort(zwplug); /* - * At this point, we already have a reference on the zone write plug. - * However, since we are going to add the plug to the disk zone write - * plugs work list, increase its reference count. This reference will - * be dropped in disk_zone_wplugs_work() once the error state is - * handled, or in disk_zone_wplug_clear_error() if the zone is reset or - * finished. + * The zone write plug now has no BIO plugged: remove it from the + * hash table so that it cannot be seen. The plug will be freed + * when the last reference is dropped. */ - zwplug->flags |= BLK_ZONE_WPLUG_ERROR; - refcount_inc(&zwplug->ref); - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - list_add_tail(&zwplug->link, &disk->zone_wplugs_err_list); - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); + if (disk_should_remove_zone_wplug(disk, zwplug)) + disk_remove_zone_wplug(disk, zwplug); } -static inline void disk_zone_wplug_clear_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) +static unsigned int blk_zone_wp_offset(struct blk_zone *zone) { - unsigned long flags; - - if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) - return; - - /* - * We are racing with the error handling work which drops the reference - * on the zone write plug after handling the error state. So remove the - * plug from the error list and drop its reference count only if the - * error handling has not yet started, that is, if the zone write plug - * is still listed. - */ - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - if (!list_empty(&zwplug->link)) { - list_del_init(&zwplug->link); - zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; - disk_put_zone_wplug(zwplug); + switch (zone->cond) { + case BLK_ZONE_COND_IMP_OPEN: + case BLK_ZONE_COND_EXP_OPEN: + case BLK_ZONE_COND_CLOSED: + return zone->wp - zone->start; + case BLK_ZONE_COND_FULL: + return zone->len; + case BLK_ZONE_COND_EMPTY: + return 0; + case BLK_ZONE_COND_NOT_WP: + case BLK_ZONE_COND_OFFLINE: + case BLK_ZONE_COND_READONLY: + default: + /* + * Conventional, offline and read-only zones do not have a valid + * write pointer. + */ + return UINT_MAX; } - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); } -/* - * Set a zone write plug write pointer offset to either 0 (zone reset case) - * or to the zone size (zone finish case). This aborts all plugged BIOs, which - * is fine to do as doing a zone reset or zone finish while writes are in-flight - * is a mistake from the user which will most likely cause all plugged BIOs to - * fail anyway. - */ -static void disk_zone_wplug_set_wp_offset(struct gendisk *disk, - struct blk_zone_wplug *zwplug, - unsigned int wp_offset) +static void disk_zone_wplug_sync_wp_offset(struct gendisk *disk, + struct blk_zone *zone) { + struct blk_zone_wplug *zwplug; unsigned long flags; - spin_lock_irqsave(&zwplug->lock, flags); - - /* - * Make sure that a BIO completion or another zone reset or finish - * operation has not already removed the plug from the hash table. - */ - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) { - spin_unlock_irqrestore(&zwplug->lock, flags); + zwplug = disk_get_zone_wplug(disk, zone->start); + if (!zwplug) return; - } - /* Update the zone write pointer and abort all plugged BIOs. */ - zwplug->wp_offset = wp_offset; - disk_zone_wplug_abort(zwplug); + spin_lock_irqsave(&zwplug->lock, flags); + if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) + disk_zone_wplug_set_wp_offset(disk, zwplug, + blk_zone_wp_offset(zone)); + spin_unlock_irqrestore(&zwplug->lock, flags); - /* - * Updating the write pointer offset puts back the zone - * in a good state. So clear the error flag and decrement the - * error count if we were in error state. - */ - disk_zone_wplug_clear_error(disk, zwplug); + disk_put_zone_wplug(zwplug); +} - /* - * The zone write plug now has no BIO plugged: remove it from the - * hash table so that it cannot be seen. The plug will be freed - * when the last reference is dropped. - */ - if (disk_should_remove_zone_wplug(disk, zwplug)) - disk_remove_zone_wplug(disk, zwplug); +static int disk_zone_sync_wp_offset(struct gendisk *disk, sector_t sector) +{ + struct disk_report_zones_cb_args args = { + .disk = disk, + }; - spin_unlock_irqrestore(&zwplug->lock, flags); + return disk->fops->report_zones(disk, sector, 1, + disk_report_zones_cb, &args); } static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, @@ -700,6 +682,7 @@ static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, struct gendisk *disk = bio->bi_bdev->bd_disk; sector_t sector = bio->bi_iter.bi_sector; struct blk_zone_wplug *zwplug; + unsigned long flags; /* Conventional zones cannot be reset nor finished. */ if (!bdev_zone_is_seq(bio->bi_bdev, sector)) { @@ -707,6 +690,15 @@ static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, return true; } + /* + * No-wait reset or finish BIOs do not make much sense as the callers + * issue these as blocking operations in most cases. To avoid issues + * the BIO execution potentially failing with BLK_STS_AGAIN, warn about + * REQ_NOWAIT being set and ignore that flag. + */ + if (WARN_ON_ONCE(bio->bi_opf & REQ_NOWAIT)) + bio->bi_opf &= ~REQ_NOWAIT; + /* * If we have a zone write plug, set its write pointer offset to 0 * (reset case) or to the zone size (finish case). This will abort all @@ -716,7 +708,9 @@ static bool blk_zone_wplug_handle_reset_or_finish(struct bio *bio, */ zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { + spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, wp_offset); + spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } @@ -727,6 +721,7 @@ static bool blk_zone_wplug_handle_reset_all(struct bio *bio) { struct gendisk *disk = bio->bi_bdev->bd_disk; struct blk_zone_wplug *zwplug; + unsigned long flags; sector_t sector; /* @@ -738,7 +733,9 @@ static bool blk_zone_wplug_handle_reset_all(struct bio *bio) sector += disk->queue->limits.chunk_sectors) { zwplug = disk_get_zone_wplug(disk, sector); if (zwplug) { + spin_lock_irqsave(&zwplug->lock, flags); disk_zone_wplug_set_wp_offset(disk, zwplug, 0); + spin_unlock_irqrestore(&zwplug->lock, flags); disk_put_zone_wplug(zwplug); } } @@ -746,9 +743,25 @@ static bool blk_zone_wplug_handle_reset_all(struct bio *bio) return false; } -static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, - struct bio *bio, unsigned int nr_segs) +static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, + struct blk_zone_wplug *zwplug) { + /* + * Take a reference on the zone write plug and schedule the submission + * of the next plugged BIO. blk_zone_wplug_bio_work() will release the + * reference we take here. + */ + WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); + refcount_inc(&zwplug->ref); + queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); +} + +static inline void disk_zone_wplug_add_bio(struct gendisk *disk, + struct blk_zone_wplug *zwplug, + struct bio *bio, unsigned int nr_segs) +{ + bool schedule_bio_work = false; + /* * Grab an extra reference on the BIO request queue usage counter. * This reference will be reused to submit a request for the BIO for @@ -764,6 +777,16 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, */ bio_clear_polled(bio); + /* + * REQ_NOWAIT BIOs are always handled using the zone write plug BIO + * work, which can block. So clear the REQ_NOWAIT flag and schedule the + * work if this is the first BIO we are plugging. + */ + if (bio->bi_opf & REQ_NOWAIT) { + schedule_bio_work = !(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); + bio->bi_opf &= ~REQ_NOWAIT; + } + /* * Reuse the poll cookie field to store the number of segments when * split to the hardware limits. @@ -777,6 +800,11 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, * at the tail of the list to preserve the sequential write order. */ bio_list_add(&zwplug->bio_list, bio); + + zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; + + if (schedule_bio_work) + disk_zone_wplug_schedule_bio_work(disk, zwplug); } /* @@ -889,13 +917,23 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug, { struct gendisk *disk = bio->bi_bdev->bd_disk; + /* + * If we lost track of the zone write pointer due to a write error, + * the user must either execute a report zones, reset the zone or finish + * the to recover a reliable write pointer position. Fail BIOs if the + * user did not do that as we cannot handle emulated zone append + * otherwise. + */ + if (zwplug->flags & BLK_ZONE_WPLUG_NEED_WP_UPDATE) + return false; + /* * Check that the user is not attempting to write to a full zone. * We know such BIO will fail, and that would potentially overflow our * write pointer offset beyond the end of the zone. */ if (disk_zone_wplug_is_full(disk, zwplug)) - goto err; + return false; if (bio_op(bio) == REQ_OP_ZONE_APPEND) { /* @@ -914,24 +952,18 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug, bio_set_flag(bio, BIO_EMULATES_ZONE_APPEND); } else { /* - * Check for non-sequential writes early because we avoid a - * whole lot of error handling trouble if we don't send it off - * to the driver. + * Check for non-sequential writes early as we know that BIOs + * with a start sector not unaligned to the zone write pointer + * will fail. */ if (bio_offset_from_zone_start(bio) != zwplug->wp_offset) - goto err; + return false; } /* Advance the zone write pointer offset. */ zwplug->wp_offset += bio_sectors(bio); return true; - -err: - /* We detected an invalid write BIO: schedule error recovery. */ - disk_zone_wplug_set_error(disk, zwplug); - kblockd_schedule_work(&disk->zone_wplugs_work); - return false; } static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) @@ -970,7 +1002,10 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) zwplug = disk_get_and_lock_zone_wplug(disk, sector, gfp_mask, &flags); if (!zwplug) { - bio_io_error(bio); + if (bio->bi_opf & REQ_NOWAIT) + bio_wouldblock_error(bio); + else + bio_io_error(bio); return true; } @@ -978,18 +1013,20 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) bio_set_flag(bio, BIO_ZONE_WRITE_PLUGGING); /* - * If the zone is already plugged or has a pending error, add the BIO - * to the plug BIO list. Otherwise, plug and let the BIO execute. + * If the zone is already plugged, add the BIO to the plug BIO list. + * Do the same for REQ_NOWAIT BIOs to ensure that we will not see a + * BLK_STS_AGAIN failure if we let the BIO execute. + * Otherwise, plug and let the BIO execute. */ - if (zwplug->flags & BLK_ZONE_WPLUG_BUSY) + if ((zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) || + (bio->bi_opf & REQ_NOWAIT)) goto plug; - /* - * If an error is detected when preparing the BIO, add it to the BIO - * list so that error recovery can deal with it. - */ - if (!blk_zone_wplug_prepare_bio(zwplug, bio)) - goto plug; + if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { + spin_unlock_irqrestore(&zwplug->lock, flags); + bio_io_error(bio); + return true; + } zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; @@ -998,8 +1035,7 @@ static bool blk_zone_wplug_handle_write(struct bio *bio, unsigned int nr_segs) return false; plug: - zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; - blk_zone_wplug_add_bio(zwplug, bio, nr_segs); + disk_zone_wplug_add_bio(disk, zwplug, bio, nr_segs); spin_unlock_irqrestore(&zwplug->lock, flags); @@ -1083,19 +1119,6 @@ bool blk_zone_plug_bio(struct bio *bio, unsigned int nr_segs) } EXPORT_SYMBOL_GPL(blk_zone_plug_bio); -static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - /* - * Take a reference on the zone write plug and schedule the submission - * of the next plugged BIO. blk_zone_wplug_bio_work() will release the - * reference we take here. - */ - WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); - refcount_inc(&zwplug->ref); - queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); -} - static void disk_zone_wplug_unplug_bio(struct gendisk *disk, struct blk_zone_wplug *zwplug) { @@ -1103,16 +1126,6 @@ static void disk_zone_wplug_unplug_bio(struct gendisk *disk, spin_lock_irqsave(&zwplug->lock, flags); - /* - * If we had an error, schedule error recovery. The recovery work - * will restart submission of plugged BIOs. - */ - if (zwplug->flags & BLK_ZONE_WPLUG_ERROR) { - spin_unlock_irqrestore(&zwplug->lock, flags); - kblockd_schedule_work(&disk->zone_wplugs_work); - return; - } - /* Schedule submission of the next plugged BIO if we have one. */ if (!bio_list_empty(&zwplug->bio_list)) { disk_zone_wplug_schedule_bio_work(disk, zwplug); @@ -1155,12 +1168,13 @@ void blk_zone_write_plug_bio_endio(struct bio *bio) } /* - * If the BIO failed, mark the plug as having an error to trigger - * recovery. + * If the BIO failed, abort all plugged BIOs and mark the plug as + * needing a write pointer update. */ if (bio->bi_status != BLK_STS_OK) { spin_lock_irqsave(&zwplug->lock, flags); - disk_zone_wplug_set_error(disk, zwplug); + disk_zone_wplug_abort(zwplug); + zwplug->flags |= BLK_ZONE_WPLUG_NEED_WP_UPDATE; spin_unlock_irqrestore(&zwplug->lock, flags); } @@ -1216,6 +1230,7 @@ static void blk_zone_wplug_bio_work(struct work_struct *work) */ spin_lock_irqsave(&zwplug->lock, flags); +again: bio = bio_list_pop(&zwplug->bio_list); if (!bio) { zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; @@ -1224,10 +1239,8 @@ static void blk_zone_wplug_bio_work(struct work_struct *work) } if (!blk_zone_wplug_prepare_bio(zwplug, bio)) { - /* Error recovery will decide what to do with the BIO. */ - bio_list_add_head(&zwplug->bio_list, bio); - spin_unlock_irqrestore(&zwplug->lock, flags); - goto put_zwplug; + blk_zone_wplug_bio_io_error(zwplug, bio); + goto again; } spin_unlock_irqrestore(&zwplug->lock, flags); @@ -1249,120 +1262,6 @@ put_zwplug: disk_put_zone_wplug(zwplug); } -static unsigned int blk_zone_wp_offset(struct blk_zone *zone) -{ - switch (zone->cond) { - case BLK_ZONE_COND_IMP_OPEN: - case BLK_ZONE_COND_EXP_OPEN: - case BLK_ZONE_COND_CLOSED: - return zone->wp - zone->start; - case BLK_ZONE_COND_FULL: - return zone->len; - case BLK_ZONE_COND_EMPTY: - return 0; - case BLK_ZONE_COND_NOT_WP: - case BLK_ZONE_COND_OFFLINE: - case BLK_ZONE_COND_READONLY: - default: - /* - * Conventional, offline and read-only zones do not have a valid - * write pointer. - */ - return UINT_MAX; - } -} - -static int blk_zone_wplug_report_zone_cb(struct blk_zone *zone, - unsigned int idx, void *data) -{ - struct blk_zone *zonep = data; - - *zonep = *zone; - return 0; -} - -static void disk_zone_wplug_handle_error(struct gendisk *disk, - struct blk_zone_wplug *zwplug) -{ - sector_t zone_start_sector = - bdev_zone_sectors(disk->part0) * zwplug->zone_no; - unsigned int noio_flag; - struct blk_zone zone; - unsigned long flags; - int ret; - - /* Get the current zone information from the device. */ - noio_flag = memalloc_noio_save(); - ret = disk->fops->report_zones(disk, zone_start_sector, 1, - blk_zone_wplug_report_zone_cb, &zone); - memalloc_noio_restore(noio_flag); - - spin_lock_irqsave(&zwplug->lock, flags); - - /* - * A zone reset or finish may have cleared the error already. In such - * case, do nothing as the report zones may have seen the "old" write - * pointer value before the reset/finish operation completed. - */ - if (!(zwplug->flags & BLK_ZONE_WPLUG_ERROR)) - goto unlock; - - zwplug->flags &= ~BLK_ZONE_WPLUG_ERROR; - - if (ret != 1) { - /* - * We failed to get the zone information, meaning that something - * is likely really wrong with the device. Abort all remaining - * plugged BIOs as otherwise we could endup waiting forever on - * plugged BIOs to complete if there is a queue freeze on-going. - */ - disk_zone_wplug_abort(zwplug); - goto unplug; - } - - /* Update the zone write pointer offset. */ - zwplug->wp_offset = blk_zone_wp_offset(&zone); - disk_zone_wplug_abort_unaligned(disk, zwplug); - - /* Restart BIO submission if we still have any BIO left. */ - if (!bio_list_empty(&zwplug->bio_list)) { - disk_zone_wplug_schedule_bio_work(disk, zwplug); - goto unlock; - } - -unplug: - zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; - if (disk_should_remove_zone_wplug(disk, zwplug)) - disk_remove_zone_wplug(disk, zwplug); - -unlock: - spin_unlock_irqrestore(&zwplug->lock, flags); -} - -static void disk_zone_wplugs_work(struct work_struct *work) -{ - struct gendisk *disk = - container_of(work, struct gendisk, zone_wplugs_work); - struct blk_zone_wplug *zwplug; - unsigned long flags; - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - - while (!list_empty(&disk->zone_wplugs_err_list)) { - zwplug = list_first_entry(&disk->zone_wplugs_err_list, - struct blk_zone_wplug, link); - list_del_init(&zwplug->link); - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); - - disk_zone_wplug_handle_error(disk, zwplug); - disk_put_zone_wplug(zwplug); - - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); - } - - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); -} - static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) { return 1U << disk->zone_wplugs_hash_bits; @@ -1371,8 +1270,6 @@ static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk) void disk_init_zone_resources(struct gendisk *disk) { spin_lock_init(&disk->zone_wplugs_lock); - INIT_LIST_HEAD(&disk->zone_wplugs_err_list); - INIT_WORK(&disk->zone_wplugs_work, disk_zone_wplugs_work); } /* @@ -1471,8 +1368,6 @@ void disk_free_zone_resources(struct gendisk *disk) if (!disk->zone_wplugs_pool) return; - cancel_work_sync(&disk->zone_wplugs_work); - if (disk->zone_wplugs_wq) { destroy_workqueue(disk->zone_wplugs_wq); disk->zone_wplugs_wq = NULL; @@ -1669,6 +1564,8 @@ static int blk_revalidate_seq_zone(struct blk_zone *zone, unsigned int idx, if (!disk->zone_wplugs_hash) return 0; + disk_zone_wplug_sync_wp_offset(disk, zone); + wp_offset = blk_zone_wp_offset(zone); if (!wp_offset || wp_offset >= zone->capacity) return 0; @@ -1799,6 +1696,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk) memalloc_noio_restore(noio_flag); return ret; } + ret = disk->fops->report_zones(disk, 0, UINT_MAX, blk_revalidate_zone_cb, &args); if (!ret) { @@ -1835,6 +1733,48 @@ int blk_revalidate_disk_zones(struct gendisk *disk) } EXPORT_SYMBOL_GPL(blk_revalidate_disk_zones); +/** + * blk_zone_issue_zeroout - zero-fill a block range in a zone + * @bdev: blockdev to write + * @sector: start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Zero-fill a block range in a zone (@sector must be equal to the zone write + * pointer), handling potential errors due to the (initially unknown) lack of + * hardware offload (See blkdev_issue_zeroout()). + */ +int blk_zone_issue_zeroout(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask) +{ + int ret; + + if (WARN_ON_ONCE(!bdev_is_zoned(bdev))) + return -EIO; + + ret = blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, + BLKDEV_ZERO_NOFALLBACK); + if (ret != -EOPNOTSUPP) + return ret; + + /* + * The failed call to blkdev_issue_zeroout() advanced the zone write + * pointer. Undo this using a report zone to update the zone write + * pointer to the correct current value. + */ + ret = disk_zone_sync_wp_offset(bdev->bd_disk, sector); + if (ret != 1) + return ret < 0 ? ret : -EIO; + + /* + * Retry without BLKDEV_ZERO_NOFALLBACK to force the fallback to a + * regular write with zero-pages. + */ + return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask, 0); +} +EXPORT_SYMBOL_GPL(blk_zone_issue_zeroout); + #ifdef CONFIG_BLK_DEBUG_FS int queue_zone_wplugs_show(void *data, struct seq_file *m) diff --git a/block/mq-deadline.c b/block/mq-deadline.c index 91b3789f710e7a9b8ac3da53fa98632eff5f6abe..5528347b5fcfca3d5bda443f9a116e29b95dd0c1 100644 --- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -698,8 +698,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, list_add(&rq->queuelist, &per_prio->dispatch); rq->fifo_time = jiffies; } else { - struct list_head *insert_before; - deadline_add_rq_rb(per_prio, rq); if (rq_mergeable(rq)) { @@ -712,8 +710,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, * set expire time and add to fifo list */ rq->fifo_time = jiffies + dd->fifo_expire[data_dir]; - insert_before = &per_prio->fifo_list[data_dir]; - list_add_tail(&rq->queuelist, insert_before); + list_add_tail(&rq->queuelist, &per_prio->fifo_list[data_dir]); } } diff --git a/crypto/rsassa-pkcs1.c b/crypto/rsassa-pkcs1.c index 4d077fc9607672061bdf14ebc23a12cee18364c9..f68ffd338f483f12358e37f2b01c32ae169c2323 100644 --- a/crypto/rsassa-pkcs1.c +++ b/crypto/rsassa-pkcs1.c @@ -163,10 +163,6 @@ static int rsassa_pkcs1_sign(struct crypto_sig *tfm, struct rsassa_pkcs1_inst_ctx *ictx = sig_instance_ctx(inst); const struct hash_prefix *hash_prefix = ictx->hash_prefix; struct rsassa_pkcs1_ctx *ctx = crypto_sig_ctx(tfm); - unsigned int child_reqsize = crypto_akcipher_reqsize(ctx->child); - struct akcipher_request *child_req __free(kfree_sensitive) = NULL; - struct scatterlist in_sg[3], out_sg; - struct crypto_wait cwait; unsigned int pad_len; unsigned int ps_end; unsigned int len; @@ -187,37 +183,25 @@ static int rsassa_pkcs1_sign(struct crypto_sig *tfm, pad_len = ctx->key_size - slen - hash_prefix->size - 1; - child_req = kmalloc(sizeof(*child_req) + child_reqsize + pad_len, - GFP_KERNEL); - if (!child_req) - return -ENOMEM; - /* RFC 8017 sec 8.2.1 step 1 - EMSA-PKCS1-v1_5 encoding generation */ - in_buf = (u8 *)(child_req + 1) + child_reqsize; + in_buf = dst; + memmove(in_buf + pad_len + hash_prefix->size, src, slen); + memcpy(in_buf + pad_len, hash_prefix->data, hash_prefix->size); + ps_end = pad_len - 1; in_buf[0] = 0x01; memset(in_buf + 1, 0xff, ps_end - 1); in_buf[ps_end] = 0x00; - /* RFC 8017 sec 8.2.1 step 2 - RSA signature */ - crypto_init_wait(&cwait); - sg_init_table(in_sg, 3); - sg_set_buf(&in_sg[0], in_buf, pad_len); - sg_set_buf(&in_sg[1], hash_prefix->data, hash_prefix->size); - sg_set_buf(&in_sg[2], src, slen); - sg_init_one(&out_sg, dst, dlen); - akcipher_request_set_tfm(child_req, ctx->child); - akcipher_request_set_crypt(child_req, in_sg, &out_sg, - ctx->key_size - 1, dlen); - akcipher_request_set_callback(child_req, CRYPTO_TFM_REQ_MAY_SLEEP, - crypto_req_done, &cwait); - err = crypto_akcipher_decrypt(child_req); - err = crypto_wait_req(err, &cwait); - if (err) + /* RFC 8017 sec 8.2.1 step 2 - RSA signature */ + err = crypto_akcipher_sync_decrypt(ctx->child, in_buf, + ctx->key_size - 1, in_buf, + ctx->key_size); + if (err < 0) return err; - len = child_req->dst_len; + len = err; pad_len = ctx->key_size - len; /* Four billion to one */ @@ -239,8 +223,8 @@ static int rsassa_pkcs1_verify(struct crypto_sig *tfm, struct rsassa_pkcs1_ctx *ctx = crypto_sig_ctx(tfm); unsigned int child_reqsize = crypto_akcipher_reqsize(ctx->child); struct akcipher_request *child_req __free(kfree_sensitive) = NULL; - struct scatterlist in_sg, out_sg; struct crypto_wait cwait; + struct scatterlist sg; unsigned int dst_len; unsigned int pos; u8 *out_buf; @@ -259,13 +243,12 @@ static int rsassa_pkcs1_verify(struct crypto_sig *tfm, return -ENOMEM; out_buf = (u8 *)(child_req + 1) + child_reqsize; + memcpy(out_buf, src, slen); crypto_init_wait(&cwait); - sg_init_one(&in_sg, src, slen); - sg_init_one(&out_sg, out_buf, ctx->key_size); + sg_init_one(&sg, out_buf, slen); akcipher_request_set_tfm(child_req, ctx->child); - akcipher_request_set_crypt(child_req, &in_sg, &out_sg, - slen, ctx->key_size); + akcipher_request_set_crypt(child_req, &sg, &sg, slen, slen); akcipher_request_set_callback(child_req, CRYPTO_TFM_REQ_MAY_SLEEP, crypto_req_done, &cwait); diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig index 64065fb8922b0c2e0d55bc53f249290ba7e03f8a..5b9490367a39fd12d35a8d9021768aa186c09308 100644 --- a/drivers/accel/Kconfig +++ b/drivers/accel/Kconfig @@ -24,6 +24,7 @@ menuconfig DRM_ACCEL different device files, called accel/accel* (in /dev, sysfs and debugfs). +source "drivers/accel/amdxdna/Kconfig" source "drivers/accel/habanalabs/Kconfig" source "drivers/accel/ivpu/Kconfig" source "drivers/accel/qaic/Kconfig" diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile index ab3df932937fd1b3a975489e2df67607a946dd43..a301fb6089d4c515430175c5e2ba9190f6dc9158 100644 --- a/drivers/accel/Makefile +++ b/drivers/accel/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/ obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/ obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/ obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/ diff --git a/drivers/accel/amdxdna/Kconfig b/drivers/accel/amdxdna/Kconfig new file mode 100644 index 0000000000000000000000000000000000000000..f39d7a87296cda75bc3cf9548b742e4e814ce491 --- /dev/null +++ b/drivers/accel/amdxdna/Kconfig @@ -0,0 +1,18 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config DRM_ACCEL_AMDXDNA + tristate "AMD AI Engine" + depends on AMD_IOMMU + depends on DRM_ACCEL + depends on PCI && HAS_IOMEM + depends on X86_64 + select DRM_SCHED + select DRM_GEM_SHMEM_HELPER + select FW_LOADER + select HMM_MIRROR + help + Choose this option to enable support for NPU integrated into AMD + client CPUs like AMD Ryzen AI 300 Series. AMD NPU can be used to + accelerate machine learning applications. + + If "M" is selected, the driver module will be amdxdna. diff --git a/drivers/accel/amdxdna/Makefile b/drivers/accel/amdxdna/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..0e9adf6890a015cc1b9b098c0801ad741bf4fa9c --- /dev/null +++ b/drivers/accel/amdxdna/Makefile @@ -0,0 +1,23 @@ +# SPDX-License-Identifier: GPL-2.0-only + +amdxdna-y := \ + aie2_ctx.o \ + aie2_error.o \ + aie2_message.o \ + aie2_pci.o \ + aie2_pm.o \ + aie2_psp.o \ + aie2_smu.o \ + aie2_solver.o \ + amdxdna_ctx.o \ + amdxdna_gem.o \ + amdxdna_mailbox.o \ + amdxdna_mailbox_helper.o \ + amdxdna_pci_drv.o \ + amdxdna_sysfs.o \ + npu1_regs.o \ + npu2_regs.o \ + npu4_regs.o \ + npu5_regs.o \ + npu6_regs.o +obj-$(CONFIG_DRM_ACCEL_AMDXDNA) = amdxdna.o diff --git a/drivers/accel/amdxdna/TODO b/drivers/accel/amdxdna/TODO new file mode 100644 index 0000000000000000000000000000000000000000..5119bccd1917a1e0abf18dd736da1d934df06dd4 --- /dev/null +++ b/drivers/accel/amdxdna/TODO @@ -0,0 +1,3 @@ +- Add import and export BO support +- Add debugfs support +- Add debug BO support diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c new file mode 100644 index 0000000000000000000000000000000000000000..9facf45818f9fde1839ffed676588a3cdd4f7297 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -0,0 +1,910 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "aie2_solver.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +bool force_cmdlist; +module_param(force_cmdlist, bool, 0600); +MODULE_PARM_DESC(force_cmdlist, "Force use command list (Default false)"); + +#define HWCTX_MAX_TIMEOUT 60000 /* milliseconds */ + +static void aie2_job_release(struct kref *ref) +{ + struct amdxdna_sched_job *job; + + job = container_of(ref, struct amdxdna_sched_job, refcnt); + amdxdna_sched_job_cleanup(job); + if (job->out_fence) + dma_fence_put(job->out_fence); + kfree(job); +} + +static void aie2_job_put(struct amdxdna_sched_job *job) +{ + kref_put(&job->refcnt, aie2_job_release); +} + +/* The bad_job is used in aie2_sched_job_timedout, otherwise, set it to NULL */ +static void aie2_hwctx_stop(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx, + struct drm_sched_job *bad_job) +{ + drm_sched_stop(&hwctx->priv->sched, bad_job); + aie2_destroy_context(xdna->dev_handle, hwctx); +} + +static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_gem_obj *heap = hwctx->priv->heap; + int ret; + + ret = aie2_create_context(xdna->dev_handle, hwctx); + if (ret) { + XDNA_ERR(xdna, "Create hwctx failed, ret %d", ret); + goto out; + } + + ret = aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buf failed, ret %d", ret); + goto out; + } + + if (hwctx->status != HWCTX_STAT_READY) { + XDNA_DBG(xdna, "hwctx is not ready, status %d", hwctx->status); + goto out; + } + + ret = aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config cu failed, ret %d", ret); + goto out; + } + +out: + drm_sched_start(&hwctx->priv->sched, 0); + XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret); + return ret; +} + +void aie2_restart_ctx(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + amdxdna_for_each_hwctx(client, hwctx_id, hwctx) { + if (hwctx->status != HWCTX_STAT_STOP) + continue; + + hwctx->status = hwctx->old_status; + XDNA_DBG(xdna, "Resetting %s", hwctx->name); + aie2_hwctx_restart(xdna, hwctx); + } + mutex_unlock(&client->hwctx_lock); +} + +static struct dma_fence *aie2_cmd_get_out_fence(struct amdxdna_hwctx *hwctx, u64 seq) +{ + struct dma_fence *fence, *out_fence = NULL; + int ret; + + fence = drm_syncobj_fence_get(hwctx->priv->syncobj); + if (!fence) + return NULL; + + ret = dma_fence_chain_find_seqno(&fence, seq); + if (ret) + goto out; + + out_fence = dma_fence_get(dma_fence_chain_contained(fence)); + +out: + dma_fence_put(fence); + return out_fence; +} + +static void aie2_hwctx_wait_for_idle(struct amdxdna_hwctx *hwctx) +{ + struct dma_fence *fence; + + fence = aie2_cmd_get_out_fence(hwctx, hwctx->priv->seq - 1); + if (!fence) + return; + + dma_fence_wait(fence, false); + dma_fence_put(fence); +} + +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + /* + * Command timeout is unlikely. But if it happens, it doesn't + * break the system. aie2_hwctx_stop() will destroy mailbox + * and abort all commands. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + aie2_hwctx_wait_for_idle(hwctx); + aie2_hwctx_stop(xdna, hwctx, NULL); + hwctx->old_status = hwctx->status; + hwctx->status = HWCTX_STAT_STOP; +} + +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + /* + * The resume path cannot guarantee that mailbox channel can be + * regenerated. If this happen, when submit message to this + * mailbox channel, error will return. + */ + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + hwctx->status = hwctx->old_status; + aie2_hwctx_restart(xdna, hwctx); +} + +static void +aie2_sched_notify(struct amdxdna_sched_job *job) +{ + struct dma_fence *fence = job->fence; + + trace_xdna_job(&job->base, job->hwctx->name, "signaled fence", job->seq); + job->hwctx->priv->completed++; + dma_fence_signal(fence); + + up(&job->hwctx->priv->job_sem); + job->job_done = true; + dma_fence_put(fence); + mmput_async(job->mm); + aie2_job_put(job); +} + +static int +aie2_sched_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + struct amdxdna_gem_obj *cmd_abo; + u32 ret = 0; + u32 status; + + cmd_abo = job->cmd_bo; + + if (unlikely(!data)) + goto out; + + if (unlikely(size != sizeof(u32))) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + + status = *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + if (status == AIE2_STATUS_SUCCESS) + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + else + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ERROR); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_nocmd_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + u32 ret = 0; + u32 status; + + if (unlikely(!data)) + goto out; + + if (unlikely(size != sizeof(u32))) { + ret = -EINVAL; + goto out; + } + + status = *data; + XDNA_DBG(job->hwctx->client->xdna, "Resp status 0x%x", status); + +out: + aie2_sched_notify(job); + return ret; +} + +static int +aie2_sched_cmdlist_resp_handler(void *handle, const u32 *data, size_t size) +{ + struct amdxdna_sched_job *job = handle; + struct amdxdna_gem_obj *cmd_abo; + struct cmd_chain_resp *resp; + struct amdxdna_dev *xdna; + u32 fail_cmd_status; + u32 fail_cmd_idx; + u32 ret = 0; + + cmd_abo = job->cmd_bo; + if (unlikely(!data) || unlikely(size != sizeof(u32) * 3)) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + + resp = (struct cmd_chain_resp *)data; + xdna = job->hwctx->client->xdna; + XDNA_DBG(xdna, "Status 0x%x", resp->status); + if (resp->status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_COMPLETED); + goto out; + } + + /* Slow path to handle error, read from ringbuf on BAR */ + fail_cmd_idx = resp->fail_cmd_idx; + fail_cmd_status = resp->fail_cmd_status; + XDNA_DBG(xdna, "Failed cmd idx %d, status 0x%x", + fail_cmd_idx, fail_cmd_status); + + if (fail_cmd_status == AIE2_STATUS_SUCCESS) { + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_ABORT); + ret = -EINVAL; + goto out; + } + amdxdna_cmd_set_state(cmd_abo, fail_cmd_status); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) { + struct amdxdna_cmd_chain *cc = amdxdna_cmd_get_payload(cmd_abo, NULL); + + cc->error_index = fail_cmd_idx; + if (cc->error_index >= cc->command_count) + cc->error_index = 0; + } +out: + aie2_sched_notify(job); + return ret; +} + +static struct dma_fence * +aie2_sched_job_run(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_hwctx *hwctx = job->hwctx; + struct dma_fence *fence; + int ret; + + if (!mmget_not_zero(job->mm)) + return ERR_PTR(-ESRCH); + + kref_get(&job->refcnt); + fence = dma_fence_get(job->fence); + + if (unlikely(!cmd_abo)) { + ret = aie2_sync_bo(hwctx, job, aie2_sched_nocmd_resp_handler); + goto out; + } + + amdxdna_cmd_set_state(cmd_abo, ERT_CMD_STATE_NEW); + + if (amdxdna_cmd_get_op(cmd_abo) == ERT_CMD_CHAIN) + ret = aie2_cmdlist_multi_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else if (force_cmdlist) + ret = aie2_cmdlist_single_execbuf(hwctx, job, aie2_sched_cmdlist_resp_handler); + else + ret = aie2_execbuf(hwctx, job, aie2_sched_resp_handler); + +out: + if (ret) { + dma_fence_put(job->fence); + aie2_job_put(job); + mmput(job->mm); + fence = ERR_PTR(ret); + } + trace_xdna_job(sched_job, hwctx->name, "sent to device", job->seq); + + return fence; +} + +static void aie2_sched_job_free(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + + trace_xdna_job(sched_job, hwctx->name, "job free", job->seq); + if (!job->job_done) + up(&hwctx->priv->job_sem); + + drm_sched_job_cleanup(sched_job); + aie2_job_put(job); +} + +static enum drm_gpu_sched_stat +aie2_sched_job_timedout(struct drm_sched_job *sched_job) +{ + struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job); + struct amdxdna_hwctx *hwctx = job->hwctx; + struct amdxdna_dev *xdna; + + xdna = hwctx->client->xdna; + trace_xdna_job(sched_job, hwctx->name, "job timedout", job->seq); + mutex_lock(&xdna->dev_lock); + aie2_hwctx_stop(xdna, hwctx, sched_job); + + aie2_hwctx_restart(xdna, hwctx); + mutex_unlock(&xdna->dev_lock); + + return DRM_GPU_SCHED_STAT_NOMINAL; +} + +const struct drm_sched_backend_ops sched_ops = { + .run_job = aie2_sched_job_run, + .free_job = aie2_sched_job_free, + .timedout_job = aie2_sched_job_timedout, +}; + +static int aie2_hwctx_col_list(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct amdxdna_dev_hdl *ndev; + int start, end, first, last; + u32 width = 1, entries = 0; + int i; + + if (!hwctx->num_tiles) { + XDNA_ERR(xdna, "Number of tiles is zero"); + return -EINVAL; + } + + ndev = xdna->dev_handle; + if (unlikely(!ndev->metadata.core.row_count)) { + XDNA_WARN(xdna, "Core tile row count is zero"); + return -EINVAL; + } + + hwctx->num_col = hwctx->num_tiles / ndev->metadata.core.row_count; + if (!hwctx->num_col || hwctx->num_col > ndev->total_col) { + XDNA_ERR(xdna, "Invalid num_col %d", hwctx->num_col); + return -EINVAL; + } + + if (ndev->priv->col_align == COL_ALIGN_NATURE) + width = hwctx->num_col; + + /* + * In range [start, end], find out columns that is multiple of width. + * 'first' is the first column, + * 'last' is the last column, + * 'entries' is the total number of columns. + */ + start = xdna->dev_info->first_col; + end = ndev->total_col - hwctx->num_col; + if (start > 0 && end == 0) { + XDNA_DBG(xdna, "Force start from col 0"); + start = 0; + } + first = start + (width - start % width) % width; + last = end - end % width; + if (last >= first) + entries = (last - first) / width + 1; + XDNA_DBG(xdna, "start %d end %d first %d last %d", + start, end, first, last); + + if (unlikely(!entries)) { + XDNA_ERR(xdna, "Start %d end %d width %d", + start, end, width); + return -EINVAL; + } + + hwctx->col_list = kmalloc_array(entries, sizeof(*hwctx->col_list), GFP_KERNEL); + if (!hwctx->col_list) + return -ENOMEM; + + hwctx->col_list_len = entries; + hwctx->col_list[0] = first; + for (i = 1; i < entries; i++) + hwctx->col_list[i] = hwctx->col_list[i - 1] + width; + + print_hex_dump_debug("col_list: ", DUMP_PREFIX_OFFSET, 16, 4, hwctx->col_list, + entries * sizeof(*hwctx->col_list), false); + return 0; +} + +static int aie2_alloc_resource(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct alloc_requests *xrs_req; + int ret; + + xrs_req = kzalloc(sizeof(*xrs_req), GFP_KERNEL); + if (!xrs_req) + return -ENOMEM; + + xrs_req->cdo.start_cols = hwctx->col_list; + xrs_req->cdo.cols_len = hwctx->col_list_len; + xrs_req->cdo.ncols = hwctx->num_col; + xrs_req->cdo.qos_cap.opc = hwctx->max_opc; + + xrs_req->rqos.gops = hwctx->qos.gops; + xrs_req->rqos.fps = hwctx->qos.fps; + xrs_req->rqos.dma_bw = hwctx->qos.dma_bandwidth; + xrs_req->rqos.latency = hwctx->qos.latency; + xrs_req->rqos.exec_time = hwctx->qos.frame_exec_time; + xrs_req->rqos.priority = hwctx->qos.priority; + + xrs_req->rid = (uintptr_t)hwctx; + + ret = xrs_allocate_resource(xdna->xrs_hdl, xrs_req, hwctx); + if (ret) + XDNA_ERR(xdna, "Allocate AIE resource failed, ret %d", ret); + + kfree(xrs_req); + return ret; +} + +static void aie2_release_resource(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + int ret; + + ret = xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); + if (ret) + XDNA_ERR(xdna, "Release AIE resource failed, ret %d", ret); +} + +static int aie2_ctx_syncobj_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct drm_file *filp = hwctx->client->filp; + struct drm_syncobj *syncobj; + u32 hdl; + int ret; + + hwctx->syncobj_hdl = AMDXDNA_INVALID_FENCE_HANDLE; + + ret = drm_syncobj_create(&syncobj, 0, NULL); + if (ret) { + XDNA_ERR(xdna, "Create ctx syncobj failed, ret %d", ret); + return ret; + } + ret = drm_syncobj_get_handle(filp, syncobj, &hdl); + if (ret) { + drm_syncobj_put(syncobj); + XDNA_ERR(xdna, "Create ctx syncobj handle failed, ret %d", ret); + return ret; + } + hwctx->priv->syncobj = syncobj; + hwctx->syncobj_hdl = hdl; + + return 0; +} + +static void aie2_ctx_syncobj_destroy(struct amdxdna_hwctx *hwctx) +{ + /* + * The syncobj_hdl is owned by user space and will be cleaned up + * separately. + */ + drm_syncobj_put(hwctx->priv->syncobj); +} + +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_client *client = hwctx->client; + struct amdxdna_dev *xdna = client->xdna; + struct drm_gpu_scheduler *sched; + struct amdxdna_hwctx_priv *priv; + struct amdxdna_gem_obj *heap; + struct amdxdna_dev_hdl *ndev; + int i, ret; + + priv = kzalloc(sizeof(*hwctx->priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + hwctx->priv = priv; + + mutex_lock(&client->mm_lock); + heap = client->dev_heap; + if (!heap) { + XDNA_ERR(xdna, "The client dev heap object not exist"); + mutex_unlock(&client->mm_lock); + ret = -ENOENT; + goto free_priv; + } + drm_gem_object_get(to_gobj(heap)); + mutex_unlock(&client->mm_lock); + priv->heap = heap; + sema_init(&priv->job_sem, HWCTX_MAX_CMDS); + + ret = amdxdna_gem_pin(heap); + if (ret) { + XDNA_ERR(xdna, "Dev heap pin failed, ret %d", ret); + goto put_heap; + } + + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + struct amdxdna_gem_obj *abo; + struct amdxdna_drm_create_bo args = { + .flags = 0, + .type = AMDXDNA_BO_DEV, + .vaddr = 0, + .size = MAX_CHAIN_CMDBUF_SIZE, + }; + + abo = amdxdna_drm_alloc_dev_bo(&xdna->ddev, &args, client->filp, true); + if (IS_ERR(abo)) { + ret = PTR_ERR(abo); + goto free_cmd_bufs; + } + + XDNA_DBG(xdna, "Command buf %d addr 0x%llx size 0x%lx", + i, abo->mem.dev_addr, abo->mem.size); + priv->cmd_buf[i] = abo; + } + + sched = &priv->sched; + mutex_init(&priv->io_lock); + + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&priv->io_lock); + fs_reclaim_release(GFP_KERNEL); + + ret = drm_sched_init(sched, &sched_ops, NULL, DRM_SCHED_PRIORITY_COUNT, + HWCTX_MAX_CMDS, 0, msecs_to_jiffies(HWCTX_MAX_TIMEOUT), + NULL, NULL, hwctx->name, xdna->ddev.dev); + if (ret) { + XDNA_ERR(xdna, "Failed to init DRM scheduler. ret %d", ret); + goto free_cmd_bufs; + } + + ret = drm_sched_entity_init(&priv->entity, DRM_SCHED_PRIORITY_NORMAL, + &sched, 1, NULL); + if (ret) { + XDNA_ERR(xdna, "Failed to initial sched entiry. ret %d", ret); + goto free_sched; + } + + ret = aie2_hwctx_col_list(hwctx); + if (ret) { + XDNA_ERR(xdna, "Create col list failed, ret %d", ret); + goto free_entity; + } + + ret = aie2_alloc_resource(hwctx); + if (ret) { + XDNA_ERR(xdna, "Alloc hw resource failed, ret %d", ret); + goto free_col_list; + } + + ret = aie2_map_host_buf(xdna->dev_handle, hwctx->fw_ctx_id, + heap->mem.userptr, heap->mem.size); + if (ret) { + XDNA_ERR(xdna, "Map host buffer failed, ret %d", ret); + goto release_resource; + } + + ret = aie2_ctx_syncobj_create(hwctx); + if (ret) { + XDNA_ERR(xdna, "Create syncobj failed, ret %d", ret); + goto release_resource; + } + + hwctx->status = HWCTX_STAT_INIT; + ndev = xdna->dev_handle; + ndev->hwctx_num++; + + XDNA_DBG(xdna, "hwctx %s init completed", hwctx->name); + + return 0; + +release_resource: + aie2_release_resource(hwctx); +free_col_list: + kfree(hwctx->col_list); +free_entity: + drm_sched_entity_destroy(&priv->entity); +free_sched: + drm_sched_fini(&priv->sched); +free_cmd_bufs: + for (i = 0; i < ARRAY_SIZE(priv->cmd_buf); i++) { + if (!priv->cmd_buf[i]) + continue; + drm_gem_object_put(to_gobj(priv->cmd_buf[i])); + } + amdxdna_gem_unpin(heap); +put_heap: + drm_gem_object_put(to_gobj(heap)); +free_priv: + kfree(priv); + return ret; +} + +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_dev_hdl *ndev; + struct amdxdna_dev *xdna; + int idx; + + xdna = hwctx->client->xdna; + ndev = xdna->dev_handle; + ndev->hwctx_num--; + drm_sched_wqueue_stop(&hwctx->priv->sched); + + /* Now, scheduler will not send command to device. */ + aie2_release_resource(hwctx); + + /* + * All submitted commands are aborted. + * Restart scheduler queues to cleanup jobs. The amdxdna_sched_job_run() + * will return NODEV if it is called. + */ + drm_sched_wqueue_start(&hwctx->priv->sched); + + aie2_hwctx_wait_for_idle(hwctx); + drm_sched_entity_destroy(&hwctx->priv->entity); + drm_sched_fini(&hwctx->priv->sched); + aie2_ctx_syncobj_destroy(hwctx); + + XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq); + + for (idx = 0; idx < ARRAY_SIZE(hwctx->priv->cmd_buf); idx++) + drm_gem_object_put(to_gobj(hwctx->priv->cmd_buf[idx])); + amdxdna_gem_unpin(hwctx->priv->heap); + drm_gem_object_put(to_gobj(hwctx->priv->heap)); + + mutex_destroy(&hwctx->priv->io_lock); + kfree(hwctx->col_list); + kfree(hwctx->priv); + kfree(hwctx->cus); +} + +static int aie2_hwctx_cu_config(struct amdxdna_hwctx *hwctx, void *buf, u32 size) +{ + struct amdxdna_hwctx_param_config_cu *config = buf; + struct amdxdna_dev *xdna = hwctx->client->xdna; + u32 total_size; + int ret; + + XDNA_DBG(xdna, "Config %d CU to %s", config->num_cus, hwctx->name); + if (XDNA_MBZ_DBG(xdna, config->pad, sizeof(config->pad))) + return -EINVAL; + + if (hwctx->status != HWCTX_STAT_INIT) { + XDNA_ERR(xdna, "Not support re-config CU"); + return -EINVAL; + } + + if (!config->num_cus) { + XDNA_ERR(xdna, "Number of CU is zero"); + return -EINVAL; + } + + total_size = struct_size(config, cu_configs, config->num_cus); + if (total_size > size) { + XDNA_ERR(xdna, "CU config larger than size"); + return -EINVAL; + } + + hwctx->cus = kmemdup(config, total_size, GFP_KERNEL); + if (!hwctx->cus) + return -ENOMEM; + + ret = aie2_config_cu(hwctx); + if (ret) { + XDNA_ERR(xdna, "Config CU to firmware failed, ret %d", ret); + goto free_cus; + } + + wmb(); /* To avoid locking in command submit when check status */ + hwctx->status = HWCTX_STAT_READY; + + return 0; + +free_cus: + kfree(hwctx->cus); + hwctx->cus = NULL; + return ret; +} + +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + switch (type) { + case DRM_AMDXDNA_HWCTX_CONFIG_CU: + return aie2_hwctx_cu_config(hwctx, buf, size); + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + return -EOPNOTSUPP; + default: + XDNA_DBG(xdna, "Not supported type %d", type); + return -EOPNOTSUPP; + } +} + +static int aie2_populate_range(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct mm_struct *mm = abo->mem.notifier.mm; + struct hmm_range range = { 0 }; + unsigned long timeout; + int ret; + + XDNA_INFO_ONCE(xdna, "populate memory range %llx size %lx", + abo->mem.userptr, abo->mem.size); + range.notifier = &abo->mem.notifier; + range.start = abo->mem.userptr; + range.end = abo->mem.userptr + abo->mem.size; + range.hmm_pfns = abo->mem.pfns; + range.default_flags = HMM_PFN_REQ_FAULT; + + if (!mmget_not_zero(mm)) + return -EFAULT; + + timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); +again: + range.notifier_seq = mmu_interval_read_begin(&abo->mem.notifier); + mmap_read_lock(mm); + ret = hmm_range_fault(&range); + mmap_read_unlock(mm); + if (ret) { + if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto put_mm; + } + + if (ret == -EBUSY) + goto again; + + goto put_mm; + } + + down_read(&xdna->notifier_lock); + if (mmu_interval_read_retry(&abo->mem.notifier, range.notifier_seq)) { + up_read(&xdna->notifier_lock); + goto again; + } + abo->mem.map_invalid = false; + up_read(&xdna->notifier_lock); + +put_mm: + mmput(mm); + return ret; +} + +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct ww_acquire_ctx acquire_ctx; + struct dma_fence_chain *chain; + struct amdxdna_gem_obj *abo; + unsigned long timeout = 0; + int ret, i; + + ret = down_interruptible(&hwctx->priv->job_sem); + if (ret) { + XDNA_ERR(xdna, "Grab job sem failed, ret %d", ret); + return ret; + } + + chain = dma_fence_chain_alloc(); + if (!chain) { + XDNA_ERR(xdna, "Alloc fence chain failed"); + ret = -ENOMEM; + goto up_sem; + } + + ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, hwctx); + if (ret) { + XDNA_ERR(xdna, "DRM job init failed, ret %d", ret); + goto free_chain; + } + +retry: + ret = drm_gem_lock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (ret) { + XDNA_WARN(xdna, "Failed to lock BOs, ret %d", ret); + goto cleanup_job; + } + + for (i = 0; i < job->bo_cnt; i++) { + ret = dma_resv_reserve_fences(job->bos[i]->resv, 1); + if (ret) { + XDNA_WARN(xdna, "Failed to reserve fences %d", ret); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + goto cleanup_job; + } + } + + down_read(&xdna->notifier_lock); + for (i = 0; i < job->bo_cnt; i++) { + abo = to_xdna_obj(job->bos[i]); + if (abo->mem.map_invalid) { + up_read(&xdna->notifier_lock); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + if (!timeout) { + timeout = jiffies + + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); + } else if (time_after(jiffies, timeout)) { + ret = -ETIME; + goto cleanup_job; + } + + ret = aie2_populate_range(abo); + if (ret) + goto cleanup_job; + goto retry; + } + } + + mutex_lock(&hwctx->priv->io_lock); + drm_sched_job_arm(&job->base); + job->out_fence = dma_fence_get(&job->base.s_fence->finished); + for (i = 0; i < job->bo_cnt; i++) + dma_resv_add_fence(job->bos[i]->resv, job->out_fence, DMA_RESV_USAGE_WRITE); + job->seq = hwctx->priv->seq++; + kref_get(&job->refcnt); + drm_sched_entity_push_job(&job->base); + + *seq = job->seq; + drm_syncobj_add_point(hwctx->priv->syncobj, chain, job->out_fence, *seq); + mutex_unlock(&hwctx->priv->io_lock); + + up_read(&xdna->notifier_lock); + drm_gem_unlock_reservations(job->bos, job->bo_cnt, &acquire_ctx); + + aie2_job_put(job); + + return 0; + +cleanup_job: + drm_sched_job_cleanup(&job->base); +free_chain: + dma_fence_chain_free(chain); +up_sem: + up(&hwctx->priv->job_sem); + job->job_done = true; + return ret; +} + +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, + unsigned long cur_seq) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + struct drm_gem_object *gobj = to_gobj(abo); + long ret; + + down_write(&xdna->notifier_lock); + abo->mem.map_invalid = true; + mmu_interval_set_seq(&abo->mem.notifier, cur_seq); + up_write(&xdna->notifier_lock); + ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, + true, MAX_SCHEDULE_TIMEOUT); + if (!ret || ret == -ERESTARTSYS) + XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); +} diff --git a/drivers/accel/amdxdna/aie2_error.c b/drivers/accel/amdxdna/aie2_error.c new file mode 100644 index 0000000000000000000000000000000000000000..b1defaa8513bb465203d431ad02647a62b97c1e5 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_error.c @@ -0,0 +1,360 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +struct async_event { + struct amdxdna_dev_hdl *ndev; + struct async_event_msg_resp resp; + struct workqueue_struct *wq; + struct work_struct work; + u8 *buf; + dma_addr_t addr; + u32 size; +}; + +struct async_events { + struct workqueue_struct *wq; + u8 *buf; + dma_addr_t addr; + u32 size; + u32 event_cnt; + struct async_event event[] __counted_by(event_cnt); +}; + +/* + * Below enum, struct and lookup tables are porting from XAIE util header file. + * + * Below data is defined by AIE device and it is used for decode error message + * from the device. + */ + +enum aie_module_type { + AIE_MEM_MOD = 0, + AIE_CORE_MOD, + AIE_PL_MOD, +}; + +enum aie_error_category { + AIE_ERROR_SATURATION = 0, + AIE_ERROR_FP, + AIE_ERROR_STREAM, + AIE_ERROR_ACCESS, + AIE_ERROR_BUS, + AIE_ERROR_INSTRUCTION, + AIE_ERROR_ECC, + AIE_ERROR_LOCK, + AIE_ERROR_DMA, + AIE_ERROR_MEM_PARITY, + /* Unknown is not from XAIE, added for better category */ + AIE_ERROR_UNKNOWN, +}; + +/* Don't pack, unless XAIE side changed */ +struct aie_error { + __u8 row; + __u8 col; + __u32 mod_type; + __u8 event_id; +}; + +struct aie_err_info { + u32 err_cnt; + u32 ret_code; + u32 rsvd; + struct aie_error payload[] __counted_by(err_cnt); +}; + +struct aie_event_category { + u8 event_id; + enum aie_error_category category; +}; + +#define EVENT_CATEGORY(id, cat) { id, cat } +static const struct aie_event_category aie_ml_mem_event_cat[] = { + EVENT_CATEGORY(88U, AIE_ERROR_ECC), + EVENT_CATEGORY(90U, AIE_ERROR_ECC), + EVENT_CATEGORY(91U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(92U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(93U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(94U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(95U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(96U, AIE_ERROR_MEM_PARITY), + EVENT_CATEGORY(97U, AIE_ERROR_DMA), + EVENT_CATEGORY(98U, AIE_ERROR_DMA), + EVENT_CATEGORY(99U, AIE_ERROR_DMA), + EVENT_CATEGORY(100U, AIE_ERROR_DMA), + EVENT_CATEGORY(101U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_core_event_cat[] = { + EVENT_CATEGORY(55U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(56U, AIE_ERROR_STREAM), + EVENT_CATEGORY(57U, AIE_ERROR_STREAM), + EVENT_CATEGORY(58U, AIE_ERROR_BUS), + EVENT_CATEGORY(59U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(60U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(62U, AIE_ERROR_ECC), + EVENT_CATEGORY(64U, AIE_ERROR_ECC), + EVENT_CATEGORY(65U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(66U, AIE_ERROR_ACCESS), + EVENT_CATEGORY(67U, AIE_ERROR_LOCK), + EVENT_CATEGORY(70U, AIE_ERROR_INSTRUCTION), + EVENT_CATEGORY(71U, AIE_ERROR_STREAM), + EVENT_CATEGORY(72U, AIE_ERROR_BUS), +}; + +static const struct aie_event_category aie_ml_mem_tile_event_cat[] = { + EVENT_CATEGORY(130U, AIE_ERROR_ECC), + EVENT_CATEGORY(132U, AIE_ERROR_ECC), + EVENT_CATEGORY(133U, AIE_ERROR_DMA), + EVENT_CATEGORY(134U, AIE_ERROR_DMA), + EVENT_CATEGORY(135U, AIE_ERROR_STREAM), + EVENT_CATEGORY(136U, AIE_ERROR_STREAM), + EVENT_CATEGORY(137U, AIE_ERROR_STREAM), + EVENT_CATEGORY(138U, AIE_ERROR_BUS), + EVENT_CATEGORY(139U, AIE_ERROR_LOCK), +}; + +static const struct aie_event_category aie_ml_shim_tile_event_cat[] = { + EVENT_CATEGORY(64U, AIE_ERROR_BUS), + EVENT_CATEGORY(65U, AIE_ERROR_STREAM), + EVENT_CATEGORY(66U, AIE_ERROR_STREAM), + EVENT_CATEGORY(67U, AIE_ERROR_BUS), + EVENT_CATEGORY(68U, AIE_ERROR_BUS), + EVENT_CATEGORY(69U, AIE_ERROR_BUS), + EVENT_CATEGORY(70U, AIE_ERROR_BUS), + EVENT_CATEGORY(71U, AIE_ERROR_BUS), + EVENT_CATEGORY(72U, AIE_ERROR_DMA), + EVENT_CATEGORY(73U, AIE_ERROR_DMA), + EVENT_CATEGORY(74U, AIE_ERROR_LOCK), +}; + +static enum aie_error_category +aie_get_error_category(u8 row, u8 event_id, enum aie_module_type mod_type) +{ + const struct aie_event_category *lut; + int num_entry; + int i; + + switch (mod_type) { + case AIE_PL_MOD: + lut = aie_ml_shim_tile_event_cat; + num_entry = ARRAY_SIZE(aie_ml_shim_tile_event_cat); + break; + case AIE_CORE_MOD: + lut = aie_ml_core_event_cat; + num_entry = ARRAY_SIZE(aie_ml_core_event_cat); + break; + case AIE_MEM_MOD: + if (row == 1) { + lut = aie_ml_mem_tile_event_cat; + num_entry = ARRAY_SIZE(aie_ml_mem_tile_event_cat); + } else { + lut = aie_ml_mem_event_cat; + num_entry = ARRAY_SIZE(aie_ml_mem_event_cat); + } + break; + default: + return AIE_ERROR_UNKNOWN; + } + + for (i = 0; i < num_entry; i++) { + if (event_id != lut[i].event_id) + continue; + + return lut[i].category; + } + + return AIE_ERROR_UNKNOWN; +} + +static u32 aie2_error_backtrack(struct amdxdna_dev_hdl *ndev, void *err_info, u32 num_err) +{ + struct aie_error *errs = err_info; + u32 err_col = 0; /* assume that AIE has less than 32 columns */ + int i; + + /* Get err column bitmap */ + for (i = 0; i < num_err; i++) { + struct aie_error *err = &errs[i]; + enum aie_error_category cat; + + cat = aie_get_error_category(err->row, err->event_id, err->mod_type); + XDNA_ERR(ndev->xdna, "Row: %d, Col: %d, module %d, event ID %d, category %d", + err->row, err->col, err->mod_type, + err->event_id, cat); + + if (err->col >= 32) { + XDNA_WARN(ndev->xdna, "Invalid column number"); + break; + } + + err_col |= (1 << err->col); + } + + return err_col; +} + +static int aie2_error_async_cb(void *handle, const u32 *data, size_t size) +{ + struct async_event_msg_resp *resp; + struct async_event *e = handle; + + if (data) { + resp = (struct async_event_msg_resp *)data; + e->resp.type = resp->type; + wmb(); /* Update status in the end, so that no lock for here */ + e->resp.status = resp->status; + } + queue_work(e->wq, &e->work); + return 0; +} + +static int aie2_error_event_send(struct async_event *e) +{ + drm_clflush_virt_range(e->buf, e->size); /* device can access */ + return aie2_register_asyn_event_msg(e->ndev, e->addr, e->size, e, + aie2_error_async_cb); +} + +static void aie2_error_worker(struct work_struct *err_work) +{ + struct aie_err_info *info; + struct amdxdna_dev *xdna; + struct async_event *e; + u32 max_err; + u32 err_col; + + e = container_of(err_work, struct async_event, work); + + xdna = e->ndev->xdna; + + if (e->resp.status == MAX_AIE2_STATUS_CODE) + return; + + e->resp.status = MAX_AIE2_STATUS_CODE; + + print_hex_dump_debug("AIE error: ", DUMP_PREFIX_OFFSET, 16, 4, + e->buf, 0x100, false); + + info = (struct aie_err_info *)e->buf; + XDNA_DBG(xdna, "Error count %d return code %d", info->err_cnt, info->ret_code); + + max_err = (e->size - sizeof(*info)) / sizeof(struct aie_error); + if (unlikely(info->err_cnt > max_err)) { + WARN_ONCE(1, "Error count too large %d\n", info->err_cnt); + return; + } + err_col = aie2_error_backtrack(e->ndev, info->payload, info->err_cnt); + if (!err_col) { + XDNA_WARN(xdna, "Did not get error column"); + return; + } + + mutex_lock(&xdna->dev_lock); + /* Re-sent this event to firmware */ + if (aie2_error_event_send(e)) + XDNA_WARN(xdna, "Unable to register async event"); + mutex_unlock(&xdna->dev_lock); +} + +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct async_event *e; + int i, ret; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + for (i = 0; i < ndev->async_events->event_cnt; i++) { + e = &ndev->async_events->event[i]; + ret = aie2_error_event_send(e); + if (ret) + return ret; + } + + return 0; +} + +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct async_events *events; + + events = ndev->async_events; + + mutex_unlock(&xdna->dev_lock); + destroy_workqueue(events->wq); + mutex_lock(&xdna->dev_lock); + + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); + kfree(events); +} + +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + u32 total_col = ndev->total_col; + u32 total_size = ASYNC_BUF_SIZE * total_col; + struct async_events *events; + int i, ret; + + events = kzalloc(struct_size(events, event, total_col), GFP_KERNEL); + if (!events) + return -ENOMEM; + + events->buf = dma_alloc_noncoherent(xdna->ddev.dev, total_size, &events->addr, + DMA_FROM_DEVICE, GFP_KERNEL); + if (!events->buf) { + ret = -ENOMEM; + goto free_events; + } + events->size = total_size; + events->event_cnt = total_col; + + events->wq = alloc_ordered_workqueue("async_wq", 0); + if (!events->wq) { + ret = -ENOMEM; + goto free_buf; + } + + for (i = 0; i < events->event_cnt; i++) { + struct async_event *e = &events->event[i]; + u32 offset = i * ASYNC_BUF_SIZE; + + e->ndev = ndev; + e->wq = events->wq; + e->buf = &events->buf[offset]; + e->addr = events->addr + offset; + e->size = ASYNC_BUF_SIZE; + e->resp.status = MAX_AIE2_STATUS_CODE; + INIT_WORK(&e->work, aie2_error_worker); + } + + ndev->async_events = events; + + XDNA_DBG(xdna, "Async event count %d, buf total size 0x%x", + events->event_cnt, events->size); + return 0; + +free_buf: + dma_free_noncoherent(xdna->ddev.dev, events->size, events->buf, + events->addr, DMA_FROM_DEVICE); +free_events: + kfree(events); + return ret; +} diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c new file mode 100644 index 0000000000000000000000000000000000000000..9e2c9a44f76a4f26f05745508f006b1c3374ffc0 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_message.c @@ -0,0 +1,776 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_mailbox_helper.h" +#include "amdxdna_pci_drv.h" + +#define DECLARE_AIE2_MSG(name, op) \ + DECLARE_XDNA_MSG_COMMON(name, op, MAX_AIE2_STATUS_CODE) + +static int aie2_send_mgmt_msg_wait(struct amdxdna_dev_hdl *ndev, + struct xdna_mailbox_msg *msg) +{ + struct amdxdna_dev *xdna = ndev->xdna; + struct xdna_notify *hdl = msg->handle; + int ret; + + if (!ndev->mgmt_chann) + return -ENODEV; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + ret = xdna_send_msg_wait(xdna, ndev->mgmt_chann, msg); + if (ret == -ETIME) { + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + ndev->mgmt_chann = NULL; + } + + if (!ret && *hdl->data != AIE2_STATUS_SUCCESS) { + XDNA_ERR(xdna, "command opcode 0x%x failed, status 0x%x", + msg->opcode, *hdl->data); + ret = -EINVAL; + } + + return ret; +} + +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_SUSPEND); + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev) +{ + DECLARE_AIE2_MSG(suspend, MSG_OP_RESUME); + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value) +{ + DECLARE_AIE2_MSG(set_runtime_cfg, MSG_OP_SET_RUNTIME_CONFIG); + int ret; + + req.type = type; + req.value = value; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(ndev->xdna, "Failed to set runtime config, ret %d", ret); + return ret; + } + + return 0; +} + +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *value) +{ + DECLARE_AIE2_MSG(get_runtime_cfg, MSG_OP_GET_RUNTIME_CONFIG); + int ret; + + req.type = type; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(ndev->xdna, "Failed to get runtime config, ret %d", ret); + return ret; + } + + *value = resp.value; + return 0; +} + +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid) +{ + DECLARE_AIE2_MSG(assign_mgmt_pasid, MSG_OP_ASSIGN_MGMT_PASID); + + req.pasid = pasid; + + return aie2_send_mgmt_msg_wait(ndev, &msg); +} + +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_version *version) +{ + DECLARE_AIE2_MSG(aie_version_info, MSG_OP_QUERY_AIE_VERSION); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + XDNA_DBG(xdna, "Query AIE version - major: %u minor: %u completed", + resp.major, resp.minor); + + version->major = resp.major; + version->minor = resp.minor; + + return 0; +} + +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metadata *metadata) +{ + DECLARE_AIE2_MSG(aie_tile_info, MSG_OP_QUERY_AIE_TILE_INFO); + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + metadata->size = resp.info.size; + metadata->cols = resp.info.cols; + metadata->rows = resp.info.rows; + + metadata->version.major = resp.info.major; + metadata->version.minor = resp.info.minor; + + metadata->core.row_count = resp.info.core_rows; + metadata->core.row_start = resp.info.core_row_start; + metadata->core.dma_channel_count = resp.info.core_dma_channels; + metadata->core.lock_count = resp.info.core_locks; + metadata->core.event_reg_count = resp.info.core_events; + + metadata->mem.row_count = resp.info.mem_rows; + metadata->mem.row_start = resp.info.mem_row_start; + metadata->mem.dma_channel_count = resp.info.mem_dma_channels; + metadata->mem.lock_count = resp.info.mem_locks; + metadata->mem.event_reg_count = resp.info.mem_events; + + metadata->shim.row_count = resp.info.shim_rows; + metadata->shim.row_start = resp.info.shim_row_start; + metadata->shim.dma_channel_count = resp.info.shim_dma_channels; + metadata->shim.lock_count = resp.info.shim_locks; + metadata->shim.event_reg_count = resp.info.shim_events; + + return 0; +} + +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver) +{ + DECLARE_AIE2_MSG(firmware_version, MSG_OP_GET_FIRMWARE_VERSION); + int ret; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + fw_ver->major = resp.major; + fw_ver->minor = resp.minor; + fw_ver->sub = resp.sub; + fw_ver->build = resp.build; + + return 0; +} + +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx) +{ + DECLARE_AIE2_MSG(create_ctx, MSG_OP_CREATE_CONTEXT); + struct amdxdna_dev *xdna = ndev->xdna; + struct xdna_mailbox_chann_res x2i; + struct xdna_mailbox_chann_res i2x; + struct cq_pair *cq_pair; + u32 intr_reg; + int ret; + + req.aie_type = 1; + req.start_col = hwctx->start_col; + req.num_col = hwctx->num_col; + req.num_cq_pairs_requested = 1; + req.pasid = hwctx->client->pasid; + req.context_priority = 2; + + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + hwctx->fw_ctx_id = resp.context_id; + WARN_ONCE(hwctx->fw_ctx_id == -1, "Unexpected context id"); + + cq_pair = &resp.cq_pair[0]; + x2i.mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.head_addr); + x2i.mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->x2i_q.tail_addr); + x2i.rb_start_addr = AIE2_SRAM_OFF(ndev, cq_pair->x2i_q.buf_addr); + x2i.rb_size = cq_pair->x2i_q.buf_size; + + i2x.mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.head_addr); + i2x.mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, cq_pair->i2x_q.tail_addr); + i2x.rb_start_addr = AIE2_SRAM_OFF(ndev, cq_pair->i2x_q.buf_addr); + i2x.rb_size = cq_pair->i2x_q.buf_size; + + ret = pci_irq_vector(to_pci_dev(xdna->ddev.dev), resp.msix_id); + if (ret == -EINVAL) { + XDNA_ERR(xdna, "not able to create channel"); + goto out_destroy_context; + } + + intr_reg = i2x.mb_head_ptr_reg + 4; + hwctx->priv->mbox_chann = xdna_mailbox_create_channel(ndev->mbox, &x2i, &i2x, + intr_reg, ret); + if (!hwctx->priv->mbox_chann) { + XDNA_ERR(xdna, "not able to create channel"); + ret = -EINVAL; + goto out_destroy_context; + } + + XDNA_DBG(xdna, "%s mailbox channel irq: %d, msix_id: %d", + hwctx->name, ret, resp.msix_id); + XDNA_DBG(xdna, "%s created fw ctx %d pasid %d", hwctx->name, + hwctx->fw_ctx_id, hwctx->client->pasid); + + return 0; + +out_destroy_context: + aie2_destroy_context(ndev, hwctx); + return ret; +} + +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx) +{ + DECLARE_AIE2_MSG(destroy_ctx, MSG_OP_DESTROY_CONTEXT); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + if (hwctx->fw_ctx_id == -1) + return 0; + + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); + + req.context_id = hwctx->fw_ctx_id; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + XDNA_WARN(xdna, "%s destroy context failed, ret %d", hwctx->name, ret); + + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); + XDNA_DBG(xdna, "%s destroyed fw ctx %d", hwctx->name, + hwctx->fw_ctx_id); + hwctx->priv->mbox_chann = NULL; + hwctx->fw_ctx_id = -1; + + return ret; +} + +int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size) +{ + DECLARE_AIE2_MSG(map_host_buffer, MSG_OP_MAP_HOST_BUFFER); + struct amdxdna_dev *xdna = ndev->xdna; + int ret; + + req.context_id = context_id; + req.buf_addr = addr; + req.buf_size = size; + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) + return ret; + + XDNA_DBG(xdna, "fw ctx %d map host buf addr 0x%llx size 0x%llx", + context_id, addr, size); + + return 0; +} + +int aie2_query_status(struct amdxdna_dev_hdl *ndev, char __user *buf, + u32 size, u32 *cols_filled) +{ + DECLARE_AIE2_MSG(aie_column_info, MSG_OP_QUERY_COL_STATUS); + struct amdxdna_dev *xdna = ndev->xdna; + struct amdxdna_client *client; + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + dma_addr_t dma_addr; + u32 aie_bitmap = 0; + u8 *buff_addr; + int ret, idx; + + buff_addr = dma_alloc_noncoherent(xdna->ddev.dev, size, &dma_addr, + DMA_FROM_DEVICE, GFP_KERNEL); + if (!buff_addr) + return -ENOMEM; + + /* Go through each hardware context and mark the AIE columns that are active */ + list_for_each_entry(client, &xdna->client_list, node) { + idx = srcu_read_lock(&client->hwctx_srcu); + amdxdna_for_each_hwctx(client, hwctx_id, hwctx) + aie_bitmap |= amdxdna_hwctx_col_map(hwctx); + srcu_read_unlock(&client->hwctx_srcu, idx); + } + + *cols_filled = 0; + req.dump_buff_addr = dma_addr; + req.dump_buff_size = size; + req.num_cols = hweight32(aie_bitmap); + req.aie_bitmap = aie_bitmap; + + drm_clflush_virt_range(buff_addr, size); /* device can access */ + ret = aie2_send_mgmt_msg_wait(ndev, &msg); + if (ret) { + XDNA_ERR(xdna, "Error during NPU query, status %d", ret); + goto fail; + } + + if (resp.status != AIE2_STATUS_SUCCESS) { + XDNA_ERR(xdna, "Query NPU status failed, status 0x%x", resp.status); + ret = -EINVAL; + goto fail; + } + XDNA_DBG(xdna, "Query NPU status completed"); + + if (size < resp.size) { + ret = -EINVAL; + XDNA_ERR(xdna, "Bad buffer size. Available: %u. Needs: %u", size, resp.size); + goto fail; + } + + if (copy_to_user(buf, buff_addr, resp.size)) { + ret = -EFAULT; + XDNA_ERR(xdna, "Failed to copy NPU status to user space"); + goto fail; + } + + *cols_filled = aie_bitmap; + +fail: + dma_free_noncoherent(xdna->ddev.dev, size, buff_addr, dma_addr, DMA_FROM_DEVICE); + return ret; +} + +int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size, + void *handle, int (*cb)(void*, const u32 *, size_t)) +{ + struct async_event_msg_req req = { 0 }; + struct xdna_mailbox_msg msg = { + .send_data = (u8 *)&req, + .send_size = sizeof(req), + .handle = handle, + .opcode = MSG_OP_REGISTER_ASYNC_EVENT_MSG, + .notify_cb = cb, + }; + + req.buf_addr = addr; + req.buf_size = size; + + XDNA_DBG(ndev->xdna, "Register addr 0x%llx size 0x%x", addr, size); + return xdna_mailbox_send_msg(ndev->mgmt_chann, &msg, TX_TIMEOUT); +} + +int aie2_config_cu(struct amdxdna_hwctx *hwctx) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna = hwctx->client->xdna; + u32 shift = xdna->dev_info->dev_mem_buf_shift; + DECLARE_AIE2_MSG(config_cu, MSG_OP_CONFIG_CU); + struct drm_gem_object *gobj; + struct amdxdna_gem_obj *abo; + int ret, i; + + if (!chann) + return -ENODEV; + + if (hwctx->cus->num_cus > MAX_NUM_CUS) { + XDNA_DBG(xdna, "Exceed maximum CU %d", MAX_NUM_CUS); + return -EINVAL; + } + + for (i = 0; i < hwctx->cus->num_cus; i++) { + struct amdxdna_cu_config *cu = &hwctx->cus->cu_configs[i]; + + if (XDNA_MBZ_DBG(xdna, cu->pad, sizeof(cu->pad))) + return -EINVAL; + + gobj = drm_gem_object_lookup(hwctx->client->filp, cu->cu_bo); + if (!gobj) { + XDNA_ERR(xdna, "Lookup GEM object failed"); + return -EINVAL; + } + abo = to_xdna_obj(gobj); + + if (abo->type != AMDXDNA_BO_DEV) { + drm_gem_object_put(gobj); + XDNA_ERR(xdna, "Invalid BO type"); + return -EINVAL; + } + + req.cfgs[i] = FIELD_PREP(AIE2_MSG_CFG_CU_PDI_ADDR, + abo->mem.dev_addr >> shift); + req.cfgs[i] |= FIELD_PREP(AIE2_MSG_CFG_CU_FUNC, cu->cu_func); + XDNA_DBG(xdna, "CU %d full addr 0x%llx, cfg 0x%x", i, + abo->mem.dev_addr, req.cfgs[i]); + drm_gem_object_put(gobj); + } + req.num_cus = hwctx->cus->num_cus; + + ret = xdna_send_msg_wait(xdna, chann, &msg); + if (ret == -ETIME) + aie2_destroy_context(xdna->dev_handle, hwctx); + + if (resp.status == AIE2_STATUS_SUCCESS) { + XDNA_DBG(xdna, "Configure %d CUs, ret %d", req.num_cus, ret); + return 0; + } + + XDNA_ERR(xdna, "Command opcode 0x%x failed, status 0x%x ret %d", + msg.opcode, resp.status, ret); + return ret; +} + +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + union { + struct execute_buffer_req ebuf; + struct exec_dpu_req dpu; + } req; + struct xdna_mailbox_msg msg; + u32 payload_len; + void *payload; + int cu_idx; + int ret; + u32 op; + + if (!chann) + return -ENODEV; + + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (!payload) { + XDNA_ERR(xdna, "Invalid command, cannot get payload"); + return -EINVAL; + } + + cu_idx = amdxdna_cmd_get_cu_idx(cmd_abo); + if (cu_idx < 0) { + XDNA_DBG(xdna, "Invalid cu idx"); + return -EINVAL; + } + + op = amdxdna_cmd_get_op(cmd_abo); + switch (op) { + case ERT_START_CU: + if (unlikely(payload_len > sizeof(req.ebuf.payload))) + XDNA_DBG(xdna, "Invalid ebuf payload len: %d", payload_len); + req.ebuf.cu_idx = cu_idx; + memcpy(req.ebuf.payload, payload, sizeof(req.ebuf.payload)); + msg.send_size = sizeof(req.ebuf); + msg.opcode = MSG_OP_EXECUTE_BUFFER_CF; + break; + case ERT_START_NPU: { + struct amdxdna_cmd_start_npu *sn = payload; + + if (unlikely(payload_len - sizeof(*sn) > sizeof(req.dpu.payload))) + XDNA_DBG(xdna, "Invalid dpu payload len: %d", payload_len); + req.dpu.inst_buf_addr = sn->buffer; + req.dpu.inst_size = sn->buffer_size; + req.dpu.inst_prop_cnt = sn->prop_count; + req.dpu.cu_idx = cu_idx; + memcpy(req.dpu.payload, sn->prop_args, sizeof(req.dpu.payload)); + msg.send_size = sizeof(req.dpu); + msg.opcode = MSG_OP_EXEC_DPU; + break; + } + default: + XDNA_DBG(xdna, "Invalid ERT cmd op code: %d", op); + return -EINVAL; + } + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + print_hex_dump_debug("cmd: ", DUMP_PREFIX_OFFSET, 16, 4, &req, + 0x40, false); + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_cf(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_execbuf_cf *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + u32 payload_len; + void *payload; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + + if (!slot_cf_has_space(offset, payload_len)) + return -ENOSPC; + + buf->cu_idx = cu_idx; + buf->arg_cnt = payload_len / sizeof(u32); + memcpy(buf->args, payload, payload_len); + /* Accurate buf size to hint firmware to do necessary copy */ + *size = sizeof(*buf) + payload_len; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot_dpu(void *cmd_buf, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + struct cmd_chain_slot_dpu *buf = cmd_buf + offset; + int cu_idx = amdxdna_cmd_get_cu_idx(abo); + struct amdxdna_cmd_start_npu *sn; + u32 payload_len; + void *payload; + u32 arg_sz; + + if (cu_idx < 0) + return -EINVAL; + + payload = amdxdna_cmd_get_payload(abo, &payload_len); + if (!payload) + return -EINVAL; + sn = payload; + arg_sz = payload_len - sizeof(*sn); + if (payload_len < sizeof(*sn) || arg_sz > MAX_DPU_ARGS_SIZE) + return -EINVAL; + + if (!slot_dpu_has_space(offset, arg_sz)) + return -ENOSPC; + + buf->inst_buf_addr = sn->buffer; + buf->inst_size = sn->buffer_size; + buf->inst_prop_cnt = sn->prop_count; + buf->cu_idx = cu_idx; + buf->arg_cnt = arg_sz / sizeof(u32); + memcpy(buf->args, sn->prop_args, arg_sz); + + /* Accurate buf size to hint firmware to do necessary copy */ + *size += sizeof(*buf) + arg_sz; + return 0; +} + +static int +aie2_cmdlist_fill_one_slot(u32 op, struct amdxdna_gem_obj *cmdbuf_abo, u32 offset, + struct amdxdna_gem_obj *abo, u32 *size) +{ + u32 this_op = amdxdna_cmd_get_op(abo); + void *cmd_buf = cmdbuf_abo->mem.kva; + int ret; + + if (this_op != op) { + ret = -EINVAL; + goto done; + } + + switch (op) { + case ERT_START_CU: + ret = aie2_cmdlist_fill_one_slot_cf(cmd_buf, offset, abo, size); + break; + case ERT_START_NPU: + ret = aie2_cmdlist_fill_one_slot_dpu(cmd_buf, offset, abo, size); + break; + default: + ret = -EOPNOTSUPP; + } + +done: + if (ret) { + XDNA_ERR(abo->client->xdna, "Can't fill slot for cmd op %d ret %d", + op, ret); + } + return ret; +} + +static inline struct amdxdna_gem_obj * +aie2_cmdlist_get_cmd_buf(struct amdxdna_sched_job *job) +{ + int idx = get_job_idx(job->seq); + + return job->hwctx->priv->cmd_buf[idx]; +} + +static void +aie2_cmdlist_prepare_request(struct cmd_chain_req *req, + struct amdxdna_gem_obj *cmdbuf_abo, u32 size, u32 cnt) +{ + req->buf_addr = cmdbuf_abo->mem.dev_addr; + req->buf_size = size; + req->count = cnt; + drm_clflush_virt_range(cmdbuf_abo->mem.kva, size); + XDNA_DBG(cmdbuf_abo->client->xdna, "Command buf addr 0x%llx size 0x%x count %d", + req->buf_addr, size, cnt); +} + +static inline u32 +aie2_cmd_op_to_msg_op(u32 op) +{ + switch (op) { + case ERT_START_CU: + return MSG_OP_CHAIN_EXEC_BUFFER_CF; + case ERT_START_NPU: + return MSG_OP_CHAIN_EXEC_DPU; + default: + return MSG_OP_MAX_OPCODE; + } +} + +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_client *client = hwctx->client; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct amdxdna_cmd_chain *payload; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 payload_len; + u32 offset = 0; + u32 size; + int ret; + u32 op; + u32 i; + + op = amdxdna_cmd_get_op(cmd_abo); + payload = amdxdna_cmd_get_payload(cmd_abo, &payload_len); + if (op != ERT_CMD_CHAIN || !payload || + payload_len < struct_size(payload, data, payload->command_count)) + return -EINVAL; + + for (i = 0; i < payload->command_count; i++) { + u32 boh = (u32)(payload->data[i]); + struct amdxdna_gem_obj *abo; + + abo = amdxdna_gem_get_obj(client, boh, AMDXDNA_BO_CMD); + if (!abo) { + XDNA_ERR(client->xdna, "Failed to find cmd BO %d", boh); + return -ENOENT; + } + + /* All sub-cmd should have same op, use the first one. */ + if (i == 0) + op = amdxdna_cmd_get_op(abo); + + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, offset, abo, &size); + amdxdna_gem_put_obj(abo); + if (ret) + return -EINVAL; + + offset += size; + } + + /* The offset is the accumulated total size of the cmd buffer */ + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, offset, payload->command_count); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct amdxdna_gem_obj *cmdbuf_abo = aie2_cmdlist_get_cmd_buf(job); + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *cmd_abo = job->cmd_bo; + struct xdna_mailbox_msg msg; + struct cmd_chain_req req; + u32 size; + int ret; + u32 op; + + op = amdxdna_cmd_get_op(cmd_abo); + ret = aie2_cmdlist_fill_one_slot(op, cmdbuf_abo, 0, cmd_abo, &size); + if (ret) + return ret; + + aie2_cmdlist_prepare_request(&req, cmdbuf_abo, size, 1); + + msg.opcode = aie2_cmd_op_to_msg_op(op); + if (msg.opcode == MSG_OP_MAX_OPCODE) + return -EOPNOTSUPP; + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(hwctx->client->xdna, "Send message failed"); + return ret; + } + + return 0; +} + +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)) +{ + struct mailbox_channel *chann = hwctx->priv->mbox_chann; + struct amdxdna_gem_obj *abo = to_xdna_obj(job->bos[0]); + struct amdxdna_dev *xdna = hwctx->client->xdna; + struct xdna_mailbox_msg msg; + struct sync_bo_req req; + int ret = 0; + + req.src_addr = 0; + req.dst_addr = abo->mem.dev_addr - hwctx->client->dev_heap->mem.dev_addr; + req.size = abo->mem.size; + + /* Device to Host */ + req.type = FIELD_PREP(AIE2_MSG_SYNC_BO_SRC_TYPE, SYNC_BO_DEV_MEM) | + FIELD_PREP(AIE2_MSG_SYNC_BO_DST_TYPE, SYNC_BO_HOST_MEM); + + XDNA_DBG(xdna, "sync %d bytes src(0x%llx) to dst(0x%llx) completed", + req.size, req.src_addr, req.dst_addr); + + msg.handle = job; + msg.notify_cb = notify_cb; + msg.send_data = (u8 *)&req; + msg.send_size = sizeof(req); + msg.opcode = MSG_OP_SYNC_BO; + + ret = xdna_mailbox_send_msg(chann, &msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed"); + return ret; + } + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_msg_priv.h b/drivers/accel/amdxdna/aie2_msg_priv.h new file mode 100644 index 0000000000000000000000000000000000000000..4e02e744b470ebbb8f2b624deed35c426bd74d01 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_msg_priv.h @@ -0,0 +1,370 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_MSG_PRIV_H_ +#define _AIE2_MSG_PRIV_H_ + +enum aie2_msg_opcode { + MSG_OP_CREATE_CONTEXT = 0x2, + MSG_OP_DESTROY_CONTEXT = 0x3, + MSG_OP_SYNC_BO = 0x7, + MSG_OP_EXECUTE_BUFFER_CF = 0xC, + MSG_OP_QUERY_COL_STATUS = 0xD, + MSG_OP_QUERY_AIE_TILE_INFO = 0xE, + MSG_OP_QUERY_AIE_VERSION = 0xF, + MSG_OP_EXEC_DPU = 0x10, + MSG_OP_CONFIG_CU = 0x11, + MSG_OP_CHAIN_EXEC_BUFFER_CF = 0x12, + MSG_OP_CHAIN_EXEC_DPU = 0x13, + MSG_OP_MAX_XRT_OPCODE, + MSG_OP_SUSPEND = 0x101, + MSG_OP_RESUME = 0x102, + MSG_OP_ASSIGN_MGMT_PASID = 0x103, + MSG_OP_INVOKE_SELF_TEST = 0x104, + MSG_OP_MAP_HOST_BUFFER = 0x106, + MSG_OP_GET_FIRMWARE_VERSION = 0x108, + MSG_OP_SET_RUNTIME_CONFIG = 0x10A, + MSG_OP_GET_RUNTIME_CONFIG = 0x10B, + MSG_OP_REGISTER_ASYNC_EVENT_MSG = 0x10C, + MSG_OP_MAX_DRV_OPCODE, + MSG_OP_GET_PROTOCOL_VERSION = 0x301, + MSG_OP_MAX_OPCODE +}; + +enum aie2_msg_status { + AIE2_STATUS_SUCCESS = 0x0, + /* AIE Error codes */ + AIE2_STATUS_AIE_SATURATION_ERROR = 0x1000001, + AIE2_STATUS_AIE_FP_ERROR = 0x1000002, + AIE2_STATUS_AIE_STREAM_ERROR = 0x1000003, + AIE2_STATUS_AIE_ACCESS_ERROR = 0x1000004, + AIE2_STATUS_AIE_BUS_ERROR = 0x1000005, + AIE2_STATUS_AIE_INSTRUCTION_ERROR = 0x1000006, + AIE2_STATUS_AIE_ECC_ERROR = 0x1000007, + AIE2_STATUS_AIE_LOCK_ERROR = 0x1000008, + AIE2_STATUS_AIE_DMA_ERROR = 0x1000009, + AIE2_STATUS_AIE_MEM_PARITY_ERROR = 0x100000a, + AIE2_STATUS_AIE_PWR_CFG_ERROR = 0x100000b, + AIE2_STATUS_AIE_BACKTRACK_ERROR = 0x100000c, + AIE2_STATUS_MAX_AIE_STATUS_CODE, + /* MGMT ERT Error codes */ + AIE2_STATUS_MGMT_ERT_SELF_TEST_FAILURE = 0x2000001, + AIE2_STATUS_MGMT_ERT_HASH_MISMATCH, + AIE2_STATUS_MGMT_ERT_NOAVAIL, + AIE2_STATUS_MGMT_ERT_INVALID_PARAM, + AIE2_STATUS_MGMT_ERT_ENTER_SUSPEND_FAILURE, + AIE2_STATUS_MGMT_ERT_BUSY, + AIE2_STATUS_MGMT_ERT_APPLICATION_ACTIVE, + MAX_MGMT_ERT_STATUS_CODE, + /* APP ERT Error codes */ + AIE2_STATUS_APP_ERT_FIRST_ERROR = 0x3000001, + AIE2_STATUS_APP_INVALID_INSTR, + AIE2_STATUS_APP_LOAD_PDI_FAIL, + MAX_APP_ERT_STATUS_CODE, + /* NPU RTOS Error Codes */ + AIE2_STATUS_INVALID_INPUT_BUFFER = 0x4000001, + AIE2_STATUS_INVALID_COMMAND, + AIE2_STATUS_INVALID_PARAM, + AIE2_STATUS_INVALID_OPERATION = 0x4000006, + AIE2_STATUS_ASYNC_EVENT_MSGS_FULL, + AIE2_STATUS_MAX_RTOS_STATUS_CODE, + MAX_AIE2_STATUS_CODE +}; + +struct assign_mgmt_pasid_req { + __u16 pasid; + __u16 reserved; +} __packed; + +struct assign_mgmt_pasid_resp { + enum aie2_msg_status status; +} __packed; + +struct map_host_buffer_req { + __u32 context_id; + __u64 buf_addr; + __u64 buf_size; +} __packed; + +struct map_host_buffer_resp { + enum aie2_msg_status status; +} __packed; + +#define MAX_CQ_PAIRS 2 +struct cq_info { + __u32 head_addr; + __u32 tail_addr; + __u32 buf_addr; + __u32 buf_size; +}; + +struct cq_pair { + struct cq_info x2i_q; + struct cq_info i2x_q; +}; + +struct create_ctx_req { + __u32 aie_type; + __u8 start_col; + __u8 num_col; + __u16 reserved; + __u8 num_cq_pairs_requested; + __u8 reserved1; + __u16 pasid; + __u32 pad[2]; + __u32 sec_comm_target_type; + __u32 context_priority; +} __packed; + +struct create_ctx_resp { + enum aie2_msg_status status; + __u32 context_id; + __u16 msix_id; + __u8 num_cq_pairs_allocated; + __u8 reserved; + struct cq_pair cq_pair[MAX_CQ_PAIRS]; +} __packed; + +struct destroy_ctx_req { + __u32 context_id; +} __packed; + +struct destroy_ctx_resp { + enum aie2_msg_status status; +} __packed; + +struct execute_buffer_req { + __u32 cu_idx; + __u32 payload[19]; +} __packed; + +struct exec_dpu_req { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 payload[35]; +} __packed; + +struct execute_buffer_resp { + enum aie2_msg_status status; +} __packed; + +struct aie_tile_info { + __u32 size; + __u16 major; + __u16 minor; + __u16 cols; + __u16 rows; + __u16 core_rows; + __u16 mem_rows; + __u16 shim_rows; + __u16 core_row_start; + __u16 mem_row_start; + __u16 shim_row_start; + __u16 core_dma_channels; + __u16 mem_dma_channels; + __u16 shim_dma_channels; + __u16 core_locks; + __u16 mem_locks; + __u16 shim_locks; + __u16 core_events; + __u16 mem_events; + __u16 shim_events; + __u16 reserved; +}; + +struct aie_tile_info_req { + __u32 reserved; +} __packed; + +struct aie_tile_info_resp { + enum aie2_msg_status status; + struct aie_tile_info info; +} __packed; + +struct aie_version_info_req { + __u32 reserved; +} __packed; + +struct aie_version_info_resp { + enum aie2_msg_status status; + __u16 major; + __u16 minor; +} __packed; + +struct aie_column_info_req { + __u64 dump_buff_addr; + __u32 dump_buff_size; + __u32 num_cols; + __u32 aie_bitmap; +} __packed; + +struct aie_column_info_resp { + enum aie2_msg_status status; + __u32 size; +} __packed; + +struct suspend_req { + __u32 place_holder; +} __packed; + +struct suspend_resp { + enum aie2_msg_status status; +} __packed; + +struct resume_req { + __u32 place_holder; +} __packed; + +struct resume_resp { + enum aie2_msg_status status; +} __packed; + +struct check_header_hash_req { + __u64 hash_high; + __u64 hash_low; +} __packed; + +struct check_header_hash_resp { + enum aie2_msg_status status; +} __packed; + +struct query_error_req { + __u64 buf_addr; + __u32 buf_size; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct query_error_resp { + enum aie2_msg_status status; + __u32 num_err; + __u32 has_next_err; + __u32 next_row; + __u32 next_column; + __u32 next_module; +} __packed; + +struct protocol_version_req { + __u32 reserved; +} __packed; + +struct protocol_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; +} __packed; + +struct firmware_version_req { + __u32 reserved; +} __packed; + +struct firmware_version_resp { + enum aie2_msg_status status; + __u32 major; + __u32 minor; + __u32 sub; + __u32 build; +} __packed; + +#define MAX_NUM_CUS 32 +#define AIE2_MSG_CFG_CU_PDI_ADDR GENMASK(16, 0) +#define AIE2_MSG_CFG_CU_FUNC GENMASK(24, 17) +struct config_cu_req { + __u32 num_cus; + __u32 cfgs[MAX_NUM_CUS]; +} __packed; + +struct config_cu_resp { + enum aie2_msg_status status; +} __packed; + +struct set_runtime_cfg_req { + __u32 type; + __u64 value; +} __packed; + +struct set_runtime_cfg_resp { + enum aie2_msg_status status; +} __packed; + +struct get_runtime_cfg_req { + __u32 type; +} __packed; + +struct get_runtime_cfg_resp { + enum aie2_msg_status status; + __u64 value; +} __packed; + +enum async_event_type { + ASYNC_EVENT_TYPE_AIE_ERROR, + ASYNC_EVENT_TYPE_EXCEPTION, + MAX_ASYNC_EVENT_TYPE +}; + +#define ASYNC_BUF_SIZE SZ_8K +struct async_event_msg_req { + __u64 buf_addr; + __u32 buf_size; +} __packed; + +struct async_event_msg_resp { + enum aie2_msg_status status; + enum async_event_type type; +} __packed; + +#define MAX_CHAIN_CMDBUF_SIZE SZ_4K +#define slot_cf_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_execbuf_cf, args[0])) +struct cmd_chain_slot_execbuf_cf { + __u32 cu_idx; + __u32 arg_cnt; + __u32 args[] __counted_by(arg_cnt); +}; + +#define slot_dpu_has_space(offset, payload_size) \ + (MAX_CHAIN_CMDBUF_SIZE - ((offset) + (payload_size)) > \ + offsetof(struct cmd_chain_slot_dpu, args[0])) +struct cmd_chain_slot_dpu { + __u64 inst_buf_addr; + __u32 inst_size; + __u32 inst_prop_cnt; + __u32 cu_idx; + __u32 arg_cnt; +#define MAX_DPU_ARGS_SIZE (34 * sizeof(__u32)) + __u32 args[] __counted_by(arg_cnt); +}; + +struct cmd_chain_req { + __u64 buf_addr; + __u32 buf_size; + __u32 count; +} __packed; + +struct cmd_chain_resp { + enum aie2_msg_status status; + __u32 fail_cmd_idx; + enum aie2_msg_status fail_cmd_status; +} __packed; + +#define AIE2_MSG_SYNC_BO_SRC_TYPE GENMASK(3, 0) +#define AIE2_MSG_SYNC_BO_DST_TYPE GENMASK(7, 4) +struct sync_bo_req { + __u64 src_addr; + __u64 dst_addr; + __u32 size; +#define SYNC_BO_DEV_MEM 0 +#define SYNC_BO_HOST_MEM 2 + __u32 type; +} __packed; + +struct sync_bo_resp { + enum aie2_msg_status status; +} __packed; +#endif /* _AIE2_MSG_PRIV_H_ */ diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c new file mode 100644 index 0000000000000000000000000000000000000000..8de8f3bd49876ecae99a29c3c8bd1103e2b4ee68 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -0,0 +1,928 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_msg_priv.h" +#include "aie2_pci.h" +#include "aie2_solver.h" +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +int aie2_max_col = XRS_MAX_COL; +module_param(aie2_max_col, uint, 0600); +MODULE_PARM_DESC(aie2_max_col, "Maximum column could be used"); + +/* + * The management mailbox channel is allocated by firmware. + * The related register and ring buffer information is on SRAM BAR. + * This struct is the register layout. + */ +#define MGMT_MBOX_MAGIC 0x55504e5f /* _NPU */ +struct mgmt_mbox_chann_info { + __u32 x2i_tail; + __u32 x2i_head; + __u32 x2i_buf; + __u32 x2i_buf_sz; + __u32 i2x_tail; + __u32 i2x_head; + __u32 i2x_buf; + __u32 i2x_buf_sz; + __u32 magic; + __u32 msi_id; + __u32 prot_major; + __u32 prot_minor; + __u32 rsvd[4]; +}; + +static int aie2_check_protocol(struct amdxdna_dev_hdl *ndev, u32 fw_major, u32 fw_minor) +{ + struct amdxdna_dev *xdna = ndev->xdna; + + /* + * The driver supported mailbox behavior is defined by + * ndev->priv->protocol_major and protocol_minor. + * + * When protocol_major and fw_major are different, it means driver + * and firmware are incompatible. + */ + if (ndev->priv->protocol_major != fw_major) { + XDNA_ERR(xdna, "Incompatible firmware protocol major %d minor %d", + fw_major, fw_minor); + return -EINVAL; + } + + /* + * When protocol_minor is greater then fw_minor, that means driver + * relies on operation the installed firmware does not support. + */ + if (ndev->priv->protocol_minor > fw_minor) { + XDNA_ERR(xdna, "Firmware minor version smaller than supported"); + return -EINVAL; + } + return 0; +} + +static void aie2_dump_chann_info_debug(struct amdxdna_dev_hdl *ndev) +{ + struct amdxdna_dev *xdna = ndev->xdna; + + XDNA_DBG(xdna, "i2x tail 0x%x", ndev->mgmt_i2x.mb_tail_ptr_reg); + XDNA_DBG(xdna, "i2x head 0x%x", ndev->mgmt_i2x.mb_head_ptr_reg); + XDNA_DBG(xdna, "i2x ringbuf 0x%x", ndev->mgmt_i2x.rb_start_addr); + XDNA_DBG(xdna, "i2x rsize 0x%x", ndev->mgmt_i2x.rb_size); + XDNA_DBG(xdna, "x2i tail 0x%x", ndev->mgmt_x2i.mb_tail_ptr_reg); + XDNA_DBG(xdna, "x2i head 0x%x", ndev->mgmt_x2i.mb_head_ptr_reg); + XDNA_DBG(xdna, "x2i ringbuf 0x%x", ndev->mgmt_x2i.rb_start_addr); + XDNA_DBG(xdna, "x2i rsize 0x%x", ndev->mgmt_x2i.rb_size); + XDNA_DBG(xdna, "x2i chann index 0x%x", ndev->mgmt_chan_idx); + XDNA_DBG(xdna, "mailbox protocol major 0x%x", ndev->mgmt_prot_major); + XDNA_DBG(xdna, "mailbox protocol minor 0x%x", ndev->mgmt_prot_minor); +} + +static int aie2_get_mgmt_chann_info(struct amdxdna_dev_hdl *ndev) +{ + struct mgmt_mbox_chann_info info_regs; + struct xdna_mailbox_chann_res *i2x; + struct xdna_mailbox_chann_res *x2i; + u32 addr, off; + u32 *reg; + int ret; + int i; + + /* + * Once firmware is alive, it will write management channel + * information in SRAM BAR and write the address of that information + * at FW_ALIVE_OFF offset in SRMA BAR. + * + * Read a non-zero value from FW_ALIVE_OFF implies that firmware + * is alive. + */ + ret = readx_poll_timeout(readl, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF), + addr, addr, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret || !addr) + return -ETIME; + + off = AIE2_SRAM_OFF(ndev, addr); + reg = (u32 *)&info_regs; + for (i = 0; i < sizeof(info_regs) / sizeof(u32); i++) + reg[i] = readl(ndev->sram_base + off + i * sizeof(u32)); + + if (info_regs.magic != MGMT_MBOX_MAGIC) { + XDNA_ERR(ndev->xdna, "Invalid mbox magic 0x%x", info_regs.magic); + ret = -EINVAL; + goto done; + } + + i2x = &ndev->mgmt_i2x; + x2i = &ndev->mgmt_x2i; + + i2x->mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.i2x_head); + i2x->mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.i2x_tail); + i2x->rb_start_addr = AIE2_SRAM_OFF(ndev, info_regs.i2x_buf); + i2x->rb_size = info_regs.i2x_buf_sz; + + x2i->mb_head_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.x2i_head); + x2i->mb_tail_ptr_reg = AIE2_MBOX_OFF(ndev, info_regs.x2i_tail); + x2i->rb_start_addr = AIE2_SRAM_OFF(ndev, info_regs.x2i_buf); + x2i->rb_size = info_regs.x2i_buf_sz; + + ndev->mgmt_chan_idx = info_regs.msi_id; + ndev->mgmt_prot_major = info_regs.prot_major; + ndev->mgmt_prot_minor = info_regs.prot_minor; + + ret = aie2_check_protocol(ndev, ndev->mgmt_prot_major, ndev->mgmt_prot_minor); + +done: + aie2_dump_chann_info_debug(ndev); + + /* Must clear address at FW_ALIVE_OFF */ + writel(0, SRAM_GET_ADDR(ndev, FW_ALIVE_OFF)); + + return ret; +} + +int aie2_runtime_cfg(struct amdxdna_dev_hdl *ndev, + enum rt_config_category category, u32 *val) +{ + const struct rt_config *cfg; + u32 value; + int ret; + + for (cfg = ndev->priv->rt_config; cfg->type; cfg++) { + if (cfg->category != category) + continue; + + value = val ? *val : cfg->value; + ret = aie2_set_runtime_cfg(ndev, cfg->type, value); + if (ret) { + XDNA_ERR(ndev->xdna, "Set type %d value %d failed", + cfg->type, value); + return ret; + } + } + + return 0; +} + +static int aie2_xdna_reset(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_suspend_fw(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Suspend firmware failed"); + return ret; + } + + ret = aie2_resume_fw(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Resume firmware failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_runtime_cfg(ndev, AIE2_RT_CFG_INIT, NULL); + if (ret) { + XDNA_ERR(ndev->xdna, "Runtime config failed"); + return ret; + } + + ret = aie2_assign_mgmt_pasid(ndev, 0); + if (ret) { + XDNA_ERR(ndev->xdna, "Can not assign PASID"); + return ret; + } + + ret = aie2_xdna_reset(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Reset firmware failed"); + return ret; + } + + if (!ndev->async_events) + return 0; + + ret = aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(ndev->xdna, "Send async events failed"); + return ret; + } + + return 0; +} + +static int aie2_mgmt_fw_query(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(ndev->xdna, "query firmware version failed"); + return ret; + } + + ret = aie2_query_aie_version(ndev, &ndev->version); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE version failed"); + return ret; + } + + ret = aie2_query_aie_metadata(ndev, &ndev->metadata); + if (ret) { + XDNA_ERR(ndev->xdna, "Query AIE metadata failed"); + return ret; + } + + return 0; +} + +static void aie2_mgmt_fw_fini(struct amdxdna_dev_hdl *ndev) +{ + if (aie2_suspend_fw(ndev)) + XDNA_ERR(ndev->xdna, "Suspend_fw failed"); + XDNA_DBG(ndev->xdna, "Firmware suspended"); +} + +static int aie2_xrs_load(void *cb_arg, struct xrs_action_load *action) +{ + struct amdxdna_hwctx *hwctx = cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna = hwctx->client->xdna; + + hwctx->start_col = action->part.start_col; + hwctx->num_col = action->part.ncols; + ret = aie2_create_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "create context failed, ret %d", ret); + + return ret; +} + +static int aie2_xrs_unload(void *cb_arg) +{ + struct amdxdna_hwctx *hwctx = cb_arg; + struct amdxdna_dev *xdna; + int ret; + + xdna = hwctx->client->xdna; + + ret = aie2_destroy_context(xdna->dev_handle, hwctx); + if (ret) + XDNA_ERR(xdna, "destroy context failed, ret %d", ret); + + return ret; +} + +static int aie2_xrs_set_dft_dpm_level(struct drm_device *ddev, u32 dpm_level) +{ + struct amdxdna_dev *xdna = to_xdna_dev(ddev); + struct amdxdna_dev_hdl *ndev; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + + ndev = xdna->dev_handle; + ndev->dft_dpm_level = dpm_level; + if (ndev->pw_mode != POWER_MODE_DEFAULT || ndev->dpm_level == dpm_level) + return 0; + + return ndev->priv->hw_ops.set_dpm(ndev, dpm_level); +} + +static struct xrs_action_ops aie2_xrs_actions = { + .load = aie2_xrs_load, + .unload = aie2_xrs_unload, + .set_dft_dpm_level = aie2_xrs_set_dft_dpm_level, +}; + +static void aie2_hw_stop(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + + if (ndev->dev_status <= AIE2_DEV_INIT) { + XDNA_ERR(xdna, "device is already stopped"); + return; + } + + aie2_mgmt_fw_fini(ndev); + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + ndev->mgmt_chann = NULL; + drmm_kfree(&xdna->ddev, ndev->mbox); + ndev->mbox = NULL; + aie2_psp_stop(ndev->psp_hdl); + aie2_smu_fini(ndev); + pci_disable_device(pdev); + + ndev->dev_status = AIE2_DEV_INIT; +} + +static int aie2_hw_start(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + struct xdna_mailbox_res mbox_res; + u32 xdna_mailbox_intr_reg; + int mgmt_mb_irq, ret; + + if (ndev->dev_status >= AIE2_DEV_START) { + XDNA_INFO(xdna, "device is already started"); + return 0; + } + + ret = pci_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "failed to enable device, ret %d", ret); + return ret; + } + pci_set_master(pdev); + + ret = aie2_smu_init(ndev); + if (ret) { + XDNA_ERR(xdna, "failed to init smu, ret %d", ret); + goto disable_dev; + } + + ret = aie2_psp_start(ndev->psp_hdl); + if (ret) { + XDNA_ERR(xdna, "failed to start psp, ret %d", ret); + goto fini_smu; + } + + ret = aie2_get_mgmt_chann_info(ndev); + if (ret) { + XDNA_ERR(xdna, "firmware is not alive"); + goto stop_psp; + } + + mbox_res.ringbuf_base = (u64)ndev->sram_base; + mbox_res.ringbuf_size = pci_resource_len(pdev, xdna->dev_info->sram_bar); + mbox_res.mbox_base = (u64)ndev->mbox_base; + mbox_res.mbox_size = MBOX_SIZE(ndev); + mbox_res.name = "xdna_mailbox"; + ndev->mbox = xdnam_mailbox_create(&xdna->ddev, &mbox_res); + if (!ndev->mbox) { + XDNA_ERR(xdna, "failed to create mailbox device"); + ret = -ENODEV; + goto stop_psp; + } + + mgmt_mb_irq = pci_irq_vector(pdev, ndev->mgmt_chan_idx); + if (mgmt_mb_irq < 0) { + ret = mgmt_mb_irq; + XDNA_ERR(xdna, "failed to alloc irq vector, ret %d", ret); + goto stop_psp; + } + + xdna_mailbox_intr_reg = ndev->mgmt_i2x.mb_head_ptr_reg + 4; + ndev->mgmt_chann = xdna_mailbox_create_channel(ndev->mbox, + &ndev->mgmt_x2i, + &ndev->mgmt_i2x, + xdna_mailbox_intr_reg, + mgmt_mb_irq); + if (!ndev->mgmt_chann) { + XDNA_ERR(xdna, "failed to create management mailbox channel"); + ret = -EINVAL; + goto stop_psp; + } + + ret = aie2_pm_init(ndev); + if (ret) { + XDNA_ERR(xdna, "failed to init pm, ret %d", ret); + goto destroy_mgmt_chann; + } + + ret = aie2_mgmt_fw_init(ndev); + if (ret) { + XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); + goto destroy_mgmt_chann; + } + + ndev->dev_status = AIE2_DEV_START; + + return 0; + +destroy_mgmt_chann: + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); +stop_psp: + aie2_psp_stop(ndev->psp_hdl); +fini_smu: + aie2_smu_fini(ndev); +disable_dev: + pci_disable_device(pdev); + + return ret; +} + +static int aie2_init(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + void __iomem *tbl[PCI_NUM_RESOURCES] = {0}; + struct init_config xrs_cfg = { 0 }; + struct amdxdna_dev_hdl *ndev; + struct psp_config psp_conf; + const struct firmware *fw; + unsigned long bars = 0; + int i, nvec, ret; + + ndev = drmm_kzalloc(&xdna->ddev, sizeof(*ndev), GFP_KERNEL); + if (!ndev) + return -ENOMEM; + + ndev->priv = xdna->dev_info->dev_priv; + ndev->xdna = xdna; + + ret = request_firmware(&fw, ndev->priv->fw_path, &pdev->dev); + if (ret) { + XDNA_ERR(xdna, "failed to request_firmware %s, ret %d", + ndev->priv->fw_path, ret); + return ret; + } + + ret = pcim_enable_device(pdev); + if (ret) { + XDNA_ERR(xdna, "pcim enable device failed, ret %d", ret); + goto release_fw; + } + + for (i = 0; i < PSP_MAX_REGS; i++) + set_bit(PSP_REG_BAR(ndev, i), &bars); + + set_bit(xdna->dev_info->sram_bar, &bars); + set_bit(xdna->dev_info->smu_bar, &bars); + set_bit(xdna->dev_info->mbox_bar, &bars); + + for (i = 0; i < PCI_NUM_RESOURCES; i++) { + if (!test_bit(i, &bars)) + continue; + tbl[i] = pcim_iomap(pdev, i, 0); + if (!tbl[i]) { + XDNA_ERR(xdna, "map bar %d failed", i); + ret = -ENOMEM; + goto release_fw; + } + } + + ndev->sram_base = tbl[xdna->dev_info->sram_bar]; + ndev->smu_base = tbl[xdna->dev_info->smu_bar]; + ndev->mbox_base = tbl[xdna->dev_info->mbox_bar]; + + ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)); + if (ret) { + XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret); + goto release_fw; + } + + nvec = pci_msix_vec_count(pdev); + if (nvec <= 0) { + XDNA_ERR(xdna, "does not get number of interrupt vector"); + ret = -EINVAL; + goto release_fw; + } + + ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_MSIX); + if (ret < 0) { + XDNA_ERR(xdna, "failed to alloc irq vectors, ret %d", ret); + goto release_fw; + } + + ret = iommu_dev_enable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); + if (ret) { + XDNA_ERR(xdna, "Enable PASID failed, ret %d", ret); + goto free_irq; + } + + psp_conf.fw_size = fw->size; + psp_conf.fw_buf = fw->data; + for (i = 0; i < PSP_MAX_REGS; i++) + psp_conf.psp_regs[i] = tbl[PSP_REG_BAR(ndev, i)] + PSP_REG_OFF(ndev, i); + ndev->psp_hdl = aie2m_psp_create(&xdna->ddev, &psp_conf); + if (!ndev->psp_hdl) { + XDNA_ERR(xdna, "failed to create psp"); + ret = -ENOMEM; + goto disable_sva; + } + xdna->dev_handle = ndev; + + ret = aie2_hw_start(xdna); + if (ret) { + XDNA_ERR(xdna, "start npu failed, ret %d", ret); + goto disable_sva; + } + + ret = aie2_mgmt_fw_query(ndev); + if (ret) { + XDNA_ERR(xdna, "Query firmware failed, ret %d", ret); + goto stop_hw; + } + ndev->total_col = min(aie2_max_col, ndev->metadata.cols); + + xrs_cfg.clk_list.num_levels = ndev->max_dpm_level + 1; + for (i = 0; i < xrs_cfg.clk_list.num_levels; i++) + xrs_cfg.clk_list.cu_clk_list[i] = ndev->priv->dpm_clk_tbl[i].hclk; + xrs_cfg.sys_eff_factor = 1; + xrs_cfg.ddev = &xdna->ddev; + xrs_cfg.actions = &aie2_xrs_actions; + xrs_cfg.total_col = ndev->total_col; + + xdna->xrs_hdl = xrsm_init(&xrs_cfg); + if (!xdna->xrs_hdl) { + XDNA_ERR(xdna, "Initialize resolver failed"); + ret = -EINVAL; + goto stop_hw; + } + + ret = aie2_error_async_events_alloc(ndev); + if (ret) { + XDNA_ERR(xdna, "Allocate async events failed, ret %d", ret); + goto stop_hw; + } + + ret = aie2_error_async_events_send(ndev); + if (ret) { + XDNA_ERR(xdna, "Send async events failed, ret %d", ret); + goto async_event_free; + } + + /* Issue a command to make sure firmware handled async events */ + ret = aie2_query_firmware_version(ndev, &ndev->xdna->fw_ver); + if (ret) { + XDNA_ERR(xdna, "Re-query firmware version failed"); + goto async_event_free; + } + + release_firmware(fw); + return 0; + +async_event_free: + aie2_error_async_events_free(ndev); +stop_hw: + aie2_hw_stop(xdna); +disable_sva: + iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); +free_irq: + pci_free_irq_vectors(pdev); +release_fw: + release_firmware(fw); + + return ret; +} + +static void aie2_fini(struct amdxdna_dev *xdna) +{ + struct pci_dev *pdev = to_pci_dev(xdna->ddev.dev); + struct amdxdna_dev_hdl *ndev = xdna->dev_handle; + + aie2_hw_stop(xdna); + aie2_error_async_events_free(ndev); + iommu_dev_disable_feature(&pdev->dev, IOMMU_DEV_FEAT_SVA); + pci_free_irq_vectors(pdev); +} + +static int aie2_get_aie_status(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_status status; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret; + + ndev = xdna->dev_handle; + if (copy_from_user(&status, u64_to_user_ptr(args->buffer), sizeof(status))) { + XDNA_ERR(xdna, "Failed to copy AIE request into kernel"); + return -EFAULT; + } + + if (ndev->metadata.cols * ndev->metadata.size < status.buffer_size) { + XDNA_ERR(xdna, "Invalid buffer size. Given Size: %u. Need Size: %u.", + status.buffer_size, ndev->metadata.cols * ndev->metadata.size); + return -EINVAL; + } + + ret = aie2_query_status(ndev, u64_to_user_ptr(status.buffer), + status.buffer_size, &status.cols_filled); + if (ret) { + XDNA_ERR(xdna, "Failed to get AIE status info. Ret: %d", ret); + return ret; + } + + if (copy_to_user(u64_to_user_ptr(args->buffer), &status, sizeof(status))) { + XDNA_ERR(xdna, "Failed to copy AIE request info to user space"); + return -EFAULT; + } + + return 0; +} + +static int aie2_get_aie_metadata(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_metadata *meta; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret = 0; + + ndev = xdna->dev_handle; + meta = kzalloc(sizeof(*meta), GFP_KERNEL); + if (!meta) + return -ENOMEM; + + meta->col_size = ndev->metadata.size; + meta->cols = ndev->metadata.cols; + meta->rows = ndev->metadata.rows; + + meta->version.major = ndev->metadata.version.major; + meta->version.minor = ndev->metadata.version.minor; + + meta->core.row_count = ndev->metadata.core.row_count; + meta->core.row_start = ndev->metadata.core.row_start; + meta->core.dma_channel_count = ndev->metadata.core.dma_channel_count; + meta->core.lock_count = ndev->metadata.core.lock_count; + meta->core.event_reg_count = ndev->metadata.core.event_reg_count; + + meta->mem.row_count = ndev->metadata.mem.row_count; + meta->mem.row_start = ndev->metadata.mem.row_start; + meta->mem.dma_channel_count = ndev->metadata.mem.dma_channel_count; + meta->mem.lock_count = ndev->metadata.mem.lock_count; + meta->mem.event_reg_count = ndev->metadata.mem.event_reg_count; + + meta->shim.row_count = ndev->metadata.shim.row_count; + meta->shim.row_start = ndev->metadata.shim.row_start; + meta->shim.dma_channel_count = ndev->metadata.shim.dma_channel_count; + meta->shim.lock_count = ndev->metadata.shim.lock_count; + meta->shim.event_reg_count = ndev->metadata.shim.event_reg_count; + + if (copy_to_user(u64_to_user_ptr(args->buffer), meta, sizeof(*meta))) + ret = -EFAULT; + + kfree(meta); + return ret; +} + +static int aie2_get_aie_version(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_aie_version version; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + + ndev = xdna->dev_handle; + version.major = ndev->version.major; + version.minor = ndev->version.minor; + + if (copy_to_user(u64_to_user_ptr(args->buffer), &version, sizeof(version))) + return -EFAULT; + + return 0; +} + +static int aie2_get_firmware_version(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_firmware_version version; + struct amdxdna_dev *xdna = client->xdna; + + version.major = xdna->fw_ver.major; + version.minor = xdna->fw_ver.minor; + version.patch = xdna->fw_ver.sub; + version.build = xdna->fw_ver.build; + + if (copy_to_user(u64_to_user_ptr(args->buffer), &version, sizeof(version))) + return -EFAULT; + + return 0; +} + +static int aie2_get_power_mode(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_get_power_mode mode = {}; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + + ndev = xdna->dev_handle; + mode.power_mode = ndev->pw_mode; + + if (copy_to_user(u64_to_user_ptr(args->buffer), &mode, sizeof(mode))) + return -EFAULT; + + return 0; +} + +static int aie2_get_clock_metadata(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_clock_metadata *clock; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_dev_hdl *ndev; + int ret = 0; + + ndev = xdna->dev_handle; + clock = kzalloc(sizeof(*clock), GFP_KERNEL); + if (!clock) + return -ENOMEM; + + snprintf(clock->mp_npu_clock.name, sizeof(clock->mp_npu_clock.name), + "MP-NPU Clock"); + clock->mp_npu_clock.freq_mhz = ndev->npuclk_freq; + snprintf(clock->h_clock.name, sizeof(clock->h_clock.name), "H Clock"); + clock->h_clock.freq_mhz = ndev->hclk_freq; + + if (copy_to_user(u64_to_user_ptr(args->buffer), clock, sizeof(*clock))) + ret = -EFAULT; + + kfree(clock); + return ret; +} + +static int aie2_get_hwctx_status(struct amdxdna_client *client, + struct amdxdna_drm_get_info *args) +{ + struct amdxdna_drm_query_hwctx __user *buf; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_drm_query_hwctx *tmp; + struct amdxdna_client *tmp_client; + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + bool overflow = false; + u32 req_bytes = 0; + u32 hw_i = 0; + int ret = 0; + int idx; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + + tmp = kzalloc(sizeof(*tmp), GFP_KERNEL); + if (!tmp) + return -ENOMEM; + + buf = u64_to_user_ptr(args->buffer); + list_for_each_entry(tmp_client, &xdna->client_list, node) { + idx = srcu_read_lock(&tmp_client->hwctx_srcu); + amdxdna_for_each_hwctx(tmp_client, hwctx_id, hwctx) { + req_bytes += sizeof(*tmp); + if (args->buffer_size < req_bytes) { + /* Continue iterating to get the required size */ + overflow = true; + continue; + } + + memset(tmp, 0, sizeof(*tmp)); + tmp->pid = tmp_client->pid; + tmp->context_id = hwctx->id; + tmp->start_col = hwctx->start_col; + tmp->num_col = hwctx->num_col; + tmp->command_submissions = hwctx->priv->seq; + tmp->command_completions = hwctx->priv->completed; + + if (copy_to_user(&buf[hw_i], tmp, sizeof(*tmp))) { + ret = -EFAULT; + srcu_read_unlock(&tmp_client->hwctx_srcu, idx); + goto out; + } + hw_i++; + } + srcu_read_unlock(&tmp_client->hwctx_srcu, idx); + } + + if (overflow) { + XDNA_ERR(xdna, "Invalid buffer size. Given: %u Need: %u.", + args->buffer_size, req_bytes); + ret = -EINVAL; + } + +out: + kfree(tmp); + args->buffer_size = req_bytes; + return ret; +} + +static int aie2_get_info(struct amdxdna_client *client, struct amdxdna_drm_get_info *args) +{ + struct amdxdna_dev *xdna = client->xdna; + int ret, idx; + + if (!drm_dev_enter(&xdna->ddev, &idx)) + return -ENODEV; + + switch (args->param) { + case DRM_AMDXDNA_QUERY_AIE_STATUS: + ret = aie2_get_aie_status(client, args); + break; + case DRM_AMDXDNA_QUERY_AIE_METADATA: + ret = aie2_get_aie_metadata(client, args); + break; + case DRM_AMDXDNA_QUERY_AIE_VERSION: + ret = aie2_get_aie_version(client, args); + break; + case DRM_AMDXDNA_QUERY_CLOCK_METADATA: + ret = aie2_get_clock_metadata(client, args); + break; + case DRM_AMDXDNA_QUERY_HW_CONTEXTS: + ret = aie2_get_hwctx_status(client, args); + break; + case DRM_AMDXDNA_QUERY_FIRMWARE_VERSION: + ret = aie2_get_firmware_version(client, args); + break; + case DRM_AMDXDNA_GET_POWER_MODE: + ret = aie2_get_power_mode(client, args); + break; + default: + XDNA_ERR(xdna, "Not supported request parameter %u", args->param); + ret = -EOPNOTSUPP; + } + XDNA_DBG(xdna, "Got param %d", args->param); + + drm_dev_exit(idx); + return ret; +} + +static int aie2_set_power_mode(struct amdxdna_client *client, + struct amdxdna_drm_set_state *args) +{ + struct amdxdna_drm_set_power_mode power_state; + enum amdxdna_power_mode_type power_mode; + struct amdxdna_dev *xdna = client->xdna; + + if (copy_from_user(&power_state, u64_to_user_ptr(args->buffer), + sizeof(power_state))) { + XDNA_ERR(xdna, "Failed to copy power mode request into kernel"); + return -EFAULT; + } + + if (XDNA_MBZ_DBG(xdna, power_state.pad, sizeof(power_state.pad))) + return -EINVAL; + + power_mode = power_state.power_mode; + if (power_mode > POWER_MODE_TURBO) { + XDNA_ERR(xdna, "Invalid power mode %d", power_mode); + return -EINVAL; + } + + return aie2_pm_set_mode(xdna->dev_handle, power_mode); +} + +static int aie2_set_state(struct amdxdna_client *client, + struct amdxdna_drm_set_state *args) +{ + struct amdxdna_dev *xdna = client->xdna; + int ret, idx; + + if (!drm_dev_enter(&xdna->ddev, &idx)) + return -ENODEV; + + switch (args->param) { + case DRM_AMDXDNA_SET_POWER_MODE: + ret = aie2_set_power_mode(client, args); + break; + default: + XDNA_ERR(xdna, "Not supported request parameter %u", args->param); + ret = -EOPNOTSUPP; + break; + } + + drm_dev_exit(idx); + return ret; +} + +const struct amdxdna_dev_ops aie2_ops = { + .init = aie2_init, + .fini = aie2_fini, + .resume = aie2_hw_start, + .suspend = aie2_hw_stop, + .get_aie_info = aie2_get_info, + .set_aie_state = aie2_set_state, + .hwctx_init = aie2_hwctx_init, + .hwctx_fini = aie2_hwctx_fini, + .hwctx_config = aie2_hwctx_config, + .cmd_submit = aie2_cmd_submit, + .hmm_invalidate = aie2_hmm_invalidate, + .hwctx_suspend = aie2_hwctx_suspend, + .hwctx_resume = aie2_hwctx_resume, +}; diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h new file mode 100644 index 0000000000000000000000000000000000000000..cc159cadff9f6fea63551ea67bd9ce6c2081ff2b --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pci.h @@ -0,0 +1,297 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_PCI_H_ +#define _AIE2_PCI_H_ + +#include +#include + +#include "amdxdna_mailbox.h" + +#define AIE2_INTERVAL 20000 /* us */ +#define AIE2_TIMEOUT 1000000 /* us */ + +/* Firmware determines device memory base address and size */ +#define AIE2_DEVM_BASE 0x4000000 +#define AIE2_DEVM_SIZE SZ_64M + +#define NDEV2PDEV(ndev) (to_pci_dev((ndev)->xdna->ddev.dev)) + +#define AIE2_SRAM_OFF(ndev, addr) ((addr) - (ndev)->priv->sram_dev_addr) +#define AIE2_MBOX_OFF(ndev, addr) ((addr) - (ndev)->priv->mbox_dev_addr) + +#define PSP_REG_BAR(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].bar_idx) +#define PSP_REG_OFF(ndev, idx) ((ndev)->priv->psp_regs_off[(idx)].offset) +#define SRAM_REG_OFF(ndev, idx) ((ndev)->priv->sram_offs[(idx)].offset) + +#define SMU_REG(ndev, idx) \ +({ \ + typeof(ndev) _ndev = ndev; \ + ((_ndev)->smu_base + (_ndev)->priv->smu_regs_off[(idx)].offset); \ +}) +#define SRAM_GET_ADDR(ndev, idx) \ +({ \ + typeof(ndev) _ndev = ndev; \ + ((_ndev)->sram_base + SRAM_REG_OFF((_ndev), (idx))); \ +}) + +#define CHAN_SLOT_SZ SZ_8K +#define MBOX_SIZE(ndev) \ +({ \ + typeof(ndev) _ndev = (ndev); \ + ((_ndev)->priv->mbox_size) ? (_ndev)->priv->mbox_size : \ + pci_resource_len(NDEV2PDEV(_ndev), (_ndev)->xdna->dev_info->mbox_bar); \ +}) + +enum aie2_smu_reg_idx { + SMU_CMD_REG = 0, + SMU_ARG_REG, + SMU_INTR_REG, + SMU_RESP_REG, + SMU_OUT_REG, + SMU_MAX_REGS /* Keep this at the end */ +}; + +enum aie2_sram_reg_idx { + MBOX_CHANN_OFF = 0, + FW_ALIVE_OFF, + SRAM_MAX_INDEX /* Keep this at the end */ +}; + +enum psp_reg_idx { + PSP_CMD_REG = 0, + PSP_ARG0_REG, + PSP_ARG1_REG, + PSP_ARG2_REG, + PSP_NUM_IN_REGS, /* number of input registers */ + PSP_INTR_REG = PSP_NUM_IN_REGS, + PSP_STATUS_REG, + PSP_RESP_REG, + PSP_MAX_REGS /* Keep this at the end */ +}; + +struct amdxdna_client; +struct amdxdna_fw_ver; +struct amdxdna_hwctx; +struct amdxdna_sched_job; + +struct psp_config { + const void *fw_buf; + u32 fw_size; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +struct aie_version { + u16 major; + u16 minor; +}; + +struct aie_tile_metadata { + u16 row_count; + u16 row_start; + u16 dma_channel_count; + u16 lock_count; + u16 event_reg_count; +}; + +struct aie_metadata { + u32 size; + u16 cols; + u16 rows; + struct aie_version version; + struct aie_tile_metadata core; + struct aie_tile_metadata mem; + struct aie_tile_metadata shim; +}; + +enum rt_config_category { + AIE2_RT_CFG_INIT, + AIE2_RT_CFG_CLK_GATING, +}; + +struct rt_config { + u32 type; + u32 value; + u32 category; +}; + +struct dpm_clk_freq { + u32 npuclk; + u32 hclk; +}; + +/* + * Define the maximum number of pending commands in a hardware context. + * Must be power of 2! + */ +#define HWCTX_MAX_CMDS 4 +#define get_job_idx(seq) ((seq) & (HWCTX_MAX_CMDS - 1)) +struct amdxdna_hwctx_priv { + struct amdxdna_gem_obj *heap; + void *mbox_chann; + + struct drm_gpu_scheduler sched; + struct drm_sched_entity entity; + + struct mutex io_lock; /* protect seq and cmd order */ + struct wait_queue_head job_free_wq; + u32 num_pending; + u64 seq; + struct semaphore job_sem; + bool job_done; + + /* Completed job counter */ + u64 completed; + + struct amdxdna_gem_obj *cmd_buf[HWCTX_MAX_CMDS]; + struct drm_syncobj *syncobj; +}; + +enum aie2_dev_status { + AIE2_DEV_UNINIT, + AIE2_DEV_INIT, + AIE2_DEV_START, +}; + +struct amdxdna_dev_hdl { + struct amdxdna_dev *xdna; + const struct amdxdna_dev_priv *priv; + void __iomem *sram_base; + void __iomem *smu_base; + void __iomem *mbox_base; + struct psp_device *psp_hdl; + + struct xdna_mailbox_chann_res mgmt_x2i; + struct xdna_mailbox_chann_res mgmt_i2x; + u32 mgmt_chan_idx; + u32 mgmt_prot_major; + u32 mgmt_prot_minor; + + u32 total_col; + struct aie_version version; + struct aie_metadata metadata; + + /* power management and clock*/ + enum amdxdna_power_mode_type pw_mode; + u32 dpm_level; + u32 dft_dpm_level; + u32 max_dpm_level; + u32 clk_gating; + u32 npuclk_freq; + u32 hclk_freq; + + /* Mailbox and the management channel */ + struct mailbox *mbox; + struct mailbox_channel *mgmt_chann; + struct async_events *async_events; + + enum aie2_dev_status dev_status; + u32 hwctx_num; +}; + +#define DEFINE_BAR_OFFSET(reg_name, bar, reg_addr) \ + [reg_name] = {bar##_BAR_INDEX, (reg_addr) - bar##_BAR_BASE} + +struct aie2_bar_off_pair { + int bar_idx; + u32 offset; +}; + +struct aie2_hw_ops { + int (*set_dpm)(struct amdxdna_dev_hdl *ndev, u32 dpm_level); +}; + +struct amdxdna_dev_priv { + const char *fw_path; + u64 protocol_major; + u64 protocol_minor; + const struct rt_config *rt_config; + const struct dpm_clk_freq *dpm_clk_tbl; + +#define COL_ALIGN_NONE 0 +#define COL_ALIGN_NATURE 1 + u32 col_align; + u32 mbox_dev_addr; + /* If mbox_size is 0, use BAR size. See MBOX_SIZE macro */ + u32 mbox_size; + u32 sram_dev_addr; + struct aie2_bar_off_pair sram_offs[SRAM_MAX_INDEX]; + struct aie2_bar_off_pair psp_regs_off[PSP_MAX_REGS]; + struct aie2_bar_off_pair smu_regs_off[SMU_MAX_REGS]; + struct aie2_hw_ops hw_ops; +}; + +extern const struct amdxdna_dev_ops aie2_ops; + +int aie2_runtime_cfg(struct amdxdna_dev_hdl *ndev, + enum rt_config_category category, u32 *val); + +/* aie2 npu hw config */ +extern const struct dpm_clk_freq npu1_dpm_clk_table[]; +extern const struct dpm_clk_freq npu4_dpm_clk_table[]; +extern const struct rt_config npu1_default_rt_cfg[]; +extern const struct rt_config npu4_default_rt_cfg[]; + +/* aie2_smu.c */ +int aie2_smu_init(struct amdxdna_dev_hdl *ndev); +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev); +int npu1_set_dpm(struct amdxdna_dev_hdl *ndev, u32 dpm_level); +int npu4_set_dpm(struct amdxdna_dev_hdl *ndev, u32 dpm_level); + +/* aie2_pm.c */ +int aie2_pm_init(struct amdxdna_dev_hdl *ndev); +int aie2_pm_set_mode(struct amdxdna_dev_hdl *ndev, enum amdxdna_power_mode_type target); + +/* aie2_psp.c */ +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_config *conf); +int aie2_psp_start(struct psp_device *psp); +void aie2_psp_stop(struct psp_device *psp); + +/* aie2_error.c */ +int aie2_error_async_events_alloc(struct amdxdna_dev_hdl *ndev); +void aie2_error_async_events_free(struct amdxdna_dev_hdl *ndev); +int aie2_error_async_events_send(struct amdxdna_dev_hdl *ndev); +int aie2_error_async_msg_thread(void *data); + +/* aie2_message.c */ +int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev); +int aie2_resume_fw(struct amdxdna_dev_hdl *ndev); +int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value); +int aie2_get_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 *value); +int aie2_assign_mgmt_pasid(struct amdxdna_dev_hdl *ndev, u16 pasid); +int aie2_query_aie_version(struct amdxdna_dev_hdl *ndev, struct aie_version *version); +int aie2_query_aie_metadata(struct amdxdna_dev_hdl *ndev, struct aie_metadata *metadata); +int aie2_query_firmware_version(struct amdxdna_dev_hdl *ndev, + struct amdxdna_fw_ver *fw_ver); +int aie2_create_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx); +int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwctx); +int aie2_map_host_buf(struct amdxdna_dev_hdl *ndev, u32 context_id, u64 addr, u64 size); +int aie2_query_status(struct amdxdna_dev_hdl *ndev, char *buf, u32 size, u32 *cols_filled); +int aie2_register_asyn_event_msg(struct amdxdna_dev_hdl *ndev, dma_addr_t addr, u32 size, + void *handle, int (*cb)(void*, const u32 *, size_t)); +int aie2_config_cu(struct amdxdna_hwctx *hwctx); +int aie2_execbuf(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_single_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_cmdlist_multi_execbuf(struct amdxdna_hwctx *hwctx, + struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); +int aie2_sync_bo(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, + int (*notify_cb)(void *, const u32 *, size_t)); + +/* aie2_hwctx.c */ +int aie2_hwctx_init(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx); +int aie2_hwctx_config(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size); +void aie2_hwctx_suspend(struct amdxdna_hwctx *hwctx); +void aie2_hwctx_resume(struct amdxdna_hwctx *hwctx); +int aie2_cmd_submit(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq); +void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, unsigned long cur_seq); +void aie2_restart_ctx(struct amdxdna_client *client); + +#endif /* _AIE2_PCI_H_ */ diff --git a/drivers/accel/amdxdna/aie2_pm.c b/drivers/accel/amdxdna/aie2_pm.c new file mode 100644 index 0000000000000000000000000000000000000000..426c38fce8482917d15a50afb60bb4852ec5c359 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_pm.c @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +#define AIE2_CLK_GATING_ENABLE 1 +#define AIE2_CLK_GATING_DISABLE 0 + +static int aie2_pm_set_clk_gating(struct amdxdna_dev_hdl *ndev, u32 val) +{ + int ret; + + ret = aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, &val); + if (ret) + return ret; + + ndev->clk_gating = val; + return 0; +} + +int aie2_pm_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + if (ndev->dev_status != AIE2_DEV_UNINIT) { + /* Resume device */ + ret = ndev->priv->hw_ops.set_dpm(ndev, ndev->dpm_level); + if (ret) + return ret; + + ret = aie2_pm_set_clk_gating(ndev, ndev->clk_gating); + if (ret) + return ret; + + return 0; + } + + while (ndev->priv->dpm_clk_tbl[ndev->max_dpm_level].hclk) + ndev->max_dpm_level++; + ndev->max_dpm_level--; + + ret = ndev->priv->hw_ops.set_dpm(ndev, ndev->max_dpm_level); + if (ret) + return ret; + + ret = aie2_pm_set_clk_gating(ndev, AIE2_CLK_GATING_ENABLE); + if (ret) + return ret; + + ndev->pw_mode = POWER_MODE_DEFAULT; + ndev->dft_dpm_level = ndev->max_dpm_level; + + return 0; +} + +int aie2_pm_set_mode(struct amdxdna_dev_hdl *ndev, enum amdxdna_power_mode_type target) +{ + struct amdxdna_dev *xdna = ndev->xdna; + u32 clk_gating, dpm_level; + int ret; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + + if (ndev->pw_mode == target) + return 0; + + switch (target) { + case POWER_MODE_TURBO: + if (ndev->hwctx_num) { + XDNA_ERR(xdna, "Can not set turbo when there is active hwctx"); + return -EINVAL; + } + + clk_gating = AIE2_CLK_GATING_DISABLE; + dpm_level = ndev->max_dpm_level; + break; + case POWER_MODE_HIGH: + clk_gating = AIE2_CLK_GATING_ENABLE; + dpm_level = ndev->max_dpm_level; + break; + case POWER_MODE_DEFAULT: + clk_gating = AIE2_CLK_GATING_ENABLE; + dpm_level = ndev->dft_dpm_level; + break; + default: + return -EOPNOTSUPP; + } + + ret = ndev->priv->hw_ops.set_dpm(ndev, dpm_level); + if (ret) + return ret; + + ret = aie2_pm_set_clk_gating(ndev, clk_gating); + if (ret) + return ret; + + ndev->pw_mode = target; + + return 0; +} diff --git a/drivers/accel/amdxdna/aie2_psp.c b/drivers/accel/amdxdna/aie2_psp.c new file mode 100644 index 0000000000000000000000000000000000000000..dc3a072ce3b6df6d952291fef9620f30a257db5a --- /dev/null +++ b/drivers/accel/amdxdna/aie2_psp.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +#define PSP_STATUS_READY BIT(31) + +/* PSP commands */ +#define PSP_VALIDATE 1 +#define PSP_START 2 +#define PSP_RELEASE_TMR 3 + +/* PSP special arguments */ +#define PSP_START_COPY_FW 1 + +/* PSP response error code */ +#define PSP_ERROR_CANCEL 0xFFFF0002 +#define PSP_ERROR_BAD_STATE 0xFFFF0007 + +#define PSP_FW_ALIGN 0x10000 +#define PSP_POLL_INTERVAL 20000 /* us */ +#define PSP_POLL_TIMEOUT 1000000 /* us */ + +#define PSP_REG(p, reg) ((p)->psp_regs[reg]) + +struct psp_device { + struct drm_device *ddev; + struct psp_config conf; + u32 fw_buf_sz; + u64 fw_paddr; + void *fw_buffer; + void __iomem *psp_regs[PSP_MAX_REGS]; +}; + +static int psp_exec(struct psp_device *psp, u32 *reg_vals) +{ + u32 resp_code; + int ret, i; + u32 ready; + + /* Write command and argument registers */ + for (i = 0; i < PSP_NUM_IN_REGS; i++) + writel(reg_vals[i], PSP_REG(psp, i)); + + /* clear and set PSP INTR register to kick off */ + writel(0, PSP_REG(psp, PSP_INTR_REG)); + writel(1, PSP_REG(psp, PSP_INTR_REG)); + + /* PSP should be busy. Wait for ready, so we know task is done. */ + ret = readx_poll_timeout(readl, PSP_REG(psp, PSP_STATUS_REG), ready, + FIELD_GET(PSP_STATUS_READY, ready), + PSP_POLL_INTERVAL, PSP_POLL_TIMEOUT); + if (ret) { + drm_err(psp->ddev, "PSP is not ready, ret 0x%x", ret); + return ret; + } + + resp_code = readl(PSP_REG(psp, PSP_RESP_REG)); + if (resp_code) { + drm_err(psp->ddev, "fw return error 0x%x", resp_code); + return -EIO; + } + + return 0; +} + +void aie2_psp_stop(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS] = { PSP_RELEASE_TMR, }; + int ret; + + ret = psp_exec(psp, reg_vals); + if (ret) + drm_err(psp->ddev, "release tmr failed, ret %d", ret); +} + +int aie2_psp_start(struct psp_device *psp) +{ + u32 reg_vals[PSP_NUM_IN_REGS]; + int ret; + + reg_vals[0] = PSP_VALIDATE; + reg_vals[1] = lower_32_bits(psp->fw_paddr); + reg_vals[2] = upper_32_bits(psp->fw_paddr); + reg_vals[3] = psp->fw_buf_sz; + + ret = psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to validate fw, ret %d", ret); + return ret; + } + + memset(reg_vals, 0, sizeof(reg_vals)); + reg_vals[0] = PSP_START; + reg_vals[1] = PSP_START_COPY_FW; + ret = psp_exec(psp, reg_vals); + if (ret) { + drm_err(psp->ddev, "failed to start fw, ret %d", ret); + return ret; + } + + return 0; +} + +struct psp_device *aie2m_psp_create(struct drm_device *ddev, struct psp_config *conf) +{ + struct psp_device *psp; + u64 offset; + + psp = drmm_kzalloc(ddev, sizeof(*psp), GFP_KERNEL); + if (!psp) + return NULL; + + psp->ddev = ddev; + memcpy(psp->psp_regs, conf->psp_regs, sizeof(psp->psp_regs)); + + psp->fw_buf_sz = ALIGN(conf->fw_size, PSP_FW_ALIGN) + PSP_FW_ALIGN; + psp->fw_buffer = drmm_kmalloc(ddev, psp->fw_buf_sz, GFP_KERNEL); + if (!psp->fw_buffer) { + drm_err(ddev, "no memory for fw buffer"); + return NULL; + } + + /* + * AMD Platform Security Processor(PSP) requires host physical + * address to load NPU firmware. + */ + psp->fw_paddr = virt_to_phys(psp->fw_buffer); + offset = ALIGN(psp->fw_paddr, PSP_FW_ALIGN) - psp->fw_paddr; + psp->fw_paddr += offset; + memcpy(psp->fw_buffer + offset, conf->fw_buf, conf->fw_size); + + return psp; +} diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c new file mode 100644 index 0000000000000000000000000000000000000000..73388443c676788e17067505cb304ada23b61637 --- /dev/null +++ b/drivers/accel/amdxdna/aie2_smu.c @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_pci_drv.h" + +#define SMU_RESULT_OK 1 + +/* SMU commands */ +#define AIE2_SMU_POWER_ON 0x3 +#define AIE2_SMU_POWER_OFF 0x4 +#define AIE2_SMU_SET_MPNPUCLK_FREQ 0x5 +#define AIE2_SMU_SET_HCLK_FREQ 0x6 +#define AIE2_SMU_SET_SOFT_DPMLEVEL 0x7 +#define AIE2_SMU_SET_HARD_DPMLEVEL 0x8 + +static int aie2_smu_exec(struct amdxdna_dev_hdl *ndev, u32 reg_cmd, + u32 reg_arg, u32 *out) +{ + u32 resp; + int ret; + + writel(0, SMU_REG(ndev, SMU_RESP_REG)); + writel(reg_arg, SMU_REG(ndev, SMU_ARG_REG)); + writel(reg_cmd, SMU_REG(ndev, SMU_CMD_REG)); + + /* Clear and set SMU_INTR_REG to kick off */ + writel(0, SMU_REG(ndev, SMU_INTR_REG)); + writel(1, SMU_REG(ndev, SMU_INTR_REG)); + + ret = readx_poll_timeout(readl, SMU_REG(ndev, SMU_RESP_REG), resp, + resp, AIE2_INTERVAL, AIE2_TIMEOUT); + if (ret) { + XDNA_ERR(ndev->xdna, "smu cmd %d timed out", reg_cmd); + return ret; + } + + if (out) + *out = readl(SMU_REG(ndev, SMU_OUT_REG)); + + if (resp != SMU_RESULT_OK) { + XDNA_ERR(ndev->xdna, "smu cmd %d failed, 0x%x", reg_cmd, resp); + return -EINVAL; + } + + return 0; +} + +int npu1_set_dpm(struct amdxdna_dev_hdl *ndev, u32 dpm_level) +{ + u32 freq; + int ret; + + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_MPNPUCLK_FREQ, + ndev->priv->dpm_clk_tbl[dpm_level].npuclk, &freq); + if (ret) { + XDNA_ERR(ndev->xdna, "Set npu clock to %d failed, ret %d\n", + ndev->priv->dpm_clk_tbl[dpm_level].npuclk, ret); + } + ndev->npuclk_freq = freq; + + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_HCLK_FREQ, + ndev->priv->dpm_clk_tbl[dpm_level].hclk, &freq); + if (ret) { + XDNA_ERR(ndev->xdna, "Set h clock to %d failed, ret %d\n", + ndev->priv->dpm_clk_tbl[dpm_level].hclk, ret); + } + ndev->hclk_freq = freq; + ndev->dpm_level = dpm_level; + + XDNA_DBG(ndev->xdna, "MP-NPU clock %d, H clock %d\n", + ndev->npuclk_freq, ndev->hclk_freq); + + return 0; +} + +int npu4_set_dpm(struct amdxdna_dev_hdl *ndev, u32 dpm_level) +{ + int ret; + + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_HARD_DPMLEVEL, dpm_level, NULL); + if (ret) { + XDNA_ERR(ndev->xdna, "Set hard dpm level %d failed, ret %d ", + dpm_level, ret); + return ret; + } + + ret = aie2_smu_exec(ndev, AIE2_SMU_SET_SOFT_DPMLEVEL, dpm_level, NULL); + if (ret) { + XDNA_ERR(ndev->xdna, "Set soft dpm level %d failed, ret %d", + dpm_level, ret); + return ret; + } + + ndev->npuclk_freq = ndev->priv->dpm_clk_tbl[dpm_level].npuclk; + ndev->hclk_freq = ndev->priv->dpm_clk_tbl[dpm_level].hclk; + ndev->dpm_level = dpm_level; + + XDNA_DBG(ndev->xdna, "MP-NPU clock %d, H clock %d\n", + ndev->npuclk_freq, ndev->hclk_freq); + + return 0; +} + +int aie2_smu_init(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL); + if (ret) { + XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret); + return ret; + } + + return 0; +} + +void aie2_smu_fini(struct amdxdna_dev_hdl *ndev) +{ + int ret; + + ndev->priv->hw_ops.set_dpm(ndev, 0); + ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL); + if (ret) + XDNA_ERR(ndev->xdna, "Power off failed, ret %d", ret); +} diff --git a/drivers/accel/amdxdna/aie2_solver.c b/drivers/accel/amdxdna/aie2_solver.c new file mode 100644 index 0000000000000000000000000000000000000000..2013d1f13aae30b88effe0c1d18ba0be15a486ec --- /dev/null +++ b/drivers/accel/amdxdna/aie2_solver.c @@ -0,0 +1,380 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include + +#include "aie2_solver.h" + +struct partition_node { + struct list_head list; + u32 nshared; /* # shared requests */ + u32 start_col; /* start column */ + u32 ncols; /* # columns */ + bool exclusive; /* can not be shared if set */ +}; + +struct solver_node { + struct list_head list; + u64 rid; /* Request ID from consumer */ + + struct partition_node *pt_node; + void *cb_arg; + u32 dpm_level; + u32 cols_len; + u32 start_cols[] __counted_by(cols_len); +}; + +struct solver_rgroup { + u32 rgid; + u32 nnode; + u32 npartition_node; + + DECLARE_BITMAP(resbit, XRS_MAX_COL); + struct list_head node_list; + struct list_head pt_node_list; +}; + +struct solver_state { + struct solver_rgroup rgp; + struct init_config cfg; + struct xrs_action_ops *actions; +}; + +static u32 calculate_gops(struct aie_qos *rqos) +{ + u32 service_rate = 0; + + if (rqos->latency) + service_rate = (1000 / rqos->latency); + + if (rqos->fps > service_rate) + return rqos->fps * rqos->gops; + + return service_rate * rqos->gops; +} + +/* + * qos_meet() - Check the QOS request can be met. + */ +static int qos_meet(struct solver_state *xrs, struct aie_qos *rqos, u32 cgops) +{ + u32 request_gops = calculate_gops(rqos) * xrs->cfg.sys_eff_factor; + + if (request_gops <= cgops) + return 0; + + return -EINVAL; +} + +/* + * sanity_check() - Do a basic sanity check on allocation request. + */ +static int sanity_check(struct solver_state *xrs, struct alloc_requests *req) +{ + struct cdo_parts *cdop = &req->cdo; + struct aie_qos *rqos = &req->rqos; + u32 cu_clk_freq; + + if (cdop->ncols > xrs->cfg.total_col) + return -EINVAL; + + /* + * We can find at least one CDOs groups that meet the + * GOPs requirement. + */ + cu_clk_freq = xrs->cfg.clk_list.cu_clk_list[xrs->cfg.clk_list.num_levels - 1]; + + if (qos_meet(xrs, rqos, cdop->qos_cap.opc * cu_clk_freq / 1000)) + return -EINVAL; + + return 0; +} + +static bool is_valid_qos_dpm_params(struct aie_qos *rqos) +{ + /* + * gops is retrieved from the xmodel, so it's always set + * fps and latency are the configurable params from the application + */ + if (rqos->gops > 0 && (rqos->fps > 0 || rqos->latency > 0)) + return true; + + return false; +} + +static int set_dpm_level(struct solver_state *xrs, struct alloc_requests *req, u32 *dpm_level) +{ + struct solver_rgroup *rgp = &xrs->rgp; + struct cdo_parts *cdop = &req->cdo; + struct aie_qos *rqos = &req->rqos; + u32 freq, max_dpm_level, level; + struct solver_node *node; + + max_dpm_level = xrs->cfg.clk_list.num_levels - 1; + /* If no QoS parameters are passed, set it to the max DPM level */ + if (!is_valid_qos_dpm_params(rqos)) { + level = max_dpm_level; + goto set_dpm; + } + + /* Find one CDO group that meet the GOPs requirement. */ + for (level = 0; level < max_dpm_level; level++) { + freq = xrs->cfg.clk_list.cu_clk_list[level]; + if (!qos_meet(xrs, rqos, cdop->qos_cap.opc * freq / 1000)) + break; + } + + /* set the dpm level which fits all the sessions */ + list_for_each_entry(node, &rgp->node_list, list) { + if (node->dpm_level > level) + level = node->dpm_level; + } + +set_dpm: + *dpm_level = level; + return xrs->cfg.actions->set_dft_dpm_level(xrs->cfg.ddev, level); +} + +static struct solver_node *rg_search_node(struct solver_rgroup *rgp, u64 rid) +{ + struct solver_node *node; + + list_for_each_entry(node, &rgp->node_list, list) { + if (node->rid == rid) + return node; + } + + return NULL; +} + +static void remove_partition_node(struct solver_rgroup *rgp, + struct partition_node *pt_node) +{ + pt_node->nshared--; + if (pt_node->nshared > 0) + return; + + list_del(&pt_node->list); + rgp->npartition_node--; + + bitmap_clear(rgp->resbit, pt_node->start_col, pt_node->ncols); + kfree(pt_node); +} + +static void remove_solver_node(struct solver_rgroup *rgp, + struct solver_node *node) +{ + list_del(&node->list); + rgp->nnode--; + + if (node->pt_node) + remove_partition_node(rgp, node->pt_node); + + kfree(node); +} + +static int get_free_partition(struct solver_state *xrs, + struct solver_node *snode, + struct alloc_requests *req) +{ + struct partition_node *pt_node; + u32 ncols = req->cdo.ncols; + u32 col, i; + + for (i = 0; i < snode->cols_len; i++) { + col = snode->start_cols[i]; + if (find_next_bit(xrs->rgp.resbit, XRS_MAX_COL, col) >= col + ncols) + break; + } + + if (i == snode->cols_len) + return -ENODEV; + + pt_node = kzalloc(sizeof(*pt_node), GFP_KERNEL); + if (!pt_node) + return -ENOMEM; + + pt_node->nshared = 1; + pt_node->start_col = col; + pt_node->ncols = ncols; + + /* + * Always set exclusive to false for now. + */ + pt_node->exclusive = false; + + list_add_tail(&pt_node->list, &xrs->rgp.pt_node_list); + xrs->rgp.npartition_node++; + bitmap_set(xrs->rgp.resbit, pt_node->start_col, pt_node->ncols); + + snode->pt_node = pt_node; + + return 0; +} + +static int allocate_partition(struct solver_state *xrs, + struct solver_node *snode, + struct alloc_requests *req) +{ + struct partition_node *pt_node, *rpt_node = NULL; + int idx, ret; + + ret = get_free_partition(xrs, snode, req); + if (!ret) + return ret; + + /* try to get a share-able partition */ + list_for_each_entry(pt_node, &xrs->rgp.pt_node_list, list) { + if (pt_node->exclusive) + continue; + + if (rpt_node && pt_node->nshared >= rpt_node->nshared) + continue; + + for (idx = 0; idx < snode->cols_len; idx++) { + if (snode->start_cols[idx] != pt_node->start_col) + continue; + + if (req->cdo.ncols != pt_node->ncols) + continue; + + rpt_node = pt_node; + break; + } + } + + if (!rpt_node) + return -ENODEV; + + rpt_node->nshared++; + snode->pt_node = rpt_node; + + return 0; +} + +static struct solver_node *create_solver_node(struct solver_state *xrs, + struct alloc_requests *req) +{ + struct cdo_parts *cdop = &req->cdo; + struct solver_node *node; + int ret; + + node = kzalloc(struct_size(node, start_cols, cdop->cols_len), GFP_KERNEL); + if (!node) + return ERR_PTR(-ENOMEM); + + node->rid = req->rid; + node->cols_len = cdop->cols_len; + memcpy(node->start_cols, cdop->start_cols, cdop->cols_len * sizeof(u32)); + + ret = allocate_partition(xrs, node, req); + if (ret) + goto free_node; + + list_add_tail(&node->list, &xrs->rgp.node_list); + xrs->rgp.nnode++; + return node; + +free_node: + kfree(node); + return ERR_PTR(ret); +} + +static void fill_load_action(struct solver_state *xrs, + struct solver_node *snode, + struct xrs_action_load *action) +{ + action->rid = snode->rid; + action->part.start_col = snode->pt_node->start_col; + action->part.ncols = snode->pt_node->ncols; +} + +int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg) +{ + struct xrs_action_load load_act; + struct solver_node *snode; + struct solver_state *xrs; + u32 dpm_level; + int ret; + + xrs = (struct solver_state *)hdl; + + ret = sanity_check(xrs, req); + if (ret) { + drm_err(xrs->cfg.ddev, "invalid request"); + return ret; + } + + if (rg_search_node(&xrs->rgp, req->rid)) { + drm_err(xrs->cfg.ddev, "rid %lld is in-use", req->rid); + return -EEXIST; + } + + snode = create_solver_node(xrs, req); + if (IS_ERR(snode)) + return PTR_ERR(snode); + + fill_load_action(xrs, snode, &load_act); + ret = xrs->cfg.actions->load(cb_arg, &load_act); + if (ret) + goto free_node; + + ret = set_dpm_level(xrs, req, &dpm_level); + if (ret) + goto free_node; + + snode->dpm_level = dpm_level; + snode->cb_arg = cb_arg; + + drm_dbg(xrs->cfg.ddev, "start col %d ncols %d\n", + snode->pt_node->start_col, snode->pt_node->ncols); + + return 0; + +free_node: + remove_solver_node(&xrs->rgp, snode); + + return ret; +} + +int xrs_release_resource(void *hdl, u64 rid) +{ + struct solver_state *xrs = hdl; + struct solver_node *node; + + node = rg_search_node(&xrs->rgp, rid); + if (!node) { + drm_err(xrs->cfg.ddev, "node not exist"); + return -ENODEV; + } + + xrs->cfg.actions->unload(node->cb_arg); + remove_solver_node(&xrs->rgp, node); + + return 0; +} + +void *xrsm_init(struct init_config *cfg) +{ + struct solver_rgroup *rgp; + struct solver_state *xrs; + + xrs = drmm_kzalloc(cfg->ddev, sizeof(*xrs), GFP_KERNEL); + if (!xrs) + return NULL; + + memcpy(&xrs->cfg, cfg, sizeof(*cfg)); + + rgp = &xrs->rgp; + INIT_LIST_HEAD(&rgp->node_list); + INIT_LIST_HEAD(&rgp->pt_node_list); + + return xrs; +} diff --git a/drivers/accel/amdxdna/aie2_solver.h b/drivers/accel/amdxdna/aie2_solver.h new file mode 100644 index 0000000000000000000000000000000000000000..a2e3c52229e9989b7509d78adc82986cc751766e --- /dev/null +++ b/drivers/accel/amdxdna/aie2_solver.h @@ -0,0 +1,155 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_SOLVER_H +#define _AIE2_SOLVER_H + +#define XRS_MAX_COL 128 + +/* + * Structure used to describe a partition. A partition is column based + * allocation unit described by its start column and number of columns. + */ +struct aie_part { + u32 start_col; + u32 ncols; +}; + +/* + * The QoS capabilities of a given AIE partition. + */ +struct aie_qos_cap { + u32 opc; /* operations per cycle */ + u32 dma_bw; /* DMA bandwidth */ +}; + +/* + * QoS requirement of a resource allocation. + */ +struct aie_qos { + u32 gops; /* Giga operations */ + u32 fps; /* Frames per second */ + u32 dma_bw; /* DMA bandwidth */ + u32 latency; /* Frame response latency */ + u32 exec_time; /* Frame execution time */ + u32 priority; /* Request priority */ +}; + +/* + * Structure used to describe a relocatable CDO (Configuration Data Object). + */ +struct cdo_parts { + u32 *start_cols; /* Start column array */ + u32 cols_len; /* Length of start column array */ + u32 ncols; /* # of column */ + struct aie_qos_cap qos_cap; /* CDO QoS capabilities */ +}; + +/* + * Structure used to describe a request to allocate. + */ +struct alloc_requests { + u64 rid; + struct cdo_parts cdo; + struct aie_qos rqos; /* Requested QoS */ +}; + +/* + * Load callback argument + */ +struct xrs_action_load { + u32 rid; + struct aie_part part; +}; + +/* + * Define the power level available + * + * POWER_LEVEL_MIN: + * Lowest power level. Usually set when all actions are unloaded. + * + * POWER_LEVEL_n + * Power levels 0 - n, is a step increase in system frequencies + */ +enum power_level { + POWER_LEVEL_MIN = 0x0, + POWER_LEVEL_0 = 0x1, + POWER_LEVEL_1 = 0x2, + POWER_LEVEL_2 = 0x3, + POWER_LEVEL_3 = 0x4, + POWER_LEVEL_4 = 0x5, + POWER_LEVEL_5 = 0x6, + POWER_LEVEL_6 = 0x7, + POWER_LEVEL_7 = 0x8, + POWER_LEVEL_NUM, +}; + +/* + * Structure used to describe the frequency table. + * Resource solver chooses the frequency from the table + * to meet the QOS requirements. + */ +struct clk_list_info { + u32 num_levels; /* available power levels */ + u32 cu_clk_list[POWER_LEVEL_NUM]; /* available aie clock frequencies in Mhz*/ +}; + +struct xrs_action_ops { + int (*load)(void *cb_arg, struct xrs_action_load *action); + int (*unload)(void *cb_arg); + int (*set_dft_dpm_level)(struct drm_device *ddev, u32 level); +}; + +/* + * Structure used to describe information for solver during initialization. + */ +struct init_config { + u32 total_col; + u32 sys_eff_factor; /* system efficiency factor */ + u32 latency_adj; /* latency adjustment in ms */ + struct clk_list_info clk_list; /* List of frequencies available in system */ + struct drm_device *ddev; + struct xrs_action_ops *actions; +}; + +/* + * xrsm_init() - Register resource solver. Resource solver client needs + * to call this function to register itself. + * + * @cfg: The system metrics for resource solver to use + * + * Return: A resource solver handle + * + * Note: We should only create one handle per AIE array to be managed. + */ +void *xrsm_init(struct init_config *cfg); + +/* + * xrs_allocate_resource() - Request to allocate resources for a given context + * and a partition metadata. (See struct part_meta) + * + * @hdl: Resource solver handle obtained from xrs_init() + * @req: Input to the Resource solver including request id + * and partition metadata. + * @cb_arg: callback argument pointer + * + * Return: 0 when successful. + * Or standard error number when failing + * + * Note: + * There is no lock mechanism inside resource solver. So it is + * the caller's responsibility to lock down XCLBINs and grab + * necessary lock. + */ +int xrs_allocate_resource(void *hdl, struct alloc_requests *req, void *cb_arg); + +/* + * xrs_release_resource() - Request to free resources for a given context. + * + * @hdl: Resource solver handle obtained from xrs_init() + * @rid: The Request ID to identify the requesting context + */ +int xrs_release_resource(void *hdl, u64 rid); +#endif /* _AIE2_SOLVER_H */ diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c new file mode 100644 index 0000000000000000000000000000000000000000..d11b1c83d9c3bb33fa4c2201783c8fdec4b12bec --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.c @@ -0,0 +1,550 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define MAX_HWCTX_ID 255 +#define MAX_ARG_COUNT 4095 + +struct amdxdna_fence { + struct dma_fence base; + spinlock_t lock; /* for base */ + struct amdxdna_hwctx *hwctx; +}; + +static const char *amdxdna_fence_get_driver_name(struct dma_fence *fence) +{ + return KBUILD_MODNAME; +} + +static const char *amdxdna_fence_get_timeline_name(struct dma_fence *fence) +{ + struct amdxdna_fence *xdna_fence; + + xdna_fence = container_of(fence, struct amdxdna_fence, base); + + return xdna_fence->hwctx->name; +} + +static const struct dma_fence_ops fence_ops = { + .get_driver_name = amdxdna_fence_get_driver_name, + .get_timeline_name = amdxdna_fence_get_timeline_name, +}; + +static struct dma_fence *amdxdna_fence_create(struct amdxdna_hwctx *hwctx) +{ + struct amdxdna_fence *fence; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return NULL; + + fence->hwctx = hwctx; + spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &fence_ops, &fence->lock, hwctx->id, 0); + return &fence->base; +} + +void amdxdna_hwctx_suspend(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + amdxdna_for_each_hwctx(client, hwctx_id, hwctx) + xdna->dev_info->ops->hwctx_suspend(hwctx); + mutex_unlock(&client->hwctx_lock); +} + +void amdxdna_hwctx_resume(struct amdxdna_client *client) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + + drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock)); + mutex_lock(&client->hwctx_lock); + amdxdna_for_each_hwctx(client, hwctx_id, hwctx) + xdna->dev_info->ops->hwctx_resume(hwctx); + mutex_unlock(&client->hwctx_lock); +} + +static void amdxdna_hwctx_destroy_rcu(struct amdxdna_hwctx *hwctx, + struct srcu_struct *ss) +{ + struct amdxdna_dev *xdna = hwctx->client->xdna; + + synchronize_srcu(ss); + + /* At this point, user is not able to submit new commands */ + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->hwctx_fini(hwctx); + mutex_unlock(&xdna->dev_lock); + + kfree(hwctx->name); + kfree(hwctx); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, count; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + num_masks = 0; + else + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + + if (size) { + count = FIELD_GET(AMDXDNA_CMD_COUNT, cmd->header); + if (unlikely(count <= num_masks)) { + *size = 0; + return NULL; + } + *size = (count - num_masks) * sizeof(u32); + } + return &cmd->data[num_masks]; +} + +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + u32 num_masks, i; + u32 *cu_mask; + + if (amdxdna_cmd_get_op(abo) == ERT_CMD_CHAIN) + return -1; + + num_masks = 1 + FIELD_GET(AMDXDNA_CMD_EXTRA_CU_MASK, cmd->header); + cu_mask = cmd->data; + for (i = 0; i < num_masks; i++) { + if (cu_mask[i]) + return ffs(cu_mask[i]) - 1; + } + + return -1; +} + +/* + * This should be called in close() and remove(). DO NOT call in other syscalls. + * This guarantee that when hwctx and resources will be released, if user + * doesn't call amdxdna_drm_destroy_hwctx_ioctl. + */ +void amdxdna_hwctx_remove_all(struct amdxdna_client *client) +{ + struct amdxdna_hwctx *hwctx; + unsigned long hwctx_id; + + mutex_lock(&client->hwctx_lock); + amdxdna_for_each_hwctx(client, hwctx_id, hwctx) { + XDNA_DBG(client->xdna, "PID %d close HW context %d", + client->pid, hwctx->id); + xa_erase(&client->hwctx_xa, hwctx->id); + mutex_unlock(&client->hwctx_lock); + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); + mutex_lock(&client->hwctx_lock); + } + mutex_unlock(&client->hwctx_lock); +} + +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_create_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret, idx; + + if (args->ext || args->ext_flags) + return -EINVAL; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + hwctx = kzalloc(sizeof(*hwctx), GFP_KERNEL); + if (!hwctx) { + ret = -ENOMEM; + goto exit; + } + + if (copy_from_user(&hwctx->qos, u64_to_user_ptr(args->qos_p), sizeof(hwctx->qos))) { + XDNA_ERR(xdna, "Access QoS info failed"); + ret = -EFAULT; + goto free_hwctx; + } + + hwctx->client = client; + hwctx->fw_ctx_id = -1; + hwctx->num_tiles = args->num_tiles; + hwctx->mem_size = args->mem_size; + hwctx->max_opc = args->max_opc; + ret = xa_alloc_cyclic(&client->hwctx_xa, &hwctx->id, hwctx, + XA_LIMIT(AMDXDNA_INVALID_CTX_HANDLE + 1, MAX_HWCTX_ID), + &client->next_hwctxid, GFP_KERNEL); + if (ret < 0) { + XDNA_ERR(xdna, "Allocate hwctx ID failed, ret %d", ret); + goto free_hwctx; + } + + hwctx->name = kasprintf(GFP_KERNEL, "hwctx.%d.%d", client->pid, hwctx->id); + if (!hwctx->name) { + ret = -ENOMEM; + goto rm_id; + } + + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->hwctx_init(hwctx); + if (ret) { + mutex_unlock(&xdna->dev_lock); + XDNA_ERR(xdna, "Init hwctx failed, ret %d", ret); + goto free_name; + } + args->handle = hwctx->id; + args->syncobj_handle = hwctx->syncobj_hdl; + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "PID %d create HW context %d, ret %d", client->pid, args->handle, ret); + drm_dev_exit(idx); + return 0; + +free_name: + kfree(hwctx->name); +rm_id: + xa_erase(&client->hwctx_xa, hwctx->id); +free_hwctx: + kfree(hwctx); +exit: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_destroy_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret = 0, idx; + + if (XDNA_MBZ_DBG(xdna, &args->pad, sizeof(args->pad))) + return -EINVAL; + + if (!drm_dev_enter(dev, &idx)) + return -ENODEV; + + hwctx = xa_erase(&client->hwctx_xa, args->handle); + if (!hwctx) { + ret = -EINVAL; + XDNA_DBG(xdna, "PID %d HW context %d not exist", + client->pid, args->handle); + goto out; + } + + /* + * The pushed jobs are handled by DRM scheduler during destroy. + * SRCU to synchronize with exec command ioctls. + */ + amdxdna_hwctx_destroy_rcu(hwctx, &client->hwctx_srcu); + + XDNA_DBG(xdna, "PID %d destroyed HW context %d", client->pid, args->handle); +out: + drm_dev_exit(idx); + return ret; +} + +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_config_hwctx *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_hwctx *hwctx; + int ret, idx; + u32 buf_size; + void *buf; + u64 val; + + if (XDNA_MBZ_DBG(xdna, &args->pad, sizeof(args->pad))) + return -EINVAL; + + if (!xdna->dev_info->ops->hwctx_config) + return -EOPNOTSUPP; + + val = args->param_val; + buf_size = args->param_val_size; + + switch (args->param_type) { + case DRM_AMDXDNA_HWCTX_CONFIG_CU: + /* For those types that param_val is pointer */ + if (buf_size > PAGE_SIZE) { + XDNA_ERR(xdna, "Config CU param buffer too large"); + return -E2BIG; + } + + /* Hwctx needs to keep buf */ + buf = kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + if (copy_from_user(buf, u64_to_user_ptr(val), buf_size)) { + kfree(buf); + return -EFAULT; + } + + break; + case DRM_AMDXDNA_HWCTX_ASSIGN_DBG_BUF: + case DRM_AMDXDNA_HWCTX_REMOVE_DBG_BUF: + /* For those types that param_val is a value */ + buf = NULL; + buf_size = 0; + break; + default: + XDNA_DBG(xdna, "Unknown HW context config type %d", args->param_type); + return -EINVAL; + } + + mutex_lock(&xdna->dev_lock); + idx = srcu_read_lock(&client->hwctx_srcu); + hwctx = xa_load(&client->hwctx_xa, args->handle); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", client->pid, args->handle); + ret = -EINVAL; + goto unlock_srcu; + } + + ret = xdna->dev_info->ops->hwctx_config(hwctx, args->param_type, val, buf, buf_size); + +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + mutex_unlock(&xdna->dev_lock); + kfree(buf); + return ret; +} + +static void +amdxdna_arg_bos_put(struct amdxdna_sched_job *job) +{ + int i; + + for (i = 0; i < job->bo_cnt; i++) { + if (!job->bos[i]) + break; + drm_gem_object_put(job->bos[i]); + } +} + +static int +amdxdna_arg_bos_lookup(struct amdxdna_client *client, + struct amdxdna_sched_job *job, + u32 *bo_hdls, u32 bo_cnt) +{ + struct drm_gem_object *gobj; + int i, ret; + + job->bo_cnt = bo_cnt; + for (i = 0; i < job->bo_cnt; i++) { + struct amdxdna_gem_obj *abo; + + gobj = drm_gem_object_lookup(client->filp, bo_hdls[i]); + if (!gobj) { + ret = -ENOENT; + goto put_shmem_bo; + } + abo = to_xdna_obj(gobj); + + mutex_lock(&abo->lock); + if (abo->pinned) { + mutex_unlock(&abo->lock); + job->bos[i] = gobj; + continue; + } + + ret = amdxdna_gem_pin_nolock(abo); + if (ret) { + mutex_unlock(&abo->lock); + drm_gem_object_put(gobj); + goto put_shmem_bo; + } + abo->pinned = true; + mutex_unlock(&abo->lock); + + job->bos[i] = gobj; + } + + return 0; + +put_shmem_bo: + amdxdna_arg_bos_put(job); + return ret; +} + +void amdxdna_sched_job_cleanup(struct amdxdna_sched_job *job) +{ + trace_amdxdna_debug_point(job->hwctx->name, job->seq, "job release"); + amdxdna_arg_bos_put(job); + amdxdna_gem_put_obj(job->cmd_bo); +} + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdl, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_sched_job *job; + struct amdxdna_hwctx *hwctx; + int ret, idx; + + XDNA_DBG(xdna, "Command BO hdl %d, Arg BO count %d", cmd_bo_hdl, arg_bo_cnt); + job = kzalloc(struct_size(job, bos, arg_bo_cnt), GFP_KERNEL); + if (!job) + return -ENOMEM; + + if (cmd_bo_hdl != AMDXDNA_INVALID_BO_HANDLE) { + job->cmd_bo = amdxdna_gem_get_obj(client, cmd_bo_hdl, AMDXDNA_BO_CMD); + if (!job->cmd_bo) { + XDNA_ERR(xdna, "Failed to get cmd bo from %d", cmd_bo_hdl); + ret = -EINVAL; + goto free_job; + } + } else { + job->cmd_bo = NULL; + } + + ret = amdxdna_arg_bos_lookup(client, job, arg_bo_hdls, arg_bo_cnt); + if (ret) { + XDNA_ERR(xdna, "Argument BOs lookup failed, ret %d", ret); + goto cmd_put; + } + + idx = srcu_read_lock(&client->hwctx_srcu); + hwctx = xa_load(&client->hwctx_xa, hwctx_hdl); + if (!hwctx) { + XDNA_DBG(xdna, "PID %d failed to get hwctx %d", + client->pid, hwctx_hdl); + ret = -EINVAL; + goto unlock_srcu; + } + + if (hwctx->status != HWCTX_STAT_READY) { + XDNA_ERR(xdna, "HW Context is not ready"); + ret = -EINVAL; + goto unlock_srcu; + } + + job->hwctx = hwctx; + job->mm = current->mm; + + job->fence = amdxdna_fence_create(hwctx); + if (!job->fence) { + XDNA_ERR(xdna, "Failed to create fence"); + ret = -ENOMEM; + goto unlock_srcu; + } + kref_init(&job->refcnt); + + ret = xdna->dev_info->ops->cmd_submit(hwctx, job, seq); + if (ret) + goto put_fence; + + /* + * The amdxdna_hwctx_destroy_rcu() will release hwctx and associated + * resource after synchronize_srcu(). The submitted jobs should be + * handled by the queue, for example DRM scheduler, in device layer. + * For here we can unlock SRCU. + */ + srcu_read_unlock(&client->hwctx_srcu, idx); + trace_amdxdna_debug_point(hwctx->name, *seq, "job pushed"); + + return 0; + +put_fence: + dma_fence_put(job->fence); +unlock_srcu: + srcu_read_unlock(&client->hwctx_srcu, idx); + amdxdna_arg_bos_put(job); +cmd_put: + amdxdna_gem_put_obj(job->cmd_bo); +free_job: + kfree(job); + return ret; +} + +/* + * The submit command ioctl submits a command to firmware. One firmware command + * may contain multiple command BOs for processing as a whole. + * The command sequence number is returned which can be used for wait command ioctl. + */ +static int amdxdna_drm_submit_execbuf(struct amdxdna_client *client, + struct amdxdna_drm_exec_cmd *args) +{ + struct amdxdna_dev *xdna = client->xdna; + u32 *arg_bo_hdls; + u32 cmd_bo_hdl; + int ret; + + if (!args->arg_count || args->arg_count > MAX_ARG_COUNT) { + XDNA_ERR(xdna, "Invalid arg bo count %d", args->arg_count); + return -EINVAL; + } + + /* Only support single command for now. */ + if (args->cmd_count != 1) { + XDNA_ERR(xdna, "Invalid cmd bo count %d", args->cmd_count); + return -EINVAL; + } + + cmd_bo_hdl = (u32)args->cmd_handles; + arg_bo_hdls = kcalloc(args->arg_count, sizeof(u32), GFP_KERNEL); + if (!arg_bo_hdls) + return -ENOMEM; + ret = copy_from_user(arg_bo_hdls, u64_to_user_ptr(args->args), + args->arg_count * sizeof(u32)); + if (ret) { + ret = -EFAULT; + goto free_cmd_bo_hdls; + } + + ret = amdxdna_cmd_submit(client, cmd_bo_hdl, arg_bo_hdls, + args->arg_count, args->hwctx, &args->seq); + if (ret) + XDNA_DBG(xdna, "Submit cmds failed, ret %d", ret); + +free_cmd_bo_hdls: + kfree(arg_bo_hdls); + if (!ret) + XDNA_DBG(xdna, "Pushed cmd %lld to scheduler", args->seq); + return ret; +} + +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_drm_exec_cmd *args = data; + + if (args->ext || args->ext_flags) + return -EINVAL; + + switch (args->type) { + case AMDXDNA_CMD_SUBMIT_EXEC_BUF: + return amdxdna_drm_submit_execbuf(client, args); + } + + XDNA_ERR(client->xdna, "Invalid command type %d", args->type); + return -EINVAL; +} diff --git a/drivers/accel/amdxdna/amdxdna_ctx.h b/drivers/accel/amdxdna/amdxdna_ctx.h new file mode 100644 index 0000000000000000000000000000000000000000..80b0304193ec3f695ec03e13b7db7f8d30cd18a9 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_ctx.h @@ -0,0 +1,162 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_CTX_H_ +#define _AMDXDNA_CTX_H_ + +#include + +#include "amdxdna_gem.h" + +struct amdxdna_hwctx_priv; + +enum ert_cmd_opcode { + ERT_START_CU = 0, + ERT_CMD_CHAIN = 19, + ERT_START_NPU = 20, +}; + +enum ert_cmd_state { + ERT_CMD_STATE_INVALID, + ERT_CMD_STATE_NEW, + ERT_CMD_STATE_QUEUED, + ERT_CMD_STATE_RUNNING, + ERT_CMD_STATE_COMPLETED, + ERT_CMD_STATE_ERROR, + ERT_CMD_STATE_ABORT, + ERT_CMD_STATE_SUBMITTED, + ERT_CMD_STATE_TIMEOUT, + ERT_CMD_STATE_NORESPONSE, +}; + +/* + * Interpretation of the beginning of data payload for ERT_START_NPU in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is regular kernel args. + */ +struct amdxdna_cmd_start_npu { + u64 buffer; /* instruction buffer address */ + u32 buffer_size; /* size of buffer in bytes */ + u32 prop_count; /* properties count */ + u32 prop_args[]; /* properties and regular kernel arguments */ +}; + +/* + * Interpretation of the beginning of data payload for ERT_CMD_CHAIN in + * amdxdna_cmd. The rest of the payload in amdxdna_cmd is cmd BO handles. + */ +struct amdxdna_cmd_chain { + u32 command_count; + u32 submit_index; + u32 error_index; + u32 reserved[3]; + u64 data[] __counted_by(command_count); +}; + +/* Exec buffer command header format */ +#define AMDXDNA_CMD_STATE GENMASK(3, 0) +#define AMDXDNA_CMD_EXTRA_CU_MASK GENMASK(11, 10) +#define AMDXDNA_CMD_COUNT GENMASK(22, 12) +#define AMDXDNA_CMD_OPCODE GENMASK(27, 23) +struct amdxdna_cmd { + u32 header; + u32 data[]; +}; + +struct amdxdna_hwctx { + struct amdxdna_client *client; + struct amdxdna_hwctx_priv *priv; + char *name; + + u32 id; + u32 max_opc; + u32 num_tiles; + u32 mem_size; + u32 fw_ctx_id; + u32 col_list_len; + u32 *col_list; + u32 start_col; + u32 num_col; +#define HWCTX_STAT_INIT 0 +#define HWCTX_STAT_READY 1 +#define HWCTX_STAT_STOP 2 + u32 status; + u32 old_status; + + struct amdxdna_qos_info qos; + struct amdxdna_hwctx_param_config_cu *cus; + u32 syncobj_hdl; +}; + +#define drm_job_to_xdna_job(j) \ + container_of(j, struct amdxdna_sched_job, base) + +struct amdxdna_sched_job { + struct drm_sched_job base; + struct kref refcnt; + struct amdxdna_hwctx *hwctx; + struct mm_struct *mm; + /* The fence to notice DRM scheduler that job is done by hardware */ + struct dma_fence *fence; + /* user can wait on this fence */ + struct dma_fence *out_fence; + bool job_done; + u64 seq; + struct amdxdna_gem_obj *cmd_bo; + size_t bo_cnt; + struct drm_gem_object *bos[] __counted_by(bo_cnt); +}; + +static inline u32 +amdxdna_cmd_get_op(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header); +} + +static inline void +amdxdna_cmd_set_state(struct amdxdna_gem_obj *abo, enum ert_cmd_state s) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + cmd->header &= ~AMDXDNA_CMD_STATE; + cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s); +} + +static inline enum ert_cmd_state +amdxdna_cmd_get_state(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_cmd *cmd = abo->mem.kva; + + return FIELD_GET(AMDXDNA_CMD_STATE, cmd->header); +} + +void *amdxdna_cmd_get_payload(struct amdxdna_gem_obj *abo, u32 *size); +int amdxdna_cmd_get_cu_idx(struct amdxdna_gem_obj *abo); + +static inline u32 amdxdna_hwctx_col_map(struct amdxdna_hwctx *hwctx) +{ + return GENMASK(hwctx->start_col + hwctx->num_col - 1, + hwctx->start_col); +} + +void amdxdna_sched_job_cleanup(struct amdxdna_sched_job *job); +void amdxdna_hwctx_remove_all(struct amdxdna_client *client); +void amdxdna_hwctx_suspend(struct amdxdna_client *client); +void amdxdna_hwctx_resume(struct amdxdna_client *client); + +int amdxdna_cmd_submit(struct amdxdna_client *client, + u32 cmd_bo_hdls, u32 *arg_bo_hdls, u32 arg_bo_cnt, + u32 hwctx_hdl, u64 *seq); + +int amdxdna_cmd_wait(struct amdxdna_client *client, u32 hwctx_hdl, + u64 seq, u32 timeout); + +int amdxdna_drm_create_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_config_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_destroy_hwctx_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_submit_cmd_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); + +#endif /* _AMDXDNA_CTX_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_gem.c b/drivers/accel/amdxdna/amdxdna_gem.c new file mode 100644 index 0000000000000000000000000000000000000000..606433d73236cd7a21e6b06bfebfca5bb801768e --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_gem.c @@ -0,0 +1,622 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define XDNA_MAX_CMD_BO_SIZE SZ_32K + +static int +amdxdna_gem_insert_node_locked(struct amdxdna_gem_obj *abo, bool use_vmap) +{ + struct amdxdna_client *client = abo->client; + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_mem *mem = &abo->mem; + u64 offset; + u32 align; + int ret; + + align = 1 << max(PAGE_SHIFT, xdna->dev_info->dev_mem_buf_shift); + ret = drm_mm_insert_node_generic(&abo->dev_heap->mm, &abo->mm_node, + mem->size, align, + 0, DRM_MM_INSERT_BEST); + if (ret) { + XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret); + return ret; + } + + mem->dev_addr = abo->mm_node.start; + offset = mem->dev_addr - abo->dev_heap->mem.dev_addr; + mem->userptr = abo->dev_heap->mem.userptr + offset; + mem->pages = &abo->dev_heap->base.pages[offset >> PAGE_SHIFT]; + mem->nr_pages = mem->size >> PAGE_SHIFT; + + if (use_vmap) { + mem->kva = vmap(mem->pages, mem->nr_pages, VM_MAP, PAGE_KERNEL); + if (!mem->kva) { + XDNA_ERR(xdna, "Failed to vmap"); + drm_mm_remove_node(&abo->mm_node); + return -EFAULT; + } + } + + return 0; +} + +static void amdxdna_gem_obj_free(struct drm_gem_object *gobj) +{ + struct amdxdna_dev *xdna = to_xdna_dev(gobj->dev); + struct amdxdna_gem_obj *abo = to_xdna_obj(gobj); + struct iosys_map map = IOSYS_MAP_INIT_VADDR(abo->mem.kva); + + XDNA_DBG(xdna, "BO type %d xdna_addr 0x%llx", abo->type, abo->mem.dev_addr); + if (abo->pinned) + amdxdna_gem_unpin(abo); + + if (abo->type == AMDXDNA_BO_DEV) { + mutex_lock(&abo->client->mm_lock); + drm_mm_remove_node(&abo->mm_node); + mutex_unlock(&abo->client->mm_lock); + + vunmap(abo->mem.kva); + drm_gem_object_put(to_gobj(abo->dev_heap)); + drm_gem_object_release(gobj); + mutex_destroy(&abo->lock); + kfree(abo); + return; + } + + if (abo->type == AMDXDNA_BO_DEV_HEAP) + drm_mm_takedown(&abo->mm); + + drm_gem_vunmap_unlocked(gobj, &map); + mutex_destroy(&abo->lock); + drm_gem_shmem_free(&abo->base); +} + +static const struct drm_gem_object_funcs amdxdna_gem_dev_obj_funcs = { + .free = amdxdna_gem_obj_free, +}; + +static bool amdxdna_hmm_invalidate(struct mmu_interval_notifier *mni, + const struct mmu_notifier_range *range, + unsigned long cur_seq) +{ + struct amdxdna_gem_obj *abo = container_of(mni, struct amdxdna_gem_obj, + mem.notifier); + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + + XDNA_DBG(xdna, "Invalid range 0x%llx, 0x%lx, type %d", + abo->mem.userptr, abo->mem.size, abo->type); + + if (!mmu_notifier_range_blockable(range)) + return false; + + xdna->dev_info->ops->hmm_invalidate(abo, cur_seq); + + return true; +} + +static const struct mmu_interval_notifier_ops amdxdna_hmm_ops = { + .invalidate = amdxdna_hmm_invalidate, +}; + +static void amdxdna_hmm_unregister(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + + if (!xdna->dev_info->ops->hmm_invalidate) + return; + + mmu_interval_notifier_remove(&abo->mem.notifier); + kvfree(abo->mem.pfns); + abo->mem.pfns = NULL; +} + +static int amdxdna_hmm_register(struct amdxdna_gem_obj *abo, unsigned long addr, + size_t len) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + u32 nr_pages; + int ret; + + if (!xdna->dev_info->ops->hmm_invalidate) + return 0; + + if (abo->mem.pfns) + return -EEXIST; + + nr_pages = (PAGE_ALIGN(addr + len) - (addr & PAGE_MASK)) >> PAGE_SHIFT; + abo->mem.pfns = kvcalloc(nr_pages, sizeof(*abo->mem.pfns), + GFP_KERNEL); + if (!abo->mem.pfns) + return -ENOMEM; + + ret = mmu_interval_notifier_insert_locked(&abo->mem.notifier, + current->mm, + addr, + len, + &amdxdna_hmm_ops); + if (ret) { + XDNA_ERR(xdna, "Insert mmu notifier failed, ret %d", ret); + kvfree(abo->mem.pfns); + } + abo->mem.userptr = addr; + + return ret; +} + +static int amdxdna_gem_obj_mmap(struct drm_gem_object *gobj, + struct vm_area_struct *vma) +{ + struct amdxdna_gem_obj *abo = to_xdna_obj(gobj); + unsigned long num_pages; + int ret; + + ret = amdxdna_hmm_register(abo, vma->vm_start, gobj->size); + if (ret) + return ret; + + ret = drm_gem_shmem_mmap(&abo->base, vma); + if (ret) + goto hmm_unreg; + + num_pages = gobj->size >> PAGE_SHIFT; + /* Try to insert the pages */ + vm_flags_mod(vma, VM_MIXEDMAP, VM_PFNMAP); + ret = vm_insert_pages(vma, vma->vm_start, abo->base.pages, &num_pages); + if (ret) + XDNA_ERR(abo->client->xdna, "Failed insert pages, ret %d", ret); + + return 0; + +hmm_unreg: + amdxdna_hmm_unregister(abo); + return ret; +} + +static vm_fault_t amdxdna_gem_vm_fault(struct vm_fault *vmf) +{ + return drm_gem_shmem_vm_ops.fault(vmf); +} + +static void amdxdna_gem_vm_open(struct vm_area_struct *vma) +{ + drm_gem_shmem_vm_ops.open(vma); +} + +static void amdxdna_gem_vm_close(struct vm_area_struct *vma) +{ + struct drm_gem_object *gobj = vma->vm_private_data; + + amdxdna_hmm_unregister(to_xdna_obj(gobj)); + drm_gem_shmem_vm_ops.close(vma); +} + +static const struct vm_operations_struct amdxdna_gem_vm_ops = { + .fault = amdxdna_gem_vm_fault, + .open = amdxdna_gem_vm_open, + .close = amdxdna_gem_vm_close, +}; + +static const struct drm_gem_object_funcs amdxdna_gem_shmem_funcs = { + .free = amdxdna_gem_obj_free, + .print_info = drm_gem_shmem_object_print_info, + .pin = drm_gem_shmem_object_pin, + .unpin = drm_gem_shmem_object_unpin, + .get_sg_table = drm_gem_shmem_object_get_sg_table, + .vmap = drm_gem_shmem_object_vmap, + .vunmap = drm_gem_shmem_object_vunmap, + .mmap = amdxdna_gem_obj_mmap, + .vm_ops = &amdxdna_gem_vm_ops, +}; + +static struct amdxdna_gem_obj * +amdxdna_gem_create_obj(struct drm_device *dev, size_t size) +{ + struct amdxdna_gem_obj *abo; + + abo = kzalloc(sizeof(*abo), GFP_KERNEL); + if (!abo) + return ERR_PTR(-ENOMEM); + + abo->pinned = false; + abo->assigned_hwctx = AMDXDNA_INVALID_CTX_HANDLE; + mutex_init(&abo->lock); + + abo->mem.userptr = AMDXDNA_INVALID_ADDR; + abo->mem.dev_addr = AMDXDNA_INVALID_ADDR; + abo->mem.size = size; + + return abo; +} + +/* For drm_driver->gem_create_object callback */ +struct drm_gem_object * +amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size) +{ + struct amdxdna_gem_obj *abo; + + abo = amdxdna_gem_create_obj(dev, size); + if (IS_ERR(abo)) + return ERR_CAST(abo); + + to_gobj(abo)->funcs = &amdxdna_gem_shmem_funcs; + + return to_gobj(abo); +} + +static struct amdxdna_gem_obj * +amdxdna_drm_alloc_shmem(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) + return ERR_CAST(shmem); + + shmem->map_wc = false; + + abo = to_xdna_obj(&shmem->base); + abo->client = client; + abo->type = AMDXDNA_BO_SHMEM; + + return abo; +} + +static struct amdxdna_gem_obj * +amdxdna_drm_create_dev_heap(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + int ret; + + if (args->size > xdna->dev_info->dev_mem_size) { + XDNA_DBG(xdna, "Invalid dev heap size 0x%llx, limit 0x%lx", + args->size, xdna->dev_info->dev_mem_size); + return ERR_PTR(-EINVAL); + } + + mutex_lock(&client->mm_lock); + if (client->dev_heap) { + XDNA_DBG(client->xdna, "dev heap is already created"); + ret = -EBUSY; + goto mm_unlock; + } + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) { + ret = PTR_ERR(shmem); + goto mm_unlock; + } + + shmem->map_wc = false; + abo = to_xdna_obj(&shmem->base); + + abo->type = AMDXDNA_BO_DEV_HEAP; + abo->client = client; + abo->mem.dev_addr = client->xdna->dev_info->dev_mem_base; + drm_mm_init(&abo->mm, abo->mem.dev_addr, abo->mem.size); + + client->dev_heap = abo; + drm_gem_object_get(to_gobj(abo)); + mutex_unlock(&client->mm_lock); + + return abo; + +mm_unlock: + mutex_unlock(&client->mm_lock); + return ERR_PTR(ret); +} + +struct amdxdna_gem_obj * +amdxdna_drm_alloc_dev_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp, bool use_vmap) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + size_t aligned_sz = PAGE_ALIGN(args->size); + struct amdxdna_gem_obj *abo, *heap; + int ret; + + mutex_lock(&client->mm_lock); + heap = client->dev_heap; + if (!heap) { + ret = -EINVAL; + goto mm_unlock; + } + + if (heap->mem.userptr == AMDXDNA_INVALID_ADDR) { + XDNA_ERR(xdna, "Invalid dev heap userptr"); + ret = -EINVAL; + goto mm_unlock; + } + + if (args->size > heap->mem.size) { + XDNA_ERR(xdna, "Invalid dev bo size 0x%llx, limit 0x%lx", + args->size, heap->mem.size); + ret = -EINVAL; + goto mm_unlock; + } + + abo = amdxdna_gem_create_obj(&xdna->ddev, aligned_sz); + if (IS_ERR(abo)) { + ret = PTR_ERR(abo); + goto mm_unlock; + } + to_gobj(abo)->funcs = &amdxdna_gem_dev_obj_funcs; + abo->type = AMDXDNA_BO_DEV; + abo->client = client; + abo->dev_heap = heap; + ret = amdxdna_gem_insert_node_locked(abo, use_vmap); + if (ret) { + XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret); + goto mm_unlock; + } + + drm_gem_object_get(to_gobj(heap)); + drm_gem_private_object_init(&xdna->ddev, to_gobj(abo), aligned_sz); + + mutex_unlock(&client->mm_lock); + return abo; + +mm_unlock: + mutex_unlock(&client->mm_lock); + return ERR_PTR(ret); +} + +static struct amdxdna_gem_obj * +amdxdna_drm_create_cmd_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct drm_gem_shmem_object *shmem; + struct amdxdna_gem_obj *abo; + struct iosys_map map; + int ret; + + if (args->size > XDNA_MAX_CMD_BO_SIZE) { + XDNA_ERR(xdna, "Command bo size 0x%llx too large", args->size); + return ERR_PTR(-EINVAL); + } + + if (args->size < sizeof(struct amdxdna_cmd)) { + XDNA_DBG(xdna, "Command BO size 0x%llx too small", args->size); + return ERR_PTR(-EINVAL); + } + + shmem = drm_gem_shmem_create(dev, args->size); + if (IS_ERR(shmem)) + return ERR_CAST(shmem); + + shmem->map_wc = false; + abo = to_xdna_obj(&shmem->base); + + abo->type = AMDXDNA_BO_CMD; + abo->client = filp->driver_priv; + + ret = drm_gem_vmap_unlocked(to_gobj(abo), &map); + if (ret) { + XDNA_ERR(xdna, "Vmap cmd bo failed, ret %d", ret); + goto release_obj; + } + abo->mem.kva = map.vaddr; + + return abo; + +release_obj: + drm_gem_shmem_free(shmem); + return ERR_PTR(ret); +} + +int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_create_bo *args = data; + struct amdxdna_gem_obj *abo; + int ret; + + if (args->flags || args->vaddr || !args->size) + return -EINVAL; + + XDNA_DBG(xdna, "BO arg type %d vaddr 0x%llx size 0x%llx flags 0x%llx", + args->type, args->vaddr, args->size, args->flags); + switch (args->type) { + case AMDXDNA_BO_SHMEM: + abo = amdxdna_drm_alloc_shmem(dev, args, filp); + break; + case AMDXDNA_BO_DEV_HEAP: + abo = amdxdna_drm_create_dev_heap(dev, args, filp); + break; + case AMDXDNA_BO_DEV: + abo = amdxdna_drm_alloc_dev_bo(dev, args, filp, false); + break; + case AMDXDNA_BO_CMD: + abo = amdxdna_drm_create_cmd_bo(dev, args, filp); + break; + default: + return -EINVAL; + } + if (IS_ERR(abo)) + return PTR_ERR(abo); + + /* ready to publish object to userspace */ + ret = drm_gem_handle_create(filp, to_gobj(abo), &args->handle); + if (ret) { + XDNA_ERR(xdna, "Create handle failed"); + goto put_obj; + } + + XDNA_DBG(xdna, "BO hdl %d type %d userptr 0x%llx xdna_addr 0x%llx size 0x%lx", + args->handle, args->type, abo->mem.userptr, + abo->mem.dev_addr, abo->mem.size); +put_obj: + /* Dereference object reference. Handle holds it now. */ + drm_gem_object_put(to_gobj(abo)); + return ret; +} + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo) +{ + struct amdxdna_dev *xdna = to_xdna_dev(to_gobj(abo)->dev); + int ret; + + switch (abo->type) { + case AMDXDNA_BO_SHMEM: + case AMDXDNA_BO_DEV_HEAP: + ret = drm_gem_shmem_pin(&abo->base); + break; + case AMDXDNA_BO_DEV: + ret = drm_gem_shmem_pin(&abo->dev_heap->base); + break; + default: + ret = -EOPNOTSUPP; + } + + XDNA_DBG(xdna, "BO type %d ret %d", abo->type, ret); + return ret; +} + +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo) +{ + int ret; + + if (abo->type == AMDXDNA_BO_DEV) + abo = abo->dev_heap; + + mutex_lock(&abo->lock); + ret = amdxdna_gem_pin_nolock(abo); + mutex_unlock(&abo->lock); + + return ret; +} + +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo) +{ + if (abo->type == AMDXDNA_BO_DEV) + abo = abo->dev_heap; + + mutex_lock(&abo->lock); + drm_gem_shmem_unpin(&abo->base); + mutex_unlock(&abo->lock); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type) +{ + struct amdxdna_dev *xdna = client->xdna; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + + gobj = drm_gem_object_lookup(client->filp, bo_hdl); + if (!gobj) { + XDNA_DBG(xdna, "Can not find bo %d", bo_hdl); + return NULL; + } + + abo = to_xdna_obj(gobj); + if (bo_type == AMDXDNA_BO_INVALID || abo->type == bo_type) + return abo; + + drm_gem_object_put(gobj); + return NULL; +} + +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_drm_get_bo_info *args = data; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret = 0; + + if (args->ext || args->ext_flags || args->pad) + return -EINVAL; + + gobj = drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_DBG(xdna, "Lookup GEM object %d failed", args->handle); + return -ENOENT; + } + + abo = to_xdna_obj(gobj); + args->vaddr = abo->mem.userptr; + args->xdna_addr = abo->mem.dev_addr; + + if (abo->type != AMDXDNA_BO_DEV) + args->map_offset = drm_vma_node_offset_addr(&gobj->vma_node); + else + args->map_offset = AMDXDNA_INVALID_ADDR; + + XDNA_DBG(xdna, "BO hdl %d map_offset 0x%llx vaddr 0x%llx xdna_addr 0x%llx", + args->handle, args->map_offset, args->vaddr, args->xdna_addr); + + drm_gem_object_put(gobj); + return ret; +} + +/* + * The sync bo ioctl is to make sure the CPU cache is in sync with memory. + * This is required because NPU is not cache coherent device. CPU cache + * flushing/invalidation is expensive so it is best to handle this outside + * of the command submission path. This ioctl allows explicit cache + * flushing/invalidation outside of the critical path. + */ +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, + void *data, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_sync_bo *args = data; + struct amdxdna_gem_obj *abo; + struct drm_gem_object *gobj; + int ret; + + gobj = drm_gem_object_lookup(filp, args->handle); + if (!gobj) { + XDNA_ERR(xdna, "Lookup GEM object failed"); + return -ENOENT; + } + abo = to_xdna_obj(gobj); + + ret = amdxdna_gem_pin(abo); + if (ret) { + XDNA_ERR(xdna, "Pin BO %d failed, ret %d", args->handle, ret); + goto put_obj; + } + + if (abo->type == AMDXDNA_BO_DEV) + drm_clflush_pages(abo->mem.pages, abo->mem.nr_pages); + else + drm_clflush_pages(abo->base.pages, gobj->size >> PAGE_SHIFT); + + amdxdna_gem_unpin(abo); + + XDNA_DBG(xdna, "Sync bo %d offset 0x%llx, size 0x%llx\n", + args->handle, args->offset, args->size); + +put_obj: + drm_gem_object_put(gobj); + return ret; +} diff --git a/drivers/accel/amdxdna/amdxdna_gem.h b/drivers/accel/amdxdna/amdxdna_gem.h new file mode 100644 index 0000000000000000000000000000000000000000..8ccc0375dd9da0e3a98599fedbdca4b0ffd525b6 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_gem.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_GEM_H_ +#define _AMDXDNA_GEM_H_ + +struct amdxdna_mem { + u64 userptr; + void *kva; + u64 dev_addr; + size_t size; + struct page **pages; + u32 nr_pages; + struct mmu_interval_notifier notifier; + unsigned long *pfns; + bool map_invalid; +}; + +struct amdxdna_gem_obj { + struct drm_gem_shmem_object base; + struct amdxdna_client *client; + u8 type; + bool pinned; + struct mutex lock; /* Protects: pinned */ + struct amdxdna_mem mem; + + /* Below members is uninitialized when needed */ + struct drm_mm mm; /* For AMDXDNA_BO_DEV_HEAP */ + struct amdxdna_gem_obj *dev_heap; /* For AMDXDNA_BO_DEV */ + struct drm_mm_node mm_node; /* For AMDXDNA_BO_DEV */ + u32 assigned_hwctx; +}; + +#define to_gobj(obj) (&(obj)->base.base) + +static inline struct amdxdna_gem_obj *to_xdna_obj(struct drm_gem_object *gobj) +{ + return container_of(gobj, struct amdxdna_gem_obj, base.base); +} + +struct amdxdna_gem_obj *amdxdna_gem_get_obj(struct amdxdna_client *client, + u32 bo_hdl, u8 bo_type); +static inline void amdxdna_gem_put_obj(struct amdxdna_gem_obj *abo) +{ + drm_gem_object_put(to_gobj(abo)); +} + +struct drm_gem_object * +amdxdna_gem_create_object_cb(struct drm_device *dev, size_t size); +struct amdxdna_gem_obj * +amdxdna_drm_alloc_dev_bo(struct drm_device *dev, + struct amdxdna_drm_create_bo *args, + struct drm_file *filp, bool use_vmap); + +int amdxdna_gem_pin_nolock(struct amdxdna_gem_obj *abo); +int amdxdna_gem_pin(struct amdxdna_gem_obj *abo); +void amdxdna_gem_unpin(struct amdxdna_gem_obj *abo); + +int amdxdna_drm_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_get_bo_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); +int amdxdna_drm_sync_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *filp); + +#endif /* _AMDXDNA_GEM_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.c b/drivers/accel/amdxdna/amdxdna_mailbox.c new file mode 100644 index 0000000000000000000000000000000000000000..1afc8079e3d1bc6c9123b7bfb3bca125557820b2 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.c @@ -0,0 +1,561 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include + +#define CREATE_TRACE_POINTS +#include + +#include "amdxdna_mailbox.h" + +#define MB_ERR(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_err((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_DBG(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_dbg((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) +#define MB_WARN_ONCE(chann, fmt, args...) \ +({ \ + typeof(chann) _chann = chann; \ + dev_warn_once((_chann)->mb->dev, "xdna_mailbox.%d: "fmt, \ + (_chann)->msix_irq, ##args); \ +}) + +#define MAGIC_VAL 0x1D000000U +#define MAGIC_VAL_MASK 0xFF000000 +#define MAX_MSG_ID_ENTRIES 256 +#define MSG_RX_TIMER 200 /* milliseconds */ +#define MAILBOX_NAME "xdna_mailbox" + +enum channel_res_type { + CHAN_RES_X2I, + CHAN_RES_I2X, + CHAN_RES_NUM +}; + +struct mailbox { + struct device *dev; + struct xdna_mailbox_res res; +}; + +struct mailbox_channel { + struct mailbox *mb; + struct xdna_mailbox_chann_res res[CHAN_RES_NUM]; + int msix_irq; + u32 iohub_int_addr; + struct xarray chan_xa; + u32 next_msgid; + u32 x2i_tail; + + /* Received msg related fields */ + struct workqueue_struct *work_q; + struct work_struct rx_work; + u32 i2x_head; + bool bad_state; +}; + +#define MSG_BODY_SZ GENMASK(10, 0) +#define MSG_PROTO_VER GENMASK(23, 16) +struct xdna_msg_header { + __u32 total_size; + __u32 sz_ver; + __u32 id; + __u32 opcode; +} __packed; + +static_assert(sizeof(struct xdna_msg_header) == 16); + +struct mailbox_pkg { + struct xdna_msg_header header; + __u32 payload[]; +}; + +/* The protocol version. */ +#define MSG_PROTOCOL_VERSION 0x1 +/* The tombstone value. */ +#define TOMBSTONE 0xDEADFACE + +struct mailbox_msg { + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + size_t pkg_size; /* package size in bytes */ + struct mailbox_pkg pkg; +}; + +static void mailbox_reg_write(struct mailbox_channel *mb_chann, u32 mbox_reg, u32 data) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + + writel(data, (void *)ringbuf_addr); +} + +static u32 mailbox_reg_read(struct mailbox_channel *mb_chann, u32 mbox_reg) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + + return readl((void *)ringbuf_addr); +} + +static int mailbox_reg_read_non_zero(struct mailbox_channel *mb_chann, u32 mbox_reg, u32 *val) +{ + struct xdna_mailbox_res *mb_res = &mb_chann->mb->res; + u64 ringbuf_addr = mb_res->mbox_base + mbox_reg; + int ret, value; + + /* Poll till value is not zero */ + ret = readx_poll_timeout(readl, (void *)ringbuf_addr, value, + value, 1 /* us */, 100); + if (ret < 0) + return ret; + + *val = value; + return 0; +} + +static inline void +mailbox_set_headptr(struct mailbox_channel *mb_chann, u32 headptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_head_ptr_reg, headptr_val); + mb_chann->i2x_head = headptr_val; +} + +static inline void +mailbox_set_tailptr(struct mailbox_channel *mb_chann, u32 tailptr_val) +{ + mailbox_reg_write(mb_chann, mb_chann->res[CHAN_RES_X2I].mb_tail_ptr_reg, tailptr_val); + mb_chann->x2i_tail = tailptr_val; +} + +static inline u32 +mailbox_get_headptr(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_head_ptr_reg); +} + +static inline u32 +mailbox_get_tailptr(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mailbox_reg_read(mb_chann, mb_chann->res[type].mb_tail_ptr_reg); +} + +static inline u32 +mailbox_get_ringbuf_size(struct mailbox_channel *mb_chann, enum channel_res_type type) +{ + return mb_chann->res[type].rb_size; +} + +static inline int mailbox_validate_msgid(int msg_id) +{ + return (msg_id & MAGIC_VAL_MASK) == MAGIC_VAL; +} + +static int mailbox_acquire_msgid(struct mailbox_channel *mb_chann, struct mailbox_msg *mb_msg) +{ + u32 msg_id; + int ret; + + ret = xa_alloc_cyclic_irq(&mb_chann->chan_xa, &msg_id, mb_msg, + XA_LIMIT(0, MAX_MSG_ID_ENTRIES - 1), + &mb_chann->next_msgid, GFP_NOWAIT); + if (ret < 0) + return ret; + + /* + * Add MAGIC_VAL to the higher bits. + */ + msg_id |= MAGIC_VAL; + return msg_id; +} + +static void mailbox_release_msgid(struct mailbox_channel *mb_chann, int msg_id) +{ + msg_id &= ~MAGIC_VAL_MASK; + xa_erase_irq(&mb_chann->chan_xa, msg_id); +} + +static void mailbox_release_msg(struct mailbox_channel *mb_chann, + struct mailbox_msg *mb_msg) +{ + MB_DBG(mb_chann, "msg_id 0x%x msg opcode 0x%x", + mb_msg->pkg.header.id, mb_msg->pkg.header.opcode); + mb_msg->notify_cb(mb_msg->handle, NULL, 0); + kfree(mb_msg); +} + +static int +mailbox_send_msg(struct mailbox_channel *mb_chann, struct mailbox_msg *mb_msg) +{ + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 write_addr; + u32 tmp_tail; + + head = mailbox_get_headptr(mb_chann, CHAN_RES_X2I); + tail = mb_chann->x2i_tail; + ringbuf_size = mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I); + start_addr = mb_chann->res[CHAN_RES_X2I].rb_start_addr; + tmp_tail = tail + mb_msg->pkg_size; + + if (tail < head && tmp_tail >= head) + goto no_space; + + if (tail >= head && (tmp_tail > ringbuf_size - sizeof(u32) && + mb_msg->pkg_size >= head)) + goto no_space; + + if (tail >= head && tmp_tail > ringbuf_size - sizeof(u32)) { + write_addr = mb_chann->mb->res.ringbuf_base + start_addr + tail; + writel(TOMBSTONE, (void *)write_addr); + + /* tombstone is set. Write from the start of the ringbuf */ + tail = 0; + } + + write_addr = mb_chann->mb->res.ringbuf_base + start_addr + tail; + memcpy_toio((void *)write_addr, &mb_msg->pkg, mb_msg->pkg_size); + mailbox_set_tailptr(mb_chann, tail + mb_msg->pkg_size); + + trace_mbox_set_tail(MAILBOX_NAME, mb_chann->msix_irq, + mb_msg->pkg.header.opcode, + mb_msg->pkg.header.id); + + return 0; + +no_space: + return -ENOSPC; +} + +static int +mailbox_get_resp(struct mailbox_channel *mb_chann, struct xdna_msg_header *header, + void *data) +{ + struct mailbox_msg *mb_msg; + int msg_id; + int ret; + + msg_id = header->id; + if (!mailbox_validate_msgid(msg_id)) { + MB_ERR(mb_chann, "Bad message ID 0x%x", msg_id); + return -EINVAL; + } + + msg_id &= ~MAGIC_VAL_MASK; + mb_msg = xa_erase_irq(&mb_chann->chan_xa, msg_id); + if (!mb_msg) { + MB_ERR(mb_chann, "Cannot find msg 0x%x", msg_id); + return -EINVAL; + } + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + ret = mb_msg->notify_cb(mb_msg->handle, data, header->total_size); + if (unlikely(ret)) + MB_ERR(mb_chann, "Message callback ret %d", ret); + + kfree(mb_msg); + return ret; +} + +static int mailbox_get_msg(struct mailbox_channel *mb_chann) +{ + struct xdna_msg_header header; + u32 msg_size, rest; + u32 ringbuf_size; + u32 head, tail; + u32 start_addr; + u64 read_addr; + int ret; + + if (mailbox_reg_read_non_zero(mb_chann, mb_chann->res[CHAN_RES_I2X].mb_tail_ptr_reg, &tail)) + return -EINVAL; + head = mb_chann->i2x_head; + ringbuf_size = mailbox_get_ringbuf_size(mb_chann, CHAN_RES_I2X); + start_addr = mb_chann->res[CHAN_RES_I2X].rb_start_addr; + + if (unlikely(tail > ringbuf_size || !IS_ALIGNED(tail, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid tail 0x%x", tail); + return -EINVAL; + } + + /* ringbuf empty */ + if (head == tail) + return -ENOENT; + + if (head == ringbuf_size) + head = 0; + + /* Peek size of the message or TOMBSTONE */ + read_addr = mb_chann->mb->res.ringbuf_base + start_addr + head; + header.total_size = readl((void *)read_addr); + /* size is TOMBSTONE, set next read from 0 */ + if (header.total_size == TOMBSTONE) { + if (head < tail) { + MB_WARN_ONCE(mb_chann, "Tombstone, head 0x%x tail 0x%x", + head, tail); + return -EINVAL; + } + mailbox_set_headptr(mb_chann, 0); + return 0; + } + + if (unlikely(!header.total_size || !IS_ALIGNED(header.total_size, 4))) { + MB_WARN_ONCE(mb_chann, "Invalid total size 0x%x", header.total_size); + return -EINVAL; + } + msg_size = sizeof(header) + header.total_size; + + if (msg_size > ringbuf_size - head || msg_size > tail - head) { + MB_WARN_ONCE(mb_chann, "Invalid message size %d, tail %d, head %d", + msg_size, tail, head); + return -EINVAL; + } + + rest = sizeof(header) - sizeof(u32); + read_addr += sizeof(u32); + memcpy_fromio((u32 *)&header + 1, (void *)read_addr, rest); + read_addr += rest; + + ret = mailbox_get_resp(mb_chann, &header, (u32 *)read_addr); + + mailbox_set_headptr(mb_chann, head + msg_size); + /* After update head, it can equal to ringbuf_size. This is expected. */ + trace_mbox_set_head(MAILBOX_NAME, mb_chann->msix_irq, + header.opcode, header.id); + + return ret; +} + +static irqreturn_t mailbox_irq_handler(int irq, void *p) +{ + struct mailbox_channel *mb_chann = p; + + trace_mbox_irq_handle(MAILBOX_NAME, irq); + /* Schedule a rx_work to call the callback functions */ + queue_work(mb_chann->work_q, &mb_chann->rx_work); + /* Clear IOHUB register */ + mailbox_reg_write(mb_chann, mb_chann->iohub_int_addr, 0); + + return IRQ_HANDLED; +} + +static void mailbox_rx_worker(struct work_struct *rx_work) +{ + struct mailbox_channel *mb_chann; + int ret; + + mb_chann = container_of(rx_work, struct mailbox_channel, rx_work); + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state, work aborted"); + return; + } + + while (1) { + /* + * If return is 0, keep consuming next message, until there is + * no messages or an error happened. + */ + ret = mailbox_get_msg(mb_chann); + if (ret == -ENOENT) + break; + + /* Other error means device doesn't look good, disable irq. */ + if (unlikely(ret)) { + MB_ERR(mb_chann, "Unexpected ret %d, disable irq", ret); + WRITE_ONCE(mb_chann->bad_state, true); + disable_irq(mb_chann->msix_irq); + break; + } + } +} + +int xdna_mailbox_send_msg(struct mailbox_channel *mb_chann, + const struct xdna_mailbox_msg *msg, u64 tx_timeout) +{ + struct xdna_msg_header *header; + struct mailbox_msg *mb_msg; + size_t pkg_size; + int ret; + + pkg_size = sizeof(*header) + msg->send_size; + if (pkg_size > mailbox_get_ringbuf_size(mb_chann, CHAN_RES_X2I)) { + MB_ERR(mb_chann, "Message size larger than ringbuf size"); + return -EINVAL; + } + + if (unlikely(!IS_ALIGNED(msg->send_size, 4))) { + MB_ERR(mb_chann, "Message must be 4 bytes align"); + return -EINVAL; + } + + /* The fist word in payload can NOT be TOMBSTONE */ + if (unlikely(((u32 *)msg->send_data)[0] == TOMBSTONE)) { + MB_ERR(mb_chann, "Tomb stone in data"); + return -EINVAL; + } + + if (READ_ONCE(mb_chann->bad_state)) { + MB_ERR(mb_chann, "Channel in bad state"); + return -EPIPE; + } + + mb_msg = kzalloc(sizeof(*mb_msg) + pkg_size, GFP_KERNEL); + if (!mb_msg) + return -ENOMEM; + + mb_msg->handle = msg->handle; + mb_msg->notify_cb = msg->notify_cb; + mb_msg->pkg_size = pkg_size; + + header = &mb_msg->pkg.header; + /* + * Hardware use total_size and size to split huge message. + * We do not support it here. Thus the values are the same. + */ + header->total_size = msg->send_size; + header->sz_ver = FIELD_PREP(MSG_BODY_SZ, msg->send_size) | + FIELD_PREP(MSG_PROTO_VER, MSG_PROTOCOL_VERSION); + header->opcode = msg->opcode; + memcpy(mb_msg->pkg.payload, msg->send_data, msg->send_size); + + ret = mailbox_acquire_msgid(mb_chann, mb_msg); + if (unlikely(ret < 0)) { + MB_ERR(mb_chann, "mailbox_acquire_msgid failed"); + goto msg_id_failed; + } + header->id = ret; + + MB_DBG(mb_chann, "opcode 0x%x size %d id 0x%x", + header->opcode, header->total_size, header->id); + + ret = mailbox_send_msg(mb_chann, mb_msg); + if (ret) { + MB_DBG(mb_chann, "Error in mailbox send msg, ret %d", ret); + goto release_id; + } + + return 0; + +release_id: + mailbox_release_msgid(mb_chann, header->id); +msg_id_failed: + kfree(mb_msg); + return ret; +} + +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mb, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 iohub_int_addr, + int mb_irq) +{ + struct mailbox_channel *mb_chann; + int ret; + + if (!is_power_of_2(x2i->rb_size) || !is_power_of_2(i2x->rb_size)) { + pr_err("Ring buf size must be power of 2"); + return NULL; + } + + mb_chann = kzalloc(sizeof(*mb_chann), GFP_KERNEL); + if (!mb_chann) + return NULL; + + mb_chann->mb = mb; + mb_chann->msix_irq = mb_irq; + mb_chann->iohub_int_addr = iohub_int_addr; + memcpy(&mb_chann->res[CHAN_RES_X2I], x2i, sizeof(*x2i)); + memcpy(&mb_chann->res[CHAN_RES_I2X], i2x, sizeof(*i2x)); + + xa_init_flags(&mb_chann->chan_xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ); + mb_chann->x2i_tail = mailbox_get_tailptr(mb_chann, CHAN_RES_X2I); + mb_chann->i2x_head = mailbox_get_headptr(mb_chann, CHAN_RES_I2X); + + INIT_WORK(&mb_chann->rx_work, mailbox_rx_worker); + mb_chann->work_q = create_singlethread_workqueue(MAILBOX_NAME); + if (!mb_chann->work_q) { + MB_ERR(mb_chann, "Create workqueue failed"); + goto free_and_out; + } + + /* Everything look good. Time to enable irq handler */ + ret = request_irq(mb_irq, mailbox_irq_handler, 0, MAILBOX_NAME, mb_chann); + if (ret) { + MB_ERR(mb_chann, "Failed to request irq %d ret %d", mb_irq, ret); + goto destroy_wq; + } + + mb_chann->bad_state = false; + + MB_DBG(mb_chann, "Mailbox channel created (irq: %d)", mb_chann->msix_irq); + return mb_chann; + +destroy_wq: + destroy_workqueue(mb_chann->work_q); +free_and_out: + kfree(mb_chann); + return NULL; +} + +int xdna_mailbox_destroy_channel(struct mailbox_channel *mb_chann) +{ + struct mailbox_msg *mb_msg; + unsigned long msg_id; + + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); + free_irq(mb_chann->msix_irq, mb_chann); + destroy_workqueue(mb_chann->work_q); + /* We can clean up and release resources */ + + xa_for_each(&mb_chann->chan_xa, msg_id, mb_msg) + mailbox_release_msg(mb_chann, mb_msg); + + xa_destroy(&mb_chann->chan_xa); + + MB_DBG(mb_chann, "Mailbox channel destroyed, irq: %d", mb_chann->msix_irq); + kfree(mb_chann); + return 0; +} + +void xdna_mailbox_stop_channel(struct mailbox_channel *mb_chann) +{ + /* Disable an irq and wait. This might sleep. */ + disable_irq(mb_chann->msix_irq); + + /* Cancel RX work and wait for it to finish */ + cancel_work_sync(&mb_chann->rx_work); + MB_DBG(mb_chann, "IRQ disabled and RX work cancelled"); +} + +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res) +{ + struct mailbox *mb; + + mb = drmm_kzalloc(ddev, sizeof(*mb), GFP_KERNEL); + if (!mb) + return NULL; + mb->dev = ddev->dev; + + /* mailbox and ring buf base and size information */ + memcpy(&mb->res, res, sizeof(*res)); + + return mb; +} diff --git a/drivers/accel/amdxdna/amdxdna_mailbox.h b/drivers/accel/amdxdna/amdxdna_mailbox.h new file mode 100644 index 0000000000000000000000000000000000000000..6ab7f5424633bb43c013c94df13e0dba24555728 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AIE2_MAILBOX_H_ +#define _AIE2_MAILBOX_H_ + +struct mailbox; +struct mailbox_channel; + +/* + * xdna_mailbox_msg - message struct + * + * @opcode: opcode for firmware + * @handle: handle used for the notify callback + * @notify_cb: callback function to notify the sender when there is response + * @send_data: pointing to sending data + * @send_size: size of the sending data + * + * The mailbox will split the sending data in to multiple firmware message if + * the size of the data is too big. This is transparent to the sender. The + * sender will receive one notification. + */ +struct xdna_mailbox_msg { + u32 opcode; + void *handle; + int (*notify_cb)(void *handle, const u32 *data, size_t size); + u8 *send_data; + size_t send_size; +}; + +/* + * xdna_mailbox_res - mailbox hardware resource + * + * @ringbuf_base: ring buffer base address + * @ringbuf_size: ring buffer size + * @mbox_base: mailbox base address + * @mbox_size: mailbox size + */ +struct xdna_mailbox_res { + u64 ringbuf_base; + size_t ringbuf_size; + u64 mbox_base; + size_t mbox_size; + const char *name; +}; + +/* + * xdna_mailbox_chann_res - resources + * + * @rb_start_addr: ring buffer start address + * @rb_size: ring buffer size + * @mb_head_ptr_reg: mailbox head pointer register + * @mb_tail_ptr_reg: mailbox tail pointer register + */ +struct xdna_mailbox_chann_res { + u32 rb_start_addr; + u32 rb_size; + u32 mb_head_ptr_reg; + u32 mb_tail_ptr_reg; +}; + +/* + * xdna_mailbox_create() -- create mailbox subsystem and initialize + * + * @ddev: device pointer + * @res: SRAM and mailbox resources + * + * Return: If success, return a handle of mailbox subsystem. + * Otherwise, return NULL pointer. + */ +struct mailbox *xdnam_mailbox_create(struct drm_device *ddev, + const struct xdna_mailbox_res *res); + +/* + * xdna_mailbox_create_channel() -- Create a mailbox channel instance + * + * @mailbox: the handle return from xdna_mailbox_create() + * @x2i: host to firmware mailbox resources + * @i2x: firmware to host mailbox resources + * @xdna_mailbox_intr_reg: register addr of MSI-X interrupt + * @mb_irq: Linux IRQ number associated with mailbox MSI-X interrupt vector index + * + * Return: If success, return a handle of mailbox channel. Otherwise, return NULL. + */ +struct mailbox_channel * +xdna_mailbox_create_channel(struct mailbox *mailbox, + const struct xdna_mailbox_chann_res *x2i, + const struct xdna_mailbox_chann_res *i2x, + u32 xdna_mailbox_intr_reg, + int mb_irq); + +/* + * xdna_mailbox_destroy_channel() -- destroy mailbox channel + * + * @mailbox_chann: the handle return from xdna_mailbox_create_channel() + * + * Return: if success, return 0. otherwise return error code + */ +int xdna_mailbox_destroy_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_stop_channel() -- stop mailbox channel + * + * @mailbox_chann: the handle return from xdna_mailbox_create_channel() + * + * Return: if success, return 0. otherwise return error code + */ +void xdna_mailbox_stop_channel(struct mailbox_channel *mailbox_chann); + +/* + * xdna_mailbox_send_msg() -- Send a message + * + * @mailbox_chann: Mailbox channel handle + * @msg: message struct for message information + * @tx_timeout: the timeout value for sending the message in ms. + * + * Return: If success return 0, otherwise, return error code + */ +int xdna_mailbox_send_msg(struct mailbox_channel *mailbox_chann, + const struct xdna_mailbox_msg *msg, u64 tx_timeout); + +#endif /* _AIE2_MAILBOX_ */ diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.c b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c new file mode 100644 index 0000000000000000000000000000000000000000..5139a9c96a91619d7abebe1f385daa555d8286e3 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.c @@ -0,0 +1,61 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "amdxdna_gem.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_mailbox_helper.h" +#include "amdxdna_pci_drv.h" + +int xdna_msg_cb(void *handle, const u32 *data, size_t size) +{ + struct xdna_notify *cb_arg = handle; + int ret; + + if (unlikely(!data)) + goto out; + + if (unlikely(cb_arg->size != size)) { + cb_arg->error = -EINVAL; + goto out; + } + + print_hex_dump_debug("resp data: ", DUMP_PREFIX_OFFSET, + 16, 4, data, cb_arg->size, true); + memcpy(cb_arg->data, data, cb_arg->size); +out: + ret = cb_arg->error; + complete(&cb_arg->comp); + return ret; +} + +int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann, + struct xdna_mailbox_msg *msg) +{ + struct xdna_notify *hdl = msg->handle; + int ret; + + ret = xdna_mailbox_send_msg(chann, msg, TX_TIMEOUT); + if (ret) { + XDNA_ERR(xdna, "Send message failed, ret %d", ret); + return ret; + } + + ret = wait_for_completion_timeout(&hdl->comp, + msecs_to_jiffies(RX_TIMEOUT)); + if (!ret) { + XDNA_ERR(xdna, "Wait for completion timeout"); + return -ETIME; + } + + return hdl->error; +} diff --git a/drivers/accel/amdxdna/amdxdna_mailbox_helper.h b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h new file mode 100644 index 0000000000000000000000000000000000000000..23e1317b79fe65fe7a44e6a4e8e95f39383b80b5 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_mailbox_helper.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_MAILBOX_HELPER_H +#define _AMDXDNA_MAILBOX_HELPER_H + +#define TX_TIMEOUT 2000 /* milliseconds */ +#define RX_TIMEOUT 5000 /* milliseconds */ + +struct amdxdna_dev; + +struct xdna_notify { + struct completion comp; + u32 *data; + size_t size; + int error; +}; + +#define DECLARE_XDNA_MSG_COMMON(name, op, status) \ + struct name##_req req = { 0 }; \ + struct name##_resp resp = { status }; \ + struct xdna_notify hdl = { \ + .error = 0, \ + .data = (u32 *)&resp, \ + .size = sizeof(resp), \ + .comp = COMPLETION_INITIALIZER_ONSTACK(hdl.comp), \ + }; \ + struct xdna_mailbox_msg msg = { \ + .send_data = (u8 *)&req, \ + .send_size = sizeof(req), \ + .handle = &hdl, \ + .opcode = op, \ + .notify_cb = xdna_msg_cb, \ + } + +int xdna_msg_cb(void *handle, const u32 *data, size_t size); +int xdna_send_msg_wait(struct amdxdna_dev *xdna, struct mailbox_channel *chann, + struct xdna_mailbox_msg *msg); + +#endif /* _AMDXDNA_MAILBOX_HELPER_H */ diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c new file mode 100644 index 0000000000000000000000000000000000000000..194e44fc243d191e3d831f12b890d3040e19ed17 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c @@ -0,0 +1,429 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "amdxdna_ctx.h" +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* milliseconds */ + +/* + * Bind the driver base on (vendor_id, device_id) pair and later use the + * (device_id, rev_id) pair as a key to select the devices. The devices with + * same device_id have very similar interface to host driver. + */ +static const struct pci_device_id pci_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x1502) }, + { PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x17f0) }, + {0} +}; + +MODULE_DEVICE_TABLE(pci, pci_ids); + +static const struct amdxdna_device_id amdxdna_ids[] = { + { 0x1502, 0x0, &dev_npu1_info }, + { 0x17f0, 0x0, &dev_npu2_info }, + { 0x17f0, 0x10, &dev_npu4_info }, + { 0x17f0, 0x11, &dev_npu5_info }, + { 0x17f0, 0x20, &dev_npu6_info }, + {0} +}; + +static int amdxdna_drm_open(struct drm_device *ddev, struct drm_file *filp) +{ + struct amdxdna_dev *xdna = to_xdna_dev(ddev); + struct amdxdna_client *client; + int ret; + + ret = pm_runtime_resume_and_get(ddev->dev); + if (ret) { + XDNA_ERR(xdna, "Failed to get rpm, ret %d", ret); + return ret; + } + + client = kzalloc(sizeof(*client), GFP_KERNEL); + if (!client) { + ret = -ENOMEM; + goto put_rpm; + } + + client->pid = pid_nr(filp->pid); + client->xdna = xdna; + + client->sva = iommu_sva_bind_device(xdna->ddev.dev, current->mm); + if (IS_ERR(client->sva)) { + ret = PTR_ERR(client->sva); + XDNA_ERR(xdna, "SVA bind device failed, ret %d", ret); + goto failed; + } + client->pasid = iommu_sva_get_pasid(client->sva); + if (client->pasid == IOMMU_PASID_INVALID) { + XDNA_ERR(xdna, "SVA get pasid failed"); + ret = -ENODEV; + goto unbind_sva; + } + mutex_init(&client->hwctx_lock); + init_srcu_struct(&client->hwctx_srcu); + xa_init_flags(&client->hwctx_xa, XA_FLAGS_ALLOC); + mutex_init(&client->mm_lock); + + mutex_lock(&xdna->dev_lock); + list_add_tail(&client->node, &xdna->client_list); + mutex_unlock(&xdna->dev_lock); + + filp->driver_priv = client; + client->filp = filp; + + XDNA_DBG(xdna, "pid %d opened", client->pid); + return 0; + +unbind_sva: + iommu_sva_unbind_device(client->sva); +failed: + kfree(client); +put_rpm: + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); + + return ret; +} + +static void amdxdna_drm_close(struct drm_device *ddev, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(ddev); + + XDNA_DBG(xdna, "closing pid %d", client->pid); + + xa_destroy(&client->hwctx_xa); + cleanup_srcu_struct(&client->hwctx_srcu); + mutex_destroy(&client->hwctx_lock); + mutex_destroy(&client->mm_lock); + if (client->dev_heap) + drm_gem_object_put(to_gobj(client->dev_heap)); + + iommu_sva_unbind_device(client->sva); + + XDNA_DBG(xdna, "pid %d closed", client->pid); + kfree(client); + pm_runtime_mark_last_busy(ddev->dev); + pm_runtime_put_autosuspend(ddev->dev); +} + +static int amdxdna_flush(struct file *f, fl_owner_t id) +{ + struct drm_file *filp = f->private_data; + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = client->xdna; + int idx; + + XDNA_DBG(xdna, "PID %d flushing...", client->pid); + if (!drm_dev_enter(&xdna->ddev, &idx)) + return 0; + + mutex_lock(&xdna->dev_lock); + list_del_init(&client->node); + mutex_unlock(&xdna->dev_lock); + amdxdna_hwctx_remove_all(client); + + drm_dev_exit(idx); + return 0; +} + +static int amdxdna_drm_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_get_info *args = data; + int ret; + + if (!xdna->dev_info->ops->get_aie_info) + return -EOPNOTSUPP; + + XDNA_DBG(xdna, "Request parameter %u", args->param); + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->get_aie_info(client, args); + mutex_unlock(&xdna->dev_lock); + return ret; +} + +static int amdxdna_drm_set_state_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) +{ + struct amdxdna_client *client = filp->driver_priv; + struct amdxdna_dev *xdna = to_xdna_dev(dev); + struct amdxdna_drm_set_state *args = data; + int ret; + + if (!xdna->dev_info->ops->set_aie_state) + return -EOPNOTSUPP; + + XDNA_DBG(xdna, "Request parameter %u", args->param); + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->set_aie_state(client, args); + mutex_unlock(&xdna->dev_lock); + + return ret; +} + +static const struct drm_ioctl_desc amdxdna_drm_ioctls[] = { + /* Context */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_HWCTX, amdxdna_drm_create_hwctx_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_DESTROY_HWCTX, amdxdna_drm_destroy_hwctx_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_CONFIG_HWCTX, amdxdna_drm_config_hwctx_ioctl, 0), + /* BO */ + DRM_IOCTL_DEF_DRV(AMDXDNA_CREATE_BO, amdxdna_drm_create_bo_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_GET_BO_INFO, amdxdna_drm_get_bo_info_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_SYNC_BO, amdxdna_drm_sync_bo_ioctl, 0), + /* Execution */ + DRM_IOCTL_DEF_DRV(AMDXDNA_EXEC_CMD, amdxdna_drm_submit_cmd_ioctl, 0), + /* AIE hardware */ + DRM_IOCTL_DEF_DRV(AMDXDNA_GET_INFO, amdxdna_drm_get_info_ioctl, 0), + DRM_IOCTL_DEF_DRV(AMDXDNA_SET_STATE, amdxdna_drm_set_state_ioctl, DRM_ROOT_ONLY), +}; + +static const struct file_operations amdxdna_fops = { + .owner = THIS_MODULE, + .open = accel_open, + .release = drm_release, + .flush = amdxdna_flush, + .unlocked_ioctl = drm_ioctl, + .compat_ioctl = drm_compat_ioctl, + .poll = drm_poll, + .read = drm_read, + .llseek = noop_llseek, + .mmap = drm_gem_mmap, + .fop_flags = FOP_UNSIGNED_OFFSET, +}; + +const struct drm_driver amdxdna_drm_drv = { + .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL | + DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE, + .fops = &amdxdna_fops, + .name = "amdxdna_accel_driver", + .desc = "AMD XDNA DRM implementation", + .open = amdxdna_drm_open, + .postclose = amdxdna_drm_close, + .ioctls = amdxdna_drm_ioctls, + .num_ioctls = ARRAY_SIZE(amdxdna_drm_ioctls), + + .gem_create_object = amdxdna_gem_create_object_cb, +}; + +static const struct amdxdna_dev_info * +amdxdna_get_dev_info(struct pci_dev *pdev) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(amdxdna_ids); i++) { + if (pdev->device == amdxdna_ids[i].device && + pdev->revision == amdxdna_ids[i].revision) + return amdxdna_ids[i].dev_info; + } + return NULL; +} + +static int amdxdna_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct device *dev = &pdev->dev; + struct amdxdna_dev *xdna; + int ret; + + xdna = devm_drm_dev_alloc(dev, &amdxdna_drm_drv, typeof(*xdna), ddev); + if (IS_ERR(xdna)) + return PTR_ERR(xdna); + + xdna->dev_info = amdxdna_get_dev_info(pdev); + if (!xdna->dev_info) + return -ENODEV; + + drmm_mutex_init(&xdna->ddev, &xdna->dev_lock); + init_rwsem(&xdna->notifier_lock); + INIT_LIST_HEAD(&xdna->client_list); + pci_set_drvdata(pdev, xdna); + + if (IS_ENABLED(CONFIG_LOCKDEP)) { + fs_reclaim_acquire(GFP_KERNEL); + might_lock(&xdna->notifier_lock); + fs_reclaim_release(GFP_KERNEL); + } + + mutex_lock(&xdna->dev_lock); + ret = xdna->dev_info->ops->init(xdna); + mutex_unlock(&xdna->dev_lock); + if (ret) { + XDNA_ERR(xdna, "Hardware init failed, ret %d", ret); + return ret; + } + + ret = amdxdna_sysfs_init(xdna); + if (ret) { + XDNA_ERR(xdna, "Create amdxdna attrs failed: %d", ret); + goto failed_dev_fini; + } + + pm_runtime_set_autosuspend_delay(dev, AMDXDNA_AUTOSUSPEND_DELAY); + pm_runtime_use_autosuspend(dev); + pm_runtime_allow(dev); + + ret = drm_dev_register(&xdna->ddev, 0); + if (ret) { + XDNA_ERR(xdna, "DRM register failed, ret %d", ret); + pm_runtime_forbid(dev); + goto failed_sysfs_fini; + } + + pm_runtime_mark_last_busy(dev); + pm_runtime_put_autosuspend(dev); + return 0; + +failed_sysfs_fini: + amdxdna_sysfs_fini(xdna); +failed_dev_fini: + mutex_lock(&xdna->dev_lock); + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); + return ret; +} + +static void amdxdna_remove(struct pci_dev *pdev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(pdev); + struct device *dev = &pdev->dev; + struct amdxdna_client *client; + + pm_runtime_get_noresume(dev); + pm_runtime_forbid(dev); + + drm_dev_unplug(&xdna->ddev); + amdxdna_sysfs_fini(xdna); + + mutex_lock(&xdna->dev_lock); + client = list_first_entry_or_null(&xdna->client_list, + struct amdxdna_client, node); + while (client) { + list_del_init(&client->node); + mutex_unlock(&xdna->dev_lock); + + amdxdna_hwctx_remove_all(client); + + mutex_lock(&xdna->dev_lock); + client = list_first_entry_or_null(&xdna->client_list, + struct amdxdna_client, node); + } + + xdna->dev_info->ops->fini(xdna); + mutex_unlock(&xdna->dev_lock); +} + +static int amdxdna_dev_suspend_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->suspend) + xdna->dev_info->ops->suspend(xdna); + + return 0; +} + +static int amdxdna_dev_resume_nolock(struct amdxdna_dev *xdna) +{ + if (xdna->dev_info->ops->resume) + return xdna->dev_info->ops->resume(xdna); + + return 0; +} + +static int amdxdna_pmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + + mutex_lock(&xdna->dev_lock); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_suspend(client); + + amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_pmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + struct amdxdna_client *client; + int ret; + + XDNA_INFO(xdna, "firmware resuming..."); + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_resume_nolock(xdna); + if (ret) { + XDNA_ERR(xdna, "resume NPU firmware failed"); + mutex_unlock(&xdna->dev_lock); + return ret; + } + + XDNA_INFO(xdna, "hardware context resuming..."); + list_for_each_entry(client, &xdna->client_list, node) + amdxdna_hwctx_resume(client); + mutex_unlock(&xdna->dev_lock); + + return 0; +} + +static int amdxdna_rpmops_suspend(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_suspend_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime suspend done ret: %d", ret); + return ret; +} + +static int amdxdna_rpmops_resume(struct device *dev) +{ + struct amdxdna_dev *xdna = pci_get_drvdata(to_pci_dev(dev)); + int ret; + + mutex_lock(&xdna->dev_lock); + ret = amdxdna_dev_resume_nolock(xdna); + mutex_unlock(&xdna->dev_lock); + + XDNA_DBG(xdna, "Runtime resume done ret: %d", ret); + return ret; +} + +static const struct dev_pm_ops amdxdna_pm_ops = { + SYSTEM_SLEEP_PM_OPS(amdxdna_pmops_suspend, amdxdna_pmops_resume) + RUNTIME_PM_OPS(amdxdna_rpmops_suspend, amdxdna_rpmops_resume, NULL) +}; + +static struct pci_driver amdxdna_pci_driver = { + .name = KBUILD_MODNAME, + .id_table = pci_ids, + .probe = amdxdna_probe, + .remove = amdxdna_remove, + .driver.pm = &amdxdna_pm_ops, +}; + +module_pci_driver(amdxdna_pci_driver); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("XRT Team "); +MODULE_DESCRIPTION("amdxdna driver"); diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.h b/drivers/accel/amdxdna/amdxdna_pci_drv.h new file mode 100644 index 0000000000000000000000000000000000000000..37848a8d8031b9552b2b894ab4800b8795f35d1f --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.h @@ -0,0 +1,147 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc. + */ + +#ifndef _AMDXDNA_PCI_DRV_H_ +#define _AMDXDNA_PCI_DRV_H_ + +#include + +#define XDNA_INFO(xdna, fmt, args...) drm_info(&(xdna)->ddev, fmt, ##args) +#define XDNA_WARN(xdna, fmt, args...) drm_warn(&(xdna)->ddev, "%s: "fmt, __func__, ##args) +#define XDNA_ERR(xdna, fmt, args...) drm_err(&(xdna)->ddev, "%s: "fmt, __func__, ##args) +#define XDNA_DBG(xdna, fmt, args...) drm_dbg(&(xdna)->ddev, fmt, ##args) +#define XDNA_INFO_ONCE(xdna, fmt, args...) drm_info_once(&(xdna)->ddev, fmt, ##args) + +#define XDNA_MBZ_DBG(xdna, ptr, sz) \ + ({ \ + int __i; \ + int __ret = 0; \ + u8 *__ptr = (u8 *)(ptr); \ + for (__i = 0; __i < (sz); __i++) { \ + if (__ptr[__i]) { \ + XDNA_DBG(xdna, "MBZ check failed"); \ + __ret = -EINVAL; \ + break; \ + } \ + } \ + __ret; \ + }) + +#define to_xdna_dev(drm_dev) \ + ((struct amdxdna_dev *)container_of(drm_dev, struct amdxdna_dev, ddev)) + +extern const struct drm_driver amdxdna_drm_drv; + +struct amdxdna_client; +struct amdxdna_dev; +struct amdxdna_drm_get_info; +struct amdxdna_drm_set_state; +struct amdxdna_gem_obj; +struct amdxdna_hwctx; +struct amdxdna_sched_job; + +/* + * struct amdxdna_dev_ops - Device hardware operation callbacks + */ +struct amdxdna_dev_ops { + int (*init)(struct amdxdna_dev *xdna); + void (*fini)(struct amdxdna_dev *xdna); + int (*resume)(struct amdxdna_dev *xdna); + void (*suspend)(struct amdxdna_dev *xdna); + int (*hwctx_init)(struct amdxdna_hwctx *hwctx); + void (*hwctx_fini)(struct amdxdna_hwctx *hwctx); + int (*hwctx_config)(struct amdxdna_hwctx *hwctx, u32 type, u64 value, void *buf, u32 size); + void (*hmm_invalidate)(struct amdxdna_gem_obj *abo, unsigned long cur_seq); + void (*hwctx_suspend)(struct amdxdna_hwctx *hwctx); + void (*hwctx_resume)(struct amdxdna_hwctx *hwctx); + int (*cmd_submit)(struct amdxdna_hwctx *hwctx, struct amdxdna_sched_job *job, u64 *seq); + int (*get_aie_info)(struct amdxdna_client *client, struct amdxdna_drm_get_info *args); + int (*set_aie_state)(struct amdxdna_client *client, struct amdxdna_drm_set_state *args); +}; + +/* + * struct amdxdna_dev_info - Device hardware information + * Record device static information, like reg, mbox, PSP, SMU bar index + */ +struct amdxdna_dev_info { + int reg_bar; + int mbox_bar; + int sram_bar; + int psp_bar; + int smu_bar; + int device_type; + int first_col; + u32 dev_mem_buf_shift; + u64 dev_mem_base; + size_t dev_mem_size; + char *vbnv; + const struct amdxdna_dev_priv *dev_priv; + const struct amdxdna_dev_ops *ops; +}; + +struct amdxdna_fw_ver { + u32 major; + u32 minor; + u32 sub; + u32 build; +}; + +struct amdxdna_dev { + struct drm_device ddev; + struct amdxdna_dev_hdl *dev_handle; + const struct amdxdna_dev_info *dev_info; + void *xrs_hdl; + + struct mutex dev_lock; /* per device lock */ + struct list_head client_list; + struct amdxdna_fw_ver fw_ver; + struct rw_semaphore notifier_lock; /* for mmu notifier*/ +}; + +/* + * struct amdxdna_device_id - PCI device info + */ +struct amdxdna_device_id { + unsigned short device; + u8 revision; + const struct amdxdna_dev_info *dev_info; +}; + +/* + * struct amdxdna_client - amdxdna client + * A per fd data structure for managing context and other user process stuffs. + */ +struct amdxdna_client { + struct list_head node; + pid_t pid; + struct mutex hwctx_lock; /* protect hwctx */ + /* do NOT wait this srcu when hwctx_lock is held */ + struct srcu_struct hwctx_srcu; + struct xarray hwctx_xa; + u32 next_hwctxid; + struct amdxdna_dev *xdna; + struct drm_file *filp; + + struct mutex mm_lock; /* protect memory related */ + struct amdxdna_gem_obj *dev_heap; + + struct iommu_sva *sva; + int pasid; +}; + +#define amdxdna_for_each_hwctx(client, hwctx_id, entry) \ + xa_for_each(&(client)->hwctx_xa, hwctx_id, entry) + +/* Add device info below */ +extern const struct amdxdna_dev_info dev_npu1_info; +extern const struct amdxdna_dev_info dev_npu2_info; +extern const struct amdxdna_dev_info dev_npu4_info; +extern const struct amdxdna_dev_info dev_npu5_info; +extern const struct amdxdna_dev_info dev_npu6_info; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna); +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna); + +#endif /* _AMDXDNA_PCI_DRV_H_ */ diff --git a/drivers/accel/amdxdna/amdxdna_sysfs.c b/drivers/accel/amdxdna/amdxdna_sysfs.c new file mode 100644 index 0000000000000000000000000000000000000000..f27e4ee960a0cdb9821649545ed1447f4c0499e9 --- /dev/null +++ b/drivers/accel/amdxdna/amdxdna_sysfs.c @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include +#include +#include + +#include "amdxdna_gem.h" +#include "amdxdna_pci_drv.h" + +static ssize_t vbnv_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%s\n", xdna->dev_info->vbnv); +} +static DEVICE_ATTR_RO(vbnv); + +static ssize_t device_type_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%d\n", xdna->dev_info->device_type); +} +static DEVICE_ATTR_RO(device_type); + +static ssize_t fw_version_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct amdxdna_dev *xdna = dev_get_drvdata(dev); + + return sprintf(buf, "%d.%d.%d.%d\n", xdna->fw_ver.major, + xdna->fw_ver.minor, xdna->fw_ver.sub, + xdna->fw_ver.build); +} +static DEVICE_ATTR_RO(fw_version); + +static struct attribute *amdxdna_attrs[] = { + &dev_attr_device_type.attr, + &dev_attr_vbnv.attr, + &dev_attr_fw_version.attr, + NULL, +}; + +static struct attribute_group amdxdna_attr_group = { + .attrs = amdxdna_attrs, +}; + +int amdxdna_sysfs_init(struct amdxdna_dev *xdna) +{ + int ret; + + ret = sysfs_create_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); + if (ret) + XDNA_ERR(xdna, "Create attr group failed"); + + return ret; +} + +void amdxdna_sysfs_fini(struct amdxdna_dev *xdna) +{ + sysfs_remove_group(&xdna->ddev.dev->kobj, &amdxdna_attr_group); +} diff --git a/drivers/accel/amdxdna/npu1_regs.c b/drivers/accel/amdxdna/npu1_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..e408af57e3787186188f084e6850263bdc9a2fc1 --- /dev/null +++ b/drivers/accel/amdxdna/npu1_regs.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* Address definition from NPU1 docs */ +#define MPNPU_PUB_SEC_INTR 0x3010090 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010094 +#define MPNPU_PUB_SCRATCH2 0x30100A0 +#define MPNPU_PUB_SCRATCH3 0x30100A4 +#define MPNPU_PUB_SCRATCH4 0x30100A8 +#define MPNPU_PUB_SCRATCH5 0x30100AC +#define MPNPU_PUB_SCRATCH6 0x30100B0 +#define MPNPU_PUB_SCRATCH7 0x30100B4 +#define MPNPU_PUB_SCRATCH9 0x30100BC + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x30A0000 +#define MPNPU_SRAM_X2I_MAILBOX_1 0x30A2000 +#define MPNPU_SRAM_I2X_MAILBOX_15 0x30BF000 + +#define MPNPU_APERTURE0_BASE 0x3000000 +#define MPNPU_APERTURE1_BASE 0x3080000 +#define MPNPU_APERTURE2_BASE 0x30C0000 + +/* PCIe BAR Index for NPU1 */ +#define NPU1_REG_BAR_INDEX 0 +#define NPU1_MBOX_BAR_INDEX 4 +#define NPU1_PSP_BAR_INDEX 0 +#define NPU1_SMU_BAR_INDEX 0 +#define NPU1_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU1_REG_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_MBOX_BAR_BASE MPNPU_APERTURE2_BASE +#define NPU1_PSP_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SMU_BAR_BASE MPNPU_APERTURE0_BASE +#define NPU1_SRAM_BAR_BASE MPNPU_APERTURE1_BASE + +const struct rt_config npu1_default_rt_cfg[] = { + { 2, 1, AIE2_RT_CFG_INIT }, /* PDI APP LOAD MODE */ + { 1, 1, AIE2_RT_CFG_CLK_GATING }, /* Clock gating on */ + { 0 }, +}; + +const struct dpm_clk_freq npu1_dpm_clk_table[] = { + {400, 800}, + {600, 1024}, + {600, 1024}, + {600, 1024}, + {600, 1024}, + {720, 1309}, + {720, 1309}, + {847, 1600}, + { 0 } +}; + +const struct amdxdna_dev_priv npu1_dev_priv = { + .fw_path = "amdnpu/1502_00/npu.sbin", + .protocol_major = 0x5, + .protocol_minor = 0x7, + .rt_config = npu1_default_rt_cfg, + .dpm_clk_tbl = npu1_dpm_clk_table, + .col_align = COL_ALIGN_NONE, + .mbox_dev_addr = NPU1_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU1_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU1_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU1_SRAM, MPNPU_SRAM_I2X_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU1_PSP, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU1_PSP, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU1_PSP, MPNPU_PUB_SEC_INTR), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU1_PSP, MPNPU_PUB_SCRATCH2), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU1_PSP, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU1_SMU, MPNPU_PUB_SCRATCH5), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU1_SMU, MPNPU_PUB_PWRMGMT_INTR), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU1_SMU, MPNPU_PUB_SCRATCH6), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU1_SMU, MPNPU_PUB_SCRATCH7), + }, + .hw_ops = { + .set_dpm = npu1_set_dpm, + }, +}; + +const struct amdxdna_dev_info dev_npu1_info = { + .reg_bar = NPU1_REG_BAR_INDEX, + .mbox_bar = NPU1_MBOX_BAR_INDEX, + .sram_bar = NPU1_SRAM_BAR_INDEX, + .psp_bar = NPU1_PSP_BAR_INDEX, + .smu_bar = NPU1_SMU_BAR_INDEX, + .first_col = 1, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu1", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu1_dev_priv, + .ops = &aie2_ops, +}; diff --git a/drivers/accel/amdxdna/npu2_regs.c b/drivers/accel/amdxdna/npu2_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..286bd0d475e2ac31758ae4bdad6f3cb8b8d13752 --- /dev/null +++ b/drivers/accel/amdxdna/npu2_regs.c @@ -0,0 +1,113 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU2 */ +#define NPU2_REG_BAR_INDEX 0 +#define NPU2_MBOX_BAR_INDEX 0 +#define NPU2_PSP_BAR_INDEX 4 +#define NPU2_SMU_BAR_INDEX 5 +#define NPU2_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU2_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU2_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU2_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU2_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +const struct amdxdna_dev_priv npu2_dev_priv = { + .fw_path = "amdnpu/17f0_00/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 0x6, + .rt_config = npu4_default_rt_cfg, + .dpm_clk_tbl = npu4_dpm_clk_table, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU2_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU2_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU2_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU2_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU2_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU2_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU2_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU2_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU2_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU2_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU2_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU2_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU2_SMU, MP1_C2PMSG_60), + }, + .hw_ops = { + .set_dpm = npu4_set_dpm, + }, +}; + +const struct amdxdna_dev_info dev_npu2_info = { + .reg_bar = NPU2_REG_BAR_INDEX, + .mbox_bar = NPU2_MBOX_BAR_INDEX, + .sram_bar = NPU2_SRAM_BAR_INDEX, + .psp_bar = NPU2_PSP_BAR_INDEX, + .smu_bar = NPU2_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu2", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu2_dev_priv, + .ops = &aie2_ops, /* NPU2 can share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu4_regs.c b/drivers/accel/amdxdna/npu4_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..00c52833ce894ff1b376abe967dec02e6442b354 --- /dev/null +++ b/drivers/accel/amdxdna/npu4_regs.c @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU4 */ +#define NPU4_REG_BAR_INDEX 0 +#define NPU4_MBOX_BAR_INDEX 0 +#define NPU4_PSP_BAR_INDEX 4 +#define NPU4_SMU_BAR_INDEX 5 +#define NPU4_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU4_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU4_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU4_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU4_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU4_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +const struct rt_config npu4_default_rt_cfg[] = { + { 5, 1, AIE2_RT_CFG_INIT }, /* PDI APP LOAD MODE */ + { 1, 1, AIE2_RT_CFG_CLK_GATING }, /* Clock gating on */ + { 2, 1, AIE2_RT_CFG_CLK_GATING }, /* Clock gating on */ + { 3, 1, AIE2_RT_CFG_CLK_GATING }, /* Clock gating on */ + { 4, 1, AIE2_RT_CFG_CLK_GATING }, /* Clock gating on */ + { 0 }, +}; + +const struct dpm_clk_freq npu4_dpm_clk_table[] = { + {396, 792}, + {600, 1056}, + {792, 1152}, + {975, 1267}, + {975, 1267}, + {1056, 1408}, + {1152, 1584}, + {1267, 1800}, + { 0 } +}; + +const struct amdxdna_dev_priv npu4_dev_priv = { + .fw_path = "amdnpu/17f0_10/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 12, + .rt_config = npu4_default_rt_cfg, + .dpm_clk_tbl = npu4_dpm_clk_table, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU4_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU4_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU4_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU4_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU4_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU4_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU4_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU4_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU4_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU4_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU4_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU4_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU4_SMU, MP1_C2PMSG_60), + }, + .hw_ops = { + .set_dpm = npu4_set_dpm, + }, +}; + +const struct amdxdna_dev_info dev_npu4_info = { + .reg_bar = NPU4_REG_BAR_INDEX, + .mbox_bar = NPU4_MBOX_BAR_INDEX, + .sram_bar = NPU4_SRAM_BAR_INDEX, + .psp_bar = NPU4_PSP_BAR_INDEX, + .smu_bar = NPU4_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu4", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu4_dev_priv, + .ops = &aie2_ops, /* NPU4 can share NPU1's callback */ +}; diff --git a/drivers/accel/amdxdna/npu5_regs.c b/drivers/accel/amdxdna/npu5_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..118849272f2776a69ba44b375d067f7bc5e48c2b --- /dev/null +++ b/drivers/accel/amdxdna/npu5_regs.c @@ -0,0 +1,113 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU5 */ +#define NPU5_REG_BAR_INDEX 0 +#define NPU5_MBOX_BAR_INDEX 0 +#define NPU5_PSP_BAR_INDEX 4 +#define NPU5_SMU_BAR_INDEX 5 +#define NPU5_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU5_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU5_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU5_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU5_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +const struct amdxdna_dev_priv npu5_dev_priv = { + .fw_path = "amdnpu/17f0_11/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 12, + .rt_config = npu4_default_rt_cfg, + .dpm_clk_tbl = npu4_dpm_clk_table, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU5_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU5_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU5_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU5_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU5_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU5_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU5_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU5_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU5_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU5_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU5_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU5_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU5_SMU, MP1_C2PMSG_60), + }, + .hw_ops = { + .set_dpm = npu4_set_dpm, + }, +}; + +const struct amdxdna_dev_info dev_npu5_info = { + .reg_bar = NPU5_REG_BAR_INDEX, + .mbox_bar = NPU5_MBOX_BAR_INDEX, + .sram_bar = NPU5_SRAM_BAR_INDEX, + .psp_bar = NPU5_PSP_BAR_INDEX, + .smu_bar = NPU5_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu5", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu5_dev_priv, + .ops = &aie2_ops, +}; diff --git a/drivers/accel/amdxdna/npu6_regs.c b/drivers/accel/amdxdna/npu6_regs.c new file mode 100644 index 0000000000000000000000000000000000000000..f46c760cefc713acf58b6452a416b004252e3d03 --- /dev/null +++ b/drivers/accel/amdxdna/npu6_regs.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2024, Advanced Micro Devices, Inc. + */ + +#include +#include +#include +#include + +#include "aie2_pci.h" +#include "amdxdna_mailbox.h" +#include "amdxdna_pci_drv.h" + +/* NPU Public Registers on MpNPUAxiXbar (refer to Diag npu_registers.h) */ +#define MPNPU_PUB_SEC_INTR 0x3010060 +#define MPNPU_PUB_PWRMGMT_INTR 0x3010064 +#define MPNPU_PUB_SCRATCH0 0x301006C +#define MPNPU_PUB_SCRATCH1 0x3010070 +#define MPNPU_PUB_SCRATCH2 0x3010074 +#define MPNPU_PUB_SCRATCH3 0x3010078 +#define MPNPU_PUB_SCRATCH4 0x301007C +#define MPNPU_PUB_SCRATCH5 0x3010080 +#define MPNPU_PUB_SCRATCH6 0x3010084 +#define MPNPU_PUB_SCRATCH7 0x3010088 +#define MPNPU_PUB_SCRATCH8 0x301008C +#define MPNPU_PUB_SCRATCH9 0x3010090 +#define MPNPU_PUB_SCRATCH10 0x3010094 +#define MPNPU_PUB_SCRATCH11 0x3010098 +#define MPNPU_PUB_SCRATCH12 0x301009C +#define MPNPU_PUB_SCRATCH13 0x30100A0 +#define MPNPU_PUB_SCRATCH14 0x30100A4 +#define MPNPU_PUB_SCRATCH15 0x30100A8 +#define MP0_C2PMSG_73 0x3810A24 +#define MP0_C2PMSG_123 0x3810AEC + +#define MP1_C2PMSG_0 0x3B10900 +#define MP1_C2PMSG_60 0x3B109F0 +#define MP1_C2PMSG_61 0x3B109F4 + +#define MPNPU_SRAM_X2I_MAILBOX_0 0x3600000 +#define MPNPU_SRAM_X2I_MAILBOX_15 0x361E000 +#define MPNPU_SRAM_X2I_MAILBOX_31 0x363E000 +#define MPNPU_SRAM_I2X_MAILBOX_31 0x363F000 + +#define MMNPU_APERTURE0_BASE 0x3000000 +#define MMNPU_APERTURE1_BASE 0x3600000 +#define MMNPU_APERTURE3_BASE 0x3810000 +#define MMNPU_APERTURE4_BASE 0x3B10000 + +/* PCIe BAR Index for NPU6 */ +#define NPU6_REG_BAR_INDEX 0 +#define NPU6_MBOX_BAR_INDEX 0 +#define NPU6_PSP_BAR_INDEX 4 +#define NPU6_SMU_BAR_INDEX 5 +#define NPU6_SRAM_BAR_INDEX 2 +/* Associated BARs and Apertures */ +#define NPU6_REG_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU6_MBOX_BAR_BASE MMNPU_APERTURE0_BASE +#define NPU6_PSP_BAR_BASE MMNPU_APERTURE3_BASE +#define NPU6_SMU_BAR_BASE MMNPU_APERTURE4_BASE +#define NPU6_SRAM_BAR_BASE MMNPU_APERTURE1_BASE + +const struct amdxdna_dev_priv npu6_dev_priv = { + .fw_path = "amdnpu/17f0_10/npu.sbin", + .protocol_major = 0x6, + .protocol_minor = 12, + .rt_config = npu4_default_rt_cfg, + .dpm_clk_tbl = npu4_dpm_clk_table, + .col_align = COL_ALIGN_NATURE, + .mbox_dev_addr = NPU6_MBOX_BAR_BASE, + .mbox_size = 0, /* Use BAR size */ + .sram_dev_addr = NPU6_SRAM_BAR_BASE, + .sram_offs = { + DEFINE_BAR_OFFSET(MBOX_CHANN_OFF, NPU6_SRAM, MPNPU_SRAM_X2I_MAILBOX_0), + DEFINE_BAR_OFFSET(FW_ALIVE_OFF, NPU6_SRAM, MPNPU_SRAM_X2I_MAILBOX_15), + }, + .psp_regs_off = { + DEFINE_BAR_OFFSET(PSP_CMD_REG, NPU6_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_ARG0_REG, NPU6_REG, MPNPU_PUB_SCRATCH3), + DEFINE_BAR_OFFSET(PSP_ARG1_REG, NPU6_REG, MPNPU_PUB_SCRATCH4), + DEFINE_BAR_OFFSET(PSP_ARG2_REG, NPU6_REG, MPNPU_PUB_SCRATCH9), + DEFINE_BAR_OFFSET(PSP_INTR_REG, NPU6_PSP, MP0_C2PMSG_73), + DEFINE_BAR_OFFSET(PSP_STATUS_REG, NPU6_PSP, MP0_C2PMSG_123), + DEFINE_BAR_OFFSET(PSP_RESP_REG, NPU6_REG, MPNPU_PUB_SCRATCH3), + }, + .smu_regs_off = { + DEFINE_BAR_OFFSET(SMU_CMD_REG, NPU6_SMU, MP1_C2PMSG_0), + DEFINE_BAR_OFFSET(SMU_ARG_REG, NPU6_SMU, MP1_C2PMSG_60), + DEFINE_BAR_OFFSET(SMU_INTR_REG, NPU6_SMU, MMNPU_APERTURE4_BASE), + DEFINE_BAR_OFFSET(SMU_RESP_REG, NPU6_SMU, MP1_C2PMSG_61), + DEFINE_BAR_OFFSET(SMU_OUT_REG, NPU6_SMU, MP1_C2PMSG_60), + }, + .hw_ops = { + .set_dpm = npu4_set_dpm, + }, + +}; + +const struct amdxdna_dev_info dev_npu6_info = { + .reg_bar = NPU6_REG_BAR_INDEX, + .mbox_bar = NPU6_MBOX_BAR_INDEX, + .sram_bar = NPU6_SRAM_BAR_INDEX, + .psp_bar = NPU6_PSP_BAR_INDEX, + .smu_bar = NPU6_SMU_BAR_INDEX, + .first_col = 0, + .dev_mem_buf_shift = 15, /* 32 KiB aligned */ + .dev_mem_base = AIE2_DEVM_BASE, + .dev_mem_size = AIE2_DEVM_SIZE, + .vbnv = "RyzenAI-npu6", + .device_type = AMDXDNA_DEV_TYPE_KMQ, + .dev_priv = &npu6_dev_priv, + .ops = &aie2_ops, +}; diff --git a/drivers/accel/habanalabs/common/habanalabs_drv.c b/drivers/accel/habanalabs/common/habanalabs_drv.c index 708dfd10f39c584a6221c29015f4b0323574145a..5409b2c656c803f6d172dd882711357061f30022 100644 --- a/drivers/accel/habanalabs/common/habanalabs_drv.c +++ b/drivers/accel/habanalabs/common/habanalabs_drv.c @@ -101,7 +101,6 @@ static const struct drm_driver hl_driver = { .major = LINUX_VERSION_MAJOR, .minor = LINUX_VERSION_PATCHLEVEL, .patchlevel = LINUX_VERSION_SUBLEVEL, - .date = "20190505", .fops = &hl_fops, .open = hl_device_open, diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c index ca2bf47ce2484a8e7ecca0f936e5a3438b22ff81..1e8ffbe25eee9bd2369e18a1a54501ac90959d0f 100644 --- a/drivers/accel/ivpu/ivpu_drv.c +++ b/drivers/accel/ivpu/ivpu_drv.c @@ -458,15 +458,7 @@ static const struct drm_driver driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, -#ifdef DRIVER_DATE - .date = DRIVER_DATE, - .major = DRIVER_MAJOR, - .minor = DRIVER_MINOR, - .patchlevel = DRIVER_PATCHLEVEL, -#else - .date = UTS_RELEASE, .major = 1, -#endif }; static void ivpu_context_abort_invalid(struct ivpu_device *vdev) diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c index dbc0711e28d138d5a6d1617e58abb6d8163eca07..6821051dfa3a7de0093a3aa06854dfc52fe3a171 100644 --- a/drivers/accel/ivpu/ivpu_pm.c +++ b/drivers/accel/ivpu/ivpu_pm.c @@ -78,8 +78,8 @@ static int ivpu_resume(struct ivpu_device *vdev) int ret; retry: - pci_restore_state(to_pci_dev(vdev->drm.dev)); pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0); + pci_restore_state(to_pci_dev(vdev->drm.dev)); ret = ivpu_hw_power_up(vdev); if (ret) { diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c index 4ddf89308ff5a44624ab6209744c0290fd824c94..81819b9ef8d4fd148d1e85a1ad57d987a5cc199f 100644 --- a/drivers/accel/qaic/qaic_drv.c +++ b/drivers/accel/qaic/qaic_drv.c @@ -208,7 +208,6 @@ static const struct drm_driver qaic_accel_driver = { .name = QAIC_NAME, .desc = QAIC_DESC, - .date = "20190618", .fops = &qaic_accel_fops, .open = qaic_open, diff --git a/drivers/accel/qaic/sahara.c b/drivers/accel/qaic/sahara.c index 6d772143d6125656025ddae591c11e3ad873f610..21d58aed0deba63782d370f76c2b3649f0972cbb 100644 --- a/drivers/accel/qaic/sahara.c +++ b/drivers/accel/qaic/sahara.c @@ -772,8 +772,7 @@ static void sahara_mhi_remove(struct mhi_device *mhi_dev) cancel_work_sync(&context->fw_work); cancel_work_sync(&context->dump_work); - if (context->mem_dump) - vfree(context->mem_dump); + vfree(context->mem_dump); sahara_release_image(context); mhi_unprepare_from_transfer(mhi_dev); } diff --git a/drivers/acpi/acpica/evxfregn.c b/drivers/acpi/acpica/evxfregn.c index 95f78383bbdba16bab4d0a2022c87780cf6615bb..bff2d099f4691ef7dd30f191029a7a477ac332b8 100644 --- a/drivers/acpi/acpica/evxfregn.c +++ b/drivers/acpi/acpica/evxfregn.c @@ -232,8 +232,6 @@ acpi_remove_address_space_handler(acpi_handle device, /* Now we can delete the handler object */ - acpi_os_release_mutex(handler_obj->address_space. - context_mutex); acpi_ut_remove_reference(handler_obj); goto unlock_and_exit; } diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 5429ec9ef06f06535f32c82d38e968414bbc8e5b..a5d47819b3a4e295f7fe16e7f465c6d3c5fa792d 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -454,8 +454,13 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, struct nvdimm *nvdimm, if (cmd_rc) *cmd_rc = -EINVAL; - if (cmd == ND_CMD_CALL) + if (cmd == ND_CMD_CALL) { + if (!buf || buf_len < sizeof(*call_pkg)) + return -EINVAL; + call_pkg = buf; + } + func = cmd_to_func(nfit_mem, cmd, call_pkg, &family); if (func < 0) return func; diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 7fe842dae1ec05ce6726af2ae4fcc8eff3698dcb..821867de43bea3d2b56f5011490a99a555949572 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -250,6 +250,9 @@ static bool acpi_decode_space(struct resource_win *win, switch (addr->resource_type) { case ACPI_MEMORY_RANGE: acpi_dev_memresource_flags(res, len, wp); + + if (addr->info.mem.caching == ACPI_PREFETCHABLE_MEMORY) + res->flags |= IORESOURCE_PREFETCH; break; case ACPI_IO_RANGE: acpi_dev_ioresource_flags(res, len, iodec, @@ -265,9 +268,6 @@ static bool acpi_decode_space(struct resource_win *win, if (addr->producer_consumer == ACPI_PRODUCER) res->flags |= IORESOURCE_WINDOW; - if (addr->info.mem.caching == ACPI_PREFETCHABLE_MEMORY) - res->flags |= IORESOURCE_PREFETCH; - return !(res->flags & IORESOURCE_DISABLED); } diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c index b1b40e9551deda1b02f902ff065e64843559eea0..c8c817c51230a74df721016eb80238c188d96d01 100644 --- a/drivers/ata/sata_highbank.c +++ b/drivers/ata/sata_highbank.c @@ -348,6 +348,7 @@ static int highbank_initialize_phys(struct device *dev, void __iomem *addr) phy_nodes[phy] = phy_data.np; cphy_base[phy] = of_iomap(phy_nodes[phy], 0); if (cphy_base[phy] == NULL) { + of_node_put(phy_data.np); return 0; } phy_count += 1; diff --git a/drivers/bluetooth/btmtk.c b/drivers/bluetooth/btmtk.c index 8a3f7c3fcfec87d2eb991f0931bdc8af77f5983e..7fd9d5ddce02c2fea8843a9d4485e7ef0a893e0a 100644 --- a/drivers/bluetooth/btmtk.c +++ b/drivers/bluetooth/btmtk.c @@ -395,6 +395,7 @@ int btmtk_process_coredump(struct hci_dev *hdev, struct sk_buff *skb) { struct btmtk_data *data = hci_get_priv(hdev); int err; + bool complete = false; if (!IS_ENABLED(CONFIG_DEV_COREDUMP)) { kfree_skb(skb); @@ -416,19 +417,22 @@ int btmtk_process_coredump(struct hci_dev *hdev, struct sk_buff *skb) fallthrough; case HCI_DEVCOREDUMP_ACTIVE: default: + /* Mediatek coredump data would be more than MTK_COREDUMP_NUM */ + if (data->cd_info.cnt >= MTK_COREDUMP_NUM && + skb->len > MTK_COREDUMP_END_LEN) + if (!memcmp((char *)&skb->data[skb->len - MTK_COREDUMP_END_LEN], + MTK_COREDUMP_END, MTK_COREDUMP_END_LEN - 1)) + complete = true; + err = hci_devcd_append(hdev, skb); if (err < 0) break; data->cd_info.cnt++; - /* Mediatek coredump data would be more than MTK_COREDUMP_NUM */ - if (data->cd_info.cnt > MTK_COREDUMP_NUM && - skb->len > MTK_COREDUMP_END_LEN) - if (!memcmp((char *)&skb->data[skb->len - MTK_COREDUMP_END_LEN], - MTK_COREDUMP_END, MTK_COREDUMP_END_LEN - 1)) { - bt_dev_info(hdev, "Mediatek coredump end"); - hci_devcd_complete(hdev); - } + if (complete) { + bt_dev_info(hdev, "Mediatek coredump end"); + hci_devcd_complete(hdev); + } break; } diff --git a/drivers/clk/clk-en7523.c b/drivers/clk/clk-en7523.c index e52c5460e927f54c6df152c60560f438f89ec928..495c0d607c7d5be2163e3258aaa2fd8745095a90 100644 --- a/drivers/clk/clk-en7523.c +++ b/drivers/clk/clk-en7523.c @@ -87,6 +87,7 @@ static const u32 slic_base[] = { 100000000, 3125000 }; static const u32 npu_base[] = { 333000000, 400000000, 500000000 }; /* EN7581 */ static const u32 emi7581_base[] = { 540000000, 480000000, 400000000, 300000000 }; +static const u32 bus7581_base[] = { 600000000, 540000000 }; static const u32 npu7581_base[] = { 800000000, 750000000, 720000000, 600000000 }; static const u32 crypto_base[] = { 540000000, 480000000 }; @@ -222,8 +223,8 @@ static const struct en_clk_desc en7581_base_clks[] = { .base_reg = REG_BUS_CLK_DIV_SEL, .base_bits = 1, .base_shift = 8, - .base_values = bus_base, - .n_base_values = ARRAY_SIZE(bus_base), + .base_values = bus7581_base, + .n_base_values = ARRAY_SIZE(bus7581_base), .div_bits = 3, .div_shift = 0, @@ -503,6 +504,8 @@ static void en7523_register_clocks(struct device *dev, struct clk_hw_onecell_dat u32 rate; int i; + clk_data->num = EN7523_NUM_CLOCKS; + for (i = 0; i < ARRAY_SIZE(en7523_base_clks); i++) { const struct en_clk_desc *desc = &en7523_base_clks[i]; u32 reg = desc->div_reg ? desc->div_reg : desc->base_reg; @@ -524,8 +527,6 @@ static void en7523_register_clocks(struct device *dev, struct clk_hw_onecell_dat hw = en7523_register_pcie_clk(dev, np_base); clk_data->hws[EN7523_CLK_PCIE] = hw; - - clk_data->num = EN7523_NUM_CLOCKS; } static int en7523_clk_hw_init(struct platform_device *pdev, diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index bdc6e5b90da581b3b9cbedcde3366c880a7dfd92..9b45fa005030f56e1478b9742715ebcde898133f 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -2530,7 +2530,7 @@ static int clk_core_set_rate_nolock(struct clk_core *core, rate = clk_core_req_round_rate_nolock(core, req_rate); /* bail early if nothing to do */ - if (rate == clk_core_get_rate_recalc(core)) + if (rate == clk_core_get_rate_nolock(core)) return 0; /* fail on a direct rate set of a protected provider */ diff --git a/drivers/clk/meson/Kconfig b/drivers/clk/meson/Kconfig index febb5d7348ff07c2da0cb5fd41d2ad2607e5bd5d..be2e3a5f83363b07cdcec2601acf15780ff24892 100644 --- a/drivers/clk/meson/Kconfig +++ b/drivers/clk/meson/Kconfig @@ -106,7 +106,7 @@ config COMMON_CLK_AXG_AUDIO select COMMON_CLK_MESON_SCLK_DIV select COMMON_CLK_MESON_CLKC_UTILS select REGMAP_MMIO - depends on RESET_MESON_AUX + select RESET_CONTROLLER help Support for the audio clock controller on AmLogic A113D devices, aka axg, Say Y if you want audio subsystem to work. diff --git a/drivers/clk/meson/axg-audio.c b/drivers/clk/meson/axg-audio.c index 7a3460804424456bf275e9dd4a1d9014c81a7bec..9df627b142f89788966ede0262aaaf39e13f0b49 100644 --- a/drivers/clk/meson/axg-audio.c +++ b/drivers/clk/meson/axg-audio.c @@ -15,8 +15,6 @@ #include #include -#include - #include "meson-clkc-utils.h" #include "axg-audio.h" #include "clk-regmap.h" @@ -1680,6 +1678,84 @@ static struct clk_regmap *const sm1_clk_regmaps[] = { &sm1_earcrx_dmac_clk, }; +struct axg_audio_reset_data { + struct reset_controller_dev rstc; + struct regmap *map; + unsigned int offset; +}; + +static void axg_audio_reset_reg_and_bit(struct axg_audio_reset_data *rst, + unsigned long id, + unsigned int *reg, + unsigned int *bit) +{ + unsigned int stride = regmap_get_reg_stride(rst->map); + + *reg = (id / (stride * BITS_PER_BYTE)) * stride; + *reg += rst->offset; + *bit = id % (stride * BITS_PER_BYTE); +} + +static int axg_audio_reset_update(struct reset_controller_dev *rcdev, + unsigned long id, bool assert) +{ + struct axg_audio_reset_data *rst = + container_of(rcdev, struct axg_audio_reset_data, rstc); + unsigned int offset, bit; + + axg_audio_reset_reg_and_bit(rst, id, &offset, &bit); + + regmap_update_bits(rst->map, offset, BIT(bit), + assert ? BIT(bit) : 0); + + return 0; +} + +static int axg_audio_reset_status(struct reset_controller_dev *rcdev, + unsigned long id) +{ + struct axg_audio_reset_data *rst = + container_of(rcdev, struct axg_audio_reset_data, rstc); + unsigned int val, offset, bit; + + axg_audio_reset_reg_and_bit(rst, id, &offset, &bit); + + regmap_read(rst->map, offset, &val); + + return !!(val & BIT(bit)); +} + +static int axg_audio_reset_assert(struct reset_controller_dev *rcdev, + unsigned long id) +{ + return axg_audio_reset_update(rcdev, id, true); +} + +static int axg_audio_reset_deassert(struct reset_controller_dev *rcdev, + unsigned long id) +{ + return axg_audio_reset_update(rcdev, id, false); +} + +static int axg_audio_reset_toggle(struct reset_controller_dev *rcdev, + unsigned long id) +{ + int ret; + + ret = axg_audio_reset_assert(rcdev, id); + if (ret) + return ret; + + return axg_audio_reset_deassert(rcdev, id); +} + +static const struct reset_control_ops axg_audio_rstc_ops = { + .assert = axg_audio_reset_assert, + .deassert = axg_audio_reset_deassert, + .reset = axg_audio_reset_toggle, + .status = axg_audio_reset_status, +}; + static struct regmap_config axg_audio_regmap_cfg = { .reg_bits = 32, .val_bits = 32, @@ -1690,14 +1766,16 @@ struct audioclk_data { struct clk_regmap *const *regmap_clks; unsigned int regmap_clk_num; struct meson_clk_hw_data hw_clks; + unsigned int reset_offset; + unsigned int reset_num; unsigned int max_register; - const char *rst_drvname; }; static int axg_audio_clkc_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; const struct audioclk_data *data; + struct axg_audio_reset_data *rst; struct regmap *map; void __iomem *regs; struct clk_hw *hw; @@ -1756,11 +1834,22 @@ static int axg_audio_clkc_probe(struct platform_device *pdev) if (ret) return ret; - /* Register auxiliary reset driver when applicable */ - if (data->rst_drvname) - ret = devm_meson_rst_aux_register(dev, map, data->rst_drvname); + /* Stop here if there is no reset */ + if (!data->reset_num) + return 0; + + rst = devm_kzalloc(dev, sizeof(*rst), GFP_KERNEL); + if (!rst) + return -ENOMEM; + + rst->map = map; + rst->offset = data->reset_offset; + rst->rstc.nr_resets = data->reset_num; + rst->rstc.ops = &axg_audio_rstc_ops; + rst->rstc.of_node = dev->of_node; + rst->rstc.owner = THIS_MODULE; - return ret; + return devm_reset_controller_register(dev, &rst->rstc); } static const struct audioclk_data axg_audioclk_data = { @@ -1780,8 +1869,9 @@ static const struct audioclk_data g12a_audioclk_data = { .hws = g12a_audio_hw_clks, .num = ARRAY_SIZE(g12a_audio_hw_clks), }, + .reset_offset = AUDIO_SW_RESET, + .reset_num = 26, .max_register = AUDIO_CLK_SPDIFOUT_B_CTRL, - .rst_drvname = "rst-g12a", }; static const struct audioclk_data sm1_audioclk_data = { @@ -1791,8 +1881,9 @@ static const struct audioclk_data sm1_audioclk_data = { .hws = sm1_audio_hw_clks, .num = ARRAY_SIZE(sm1_audio_hw_clks), }, + .reset_offset = AUDIO_SM1_SW_RESET0, + .reset_num = 39, .max_register = AUDIO_EARCRX_DMAC_CLK_CTRL, - .rst_drvname = "rst-sm1", }; static const struct of_device_id clkc_match_table[] = { diff --git a/drivers/crypto/hisilicon/debugfs.c b/drivers/crypto/hisilicon/debugfs.c index 1b9b7bccdeff0864643b1779b5a2071e26f57562..45e130b901eb5e7ae4605459ce54e624f0fece49 100644 --- a/drivers/crypto/hisilicon/debugfs.c +++ b/drivers/crypto/hisilicon/debugfs.c @@ -192,7 +192,7 @@ static int qm_sqc_dump(struct hisi_qm *qm, char *s, char *name) down_read(&qm->qps_lock); if (qm->sqc) { - memcpy(&sqc, qm->sqc + qp_id * sizeof(struct qm_sqc), sizeof(struct qm_sqc)); + memcpy(&sqc, qm->sqc + qp_id, sizeof(struct qm_sqc)); sqc.base_h = cpu_to_le32(QM_XQC_ADDR_MASK); sqc.base_l = cpu_to_le32(QM_XQC_ADDR_MASK); dump_show(qm, &sqc, sizeof(struct qm_sqc), "SOFT SQC"); @@ -229,7 +229,7 @@ static int qm_cqc_dump(struct hisi_qm *qm, char *s, char *name) down_read(&qm->qps_lock); if (qm->cqc) { - memcpy(&cqc, qm->cqc + qp_id * sizeof(struct qm_cqc), sizeof(struct qm_cqc)); + memcpy(&cqc, qm->cqc + qp_id, sizeof(struct qm_cqc)); cqc.base_h = cpu_to_le32(QM_XQC_ADDR_MASK); cqc.base_l = cpu_to_le32(QM_XQC_ADDR_MASK); dump_show(qm, &cqc, sizeof(struct qm_cqc), "SOFT CQC"); diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index ddfbdb66b794d78d6bbff978bfa520a7b55c542f..5d356b7c45897c4a8f6905252c8d932407e380cb 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -3362,36 +3362,24 @@ static bool dct_ecc_enabled(struct amd64_pvt *pvt) static bool umc_ecc_enabled(struct amd64_pvt *pvt) { - u8 umc_en_mask = 0, ecc_en_mask = 0; - u16 nid = pvt->mc_node_id; struct amd64_umc *umc; - u8 ecc_en = 0, i; + bool ecc_en = false; + int i; + /* Check whether at least one UMC is enabled: */ for_each_umc(i) { umc = &pvt->umc[i]; - /* Only check enabled UMCs. */ - if (!(umc->sdp_ctrl & UMC_SDP_INIT)) - continue; - - umc_en_mask |= BIT(i); - - if (umc->umc_cap_hi & UMC_ECC_ENABLED) - ecc_en_mask |= BIT(i); + if (umc->sdp_ctrl & UMC_SDP_INIT && + umc->umc_cap_hi & UMC_ECC_ENABLED) { + ecc_en = true; + break; + } } - /* Check whether at least one UMC is enabled: */ - if (umc_en_mask) - ecc_en = umc_en_mask == ecc_en_mask; - else - edac_dbg(0, "Node %d: No enabled UMCs.\n", nid); - - edac_dbg(3, "Node %d: DRAM ECC %s.\n", nid, (ecc_en ? "enabled" : "disabled")); + edac_dbg(3, "Node %d: DRAM ECC %s.\n", pvt->mc_node_id, (ecc_en ? "enabled" : "disabled")); - if (!ecc_en) - return false; - else - return true; + return ecc_en; } static inline void diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index e312d731f4a36e98b105cfae3452f43ae6c674b1..5fe61b9ab5f969ca4fd9c15b0528f03824f9d62f 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -76,10 +76,6 @@ config EFI_ZBOOT bool "Enable the generic EFI decompressor" depends on EFI_GENERIC_STUB && !ARM select HAVE_KERNEL_GZIP - select HAVE_KERNEL_LZ4 - select HAVE_KERNEL_LZMA - select HAVE_KERNEL_LZO - select HAVE_KERNEL_XZ select HAVE_KERNEL_ZSTD help Create the bootable image as an EFI application that carries the diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index 7a81c0ce4780593d01bcabbbbb8920d0e5d8b690..4bb7b0584bc904e49a363638da8a45554070c015 100644 --- a/drivers/firmware/efi/esrt.c +++ b/drivers/firmware/efi/esrt.c @@ -75,8 +75,6 @@ static LIST_HEAD(entry_list); struct esre_attribute { struct attribute attr; ssize_t (*show)(struct esre_entry *entry, char *buf); - ssize_t (*store)(struct esre_entry *entry, - const char *buf, size_t count); }; static struct esre_entry *to_entry(struct kobject *kobj) diff --git a/drivers/firmware/efi/libstub/Makefile.zboot b/drivers/firmware/efi/libstub/Makefile.zboot index 65ffd0b760b2fb46c058da7c89b68300c001e912..48842b5c106b83a26cfc7fd7e2dfe95f85a289e4 100644 --- a/drivers/firmware/efi/libstub/Makefile.zboot +++ b/drivers/firmware/efi/libstub/Makefile.zboot @@ -12,22 +12,16 @@ quiet_cmd_copy_and_pad = PAD $@ $(obj)/vmlinux.bin: $(obj)/$(EFI_ZBOOT_PAYLOAD) FORCE $(call if_changed,copy_and_pad) -comp-type-$(CONFIG_KERNEL_GZIP) := gzip -comp-type-$(CONFIG_KERNEL_LZ4) := lz4 -comp-type-$(CONFIG_KERNEL_LZMA) := lzma -comp-type-$(CONFIG_KERNEL_LZO) := lzo -comp-type-$(CONFIG_KERNEL_XZ) := xzkern -comp-type-$(CONFIG_KERNEL_ZSTD) := zstd22 - # in GZIP, the appended le32 carrying the uncompressed size is part of the # format, but in other cases, we just append it at the end for convenience, # causing the original tools to complain when checking image integrity. -# So disregard it when calculating the payload size in the zimage header. -zboot-method-y := $(comp-type-y)_with_size -zboot-size-len-y := 4 +comp-type-y := gzip +zboot-method-y := gzip +zboot-size-len-y := 0 -zboot-method-$(CONFIG_KERNEL_GZIP) := gzip -zboot-size-len-$(CONFIG_KERNEL_GZIP) := 0 +comp-type-$(CONFIG_KERNEL_ZSTD) := zstd +zboot-method-$(CONFIG_KERNEL_ZSTD) := zstd22_with_size +zboot-size-len-$(CONFIG_KERNEL_ZSTD) := 4 $(obj)/vmlinuz: $(obj)/vmlinux.bin FORCE $(call if_changed,$(zboot-method-y)) diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index 56fee58e281e7cac7f287eb04e4c17a17f75ed04..93ee3aa092f81cd9573fcf91aef82ede1c5d7130 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -482,8 +482,9 @@ config GPIO_MT7621 Say yes here to support the Mediatek MT7621 SoC GPIO device. config GPIO_MVEBU - def_bool y + bool "Marvell Orion and EBU GPIO support" if COMPILE_TEST depends on PLAT_ORION || ARCH_MVEBU || COMPILE_TEST + default PLAT_ORION || ARCH_MVEBU select GENERIC_IRQ_CHIP select REGMAP_MMIO diff --git a/drivers/gpio/gpio-graniterapids.c b/drivers/gpio/gpio-graniterapids.c index f2e911a3d2ca022f7455cb9f599177738ac764d4..ad6a045fd3d2d2b02dfd6999627cf17952e74d9f 100644 --- a/drivers/gpio/gpio-graniterapids.c +++ b/drivers/gpio/gpio-graniterapids.c @@ -32,12 +32,14 @@ #define GNR_PINS_PER_REG 32 #define GNR_NUM_REGS DIV_ROUND_UP(GNR_NUM_PINS, GNR_PINS_PER_REG) -#define GNR_CFG_BAR 0x00 +#define GNR_CFG_PADBAR 0x00 #define GNR_CFG_LOCK_OFFSET 0x04 -#define GNR_GPI_STATUS_OFFSET 0x20 +#define GNR_GPI_STATUS_OFFSET 0x14 #define GNR_GPI_ENABLE_OFFSET 0x24 -#define GNR_CFG_DW_RX_MASK GENMASK(25, 22) +#define GNR_CFG_DW_HOSTSW_MODE BIT(27) +#define GNR_CFG_DW_RX_MASK GENMASK(23, 22) +#define GNR_CFG_DW_INTSEL_MASK GENMASK(21, 14) #define GNR_CFG_DW_RX_DISABLE FIELD_PREP(GNR_CFG_DW_RX_MASK, 2) #define GNR_CFG_DW_RX_EDGE FIELD_PREP(GNR_CFG_DW_RX_MASK, 1) #define GNR_CFG_DW_RX_LEVEL FIELD_PREP(GNR_CFG_DW_RX_MASK, 0) @@ -50,6 +52,7 @@ * struct gnr_gpio - Intel Granite Rapids-D vGPIO driver state * @gc: GPIO controller interface * @reg_base: base address of the GPIO registers + * @pad_base: base address of the vGPIO pad configuration registers * @ro_bitmap: bitmap of read-only pins * @lock: guard the registers * @pad_backup: backup of the register state for suspend @@ -57,6 +60,7 @@ struct gnr_gpio { struct gpio_chip gc; void __iomem *reg_base; + void __iomem *pad_base; DECLARE_BITMAP(ro_bitmap, GNR_NUM_PINS); raw_spinlock_t lock; u32 pad_backup[]; @@ -65,7 +69,7 @@ struct gnr_gpio { static void __iomem *gnr_gpio_get_padcfg_addr(const struct gnr_gpio *priv, unsigned int gpio) { - return priv->reg_base + gpio * sizeof(u32); + return priv->pad_base + gpio * sizeof(u32); } static int gnr_gpio_configure_line(struct gpio_chip *gc, unsigned int gpio, @@ -88,6 +92,20 @@ static int gnr_gpio_configure_line(struct gpio_chip *gc, unsigned int gpio, return 0; } +static int gnr_gpio_request(struct gpio_chip *gc, unsigned int gpio) +{ + struct gnr_gpio *priv = gpiochip_get_data(gc); + u32 dw; + + dw = readl(gnr_gpio_get_padcfg_addr(priv, gpio)); + if (!(dw & GNR_CFG_DW_HOSTSW_MODE)) { + dev_warn(gc->parent, "GPIO %u is not owned by host", gpio); + return -EBUSY; + } + + return 0; +} + static int gnr_gpio_get(struct gpio_chip *gc, unsigned int gpio) { const struct gnr_gpio *priv = gpiochip_get_data(gc); @@ -139,6 +157,7 @@ static int gnr_gpio_direction_output(struct gpio_chip *gc, unsigned int gpio, in static const struct gpio_chip gnr_gpio_chip = { .owner = THIS_MODULE, + .request = gnr_gpio_request, .get = gnr_gpio_get, .set = gnr_gpio_set, .get_direction = gnr_gpio_get_direction, @@ -166,7 +185,7 @@ static void gnr_gpio_irq_ack(struct irq_data *d) guard(raw_spinlock_irqsave)(&priv->lock); reg = readl(addr); - reg &= ~BIT(bit_idx); + reg |= BIT(bit_idx); writel(reg, addr); } @@ -209,10 +228,18 @@ static void gnr_gpio_irq_unmask(struct irq_data *d) static int gnr_gpio_irq_set_type(struct irq_data *d, unsigned int type) { struct gpio_chip *gc = irq_data_get_irq_chip_data(d); - irq_hw_number_t pin = irqd_to_hwirq(d); - u32 mask = GNR_CFG_DW_RX_MASK; + struct gnr_gpio *priv = gpiochip_get_data(gc); + irq_hw_number_t hwirq = irqd_to_hwirq(d); + u32 reg; u32 set; + /* Allow interrupts only if Interrupt Select field is non-zero */ + reg = readl(gnr_gpio_get_padcfg_addr(priv, hwirq)); + if (!(reg & GNR_CFG_DW_INTSEL_MASK)) { + dev_dbg(gc->parent, "GPIO %lu cannot be used as IRQ", hwirq); + return -EPERM; + } + /* Falling edge and level low triggers not supported by the GPIO controller */ switch (type) { case IRQ_TYPE_NONE: @@ -230,10 +257,11 @@ static int gnr_gpio_irq_set_type(struct irq_data *d, unsigned int type) return -EINVAL; } - return gnr_gpio_configure_line(gc, pin, mask, set); + return gnr_gpio_configure_line(gc, hwirq, GNR_CFG_DW_RX_MASK, set); } static const struct irq_chip gnr_gpio_irq_chip = { + .name = "gpio-graniterapids", .irq_ack = gnr_gpio_irq_ack, .irq_mask = gnr_gpio_irq_mask, .irq_unmask = gnr_gpio_irq_unmask, @@ -291,6 +319,7 @@ static int gnr_gpio_probe(struct platform_device *pdev) struct gnr_gpio *priv; void __iomem *regs; int irq, ret; + u32 offset; priv = devm_kzalloc(dev, struct_size(priv, pad_backup, num_backup_pins), GFP_KERNEL); if (!priv) @@ -302,6 +331,10 @@ static int gnr_gpio_probe(struct platform_device *pdev) if (IS_ERR(regs)) return PTR_ERR(regs); + priv->reg_base = regs; + offset = readl(priv->reg_base + GNR_CFG_PADBAR); + priv->pad_base = priv->reg_base + offset; + irq = platform_get_irq(pdev, 0); if (irq < 0) return irq; @@ -311,8 +344,6 @@ static int gnr_gpio_probe(struct platform_device *pdev) if (ret) return dev_err_probe(dev, ret, "failed to request interrupt\n"); - priv->reg_base = regs + readl(regs + GNR_CFG_BAR); - gnr_gpio_init_pin_ro_bits(dev, priv->reg_base + GNR_CFG_LOCK_OFFSET, priv->ro_bitmap); @@ -324,7 +355,6 @@ static int gnr_gpio_probe(struct platform_device *pdev) girq = &priv->gc.irq; gpio_irq_chip_set_chip(girq, &gnr_gpio_irq_chip); - girq->chip->name = dev_name(dev); girq->parent_handler = NULL; girq->num_parents = 0; girq->parents = NULL; diff --git a/drivers/gpio/gpio-idio-16.c b/drivers/gpio/gpio-idio-16.c index 2c9512589297217176c0450e6eab8adad4d3fe87..0103be977c66bb8d165c1c92123368be6832d120 100644 --- a/drivers/gpio/gpio-idio-16.c +++ b/drivers/gpio/gpio-idio-16.c @@ -3,6 +3,9 @@ * GPIO library for the ACCES IDIO-16 family * Copyright (C) 2022 William Breathitt Gray */ + +#define DEFAULT_SYMBOL_NAMESPACE "GPIO_IDIO_16" + #include #include #include @@ -14,8 +17,6 @@ #include "gpio-idio-16.h" -#define DEFAULT_SYMBOL_NAMESPACE "GPIO_IDIO_16" - #define IDIO_16_DAT_BASE 0x0 #define IDIO_16_OUT_BASE IDIO_16_DAT_BASE #define IDIO_16_IN_BASE (IDIO_16_DAT_BASE + 1) diff --git a/drivers/gpio/gpio-ljca.c b/drivers/gpio/gpio-ljca.c index 503d2441c58f27381bff532cf01fd45f654707f0..817ecb12d5509a124364620fc96accf1d4dc0e21 100644 --- a/drivers/gpio/gpio-ljca.c +++ b/drivers/gpio/gpio-ljca.c @@ -82,9 +82,9 @@ static int ljca_gpio_config(struct ljca_gpio_dev *ljca_gpio, u8 gpio_id, int ret; mutex_lock(&ljca_gpio->trans_lock); + packet->num = 1; packet->item[0].index = gpio_id; packet->item[0].value = config | ljca_gpio->connect_mode[gpio_id]; - packet->num = 1; ret = ljca_transfer(ljca_gpio->ljca, LJCA_GPIO_CONFIG, (u8 *)packet, struct_size(packet, item, packet->num), NULL, 0); diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 5504721007cc190e7d768d42aa9633baa0115f5e..6f1cf235073d9289bdb81d2cf4ee22f8928c261b 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -102,10 +102,15 @@ config DRM_KMS_HELPER help CRTC helpers for KMS drivers. +config DRM_DRAW + bool + depends on DRM + config DRM_PANIC bool "Display a user-friendly message when a kernel panic occurs" depends on DRM select FONT_SUPPORT + select DRM_DRAW help Enable a drm panic handler, which will display a user-friendly message when a kernel panic occurs. It's useful when using a user-space @@ -217,77 +222,7 @@ config DRM_CLIENT option. Drivers that support the default clients should select DRM_CLIENT_SELECTION instead. -config DRM_CLIENT_LIB - tristate - depends on DRM - select DRM_KMS_HELPER if DRM_FBDEV_EMULATION - select FB_CORE if DRM_FBDEV_EMULATION - help - This option enables the DRM client library and selects all - modules and components according to the enabled clients. - -config DRM_CLIENT_SELECTION - tristate - depends on DRM - select DRM_CLIENT_LIB if DRM_FBDEV_EMULATION - help - Drivers that support in-kernel DRM clients have to select this - option. - -config DRM_CLIENT_SETUP - bool - depends on DRM_CLIENT_SELECTION - help - Enables the DRM client selection. DRM drivers that support the - default clients should select DRM_CLIENT_SELECTION instead. - -menu "Supported DRM clients" - depends on DRM_CLIENT_SELECTION - -config DRM_FBDEV_EMULATION - bool "Enable legacy fbdev support for your modesetting driver" - depends on DRM_CLIENT_SELECTION - select DRM_CLIENT - select DRM_CLIENT_SETUP - select FRAMEBUFFER_CONSOLE_DETECT_PRIMARY if FRAMEBUFFER_CONSOLE - default FB - help - Choose this option if you have a need for the legacy fbdev - support. Note that this support also provides the linux console - support on top of your modesetting driver. - - If in doubt, say "Y". - -config DRM_FBDEV_OVERALLOC - int "Overallocation of the fbdev buffer" - depends on DRM_FBDEV_EMULATION - default 100 - help - Defines the fbdev buffer overallocation in percent. Default - is 100. Typical values for double buffering will be 200, - triple buffering 300. - -config DRM_FBDEV_LEAK_PHYS_SMEM - bool "Shamelessly allow leaking of fbdev physical address (DANGEROUS)" - depends on DRM_FBDEV_EMULATION && EXPERT - default n - help - In order to keep user-space compatibility, we want in certain - use-cases to keep leaking the fbdev physical address to the - user-space program handling the fbdev buffer. - This affects, not only, Amlogic, Allwinner or Rockchip devices - with ARM Mali GPUs using an userspace Blob. - This option is not supported by upstream developers and should be - removed as soon as possible and be considered as a broken and - legacy behaviour from a modern fbdev device driver. - - Please send any bug reports when using this to your proprietary - software vendor that requires this. - - If in doubt, say "N" or spread the word to your closed source - library vendor. - -endmenu +source "drivers/gpu/drm/clients/Kconfig" config DRM_LOAD_EDID_FIRMWARE bool "Allow to specify an EDID data set instead of probing for it" @@ -526,6 +461,10 @@ config DRM_HYPERV config DRM_EXPORT_FOR_TESTS bool +# Separate option as not all DRM drivers use it +config DRM_PANEL_BACKLIGHT_QUIRKS + tristate + config DRM_LIB_RANDOM bool default n diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 463afad1b5ca6275e61223adc8ca036c3d4d6b03..19fb370fbc56772077973c864df71e4b8e0bf99b 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -91,10 +91,12 @@ drm-$(CONFIG_DRM_PRIVACY_SCREEN) += \ drm_privacy_screen_x86.o drm-$(CONFIG_DRM_ACCEL) += ../../accel/drm_accel.o drm-$(CONFIG_DRM_PANIC) += drm_panic.o +drm-$(CONFIG_DRM_DRAW) += drm_draw.o drm-$(CONFIG_DRM_PANIC_SCREEN_QR_CODE) += drm_panic_qr.o obj-$(CONFIG_DRM) += drm.o obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o +obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o # # Memory-management helpers @@ -148,14 +150,6 @@ drm_kms_helper-$(CONFIG_DRM_PANEL_BRIDGE) += bridge/panel.o drm_kms_helper-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fb_helper.o obj-$(CONFIG_DRM_KMS_HELPER) += drm_kms_helper.o -# -# DRM clients -# - -drm_client_lib-y := drm_client_setup.o -drm_client_lib-$(CONFIG_DRM_FBDEV_EMULATION) += drm_fbdev_client.o -obj-$(CONFIG_DRM_CLIENT_LIB) += drm_client_lib.o - # # Drivers and the rest # @@ -165,6 +159,7 @@ obj-y += tests/ obj-$(CONFIG_DRM_MIPI_DBI) += drm_mipi_dbi.o obj-$(CONFIG_DRM_MIPI_DSI) += drm_mipi_dsi.o obj-y += arm/ +obj-y += clients/ obj-y += display/ obj-$(CONFIG_DRM_TTM) += ttm/ obj-$(CONFIG_DRM_SCHED) += scheduler/ diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 41fa3377d9cf566294c6f86eedd8b31a22c77510..1a11cab741aca4483673f8e7794b4d3022d269e6 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -26,6 +26,7 @@ config DRM_AMDGPU select DRM_BUDDY select DRM_SUBALLOC_HELPER select DRM_EXEC + select DRM_PANEL_BACKLIGHT_QUIRKS # amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work # ACPI_VIDEO's dependencies must also be selected. select INPUT if ACPI diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile index c7b18c52825d67b8dace1901810db9f9add02a0d..5b21674b07fb27de0e409cd6c14f288f2ca936bf 100644 --- a/drivers/gpu/drm/amd/amdgpu/Makefile +++ b/drivers/gpu/drm/amd/amdgpu/Makefile @@ -1,5 +1,5 @@ # -# Copyright 2017 Advanced Micro Devices, Inc. +# Copyright 2017-2024 Advanced Micro Devices, Inc. All rights reserved. # # Permission is hereby granted, free of charge, to any person obtaining a # copy of this software and associated documentation files (the "Software"), @@ -105,7 +105,7 @@ amdgpu-y += \ # add UMC block amdgpu-y += \ - umc_v6_0.o umc_v6_1.o umc_v6_7.o umc_v8_7.o umc_v8_10.o umc_v12_0.o + umc_v6_0.o umc_v6_1.o umc_v6_7.o umc_v8_7.o umc_v8_10.o umc_v12_0.o umc_v8_14.o # add IH block amdgpu-y += \ @@ -200,6 +200,7 @@ amdgpu-y += \ vcn_v4_0_3.o \ vcn_v4_0_5.o \ vcn_v5_0_0.o \ + vcn_v5_0_1.o \ amdgpu_jpeg.o \ jpeg_v1_0.o \ jpeg_v2_0.o \ @@ -208,7 +209,8 @@ amdgpu-y += \ jpeg_v4_0.o \ jpeg_v4_0_3.o \ jpeg_v4_0_5.o \ - jpeg_v5_0_0.o + jpeg_v5_0_0.o \ + jpeg_v5_0_1.o # add VPE block amdgpu-y += \ diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c index f44de9d4b6a17f212bb9911bb3f33df69a349315..e13fbd97414126ef068bece1b57c61c6767803d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c @@ -334,6 +334,8 @@ aldebaran_mode2_restore_hwcontext(struct amdgpu_reset_control *reset_ctl, AMDGPU_INIT_LEVEL_RESET_RECOVERY); dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n"); + /*TBD: Ideally should clear only GFX, SDMA blocks*/ + amdgpu_ras_clear_err_state(tmp_adev); r = aldebaran_mode2_restore_ip(tmp_adev); if (r) goto end; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 4653a8d2823a6d6f645c620ba56caf0769e0d2b1..69895fccb474aefae082fee2f6db916f4afd41ab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -880,6 +880,7 @@ struct amdgpu_device { bool need_swiotlb; bool accel_working; struct notifier_block acpi_nb; + struct notifier_block pm_nb; struct amdgpu_i2c_chan *i2c_bus[AMDGPU_MAX_I2C_BUS]; struct debugfs_blob_wrapper debugfs_vbios_blob; struct debugfs_blob_wrapper debugfs_discovery_blob; @@ -1174,7 +1175,6 @@ struct amdgpu_device { struct work_struct reset_work; - bool job_hang; bool dc_enabled; /* Mask of active clusters */ uint32_t aid_mask; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h index 5ef6b745f2223b8dd3eb94a33763a79c7d94d491..f3289d2899130a90898569a1ef9042e1ffded0e6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h @@ -71,6 +71,11 @@ struct ras_query_context; #define ACA_ERROR_CE_MASK BIT_MASK(ACA_ERROR_TYPE_CE) #define ACA_ERROR_DEFERRED_MASK BIT_MASK(ACA_ERROR_TYPE_DEFERRED) +#define mmSMNAID_AID0_MCA_SMU 0x03b30400 /* SMN AID AID0 */ +#define mmSMNAID_XCD0_MCA_SMU 0x36430400 /* SMN AID XCD0 */ +#define mmSMNAID_XCD1_MCA_SMU 0x38430400 /* SMN AID XCD1 */ +#define mmSMNXCD_XCD0_MCA_SMU 0x40430400 /* SMN XCD XCD0 */ + enum aca_reg_idx { ACA_REG_IDX_CTL = 0, ACA_REG_IDX_STATUS = 1, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c index ec5e0dcf86135c6ef723d0fcb5e4126225e4ffd6..deb0785350e8e9ad0fc3fb1f2ae97f51b174c1b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c @@ -140,7 +140,7 @@ static int acp_poweroff(struct generic_pm_domain *genpd) * 2. power off the acp tiles * 3. check and enter ulv state */ - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true, 0); return 0; } @@ -157,7 +157,7 @@ static int acp_poweron(struct generic_pm_domain *genpd) * 2. turn on acp clock * 3. power on acp tiles */ - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false, 0); return 0; } @@ -236,7 +236,7 @@ static int acp_hw_init(struct amdgpu_ip_block *ip_block) ip_block->version->major, ip_block->version->minor); /* -ENODEV means board uses AZ rather than ACP */ if (r == -ENODEV) { - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true, 0); return 0; } else if (r) { return r; @@ -508,7 +508,7 @@ static int acp_hw_fini(struct amdgpu_ip_block *ip_block) /* return early if no ACP */ if (!adev->acp.acp_genpd) { - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false, 0); return 0; } @@ -565,7 +565,7 @@ static int acp_suspend(struct amdgpu_ip_block *ip_block) /* power up on suspend */ if (!adev->acp.acp_cell) - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, false, 0); return 0; } @@ -575,7 +575,7 @@ static int acp_resume(struct amdgpu_ip_block *ip_block) /* power down again on resume */ if (!adev->acp.acp_cell) - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, true, 0); return 0; } @@ -584,19 +584,19 @@ static bool acp_is_idle(void *handle) return true; } -static int acp_set_clockgating_state(void *handle, +static int acp_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int acp_set_powergating_state(void *handle, +static int acp_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, enable); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_ACP, enable, 0); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 3afcd1e8aa543534808eacd78ff9d63a99bdf163..2e5732dfd425ea7a7319f4c27a5f1def190655ac 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -368,7 +368,7 @@ void amdgpu_amdkfd_free_gtt_mem(struct amdgpu_device *adev, void **mem_obj) { struct amdgpu_bo **bo = (struct amdgpu_bo **) mem_obj; - amdgpu_bo_reserve(*bo, true); + (void)amdgpu_bo_reserve(*bo, true); amdgpu_bo_kunmap(*bo); amdgpu_bo_unpin(*bo); amdgpu_bo_unreserve(*bo); @@ -724,7 +724,9 @@ void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle) /* Disable GFXOFF and PG. Temporary workaround * to fix some compute applications issue on GFX9. */ - adev->ip_blocks[AMD_IP_BLOCK_TYPE_GFX].version->funcs->set_powergating_state((void *)adev, state); + struct amdgpu_ip_block *gfx_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_GFX); + if (gfx_block != NULL) + gfx_block->version->funcs->set_powergating_state((void *)gfx_block, state); } amdgpu_dpm_switch_power_profile(adev, PP_SMC_POWER_PROFILE_COMPUTE, @@ -834,7 +836,7 @@ int amdgpu_amdkfd_unmap_hiq(struct amdgpu_device *adev, u32 doorbell_off, if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues) return -EINVAL; - if (!kiq_ring->sched.ready || adev->job_hang) + if (!kiq_ring->sched.ready || amdgpu_in_reset(adev)) return 0; ring_funcs = kzalloc(sizeof(*ring_funcs), GFP_KERNEL); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index 4b80ad860639c7f2c3136c8124f04854621b69b5..8af67f18500a7486bdc48a5647fbb520736eeaa9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -433,6 +433,9 @@ void kgd2kfd_unlock_kfd(void); int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id); int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id); bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id); +bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, + bool retry_fault); + #else static inline int kgd2kfd_init(void) { @@ -518,5 +521,12 @@ static inline bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id) { return false; } + +static inline bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, + bool retry_fault) +{ + return false; +} + #endif #endif /* AMDGPU_AMDKFD_H_INCLUDED */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index f30548f4c3b3e2fd7f32e8d71d7ab5d27fd4786e..1e998f972c308bb019c2b5131b7a2ff3752e2606 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -730,7 +730,7 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem, return; amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); - ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + (void)ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0); sg_free_table(ttm->sg); @@ -779,7 +779,7 @@ kfd_mem_dmaunmap_sg_bo(struct kgd_mem *mem, } amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU); - ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); + (void)ttm_bo_validate(&bo->tbo, &bo->placement, &ctx); dir = mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE; @@ -989,7 +989,7 @@ unwind: if (!attachment[i]) continue; if (attachment[i]->bo_va) { - amdgpu_bo_reserve(bo[i], true); + (void)amdgpu_bo_reserve(bo[i], true); if (--attachment[i]->bo_va->ref_count == 0) amdgpu_vm_bo_del(adev, attachment[i]->bo_va); amdgpu_bo_unreserve(bo[i]); @@ -1259,11 +1259,11 @@ static int unmap_bo_from_gpuvm(struct kgd_mem *mem, return -EBUSY; } - amdgpu_vm_bo_unmap(adev, bo_va, entry->va); + (void)amdgpu_vm_bo_unmap(adev, bo_va, entry->va); - amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update); + (void)amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update); - amdgpu_sync_fence(sync, bo_va->last_pt_update); + (void)amdgpu_sync_fence(sync, bo_va->last_pt_update); return 0; } @@ -2352,7 +2352,7 @@ void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_mem *mem) { struct amdgpu_bo *bo = mem->bo; - amdgpu_bo_reserve(bo, true); + (void)amdgpu_bo_reserve(bo, true); amdgpu_bo_kunmap(bo); amdgpu_bo_unpin(bo); amdgpu_bo_unreserve(bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c index 45affc02548c16bc3a361a654d6f2e73bcc82862..423fd2eebe1e05a40ebed53316447501eda81249 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c @@ -47,35 +47,37 @@ /* Check if current bios is an ATOM BIOS. * Return true if it is ATOM BIOS. Otherwise, return false. */ -static bool check_atom_bios(uint8_t *bios, size_t size) +static bool check_atom_bios(struct amdgpu_device *adev, size_t size) { uint16_t tmp, bios_header_start; + uint8_t *bios = adev->bios; if (!bios || size < 0x49) { - DRM_INFO("vbios mem is null or mem size is wrong\n"); + dev_dbg(adev->dev, "VBIOS mem is null or mem size is wrong\n"); return false; } if (!AMD_IS_VALID_VBIOS(bios)) { - DRM_INFO("BIOS signature incorrect %x %x\n", bios[0], bios[1]); + dev_dbg(adev->dev, "VBIOS signature incorrect %x %x\n", bios[0], + bios[1]); return false; } bios_header_start = bios[0x48] | (bios[0x49] << 8); if (!bios_header_start) { - DRM_INFO("Can't locate bios header\n"); + dev_dbg(adev->dev, "Can't locate VBIOS header\n"); return false; } tmp = bios_header_start + 4; if (size < tmp) { - DRM_INFO("BIOS header is broken\n"); + dev_dbg(adev->dev, "VBIOS header is broken\n"); return false; } if (!memcmp(bios + tmp, "ATOM", 4) || !memcmp(bios + tmp, "MOTA", 4)) { - DRM_DEBUG("ATOMBIOS detected\n"); + dev_dbg(adev->dev, "ATOMBIOS detected\n"); return true; } @@ -118,7 +120,7 @@ static bool amdgpu_read_bios_from_vram(struct amdgpu_device *adev) memcpy_fromio(adev->bios, bios, size); iounmap(bios); - if (!check_atom_bios(adev->bios, size)) { + if (!check_atom_bios(adev, size)) { kfree(adev->bios); return false; } @@ -146,7 +148,7 @@ bool amdgpu_read_bios(struct amdgpu_device *adev) memcpy_fromio(adev->bios, bios, size); pci_unmap_rom(adev->pdev, bios); - if (!check_atom_bios(adev->bios, size)) { + if (!check_atom_bios(adev, size)) { kfree(adev->bios); return false; } @@ -186,7 +188,7 @@ static bool amdgpu_read_bios_from_rom(struct amdgpu_device *adev) /* read complete BIOS */ amdgpu_asic_read_bios_from_rom(adev, adev->bios, len); - if (!check_atom_bios(adev->bios, len)) { + if (!check_atom_bios(adev, len)) { kfree(adev->bios); return false; } @@ -216,7 +218,7 @@ static bool amdgpu_read_platform_bios(struct amdgpu_device *adev) memcpy_fromio(adev->bios, bios, romlen); iounmap(bios); - if (!check_atom_bios(adev->bios, romlen)) + if (!check_atom_bios(adev, romlen)) goto free_bios; adev->bios_size = romlen; @@ -324,7 +326,7 @@ static bool amdgpu_atrm_get_bios(struct amdgpu_device *adev) break; } - if (!check_atom_bios(adev->bios, size)) { + if (!check_atom_bios(adev, size)) { kfree(adev->bios); return false; } @@ -389,7 +391,7 @@ static bool amdgpu_acpi_vfct_bios(struct amdgpu_device *adev) vhdr->ImageLength, GFP_KERNEL); - if (!check_atom_bios(adev->bios, vhdr->ImageLength)) { + if (!check_atom_bios(adev, vhdr->ImageLength)) { kfree(adev->bios); return false; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c index 16153d275d7ae5154f3fce4933fe434fc3291249..68bce6a6d09d1f77023eb8748d7e6382d9dfe467 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c @@ -414,7 +414,9 @@ static int amdgpu_cgs_get_firmware_info(struct cgs_device *cgs_device, return -EINVAL; } - err = amdgpu_ucode_request(adev, &adev->pm.fw, "%s", fw_name); + err = amdgpu_ucode_request(adev, &adev->pm.fw, + AMDGPU_UCODE_REQUIRED, + "%s", fw_name); if (err) { DRM_ERROR("Failed to load firmware \"%s\"", fw_name); amdgpu_ucode_release(&adev->pm.fw); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index d891ab779ca7f5168f8fd4e80c4f100c9b643186..5df21529b3b13e26a68c2526d1830cef0d15f8ad 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1801,13 +1801,18 @@ int amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser, if (dma_resv_locking_ctx((*bo)->tbo.base.resv) != &parser->exec.ticket) return -EINVAL; + /* Make sure VRAM is allocated contigiously */ (*bo)->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS; - amdgpu_bo_placement_from_domain(*bo, (*bo)->allowed_domains); - for (i = 0; i < (*bo)->placement.num_placement; i++) - (*bo)->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; - r = ttm_bo_validate(&(*bo)->tbo, &(*bo)->placement, &ctx); - if (r) - return r; + if ((*bo)->tbo.resource->mem_type == TTM_PL_VRAM && + !((*bo)->tbo.resource->placement & TTM_PL_FLAG_CONTIGUOUS)) { + + amdgpu_bo_placement_from_domain(*bo, (*bo)->allowed_domains); + for (i = 0; i < (*bo)->placement.num_placement; i++) + (*bo)->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; + r = ttm_bo_validate(&(*bo)->tbo, &(*bo)->placement, &ctx); + if (r) + return r; + } return amdgpu_ttm_alloc_gart(&(*bo)->tbo); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c index a68338cb7b4afb423cb0b2cac624c401a88da8de..49ca8c814455d35880087986502785edd5dc47c8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c @@ -2095,6 +2095,7 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev) if (amdgpu_umsch_mm & amdgpu_umsch_mm_fwlog) amdgpu_debugfs_umsch_fwlog_init(adev, &adev->umsch_mm); + amdgpu_debugfs_vcn_sched_mask_init(adev); amdgpu_debugfs_jpeg_sched_mask_init(adev); amdgpu_debugfs_gfx_sched_mask_init(adev); amdgpu_debugfs_compute_sched_mask_init(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c index 946c48829f1970724654efec2eabf37ceb3b3013..824f9da5b6cea9c95f46e4268cb914589ea3b088 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c @@ -343,11 +343,10 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool skip_vram_check, coredump->skip_vram_check = skip_vram_check; coredump->reset_vram_lost = vram_lost; - if (job && job->vm) { - struct amdgpu_vm *vm = job->vm; + if (job && job->pasid) { struct amdgpu_task_info *ti; - ti = amdgpu_vm_get_task_info_vm(vm); + ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid); if (ti) { coredump->reset_task_info = *ti; amdgpu_vm_put_task_info(ti); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 96316111300a7a389536043b1f3cf35c1947bd7c..36053b3d48b3e92d4d32be71d2471d154b085df4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -145,7 +145,7 @@ const char *amdgpu_asic_name[] = { "LAST", }; -#define AMDGPU_IP_BLK_MASK_ALL GENMASK(AMDGPU_MAX_IP_NUM, 0) +#define AMDGPU_IP_BLK_MASK_ALL GENMASK(AMD_IP_BLOCK_TYPE_NUM - 1, 0) /* * Default init level where all blocks are expected to be initialized. This is * the level of initialization expected by default and also after a full reset @@ -199,14 +199,16 @@ void amdgpu_set_init_level(struct amdgpu_device *adev, } static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev); +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, + void *data); /** * DOC: pcie_replay_count * * The amdgpu driver provides a sysfs API for reporting the total number - * of PCIe replays (NAKs) + * of PCIe replays (NAKs). * The file pcie_replay_count is used for this and returns the total - * number of replays as a sum of the NAKs generated and NAKs received + * number of replays as a sum of the NAKs generated and NAKs received. */ static ssize_t amdgpu_device_get_pcie_replay_count(struct device *dev, @@ -417,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev) { struct amdgpu_device *adev = drm_to_adev(dev); + if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)) + return false; + if (adev->has_pr3 || ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid())) return true; @@ -429,8 +434,8 @@ bool amdgpu_device_supports_boco(struct drm_device *dev) * @dev: drm_device pointer * * Return: - * 1 if the device supporte BACO; - * 3 if the device support MACO (only works if BACO is supported) + * 1 if the device supports BACO; + * 3 if the device supports MACO (only works if BACO is supported) * otherwise return 0. */ int amdgpu_device_supports_baco(struct drm_device *dev) @@ -577,7 +582,7 @@ void amdgpu_device_mm_access(struct amdgpu_device *adev, loff_t pos, } /** - * amdgpu_device_aper_access - access vram by vram aperature + * amdgpu_device_aper_access - access vram by vram aperture * * @adev: amdgpu_device pointer * @pos: offset of the buffer in vram @@ -668,7 +673,7 @@ bool amdgpu_device_skip_hw_access(struct amdgpu_device *adev) * here is that the GPU reset is not running on another thread in parallel. * * For this we trylock the read side of the reset semaphore, if that succeeds - * we know that the reset is not running in paralell. + * we know that the reset is not running in parallel. * * If the trylock fails we assert that we are either already holding the read * side of the lock or are the reset thread itself and hold the write side of @@ -1399,6 +1404,7 @@ static int amdgpu_device_asic_init(struct amdgpu_device *adev) if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0) || amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(11, 0, 0)) { amdgpu_psp_wait_for_bootloader(adev); ret = amdgpu_atomfirmware_asic_init(adev, true); @@ -1733,7 +1739,7 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev) uint32_t fw_ver; err = request_firmware(&adev->pm.fw, "amdgpu/fiji_smc.bin", adev->dev); - /* force vPost if error occured */ + /* force vPost if error occurred */ if (err) return true; @@ -2165,7 +2171,7 @@ int amdgpu_device_ip_set_clockgating_state(void *dev, if (!adev->ip_blocks[i].version->funcs->set_clockgating_state) continue; r = adev->ip_blocks[i].version->funcs->set_clockgating_state( - (void *)adev, state); + &adev->ip_blocks[i], state); if (r) DRM_ERROR("set_clockgating_state of IP block <%s> failed %d\n", adev->ip_blocks[i].version->funcs->name, r); @@ -2199,7 +2205,7 @@ int amdgpu_device_ip_set_powergating_state(void *dev, if (!adev->ip_blocks[i].version->funcs->set_powergating_state) continue; r = adev->ip_blocks[i].version->funcs->set_powergating_state( - (void *)adev, state); + &adev->ip_blocks[i], state); if (r) DRM_ERROR("set_powergating_state of IP block <%s> failed %d\n", adev->ip_blocks[i].version->funcs->name, r); @@ -2378,7 +2384,7 @@ int amdgpu_device_ip_block_add(struct amdgpu_device *adev, * the module parameter virtual_display. This feature provides a virtual * display hardware on headless boards or in virtualized environments. * This function parses and validates the configuration string specified by - * the user and configues the virtual display configuration (number of + * the user and configures the virtual display configuration (number of * virtual connectors, crtcs, etc.) specified. */ static void amdgpu_device_enable_virtual_display(struct amdgpu_device *adev) @@ -2441,7 +2447,7 @@ void amdgpu_device_set_sriov_virtual_display(struct amdgpu_device *adev) * @adev: amdgpu_device pointer * * Parses the asic configuration parameters specified in the gpu info - * firmware and makes them availale to the driver for use in configuring + * firmware and makes them available to the driver for use in configuring * the asic. * Returns 0 on success, -EINVAL on failure. */ @@ -2482,6 +2488,7 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->firmware.gpu_info_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_gpu_info.bin", chip_name); if (err) { dev_err(adev->dev, @@ -2501,7 +2508,7 @@ static int amdgpu_device_parse_gpu_info_fw(struct amdgpu_device *adev) le32_to_cpu(hdr->header.ucode_array_offset_bytes)); /* - * Should be droped when DAL no longer needs it. + * Should be dropped when DAL no longer needs it. */ if (adev->asic_type == CHIP_NAVI12) goto parse_soc_bounding_box; @@ -3061,7 +3068,7 @@ init_failed: * * Writes a reset magic value to the gart pointer in VRAM. The driver calls * this function before a GPU reset. If the value is retained after a - * GPU reset, VRAM has not been lost. Some GPU resets may destry VRAM contents. + * GPU reset, VRAM has not been lost. Some GPU resets may destroy VRAM contents. */ static void amdgpu_device_fill_reset_magic(struct amdgpu_device *adev) { @@ -3137,7 +3144,7 @@ int amdgpu_device_set_cg_state(struct amdgpu_device *adev, adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_JPEG && adev->ip_blocks[i].version->funcs->set_clockgating_state) { /* enable clockgating to save power */ - r = adev->ip_blocks[i].version->funcs->set_clockgating_state((void *)adev, + r = adev->ip_blocks[i].version->funcs->set_clockgating_state(&adev->ip_blocks[i], state); if (r) { DRM_ERROR("set_clockgating_state(gate) of IP block <%s> failed %d\n", @@ -3174,7 +3181,7 @@ int amdgpu_device_set_pg_state(struct amdgpu_device *adev, adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_JPEG && adev->ip_blocks[i].version->funcs->set_powergating_state) { /* enable powergating to save power */ - r = adev->ip_blocks[i].version->funcs->set_powergating_state((void *)adev, + r = adev->ip_blocks[i].version->funcs->set_powergating_state(&adev->ip_blocks[i], state); if (r) { DRM_ERROR("set_powergating_state(gate) of IP block <%s> failed %d\n", @@ -3376,7 +3383,7 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev) amdgpu_amdkfd_suspend(adev, false); - /* Workaroud for ASICs need to disable SMC first */ + /* Workaround for ASICs need to disable SMC first */ amdgpu_device_smu_fini_early(adev); for (i = adev->num_ip_blocks - 1; i >= 0; i--) { @@ -3478,7 +3485,7 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work) WARN_ON_ONCE(adev->gfx.gfx_off_state); WARN_ON_ONCE(adev->gfx.gfx_off_req_count); - if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true)) + if (!amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, true, 0)) adev->gfx.gfx_off_state = true; } @@ -4306,7 +4313,7 @@ int amdgpu_device_init(struct amdgpu_device *adev, /* * Reset domain needs to be present early, before XGMI hive discovered - * (if any) and intitialized to use reset sem and in_gpu reset flag + * (if any) and initialized to use reset sem and in_gpu reset flag * early on during init and before calling to RREG32. */ adev->reset_domain = amdgpu_reset_create_reset_domain(SINGLE_DEVICE, "amdgpu-reset-dev"); @@ -4596,6 +4603,11 @@ fence_driver_init: amdgpu_device_check_iommu_direct_map(adev); + adev->pm_nb.notifier_call = amdgpu_device_pm_notifier; + r = register_pm_notifier(&adev->pm_nb); + if (r) + goto failed; + return 0; release_ras_con: @@ -4660,6 +4672,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev) drain_workqueue(adev->mman.bdev.wq); adev->shutdown = true; + unregister_pm_notifier(&adev->pm_nb); + /* make sure IB test finished before entering exclusive mode * to avoid preemption on IB test */ @@ -4778,8 +4792,8 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) { int ret; - /* No need to evict vram on APUs for suspend to ram or s2idle */ - if ((adev->in_s3 || adev->in_s0ix) && (adev->flags & AMD_IS_APU)) + /* No need to evict vram on APUs unless going to S4 */ + if (!adev->in_s4 && (adev->flags & AMD_IS_APU)) return 0; ret = amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM); @@ -4791,6 +4805,41 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev) /* * Suspend & resume. */ +/** + * amdgpu_device_pm_notifier - Notification block for Suspend/Hibernate events + * @nb: notifier block + * @mode: suspend mode + * @data: data + * + * This function is called when the system is about to suspend or hibernate. + * It is used to evict resources from the device before the system goes to + * sleep while there is still access to swap. + */ +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode, + void *data) +{ + struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb); + int r; + + switch (mode) { + case PM_HIBERNATION_PREPARE: + adev->in_s4 = true; + fallthrough; + case PM_SUSPEND_PREPARE: + r = amdgpu_device_evict_resources(adev); + /* + * This is considered non-fatal at this time because + * amdgpu_device_prepare() will also fatally evict resources. + * See https://gitlab.freedesktop.org/drm/amd/-/issues/3781 + */ + if (r) + drm_warn(adev_to_drm(adev), "Failed to evict resources, freeze active processes if problems occur: %d\n", r); + break; + } + + return NOTIFY_DONE; +} + /** * amdgpu_device_prepare - prepare for device suspend * @@ -4830,7 +4879,7 @@ int amdgpu_device_prepare(struct drm_device *dev) return 0; unprepare: - adev->in_s0ix = adev->in_s3 = false; + adev->in_s0ix = adev->in_s3 = adev->in_s4 = false; return r; } @@ -5181,7 +5230,7 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, if (r) return r; - amdgpu_ras_set_fed(adev, false); + amdgpu_ras_clear_err_state(adev); amdgpu_irq_gpu_reset_resume_helper(adev); /* some sw clean up VF needs to do before recover */ @@ -5238,16 +5287,18 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev, } /** - * amdgpu_device_has_job_running - check if there is any job in mirror list + * amdgpu_device_has_job_running - check if there is any unfinished job * * @adev: amdgpu_device pointer * - * check if there is any job in mirror list + * check if there is any job running on the device when guest driver receives + * FLR notification from host driver. If there are still jobs running, then + * the guest driver will not respond the FLR reset. Instead, let the job hit + * the timeout and guest driver then issue the reset request. */ bool amdgpu_device_has_job_running(struct amdgpu_device *adev) { int i; - struct drm_sched_job *job; for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { struct amdgpu_ring *ring = adev->rings[i]; @@ -5255,11 +5306,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev) if (!amdgpu_ring_sched_ready(ring)) continue; - spin_lock(&ring->sched.job_list_lock); - job = list_first_entry_or_null(&ring->sched.pending_list, - struct drm_sched_job, list); - spin_unlock(&ring->sched.job_list_lock); - if (job) + if (amdgpu_fence_count_emitted(ring)) return true; } return false; @@ -5484,7 +5531,7 @@ int amdgpu_device_reinit_after_reset(struct amdgpu_reset_context *reset_context) amdgpu_set_init_level(tmp_adev, init_level); if (full_reset) { /* post card */ - amdgpu_ras_set_fed(tmp_adev, false); + amdgpu_ras_clear_err_state(tmp_adev); r = amdgpu_device_asic_init(tmp_adev); if (r) { dev_warn(tmp_adev->dev, "asic atom init failed!"); @@ -5817,6 +5864,18 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, bool audio_suspended = false; int retry_limit = AMDGPU_MAX_RETRY_LIMIT; + /* + * If it reaches here because of hang/timeout and a RAS error is + * detected at the same time, let RAS recovery take care of it. + */ + if (amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY) && + !amdgpu_sriov_vf(adev) && + reset_context->src != AMDGPU_RESET_SRC_RAS) { + dev_dbg(adev->dev, + "Gpu recovery from source: %d yielding to RAS error recovery handling", + reset_context->src); + return 0; + } /* * Special case: RAS triggered and full reset isn't supported */ @@ -5900,7 +5959,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, amdgpu_amdkfd_pre_reset(tmp_adev, reset_context); /* - * Mark these ASICs to be reseted as untracked first + * Mark these ASICs to be reset as untracked first * And add them back after reset completed */ amdgpu_unregister_gpu_instance(tmp_adev); @@ -6103,7 +6162,7 @@ static void amdgpu_device_partner_bandwidth(struct amdgpu_device *adev, * * @adev: amdgpu_device pointer * - * Fetchs and stores in the driver the PCIE capabilities (gen speed + * Fetches and stores in the driver the PCIE capabilities (gen speed * and lanes) of the slot the device is in. Handles APUs and * virtualized environments where PCIE config space may not be available. */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index 1040204ac8b97c6895fbe7b00e6fbc6d2532835b..949d74eff29465b12df862d8c1e00c09a9990d96 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -1,5 +1,5 @@ /* - * Copyright 2018 Advanced Micro Devices, Inc. + * Copyright 2018-2024 Advanced Micro Devices, Inc. All rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), @@ -104,7 +104,9 @@ #include "smuio_v13_0_6.h" #include "smuio_v14_0_2.h" #include "vcn_v5_0_0.h" +#include "vcn_v5_0_1.h" #include "jpeg_v5_0_0.h" +#include "jpeg_v5_0_1.h" #include "amdgpu_vpe.h" #if defined(CONFIG_DRM_AMD_ISP) @@ -1340,7 +1342,7 @@ static int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) */ if (adev->vcn.num_vcn_inst < AMDGPU_MAX_VCN_INSTANCES) { - adev->vcn.vcn_config[adev->vcn.num_vcn_inst] = + adev->vcn.inst[adev->vcn.num_vcn_inst].vcn_config = ip->revision & 0xc0; adev->vcn.num_vcn_inst++; adev->vcn.inst_mask |= @@ -1705,7 +1707,7 @@ static int amdgpu_discovery_get_vcn_info(struct amdgpu_device *adev) * so this won't overflow. */ for (v = 0; v < adev->vcn.num_vcn_inst; v++) { - adev->vcn.vcn_codec_disable_mask[v] = + adev->vcn.inst[v].vcn_codec_disable_mask = le32_to_cpu(vcn_info->v1.instance_info[v].fuse_data.all_bits); } break; @@ -1836,6 +1838,7 @@ static int amdgpu_discovery_set_common_ip_blocks(struct amdgpu_device *adev) case IP_VERSION(9, 4, 2): case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): amdgpu_device_ip_block_add(adev, &vega10_common_ip_block); break; case IP_VERSION(10, 1, 10): @@ -1890,6 +1893,7 @@ static int amdgpu_discovery_set_gmc_ip_blocks(struct amdgpu_device *adev) case IP_VERSION(9, 4, 2): case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): amdgpu_device_ip_block_add(adev, &gmc_v9_0_ip_block); break; case IP_VERSION(10, 1, 10): @@ -2013,6 +2017,7 @@ static int amdgpu_discovery_set_psp_ip_blocks(struct amdgpu_device *adev) case IP_VERSION(13, 0, 8): case IP_VERSION(13, 0, 10): case IP_VERSION(13, 0, 11): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): case IP_VERSION(14, 0, 0): case IP_VERSION(14, 0, 1): @@ -2184,6 +2189,7 @@ static int amdgpu_discovery_set_gc_ip_blocks(struct amdgpu_device *adev) break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): amdgpu_device_ip_block_add(adev, &gfx_v9_4_3_ip_block); break; case IP_VERSION(10, 1, 10): @@ -2238,6 +2244,7 @@ static int amdgpu_discovery_set_sdma_ip_blocks(struct amdgpu_device *adev) break; case IP_VERSION(4, 4, 2): case IP_VERSION(4, 4, 5): + case IP_VERSION(4, 4, 4): amdgpu_device_ip_block_add(adev, &sdma_v4_4_2_ip_block); break; case IP_VERSION(5, 0, 0): @@ -2361,6 +2368,10 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct amdgpu_device *adev) amdgpu_device_ip_block_add(adev, &vcn_v5_0_0_ip_block); amdgpu_device_ip_block_add(adev, &jpeg_v5_0_0_ip_block); break; + case IP_VERSION(5, 0, 1): + amdgpu_device_ip_block_add(adev, &vcn_v5_0_1_ip_block); + amdgpu_device_ip_block_add(adev, &jpeg_v5_0_1_ip_block); + break; default: dev_err(adev->dev, "Failed to add vcn/jpeg ip block(UVD_HWIP:0x%x)\n", @@ -2405,6 +2416,7 @@ static void amdgpu_discovery_init_soc_config(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): aqua_vanjaram_init_soc_config(adev); break; default: @@ -2652,6 +2664,7 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev) case IP_VERSION(9, 4, 2): case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): adev->family = AMDGPU_FAMILY_AI; break; case IP_VERSION(9, 1, 0): diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index b119d27271c1a24e9f482a4a26b8470a8f2616ee..35c778426a7c704f37dfdc8533dc61d79e3bf9e3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -33,6 +33,7 @@ #include "soc15_common.h" #include "gc/gc_11_0_0_offset.h" #include "gc/gc_11_0_0_sh_mask.h" +#include "bif/bif_4_1_d.h" #include #include @@ -1788,3 +1789,82 @@ int amdgpu_display_resume_helper(struct amdgpu_device *adev) return 0; } +/* panic_bo is set in amdgpu_dm_plane_get_scanout_buffer() and only used in amdgpu_dm_set_pixel() + * they are called from the panic handler, and protected by the drm_panic spinlock. + */ +static struct amdgpu_bo *panic_abo; + +/* Use the indirect MMIO to write each pixel to the GPU VRAM, + * This is a simplified version of amdgpu_device_mm_access() + */ +static void amdgpu_display_set_pixel(struct drm_scanout_buffer *sb, + unsigned int x, + unsigned int y, + u32 color) +{ + struct amdgpu_res_cursor cursor; + unsigned long offset; + struct amdgpu_bo *abo = panic_abo; + struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev); + uint32_t tmp; + + offset = x * 4 + y * sb->pitch[0]; + amdgpu_res_first(abo->tbo.resource, offset, 4, &cursor); + + tmp = cursor.start >> 31; + WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t) cursor.start) | 0x80000000); + if (tmp != 0xffffffff) + WREG32_NO_KIQ(mmMM_INDEX_HI, tmp); + WREG32_NO_KIQ(mmMM_DATA, color); +} + +int amdgpu_display_get_scanout_buffer(struct drm_plane *plane, + struct drm_scanout_buffer *sb) +{ + struct amdgpu_bo *abo; + struct drm_framebuffer *fb = plane->state->fb; + + if (!fb) + return -EINVAL; + + DRM_DEBUG_KMS("Framebuffer %dx%d %p4cc\n", fb->width, fb->height, &fb->format->format); + + abo = gem_to_amdgpu_bo(fb->obj[0]); + if (!abo) + return -EINVAL; + + sb->width = fb->width; + sb->height = fb->height; + /* Use the generic linear format, because tiling will be disabled in panic_flush() */ + sb->format = drm_format_info(fb->format->format); + if (!sb->format) + return -EINVAL; + + sb->pitch[0] = fb->pitches[0]; + + if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) { + if (abo->tbo.resource->mem_type != TTM_PL_VRAM) { + drm_warn(plane->dev, "amdgpu panic, framebuffer not in VRAM\n"); + return -EINVAL; + } + /* Only handle 32bits format, to simplify mmio access */ + if (fb->format->cpp[0] != 4) { + drm_warn(plane->dev, "amdgpu panic, pixel format is not 32bits\n"); + return -EINVAL; + } + sb->set_pixel = amdgpu_display_set_pixel; + panic_abo = abo; + return 0; + } + if (!abo->kmap.virtual && + ttm_bo_kmap(&abo->tbo, 0, PFN_UP(abo->tbo.base.size), &abo->kmap)) { + drm_warn(plane->dev, "amdgpu bo map failed, panic won't be displayed\n"); + return -ENOMEM; + } + if (abo->kmap.bo_kmap_type & TTM_BO_MAP_IOMEM_MASK) + iosys_map_set_vaddr_iomem(&sb->map[0], abo->kmap.virtual); + else + iosys_map_set_vaddr(&sb->map[0], abo->kmap.virtual); + + return 0; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h index 9d19940f73c8fcd99223550a935c98c82ce91cac..dfa0d642ac161b1f56c2a616baa503e7ac98bd53 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.h @@ -23,6 +23,8 @@ #ifndef __AMDGPU_DISPLAY_H__ #define __AMDGPU_DISPLAY_H__ +#include + #define amdgpu_display_vblank_get_counter(adev, crtc) (adev)->mode_info.funcs->vblank_get_counter((adev), (crtc)) #define amdgpu_display_backlight_set_level(adev, e, l) (adev)->mode_info.funcs->backlight_set_level((e), (l)) #define amdgpu_display_backlight_get_level(adev, e) (adev)->mode_info.funcs->backlight_get_level((e)) @@ -49,4 +51,7 @@ amdgpu_lookup_format_info(u32 format, uint64_t modifier); int amdgpu_display_suspend_helper(struct amdgpu_device *adev); int amdgpu_display_resume_helper(struct amdgpu_device *adev); +int amdgpu_display_get_scanout_buffer(struct drm_plane *plane, + struct drm_scanout_buffer *sb); + #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 38686203bea63044fa2cf1a4c2cad35d96d9adaa..492b09d8457190a9b58fdcf9607a7fa65b0b7b43 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -23,7 +23,7 @@ */ #include -#include +#include #include #include #include @@ -2552,7 +2552,6 @@ static int amdgpu_pmops_freeze(struct device *dev) struct amdgpu_device *adev = drm_to_adev(drm_dev); int r; - adev->in_s4 = true; r = amdgpu_device_suspend(drm_dev, true); adev->in_s4 = false; if (r) @@ -2916,7 +2915,6 @@ static const struct drm_driver amdgpu_kms_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = KMS_DRIVER_MAJOR, .minor = KMS_DRIVER_MINOR, .patchlevel = KMS_DRIVER_PATCHLEVEL, @@ -2940,7 +2938,6 @@ const struct drm_driver amdgpu_partition_driver = { .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = KMS_DRIVER_MAJOR, .minor = KMS_DRIVER_MINOR, .patchlevel = KMS_DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h index 5bc2cb661af7a0b9723c7af609ccf5d8d5b42c56..2d86cc6f7f4dd42d7065abcf9c6dfeab8ea82ec0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.h @@ -40,7 +40,6 @@ #define DRIVER_NAME "amdgpu" #define DRIVER_DESC "AMD GPU" -#define DRIVER_DATE "20150101" extern const struct drm_driver amdgpu_partition_driver; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c index ceb5163480f4c2f5103088cb28956eb972ef785f..09c9194d5bd58dff567761abc0bb05ea3c34d5d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c @@ -384,7 +384,7 @@ int amdgpu_fru_sysfs_init(struct amdgpu_device *adev) void amdgpu_fru_sysfs_fini(struct amdgpu_device *adev) { - if (!is_fru_eeprom_supported(adev, NULL) || !adev->fru_info) + if (!adev->fru_info) return; sysfs_remove_files(&adev->dev->kobj, amdgpu_fru_attributes); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h index bc58dca18035a1f3fbe22aa43594ab74cd34cd38..98f3196599ef7ff1e2008084a7e4175c0078c127 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.h @@ -32,7 +32,7 @@ struct amdgpu_fru_info { char product_name[AMDGPU_PRODUCT_NAME_LEN]; char serial[20]; char manufacturer_name[32]; - char fru_id[32]; + char fru_id[50]; }; int amdgpu_fru_get_product_info(struct amdgpu_device *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c index 69a6b6dba0a540b5289d833a829900a42e9280bf..6d5d81f0dc4e7bffa6f7d0e8f2aa0daf60d06199 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c @@ -515,7 +515,7 @@ int amdgpu_gfx_disable_kcq(struct amdgpu_device *adev, int xcc_id) if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues) return -EINVAL; - if (!kiq_ring->sched.ready || adev->job_hang || amdgpu_in_reset(adev)) + if (!kiq_ring->sched.ready || amdgpu_in_reset(adev)) return 0; spin_lock(&kiq->ring_lock); @@ -567,7 +567,7 @@ int amdgpu_gfx_disable_kgq(struct amdgpu_device *adev, int xcc_id) if (!kiq->pmf || !kiq->pmf->kiq_unmap_queues) return -EINVAL; - if (!adev->gfx.kiq[0].ring.sched.ready || adev->job_hang) + if (!adev->gfx.kiq[0].ring.sched.ready || amdgpu_in_reset(adev)) return 0; if (amdgpu_gfx_is_master_xcc(adev, xcc_id)) { @@ -806,7 +806,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) /* If going to s2idle, no need to wait */ if (adev->in_s0ix) { if (!amdgpu_dpm_set_powergating_by_smu(adev, - AMD_IP_BLOCK_TYPE_GFX, true)) + AMD_IP_BLOCK_TYPE_GFX, true, 0)) adev->gfx.gfx_off_state = true; } else { schedule_delayed_work(&adev->gfx.gfx_off_delay_work, @@ -818,7 +818,7 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable) cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work); if (adev->gfx.gfx_off_state && - !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false)) { + !amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, false, 0)) { adev->gfx.gfx_off_state = false; if (adev->gfx.funcs->init_spm_golden) { @@ -1484,6 +1484,24 @@ static int amdgpu_gfx_run_cleaner_shader(struct amdgpu_device *adev, int xcp_id) return 0; } +/** + * amdgpu_gfx_set_run_cleaner_shader - Execute the AMDGPU GFX Cleaner Shader + * @dev: The device structure + * @attr: The device attribute structure + * @buf: The buffer containing the input data + * @count: The size of the input data + * + * Provides the sysfs interface to manually run a cleaner shader, which is + * used to clear the GPU state between different tasks. Writing a value to the + * 'run_cleaner_shader' sysfs file triggers the cleaner shader execution. + * The value written corresponds to the partition index on multi-partition + * devices. On single-partition devices, the value should be '0'. + * + * The cleaner shader clears the Local Data Store (LDS) and General Purpose + * Registers (GPRs) to ensure data isolation between GPU workloads. + * + * Return: The number of bytes written to the sysfs file. + */ static ssize_t amdgpu_gfx_set_run_cleaner_shader(struct device *dev, struct device_attribute *attr, const char *buf, @@ -1532,6 +1550,19 @@ static ssize_t amdgpu_gfx_set_run_cleaner_shader(struct device *dev, return count; } +/** + * amdgpu_gfx_get_enforce_isolation - Query AMDGPU GFX Enforce Isolation Settings + * @dev: The device structure + * @attr: The device attribute structure + * @buf: The buffer to store the output data + * + * Provides the sysfs read interface to get the current settings of the 'enforce_isolation' + * feature for each GPU partition. Reading from the 'enforce_isolation' + * sysfs file returns the isolation settings for all partitions, where '0' + * indicates disabled and '1' indicates enabled. + * + * Return: The number of bytes read from the sysfs file. + */ static ssize_t amdgpu_gfx_get_enforce_isolation(struct device *dev, struct device_attribute *attr, char *buf) @@ -1555,6 +1586,20 @@ static ssize_t amdgpu_gfx_get_enforce_isolation(struct device *dev, return size; } +/** + * amdgpu_gfx_set_enforce_isolation - Control AMDGPU GFX Enforce Isolation + * @dev: The device structure + * @attr: The device attribute structure + * @buf: The buffer containing the input data + * @count: The size of the input data + * + * This function allows control over the 'enforce_isolation' feature, which + * serializes access to the graphics engine. Writing '1' or '0' to the + * 'enforce_isolation' sysfs file enables or disables process isolation for + * each partition. The input should specify the setting for all partitions. + * + * Return: The number of bytes written to the sysfs file. + */ static ssize_t amdgpu_gfx_set_enforce_isolation(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -1940,6 +1985,17 @@ void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work) mutex_unlock(&adev->enforce_isolation_mutex); } +/** + * amdgpu_gfx_enforce_isolation_wait_for_kfd - Manage KFD wait period for process isolation + * @adev: amdgpu_device pointer + * @idx: Index of the GPU partition + * + * When kernel submissions come in, the jobs are given a time slice and once + * that time slice is up, if there are KFD user queues active, kernel + * submissions are blocked until KFD has had its time slice. Once the KFD time + * slice is up, KFD user queues are preempted and kernel submissions are + * unblocked and allowed to run again. + */ static void amdgpu_gfx_enforce_isolation_wait_for_kfd(struct amdgpu_device *adev, u32 idx) @@ -1985,6 +2041,15 @@ amdgpu_gfx_enforce_isolation_wait_for_kfd(struct amdgpu_device *adev, msleep(GFX_SLICE_PERIOD_MS); } +/** + * amdgpu_gfx_enforce_isolation_ring_begin_use - Begin use of a ring with enforced isolation + * @ring: Pointer to the amdgpu_ring structure + * + * Ring begin_use helper implementation for gfx which serializes access to the + * gfx IP between kernel submission IOCTLs and KFD user queues when isolation + * enforcement is enabled. The kernel submission IOCTLs and KFD user queues + * each get a time slice when both are active. + */ void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring) { struct amdgpu_device *adev = ring->adev; @@ -2012,6 +2077,15 @@ void amdgpu_gfx_enforce_isolation_ring_begin_use(struct amdgpu_ring *ring) mutex_unlock(&adev->enforce_isolation_mutex); } +/** + * amdgpu_gfx_enforce_isolation_ring_end_use - End use of a ring with enforced isolation + * @ring: Pointer to the amdgpu_ring structure + * + * Ring end_use helper implementation for gfx which serializes access to the + * gfx IP between kernel submission IOCTLs and KFD user queues when isolation + * enforcement is enabled. The kernel submission IOCTLs and KFD user queues + * each get a time slice when both are active. + */ void amdgpu_gfx_enforce_isolation_ring_end_use(struct amdgpu_ring *ring) { struct amdgpu_device *adev = ring->adev; @@ -2050,7 +2124,7 @@ static int amdgpu_debugfs_gfx_sched_mask_set(void *data, u64 val) if (!adev) return -ENODEV; - mask = (1 << adev->gfx.num_gfx_rings) - 1; + mask = (1ULL << adev->gfx.num_gfx_rings) - 1; if ((val & mask) == 0) return -EINVAL; @@ -2078,7 +2152,7 @@ static int amdgpu_debugfs_gfx_sched_mask_get(void *data, u64 *val) for (i = 0; i < adev->gfx.num_gfx_rings; ++i) { ring = &adev->gfx.gfx_ring[i]; if (ring->sched.ready) - mask |= 1 << i; + mask |= 1ULL << i; } *val = mask; @@ -2120,7 +2194,7 @@ static int amdgpu_debugfs_compute_sched_mask_set(void *data, u64 val) if (!adev) return -ENODEV; - mask = (1 << adev->gfx.num_compute_rings) - 1; + mask = (1ULL << adev->gfx.num_compute_rings) - 1; if ((val & mask) == 0) return -EINVAL; @@ -2149,7 +2223,7 @@ static int amdgpu_debugfs_compute_sched_mask_get(void *data, u64 *val) for (i = 0; i < adev->gfx.num_compute_rings; ++i) { ring = &adev->gfx.compute_ring[i]; if (ring->sched.ready) - mask |= 1 << i; + mask |= 1ULL << i; } *val = mask; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 8b512dc28df8384861be0385e8cbdb292406421e..d751995dc13126d478762efc8552de2b350e8ead 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -89,16 +89,14 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm, /** * amdgpu_ib_free - free an IB (Indirect Buffer) * - * @adev: amdgpu_device pointer * @ib: IB object to free * @f: the fence SA bo need wait on for the ib alloation * * Free an IB (all asics). */ -void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib, - struct dma_fence *f) +void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f) { - amdgpu_sa_bo_free(adev, &ib->sa_bo, f); + amdgpu_sa_bo_free(&ib->sa_bo, f); } /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index f3b0aaf3ebc69e7f90f8cf0c3f0e5d3417991669..901f8b12c672d14ddc10484d2fa6418af767b68a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -298,3 +298,9 @@ uint64_t amdgpu_ih_decode_iv_ts_helper(struct amdgpu_ih_ring *ih, u32 rptr, dw2 = le32_to_cpu(ih->ring[ring_index + 2]); return dw1 | ((u64)(dw2 & 0xffff) << 32); } + +const char *amdgpu_ih_ring_name(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) +{ + return ih == &adev->irq.ih ? "ih" : ih == &adev->irq.ih_soft ? "sw ih" : + ih == &adev->irq.ih1 ? "ih1" : ih == &adev->irq.ih2 ? "ih2" : "unknown"; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index 508f02eb0cf8f958d26853cd7b4698091554d94e..7d4395a5d8ac9f3194a50e296304b1bec862ad47 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -110,4 +110,5 @@ void amdgpu_ih_decode_iv_helper(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry); uint64_t amdgpu_ih_decode_iv_ts_helper(struct amdgpu_ih_ring *ih, u32 rptr, signed int offset); +const char *amdgpu_ih_ring_name(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c index 263ce1811cc84219fc5a5eaadf3589047d12aca8..732744488b033653b610756038de50d650af2084 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c @@ -77,7 +77,8 @@ static int isp_load_fw_by_psp(struct amdgpu_device *adev) sizeof(ucode_prefix)); /* read isp fw */ - r = amdgpu_ucode_request(adev, &adev->isp.fw, "amdgpu/%s.bin", ucode_prefix); + r = amdgpu_ucode_request(adev, &adev->isp.fw, AMDGPU_UCODE_OPTIONAL, + "amdgpu/%s.bin", ucode_prefix); if (r) { amdgpu_ucode_release(&adev->isp.fw); return r; @@ -128,13 +129,13 @@ static bool isp_is_idle(void *handle) return true; } -static int isp_set_clockgating_state(void *handle, +static int isp_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int isp_set_powergating_state(void *handle, +static int isp_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index b9d08bc965813305055171e526a729061b5e98b8..100f044759435e21ef7c5d14ca88fcc8b18c1c98 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -102,8 +102,6 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) return DRM_GPU_SCHED_STAT_ENODEV; } - adev->job_hang = true; - /* * Do the coredump immediately after a job timeout to get a very * close dump/snapshot/representation of GPU's current error status @@ -181,7 +179,6 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job) } exit: - adev->job_hang = false; drm_dev_exit(idx); return DRM_GPU_SCHED_STAT_NOMINAL; } @@ -197,11 +194,6 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (!*job) return -ENOMEM; - /* - * Initialize the scheduler to at least some ring so that we always - * have a pointer to adev. - */ - (*job)->base.sched = &adev->rings[0]->sched; (*job)->vm = vm; amdgpu_sync_create(&(*job)->explicit_sync); @@ -255,7 +247,6 @@ void amdgpu_job_set_resources(struct amdgpu_job *job, struct amdgpu_bo *gds, void amdgpu_job_free_resources(struct amdgpu_job *job) { - struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched); struct dma_fence *f; unsigned i; @@ -268,7 +259,7 @@ void amdgpu_job_free_resources(struct amdgpu_job *job) f = NULL; for (i = 0; i < job->num_ibs; ++i) - amdgpu_ib_free(ring->adev, &job->ibs[i], f); + amdgpu_ib_free(&job->ibs[i], f); } static void amdgpu_job_free_cb(struct drm_sched_job *s_job) @@ -367,6 +358,13 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job, dev_err(ring->adev->dev, "Error getting VM ID (%d)\n", r); goto error; } + /* + * The VM structure might be released after the VMID is + * assigned, we had multiple problems with people trying to use + * the VM pointer so better set it to NULL. + */ + if (!fence) + job->vm = NULL; } return fence; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h index 3eb4a4653fceeb20ce886e49dd12c32308b2a519..d9cb343a870843c0051c67a5177ce80a9829b2b2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.h @@ -27,7 +27,8 @@ #include "amdgpu_ras.h" #define AMDGPU_MAX_JPEG_INSTANCES 4 -#define AMDGPU_MAX_JPEG_RINGS 8 +#define AMDGPU_MAX_JPEG_RINGS 10 +#define AMDGPU_MAX_JPEG_RINGS_4_0_3 8 #define AMDGPU_JPEG_HARVEST_JPEG0 (1 << 0) #define AMDGPU_JPEG_HARVEST_JPEG1 (1 << 1) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 59ec20b07a6af32f336f84856d437254577b9a72..32b27a1658e7823c4813301977c403cfb738b111 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -1610,10 +1610,12 @@ int amdgpu_mes_init_microcode(struct amdgpu_device *adev, int pipe) pipe == AMDGPU_MES_SCHED_PIPE ? "" : "1"); } - r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], "%s", fw_name); + r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], AMDGPU_UCODE_REQUIRED, + "%s", fw_name); if (r && need_retry && pipe == AMDGPU_MES_SCHED_PIPE) { dev_info(adev->dev, "try to fall back to %s_mes.bin\n", ucode_prefix); r = amdgpu_ucode_request(adev, &adev->mes.fw[pipe], + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mes.bin", ucode_prefix); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 6852d50caa89a93ec5d9463653f5910668852e95..4f057996ef35ba61c08434046e080564cf19ca4b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -41,6 +41,7 @@ #include "amdgpu_amdkfd.h" #include "amdgpu_vram_mgr.h" #include "amdgpu_vm.h" +#include "amdgpu_dma_buf.h" /** * DOC: amdgpu_object @@ -324,6 +325,9 @@ error_free: * * Allocates and pins a BO for kernel internal use. * + * This function is exported to allow the V4L2 isp device + * external to drm device to create and access the kernel BO. + * * Note: For bo_ptr new BO is only created if bo_ptr points to NULL. * * Returns: @@ -347,6 +351,76 @@ int amdgpu_bo_create_kernel(struct amdgpu_device *adev, return 0; } +EXPORT_SYMBOL(amdgpu_bo_create_kernel); + +/** + * amdgpu_bo_create_isp_user - create user BO for isp + * + * @adev: amdgpu device object + * @dma_buf: DMABUF handle for isp buffer + * @domain: where to place it + * @bo: used to initialize BOs in structures + * @gpu_addr: GPU addr of the pinned BO + * + * Imports isp DMABUF to allocate and pin a user BO for isp internal use. It does + * GART alloc to generate gpu_addr for BO to make it accessible through the + * GART aperture for ISP HW. + * + * This function is exported to allow the V4L2 isp device external to drm device + * to create and access the isp user BO. + * + * Returns: + * 0 on success, negative error code otherwise. + */ +int amdgpu_bo_create_isp_user(struct amdgpu_device *adev, + struct dma_buf *dma_buf, u32 domain, struct amdgpu_bo **bo, + u64 *gpu_addr) + +{ + struct drm_gem_object *gem_obj; + int r; + + gem_obj = amdgpu_gem_prime_import(&adev->ddev, dma_buf); + *bo = gem_to_amdgpu_bo(gem_obj); + if (!(*bo)) { + dev_err(adev->dev, "failed to get valid isp user bo\n"); + return -EINVAL; + } + + r = amdgpu_bo_reserve(*bo, false); + if (r) { + dev_err(adev->dev, "(%d) failed to reserve isp user bo\n", r); + return r; + } + + r = amdgpu_bo_pin(*bo, domain); + if (r) { + dev_err(adev->dev, "(%d) isp user bo pin failed\n", r); + goto error_unreserve; + } + + r = amdgpu_ttm_alloc_gart(&(*bo)->tbo); + if (r) { + dev_err(adev->dev, "%p bind failed\n", *bo); + goto error_unpin; + } + + if (!WARN_ON(!gpu_addr)) + *gpu_addr = amdgpu_bo_gpu_offset(*bo); + + amdgpu_bo_unreserve(*bo); + + return 0; + +error_unpin: + amdgpu_bo_unpin(*bo); +error_unreserve: + amdgpu_bo_unreserve(*bo); + amdgpu_bo_unref(bo); + + return r; +} +EXPORT_SYMBOL(amdgpu_bo_create_isp_user); /** * amdgpu_bo_create_kernel_at - create BO for kernel use at specific location @@ -423,6 +497,9 @@ error: * @cpu_addr: pointer to where the BO's CPU memory space address was stored * * unmaps and unpin a BO for kernel internal use. + * + * This function is exported to allow the V4L2 isp device + * external to drm device to free the kernel BO. */ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr, void **cpu_addr) @@ -447,6 +524,30 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr, if (cpu_addr) *cpu_addr = NULL; } +EXPORT_SYMBOL(amdgpu_bo_free_kernel); + +/** + * amdgpu_bo_free_isp_user - free BO for isp use + * + * @bo: amdgpu isp user BO to free + * + * unpin and unref BO for isp internal use. + * + * This function is exported to allow the V4L2 isp device + * external to drm device to free the isp user BO. + */ +void amdgpu_bo_free_isp_user(struct amdgpu_bo *bo) +{ + if (bo == NULL) + return; + + if (amdgpu_bo_reserve(bo, true) == 0) { + amdgpu_bo_unpin(bo); + amdgpu_bo_unreserve(bo); + } + amdgpu_bo_unref(&bo); +} +EXPORT_SYMBOL(amdgpu_bo_free_isp_user); /* Validate bo size is bit bigger than the request domain */ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index be6769852ece4d752744a7117f8d818b7733873a..ab3fe7b42da7a390923148c755bfb5e29879851a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -260,6 +260,10 @@ int amdgpu_bo_create_kernel(struct amdgpu_device *adev, unsigned long size, int align, u32 domain, struct amdgpu_bo **bo_ptr, u64 *gpu_addr, void **cpu_addr); +int amdgpu_bo_create_isp_user(struct amdgpu_device *adev, + struct dma_buf *dbuf, u32 domain, + struct amdgpu_bo **bo, + u64 *gpu_addr); int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, uint64_t offset, uint64_t size, struct amdgpu_bo **bo_ptr, void **cpu_addr); @@ -271,6 +275,7 @@ int amdgpu_bo_create_vm(struct amdgpu_device *adev, struct amdgpu_bo_vm **ubo_ptr); void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr, void **cpu_addr); +void amdgpu_bo_free_isp_user(struct amdgpu_bo *bo); int amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr); void *amdgpu_bo_kptr(struct amdgpu_bo *bo); void amdgpu_bo_kunmap(struct amdgpu_bo *bo); @@ -337,8 +342,7 @@ int amdgpu_sa_bo_manager_start(struct amdgpu_device *adev, int amdgpu_sa_bo_new(struct amdgpu_sa_manager *sa_manager, struct drm_suballoc **sa_bo, unsigned int size); -void amdgpu_sa_bo_free(struct amdgpu_device *adev, - struct drm_suballoc **sa_bo, +void amdgpu_sa_bo_free(struct drm_suballoc **sa_bo, struct dma_fence *fence); #if defined(CONFIG_DEBUG_FS) void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager *sa_manager, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index 448f9e742983f3ef0c5fccc18d85f0c2449aa08e..3c1b08441a6f503d007644289ff8325357a2788b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -208,6 +208,7 @@ static int psp_early_init(struct amdgpu_ip_block *ip_block) psp->boot_time_tmr = false; fallthrough; case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): psp_v13_0_set_psp_funcs(psp); psp->autoload_supported = false; @@ -359,6 +360,7 @@ static bool psp_get_runtime_db_entry(struct amdgpu_device *adev, int i; if (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) || + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) || amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14)) return false; @@ -870,6 +872,7 @@ static bool psp_skip_tmr(struct psp_context *psp) case IP_VERSION(13, 0, 2): case IP_VERSION(13, 0, 6): case IP_VERSION(13, 0, 10): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): return true; default: @@ -2264,7 +2267,8 @@ int psp_securedisplay_invoke(struct psp_context *psp, uint32_t ta_cmd_id) return -EINVAL; if (ta_cmd_id != TA_SECUREDISPLAY_COMMAND__QUERY_TA && - ta_cmd_id != TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC) + ta_cmd_id != TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC && + ta_cmd_id != TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC_V2) return -EINVAL; ret = psp_ta_invoke(psp, ta_cmd_id, &psp->securedisplay_context.context); @@ -2385,6 +2389,15 @@ static int psp_hw_start(struct psp_context *psp) } } + if ((is_psp_fw_valid(psp->spdm_drv)) && + (psp->funcs->bootloader_load_spdm_drv != NULL)) { + ret = psp_bootloader_load_spdm_drv(psp); + if (ret) { + dev_err(adev->dev, "PSP load spdm_drv failed!\n"); + return ret; + } + } + if ((is_psp_fw_valid(psp->sos)) && (psp->funcs->bootloader_load_sos != NULL)) { ret = psp_bootloader_load_sos(psp); @@ -3289,7 +3302,8 @@ int psp_init_asd_microcode(struct psp_context *psp, const char *chip_name) const struct psp_firmware_header_v1_0 *asd_hdr; int err = 0; - err = amdgpu_ucode_request(adev, &adev->psp.asd_fw, "amdgpu/%s_asd.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->psp.asd_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_asd.bin", chip_name); if (err) goto out; @@ -3311,7 +3325,8 @@ int psp_init_toc_microcode(struct psp_context *psp, const char *chip_name) const struct psp_firmware_header_v1_0 *toc_hdr; int err = 0; - err = amdgpu_ucode_request(adev, &adev->psp.toc_fw, "amdgpu/%s_toc.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->psp.toc_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_toc.bin", chip_name); if (err) goto out; @@ -3407,6 +3422,12 @@ static int parse_sos_bin_descriptor(struct psp_context *psp, psp->ipkeymgr_drv.size_bytes = le32_to_cpu(desc->size_bytes); psp->ipkeymgr_drv.start_addr = ucode_start_addr; break; + case PSP_FW_TYPE_PSP_SPDM_DRV: + psp->spdm_drv.fw_version = le32_to_cpu(desc->fw_version); + psp->spdm_drv.feature_version = le32_to_cpu(desc->fw_version); + psp->spdm_drv.size_bytes = le32_to_cpu(desc->size_bytes); + psp->spdm_drv.start_addr = ucode_start_addr; + break; default: dev_warn(psp->adev->dev, "Unsupported PSP FW type: %d\n", desc->fw_type); break; @@ -3474,7 +3495,8 @@ int psp_init_sos_microcode(struct psp_context *psp, const char *chip_name) uint8_t *ucode_array_start_addr; int err = 0; - err = amdgpu_ucode_request(adev, &adev->psp.sos_fw, "amdgpu/%s_sos.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->psp.sos_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_sos.bin", chip_name); if (err) goto out; @@ -3750,7 +3772,8 @@ int psp_init_ta_microcode(struct psp_context *psp, const char *chip_name) struct amdgpu_device *adev = psp->adev; int err; - err = amdgpu_ucode_request(adev, &adev->psp.ta_fw, "amdgpu/%s_ta.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->psp.ta_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_ta.bin", chip_name); if (err) return err; @@ -3785,7 +3808,8 @@ int psp_init_cap_microcode(struct psp_context *psp, const char *chip_name) return -EINVAL; } - err = amdgpu_ucode_request(adev, &adev->psp.cap_fw, "amdgpu/%s_cap.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->psp.cap_fw, AMDGPU_UCODE_OPTIONAL, + "amdgpu/%s_cap.bin", chip_name); if (err) { if (err == -ENODEV) { dev_warn(adev->dev, "cap microcode does not exist, skip\n"); @@ -3849,13 +3873,13 @@ int psp_config_sq_perfmon(struct psp_context *psp, return ret; } -static int psp_set_clockgating_state(void *handle, +static int psp_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int psp_set_powergating_state(void *handle, +static int psp_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -3908,7 +3932,8 @@ static ssize_t psp_usbc_pd_fw_sysfs_write(struct device *dev, if (!drm_dev_enter(ddev, &idx)) return -ENODEV; - ret = amdgpu_ucode_request(adev, &usbc_pd_fw, "amdgpu/%s", buf); + ret = amdgpu_ucode_request(adev, &usbc_pd_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s", buf); if (ret) goto fail; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h index 567cb1f924ca8992cc2630efea3a818abf418b25..8d5acc415d386b48e485dbb5463ad733ea1583c1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h @@ -80,6 +80,7 @@ enum psp_bootloader_cmd { PSP_BL__DRAM_LONG_TRAIN = 0x100000, PSP_BL__DRAM_SHORT_TRAIN = 0x200000, PSP_BL__LOAD_TOS_SPL_TABLE = 0x10000000, + PSP_BL__LOAD_SPDMDRV = 0x20000000, }; enum psp_ring_type { @@ -120,6 +121,7 @@ struct psp_funcs { int (*bootloader_load_dbg_drv)(struct psp_context *psp); int (*bootloader_load_ras_drv)(struct psp_context *psp); int (*bootloader_load_ipkeymgr_drv)(struct psp_context *psp); + int (*bootloader_load_spdm_drv)(struct psp_context *psp); int (*bootloader_load_sos)(struct psp_context *psp); int (*ring_create)(struct psp_context *psp, enum psp_ring_type ring_type); @@ -343,6 +345,7 @@ struct psp_context { struct psp_bin_desc dbg_drv; struct psp_bin_desc ras_drv; struct psp_bin_desc ipkeymgr_drv; + struct psp_bin_desc spdm_drv; /* tmr buffer */ struct amdgpu_bo *tmr_bo; @@ -434,6 +437,9 @@ struct amdgpu_psp_funcs { #define psp_bootloader_load_ipkeymgr_drv(psp) \ ((psp)->funcs->bootloader_load_ipkeymgr_drv ? \ (psp)->funcs->bootloader_load_ipkeymgr_drv((psp)) : 0) +#define psp_bootloader_load_spdm_drv(psp) \ + ((psp)->funcs->bootloader_load_spdm_drv ? \ + (psp)->funcs->bootloader_load_spdm_drv((psp)) : 0) #define psp_bootloader_load_sos(psp) \ ((psp)->funcs->bootloader_load_sos ? (psp)->funcs->bootloader_load_sos((psp)) : 0) #define psp_smu_reload_quirk(psp) \ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 4c9fa24dd9726a405935907524ed7bf7862779d1..01c947066a2eb02d2e2848a3943c918147146b0e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -36,6 +36,7 @@ #include "amdgpu_xgmi.h" #include "ivsrcid/nbio/irqsrcs_nbif_7_4.h" #include "nbio_v4_3.h" +#include "nbif_v6_3_1.h" #include "nbio_v7_9.h" #include "atom.h" #include "amdgpu_reset.h" @@ -192,7 +193,7 @@ static int amdgpu_reserve_page_direct(struct amdgpu_device *adev, uint64_t addre if (amdgpu_bad_page_threshold != 0) { amdgpu_ras_add_bad_pages(adev, err_data.err_addr, - err_data.err_addr_cnt); + err_data.err_addr_cnt, false); amdgpu_ras_save_bad_pages(adev, NULL); } @@ -2015,6 +2016,7 @@ static bool amdgpu_ras_aca_is_supported(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, MP0_HWIP, 0)) { case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): ret = true; break; @@ -2156,6 +2158,16 @@ void amdgpu_ras_interrupt_fatal_error_handler(struct amdgpu_device *adev) /* Fatal error events are handled on host side */ if (amdgpu_sriov_vf(adev)) return; + /** + * If the current interrupt is caused by a non-fatal RAS error, skip + * check for fatal error. For fatal errors, FED status of all devices + * in XGMI hive gets set when the first device gets fatal error + * interrupt. The error gets propagated to other devices as well, so + * make sure to ack the interrupt regardless of FED status. + */ + if (!amdgpu_ras_get_fed_status(adev) && + amdgpu_ras_is_err_state(adev, AMDGPU_RAS_BLOCK__ANY)) + return; if (adev->nbio.ras && adev->nbio.ras->handle_ras_controller_intr_no_bifring) @@ -2185,6 +2197,7 @@ static void amdgpu_ras_interrupt_poison_consumption_handler(struct ras_manager * if (ret) return; + amdgpu_ras_set_err_poison(adev, block_obj->ras_comm.block); /* both query_poison_status and handle_poison_consumption are optional, * but at least one of them should be implemented if we need poison * consumption handler @@ -2717,40 +2730,192 @@ static int amdgpu_ras_realloc_eh_data_space(struct amdgpu_device *adev, return 0; } +static int amdgpu_ras_mca2pa_by_idx(struct amdgpu_device *adev, + struct eeprom_table_record *bps, + struct ras_err_data *err_data) +{ + struct ta_ras_query_address_input addr_in; + uint32_t socket = 0; + int ret = 0; + + if (adev->smuio.funcs && adev->smuio.funcs->get_socket_id) + socket = adev->smuio.funcs->get_socket_id(adev); + + /* reinit err_data */ + err_data->err_addr_cnt = 0; + err_data->err_addr_len = adev->umc.retire_unit; + + memset(&addr_in, 0, sizeof(addr_in)); + addr_in.ma.err_addr = bps->address; + addr_in.ma.socket_id = socket; + addr_in.ma.ch_inst = bps->mem_channel; + /* tell RAS TA the node instance is not used */ + addr_in.ma.node_inst = TA_RAS_INV_NODE; + + if (adev->umc.ras && adev->umc.ras->convert_ras_err_addr) + ret = adev->umc.ras->convert_ras_err_addr(adev, err_data, + &addr_in, NULL, false); + + return ret; +} + +static int amdgpu_ras_mca2pa(struct amdgpu_device *adev, + struct eeprom_table_record *bps, + struct ras_err_data *err_data) +{ + struct ta_ras_query_address_input addr_in; + uint32_t die_id, socket = 0; + + if (adev->smuio.funcs && adev->smuio.funcs->get_socket_id) + socket = adev->smuio.funcs->get_socket_id(adev); + + /* although die id is gotten from PA in nps1 mode, the id is + * fitable for any nps mode + */ + if (adev->umc.ras && adev->umc.ras->get_die_id_from_pa) + die_id = adev->umc.ras->get_die_id_from_pa(adev, bps->address, + bps->retired_page << AMDGPU_GPU_PAGE_SHIFT); + else + return -EINVAL; + + /* reinit err_data */ + err_data->err_addr_cnt = 0; + err_data->err_addr_len = adev->umc.retire_unit; + + memset(&addr_in, 0, sizeof(addr_in)); + addr_in.ma.err_addr = bps->address; + addr_in.ma.ch_inst = bps->mem_channel; + addr_in.ma.umc_inst = bps->mcumc_id; + addr_in.ma.node_inst = die_id; + addr_in.ma.socket_id = socket; + + if (adev->umc.ras && adev->umc.ras->convert_ras_err_addr) + return adev->umc.ras->convert_ras_err_addr(adev, err_data, + &addr_in, NULL, false); + else + return -EINVAL; +} + /* it deal with vram only. */ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev, - struct eeprom_table_record *bps, int pages) + struct eeprom_table_record *bps, int pages, bool from_rom) { struct amdgpu_ras *con = amdgpu_ras_get_context(adev); struct ras_err_handler_data *data; + struct ras_err_data err_data; + struct eeprom_table_record *err_rec; + struct amdgpu_ras_eeprom_control *control = + &adev->psp.ras_context.ras->eeprom_control; + enum amdgpu_memory_partition nps = AMDGPU_NPS1_PARTITION_MODE; int ret = 0; - uint32_t i; + uint32_t i, j, loop_cnt = 1; + bool find_pages_per_pa = false; if (!con || !con->eh_data || !bps || pages <= 0) return 0; + if (from_rom) { + err_data.err_addr = + kcalloc(adev->umc.retire_unit, + sizeof(struct eeprom_table_record), GFP_KERNEL); + if (!err_data.err_addr) { + dev_warn(adev->dev, "Failed to alloc UMC error address record in mca2pa conversion!\n"); + ret = -ENOMEM; + goto out; + } + + err_rec = err_data.err_addr; + loop_cnt = adev->umc.retire_unit; + if (adev->gmc.gmc_funcs->query_mem_partition_mode) + nps = adev->gmc.gmc_funcs->query_mem_partition_mode(adev); + } + mutex_lock(&con->recovery_lock); data = con->eh_data; if (!data) - goto out; + goto free; for (i = 0; i < pages; i++) { - if (amdgpu_ras_check_bad_page_unlock(con, - bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT)) - continue; + if (from_rom && + control->rec_type == AMDGPU_RAS_EEPROM_REC_MCA) { + if (!find_pages_per_pa) { + if (amdgpu_ras_mca2pa_by_idx(adev, &bps[i], &err_data)) { + if (!i && nps == AMDGPU_NPS1_PARTITION_MODE) { + /* may use old RAS TA, use PA to find pages in + * one row + */ + if (amdgpu_umc_pages_in_a_row(adev, &err_data, + bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT)) + goto free; + else + find_pages_per_pa = true; + } else { + /* unsupported cases */ + goto free; + } + } + } else { + if (amdgpu_umc_pages_in_a_row(adev, &err_data, + bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT)) + goto free; + } + } else { + if (from_rom && !find_pages_per_pa) { + if (bps[i].retired_page & UMC_CHANNEL_IDX_V2) { + /* bad page in any NPS mode in eeprom */ + if (amdgpu_ras_mca2pa_by_idx(adev, &bps[i], &err_data)) + goto free; + } else { + /* legacy bad page in eeprom, generated only in + * NPS1 mode + */ + if (amdgpu_ras_mca2pa(adev, &bps[i], &err_data)) { + /* old RAS TA or ASICs which don't support to + * convert addrss via mca address + */ + if (!i && nps == AMDGPU_NPS1_PARTITION_MODE) { + find_pages_per_pa = true; + err_rec = &bps[i]; + loop_cnt = 1; + } else { + /* non-nps1 mode, old RAS TA + * can't support it + */ + goto free; + } + } + } - if (!data->space_left && - amdgpu_ras_realloc_eh_data_space(adev, data, 256)) { - ret = -ENOMEM; - goto out; + if (!find_pages_per_pa) + i += (adev->umc.retire_unit - 1); + } else { + err_rec = &bps[i]; + } } - amdgpu_ras_reserve_page(adev, bps[i].retired_page); + for (j = 0; j < loop_cnt; j++) { + if (amdgpu_ras_check_bad_page_unlock(con, + err_rec[j].retired_page << AMDGPU_GPU_PAGE_SHIFT)) + continue; + + if (!data->space_left && + amdgpu_ras_realloc_eh_data_space(adev, data, 256)) { + ret = -ENOMEM; + goto free; + } + + amdgpu_ras_reserve_page(adev, err_rec[j].retired_page); - memcpy(&data->bps[data->count], &bps[i], sizeof(*data->bps)); - data->count++; - data->space_left--; + memcpy(&data->bps[data->count], &(err_rec[j]), + sizeof(struct eeprom_table_record)); + data->count++; + data->space_left--; + } } + +free: + if (from_rom) + kfree(err_data.err_addr); out: mutex_unlock(&con->recovery_lock); @@ -2768,7 +2933,7 @@ int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev, struct amdgpu_ras *con = amdgpu_ras_get_context(adev); struct ras_err_handler_data *data; struct amdgpu_ras_eeprom_control *control; - int save_count; + int save_count, unit_num, bad_page_num, i; if (!con || !con->eh_data) { if (new_cnt) @@ -2780,19 +2945,32 @@ int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev, mutex_lock(&con->recovery_lock); control = &con->eeprom_control; data = con->eh_data; - save_count = data->count - control->ras_num_recs; + bad_page_num = control->ras_num_bad_pages; + save_count = data->count - bad_page_num; mutex_unlock(&con->recovery_lock); + unit_num = save_count / adev->umc.retire_unit; if (new_cnt) - *new_cnt = save_count / adev->umc.retire_unit; + *new_cnt = unit_num; /* only new entries are saved */ if (save_count > 0) { - if (amdgpu_ras_eeprom_append(control, - &data->bps[control->ras_num_recs], - save_count)) { - dev_err(adev->dev, "Failed to save EEPROM table data!"); - return -EIO; + if (control->rec_type == AMDGPU_RAS_EEPROM_REC_PA) { + if (amdgpu_ras_eeprom_append(control, + &data->bps[control->ras_num_recs], + save_count)) { + dev_err(adev->dev, "Failed to save EEPROM table data!"); + return -EIO; + } + } else { + for (i = 0; i < unit_num; i++) { + if (amdgpu_ras_eeprom_append(control, + &data->bps[bad_page_num + i * adev->umc.retire_unit], + 1)) { + dev_err(adev->dev, "Failed to save EEPROM table data!"); + return -EIO; + } + } } dev_info(adev->dev, "Saved %d pages to EEPROM table.\n", save_count); @@ -2821,11 +2999,32 @@ static int amdgpu_ras_load_bad_pages(struct amdgpu_device *adev) return -ENOMEM; ret = amdgpu_ras_eeprom_read(control, bps, control->ras_num_recs); - if (ret) + if (ret) { dev_err(adev->dev, "Failed to load EEPROM table records!"); - else - ret = amdgpu_ras_add_bad_pages(adev, bps, control->ras_num_recs); + } else { + if (control->ras_num_recs > 1 && + adev->umc.ras && adev->umc.ras->convert_ras_err_addr) { + if ((bps[0].address == bps[1].address) && + (bps[0].mem_channel == bps[1].mem_channel)) + control->rec_type = AMDGPU_RAS_EEPROM_REC_PA; + else + control->rec_type = AMDGPU_RAS_EEPROM_REC_MCA; + } + + ret = amdgpu_ras_eeprom_check(control); + if (ret) + goto out; + + /* HW not usable */ + if (amdgpu_ras_is_rma(adev)) { + ret = -EHWPOISON; + goto out; + } + + ret = amdgpu_ras_add_bad_pages(adev, bps, control->ras_num_recs, true); + } +out: kfree(bps); return ret; } @@ -3205,31 +3404,36 @@ static int amdgpu_ras_page_retirement_thread(void *param) int amdgpu_ras_init_badpage_info(struct amdgpu_device *adev) { struct amdgpu_ras *con = amdgpu_ras_get_context(adev); + struct amdgpu_ras_eeprom_control *control; int ret; if (!con || amdgpu_sriov_vf(adev)) return 0; - ret = amdgpu_ras_eeprom_init(&con->eeprom_control); - + control = &con->eeprom_control; + ret = amdgpu_ras_eeprom_init(control); if (ret) return ret; - /* HW not usable */ - if (amdgpu_ras_is_rma(adev)) - return -EHWPOISON; + if (!adev->umc.ras || !adev->umc.ras->convert_ras_err_addr) + control->rec_type = AMDGPU_RAS_EEPROM_REC_PA; + + /* default status is MCA storage */ + if (control->ras_num_recs <= 1 && + adev->umc.ras && adev->umc.ras->convert_ras_err_addr) + control->rec_type = AMDGPU_RAS_EEPROM_REC_MCA; - if (con->eeprom_control.ras_num_recs) { + if (control->ras_num_recs) { ret = amdgpu_ras_load_bad_pages(adev); if (ret) return ret; amdgpu_dpm_send_hbm_bad_pages_num( - adev, con->eeprom_control.ras_num_recs); + adev, control->ras_num_bad_pages); if (con->update_channel_flag == true) { amdgpu_dpm_send_hbm_bad_channel_flag( - adev, con->eeprom_control.bad_channel_bitmap); + adev, control->bad_channel_bitmap); con->update_channel_flag = false; } } @@ -3366,6 +3570,7 @@ static bool amdgpu_ras_asic_supported(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, MP0_HWIP, 0)) { case IP_VERSION(13, 0, 2): case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): return true; default: @@ -3378,7 +3583,9 @@ static bool amdgpu_ras_asic_supported(struct amdgpu_device *adev) case IP_VERSION(13, 0, 0): case IP_VERSION(13, 0, 6): case IP_VERSION(13, 0, 10): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): + case IP_VERSION(14, 0, 3): return true; default: return false; @@ -3629,6 +3836,7 @@ static void amdgpu_ras_init_reserved_vram_size(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, MP0_HWIP, 0)) { case IP_VERSION(13, 0, 2): case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): con->reserved_pages_in_bytes = AMDGPU_RAS_RESERVED_VRAM_SIZE; break; @@ -3704,7 +3912,19 @@ int amdgpu_ras_init(struct amdgpu_device *adev) * check DF RAS */ adev->nbio.ras = &nbio_v4_3_ras; break; + case IP_VERSION(6, 3, 1): + if (adev->ras_hw_enabled & (1 << AMDGPU_RAS_BLOCK__DF)) + /* unlike other generation of nbio ras, + * nbif v6_3_1 only support fatal error interrupt + * to inform software that DF is freezed due to + * system fatal error event. driver should not + * enable nbio ras in such case. Instead, + * check DF RAS + */ + adev->nbio.ras = &nbif_v6_3_1_ras; + break; case IP_VERSION(7, 9, 0): + case IP_VERSION(7, 9, 1): if (!adev->gmc.is_app_apu) adev->nbio.ras = &nbio_v7_9_ras; break; @@ -4083,16 +4303,56 @@ bool amdgpu_ras_get_fed_status(struct amdgpu_device *adev) if (!ras) return false; - return atomic_read(&ras->fed); + return test_bit(AMDGPU_RAS_BLOCK__LAST, &ras->ras_err_state); } void amdgpu_ras_set_fed(struct amdgpu_device *adev, bool status) { struct amdgpu_ras *ras; + ras = amdgpu_ras_get_context(adev); + if (ras) { + if (status) + set_bit(AMDGPU_RAS_BLOCK__LAST, &ras->ras_err_state); + else + clear_bit(AMDGPU_RAS_BLOCK__LAST, &ras->ras_err_state); + } +} + +void amdgpu_ras_clear_err_state(struct amdgpu_device *adev) +{ + struct amdgpu_ras *ras; + ras = amdgpu_ras_get_context(adev); if (ras) - atomic_set(&ras->fed, !!status); + ras->ras_err_state = 0; +} + +void amdgpu_ras_set_err_poison(struct amdgpu_device *adev, + enum amdgpu_ras_block block) +{ + struct amdgpu_ras *ras; + + ras = amdgpu_ras_get_context(adev); + if (ras) + set_bit(block, &ras->ras_err_state); +} + +bool amdgpu_ras_is_err_state(struct amdgpu_device *adev, int block) +{ + struct amdgpu_ras *ras; + + ras = amdgpu_ras_get_context(adev); + if (ras) { + if (block == AMDGPU_RAS_BLOCK__ANY) + return (ras->ras_err_state != 0); + else + return test_bit(block, &ras->ras_err_state) || + test_bit(AMDGPU_RAS_BLOCK__LAST, + &ras->ras_err_state); + } + + return false; } static struct ras_event_manager *__get_ras_event_mgr(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h index 6db772ecfee47f4b76ba87e61e508233494cca8c..82db986c36a0a83b53570e7a209521531431d1ab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h @@ -99,7 +99,8 @@ enum amdgpu_ras_block { AMDGPU_RAS_BLOCK__IH, AMDGPU_RAS_BLOCK__MPIO, - AMDGPU_RAS_BLOCK__LAST + AMDGPU_RAS_BLOCK__LAST, + AMDGPU_RAS_BLOCK__ANY = -1 }; enum amdgpu_ras_mca_block { @@ -482,6 +483,8 @@ struct ras_ecc_err { uint64_t ipid; uint64_t addr; uint64_t pa_pfn; + /* save global channel index across all UMC instances */ + uint32_t channel_idx; struct ras_err_pages err_pages; }; @@ -558,8 +561,8 @@ struct amdgpu_ras { struct ras_ecc_log_info umc_ecc_log; struct delayed_work page_retirement_dwork; - /* Fatal error detected flag */ - atomic_t fed; + /* ras errors detected */ + unsigned long ras_err_state; /* RAS event manager */ struct ras_event_manager __event_mgr; @@ -750,7 +753,7 @@ int amdgpu_ras_query_error_count(struct amdgpu_device *adev, /* error handling functions */ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev, - struct eeprom_table_record *bps, int pages); + struct eeprom_table_record *bps, int pages, bool from_rom); int amdgpu_ras_save_bad_pages(struct amdgpu_device *adev, unsigned long *new_cnt); @@ -952,6 +955,10 @@ ssize_t amdgpu_ras_aca_sysfs_read(struct device *dev, struct device_attribute *a void amdgpu_ras_set_fed(struct amdgpu_device *adev, bool status); bool amdgpu_ras_get_fed_status(struct amdgpu_device *adev); +void amdgpu_ras_set_err_poison(struct amdgpu_device *adev, + enum amdgpu_ras_block block); +void amdgpu_ras_clear_err_state(struct amdgpu_device *adev); +bool amdgpu_ras_is_err_state(struct amdgpu_device *adev, int block); u64 amdgpu_ras_acquire_event_id(struct amdgpu_device *adev, enum ras_event_type type); int amdgpu_ras_mark_ras_event_caller(struct amdgpu_device *adev, enum ras_event_type type, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c index f28f6b4ba765d0022badbb5e2790f62d7b867e44..52c16bfeccaad88330489283d2c7792e902554aa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c @@ -470,9 +470,10 @@ int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control) res = __write_table_ras_info(control); control->ras_num_recs = 0; + control->ras_num_bad_pages = 0; control->ras_fri = 0; - amdgpu_dpm_send_hbm_bad_pages_num(adev, control->ras_num_recs); + amdgpu_dpm_send_hbm_bad_pages_num(adev, control->ras_num_bad_pages); control->bad_channel_bitmap = 0; amdgpu_dpm_send_hbm_bad_channel_flag(adev, control->bad_channel_bitmap); @@ -559,7 +560,7 @@ bool amdgpu_ras_eeprom_check_err_threshold(struct amdgpu_device *adev) if (con->eeprom_control.tbl_hdr.header == RAS_TABLE_HDR_BAD) { if (amdgpu_bad_page_threshold == -1) { dev_warn(adev->dev, "RAS records:%d exceed threshold:%d", - con->eeprom_control.ras_num_recs, con->bad_page_cnt_threshold); + con->eeprom_control.ras_num_bad_pages, con->bad_page_cnt_threshold); dev_warn(adev->dev, "But GPU can be operated due to bad_page_threshold = -1.\n"); return false; @@ -621,6 +622,7 @@ amdgpu_ras_eeprom_append_table(struct amdgpu_ras_eeprom_control *control, const u32 num) { struct amdgpu_ras *con = amdgpu_ras_get_context(to_amdgpu_device(control)); + struct amdgpu_device *adev = to_amdgpu_device(control); u32 a, b, i; u8 *buf, *pp; int res; @@ -723,6 +725,12 @@ amdgpu_ras_eeprom_append_table(struct amdgpu_ras_eeprom_control *control, control->ras_num_recs = 1 + (control->ras_max_record_count + b - control->ras_fri) % control->ras_max_record_count; + + if (control->rec_type == AMDGPU_RAS_EEPROM_REC_PA) + control->ras_num_bad_pages = control->ras_num_recs; + else + control->ras_num_bad_pages = + control->ras_num_recs * adev->umc.retire_unit; Out: kfree(buf); return res; @@ -740,10 +748,10 @@ amdgpu_ras_eeprom_update_header(struct amdgpu_ras_eeprom_control *control) /* Modify the header if it exceeds. */ if (amdgpu_bad_page_threshold != 0 && - control->ras_num_recs >= ras->bad_page_cnt_threshold) { + control->ras_num_bad_pages >= ras->bad_page_cnt_threshold) { dev_warn(adev->dev, "Saved bad pages %d reaches threshold value %d\n", - control->ras_num_recs, ras->bad_page_cnt_threshold); + control->ras_num_bad_pages, ras->bad_page_cnt_threshold); control->tbl_hdr.header = RAS_TABLE_HDR_BAD; if (control->tbl_hdr.version == RAS_TABLE_VER_V2_1) { control->tbl_rai.rma_status = GPU_RETIRED__ECC_REACH_THRESHOLD; @@ -798,9 +806,9 @@ amdgpu_ras_eeprom_update_header(struct amdgpu_ras_eeprom_control *control) */ if (amdgpu_bad_page_threshold != 0 && control->tbl_hdr.version == RAS_TABLE_VER_V2_1 && - control->ras_num_recs < ras->bad_page_cnt_threshold) + control->ras_num_bad_pages < ras->bad_page_cnt_threshold) control->tbl_rai.health_percent = ((ras->bad_page_cnt_threshold - - control->ras_num_recs) * 100) / + control->ras_num_bad_pages) * 100) / ras->bad_page_cnt_threshold; /* Recalc the checksum. @@ -841,7 +849,7 @@ int amdgpu_ras_eeprom_append(struct amdgpu_ras_eeprom_control *control, const u32 num) { struct amdgpu_device *adev = to_amdgpu_device(control); - int res; + int res, i; if (!__is_ras_eeprom_supported(adev)) return 0; @@ -855,6 +863,10 @@ int amdgpu_ras_eeprom_append(struct amdgpu_ras_eeprom_control *control, return -EINVAL; } + /* set the new channel index flag */ + for (i = 0; i < num; i++) + record[i].retired_page |= UMC_CHANNEL_IDX_V2; + mutex_lock(&control->ras_tbl_mutex); res = amdgpu_ras_eeprom_append_table(control, record, num); @@ -864,6 +876,11 @@ int amdgpu_ras_eeprom_append(struct amdgpu_ras_eeprom_control *control, amdgpu_ras_debugfs_set_ret_size(control); mutex_unlock(&control->ras_tbl_mutex); + + /* clear channel index flag, the flag is only saved on eeprom */ + for (i = 0; i < num; i++) + record[i].retired_page &= ~UMC_CHANNEL_IDX_V2; + return res; } @@ -1373,9 +1390,35 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control) } control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset); + return 0; +} + +int amdgpu_ras_eeprom_check(struct amdgpu_ras_eeprom_control *control) +{ + struct amdgpu_device *adev = to_amdgpu_device(control); + struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr; + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); + int res; + + if (!__is_ras_eeprom_supported(adev)) + return 0; + + /* Verify i2c adapter is initialized */ + if (!adev->pm.ras_eeprom_i2c_bus || !adev->pm.ras_eeprom_i2c_bus->algo) + return -ENOENT; + + if (!__get_eeprom_i2c_addr(adev, control)) + return -EINVAL; + + if (control->rec_type == AMDGPU_RAS_EEPROM_REC_PA) + control->ras_num_bad_pages = control->ras_num_recs; + else + control->ras_num_bad_pages = + control->ras_num_recs * adev->umc.retire_unit; + if (hdr->header == RAS_TABLE_HDR_VAL) { DRM_DEBUG_DRIVER("Found existing EEPROM table with %d records", - control->ras_num_recs); + control->ras_num_bad_pages); if (hdr->version == RAS_TABLE_VER_V2_1) { res = __read_table_ras_info(control); @@ -1390,9 +1433,9 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control) /* Warn if we are at 90% of the threshold or above */ - if (10 * control->ras_num_recs >= 9 * ras->bad_page_cnt_threshold) + if (10 * control->ras_num_bad_pages >= 9 * ras->bad_page_cnt_threshold) dev_warn(adev->dev, "RAS records:%u exceeds 90%% of threshold:%d", - control->ras_num_recs, + control->ras_num_bad_pages, ras->bad_page_cnt_threshold); } else if (hdr->header == RAS_TABLE_HDR_BAD && amdgpu_bad_page_threshold != 0) { @@ -1403,10 +1446,12 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control) } res = __verify_ras_table_checksum(control); - if (res) - DRM_ERROR("RAS Table incorrect checksum or error:%d\n", + if (res) { + dev_err(adev->dev, "RAS Table incorrect checksum or error:%d\n", res); - if (ras->bad_page_cnt_threshold > control->ras_num_recs) { + return -EINVAL; + } + if (ras->bad_page_cnt_threshold > control->ras_num_bad_pages) { /* This means that, the threshold was increased since * the last time the system was booted, and now, * ras->bad_page_cnt_threshold - control->num_recs > 0, @@ -1416,13 +1461,13 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control) dev_info(adev->dev, "records:%d threshold:%d, resetting " "RAS table header signature", - control->ras_num_recs, + control->ras_num_bad_pages, ras->bad_page_cnt_threshold); res = amdgpu_ras_eeprom_correct_header_tag(control, RAS_TABLE_HDR_VAL); } else { dev_err(adev->dev, "RAS records:%d exceed threshold:%d", - control->ras_num_recs, ras->bad_page_cnt_threshold); + control->ras_num_bad_pages, ras->bad_page_cnt_threshold); if (amdgpu_bad_page_threshold == -1) { dev_warn(adev->dev, "GPU will be initialized due to bad_page_threshold = -1."); res = 0; @@ -1431,7 +1476,7 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control) dev_err(adev->dev, "RAS records:%d exceed threshold:%d, " "GPU will not be initialized. Replace this GPU or increase the threshold", - control->ras_num_recs, ras->bad_page_cnt_threshold); + control->ras_num_bad_pages, ras->bad_page_cnt_threshold); } } } else { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h index b9ebda577797dc12ed6f0af1c74ecca5db4818b5..81d55cb7b397f6538fb387fbe0287870c3509f0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h @@ -43,6 +43,19 @@ enum amdgpu_ras_eeprom_err_type { AMDGPU_RAS_EEPROM_ERR_COUNT, }; +/* + * one UMC MCA address could map to multiply physical address (PA), + * such as 1:16, we use eeprom_table_record.address to store MCA + * address and use eeprom_table_record.retired_page to save PA. + * + * AMDGPU_RAS_EEPROM_REC_PA: one record store one PA + * AMDGPU_RAS_EEPROM_REC_MCA: one record store one MCA address + */ +enum amdgpu_ras_eeprom_rec_type { + AMDGPU_RAS_EEPROM_REC_PA, + AMDGPU_RAS_EEPROM_REC_MCA, +}; + struct amdgpu_ras_eeprom_table_header { uint32_t header; uint32_t version; @@ -82,6 +95,11 @@ struct amdgpu_ras_eeprom_control { */ u32 ras_num_recs; + /* the bad page number is ras_num_recs or + * ras_num_recs * umc.retire_unit + */ + u32 ras_num_bad_pages; + /* First record index to read, 0-based. * Range is [0, num_recs-1]. This is * an absolute index, starting right after @@ -102,6 +120,7 @@ struct amdgpu_ras_eeprom_control { /* Record channel info which occurred bad pages */ u32 bad_channel_bitmap; + enum amdgpu_ras_eeprom_rec_type rec_type; }; /* @@ -145,6 +164,8 @@ uint32_t amdgpu_ras_eeprom_max_record_count(struct amdgpu_ras_eeprom_control *co void amdgpu_ras_debugfs_set_ret_size(struct amdgpu_ras_eeprom_control *control); +int amdgpu_ras_eeprom_check(struct amdgpu_ras_eeprom_control *control); + extern const struct file_operations amdgpu_ras_debugfs_eeprom_size_ops; extern const struct file_operations amdgpu_ras_debugfs_eeprom_table_ops; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index a0acb65f4b40afbcd9d1a305b4893aa66fadfde7..dabfbdf6f1ce670646803a9b1d284bcf8e46925d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -183,6 +183,7 @@ int amdgpu_reset_init(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) { case IP_VERSION(13, 0, 2): case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): ret = aldebaran_reset_init(adev); break; @@ -206,6 +207,7 @@ int amdgpu_reset_fini(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) { case IP_VERSION(13, 0, 2): case IP_VERSION(13, 0, 6): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): ret = aldebaran_reset_fini(adev); break; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 36fc9578c53c03433226bd36f6c0d9af20777bd5..dee5a1b4e572136a8afd973f569e8c2bbd90da65 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -462,8 +462,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm, unsigned size, enum amdgpu_ib_pool_type pool, struct amdgpu_ib *ib); -void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib, - struct dma_fence *f); +void amdgpu_ib_free(struct amdgpu_ib *ib, struct dma_fence *f); int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, struct amdgpu_ib *ibs, struct amdgpu_job *job, struct dma_fence **f); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c index 10df731998b22f873611092942008560ec67394d..39070b2a4c04fdda0ea29c4329192d453815908f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c @@ -93,8 +93,7 @@ int amdgpu_sa_bo_new(struct amdgpu_sa_manager *sa_manager, return 0; } -void amdgpu_sa_bo_free(struct amdgpu_device *adev, struct drm_suballoc **sa_bo, - struct dma_fence *fence) +void amdgpu_sa_bo_free(struct drm_suballoc **sa_bo, struct dma_fence *fence) { if (sa_bo == NULL || *sa_bo == NULL) { return; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c index 113f0d2426187e75a8d184f466ca32ebeaaa62a5..632295bf3875394c7b69935fc62b3803abc83651 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c @@ -219,9 +219,11 @@ int amdgpu_sdma_init_microcode(struct amdgpu_device *adev, amdgpu_ucode_ip_version_decode(adev, SDMA0_HWIP, ucode_prefix, sizeof(ucode_prefix)); if (instance == 0) err = amdgpu_ucode_request(adev, &adev->sdma.instance[instance].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s.bin", ucode_prefix); else err = amdgpu_ucode_request(adev, &adev->sdma.instance[instance].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s%d.bin", ucode_prefix, instance); if (err) goto out; @@ -260,6 +262,8 @@ int amdgpu_sdma_init_microcode(struct amdgpu_device *adev, * groups of SDMAs */ if ((amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 2) || + amdgpu_ip_version(adev, SDMA0_HWIP, 0) == + IP_VERSION(4, 4, 4) || amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 5)) && adev->firmware.load_type == diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index c8180cad0abdd8c8aaa61b6376fd0fcf7f84a72b..2d85f1ec3041abe691157787d711a9494bea2230 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1762,7 +1762,8 @@ static int amdgpu_ttm_reserve_tmr(struct amdgpu_device *adev) if (!adev->bios && (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4))) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0))) reserve_size = max(reserve_size, (uint32_t)280 << 20); else if (!reserve_size) reserve_size = DISCOVERY_TMR_OFFSET; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c index 4c7b53648a507afcb49bff8ddba487fa026bc4fa..cf700824b960b7ace8aa43f677ee64f3490ecd23 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c @@ -1434,6 +1434,7 @@ void amdgpu_ucode_ip_version_decode(struct amdgpu_device *adev, int block_type, * * @adev: amdgpu device * @fw: pointer to load firmware to + * @required: whether the firmware is required * @fmt: firmware name format string * @...: variable arguments * @@ -1442,7 +1443,7 @@ void amdgpu_ucode_ip_version_decode(struct amdgpu_device *adev, int block_type, * the error code to -ENODEV, so that early_init functions will fail to load. */ int amdgpu_ucode_request(struct amdgpu_device *adev, const struct firmware **fw, - const char *fmt, ...) + enum amdgpu_ucode_required required, const char *fmt, ...) { char fname[AMDGPU_UCODE_NAME_MAX]; va_list ap; @@ -1456,16 +1457,24 @@ int amdgpu_ucode_request(struct amdgpu_device *adev, const struct firmware **fw, return -EOVERFLOW; } - r = request_firmware(fw, fname, adev->dev); + if (required == AMDGPU_UCODE_REQUIRED) + r = request_firmware(fw, fname, adev->dev); + else { + r = firmware_request_nowarn(fw, fname, adev->dev); + if (r) + drm_info(&adev->ddev, "Optional firmware \"%s\" was not found\n", fname); + } if (r) return -ENODEV; r = amdgpu_ucode_validate(*fw); - if (r) { + if (r) + /* + * The amdgpu_ucode_request() should be paired with amdgpu_ucode_release() + * regardless of success/failure, and the amdgpu_ucode_release() takes care of + * firmware release and need to avoid redundant release FW operation here. + */ dev_dbg(adev->dev, "\"%s\" failed to validate\n", fname); - release_firmware(*fw); - *fw = NULL; - } return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h index 4150ec0aa10d65ca66da5088c5d0ecbf43434283..4eedd92f000be9b1b9da5cf4f209fc6d89303e11 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h @@ -126,6 +126,7 @@ enum psp_fw_type { PSP_FW_TYPE_PSP_DBG_DRV, PSP_FW_TYPE_PSP_RAS_DRV, PSP_FW_TYPE_PSP_IPKEYMGR_DRV, + PSP_FW_TYPE_PSP_SPDM_DRV, PSP_FW_TYPE_MAX_INDEX, }; @@ -551,6 +552,11 @@ enum amdgpu_firmware_load_type { AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO, }; +enum amdgpu_ucode_required { + AMDGPU_UCODE_OPTIONAL, + AMDGPU_UCODE_REQUIRED, +}; + /* conform to smu_ucode_xfer_cz.h */ #define AMDGPU_SDMA0_UCODE_LOADED 0x00000001 #define AMDGPU_SDMA1_UCODE_LOADED 0x00000002 @@ -604,9 +610,9 @@ void amdgpu_ucode_print_rlc_hdr(const struct common_firmware_header *hdr); void amdgpu_ucode_print_sdma_hdr(const struct common_firmware_header *hdr); void amdgpu_ucode_print_psp_hdr(const struct common_firmware_header *hdr); void amdgpu_ucode_print_gpu_info_hdr(const struct common_firmware_header *hdr); -__printf(3, 4) +__printf(4, 5) int amdgpu_ucode_request(struct amdgpu_device *adev, const struct firmware **fw, - const char *fmt, ...); + enum amdgpu_ucode_required required, const char *fmt, ...); void amdgpu_ucode_release(const struct firmware **fw); bool amdgpu_ucode_hdr_version(union amdgpu_firmware_header *hdr, uint16_t hdr_major, uint16_t hdr_minor); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c index 896f3609b0eedd162ed67517caf64f76744434c2..eafe20d8fe0b65e81125f463325885305ce8dd4a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c @@ -78,7 +78,7 @@ int amdgpu_umc_page_retirement_mca(struct amdgpu_device *adev, if (amdgpu_bad_page_threshold != 0) { amdgpu_ras_add_bad_pages(adev, err_data.err_addr, - err_data.err_addr_cnt); + err_data.err_addr_cnt, false); amdgpu_ras_save_bad_pages(adev, NULL); } @@ -166,10 +166,11 @@ void amdgpu_umc_handle_bad_pages(struct amdgpu_device *adev, if ((amdgpu_bad_page_threshold != 0) && err_data->err_addr_cnt) { amdgpu_ras_add_bad_pages(adev, err_data->err_addr, - err_data->err_addr_cnt); + err_data->err_addr_cnt, false); amdgpu_ras_save_bad_pages(adev, &err_count); - amdgpu_dpm_send_hbm_bad_pages_num(adev, con->eeprom_control.ras_num_recs); + amdgpu_dpm_send_hbm_bad_pages_num(adev, + con->eeprom_control.ras_num_bad_pages); if (con->update_channel_flag == true) { amdgpu_dpm_send_hbm_bad_channel_flag(adev, con->eeprom_control.bad_channel_bitmap); @@ -444,3 +445,77 @@ int amdgpu_umc_logs_ecc_err(struct amdgpu_device *adev, return ret; } + +int amdgpu_umc_pages_in_a_row(struct amdgpu_device *adev, + struct ras_err_data *err_data, uint64_t pa_addr) +{ + struct ta_ras_query_address_output addr_out; + + /* reinit err_data */ + err_data->err_addr_cnt = 0; + err_data->err_addr_len = adev->umc.retire_unit; + + addr_out.pa.pa = pa_addr; + if (adev->umc.ras && adev->umc.ras->convert_ras_err_addr) + return adev->umc.ras->convert_ras_err_addr(adev, err_data, NULL, + &addr_out, false); + else + return -EINVAL; +} + +int amdgpu_umc_lookup_bad_pages_in_a_row(struct amdgpu_device *adev, + uint64_t pa_addr, uint64_t *pfns, int len) +{ + int i, ret; + struct ras_err_data err_data; + + err_data.err_addr = kcalloc(adev->umc.retire_unit, + sizeof(struct eeprom_table_record), GFP_KERNEL); + if (!err_data.err_addr) { + dev_warn(adev->dev, "Failed to alloc memory in bad page lookup!\n"); + return 0; + } + + ret = amdgpu_umc_pages_in_a_row(adev, &err_data, pa_addr); + if (ret) + goto out; + + for (i = 0; i < adev->umc.retire_unit; i++) { + if (i >= len) + goto out; + + pfns[i] = err_data.err_addr[i].retired_page; + } + ret = i; + +out: + kfree(err_data.err_addr); + return ret; +} + +int amdgpu_umc_mca_to_addr(struct amdgpu_device *adev, + uint64_t err_addr, uint32_t ch, uint32_t umc, + uint32_t node, uint32_t socket, + struct ta_ras_query_address_output *addr_out, bool dump_addr) +{ + struct ta_ras_query_address_input addr_in; + int ret; + + memset(&addr_in, 0, sizeof(addr_in)); + addr_in.ma.err_addr = err_addr; + addr_in.ma.ch_inst = ch; + addr_in.ma.umc_inst = umc; + addr_in.ma.node_inst = node; + addr_in.ma.socket_id = socket; + + if (adev->umc.ras && adev->umc.ras->convert_ras_err_addr) { + ret = adev->umc.ras->convert_ras_err_addr(adev, NULL, &addr_in, + addr_out, dump_addr); + if (ret) + return ret; + } else { + return 0; + } + + return 0; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h index ce4179db2a6d13f1c423cf95feff1a73e304d242..a4a7e61817aa7c32578630e5043ecd51d8f09209 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h @@ -54,6 +54,22 @@ /* Page retirement tag */ #define UMC_ECC_NEW_DETECTED_TAG 0x1 +/* + * a flag to indicate v2 of channel index stored in eeprom + * + * v1 (legacy way): store channel index within a umc instance in eeprom + * range in UMC v12: 0 ~ 7 + * v2: store global channel index in eeprom + * range in UMC v12: 0 ~ 127 + * + * NOTE: it's better to store it in eeprom_table_record.mem_channel, + * but there is only 8 bits in mem_channel, and the channel number may + * increase in the future, we decide to save it in + * eeprom_table_record.retired_page. retired_page is useless in v2, + * we depend on eeprom_table_record.address instead of retired_page in v2. + * Only 48 bits are saved on eeprom, use bit 47 here. + */ +#define UMC_CHANNEL_IDX_V2 BIT_ULL(47) typedef int (*umc_func)(struct amdgpu_device *adev, uint32_t node_inst, uint32_t umc_inst, uint32_t ch_inst, void *data); @@ -70,6 +86,13 @@ struct amdgpu_umc_ras { enum amdgpu_mca_error_type type, void *ras_error_status); int (*update_ecc_status)(struct amdgpu_device *adev, uint64_t status, uint64_t ipid, uint64_t addr); + int (*convert_ras_err_addr)(struct amdgpu_device *adev, + struct ras_err_data *err_data, + struct ta_ras_query_address_input *addr_in, + struct ta_ras_query_address_output *addr_out, + bool dump_addr); + uint32_t (*get_die_id_from_pa)(struct amdgpu_device *adev, + uint64_t mca_addr, uint64_t retired_page); }; struct amdgpu_umc_funcs { @@ -134,4 +157,12 @@ int amdgpu_umc_logs_ecc_err(struct amdgpu_device *adev, void amdgpu_umc_handle_bad_pages(struct amdgpu_device *adev, void *ras_error_status); +int amdgpu_umc_pages_in_a_row(struct amdgpu_device *adev, + struct ras_err_data *err_data, uint64_t pa_addr); +int amdgpu_umc_lookup_bad_pages_in_a_row(struct amdgpu_device *adev, + uint64_t pa_addr, uint64_t *pfns, int len); +int amdgpu_umc_mca_to_addr(struct amdgpu_device *adev, + uint64_t err_addr, uint32_t ch, uint32_t umc, + uint32_t node, uint32_t socket, + struct ta_ras_query_address_output *addr_out, bool dump_addr); #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c index bd2d3863c3ed1f03f2d5d9d0ae1a28a84e24e4be..dde15c6a96e1ae1939de40425af4fc85aa4b8b64 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c @@ -587,7 +587,8 @@ int amdgpu_umsch_mm_init_microcode(struct amdgpu_umsch_mm *umsch) break; } - r = amdgpu_ucode_request(adev, &adev->umsch_mm.fw, "%s", fw_name); + r = amdgpu_ucode_request(adev, &adev->umsch_mm.fw, AMDGPU_UCODE_REQUIRED, + "%s", fw_name); if (r) { release_firmware(adev->umsch_mm.fw); adev->umsch_mm.fw = NULL; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c index 31fd30dcd593bad8ed0f092314720591fc5fa1be..74758b5ffc6c8fb28081e7a5242c247b169a955f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c @@ -260,7 +260,7 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev) return -EINVAL; } - r = amdgpu_ucode_request(adev, &adev->uvd.fw, "%s", fw_name); + r = amdgpu_ucode_request(adev, &adev->uvd.fw, AMDGPU_UCODE_REQUIRED, "%s", fw_name); if (r) { dev_err(adev->dev, "amdgpu_uvd: Can't validate firmware \"%s\"\n", fw_name); @@ -551,6 +551,8 @@ static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo) for (i = 0; i < abo->placement.num_placement; ++i) { abo->placements[i].fpfn = 0 >> PAGE_SHIFT; abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT; + if (abo->placements[i].mem_type == TTM_PL_VRAM) + abo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS; } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index 599d3ca4e0ef9e20e55ae8a76840705f9f894326..b9060bcd48064d659bb930c57ed3730b4328b2f0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -158,7 +158,7 @@ int amdgpu_vce_sw_init(struct amdgpu_device *adev, unsigned long size) return -EINVAL; } - r = amdgpu_ucode_request(adev, &adev->vce.fw, "%s", fw_name); + r = amdgpu_ucode_request(adev, &adev->vce.fw, AMDGPU_UCODE_REQUIRED, "%s", fw_name); if (r) { dev_err(adev->dev, "amdgpu_vce: Can't validate firmware \"%s\"\n", fw_name); @@ -503,7 +503,7 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring *ring, uint32_t handle, ib->ptr[i] = 0x0; r = amdgpu_job_submit_direct(job, ring, &f); - amdgpu_ib_free(ring->adev, &ib_msg, f); + amdgpu_ib_free(&ib_msg, f); if (r) goto err; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 3e94c3ba1ba2cf494984d1bbb2e5d05e1f68674f..83faf6e6788a212bbe536fc26d2f8fd7fe0a61e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -1,5 +1,5 @@ /* - * Copyright 2016 Advanced Micro Devices, Inc. + * Copyright 2016-2024 Advanced Micro Devices, Inc. * All Rights Reserved. * * Permission is hereby granted, free of charge, to any person obtaining a @@ -62,6 +62,7 @@ #define FIRMWARE_VCN4_0_6 "amdgpu/vcn_4_0_6.bin" #define FIRMWARE_VCN4_0_6_1 "amdgpu/vcn_4_0_6_1.bin" #define FIRMWARE_VCN5_0_0 "amdgpu/vcn_5_0_0.bin" +#define FIRMWARE_VCN5_0_1 "amdgpu/vcn_5_0_1.bin" MODULE_FIRMWARE(FIRMWARE_RAVEN); MODULE_FIRMWARE(FIRMWARE_PICASSO); @@ -88,6 +89,7 @@ MODULE_FIRMWARE(FIRMWARE_VCN4_0_5); MODULE_FIRMWARE(FIRMWARE_VCN4_0_6); MODULE_FIRMWARE(FIRMWARE_VCN4_0_6_1); MODULE_FIRMWARE(FIRMWARE_VCN5_0_0); +MODULE_FIRMWARE(FIRMWARE_VCN5_0_1); static void amdgpu_vcn_idle_work_handler(struct work_struct *work); @@ -99,11 +101,15 @@ int amdgpu_vcn_early_init(struct amdgpu_device *adev) amdgpu_ucode_ip_version_decode(adev, UVD_HWIP, ucode_prefix, sizeof(ucode_prefix)); for (i = 0; i < adev->vcn.num_vcn_inst; i++) { if (i == 1 && amdgpu_ip_version(adev, UVD_HWIP, 0) == IP_VERSION(4, 0, 6)) - r = amdgpu_ucode_request(adev, &adev->vcn.fw[i], "amdgpu/%s_%d.bin", ucode_prefix, i); + r = amdgpu_ucode_request(adev, &adev->vcn.inst[i].fw, + AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_%d.bin", ucode_prefix, i); else - r = amdgpu_ucode_request(adev, &adev->vcn.fw[i], "amdgpu/%s.bin", ucode_prefix); + r = amdgpu_ucode_request(adev, &adev->vcn.inst[i].fw, + AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", ucode_prefix); if (r) { - amdgpu_ucode_release(&adev->vcn.fw[i]); + amdgpu_ucode_release(&adev->vcn.inst[i].fw); return r; } } @@ -151,7 +157,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) adev->vcn.using_unified_queue = amdgpu_ip_version(adev, UVD_HWIP, 0) >= IP_VERSION(4, 0, 0); - hdr = (const struct common_firmware_header *)adev->vcn.fw[0]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[0].fw->data; adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version); /* Bit 20-23, it is encode major and non-zero for new naming convention. @@ -270,7 +276,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev) for (i = 0; i < adev->vcn.num_enc_rings; ++i) amdgpu_ring_fini(&adev->vcn.inst[j].ring_enc[i]); - amdgpu_ucode_release(&adev->vcn.fw[j]); + amdgpu_ucode_release(&adev->vcn.inst[j].fw); } mutex_destroy(&adev->vcn.vcn1_jpeg1_workaround); @@ -282,7 +288,7 @@ int amdgpu_vcn_sw_fini(struct amdgpu_device *adev) bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, enum vcn_ring_type type, uint32_t vcn_instance) { bool ret = false; - int vcn_config = adev->vcn.vcn_config[vcn_instance]; + int vcn_config = adev->vcn.inst[vcn_instance].vcn_config; if ((type == VCN_ENCODE_RING) && (vcn_config & VCN_BLOCK_ENCODE_DISABLE_MASK)) ret = true; @@ -362,12 +368,12 @@ int amdgpu_vcn_resume(struct amdgpu_device *adev) const struct common_firmware_header *hdr; unsigned int offset; - hdr = (const struct common_firmware_header *)adev->vcn.fw[i]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[i].fw->data; if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) { offset = le32_to_cpu(hdr->ucode_array_offset_bytes); if (drm_dev_enter(adev_to_drm(adev), &idx)) { memcpy_toio(adev->vcn.inst[i].cpu_addr, - adev->vcn.fw[i]->data + offset, + adev->vcn.inst[i].fw->data + offset, le32_to_cpu(hdr->ucode_size_bytes)); drm_dev_exit(idx); } @@ -580,7 +586,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring, if (r) goto err_free; - amdgpu_ib_free(adev, ib_msg, f); + amdgpu_ib_free(ib_msg, f); if (fence) *fence = dma_fence_get(f); @@ -591,7 +597,7 @@ static int amdgpu_vcn_dec_send_msg(struct amdgpu_ring *ring, err_free: amdgpu_job_free(job); err: - amdgpu_ib_free(adev, ib_msg, f); + amdgpu_ib_free(ib_msg, f); return r; } @@ -773,7 +779,7 @@ static int amdgpu_vcn_dec_sw_send_msg(struct amdgpu_ring *ring, if (r) goto err_free; - amdgpu_ib_free(adev, ib_msg, f); + amdgpu_ib_free(ib_msg, f); if (fence) *fence = dma_fence_get(f); @@ -784,7 +790,7 @@ static int amdgpu_vcn_dec_sw_send_msg(struct amdgpu_ring *ring, err_free: amdgpu_job_free(job); err: - amdgpu_ib_free(adev, ib_msg, f); + amdgpu_ib_free(ib_msg, f); return r; } @@ -1014,7 +1020,7 @@ int amdgpu_vcn_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = 0; error: - amdgpu_ib_free(adev, &ib, fence); + amdgpu_ib_free(&ib, fence); dma_fence_put(fence); return r; @@ -1025,7 +1031,8 @@ int amdgpu_vcn_unified_ring_test_ib(struct amdgpu_ring *ring, long timeout) struct amdgpu_device *adev = ring->adev; long r; - if (amdgpu_ip_version(adev, UVD_HWIP, 0) != IP_VERSION(4, 0, 3)) { + if ((amdgpu_ip_version(adev, UVD_HWIP, 0) != IP_VERSION(4, 0, 3)) && + (amdgpu_ip_version(adev, UVD_HWIP, 0) != IP_VERSION(5, 0, 1))) { r = amdgpu_vcn_enc_ring_test_ib(ring, timeout); if (r) goto error; @@ -1063,7 +1070,7 @@ void amdgpu_vcn_setup_ucode(struct amdgpu_device *adev) if (adev->vcn.harvest_config & (1 << i)) continue; - hdr = (const struct common_firmware_header *)adev->vcn.fw[i]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[i].fw->data; /* currently only support 2 FW instances */ if (i >= 2) { dev_info(adev->dev, "More then 2 VCN FW instances!\n"); @@ -1071,12 +1078,14 @@ void amdgpu_vcn_setup_ucode(struct amdgpu_device *adev) } idx = AMDGPU_UCODE_ID_VCN + i; adev->firmware.ucode[idx].ucode_id = idx; - adev->firmware.ucode[idx].fw = adev->vcn.fw[i]; + adev->firmware.ucode[idx].fw = adev->vcn.inst[i].fw; adev->firmware.fw_size += ALIGN(le32_to_cpu(hdr->ucode_size_bytes), PAGE_SIZE); if (amdgpu_ip_version(adev, UVD_HWIP, 0) == - IP_VERSION(4, 0, 3)) + IP_VERSION(4, 0, 3) || + amdgpu_ip_version(adev, UVD_HWIP, 0) == + IP_VERSION(5, 0, 1)) break; } } @@ -1320,3 +1329,71 @@ void amdgpu_vcn_sysfs_reset_mask_fini(struct amdgpu_device *adev) device_remove_file(adev->dev, &dev_attr_vcn_reset_mask); } } + +/* + * debugfs to enable/disable vcn job submission to specific core or + * instance. It is created only if the queue type is unified. + */ +#if defined(CONFIG_DEBUG_FS) +static int amdgpu_debugfs_vcn_sched_mask_set(void *data, u64 val) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)data; + u32 i; + u64 mask; + struct amdgpu_ring *ring; + + if (!adev) + return -ENODEV; + + mask = (1ULL << adev->vcn.num_vcn_inst) - 1; + if ((val & mask) == 0) + return -EINVAL; + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + ring = &adev->vcn.inst[i].ring_enc[0]; + if (val & (1ULL << i)) + ring->sched.ready = true; + else + ring->sched.ready = false; + } + /* publish sched.ready flag update effective immediately across smp */ + smp_rmb(); + return 0; +} + +static int amdgpu_debugfs_vcn_sched_mask_get(void *data, u64 *val) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)data; + u32 i; + u64 mask = 0; + struct amdgpu_ring *ring; + + if (!adev) + return -ENODEV; + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + ring = &adev->vcn.inst[i].ring_enc[0]; + if (ring->sched.ready) + mask |= 1ULL << i; + } + *val = mask; + return 0; +} + +DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_vcn_sched_mask_fops, + amdgpu_debugfs_vcn_sched_mask_get, + amdgpu_debugfs_vcn_sched_mask_set, "%llx\n"); +#endif + +void amdgpu_debugfs_vcn_sched_mask_init(struct amdgpu_device *adev) +{ +#if defined(CONFIG_DEBUG_FS) + struct drm_minor *minor = adev_to_drm(adev)->primary; + struct dentry *root = minor->debugfs_root; + char name[32]; + + if (adev->vcn.num_vcn_inst <= 1 || !adev->vcn.using_unified_queue) + return; + sprintf(name, "amdgpu_vcn_sched_mask"); + debugfs_create_file(name, 0600, root, adev, + &amdgpu_debugfs_vcn_sched_mask_fops); +#endif +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h index 1e32311c1dff73f83b829dbb78a526fb84f8f89d..adaf4388ad2806f72fad1142e167541ecf556d81 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h @@ -1,5 +1,5 @@ /* - * Copyright 2016 Advanced Micro Devices, Inc. + * Copyright 2016-2024 Advanced Micro Devices, Inc. All rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), @@ -163,20 +163,30 @@ #define SOC24_DPG_MODE_OFFSET(ip, inst_idx, reg) \ ({ \ uint32_t internal_reg_offset, addr; \ - bool video_range, aon_range; \ + bool video_range, video1_range, aon_range, aon1_range; \ \ addr = (adev->reg_offset[ip##_HWIP][inst_idx][reg##_BASE_IDX] + reg); \ addr <<= 2; \ video_range = ((((0xFFFFF & addr) >= (VCN_VID_SOC_ADDRESS)) && \ ((0xFFFFF & addr) < ((VCN_VID_SOC_ADDRESS + 0x2600))))); \ + video1_range = ((((0xFFFFF & addr) >= (VCN1_VID_SOC_ADDRESS)) && \ + ((0xFFFFF & addr) < ((VCN1_VID_SOC_ADDRESS + 0x2600))))); \ aon_range = ((((0xFFFFF & addr) >= (VCN_AON_SOC_ADDRESS)) && \ ((0xFFFFF & addr) < ((VCN_AON_SOC_ADDRESS + 0x600))))); \ + aon1_range = ((((0xFFFFF & addr) >= (VCN1_AON_SOC_ADDRESS)) && \ + ((0xFFFFF & addr) < ((VCN1_AON_SOC_ADDRESS + 0x600))))); \ if (video_range) \ internal_reg_offset = ((0xFFFFF & addr) - (VCN_VID_SOC_ADDRESS) + \ (VCN_VID_IP_ADDRESS)); \ else if (aon_range) \ internal_reg_offset = ((0xFFFFF & addr) - (VCN_AON_SOC_ADDRESS) + \ (VCN_AON_IP_ADDRESS)); \ + else if (video1_range) \ + internal_reg_offset = ((0xFFFFF & addr) - (VCN1_VID_SOC_ADDRESS) + \ + (VCN_VID_IP_ADDRESS)); \ + else if (aon1_range) \ + internal_reg_offset = ((0xFFFFF & addr) - (VCN1_AON_SOC_ADDRESS) + \ + (VCN_AON_IP_ADDRESS)); \ else \ internal_reg_offset = (0xFFFFF & addr); \ \ @@ -297,6 +307,9 @@ struct amdgpu_vcn_inst { atomic_t dpg_enc_submission_cnt; struct amdgpu_vcn_fw_shared fw_shared; uint8_t aid_id; + const struct firmware *fw; /* VCN firmware */ + uint8_t vcn_config; + uint32_t vcn_codec_disable_mask; }; struct amdgpu_vcn_ras { @@ -306,15 +319,12 @@ struct amdgpu_vcn_ras { struct amdgpu_vcn { unsigned fw_version; struct delayed_work idle_work; - const struct firmware *fw[AMDGPU_MAX_VCN_INSTANCES]; /* VCN firmware */ unsigned num_enc_rings; enum amd_powergating_state cur_state; bool indirect_sram; uint8_t num_vcn_inst; struct amdgpu_vcn_inst inst[AMDGPU_MAX_VCN_INSTANCES]; - uint8_t vcn_config[AMDGPU_MAX_VCN_INSTANCES]; - uint32_t vcn_codec_disable_mask[AMDGPU_MAX_VCN_INSTANCES]; struct amdgpu_vcn_reg internal; struct mutex vcn_pg_lock; struct mutex vcn1_jpeg1_workaround; @@ -523,5 +533,6 @@ int amdgpu_vcn_psp_update_sram(struct amdgpu_device *adev, int inst_idx, int amdgpu_vcn_save_vcpu_bo(struct amdgpu_device *adev); int amdgpu_vcn_sysfs_reset_mask_init(struct amdgpu_device *adev); void amdgpu_vcn_sysfs_reset_mask_fini(struct amdgpu_device *adev); +void amdgpu_debugfs_vcn_sched_mask_init(struct amdgpu_device *adev); #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c index c704e9803e1107afd93ad2e321b890294fead38b..0af469ec6fccdd08b4072ca4158341511a745dc7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c @@ -1263,12 +1263,10 @@ static int amdgpu_virt_cache_host_error_counts(struct amdgpu_device *adev, if (used_size > (AMD_SRIOV_RAS_TELEMETRY_SIZE_KB << 10)) return 0; - tmp = kmalloc(used_size, GFP_KERNEL); + tmp = kmemdup(&host_telemetry->body.error_count, used_size, GFP_KERNEL); if (!tmp) return -ENOMEM; - memcpy(tmp, &host_telemetry->body.error_count, used_size); - if (checksum != amd_sriov_msg_checksum(tmp, used_size, 0, 0)) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c index 8bf28d33680753be420f3b4f2b1c0a4ff10f4287..03308261f89434c989a1aa548420b7897533f166 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c @@ -632,13 +632,13 @@ static bool amdgpu_vkms_is_idle(void *handle) return true; } -static int amdgpu_vkms_set_clockgating_state(void *handle, +static int amdgpu_vkms_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int amdgpu_vkms_set_powergating_state(void *handle, +static int amdgpu_vkms_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8d9bf7a0857fde8d68937adb455549638d89646d..c9c48b782ec1b479471220e19aefe91bf60887c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -674,12 +674,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping && ring->funcs->emit_wreg; - if (adev->gfx.enable_cleaner_shader && - ring->funcs->emit_cleaner_shader && - job->enforce_isolation) - ring->funcs->emit_cleaner_shader(ring); - - if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync) + if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync && + !(job->enforce_isolation && !job->vmid)) return 0; amdgpu_ring_ib_begin(ring); @@ -690,6 +686,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, if (need_pipe_sync) amdgpu_ring_emit_pipeline_sync(ring); + if (adev->gfx.enable_cleaner_shader && + ring->funcs->emit_cleaner_shader && + job->enforce_isolation) + ring->funcs->emit_cleaner_shader(ring); + if (vm_flush_needed) { trace_amdgpu_vm_flush(ring, job->vmid, job->vm_pd_addr); amdgpu_ring_emit_vm_flush(ring, job->vmid, job->vm_pd_addr); @@ -1265,10 +1266,9 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va, * next command submission. */ if (amdgpu_vm_is_bo_always_valid(vm, bo)) { - uint32_t mem_type = bo->tbo.resource->mem_type; - - if (!(bo->preferred_domains & - amdgpu_mem_type_to_domain(mem_type))) + if (bo->tbo.resource && + !(bo->preferred_domains & + amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type))) amdgpu_vm_bo_evicted(&bo_va->base); else amdgpu_vm_bo_idle(&bo_va->base); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c index 110b120d7375d013ffce806e71c1f4ca9a0b17ec..121ee17b522bd4fb1ed2cf25c9c8d22b27a86f8c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c @@ -236,7 +236,8 @@ int amdgpu_vpe_init_microcode(struct amdgpu_vpe *vpe) int ret; amdgpu_ucode_ip_version_decode(adev, VPE_HWIP, fw_prefix, sizeof(fw_prefix)); - ret = amdgpu_ucode_request(adev, &adev->vpe.fw, "amdgpu/%s.bin", fw_prefix); + ret = amdgpu_ucode_request(adev, &adev->vpe.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", fw_prefix); if (ret) goto out; @@ -646,16 +647,16 @@ static int vpe_ring_preempt_ib(struct amdgpu_ring *ring) return r; } -static int vpe_set_clockgating_state(void *handle, +static int vpe_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int vpe_set_powergating_state(void *handle, +static int vpe_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; struct amdgpu_vpe *vpe = &adev->vpe; if (!adev->pm.dpm_enabled) @@ -833,7 +834,7 @@ static int vpe_ring_test_ib(struct amdgpu_ring *ring, long timeout) ret = (le32_to_cpu(adev->wb.wb[index]) == test_pattern) ? 0 : -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c index e2cb1f080e882314b8759d9c08e7c04b8dc50d5e..08d6787893b37cb0592783927347b903540b11c2 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik.c +++ b/drivers/gpu/drm/amd/amdgpu/cik.c @@ -2161,13 +2161,13 @@ static int cik_common_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int cik_common_set_clockgating_state(void *handle, +static int cik_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int cik_common_set_powergating_state(void *handle, +static int cik_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/cik_ih.c b/drivers/gpu/drm/amd/amdgpu/cik_ih.c index 1da17755ad538e9884f05b28900533c2d20c2f28..444563486769c0658ee1c6440aa551260e715f5b 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_ih.c @@ -402,13 +402,13 @@ static int cik_ih_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int cik_ih_set_clockgating_state(void *handle, +static int cik_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int cik_ih_set_powergating_state(void *handle, +static int cik_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c index ede1a028d48d54805214a73b66477ce71c07daee..d9bd8f3f17e27b4e7e1e713ef2748e976a85ba8e 100644 --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c @@ -133,9 +133,11 @@ static int cik_sdma_init_microcode(struct amdgpu_device *adev) for (i = 0; i < adev->sdma.num_instances; i++) { if (i == 0) err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma1.bin", chip_name); if (err) goto out; @@ -696,7 +698,7 @@ static int cik_sdma_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -1189,11 +1191,11 @@ static int cik_sdma_process_illegal_inst_irq(struct amdgpu_device *adev, return 0; } -static int cik_sdma_set_clockgating_state(void *handle, +static int cik_sdma_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_CG_STATE_GATE) gate = true; @@ -1204,7 +1206,7 @@ static int cik_sdma_set_clockgating_state(void *handle, return 0; } -static int cik_sdma_set_powergating_state(void *handle, +static int cik_sdma_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/cz_ih.c b/drivers/gpu/drm/amd/amdgpu/cz_ih.c index d72973bd570dfd3144f80010f569b3a7d87ba7b7..82586b76aeda889e9f1804daf6e9d3c2994be8da 100644 --- a/drivers/gpu/drm/amd/amdgpu/cz_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/cz_ih.c @@ -398,14 +398,14 @@ static int cz_ih_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int cz_ih_set_clockgating_state(void *handle, +static int cz_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { // TODO return 0; } -static int cz_ih_set_powergating_state(void *handle, +static int cz_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { // TODO diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c index 5098c50d54c85a147cc582ab42840c3868edf8b5..c5e3d2251b18cf02f9bb406fc90a395cc862b653 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c @@ -2687,6 +2687,32 @@ static const struct drm_crtc_helper_funcs dce_v10_0_crtc_helper_funcs = { .get_scanout_position = amdgpu_crtc_get_scanout_position, }; +static void dce_v10_0_panic_flush(struct drm_plane *plane) +{ + struct drm_framebuffer *fb; + struct amdgpu_crtc *amdgpu_crtc; + struct amdgpu_device *adev; + uint32_t fb_format; + + if (!plane->fb) + return; + + fb = plane->fb; + amdgpu_crtc = to_amdgpu_crtc(plane->crtc); + adev = drm_to_adev(fb->dev); + + /* Disable DC tiling */ + fb_format = RREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset); + fb_format &= ~GRPH_CONTROL__GRPH_ARRAY_MODE_MASK; + WREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset, fb_format); + +} + +static const struct drm_plane_helper_funcs dce_v10_0_drm_primary_plane_helper_funcs = { + .get_scanout_buffer = amdgpu_display_get_scanout_buffer, + .panic_flush = dce_v10_0_panic_flush, +}; + static int dce_v10_0_crtc_init(struct amdgpu_device *adev, int index) { struct amdgpu_crtc *amdgpu_crtc; @@ -2734,6 +2760,7 @@ static int dce_v10_0_crtc_init(struct amdgpu_device *adev, int index) amdgpu_crtc->encoder = NULL; amdgpu_crtc->connector = NULL; drm_crtc_helper_add(&amdgpu_crtc->base, &dce_v10_0_crtc_helper_funcs); + drm_plane_helper_add(amdgpu_crtc->base.primary, &dce_v10_0_drm_primary_plane_helper_funcs); return 0; } @@ -3302,13 +3329,13 @@ static int dce_v10_0_hpd_irq(struct amdgpu_device *adev, return 0; } -static int dce_v10_0_set_clockgating_state(void *handle, +static int dce_v10_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int dce_v10_0_set_powergating_state(void *handle, +static int dce_v10_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index c5680ff4ab9fd8f2127c55f83e6d1562b3430e5d..ea42a4472bf6c915425a01915ee8e1f6d148b667 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -2800,6 +2800,32 @@ static const struct drm_crtc_helper_funcs dce_v11_0_crtc_helper_funcs = { .get_scanout_position = amdgpu_crtc_get_scanout_position, }; +static void dce_v11_0_panic_flush(struct drm_plane *plane) +{ + struct drm_framebuffer *fb; + struct amdgpu_crtc *amdgpu_crtc; + struct amdgpu_device *adev; + uint32_t fb_format; + + if (!plane->fb) + return; + + fb = plane->fb; + amdgpu_crtc = to_amdgpu_crtc(plane->crtc); + adev = drm_to_adev(fb->dev); + + /* Disable DC tiling */ + fb_format = RREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset); + fb_format &= ~GRPH_CONTROL__GRPH_ARRAY_MODE_MASK; + WREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset, fb_format); + +} + +static const struct drm_plane_helper_funcs dce_v11_0_drm_primary_plane_helper_funcs = { + .get_scanout_buffer = amdgpu_display_get_scanout_buffer, + .panic_flush = dce_v11_0_panic_flush, +}; + static int dce_v11_0_crtc_init(struct amdgpu_device *adev, int index) { struct amdgpu_crtc *amdgpu_crtc; @@ -2847,6 +2873,7 @@ static int dce_v11_0_crtc_init(struct amdgpu_device *adev, int index) amdgpu_crtc->encoder = NULL; amdgpu_crtc->connector = NULL; drm_crtc_helper_add(&amdgpu_crtc->base, &dce_v11_0_crtc_helper_funcs); + drm_plane_helper_add(amdgpu_crtc->base.primary, &dce_v11_0_drm_primary_plane_helper_funcs); return 0; } @@ -3434,13 +3461,13 @@ static int dce_v11_0_hpd_irq(struct amdgpu_device *adev, return 0; } -static int dce_v11_0_set_clockgating_state(void *handle, +static int dce_v11_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int dce_v11_0_set_powergating_state(void *handle, +static int dce_v11_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c index eb7de9122d99f7208e3ca5b1c86ecd10766be5b1..915804a6a1d7df978268753e78dba08c35cb891d 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c @@ -2602,6 +2602,32 @@ static const struct drm_crtc_helper_funcs dce_v6_0_crtc_helper_funcs = { .get_scanout_position = amdgpu_crtc_get_scanout_position, }; +static void dce_v6_0_panic_flush(struct drm_plane *plane) +{ + struct drm_framebuffer *fb; + struct amdgpu_crtc *amdgpu_crtc; + struct amdgpu_device *adev; + uint32_t fb_format; + + if (!plane->fb) + return; + + fb = plane->fb; + amdgpu_crtc = to_amdgpu_crtc(plane->crtc); + adev = drm_to_adev(fb->dev); + + /* Disable DC tiling */ + fb_format = RREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset); + fb_format &= ~GRPH_ARRAY_MODE(0x7); + WREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset, fb_format); + +} + +static const struct drm_plane_helper_funcs dce_v6_0_drm_primary_plane_helper_funcs = { + .get_scanout_buffer = amdgpu_display_get_scanout_buffer, + .panic_flush = dce_v6_0_panic_flush, +}; + static int dce_v6_0_crtc_init(struct amdgpu_device *adev, int index) { struct amdgpu_crtc *amdgpu_crtc; @@ -2629,6 +2655,7 @@ static int dce_v6_0_crtc_init(struct amdgpu_device *adev, int index) amdgpu_crtc->encoder = NULL; amdgpu_crtc->connector = NULL; drm_crtc_helper_add(&amdgpu_crtc->base, &dce_v6_0_crtc_helper_funcs); + drm_plane_helper_add(amdgpu_crtc->base.primary, &dce_v6_0_drm_primary_plane_helper_funcs); return 0; } @@ -3124,13 +3151,13 @@ static int dce_v6_0_hpd_irq(struct amdgpu_device *adev, } -static int dce_v6_0_set_clockgating_state(void *handle, +static int dce_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int dce_v6_0_set_powergating_state(void *handle, +static int dce_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c index 04b79ff87f756c75dba16fa34ef115bea74cf805..f2edc0fece5bfe195dfb7a03f4133b2cf65cabf7 100644 --- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c @@ -2613,6 +2613,31 @@ static const struct drm_crtc_helper_funcs dce_v8_0_crtc_helper_funcs = { .get_scanout_position = amdgpu_crtc_get_scanout_position, }; +static void dce_v8_0_panic_flush(struct drm_plane *plane) +{ + struct drm_framebuffer *fb; + struct amdgpu_crtc *amdgpu_crtc; + struct amdgpu_device *adev; + uint32_t fb_format; + + if (!plane->fb) + return; + + fb = plane->fb; + amdgpu_crtc = to_amdgpu_crtc(plane->crtc); + adev = drm_to_adev(fb->dev); + + /* Disable DC tiling */ + fb_format = RREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset); + fb_format &= ~GRPH_CONTROL__GRPH_ARRAY_MODE_MASK; + WREG32(mmGRPH_CONTROL + amdgpu_crtc->crtc_offset, fb_format); +} + +static const struct drm_plane_helper_funcs dce_v8_0_drm_primary_plane_helper_funcs = { + .get_scanout_buffer = amdgpu_display_get_scanout_buffer, + .panic_flush = dce_v8_0_panic_flush, +}; + static int dce_v8_0_crtc_init(struct amdgpu_device *adev, int index) { struct amdgpu_crtc *amdgpu_crtc; @@ -2640,6 +2665,7 @@ static int dce_v8_0_crtc_init(struct amdgpu_device *adev, int index) amdgpu_crtc->encoder = NULL; amdgpu_crtc->connector = NULL; drm_crtc_helper_add(&amdgpu_crtc->base, &dce_v8_0_crtc_helper_funcs); + drm_plane_helper_add(amdgpu_crtc->base.primary, &dce_v8_0_drm_primary_plane_helper_funcs); return 0; } @@ -3212,13 +3238,13 @@ static int dce_v8_0_hpd_irq(struct amdgpu_device *adev, } -static int dce_v8_0_set_clockgating_state(void *handle, +static int dce_v8_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int dce_v8_0_set_powergating_state(void *handle, +static int dce_v8_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 24dce803a829cbd3f4177b675af9ac4b09cfc091..003522c2d9027256a8052868789442f8a64483fd 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -3673,7 +3673,7 @@ static void gfx_v10_0_ring_invalidate_tlbs(struct amdgpu_ring *ring, static void gfx_v10_0_update_spm_vmid_internal(struct amdgpu_device *adev, unsigned int vmid); -static int gfx_v10_0_set_powergating_state(void *handle, +static int gfx_v10_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static void gfx10_kiq_set_resources(struct amdgpu_ring *kiq_ring, uint64_t queue_mask) { @@ -4036,7 +4036,7 @@ static int gfx_v10_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) else r = -EINVAL; err2: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: amdgpu_device_wb_free(adev, index); @@ -4138,18 +4138,21 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev) amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix)); err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp%s.bin", ucode_prefix, wks); if (err) goto out; amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_PFP); err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me%s.bin", ucode_prefix, wks); if (err) goto out; amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_ME); err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce%s.bin", ucode_prefix, wks); if (err) goto out; @@ -4173,6 +4176,7 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec%s.bin", ucode_prefix, wks); if (err) goto out; @@ -4180,6 +4184,7 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev) amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC1_JT); err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec2%s.bin", ucode_prefix, wks); if (!err) { amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2); @@ -5952,7 +5957,7 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable) else WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp); - if (adev->job_hang && !enable) + if (amdgpu_in_reset(adev) && !enable) return 0; for (i = 0; i < adev->usec_timeout; i++) { @@ -6599,17 +6604,13 @@ static void gfx_v10_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid, tmp); + WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS_Sienna_Cichlid, tmp | 0x80); break; default: tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS, tmp | 0x80); break; } } @@ -7457,7 +7458,7 @@ static int gfx_v10_0_hw_fini(struct amdgpu_ip_block *ip_block) * otherwise the gfxoff disallowing will be failed to set. */ if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(10, 3, 1)) - gfx_v10_0_set_powergating_state(ip_block->adev, AMD_PG_STATE_UNGATE); + gfx_v10_0_set_powergating_state(ip_block, AMD_PG_STATE_UNGATE); if (!adev->no_hw_access) { if (amdgpu_async_gfx_ring) { @@ -8345,10 +8346,10 @@ static const struct amdgpu_rlc_funcs gfx_v10_0_rlc_funcs_sriov = { .is_rlcg_access_range = gfx_v10_0_is_rlcg_access_range, }; -static int gfx_v10_0_set_powergating_state(void *handle, +static int gfx_v10_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (amdgpu_sriov_vf(adev)) @@ -8383,10 +8384,10 @@ static int gfx_v10_0_set_powergating_state(void *handle, return 0; } -static int gfx_v10_0_set_clockgating_state(void *handle, +static int gfx_v10_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 2ae058a224f4dc409a73f21a32ab288595187ae5..fe5d7cd814f7d5b09fc5160848f9dc3160af29a9 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -615,7 +615,7 @@ static int gfx_v11_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err2: if (!ring->is_mes_queue) - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: if (!ring->is_mes_queue) @@ -639,6 +639,7 @@ static int gfx_v11_0_init_toc_microcode(struct amdgpu_device *adev, const char * int err = 0; err = amdgpu_ucode_request(adev, &adev->psp.toc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_toc.bin", ucode_prefix); if (err) goto out; @@ -688,6 +689,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev) amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix)); err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", ucode_prefix); if (err) goto out; @@ -705,6 +707,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", ucode_prefix); if (err) goto out; @@ -720,9 +723,11 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev) if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(11, 0, 0) && adev->pdev->revision == 0xCE) err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/gc_11_0_0_rlc_1.bin"); else err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", ucode_prefix); if (err) goto out; @@ -735,6 +740,7 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", ucode_prefix); if (err) goto out; @@ -3918,9 +3924,7 @@ static void gfx_v11_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp | 0x80); } static void gfx_v11_0_cp_set_doorbell_range(struct amdgpu_device *adev) @@ -5458,10 +5462,10 @@ static void gfx_v11_cntl_pg(struct amdgpu_device *adev, bool enable) amdgpu_gfx_rlc_exit_safe_mode(adev, 0); } -static int gfx_v11_0_set_powergating_state(void *handle, +static int gfx_v11_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (amdgpu_sriov_vf(adev)) @@ -5494,10 +5498,10 @@ static int gfx_v11_0_set_powergating_state(void *handle, return 0; } -static int gfx_v11_0_set_clockgating_state(void *handle, +static int gfx_v11_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -6646,30 +6650,14 @@ static int gfx_v11_0_reset_kgq(struct amdgpu_ring *ring, unsigned int vmid) static int gfx_v11_0_reset_kcq(struct amdgpu_ring *ring, unsigned int vmid) { struct amdgpu_device *adev = ring->adev; - int i, r = 0; + int r = 0; if (amdgpu_sriov_vf(adev)) return -EINVAL; - amdgpu_gfx_rlc_enter_safe_mode(adev, 0); - mutex_lock(&adev->srbm_mutex); - soc21_grbm_select(adev, ring->me, ring->pipe, ring->queue, 0); - WREG32_SOC15(GC, 0, regCP_HQD_DEQUEUE_REQUEST, 0x2); - WREG32_SOC15(GC, 0, regSPI_COMPUTE_QUEUE_RESET, 0x1); - - /* make sure dequeue is complete*/ - for (i = 0; i < adev->usec_timeout; i++) { - if (!(RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1)) - break; - udelay(1); - } - if (i >= adev->usec_timeout) - r = -ETIMEDOUT; - soc21_grbm_select(adev, 0, 0, 0, 0); - mutex_unlock(&adev->srbm_mutex); - amdgpu_gfx_rlc_exit_safe_mode(adev, 0); + r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, true); if (r) { - dev_err(adev->dev, "fail to wait on hqd deactivate\n"); + dev_err(adev->dev, "reset via MMIO failed %d\n", r); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index fe7c48f2fb2a700810ffbcca0a211623e3486ef0..40c47a2dd06025671658c087a5cfa5c00dadb448 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c @@ -513,7 +513,7 @@ static int gfx_v12_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err2: if (!ring->is_mes_queue) - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: if (!ring->is_mes_queue) @@ -537,6 +537,7 @@ static int gfx_v12_0_init_toc_microcode(struct amdgpu_device *adev, const char * int err = 0; err = amdgpu_ucode_request(adev, &adev->psp.toc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_toc.bin", ucode_prefix); if (err) goto out; @@ -566,6 +567,7 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device *adev) amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix)); err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", ucode_prefix); if (err) goto out; @@ -573,6 +575,7 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device *adev) amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_RS64_PFP_P0_STACK); err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", ucode_prefix); if (err) goto out; @@ -581,6 +584,7 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device *adev) if (!amdgpu_sriov_vf(adev)) { err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", ucode_prefix); if (err) goto out; @@ -593,6 +597,7 @@ static int gfx_v12_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", ucode_prefix); if (err) goto out; @@ -2832,9 +2837,7 @@ static void gfx_v12_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp | 0x80); } static void gfx_v12_0_cp_set_doorbell_range(struct amdgpu_device *adev) @@ -3864,10 +3867,10 @@ static void gfx_v12_cntl_pg(struct amdgpu_device *adev, bool enable) } #endif -static int gfx_v12_0_set_powergating_state(void *handle, +static int gfx_v12_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (amdgpu_sriov_vf(adev)) @@ -4115,15 +4118,15 @@ static int gfx_v12_0_update_gfx_clock_gating(struct amdgpu_device *adev, return 0; } -static int gfx_v12_0_set_clockgating_state(void *handle, +static int gfx_v12_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; - switch (adev->ip_versions[GC_HWIP][0]) { + switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(12, 0, 0): case IP_VERSION(12, 0, 1): gfx_v12_0_update_gfx_clock_gating(adev, @@ -5233,24 +5236,16 @@ static int gfx_v12_0_reset_kgq(struct amdgpu_ring *ring, unsigned int vmid) static int gfx_v12_0_reset_kcq(struct amdgpu_ring *ring, unsigned int vmid) { struct amdgpu_device *adev = ring->adev; - int r, i; + int r; if (amdgpu_sriov_vf(adev)) return -EINVAL; - amdgpu_gfx_rlc_enter_safe_mode(adev, 0); - mutex_lock(&adev->srbm_mutex); - soc24_grbm_select(adev, ring->me, ring->pipe, ring->queue, 0); - WREG32_SOC15(GC, 0, regCP_HQD_DEQUEUE_REQUEST, 0x2); - WREG32_SOC15(GC, 0, regSPI_COMPUTE_QUEUE_RESET, 0x1); - for (i = 0; i < adev->usec_timeout; i++) { - if (!(RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1)) - break; - udelay(1); + r = amdgpu_mes_reset_legacy_queue(ring->adev, ring, vmid, true); + if (r) { + dev_err(adev->dev, "reset via MMIO failed %d\n", r); + return r; } - soc24_grbm_select(adev, 0, 0, 0, 0); - mutex_unlock(&adev->srbm_mutex); - amdgpu_gfx_rlc_exit_safe_mode(adev, 0); r = amdgpu_bo_reserve(ring->mqd_obj, false); if (unlikely(r != 0)) { diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h index bcc9c72ccbde2b2a0775746357ced39cfed51e26..f7184b2dc4e873abffb11d3b0710f92a15e7f987 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h @@ -26,4 +26,6 @@ extern const struct amdgpu_ip_block_version gfx_v12_0_ip_block; +int gfx_v12_0_request_gfx_index_mutex(struct amdgpu_device *adev, + bool req); #endif diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c index 41f50bf380c4058af9768b52ea2f31c67975ad78..f26e2cdec07a2fffce9789a7fca4acc114699bdc 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c @@ -337,6 +337,7 @@ static int gfx_v6_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", chip_name); if (err) goto out; @@ -345,6 +346,7 @@ static int gfx_v6_0_init_microcode(struct amdgpu_device *adev) adev->gfx.pfp_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version); err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", chip_name); if (err) goto out; @@ -353,6 +355,7 @@ static int gfx_v6_0_init_microcode(struct amdgpu_device *adev) adev->gfx.me_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version); err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce.bin", chip_name); if (err) goto out; @@ -361,6 +364,7 @@ static int gfx_v6_0_init_microcode(struct amdgpu_device *adev) adev->gfx.ce_feature_version = le32_to_cpu(cp_hdr->ucode_feature_version); err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", chip_name); if (err) goto out; @@ -1906,7 +1910,7 @@ static int gfx_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; error: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); return r; } @@ -3373,11 +3377,11 @@ static int gfx_v6_0_priv_inst_irq(struct amdgpu_device *adev, return 0; } -static int gfx_v6_0_set_clockgating_state(void *handle, +static int gfx_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_CG_STATE_GATE) gate = true; @@ -3395,11 +3399,11 @@ static int gfx_v6_0_set_clockgating_state(void *handle, return 0; } -static int gfx_v6_0_set_powergating_state(void *handle, +static int gfx_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_PG_STATE_GATE) gate = true; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c index 824d5913103b37e3872e3b47079f7374ff0991fa..84745b2453abeb1e1839489a817b1fe6d2bdb61c 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c @@ -934,33 +934,39 @@ static int gfx_v7_0_init_microcode(struct amdgpu_device *adev) } err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", chip_name); if (err) goto out; err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", chip_name); if (err) goto out; err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce.bin", chip_name); if (err) goto out; err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", chip_name); if (err) goto out; if (adev->asic_type == CHIP_KAVERI) { err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec2.bin", chip_name); if (err) goto out; } err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", chip_name); out: if (err) { @@ -2324,7 +2330,7 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; error: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); return r; } @@ -4846,11 +4852,11 @@ static int gfx_v7_0_priv_inst_irq(struct amdgpu_device *adev, return 0; } -static int gfx_v7_0_set_clockgating_state(void *handle, +static int gfx_v7_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_CG_STATE_GATE) gate = true; @@ -4869,11 +4875,11 @@ static int gfx_v7_0_set_clockgating_state(void *handle, return 0; } -static int gfx_v7_0_set_powergating_state(void *handle, +static int gfx_v7_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_PG_STATE_GATE) gate = true; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index b7006c41e270b6d2900bf0e1f3499fa3cd53a52b..af73f85527b7a91ea5a570a224cd38f44574a4d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -914,7 +914,7 @@ static int gfx_v8_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err2: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: amdgpu_device_wb_free(adev, index); @@ -982,13 +982,16 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) { err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_pfp_2.bin", chip_name); if (err == -ENODEV) { err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", chip_name); } } else { err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", chip_name); } if (err) @@ -999,13 +1002,16 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) { err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_me_2.bin", chip_name); if (err == -ENODEV) { err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", chip_name); } } else { err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", chip_name); } if (err) @@ -1017,13 +1023,16 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) { err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_ce_2.bin", chip_name); if (err == -ENODEV) { err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce.bin", chip_name); } } else { err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce.bin", chip_name); } if (err) @@ -1044,6 +1053,7 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) adev->virt.chained_ib_support = false; err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", chip_name); if (err) goto out; @@ -1093,13 +1103,16 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) { err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_mec_2.bin", chip_name); if (err == -ENODEV) { err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", chip_name); } } else { err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", chip_name); } if (err) @@ -1112,13 +1125,16 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device *adev) (adev->asic_type != CHIP_TOPAZ)) { if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= CHIP_POLARIS12) { err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_OPTIONAL, "amdgpu/%s_mec2_2.bin", chip_name); if (err == -ENODEV) { err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec2.bin", chip_name); } } else { err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec2.bin", chip_name); } if (!err) { @@ -1640,7 +1656,7 @@ static int gfx_v8_0_do_edc_gpr_workarounds(struct amdgpu_device *adev) RREG32(sec_ded_counter_registers[i]); fail: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); return r; @@ -4304,9 +4320,7 @@ static void gfx_v8_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32(mmRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32(mmRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32(mmRLC_CP_SCHEDULERS, tmp); + WREG32(mmRLC_CP_SCHEDULERS, tmp | 0x80); } static int gfx_v8_0_kiq_kcq_enable(struct amdgpu_device *adev) @@ -5321,7 +5335,7 @@ static void gfx_v8_0_enable_gfx_static_mg_power_gating(struct amdgpu_device *ade (adev->asic_type == CHIP_POLARIS12) || (adev->asic_type == CHIP_VEGAM)) /* Send msg to SMU via Powerplay */ - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, enable); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GFX, enable, 0); WREG32_FIELD(RLC_PG_CNTL, STATIC_PER_CU_PG_ENABLE, enable ? 1 : 0); } @@ -5367,10 +5381,10 @@ static void cz_update_gfx_cg_power_gating(struct amdgpu_device *adev, } } -static int gfx_v8_0_set_powergating_state(void *handle, +static int gfx_v8_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (amdgpu_sriov_vf(adev)) @@ -5982,10 +5996,10 @@ static int gfx_v8_0_polaris_update_gfx_clock_gating(struct amdgpu_device *adev, return 0; } -static int gfx_v8_0_set_clockgating_state(void *handle, +static int gfx_v8_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 0b6f09f2cc9bd01acf69ffc7dbd8878a4edfd8fd..4b5006dc3d347d4936c59f3be3773c19d4497ea5 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -1243,7 +1243,7 @@ static int gfx_v9_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err2: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: amdgpu_device_wb_free(adev, index); @@ -1429,18 +1429,21 @@ static int gfx_v9_0_init_cp_gfx_microcode(struct amdgpu_device *adev, int err; err = amdgpu_ucode_request(adev, &adev->gfx.pfp_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_pfp.bin", chip_name); if (err) goto out; amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_PFP); err = amdgpu_ucode_request(adev, &adev->gfx.me_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_me.bin", chip_name); if (err) goto out; amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_ME); err = amdgpu_ucode_request(adev, &adev->gfx.ce_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_ce.bin", chip_name); if (err) goto out; @@ -1476,6 +1479,7 @@ static int gfx_v9_0_init_rlc_microcode(struct amdgpu_device *adev, (((adev->pdev->revision >= 0xC8) && (adev->pdev->revision <= 0xCF)) || ((adev->pdev->revision >= 0xD8) && (adev->pdev->revision <= 0xDF)))) err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc_am4.bin", chip_name); else if (!strcmp(chip_name, "raven") && (amdgpu_pm_load_smu_firmware(adev, &smu_version) == 0) && (smu_version >= 0x41e2b)) @@ -1483,9 +1487,11 @@ static int gfx_v9_0_init_rlc_microcode(struct amdgpu_device *adev, *SMC is loaded by SBIOS on APU and it's able to get the SMU version directly. */ err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_kicker_rlc.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", chip_name); if (err) goto out; @@ -1518,9 +1524,11 @@ static int gfx_v9_0_init_cp_compute_microcode(struct amdgpu_device *adev, if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_ALDEBARAN)) err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, - "amdgpu/%s_sjt_mec.bin", chip_name); + AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_sjt_mec.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec.bin", chip_name); if (err) goto out; @@ -1531,9 +1539,11 @@ static int gfx_v9_0_init_cp_compute_microcode(struct amdgpu_device *adev, if (gfx_v9_0_load_mec2_fw_bin_support(adev)) { if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_ALDEBARAN)) err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sjt_mec2.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->gfx.mec2_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_mec2.bin", chip_name); if (!err) { amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2); @@ -3488,9 +3498,7 @@ static void gfx_v9_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15_RLC(GC, 0, mmRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15_RLC(GC, 0, mmRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15_RLC(GC, 0, mmRLC_CP_SCHEDULERS, tmp | 0x80); } static void gfx_v9_0_mqd_set_priority(struct amdgpu_ring *ring, struct v9_mqd *mqd) @@ -4780,7 +4788,7 @@ static int gfx_v9_0_do_edc_gpr_workarounds(struct amdgpu_device *adev) } fail: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); return r; @@ -5232,10 +5240,10 @@ static const struct amdgpu_rlc_funcs gfx_v9_0_rlc_funcs = { .is_rlcg_access_range = gfx_v9_0_is_rlcg_access_range, }; -static int gfx_v9_0_set_powergating_state(void *handle, +static int gfx_v9_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { @@ -5277,10 +5285,10 @@ static int gfx_v9_0_set_powergating_state(void *handle, return 0; } -static int gfx_v9_0_set_clockgating_state(void *handle, +static int gfx_v9_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c index 3f4fd2f08163def8957dea468e2b0eec025483d0..d81449f9d8225737be04b174ae2d217e933e23b3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c @@ -412,7 +412,7 @@ static int gfx_v9_4_2_run_shader(struct amdgpu_device *adev, r = amdgpu_ib_schedule(ring, 1, ib, NULL, fence_ptr); if (r) { dev_err(adev->dev, "ib submit failed (%d).\n", r); - amdgpu_ib_free(adev, ib, NULL); + amdgpu_ib_free(ib, NULL); } return r; } @@ -611,16 +611,16 @@ static int gfx_v9_4_2_do_sgprs_init(struct amdgpu_device *adev) } disp2_failed: - amdgpu_ib_free(adev, &disp_ibs[2], NULL); + amdgpu_ib_free(&disp_ibs[2], NULL); dma_fence_put(fences[2]); disp1_failed: - amdgpu_ib_free(adev, &disp_ibs[1], NULL); + amdgpu_ib_free(&disp_ibs[1], NULL); dma_fence_put(fences[1]); disp0_failed: - amdgpu_ib_free(adev, &disp_ibs[0], NULL); + amdgpu_ib_free(&disp_ibs[0], NULL); dma_fence_put(fences[0]); pro_end: - amdgpu_ib_free(adev, &wb_ib, NULL); + amdgpu_ib_free(&wb_ib, NULL); if (r) dev_info(adev->dev, "Init SGPRS Failed\n"); @@ -687,10 +687,10 @@ static int gfx_v9_4_2_do_vgprs_init(struct amdgpu_device *adev) } disp_failed: - amdgpu_ib_free(adev, &disp_ib, NULL); + amdgpu_ib_free(&disp_ib, NULL); dma_fence_put(fence); pro_end: - amdgpu_ib_free(adev, &wb_ib, NULL); + amdgpu_ib_free(&wb_ib, NULL); if (r) dev_info(adev->dev, "Init VGPRS Failed\n"); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index e2b3dda57030c068d41aa025e8b6e06240ca13e7..a5bdcaf7a0814fd0ed9a35c306258c912cc86470 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c @@ -43,8 +43,12 @@ MODULE_FIRMWARE("amdgpu/gc_9_4_3_mec.bin"); MODULE_FIRMWARE("amdgpu/gc_9_4_4_mec.bin"); +MODULE_FIRMWARE("amdgpu/gc_9_5_0_mec.bin"); MODULE_FIRMWARE("amdgpu/gc_9_4_3_rlc.bin"); MODULE_FIRMWARE("amdgpu/gc_9_4_4_rlc.bin"); +MODULE_FIRMWARE("amdgpu/gc_9_5_0_rlc.bin"); +MODULE_FIRMWARE("amdgpu/gc_9_4_3_sjt_mec.bin"); +MODULE_FIRMWARE("amdgpu/gc_9_4_4_sjt_mec.bin"); #define GFX9_MEC_HPD_SIZE 4096 #define RLCG_UCODE_LOADING_START_ADDRESS 0x00002000L @@ -52,10 +56,6 @@ MODULE_FIRMWARE("amdgpu/gc_9_4_4_rlc.bin"); #define GOLDEN_GB_ADDR_CONFIG 0x2a114042 #define CP_HQD_PERSISTENT_STATE_DEFAULT 0xbe05301 -#define mmSMNAID_XCD0_MCA_SMU 0x36430400 /* SMN AID XCD0 */ -#define mmSMNAID_XCD1_MCA_SMU 0x38430400 /* SMN AID XCD1 */ -#define mmSMNXCD_XCD0_MCA_SMU 0x40430400 /* SMN XCD XCD0 */ - #define XCC_REG_RANGE_0_LOW 0x2000 /* XCC gfxdec0 lower Bound */ #define XCC_REG_RANGE_0_HIGH 0x3400 /* XCC gfxdec0 upper Bound */ #define XCC_REG_RANGE_1_LOW 0xA000 /* XCC gfxdec1 lower Bound */ @@ -349,13 +349,17 @@ static void gfx_v9_4_3_init_golden_registers(struct amdgpu_device *adev) WREG32_SOC15(GC, dev_inst, regGB_ADDR_CONFIG, GOLDEN_GB_ADDR_CONFIG); - /* Golden settings applied by driver for ASIC with rev_id 0 */ - if (adev->rev_id == 0) { - WREG32_FIELD15_PREREG(GC, dev_inst, TCP_UTCL1_CNTL1, - REDUCE_FIFO_DEPTH_BY_2, 2); + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) { + WREG32_FIELD15_PREREG(GC, dev_inst, TCP_UTCL1_CNTL2, SPARE, 0x1); } else { - WREG32_FIELD15_PREREG(GC, dev_inst, TCP_UTCL1_CNTL2, - SPARE, 0x1); + /* Golden settings applied by driver for ASIC with rev_id 0 */ + if (adev->rev_id == 0) { + WREG32_FIELD15_PREREG(GC, dev_inst, TCP_UTCL1_CNTL1, + REDUCE_FIFO_DEPTH_BY_2, 2); + } else { + WREG32_FIELD15_PREREG(GC, dev_inst, TCP_UTCL1_CNTL2, + SPARE, 0x1); + } } } } @@ -499,7 +503,7 @@ static int gfx_v9_4_3_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err2: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err1: amdgpu_device_wb_free(adev, index); @@ -543,6 +547,7 @@ static int gfx_v9_4_3_init_rlc_microcode(struct amdgpu_device *adev, err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_rlc.bin", chip_name); if (err) goto out; @@ -574,8 +579,14 @@ static int gfx_v9_4_3_init_cp_compute_microcode(struct amdgpu_device *adev, { int err; - err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, - "amdgpu/%s_mec.bin", chip_name); + if (amdgpu_sriov_vf(adev)) + err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_sjt_mec.bin", chip_name); + else + err = amdgpu_ucode_request(adev, &adev->gfx.mec_fw, + AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_mec.bin", chip_name); if (err) goto out; amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC1); @@ -929,6 +940,7 @@ static int gfx_v9_4_3_gpu_early_init(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): adev->gfx.config.max_hw_contexts = 8; adev->gfx.config.sc_prim_fifo_size_frontend = 0x20; adev->gfx.config.sc_prim_fifo_size_backend = 0x100; @@ -1779,9 +1791,7 @@ static void gfx_v9_4_3_xcc_kiq_setting(struct amdgpu_ring *ring, int xcc_id) tmp = RREG32_SOC15(GC, GET_INST(GC, xcc_id), regRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15_RLC(GC, GET_INST(GC, xcc_id), regRLC_CP_SCHEDULERS, tmp | 0x80); } static void gfx_v9_4_3_mqd_set_priority(struct amdgpu_ring *ring, struct v9_mqd *mqd) @@ -2764,16 +2774,16 @@ static const struct amdgpu_rlc_funcs gfx_v9_4_3_rlc_funcs = { .is_rlcg_access_range = gfx_v9_4_3_is_rlcg_access_range, }; -static int gfx_v9_4_3_set_powergating_state(void *handle, +static int gfx_v9_4_3_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; } -static int gfx_v9_4_3_set_clockgating_state(void *handle, +static int gfx_v9_4_3_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int i, num_xcc; if (amdgpu_sriov_vf(adev)) @@ -4653,7 +4663,6 @@ static void gfx_v9_4_3_ip_dump(struct amdgpu_ip_block *ip_block) num_xcc = NUM_XCC(adev->gfx.xcc_mask); - amdgpu_gfx_off_ctrl(adev, false); for (xcc_id = 0; xcc_id < num_xcc; xcc_id++) { xcc_offset = xcc_id * reg_count; for (i = 0; i < reg_count; i++) @@ -4661,7 +4670,6 @@ static void gfx_v9_4_3_ip_dump(struct amdgpu_ip_block *ip_block) RREG32(SOC15_REG_ENTRY_OFFSET_INST(gc_reg_list_9_4_3[i], GET_INST(GC, xcc_id))); } - amdgpu_gfx_off_ctrl(adev, true); /* dump compute queue registers for all instances */ if (!adev->gfx.ip_dump_compute_queues) @@ -4670,7 +4678,6 @@ static void gfx_v9_4_3_ip_dump(struct amdgpu_ip_block *ip_block) num_inst = adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe_per_mec * adev->gfx.mec.num_queue_per_pipe; reg_count = ARRAY_SIZE(gc_cp_reg_list_9_4_3); - amdgpu_gfx_off_ctrl(adev, false); mutex_lock(&adev->srbm_mutex); for (xcc_id = 0; xcc_id < num_xcc; xcc_id++) { xcc_offset = xcc_id * reg_count * num_inst; @@ -4697,7 +4704,6 @@ static void gfx_v9_4_3_ip_dump(struct amdgpu_ip_block *ip_block) } soc15_grbm_select(adev, 0, 0, 0, 0, 0); mutex_unlock(&adev->srbm_mutex); - amdgpu_gfx_off_ctrl(adev, true); } static void gfx_v9_4_3_ring_emit_cleaner_shader(struct amdgpu_ring *ring) @@ -4860,6 +4866,7 @@ static void gfx_v9_4_3_set_gds_init(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): /* 9.4.3 removed all the GDS internal memory, * only support GWS opcode in kernel, like barrier * semaphore.etc */ @@ -4873,6 +4880,7 @@ static void gfx_v9_4_3_set_gds_init(struct amdgpu_device *adev) switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): /* deprecated for 9.4.3, no usage at all */ adev->gds.gds_compute_max_wave_id = 0; break; diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c index ed8e130c7d195bd6c19d59fda4696717e3a7bf54..5470cef7e9bd172eea2f933365551ae40695d920 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c @@ -368,7 +368,9 @@ static void gfxhub_v1_2_xcc_setup_vmid_config(struct amdgpu_device *adev, amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || amdgpu_ip_version(adev, GC_HWIP, 0) == - IP_VERSION(9, 4, 4)); + IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == + IP_VERSION(9, 5, 0)); WREG32_SOC15_OFFSET(GC, GET_INST(GC, j), regVM_CONTEXT1_CNTL, i * hub->ctx_distance, tmp); WREG32_SOC15_OFFSET(GC, GET_INST(GC, j), diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c index 697599c46240ed9b7190765f4d48211ebc2f9fbf..9bedca9a79c63c69123bddcaf8c82613d2524f55 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -1088,11 +1088,11 @@ static int gmc_v10_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int gmc_v10_0_set_clockgating_state(void *handle, +static int gmc_v10_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { int r; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; /* * The issue mmhub can't disconnect from DF with MMHUB clock gating being disabled @@ -1131,7 +1131,7 @@ static void gmc_v10_0_get_clockgating_state(void *handle, u64 *flags) athub_v2_0_get_clockgating(adev, flags); } -static int gmc_v10_0_set_powergating_state(void *handle, +static int gmc_v10_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c index f893ab4c14df353ff345e9d72bdd398cacd8f1dc..72751ab4c766ec2e1f8de0d2d98d5c1e6ba62b51 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -996,11 +996,11 @@ static int gmc_v11_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int gmc_v11_0_set_clockgating_state(void *handle, +static int gmc_v11_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { int r; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; r = adev->mmhub.funcs->set_clockgating(adev, state); if (r) @@ -1018,7 +1018,7 @@ static void gmc_v11_0_get_clockgating_state(void *handle, u64 *flags) athub_v3_0_get_clockgating(adev, flags); } -static int gmc_v11_0_set_powergating_state(void *handle, +static int gmc_v11_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c index d22b027fd0bb8f5a58dcb260efc08ffb55540849..b749f1c3f6a9afb65634ed118f4734438df11877 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c @@ -40,7 +40,7 @@ #include "gfxhub_v12_0.h" #include "mmhub_v4_1_0.h" #include "athub_v4_1_0.h" - +#include "umc_v8_14.h" static int gmc_v12_0_ecc_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *src, @@ -581,6 +581,18 @@ static void gmc_v12_0_set_gmc_funcs(struct amdgpu_device *adev) static void gmc_v12_0_set_umc_funcs(struct amdgpu_device *adev) { + switch (amdgpu_ip_version(adev, UMC_HWIP, 0)) { + case IP_VERSION(8, 14, 0): + adev->umc.channel_inst_num = UMC_V8_14_CHANNEL_INSTANCE_NUM; + adev->umc.umc_inst_num = UMC_V8_14_UMC_INSTANCE_NUM(adev); + adev->umc.node_inst_num = 0; + adev->umc.max_ras_err_cnt_per_query = UMC_V8_14_TOTAL_CHANNEL_NUM(adev); + adev->umc.channel_offs = UMC_V8_14_PER_CHANNEL_OFFSET; + adev->umc.ras = &umc_v8_14_ras; + break; + default: + break; + } } @@ -829,6 +841,10 @@ static int gmc_v12_0_sw_init(struct amdgpu_ip_block *ip_block) amdgpu_vm_manager_init(adev); + r = amdgpu_gmc_ras_sw_init(adev); + if (r) + return r; + return 0; } @@ -980,11 +996,11 @@ static int gmc_v12_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int gmc_v12_0_set_clockgating_state(void *handle, +static int gmc_v12_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { int r; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; r = adev->mmhub.funcs->set_clockgating(adev, state); if (r) @@ -1002,7 +1018,7 @@ static void gmc_v12_0_get_clockgating_state(void *handle, u64 *flags) athub_v4_1_0_get_clockgating(adev, flags); } -static int gmc_v12_0_set_powergating_state(void *handle, +static int gmc_v12_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c index ca000b3d1afcd19309e8c33313d17325b50c2760..2245dda92021c3b9f82bb82104391bd25251828e 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c @@ -131,7 +131,8 @@ static int gmc_v6_0_init_microcode(struct amdgpu_device *adev) if (((RREG32(mmMC_SEQ_MISC0) & 0xff000000) >> 24) == 0x58) chip_name = "si58"; - err = amdgpu_ucode_request(adev, &adev->gmc.fw, "amdgpu/%s_mc.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->gmc.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_mc.bin", chip_name); if (err) { dev_err(adev->dev, "si_mc: Failed to load firmware \"%s_mc.bin\"\n", @@ -1094,13 +1095,13 @@ static int gmc_v6_0_process_interrupt(struct amdgpu_device *adev, return 0; } -static int gmc_v6_0_set_clockgating_state(void *handle, +static int gmc_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int gmc_v6_0_set_powergating_state(void *handle, +static int gmc_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c index b6016f11956ec1abfb32c7b29b5e1f5f1c0e3392..9aac4b1101e3aa744e39614f534e14743efcfc04 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c @@ -157,7 +157,8 @@ static int gmc_v7_0_init_microcode(struct amdgpu_device *adev) return -EINVAL; } - err = amdgpu_ucode_request(adev, &adev->gmc.fw, "amdgpu/%s_mc.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->gmc.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_mc.bin", chip_name); if (err) { pr_err("cik_mc: Failed to load firmware \"%s_mc.bin\"\n", chip_name); amdgpu_ucode_release(&adev->gmc.fw); @@ -1317,11 +1318,11 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev, return 0; } -static int gmc_v7_0_set_clockgating_state(void *handle, +static int gmc_v7_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { bool gate = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_CG_STATE_GATE) gate = true; @@ -1337,7 +1338,7 @@ static int gmc_v7_0_set_clockgating_state(void *handle, return 0; } -static int gmc_v7_0_set_powergating_state(void *handle, +static int gmc_v7_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c index 12d5967ecd45fe974e064d8e1392c8d020210d32..d06585207c331b34716c591c20a57dadc3757330 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c @@ -259,7 +259,8 @@ static int gmc_v8_0_init_microcode(struct amdgpu_device *adev) return -EINVAL; } - err = amdgpu_ucode_request(adev, &adev->gmc.fw, "amdgpu/%s_mc.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->gmc.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_mc.bin", chip_name); if (err) { pr_err("mc: Failed to load firmware \"%s_mc.bin\"\n", chip_name); amdgpu_ucode_release(&adev->gmc.fw); @@ -1658,10 +1659,10 @@ static void fiji_update_mc_light_sleep(struct amdgpu_device *adev, } } -static int gmc_v8_0_set_clockgating_state(void *handle, +static int gmc_v8_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1679,7 +1680,7 @@ static int gmc_v8_0_set_clockgating_state(void *handle, return 0; } -static int gmc_v8_0_set_powergating_state(void *handle, +static int gmc_v8_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 50c5da3020cb317befab39e15ff98fb7283d1deb..291549765c38c5b18f92d477d9038bd044192f0c 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -623,6 +623,9 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev, } } + if (kgd2kfd_vmfault_fast_path(adev, entry, retry_fault)) + return 1; + if (!printk_ratelimit()) return 0; @@ -645,7 +648,8 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev, soc15_ih_clientid_name[entry->client_id]); if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) dev_err(adev->dev, " cookie node_id %d fault from die %s%d%s\n", node_id, node_id % 4 == 3 ? "RSV" : "AID", node_id / 4, node_id % 4 == 1 ? ".XCD0" : node_id % 4 == 2 ? ".XCD1" : ""); @@ -795,7 +799,8 @@ static bool gmc_v9_0_use_invalidate_semaphore(struct amdgpu_device *adev, { if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) return false; return ((vmhub == AMDGPU_MMHUB0(0) || @@ -1138,12 +1143,13 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED; struct amdgpu_vm *vm = mapping->bo_va->base.vm; unsigned int mtype_local, mtype; + uint32_t gc_ip_version = amdgpu_ip_version(adev, GC_HWIP, 0); bool snoop = false; bool is_local; dma_resv_assert_held(bo->tbo.base.resv); - switch (amdgpu_ip_version(adev, GC_HWIP, 0)) { + switch (gc_ip_version) { case IP_VERSION(9, 4, 1): case IP_VERSION(9, 4, 2): if (is_vram) { @@ -1157,10 +1163,7 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, /* FIXME: is this still needed? Or does * amdgpu_ttm_tt_pde_flags already handle this? */ - if ((amdgpu_ip_version(adev, GC_HWIP, 0) == - IP_VERSION(9, 4, 2) || - amdgpu_ip_version(adev, GC_HWIP, 0) == - IP_VERSION(9, 4, 3)) && + if (gc_ip_version == IP_VERSION(9, 4, 2) && adev->gmc.xgmi.connected_to_cpu) snoop = true; } else { @@ -1184,6 +1187,7 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): /* Only local VRAM BOs or system memory on non-NUMA APUs * can be assumed to be local in their entirety. Choose * MTYPE_NC as safe fallback for all system memory BOs on @@ -1208,7 +1212,7 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, if (uncached) { mtype = MTYPE_UC; } else if (ext_coherent) { - if (adev->rev_id) + if (gc_ip_version == IP_VERSION(9, 5, 0) || adev->rev_id) mtype = is_local ? MTYPE_CC : MTYPE_UC; else mtype = MTYPE_UC; @@ -1218,10 +1222,10 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, /* dGPU */ if (is_local) mtype = mtype_local; - else if (is_vram) - mtype = MTYPE_NC; - else + else if (gc_ip_version < IP_VERSION(9, 5, 0) && !is_vram) mtype = MTYPE_UC; + else + mtype = MTYPE_NC; } break; @@ -1275,7 +1279,8 @@ static void gmc_v9_0_override_vm_pte_flags(struct amdgpu_device *adev, * memory can use more efficient MTYPEs. */ if (amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 3) && - amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 4) && + amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 5, 0)) return; /* Only direct-mapped memory allows us to determine the NUMA node from @@ -1540,6 +1545,7 @@ static void gmc_v9_0_set_mmhub_ras_funcs(struct amdgpu_device *adev) adev->mmhub.ras = &mmhub_v1_7_ras; break; case IP_VERSION(1, 8, 0): + case IP_VERSION(1, 8, 1): adev->mmhub.ras = &mmhub_v1_8_ras; break; default: @@ -1551,7 +1557,8 @@ static void gmc_v9_0_set_mmhub_ras_funcs(struct amdgpu_device *adev) static void gmc_v9_0_set_gfxhub_funcs(struct amdgpu_device *adev) { if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) adev->gfxhub.funcs = &gfxhub_v1_2_funcs; else adev->gfxhub.funcs = &gfxhub_v1_0_funcs; @@ -1619,7 +1626,8 @@ static int gmc_v9_0_early_init(struct amdgpu_ip_block *ip_block) if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 0) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 1) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) adev->gmc.xgmi.supported = true; if (amdgpu_ip_version(adev, XGMI_HWIP, 0) == IP_VERSION(6, 1, 0)) { @@ -1792,6 +1800,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev) case IP_VERSION(9, 4, 2): case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): default: adev->gmc.gart_size = 512ULL << 20; break; @@ -2070,7 +2079,8 @@ static int gmc_v9_0_sw_init(struct amdgpu_ip_block *ip_block) spin_lock_init(&adev->gmc.invalidate_lock); if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) { + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) { gmc_v9_4_3_init_vram_info(adev); } else if (!adev->bios) { if (adev->flags & AMD_IS_APU) { @@ -2154,6 +2164,7 @@ static int gmc_v9_0_sw_init(struct amdgpu_ip_block *ip_block) break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): bitmap_set(adev->vmhubs_mask, AMDGPU_GFXHUB(0), NUM_XCC(adev->gfx.xcc_mask)); @@ -2220,7 +2231,8 @@ static int gmc_v9_0_sw_init(struct amdgpu_ip_block *ip_block) amdgpu_gmc_get_vbios_allocations(adev); if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) { + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) { r = gmc_v9_0_init_mem_ranges(adev); if (r) return r; @@ -2250,7 +2262,8 @@ static int gmc_v9_0_sw_init(struct amdgpu_ip_block *ip_block) (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 1) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) ? + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) ? 3 : 8; @@ -2263,7 +2276,8 @@ static int gmc_v9_0_sw_init(struct amdgpu_ip_block *ip_block) return r; if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) amdgpu_gmc_sysfs_init(adev); return 0; @@ -2274,7 +2288,8 @@ static int gmc_v9_0_sw_fini(struct amdgpu_ip_block *ip_block) struct amdgpu_device *adev = ip_block->adev; if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3) || - amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4)) + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 4) || + amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 5, 0)) amdgpu_gmc_sysfs_fini(adev); amdgpu_gmc_ras_fini(adev); @@ -2544,10 +2559,10 @@ static int gmc_v9_0_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int gmc_v9_0_set_clockgating_state(void *handle, +static int gmc_v9_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; adev->mmhub.funcs->set_clockgating(adev, state); @@ -2565,7 +2580,7 @@ static void gmc_v9_0_get_clockgating_state(void *handle, u64 *flags) athub_v1_0_get_clockgating(adev, flags); } -static int gmc_v9_0_set_powergating_state(void *handle, +static int gmc_v9_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c index 7f45e93c0397b304ddaff74012ec936900650f58..8ac3d32822684ac401ff6c85ba713066e847bcc3 100644 --- a/drivers/gpu/drm/amd/amdgpu/iceland_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/iceland_ih.c @@ -392,13 +392,13 @@ static int iceland_ih_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int iceland_ih_set_clockgating_state(void *handle, +static int iceland_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int iceland_ih_set_powergating_state(void *handle, +static int iceland_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c index 38f953fd65d9df535aef333ff7d6c49b56e931cc..f8a4851644377b5239689f838158a08cd6c1baf6 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c @@ -693,10 +693,10 @@ static void ih_v6_0_update_clockgating_state(struct amdgpu_device *adev, } } -static int ih_v6_0_set_clockgating_state(void *handle, +static int ih_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; ih_v6_0_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); @@ -756,10 +756,10 @@ static void ih_v6_0_update_ih_mem_power_gating(struct amdgpu_device *adev, WREG32_SOC15(OSSSYS, 0, regIH_MEM_POWER_CTRL, ih_mem_pwr_cntl); } -static int ih_v6_0_set_powergating_state(void *handle, +static int ih_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (adev->pg_flags & AMD_PG_SUPPORT_IH_SRAM_PG) diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c index 61381e0c379514473634d5578037c5b49da6fe13..dd0042efceec3c07e734287d30ef818ae02c62cd 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_1.c @@ -674,10 +674,10 @@ static void ih_v6_1_update_clockgating_state(struct amdgpu_device *adev, return; } -static int ih_v6_1_set_clockgating_state(void *handle, +static int ih_v6_1_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; ih_v6_1_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); @@ -737,10 +737,10 @@ static void ih_v6_1_update_ih_mem_power_gating(struct amdgpu_device *adev, WREG32_SOC15(OSSSYS, 0, regIH_MEM_POWER_CTRL, ih_mem_pwr_cntl); } -static int ih_v6_1_set_powergating_state(void *handle, +static int ih_v6_1_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (adev->pg_flags & AMD_PG_SUPPORT_IH_SRAM_PG) diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c index d2428cf5d3858b1dd0fde8627a575d6ee7e13145..8f9b15c171f36a35fad76ded4b2bf28ed5f88720 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v7_0.c @@ -664,10 +664,10 @@ static void ih_v7_0_update_clockgating_state(struct amdgpu_device *adev, return; } -static int ih_v7_0_set_clockgating_state(void *handle, +static int ih_v7_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; ih_v7_0_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); @@ -727,10 +727,10 @@ static void ih_v7_0_update_ih_mem_power_gating(struct amdgpu_device *adev, WREG32_SOC15(OSSSYS, 0, regIH_MEM_POWER_CTRL, ih_mem_pwr_cntl); } -static int ih_v7_0_set_powergating_state(void *handle, +static int ih_v7_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_PG_STATE_GATE); if (adev->pg_flags & AMD_PG_SUPPORT_IH_SRAM_PG) diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c index d4f72e47ae9e2082ae9041ad050b6fa559b00092..aeca5c08ea2f2bd820149391df37bffccdfb4829 100644 --- a/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/imu_v11_0.c @@ -50,7 +50,8 @@ static int imu_v11_0_init_microcode(struct amdgpu_device *adev) DRM_DEBUG("\n"); amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix)); - err = amdgpu_ucode_request(adev, &adev->gfx.imu_fw, "amdgpu/%s_imu.bin", ucode_prefix); + err = amdgpu_ucode_request(adev, &adev->gfx.imu_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_imu.bin", ucode_prefix); if (err) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c index 1341f02920314e60d766e3df127c4236e32604a3..df898dbb746e3f848378c5b96c261afba97fe977 100644 --- a/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/imu_v12_0.c @@ -47,7 +47,8 @@ static int imu_v12_0_init_microcode(struct amdgpu_device *adev) DRM_DEBUG("\n"); amdgpu_ucode_ip_version_decode(adev, GC_HWIP, ucode_prefix, sizeof(ucode_prefix)); - err = amdgpu_ucode_request(adev, &adev->gfx.imu_fw, "amdgpu/%s_imu.bin", ucode_prefix); + err = amdgpu_ucode_request(adev, &adev->gfx.imu_fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_imu.bin", ucode_prefix); if (err) goto out; diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c index 6e29b69894a57de6b49bd6f5b613caea04990afb..7c9251c03815183683506faaced1c5e1dda9c32f 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c @@ -35,7 +35,7 @@ static void jpeg_v2_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v2_0_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v2_0_set_powergating_state(void *handle, +static int jpeg_v2_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); /** @@ -154,7 +154,7 @@ static int jpeg_v2_0_hw_fini(struct amdgpu_ip_block *ip_block) if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, 0, mmUVD_JRBC_STATUS)) - jpeg_v2_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v2_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); return 0; } @@ -675,14 +675,14 @@ static int jpeg_v2_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int jpeg_v2_0_set_clockgating_state(void *handle, +static int jpeg_v2_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); if (enable) { - if (!jpeg_v2_0_is_idle(handle)) + if (!jpeg_v2_0_is_idle(adev)) return -EBUSY; jpeg_v2_0_enable_clock_gating(adev); } else { @@ -692,10 +692,10 @@ static int jpeg_v2_0_set_clockgating_state(void *handle, return 0; } -static int jpeg_v2_0_set_powergating_state(void *handle, +static int jpeg_v2_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (state == adev->jpeg.cur_state) diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c index 9ac421486f05fde3d1b5bca4d8e523c0da2887d4..11f6af2646e760a6654f467babcd236b9cf17c6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c @@ -38,7 +38,7 @@ static void jpeg_v2_5_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v2_5_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v2_5_set_powergating_state(void *handle, +static int jpeg_v2_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static void jpeg_v2_5_set_ras_funcs(struct amdgpu_device *adev); @@ -219,7 +219,7 @@ static int jpeg_v2_5_hw_fini(struct amdgpu_ip_block *ip_block) if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, i, mmUVD_JRBC_STATUS)) - jpeg_v2_5_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v2_5_set_powergating_state(ip_block, AMD_PG_STATE_GATE); if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__JPEG)) amdgpu_irq_put(adev, &adev->jpeg.inst[i].ras_poison_irq, 0); @@ -518,10 +518,10 @@ static int jpeg_v2_5_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int jpeg_v2_5_set_clockgating_state(void *handle, +static int jpeg_v2_5_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); int i; @@ -530,7 +530,7 @@ static int jpeg_v2_5_set_clockgating_state(void *handle, continue; if (enable) { - if (!jpeg_v2_5_is_idle(handle)) + if (!jpeg_v2_5_is_idle(adev)) return -EBUSY; jpeg_v2_5_enable_clock_gating(adev, i); } else { @@ -541,10 +541,10 @@ static int jpeg_v2_5_set_clockgating_state(void *handle, return 0; } -static int jpeg_v2_5_set_powergating_state(void *handle, +static int jpeg_v2_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (state == adev->jpeg.cur_state) diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c index e0df6800502ca8e7ac4f3497a46c9a7543e477d4..4eca65ea9053b81e641fac9856cd03ca48eec5cf 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c @@ -36,7 +36,7 @@ static void jpeg_v3_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v3_0_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v3_0_set_powergating_state(void *handle, +static int jpeg_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); /** @@ -168,7 +168,7 @@ static int jpeg_v3_0_hw_fini(struct amdgpu_ip_block *ip_block) if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, 0, mmUVD_JRBC_STATUS)) - jpeg_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v3_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); return 0; } @@ -466,14 +466,14 @@ static int jpeg_v3_0_wait_for_idle(struct amdgpu_ip_block *ip_block) UVD_JRBC_STATUS__RB_JOB_DONE_MASK); } -static int jpeg_v3_0_set_clockgating_state(void *handle, +static int jpeg_v3_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; if (enable) { - if (!jpeg_v3_0_is_idle(handle)) + if (!jpeg_v3_0_is_idle(adev)) return -EBUSY; jpeg_v3_0_enable_clock_gating(adev); } else { @@ -483,10 +483,10 @@ static int jpeg_v3_0_set_clockgating_state(void *handle, return 0; } -static int jpeg_v3_0_set_powergating_state(void *handle, +static int jpeg_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if(state == adev->jpeg.cur_state) diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c index eca1963c33b6bf5f109bc965f289ffbfa85854d4..0aef1f64afd02c39d0755218fbbd2f344cd98599 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c @@ -39,7 +39,7 @@ static int jpeg_v4_0_start_sriov(struct amdgpu_device *adev); static void jpeg_v4_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v4_0_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v4_0_set_powergating_state(void *handle, +static int jpeg_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static void jpeg_v4_0_set_ras_funcs(struct amdgpu_device *adev); @@ -206,7 +206,7 @@ static int jpeg_v4_0_hw_fini(struct amdgpu_ip_block *ip_block) if (!amdgpu_sriov_vf(adev)) { if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, 0, regUVD_JRBC_STATUS)) - jpeg_v4_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v4_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__JPEG)) amdgpu_irq_put(adev, &adev->jpeg.inst->ras_poison_irq, 0); @@ -635,14 +635,14 @@ static int jpeg_v4_0_wait_for_idle(struct amdgpu_ip_block *ip_block) UVD_JRBC_STATUS__RB_JOB_DONE_MASK); } -static int jpeg_v4_0_set_clockgating_state(void *handle, +static int jpeg_v4_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; if (enable) { - if (!jpeg_v4_0_is_idle(handle)) + if (!jpeg_v4_0_is_idle(adev)) return -EBUSY; jpeg_v4_0_enable_clock_gating(adev); } else { @@ -652,10 +652,10 @@ static int jpeg_v4_0_set_clockgating_state(void *handle, return 0; } -static int jpeg_v4_0_set_powergating_state(void *handle, +static int jpeg_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (amdgpu_sriov_vf(adev)) { diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c index 67b51bcbacd19772903ea5b189269ec1d2986a20..88f9771c16869c7c631325163c5e987cb6f0149e 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c @@ -43,7 +43,7 @@ enum jpeg_engin_status { static void jpeg_v4_0_3_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v4_0_3_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v4_0_3_set_powergating_state(void *handle, +static int jpeg_v4_0_3_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static void jpeg_v4_0_3_set_ras_funcs(struct amdgpu_device *adev); static void jpeg_v4_0_3_dec_ring_set_wptr(struct amdgpu_ring *ring); @@ -76,7 +76,7 @@ static int jpeg_v4_0_3_early_init(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; - adev->jpeg.num_jpeg_rings = AMDGPU_MAX_JPEG_RINGS; + adev->jpeg.num_jpeg_rings = AMDGPU_MAX_JPEG_RINGS_4_0_3; jpeg_v4_0_3_set_dec_ring_funcs(adev); jpeg_v4_0_3_set_irq_funcs(adev); @@ -321,7 +321,7 @@ static int jpeg_v4_0_3_hw_init(struct amdgpu_ip_block *ip_block) if (r) return r; - for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { ring = &adev->jpeg.inst[i].ring_dec[j]; ring->wptr = 0; @@ -379,7 +379,7 @@ static int jpeg_v4_0_3_hw_fini(struct amdgpu_ip_block *ip_block) if (!amdgpu_sriov_vf(adev)) { if (adev->jpeg.cur_state != AMD_PG_STATE_GATE) - ret = jpeg_v4_0_3_set_powergating_state(adev, AMD_PG_STATE_GATE); + ret = jpeg_v4_0_3_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } return ret; @@ -949,16 +949,16 @@ static int jpeg_v4_0_3_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int jpeg_v4_0_3_set_clockgating_state(void *handle, +static int jpeg_v4_0_3_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; int i; for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { if (enable) { - if (!jpeg_v4_0_3_is_idle(handle)) + if (!jpeg_v4_0_3_is_idle(adev)) return -EBUSY; jpeg_v4_0_3_enable_clock_gating(adev, i); } else { @@ -968,10 +968,10 @@ static int jpeg_v4_0_3_set_clockgating_state(void *handle, return 0; } -static int jpeg_v4_0_3_set_powergating_state(void *handle, +static int jpeg_v4_0_3_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (amdgpu_sriov_vf(adev)) { @@ -1231,9 +1231,95 @@ static const struct amdgpu_ras_block_hw_ops jpeg_v4_0_3_ras_hw_ops = { .reset_ras_error_count = jpeg_v4_0_3_reset_ras_error_count, }; +static int jpeg_v4_0_3_aca_bank_parser(struct aca_handle *handle, struct aca_bank *bank, + enum aca_smu_type type, void *data) +{ + struct aca_bank_info info; + u64 misc0; + int ret; + + ret = aca_bank_info_decode(bank, &info); + if (ret) + return ret; + + misc0 = bank->regs[ACA_REG_IDX_MISC0]; + switch (type) { + case ACA_SMU_TYPE_UE: + ret = aca_error_cache_log_bank_error(handle, &info, ACA_ERROR_TYPE_UE, + 1ULL); + break; + case ACA_SMU_TYPE_CE: + ret = aca_error_cache_log_bank_error(handle, &info, ACA_ERROR_TYPE_CE, + ACA_REG__MISC0__ERRCNT(misc0)); + break; + default: + return -EINVAL; + } + + return ret; +} + +/* reference to smu driver if header file */ +static int jpeg_v4_0_3_err_codes[] = { + 16, 17, 18, 19, 20, 21, 22, 23, /* JPEG[0-7][S|D] */ + 24, 25, 26, 27, 28, 29, 30, 31 +}; + +static bool jpeg_v4_0_3_aca_bank_is_valid(struct aca_handle *handle, struct aca_bank *bank, + enum aca_smu_type type, void *data) +{ + u32 instlo; + + instlo = ACA_REG__IPID__INSTANCEIDLO(bank->regs[ACA_REG_IDX_IPID]); + instlo &= GENMASK(31, 1); + + if (instlo != mmSMNAID_AID0_MCA_SMU) + return false; + + if (aca_bank_check_error_codes(handle->adev, bank, + jpeg_v4_0_3_err_codes, + ARRAY_SIZE(jpeg_v4_0_3_err_codes))) + return false; + + return true; +} + +static const struct aca_bank_ops jpeg_v4_0_3_aca_bank_ops = { + .aca_bank_parser = jpeg_v4_0_3_aca_bank_parser, + .aca_bank_is_valid = jpeg_v4_0_3_aca_bank_is_valid, +}; + +static const struct aca_info jpeg_v4_0_3_aca_info = { + .hwip = ACA_HWIP_TYPE_SMU, + .mask = ACA_ERROR_UE_MASK, + .bank_ops = &jpeg_v4_0_3_aca_bank_ops, +}; + +static int jpeg_v4_0_3_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block) +{ + int r; + + r = amdgpu_ras_block_late_init(adev, ras_block); + if (r) + return r; + + r = amdgpu_ras_bind_aca(adev, AMDGPU_RAS_BLOCK__JPEG, + &jpeg_v4_0_3_aca_info, NULL); + if (r) + goto late_fini; + + return 0; + +late_fini: + amdgpu_ras_block_late_fini(adev, ras_block); + + return r; +} + static struct amdgpu_jpeg_ras jpeg_v4_0_3_ras = { .ras_block = { .hw_ops = &jpeg_v4_0_3_ras_hw_ops, + .ras_late_init = jpeg_v4_0_3_ras_late_init, }, }; diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c index 1d9e3b101c3a3225bdb8f181d88746adafc17372..6b3656984957346ba6991c3a602dcb7e50fbbd5c 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c @@ -48,7 +48,7 @@ static void jpeg_v4_0_5_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v4_0_5_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v4_0_5_set_powergating_state(void *handle, +static int jpeg_v4_0_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static void jpeg_v4_0_5_dec_ring_set_wptr(struct amdgpu_ring *ring); @@ -236,7 +236,7 @@ static int jpeg_v4_0_5_hw_fini(struct amdgpu_ip_block *ip_block) if (!amdgpu_sriov_vf(adev)) { if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, i, regUVD_JRBC_STATUS)) - jpeg_v4_0_5_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v4_0_5_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } } return 0; @@ -660,10 +660,10 @@ static int jpeg_v4_0_5_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int jpeg_v4_0_5_set_clockgating_state(void *handle, +static int jpeg_v4_0_5_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE) ? true : false; int i; @@ -672,7 +672,7 @@ static int jpeg_v4_0_5_set_clockgating_state(void *handle, continue; if (enable) { - if (!jpeg_v4_0_5_is_idle(handle)) + if (!jpeg_v4_0_5_is_idle(adev)) return -EBUSY; jpeg_v4_0_5_enable_clock_gating(adev, i); @@ -684,10 +684,10 @@ static int jpeg_v4_0_5_set_clockgating_state(void *handle, return 0; } -static int jpeg_v4_0_5_set_powergating_state(void *handle, +static int jpeg_v4_0_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (amdgpu_sriov_vf(adev)) { diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c index 58fb1e5fa89c481fc5411189762f88bd1b347c81..d5cf0f2799d44827db12ae164d969e9ebaa11d1c 100644 --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c @@ -31,12 +31,12 @@ #include "vcn/vcn_5_0_0_offset.h" #include "vcn/vcn_5_0_0_sh_mask.h" -#include "ivsrcid/vcn/irqsrcs_vcn_4_0.h" +#include "ivsrcid/vcn/irqsrcs_vcn_5_0.h" #include "jpeg_v5_0_0.h" static void jpeg_v5_0_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void jpeg_v5_0_0_set_irq_funcs(struct amdgpu_device *adev); -static int jpeg_v5_0_0_set_powergating_state(void *handle, +static int jpeg_v5_0_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); /** @@ -74,7 +74,7 @@ static int jpeg_v5_0_0_sw_init(struct amdgpu_ip_block *ip_block) /* JPEG TRAP */ r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN, - VCN_4_0__SRCID__JPEG_DECODE, &adev->jpeg.inst->irq); + VCN_5_0__SRCID__JPEG_DECODE, &adev->jpeg.inst->irq); if (r) return r; @@ -172,7 +172,7 @@ static int jpeg_v5_0_0_hw_fini(struct amdgpu_ip_block *ip_block) if (adev->jpeg.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(JPEG, 0, regUVD_JRBC_STATUS)) - jpeg_v5_0_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + jpeg_v5_0_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); return 0; } @@ -560,14 +560,14 @@ static int jpeg_v5_0_0_wait_for_idle(struct amdgpu_ip_block *ip_block) UVD_JRBC_STATUS__RB_JOB_DONE_MASK); } -static int jpeg_v5_0_0_set_clockgating_state(void *handle, +static int jpeg_v5_0_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE) ? true : false; if (enable) { - if (!jpeg_v5_0_0_is_idle(handle)) + if (!jpeg_v5_0_0_is_idle(adev)) return -EBUSY; jpeg_v5_0_0_enable_clock_gating(adev); } else { @@ -577,10 +577,10 @@ static int jpeg_v5_0_0_set_clockgating_state(void *handle, return 0; } -static int jpeg_v5_0_0_set_powergating_state(void *handle, +static int jpeg_v5_0_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (state == adev->jpeg.cur_state) @@ -612,7 +612,7 @@ static int jpeg_v5_0_0_process_interrupt(struct amdgpu_device *adev, DRM_DEBUG("IH: JPEG TRAP\n"); switch (entry->src_id) { - case VCN_4_0__SRCID__JPEG_DECODE: + case VCN_5_0__SRCID__JPEG_DECODE: amdgpu_fence_process(adev->jpeg.inst->ring_dec); break; default: diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c new file mode 100644 index 0000000000000000000000000000000000000000..40d4c32a8c2a6e584e813880f43fd9531a4dcaf4 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c @@ -0,0 +1,708 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright 2014-2024 Advanced Micro Devices, Inc. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#include "amdgpu.h" +#include "amdgpu_jpeg.h" +#include "amdgpu_pm.h" +#include "soc15.h" +#include "soc15d.h" +#include "jpeg_v4_0_3.h" +#include "jpeg_v5_0_1.h" + +#include "vcn/vcn_5_0_0_offset.h" +#include "vcn/vcn_5_0_0_sh_mask.h" +#include "ivsrcid/vcn/irqsrcs_vcn_5_0.h" + +static void jpeg_v5_0_1_set_dec_ring_funcs(struct amdgpu_device *adev); +static void jpeg_v5_0_1_set_irq_funcs(struct amdgpu_device *adev); +static int jpeg_v5_0_1_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state); +static void jpeg_v5_0_1_dec_ring_set_wptr(struct amdgpu_ring *ring); + +static int amdgpu_ih_srcid_jpeg[] = { + VCN_5_0__SRCID__JPEG_DECODE, + VCN_5_0__SRCID__JPEG1_DECODE, + VCN_5_0__SRCID__JPEG2_DECODE, + VCN_5_0__SRCID__JPEG3_DECODE, + VCN_5_0__SRCID__JPEG4_DECODE, + VCN_5_0__SRCID__JPEG5_DECODE, + VCN_5_0__SRCID__JPEG6_DECODE, + VCN_5_0__SRCID__JPEG7_DECODE, + VCN_5_0__SRCID__JPEG8_DECODE, + VCN_5_0__SRCID__JPEG9_DECODE, +}; + +static int jpeg_v5_0_1_core_reg_offset(u32 pipe) +{ + if (pipe <= AMDGPU_MAX_JPEG_RINGS_4_0_3) + return ((0x40 * pipe) - 0xc80); + else + return ((0x40 * pipe) - 0x440); +} + +/** + * jpeg_v5_0_1_early_init - set function pointers + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Set ring and irq function pointers + */ +static int jpeg_v5_0_1_early_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + + if (!adev->jpeg.num_jpeg_inst || adev->jpeg.num_jpeg_inst > AMDGPU_MAX_JPEG_INSTANCES) + return -ENOENT; + + adev->jpeg.num_jpeg_rings = AMDGPU_MAX_JPEG_RINGS; + jpeg_v5_0_1_set_dec_ring_funcs(adev); + jpeg_v5_0_1_set_irq_funcs(adev); + + return 0; +} + +/** + * jpeg_v5_0_1_sw_init - sw init for JPEG block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Load firmware and sw initialization + */ +static int jpeg_v5_0_1_sw_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + struct amdgpu_ring *ring; + int i, j, r, jpeg_inst; + + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + /* JPEG TRAP */ + r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN, + amdgpu_ih_srcid_jpeg[j], &adev->jpeg.inst->irq); + if (r) + return r; + } + + r = amdgpu_jpeg_sw_init(adev); + if (r) + return r; + + r = amdgpu_jpeg_resume(adev); + if (r) + return r; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + jpeg_inst = GET_INST(JPEG, i); + + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + ring = &adev->jpeg.inst[i].ring_dec[j]; + ring->use_doorbell = false; + ring->vm_hub = AMDGPU_MMHUB0(adev->jpeg.inst[i].aid_id); + if (!amdgpu_sriov_vf(adev)) { + ring->doorbell_index = + (adev->doorbell_index.vcn.vcn_ring0_1 << 1) + + 1 + j + 11 * jpeg_inst; + } else { + if (j < 4) + ring->doorbell_index = + (adev->doorbell_index.vcn.vcn_ring0_1 << 1) + + 4 + j + 32 * jpeg_inst; + else + ring->doorbell_index = + (adev->doorbell_index.vcn.vcn_ring0_1 << 1) + + 8 + j + 32 * jpeg_inst; + } + sprintf(ring->name, "jpeg_dec_%d.%d", adev->jpeg.inst[i].aid_id, j); + r = amdgpu_ring_init(adev, ring, 512, &adev->jpeg.inst->irq, 0, + AMDGPU_RING_PRIO_DEFAULT, NULL); + if (r) + return r; + + adev->jpeg.internal.jpeg_pitch[j] = + regUVD_JRBC0_UVD_JRBC_SCRATCH0_INTERNAL_OFFSET; + adev->jpeg.inst[i].external.jpeg_pitch[j] = + SOC15_REG_OFFSET1(JPEG, jpeg_inst, regUVD_JRBC_SCRATCH0, + (j ? jpeg_v5_0_1_core_reg_offset(j) : 0)); + } + } + + return 0; +} + +/** + * jpeg_v5_0_1_sw_fini - sw fini for JPEG block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * JPEG suspend and free up sw allocation + */ +static int jpeg_v5_0_1_sw_fini(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int r; + + r = amdgpu_jpeg_suspend(adev); + if (r) + return r; + + r = amdgpu_jpeg_sw_fini(adev); + + return r; +} + +/** + * jpeg_v5_0_1_hw_init - start and test JPEG block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + */ +static int jpeg_v5_0_1_hw_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + struct amdgpu_ring *ring; + int i, j, r, jpeg_inst; + + if (amdgpu_sriov_vf(adev)) { + /* jpeg_v5_0_1_start_sriov(adev); */ + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + ring = &adev->jpeg.inst[i].ring_dec[j]; + ring->wptr = 0; + ring->wptr_old = 0; + jpeg_v5_0_1_dec_ring_set_wptr(ring); + ring->sched.ready = true; + } + } + return 0; + } + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + jpeg_inst = GET_INST(JPEG, i); + ring = adev->jpeg.inst[i].ring_dec; + if (ring->use_doorbell) + adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell, + (adev->doorbell_index.vcn.vcn_ring0_1 << 1) + 11 * jpeg_inst, + adev->jpeg.inst[i].aid_id); + + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + ring = &adev->jpeg.inst[i].ring_dec[j]; + if (ring->use_doorbell) + WREG32_SOC15_OFFSET(VCN, GET_INST(VCN, i), regVCN_JPEG_DB_CTRL, + (ring->pipe ? (ring->pipe - 0x15) : 0), + ring->doorbell_index << + VCN_JPEG_DB_CTRL__OFFSET__SHIFT | + VCN_JPEG_DB_CTRL__EN_MASK); + r = amdgpu_ring_test_helper(ring); + if (r) + return r; + } + } + + return 0; +} + +/** + * jpeg_v5_0_1_hw_fini - stop the hardware block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Stop the JPEG block, mark ring as not ready any more + */ +static int jpeg_v5_0_1_hw_fini(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int ret = 0; + + cancel_delayed_work_sync(&adev->jpeg.idle_work); + + if (adev->jpeg.cur_state != AMD_PG_STATE_GATE) + ret = jpeg_v5_0_1_set_powergating_state(ip_block, AMD_PG_STATE_GATE); + + return ret; +} + +/** + * jpeg_v5_0_1_suspend - suspend JPEG block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * HW fini and suspend JPEG block + */ +static int jpeg_v5_0_1_suspend(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int r; + + r = jpeg_v5_0_1_hw_fini(ip_block); + if (r) + return r; + + r = amdgpu_jpeg_suspend(adev); + + return r; +} + +/** + * jpeg_v5_0_1_resume - resume JPEG block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Resume firmware and hw init JPEG block + */ +static int jpeg_v5_0_1_resume(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int r; + + r = amdgpu_jpeg_resume(adev); + if (r) + return r; + + r = jpeg_v5_0_1_hw_init(ip_block); + + return r; +} + +static int jpeg_v5_0_1_disable_antihang(struct amdgpu_device *adev, int inst_idx) +{ + int jpeg_inst; + + jpeg_inst = GET_INST(JPEG, inst_idx); + /* disable anti hang mechanism */ + WREG32_P(SOC15_REG_OFFSET(JPEG, jpeg_inst, regUVD_JPEG_POWER_STATUS), 0, + ~UVD_JPEG_POWER_STATUS__JPEG_POWER_STATUS_MASK); + + /* keep the JPEG in static PG mode */ + WREG32_P(SOC15_REG_OFFSET(JPEG, jpeg_inst, regUVD_JPEG_POWER_STATUS), 0, + ~UVD_JPEG_POWER_STATUS__JPEG_PG_MODE_MASK); + + return 0; +} + +static int jpeg_v5_0_1_enable_antihang(struct amdgpu_device *adev, int inst_idx) +{ + int jpeg_inst; + + jpeg_inst = GET_INST(JPEG, inst_idx); + /* enable anti hang mechanism */ + WREG32_P(SOC15_REG_OFFSET(JPEG, jpeg_inst, regUVD_JPEG_POWER_STATUS), + UVD_JPEG_POWER_STATUS__JPEG_POWER_STATUS_MASK, + ~UVD_JPEG_POWER_STATUS__JPEG_POWER_STATUS_MASK); + + return 0; +} + +/** + * jpeg_v5_0_1_start - start JPEG block + * + * @adev: amdgpu_device pointer + * + * Setup and start the JPEG block + */ +static int jpeg_v5_0_1_start(struct amdgpu_device *adev) +{ + struct amdgpu_ring *ring; + int i, j, jpeg_inst, r; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + jpeg_inst = GET_INST(JPEG, i); + + /* disable antihang */ + r = jpeg_v5_0_1_disable_antihang(adev, i); + if (r) + return r; + + /* MJPEG global tiling registers */ + WREG32_SOC15(JPEG, 0, regJPEG_DEC_GFX10_ADDR_CONFIG, + adev->gfx.config.gb_addr_config); + + /* enable JMI channel */ + WREG32_P(SOC15_REG_OFFSET(JPEG, jpeg_inst, regUVD_JMI_CNTL), 0, + ~UVD_JMI_CNTL__SOFT_RESET_MASK); + + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + int reg_offset = (j ? jpeg_v5_0_1_core_reg_offset(j) : 0); + u32 reg, data, mask; + + ring = &adev->jpeg.inst[i].ring_dec[j]; + + /* enable System Interrupt for JRBC */ + reg = SOC15_REG_OFFSET(JPEG, jpeg_inst, regJPEG_SYS_INT_EN); + if (j < AMDGPU_MAX_JPEG_RINGS_4_0_3) { + data = JPEG_SYS_INT_EN__DJRBC0_MASK << j; + mask = ~(JPEG_SYS_INT_EN__DJRBC0_MASK << j); + WREG32_P(reg, data, mask); + } else { + data = JPEG_SYS_INT_EN__DJRBC0_MASK << (j+12); + mask = ~(JPEG_SYS_INT_EN__DJRBC0_MASK << (j+12)); + WREG32_P(reg, data, mask); + } + + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_LMI_JRBC_RB_VMID, + reg_offset, 0); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_JRBC_RB_CNTL, + reg_offset, + (0x00000001L | 0x00000002L)); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_LMI_JRBC_RB_64BIT_BAR_LOW, + reg_offset, lower_32_bits(ring->gpu_addr)); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_LMI_JRBC_RB_64BIT_BAR_HIGH, + reg_offset, upper_32_bits(ring->gpu_addr)); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_JRBC_RB_RPTR, + reg_offset, 0); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_JRBC_RB_WPTR, + reg_offset, 0); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_JRBC_RB_CNTL, + reg_offset, 0x00000002L); + WREG32_SOC15_OFFSET(JPEG, jpeg_inst, + regUVD_JRBC_RB_SIZE, + reg_offset, ring->ring_size / 4); + ring->wptr = RREG32_SOC15_OFFSET(JPEG, jpeg_inst, regUVD_JRBC_RB_WPTR, + reg_offset); + } + } + + return 0; +} + +/** + * jpeg_v5_0_1_stop - stop JPEG block + * + * @adev: amdgpu_device pointer + * + * stop the JPEG block + */ +static int jpeg_v5_0_1_stop(struct amdgpu_device *adev) +{ + int i, jpeg_inst, r; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + jpeg_inst = GET_INST(JPEG, i); + /* reset JMI */ + WREG32_P(SOC15_REG_OFFSET(JPEG, jpeg_inst, regUVD_JMI_CNTL), + UVD_JMI_CNTL__SOFT_RESET_MASK, + ~UVD_JMI_CNTL__SOFT_RESET_MASK); + + /* enable antihang */ + r = jpeg_v5_0_1_enable_antihang(adev, i); + if (r) + return r; + } + + return 0; +} + +/** + * jpeg_v5_0_1_dec_ring_get_rptr - get read pointer + * + * @ring: amdgpu_ring pointer + * + * Returns the current hardware read pointer + */ +static uint64_t jpeg_v5_0_1_dec_ring_get_rptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + return RREG32_SOC15_OFFSET(JPEG, GET_INST(JPEG, ring->me), regUVD_JRBC_RB_RPTR, + ring->pipe ? jpeg_v5_0_1_core_reg_offset(ring->pipe) : 0); +} + +/** + * jpeg_v5_0_1_dec_ring_get_wptr - get write pointer + * + * @ring: amdgpu_ring pointer + * + * Returns the current hardware write pointer + */ +static uint64_t jpeg_v5_0_1_dec_ring_get_wptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + if (ring->use_doorbell) + return adev->wb.wb[ring->wptr_offs]; + + return RREG32_SOC15_OFFSET(JPEG, GET_INST(JPEG, ring->me), regUVD_JRBC_RB_WPTR, + ring->pipe ? jpeg_v5_0_1_core_reg_offset(ring->pipe) : 0); +} + +/** + * jpeg_v5_0_1_dec_ring_set_wptr - set write pointer + * + * @ring: amdgpu_ring pointer + * + * Commits the write pointer to the hardware + */ +static void jpeg_v5_0_1_dec_ring_set_wptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + if (ring->use_doorbell) { + adev->wb.wb[ring->wptr_offs] = lower_32_bits(ring->wptr); + WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr)); + } else { + WREG32_SOC15_OFFSET(JPEG, GET_INST(JPEG, ring->me), + regUVD_JRBC_RB_WPTR, + (ring->pipe ? jpeg_v5_0_1_core_reg_offset(ring->pipe) : 0), + lower_32_bits(ring->wptr)); + } +} + +static bool jpeg_v5_0_1_is_idle(void *handle) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)handle; + bool ret = false; + int i, j; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + int reg_offset = (j ? jpeg_v5_0_1_core_reg_offset(j) : 0); + + ret &= ((RREG32_SOC15_OFFSET(JPEG, GET_INST(JPEG, i), + regUVD_JRBC_STATUS, reg_offset) & + UVD_JRBC_STATUS__RB_JOB_DONE_MASK) == + UVD_JRBC_STATUS__RB_JOB_DONE_MASK); + } + } + + return ret; +} + +static int jpeg_v5_0_1_wait_for_idle(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int ret = 0; + int i, j; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + int reg_offset = (j ? jpeg_v5_0_1_core_reg_offset(j) : 0); + + ret &= SOC15_WAIT_ON_RREG_OFFSET(JPEG, GET_INST(JPEG, i), + regUVD_JRBC_STATUS, reg_offset, + UVD_JRBC_STATUS__RB_JOB_DONE_MASK, + UVD_JRBC_STATUS__RB_JOB_DONE_MASK); + } + } + return ret; +} + +static int jpeg_v5_0_1_set_clockgating_state(struct amdgpu_ip_block *ip_block, + enum amd_clockgating_state state) +{ + struct amdgpu_device *adev = ip_block->adev; + bool enable = (state == AMD_CG_STATE_GATE) ? true : false; + + int i; + + if (!enable) + return 0; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + if (!jpeg_v5_0_1_is_idle(adev)) + return -EBUSY; + } + + return 0; +} + +static int jpeg_v5_0_1_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state) +{ + struct amdgpu_device *adev = ip_block->adev; + int ret; + + if (state == adev->jpeg.cur_state) + return 0; + + if (state == AMD_PG_STATE_GATE) + ret = jpeg_v5_0_1_stop(adev); + else + ret = jpeg_v5_0_1_start(adev); + + if (!ret) + adev->jpeg.cur_state = state; + + return ret; +} + +static int jpeg_v5_0_1_set_interrupt_state(struct amdgpu_device *adev, + struct amdgpu_irq_src *source, + unsigned int type, + enum amdgpu_interrupt_state state) +{ + return 0; +} + +static int jpeg_v5_0_1_process_interrupt(struct amdgpu_device *adev, + struct amdgpu_irq_src *source, + struct amdgpu_iv_entry *entry) +{ + u32 i, inst; + + i = node_id_to_phys_map[entry->node_id]; + DRM_DEV_DEBUG(adev->dev, "IH: JPEG TRAP\n"); + + for (inst = 0; inst < adev->jpeg.num_jpeg_inst; ++inst) + if (adev->jpeg.inst[inst].aid_id == i) + break; + + if (inst >= adev->jpeg.num_jpeg_inst) { + dev_WARN_ONCE(adev->dev, 1, + "Interrupt received for unknown JPEG instance %d", + entry->node_id); + return 0; + } + + switch (entry->src_id) { + case VCN_5_0__SRCID__JPEG_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[0]); + break; + case VCN_5_0__SRCID__JPEG1_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[1]); + break; + case VCN_5_0__SRCID__JPEG2_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[2]); + break; + case VCN_5_0__SRCID__JPEG3_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[3]); + break; + case VCN_5_0__SRCID__JPEG4_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[4]); + break; + case VCN_5_0__SRCID__JPEG5_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[5]); + break; + case VCN_5_0__SRCID__JPEG6_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[6]); + break; + case VCN_5_0__SRCID__JPEG7_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[7]); + break; + case VCN_5_0__SRCID__JPEG8_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[8]); + break; + case VCN_5_0__SRCID__JPEG9_DECODE: + amdgpu_fence_process(&adev->jpeg.inst[inst].ring_dec[9]); + break; + default: + DRM_DEV_ERROR(adev->dev, "Unhandled interrupt: %d %d\n", + entry->src_id, entry->src_data[0]); + break; + } + + return 0; +} + +static const struct amd_ip_funcs jpeg_v5_0_1_ip_funcs = { + .name = "jpeg_v5_0_1", + .early_init = jpeg_v5_0_1_early_init, + .late_init = NULL, + .sw_init = jpeg_v5_0_1_sw_init, + .sw_fini = jpeg_v5_0_1_sw_fini, + .hw_init = jpeg_v5_0_1_hw_init, + .hw_fini = jpeg_v5_0_1_hw_fini, + .suspend = jpeg_v5_0_1_suspend, + .resume = jpeg_v5_0_1_resume, + .is_idle = jpeg_v5_0_1_is_idle, + .wait_for_idle = jpeg_v5_0_1_wait_for_idle, + .check_soft_reset = NULL, + .pre_soft_reset = NULL, + .soft_reset = NULL, + .post_soft_reset = NULL, + .set_clockgating_state = jpeg_v5_0_1_set_clockgating_state, + .set_powergating_state = jpeg_v5_0_1_set_powergating_state, + .dump_ip_state = NULL, + .print_ip_state = NULL, +}; + +static const struct amdgpu_ring_funcs jpeg_v5_0_1_dec_ring_vm_funcs = { + .type = AMDGPU_RING_TYPE_VCN_JPEG, + .align_mask = 0xf, + .get_rptr = jpeg_v5_0_1_dec_ring_get_rptr, + .get_wptr = jpeg_v5_0_1_dec_ring_get_wptr, + .set_wptr = jpeg_v5_0_1_dec_ring_set_wptr, + .emit_frame_size = + SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 + + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 8 + + 8 + /* jpeg_v5_0_1_dec_ring_emit_vm_flush */ + 22 + 22 + /* jpeg_v5_0_1_dec_ring_emit_fence x2 vm fence */ + 8 + 16, + .emit_ib_size = 22, /* jpeg_v5_0_1_dec_ring_emit_ib */ + .emit_ib = jpeg_v4_0_3_dec_ring_emit_ib, + .emit_fence = jpeg_v4_0_3_dec_ring_emit_fence, + .emit_vm_flush = jpeg_v4_0_3_dec_ring_emit_vm_flush, + .test_ring = amdgpu_jpeg_dec_ring_test_ring, + .test_ib = amdgpu_jpeg_dec_ring_test_ib, + .insert_nop = jpeg_v4_0_3_dec_ring_nop, + .insert_start = jpeg_v4_0_3_dec_ring_insert_start, + .insert_end = jpeg_v4_0_3_dec_ring_insert_end, + .pad_ib = amdgpu_ring_generic_pad_ib, + .begin_use = amdgpu_jpeg_ring_begin_use, + .end_use = amdgpu_jpeg_ring_end_use, + .emit_wreg = jpeg_v4_0_3_dec_ring_emit_wreg, + .emit_reg_wait = jpeg_v4_0_3_dec_ring_emit_reg_wait, + .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper, +}; + +static void jpeg_v5_0_1_set_dec_ring_funcs(struct amdgpu_device *adev) +{ + int i, j, jpeg_inst; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) { + for (j = 0; j < adev->jpeg.num_jpeg_rings; ++j) { + adev->jpeg.inst[i].ring_dec[j].funcs = &jpeg_v5_0_1_dec_ring_vm_funcs; + adev->jpeg.inst[i].ring_dec[j].me = i; + adev->jpeg.inst[i].ring_dec[j].pipe = j; + } + jpeg_inst = GET_INST(JPEG, i); + adev->jpeg.inst[i].aid_id = + jpeg_inst / adev->jpeg.num_inst_per_aid; + } +} + +static const struct amdgpu_irq_src_funcs jpeg_v5_0_1_irq_funcs = { + .set = jpeg_v5_0_1_set_interrupt_state, + .process = jpeg_v5_0_1_process_interrupt, +}; + +static void jpeg_v5_0_1_set_irq_funcs(struct amdgpu_device *adev) +{ + int i; + + for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) + adev->jpeg.inst->irq.num_types += adev->jpeg.num_jpeg_rings; + + adev->jpeg.inst->irq.funcs = &jpeg_v5_0_1_irq_funcs; +} + +const struct amdgpu_ip_block_version jpeg_v5_0_1_ip_block = { + .type = AMD_IP_BLOCK_TYPE_JPEG, + .major = 5, + .minor = 0, + .rev = 1, + .funcs = &jpeg_v5_0_1_ip_funcs, +}; diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.h b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.h new file mode 100644 index 0000000000000000000000000000000000000000..8ce146c00bb697c7909b833aacf52f1ae11c8022 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.h @@ -0,0 +1,29 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __JPEG_V5_0_1_H__ +#define __JPEG_V5_0_1_H__ + +extern const struct amdgpu_ip_block_version jpeg_v5_0_1_ip_block; + +#endif /* __JPEG_V5_0_0_H__ */ diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 9c905b9e937637442570ec20321bdfe2ea93601a..65f389eb65e5fa4e2d2319f312ab1ee4e83a6d66 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -1505,9 +1505,7 @@ static void mes_v11_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp | 0x80); } static void mes_v11_0_kiq_clear(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index 9ecc5d61e49ba3151cb63c82130f6462a6dbb3c6..5b537806b4da640ef6e53b3488afa47ed3dbaf57 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c @@ -24,6 +24,7 @@ #include #include #include "amdgpu.h" +#include "gfx_v12_0.h" #include "soc15_common.h" #include "soc21.h" #include "gc/gc_12_0_0_offset.h" @@ -350,6 +351,132 @@ static int mes_v12_0_remove_hw_queue(struct amdgpu_mes *mes, offsetof(union MESAPI__REMOVE_QUEUE, api_status)); } +int gfx_v12_0_request_gfx_index_mutex(struct amdgpu_device *adev, + bool req) +{ + u32 i, tmp, val; + + for (i = 0; i < adev->usec_timeout; i++) { + /* Request with MeId=2, PipeId=0 */ + tmp = REG_SET_FIELD(0, CP_GFX_INDEX_MUTEX, REQUEST, req); + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, CLIENTID, 4); + WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp); + + val = RREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX); + if (req) { + if (val == tmp) + break; + } else { + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, + REQUEST, 1); + + /* unlocked or locked by firmware */ + if (val != tmp) + break; + } + udelay(1); + } + + if (i >= adev->usec_timeout) + return -EINVAL; + + return 0; +} + +static int mes_v12_0_reset_queue_mmio(struct amdgpu_mes *mes, uint32_t queue_type, + uint32_t me_id, uint32_t pipe_id, + uint32_t queue_id, uint32_t vmid) +{ + struct amdgpu_device *adev = mes->adev; + uint32_t value, reg; + int i, r = 0; + + amdgpu_gfx_rlc_enter_safe_mode(adev, 0); + + if (queue_type == AMDGPU_RING_TYPE_GFX) { + dev_info(adev->dev, "reset gfx queue (%d:%d:%d: vmid:%d)\n", + me_id, pipe_id, queue_id, vmid); + + mutex_lock(&adev->gfx.reset_sem_mutex); + gfx_v12_0_request_gfx_index_mutex(adev, true); + /* all se allow writes */ + WREG32_SOC15(GC, 0, regGRBM_GFX_INDEX, + (uint32_t)(0x1 << GRBM_GFX_INDEX__SE_BROADCAST_WRITES__SHIFT)); + value = REG_SET_FIELD(0, CP_VMID_RESET, RESET_REQUEST, 1 << vmid); + if (pipe_id == 0) + value = REG_SET_FIELD(value, CP_VMID_RESET, PIPE0_QUEUES, 1 << queue_id); + else + value = REG_SET_FIELD(value, CP_VMID_RESET, PIPE1_QUEUES, 1 << queue_id); + WREG32_SOC15(GC, 0, regCP_VMID_RESET, value); + gfx_v12_0_request_gfx_index_mutex(adev, false); + mutex_unlock(&adev->gfx.reset_sem_mutex); + + mutex_lock(&adev->srbm_mutex); + soc21_grbm_select(adev, me_id, pipe_id, queue_id, 0); + /* wait till dequeue take effects */ + for (i = 0; i < adev->usec_timeout; i++) { + if (!(RREG32_SOC15(GC, 0, regCP_GFX_HQD_ACTIVE) & 1)) + break; + udelay(1); + } + if (i >= adev->usec_timeout) { + dev_err(adev->dev, "failed to wait on gfx hqd deactivate\n"); + r = -ETIMEDOUT; + } + + soc21_grbm_select(adev, 0, 0, 0, 0); + mutex_unlock(&adev->srbm_mutex); + } else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { + dev_info(adev->dev, "reset compute queue (%d:%d:%d)\n", + me_id, pipe_id, queue_id); + mutex_lock(&adev->srbm_mutex); + soc21_grbm_select(adev, me_id, pipe_id, queue_id, 0); + WREG32_SOC15(GC, 0, regCP_HQD_DEQUEUE_REQUEST, 0x2); + WREG32_SOC15(GC, 0, regSPI_COMPUTE_QUEUE_RESET, 0x1); + + /* wait till dequeue take effects */ + for (i = 0; i < adev->usec_timeout; i++) { + if (!(RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) & 1)) + break; + udelay(1); + } + if (i >= adev->usec_timeout) { + dev_err(adev->dev, "failed to wait on hqd deactivate\n"); + r = -ETIMEDOUT; + } + soc21_grbm_select(adev, 0, 0, 0, 0); + mutex_unlock(&adev->srbm_mutex); + } else if (queue_type == AMDGPU_RING_TYPE_SDMA) { + dev_info(adev->dev, "reset sdma queue (%d:%d:%d)\n", + me_id, pipe_id, queue_id); + switch (me_id) { + case 1: + reg = SOC15_REG_OFFSET(GC, 0, regSDMA1_QUEUE_RESET_REQ); + break; + case 0: + default: + reg = SOC15_REG_OFFSET(GC, 0, regSDMA0_QUEUE_RESET_REQ); + break; + } + + value = 1 << queue_id; + WREG32(reg, value); + /* wait for queue reset done */ + for (i = 0; i < adev->usec_timeout; i++) { + if (!(RREG32(reg) & value)) + break; + udelay(1); + } + if (i >= adev->usec_timeout) { + dev_err(adev->dev, "failed to wait on sdma queue reset done\n"); + r = -ETIMEDOUT; + } + } + + amdgpu_gfx_rlc_exit_safe_mode(adev, 0); + return r; +} + static int mes_v12_0_reset_hw_queue(struct amdgpu_mes *mes, struct mes_reset_queue_input *input) { @@ -721,6 +848,11 @@ static int mes_v12_0_reset_legacy_queue(struct amdgpu_mes *mes, union MESAPI__RESET mes_reset_queue_pkt; int pipe; + if (input->use_mmio) + return mes_v12_0_reset_queue_mmio(mes, input->queue_type, + input->me_id, input->pipe_id, + input->queue_id, input->vmid); + memset(&mes_reset_queue_pkt, 0, sizeof(mes_reset_queue_pkt)); mes_reset_queue_pkt.header.type = MES_API_TYPE_SCHEDULER; @@ -1455,9 +1587,7 @@ static void mes_v12_0_kiq_setting(struct amdgpu_ring *ring) tmp = RREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS); tmp &= 0xffffff00; tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue); - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); - tmp |= 0x80; - WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp); + WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp | 0x80); } static int mes_v12_0_kiq_hw_init(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c index e9a6f33ca7109df8b0678cb0129992ad774a22aa..243eabda06077259c4f0d91e7145c92a3d961c47 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c @@ -356,7 +356,7 @@ static void mmhub_v1_0_update_power_gating(struct amdgpu_device *adev, if (adev->pg_flags & AMD_PG_SUPPORT_MMHUB) amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GMC, - enable); + enable, 0); } static int mmhub_v1_0_gart_enable(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c index b01bb759d0f4f4ce47accc72af6ac5020e829f78..e646e5cef0a2e7a241f935d5d76a0b99edda868f 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c @@ -33,7 +33,6 @@ #define regVM_L2_CNTL3_DEFAULT 0x80100007 #define regVM_L2_CNTL4_DEFAULT 0x000000c1 -#define mmSMNAID_AID0_MCA_SMU 0x03b30400 static u64 mmhub_v1_8_get_fb_location(struct amdgpu_device *adev) { diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v4_1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v4_1_0.c index 0fbc3be81f140fcc81859767fc3ffe24646e144a..f2ab5001b4924961942d82e5eba5c9cc9413bef9 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v4_1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v4_1_0.c @@ -108,7 +108,7 @@ mmhub_v4_1_0_print_l2_protection_fault_status(struct amdgpu_device *adev, dev_err(adev->dev, "MMVM_L2_PROTECTION_FAULT_STATUS_LO32:0x%08X\n", status); - switch (adev->ip_versions[MMHUB_HWIP][0]) { + switch (amdgpu_ip_version(adev, MMHUB_HWIP, 0)) { case IP_VERSION(4, 1, 0): mmhub_cid = mmhub_client_ids_v4_1_0[cid][rw]; break; diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c index 0820ed62e2e8ed7e7eb2d1db008511da5face99f..62cdfe10e6f41b83b5c0f42d104395045d0f9006 100644 --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c @@ -434,9 +434,8 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device *adev, * this should allow us to catch up. */ tmp = (wptr + 32) & ih->ptr_mask; - dev_warn(adev->dev, "IH ring buffer overflow " - "(0x%08X, 0x%08X, 0x%08X)\n", - wptr, ih->rptr, tmp); + dev_warn(adev->dev, "%s ring buffer overflow (0x%08X, 0x%08X, 0x%08X)\n", + amdgpu_ih_ring_name(adev, ih), wptr, ih->rptr, tmp); ih->rptr = tmp; tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); @@ -667,17 +666,17 @@ static void navi10_ih_update_clockgating_state(struct amdgpu_device *adev, } } -static int navi10_ih_set_clockgating_state(void *handle, +static int navi10_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; navi10_ih_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); return 0; } -static int navi10_ih_set_powergating_state(void *handle, +static int navi10_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.c b/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.c index 39919e0892c1480ce3917369ef277bfd33a7997d..c92875ceb31f4505138758e1327c5bd5f825b700 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.c +++ b/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.c @@ -28,6 +28,7 @@ #include "nbif/nbif_6_3_1_sh_mask.h" #include "pcie/pcie_6_1_0_offset.h" #include "pcie/pcie_6_1_0_sh_mask.h" +#include "ivsrcid/nbio/irqsrcs_nbif_7_4.h" #include static void nbif_v6_3_1_remap_hdp_registers(struct amdgpu_device *adev) @@ -518,3 +519,83 @@ const struct amdgpu_nbio_funcs nbif_v6_3_1_sriov_funcs = { .get_rom_offset = nbif_v6_3_1_get_rom_offset, .set_reg_remap = nbif_v6_3_1_set_reg_remap, }; + +static int nbif_v6_3_1_set_ras_err_event_athub_irq_state(struct amdgpu_device *adev, + struct amdgpu_irq_src *src, + unsigned type, + enum amdgpu_interrupt_state state) +{ + /* The ras_controller_irq enablement should be done in psp bl when it + * tries to enable ras feature. Driver only need to set the correct interrupt + * vector for bare-metal and sriov use case respectively + */ + uint32_t bif_doorbell_int_cntl; + + bif_doorbell_int_cntl = RREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL); + bif_doorbell_int_cntl = REG_SET_FIELD(bif_doorbell_int_cntl, + BIF_BX0_BIF_DOORBELL_INT_CNTL, + RAS_ATHUB_ERR_EVENT_INTERRUPT_DISABLE, + (state == AMDGPU_IRQ_STATE_ENABLE) ? 0 : 1); + WREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL, bif_doorbell_int_cntl); + + return 0; +} + +static int nbif_v6_3_1_process_err_event_athub_irq(struct amdgpu_device *adev, + struct amdgpu_irq_src *source, + struct amdgpu_iv_entry *entry) +{ + /* By design, the ih cookie for err_event_athub_irq should be written + * to bif ring. since bif ring is not enabled, just leave process callback + * as a dummy one. + */ + return 0; +} + +static const struct amdgpu_irq_src_funcs nbif_v6_3_1_ras_err_event_athub_irq_funcs = { + .set = nbif_v6_3_1_set_ras_err_event_athub_irq_state, + .process = nbif_v6_3_1_process_err_event_athub_irq, +}; + +static void nbif_v6_3_1_handle_ras_err_event_athub_intr_no_bifring(struct amdgpu_device *adev) +{ + uint32_t bif_doorbell_int_cntl; + + bif_doorbell_int_cntl = RREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL); + if (REG_GET_FIELD(bif_doorbell_int_cntl, + BIF_BX0_BIF_DOORBELL_INT_CNTL, + RAS_ATHUB_ERR_EVENT_INTERRUPT_STATUS)) { + /* driver has to clear the interrupt status when bif ring is disabled */ + bif_doorbell_int_cntl = REG_SET_FIELD(bif_doorbell_int_cntl, + BIF_BX0_BIF_DOORBELL_INT_CNTL, + RAS_ATHUB_ERR_EVENT_INTERRUPT_CLEAR, 1); + WREG32_SOC15(NBIO, 0, regBIF_BX0_BIF_DOORBELL_INT_CNTL, bif_doorbell_int_cntl); + amdgpu_ras_global_ras_isr(adev); + } +} + +static int nbif_v6_3_1_init_ras_err_event_athub_interrupt(struct amdgpu_device *adev) +{ + int r; + + /* init the irq funcs */ + adev->nbio.ras_err_event_athub_irq.funcs = + &nbif_v6_3_1_ras_err_event_athub_irq_funcs; + adev->nbio.ras_err_event_athub_irq.num_types = 1; + + /* register ras err event athub interrupt + * nbif v6_3_1 uses the same irq source as nbio v7_4 + */ + r = amdgpu_irq_add_id(adev, SOC21_IH_CLIENTID_BIF, + NBIF_7_4__SRCID__ERREVENT_ATHUB_INTERRUPT, + &adev->nbio.ras_err_event_athub_irq); + + return r; +} + +struct amdgpu_nbio_ras nbif_v6_3_1_ras = { + .handle_ras_err_event_athub_intr_no_bifring = + nbif_v6_3_1_handle_ras_err_event_athub_intr_no_bifring, + .init_ras_err_event_athub_interrupt = + nbif_v6_3_1_init_ras_err_event_athub_interrupt, +}; diff --git a/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.h b/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.h index b7f2e0d88905d203787722442fde8292bf40cb5b..9ac4831d39e17bb23f9d2a70554a5db6df20a422 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.h +++ b/drivers/gpu/drm/amd/amdgpu/nbif_v6_3_1.h @@ -29,5 +29,6 @@ extern const struct nbio_hdp_flush_reg nbif_v6_3_1_hdp_flush_reg; extern const struct amdgpu_nbio_funcs nbif_v6_3_1_funcs; extern const struct amdgpu_nbio_funcs nbif_v6_3_1_sriov_funcs; +extern struct amdgpu_nbio_ras nbif_v6_3_1_ras; #endif diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c index b1b57dcc5a7370453c0ed642d862fa93b73611ce..d1032e9992b49cb4c8d8612af49f490a7a2c9f0f 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_0.c @@ -271,8 +271,19 @@ const struct nbio_hdp_flush_reg nbio_v7_0_hdp_flush_reg = { .ref_and_mask_sdma1 = GPU_HDP_FLUSH_DONE__SDMA1_MASK, }; +#define regRCC_DEV0_EPF6_STRAP4 0xd304 +#define regRCC_DEV0_EPF6_STRAP4_BASE_IDX 5 + static void nbio_v7_0_init_registers(struct amdgpu_device *adev) { + uint32_t data; + + switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { + case IP_VERSION(2, 5, 0): + data = RREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF6_STRAP4) & ~BIT(23); + WREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF6_STRAP4, data); + break; + } } #define MMIO_REG_HOLE_OFFSET (0x80000 - PAGE_SIZE) diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c index 814ab59fdd4a3a370897ca4ef1413ab70f6ac1d0..41421da63a084619555b9b7673ff7eeb535ca44d 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_11.c @@ -275,7 +275,7 @@ static void nbio_v7_11_init_registers(struct amdgpu_device *adev) if (def != data) WREG32_SOC15(NBIO, 0, regBIF_BIF256_CI256_RC3X4_USB4_PCIE_MST_CTRL_3, data); - switch (adev->ip_versions[NBIO_HWIP][0]) { + switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { case IP_VERSION(7, 11, 0): case IP_VERSION(7, 11, 1): case IP_VERSION(7, 11, 2): diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c index 1ac730328516ff232497b66f59f361042efcb8d3..3fb6d2aa7e3b3995b9561371105c50c0daeef275 100644 --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_7.c @@ -247,7 +247,7 @@ static void nbio_v7_7_init_registers(struct amdgpu_device *adev) if (def != data) WREG32_SOC15(NBIO, 0, regBIF0_PCIE_MST_CTRL_3, data); - switch (adev->ip_versions[NBIO_HWIP][0]) { + switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { case IP_VERSION(7, 7, 0): data = RREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF5_STRAP4) & ~BIT(23); WREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF5_STRAP4, data); diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c index 3bad565ded73d04010f96082843c3af10f403015..47db483c35169443b73dc72a44e39bb4d30c79d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/nv.c +++ b/drivers/gpu/drm/amd/amdgpu/nv.c @@ -1039,10 +1039,10 @@ static bool nv_common_is_idle(void *handle) return true; } -static int nv_common_set_clockgating_state(void *handle, +static int nv_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1070,7 +1070,7 @@ static int nv_common_set_clockgating_state(void *handle, return 0; } -static int nv_common_set_powergating_state(void *handle, +static int nv_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* TODO */ diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c index c4b775aaee9fe77d57333df97b7c0d224aeb9cf8..cc621064610f1d5d0b3b433248fd139291b485e2 100644 --- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c +++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c @@ -51,6 +51,8 @@ MODULE_FIRMWARE("amdgpu/psp_13_0_11_toc.bin"); MODULE_FIRMWARE("amdgpu/psp_13_0_11_ta.bin"); MODULE_FIRMWARE("amdgpu/psp_13_0_6_sos.bin"); MODULE_FIRMWARE("amdgpu/psp_13_0_6_ta.bin"); +MODULE_FIRMWARE("amdgpu/psp_13_0_12_sos.bin"); +MODULE_FIRMWARE("amdgpu/psp_13_0_12_ta.bin"); MODULE_FIRMWARE("amdgpu/psp_13_0_14_sos.bin"); MODULE_FIRMWARE("amdgpu/psp_13_0_14_ta.bin"); MODULE_FIRMWARE("amdgpu/psp_14_0_0_toc.bin"); @@ -122,6 +124,7 @@ static int psp_v13_0_init_microcode(struct psp_context *psp) case IP_VERSION(13, 0, 6): case IP_VERSION(13, 0, 7): case IP_VERSION(13, 0, 10): + case IP_VERSION(13, 0, 12): case IP_VERSION(13, 0, 14): err = psp_init_sos_microcode(psp, ucode_prefix); if (err) @@ -177,6 +180,7 @@ static int psp_v13_0_wait_for_bootloader(struct psp_context *psp) retry_cnt = ((amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) || + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) || amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14))) ? PSP_VMBX_POLLING_LIMIT : 10; @@ -203,6 +207,7 @@ static int psp_v13_0_wait_for_bootloader_steady_state(struct psp_context *psp) int ret; if (amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) || + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) || amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14)) { ret = psp_v13_0_wait_for_vmbx_ready(psp); if (ret) @@ -288,6 +293,11 @@ static int psp_v13_0_bootloader_load_ras_drv(struct psp_context *psp) return psp_v13_0_bootloader_load_component(psp, &psp->ras_drv, PSP_BL__LOAD_RASDRV); } +static int psp_v13_0_bootloader_load_spdm_drv(struct psp_context *psp) +{ + return psp_v13_0_bootloader_load_component(psp, &psp->spdm_drv, PSP_BL__LOAD_SPDMDRV); +} + static inline void psp_v13_0_init_sos_version(struct psp_context *psp) { struct amdgpu_device *adev = psp->adev; @@ -798,6 +808,7 @@ static bool psp_v13_0_get_ras_capability(struct psp_context *psp) return false; if ((amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 6) || + amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 12) || amdgpu_ip_version(adev, MP0_HWIP, 0) == IP_VERSION(13, 0, 14)) && (!(adev->flags & AMD_IS_APU))) { reg_data = RREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_127); @@ -857,6 +868,7 @@ static const struct psp_funcs psp_v13_0_funcs = { .bootloader_load_intf_drv = psp_v13_0_bootloader_load_intf_drv, .bootloader_load_dbg_drv = psp_v13_0_bootloader_load_dbg_drv, .bootloader_load_ras_drv = psp_v13_0_bootloader_load_ras_drv, + .bootloader_load_spdm_drv = psp_v13_0_bootloader_load_spdm_drv, .bootloader_load_sos = psp_v13_0_bootloader_load_sos, .ring_create = psp_v13_0_ring_create, .ring_stop = psp_v13_0_ring_stop, diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c index 7948d74f872256bb43bd2d7e899aef0889a6d259..135c5099bfb8e977a0cd58bbaa89688ffa36cf5b 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c @@ -145,9 +145,11 @@ static int sdma_v2_4_init_microcode(struct amdgpu_device *adev) for (i = 0; i < adev->sdma.num_instances; i++) { if (i == 0) err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma1.bin", chip_name); if (err) goto out; @@ -631,7 +633,7 @@ static int sdma_v2_4_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -1080,14 +1082,14 @@ static int sdma_v2_4_process_illegal_inst_irq(struct amdgpu_device *adev, return 0; } -static int sdma_v2_4_set_clockgating_state(void *handle, +static int sdma_v2_4_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { /* XXX handled via the smc on VI */ return 0; } -static int sdma_v2_4_set_powergating_state(void *handle, +static int sdma_v2_4_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c index 9a3d729545a7c6712888996e22e81fdba09ebaaf..c611328671ed130899adae12989fc95a04bf6dcb 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c @@ -305,9 +305,11 @@ static int sdma_v3_0_init_microcode(struct amdgpu_device *adev) for (i = 0; i < adev->sdma.num_instances; i++) { if (i == 0) err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma.bin", chip_name); else err = amdgpu_ucode_request(adev, &adev->sdma.instance[i].fw, + AMDGPU_UCODE_REQUIRED, "amdgpu/%s_sdma1.bin", chip_name); if (err) goto out; @@ -904,7 +906,7 @@ static int sdma_v3_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) else r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -1483,10 +1485,10 @@ static void sdma_v3_0_update_sdma_medium_grain_light_sleep( } } -static int sdma_v3_0_set_clockgating_state(void *handle, +static int sdma_v3_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1506,7 +1508,7 @@ static int sdma_v3_0_set_clockgating_state(void *handle, return 0; } -static int sdma_v3_0_set_powergating_state(void *handle, +static int sdma_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index c1f98f6cf20d4801676cb734c4d4ed0cb92b2dd4..b48d9c0b2e1c3bfcb392c39acd74ecce0977311a 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -1565,7 +1565,7 @@ static int sdma_v4_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -1956,7 +1956,7 @@ static int sdma_v4_0_hw_init(struct amdgpu_ip_block *ip_block) struct amdgpu_device *adev = ip_block->adev; if (adev->flags & AMD_IS_APU) - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, false); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, false, 0); if (!amdgpu_sriov_vf(adev)) sdma_v4_0_init_golden_registers(adev); @@ -1983,7 +1983,7 @@ static int sdma_v4_0_hw_fini(struct amdgpu_ip_block *ip_block) sdma_v4_0_enable(adev, false); if (adev->flags & AMD_IS_APU) - amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, true); + amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_SDMA, true, 0); return 0; } @@ -2297,10 +2297,10 @@ static void sdma_v4_0_update_medium_grain_light_sleep( } } -static int sdma_v4_0_set_clockgating_state(void *handle, +static int sdma_v4_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -2312,10 +2312,10 @@ static int sdma_v4_0_set_clockgating_state(void *handle, return 0; } -static int sdma_v4_0_set_powergating_state(void *handle, +static int sdma_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; switch (amdgpu_ip_version(adev, SDMA0_HWIP, 0)) { case IP_VERSION(4, 1, 0): diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c index a38553f38fdc8715040ecd48f35d923652a82316..56507ae919b0cb16f6285106927e43a555219e4e 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c @@ -189,6 +189,7 @@ static int sdma_v4_4_2_init_microcode(struct amdgpu_device *adev) for (i = 0; i < adev->sdma.num_instances; i++) { if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 2) || + amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 4) || amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 5)) { ret = amdgpu_sdma_init_microcode(adev, 0, true); break; @@ -667,11 +668,12 @@ static uint32_t sdma_v4_4_2_rb_cntl(struct amdgpu_ring *ring, uint32_t rb_cntl) * * @adev: amdgpu_device pointer * @i: instance to resume + * @restore: used to restore wptr when restart * * Set up the gfx DMA ring buffers and enable them. * Returns 0 for success, error for failure. */ -static void sdma_v4_4_2_gfx_resume(struct amdgpu_device *adev, unsigned int i) +static void sdma_v4_4_2_gfx_resume(struct amdgpu_device *adev, unsigned int i, bool restore) { struct amdgpu_ring *ring = &adev->sdma.instance[i].ring; u32 rb_cntl, ib_cntl, wptr_poll_cntl; @@ -698,16 +700,24 @@ static void sdma_v4_4_2_gfx_resume(struct amdgpu_device *adev, unsigned int i) WREG32_SDMA(i, regSDMA_GFX_RB_BASE, ring->gpu_addr >> 8); WREG32_SDMA(i, regSDMA_GFX_RB_BASE_HI, ring->gpu_addr >> 40); - ring->wptr = 0; + if (!restore) + ring->wptr = 0; /* before programing wptr to a less value, need set minor_ptr_update first */ WREG32_SDMA(i, regSDMA_GFX_MINOR_PTR_UPDATE, 1); /* Initialize the ring buffer's read and write pointers */ - WREG32_SDMA(i, regSDMA_GFX_RB_RPTR, 0); - WREG32_SDMA(i, regSDMA_GFX_RB_RPTR_HI, 0); - WREG32_SDMA(i, regSDMA_GFX_RB_WPTR, 0); - WREG32_SDMA(i, regSDMA_GFX_RB_WPTR_HI, 0); + if (restore) { + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR, lower_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR_HI, upper_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR, lower_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR_HI, upper_32_bits(ring->wptr << 2)); + } else { + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR, 0); + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR_HI, 0); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR, 0); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR_HI, 0); + } doorbell = RREG32_SDMA(i, regSDMA_GFX_DOORBELL); doorbell_offset = RREG32_SDMA(i, regSDMA_GFX_DOORBELL_OFFSET); @@ -755,11 +765,12 @@ static void sdma_v4_4_2_gfx_resume(struct amdgpu_device *adev, unsigned int i) * * @adev: amdgpu_device pointer * @i: instance to resume + * @restore: boolean to say restore needed or not * * Set up the page DMA ring buffers and enable them. * Returns 0 for success, error for failure. */ -static void sdma_v4_4_2_page_resume(struct amdgpu_device *adev, unsigned int i) +static void sdma_v4_4_2_page_resume(struct amdgpu_device *adev, unsigned int i, bool restore) { struct amdgpu_ring *ring = &adev->sdma.instance[i].page; u32 rb_cntl, ib_cntl, wptr_poll_cntl; @@ -775,10 +786,17 @@ static void sdma_v4_4_2_page_resume(struct amdgpu_device *adev, unsigned int i) WREG32_SDMA(i, regSDMA_PAGE_RB_CNTL, rb_cntl); /* Initialize the ring buffer's read and write pointers */ - WREG32_SDMA(i, regSDMA_PAGE_RB_RPTR, 0); - WREG32_SDMA(i, regSDMA_PAGE_RB_RPTR_HI, 0); - WREG32_SDMA(i, regSDMA_PAGE_RB_WPTR, 0); - WREG32_SDMA(i, regSDMA_PAGE_RB_WPTR_HI, 0); + if (restore) { + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR, lower_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_RPTR_HI, upper_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR, lower_32_bits(ring->wptr << 2)); + WREG32_SDMA(i, regSDMA_GFX_RB_WPTR_HI, upper_32_bits(ring->wptr << 2)); + } else { + WREG32_SDMA(i, regSDMA_PAGE_RB_RPTR, 0); + WREG32_SDMA(i, regSDMA_PAGE_RB_RPTR_HI, 0); + WREG32_SDMA(i, regSDMA_PAGE_RB_WPTR, 0); + WREG32_SDMA(i, regSDMA_PAGE_RB_WPTR_HI, 0); + } /* set the wb address whether it's enabled or not */ WREG32_SDMA(i, regSDMA_PAGE_RB_RPTR_ADDR_HI, @@ -792,7 +810,8 @@ static void sdma_v4_4_2_page_resume(struct amdgpu_device *adev, unsigned int i) WREG32_SDMA(i, regSDMA_PAGE_RB_BASE, ring->gpu_addr >> 8); WREG32_SDMA(i, regSDMA_PAGE_RB_BASE_HI, ring->gpu_addr >> 40); - ring->wptr = 0; + if (!restore) + ring->wptr = 0; /* before programing wptr to a less value, need set minor_ptr_update first */ WREG32_SDMA(i, regSDMA_PAGE_MINOR_PTR_UPDATE, 1); @@ -911,12 +930,13 @@ static int sdma_v4_4_2_inst_load_microcode(struct amdgpu_device *adev, * * @adev: amdgpu_device pointer * @inst_mask: mask of dma engine instances to be enabled + * @restore: boolean to say restore needed or not * * Set up the DMA engines and enable them. * Returns 0 for success, error for failure. */ static int sdma_v4_4_2_inst_start(struct amdgpu_device *adev, - uint32_t inst_mask) + uint32_t inst_mask, bool restore) { struct amdgpu_ring *ring; uint32_t tmp_mask; @@ -927,7 +947,7 @@ static int sdma_v4_4_2_inst_start(struct amdgpu_device *adev, sdma_v4_4_2_inst_enable(adev, false, inst_mask); } else { /* bypass sdma microcode loading on Gopher */ - if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP && + if (!restore && adev->firmware.load_type != AMDGPU_FW_LOAD_PSP && adev->sdma.instance[0].fw) { r = sdma_v4_4_2_inst_load_microcode(adev, inst_mask); if (r) @@ -946,17 +966,19 @@ static int sdma_v4_4_2_inst_start(struct amdgpu_device *adev, uint32_t temp; WREG32_SDMA(i, regSDMA_SEM_WAIT_FAIL_TIMER_CNTL, 0); - sdma_v4_4_2_gfx_resume(adev, i); + sdma_v4_4_2_gfx_resume(adev, i, restore); if (adev->sdma.has_page_queue) - sdma_v4_4_2_page_resume(adev, i); + sdma_v4_4_2_page_resume(adev, i, restore); /* set utc l1 enable flag always to 1 */ temp = RREG32_SDMA(i, regSDMA_CNTL); temp = REG_SET_FIELD(temp, SDMA_CNTL, UTC_L1_ENABLE, 1); - /* enable context empty interrupt during initialization */ - temp = REG_SET_FIELD(temp, SDMA_CNTL, CTXEMPTY_INT_ENABLE, 1); - WREG32_SDMA(i, regSDMA_CNTL, temp); + if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) < IP_VERSION(4, 4, 5)) { + /* enable context empty interrupt during initialization */ + temp = REG_SET_FIELD(temp, SDMA_CNTL, CTXEMPTY_INT_ENABLE, 1); + WREG32_SDMA(i, regSDMA_CNTL, temp); + } if (!amdgpu_sriov_vf(adev)) { if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) { /* unhalt engine */ @@ -1110,7 +1132,7 @@ static int sdma_v4_4_2_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -1466,6 +1488,7 @@ static int sdma_v4_4_2_sw_fini(struct amdgpu_ip_block *ip_block) amdgpu_sdma_sysfs_reset_mask_fini(adev); if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 2) || + amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 4) || amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(4, 4, 5)) amdgpu_sdma_destroy_inst_ctx(adev, true); else @@ -1486,7 +1509,7 @@ static int sdma_v4_4_2_hw_init(struct amdgpu_ip_block *ip_block) if (!amdgpu_sriov_vf(adev)) sdma_v4_4_2_inst_init_golden_registers(adev, inst_mask); - r = sdma_v4_4_2_inst_start(adev, inst_mask); + r = sdma_v4_4_2_inst_start(adev, inst_mask, false); return r; } @@ -1514,7 +1537,7 @@ static int sdma_v4_4_2_hw_fini(struct amdgpu_ip_block *ip_block) return 0; } -static int sdma_v4_4_2_set_clockgating_state(void *handle, +static int sdma_v4_4_2_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); static int sdma_v4_4_2_suspend(struct amdgpu_ip_block *ip_block) @@ -1522,7 +1545,7 @@ static int sdma_v4_4_2_suspend(struct amdgpu_ip_block *ip_block) struct amdgpu_device *adev = ip_block->adev; if (amdgpu_in_reset(adev)) - sdma_v4_4_2_set_clockgating_state(adev, AMD_CG_STATE_UNGATE); + sdma_v4_4_2_set_clockgating_state(ip_block, AMD_CG_STATE_UNGATE); return sdma_v4_4_2_hw_fini(ip_block); } @@ -1573,6 +1596,42 @@ static int sdma_v4_4_2_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } +static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid) +{ + struct amdgpu_device *adev = ring->adev; + int i, r; + u32 inst_mask; + + if ((adev->flags & AMD_IS_APU) || amdgpu_sriov_vf(adev)) + return -EINVAL; + + /* stop queue */ + inst_mask = 1 << ring->me; + sdma_v4_4_2_inst_gfx_stop(adev, inst_mask); + if (adev->sdma.has_page_queue) + sdma_v4_4_2_inst_page_stop(adev, inst_mask); + + r = amdgpu_dpm_reset_sdma(adev, 1 << GET_INST(SDMA0, ring->me)); + if (r) + return r; + + udelay(50); + + for (i = 0; i < adev->usec_timeout; i++) { + if (!REG_GET_FIELD(RREG32_SDMA(ring->me, regSDMA_F32_CNTL), SDMA_F32_CNTL, HALT)) + break; + udelay(1); + } + + if (i == adev->usec_timeout) { + dev_err(adev->dev, "timed out waiting for SDMA%d unhalt after reset\n", + ring->me); + return -ETIMEDOUT; + } + + return sdma_v4_4_2_inst_start(adev, inst_mask, true); +} + static int sdma_v4_4_2_set_trap_irq_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source, unsigned type, @@ -1821,10 +1880,10 @@ static void sdma_v4_4_2_inst_update_medium_grain_clock_gating( } } -static int sdma_v4_4_2_set_clockgating_state(void *handle, +static int sdma_v4_4_2_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; uint32_t inst_mask; if (amdgpu_sriov_vf(adev)) @@ -1839,7 +1898,7 @@ static int sdma_v4_4_2_set_clockgating_state(void *handle, return 0; } -static int sdma_v4_4_2_set_powergating_state(void *handle, +static int sdma_v4_4_2_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -1895,7 +1954,6 @@ static void sdma_v4_4_2_dump_ip_state(struct amdgpu_ip_block *ip_block) if (!adev->sdma.ip_dump) return; - amdgpu_gfx_off_ctrl(adev, false); for (i = 0; i < adev->sdma.num_instances; i++) { instance_offset = i * reg_count; for (j = 0; j < reg_count; j++) @@ -1903,7 +1961,6 @@ static void sdma_v4_4_2_dump_ip_state(struct amdgpu_ip_block *ip_block) RREG32(sdma_v4_4_2_get_reg_offset(adev, i, sdma_reg_list_4_4_2[j].reg_offset)); } - amdgpu_gfx_off_ctrl(adev, true); } const struct amd_ip_funcs sdma_v4_4_2_ip_funcs = { @@ -1955,6 +2012,7 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = { .emit_wreg = sdma_v4_4_2_ring_emit_wreg, .emit_reg_wait = sdma_v4_4_2_ring_emit_reg_wait, .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper, + .reset = sdma_v4_4_2_reset_queue, }; static const struct amdgpu_ring_funcs sdma_v4_4_2_page_ring_funcs = { @@ -2167,7 +2225,7 @@ static int sdma_v4_4_2_xcp_resume(void *handle, uint32_t inst_mask) if (!amdgpu_sriov_vf(adev)) sdma_v4_4_2_inst_init_golden_registers(adev, inst_mask); - r = sdma_v4_4_2_inst_start(adev, inst_mask); + r = sdma_v4_4_2_inst_start(adev, inst_mask, false); return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c index fa9b4093495702a06cbfb18701023e03f2a4db21..b764550834a07133c23cdb955ea6f2b6ab7d1147 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c @@ -1194,7 +1194,7 @@ static int sdma_v5_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: if (!ring->is_mes_queue) @@ -1853,10 +1853,10 @@ static void sdma_v5_0_update_medium_grain_light_sleep(struct amdgpu_device *adev } } -static int sdma_v5_0_set_clockgating_state(void *handle, +static int sdma_v5_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1877,7 +1877,7 @@ static int sdma_v5_0_set_clockgating_state(void *handle, return 0; } -static int sdma_v5_0_set_powergating_state(void *handle, +static int sdma_v5_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c index ba5160399ab2a03e80c6aa4f57264febd5f28e11..b1818e87889a20c9173fbb7099e22390b5f53520 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -1050,7 +1050,7 @@ static int sdma_v5_2_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: if (!ring->is_mes_queue) @@ -1812,10 +1812,10 @@ static void sdma_v5_2_update_medium_grain_light_sleep(struct amdgpu_device *adev } } -static int sdma_v5_2_set_clockgating_state(void *handle, +static int sdma_v5_2_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1841,7 +1841,7 @@ static int sdma_v5_2_set_clockgating_state(void *handle, return 0; } -static int sdma_v5_2_set_powergating_state(void *handle, +static int sdma_v5_2_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c index d46128b0ec9202fdcd27f9423826406b3c49b261..1a023b45f0be89503fc496beebaa7a2457229a1d 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c @@ -1063,7 +1063,7 @@ static int sdma_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: if (!ring->is_mes_queue) @@ -1601,13 +1601,13 @@ static int sdma_v6_0_process_illegal_inst_irq(struct amdgpu_device *adev, return 0; } -static int sdma_v6_0_set_clockgating_state(void *handle, +static int sdma_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int sdma_v6_0_set_powergating_state(void *handle, +static int sdma_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index d2ce6b6a7ff64e0d07a122eb158b42a77916c574..9c17df2cf37b82e5ade898c859f3cb93702b4165 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c @@ -490,162 +490,185 @@ static void sdma_v7_0_enable(struct amdgpu_device *adev, bool enable) } /** - * sdma_v7_0_gfx_resume - setup and start the async dma engines + * sdma_v7_0_gfx_resume_instance - start/restart a certain sdma engine * * @adev: amdgpu_device pointer + * @i: instance + * @restore: used to restore wptr when restart * - * Set up the gfx DMA ring buffers and enable them. - * Returns 0 for success, error for failure. + * Set up the gfx DMA ring buffers and enable them. On restart, we will restore wptr and rptr. + * Return 0 for success. */ -static int sdma_v7_0_gfx_resume(struct amdgpu_device *adev) +static int sdma_v7_0_gfx_resume_instance(struct amdgpu_device *adev, int i, bool restore) { struct amdgpu_ring *ring; u32 rb_cntl, ib_cntl; u32 rb_bufsz; u32 doorbell; u32 doorbell_offset; - u32 tmp; + u32 temp; u64 wptr_gpu_addr; - int i, r; - - for (i = 0; i < adev->sdma.num_instances; i++) { - ring = &adev->sdma.instance[i].ring; + int r; - //if (!amdgpu_sriov_vf(adev)) - // WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_SEM_WAIT_FAIL_TIMER_CNTL), 0); + ring = &adev->sdma.instance[i].ring; - /* Set ring buffer size in dwords */ - rb_bufsz = order_base_2(ring->ring_size / 4); - rb_cntl = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL)); - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_SIZE, rb_bufsz); + /* Set ring buffer size in dwords */ + rb_bufsz = order_base_2(ring->ring_size / 4); + rb_cntl = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL)); + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_SIZE, rb_bufsz); #ifdef __BIG_ENDIAN - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_SWAP_ENABLE, 1); - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, - RPTR_WRITEBACK_SWAP_ENABLE, 1); + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_SWAP_ENABLE, 1); + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, + RPTR_WRITEBACK_SWAP_ENABLE, 1); #endif - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_PRIV, 1); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL), rb_cntl); - - /* Initialize the ring buffer's read and write pointers */ + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_PRIV, 1); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL), rb_cntl); + + /* Initialize the ring buffer's read and write pointers */ + if (restore) { + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR), lower_32_bits(ring->wptr << 2)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_HI), upper_32_bits(ring->wptr << 2)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR), lower_32_bits(ring->wptr << 2)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_HI), upper_32_bits(ring->wptr << 2)); + } else { WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR), 0); WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_HI), 0); WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR), 0); WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_HI), 0); + } + /* setup the wptr shadow polling */ + wptr_gpu_addr = ring->wptr_gpu_addr; + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_POLL_ADDR_LO), + lower_32_bits(wptr_gpu_addr)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_POLL_ADDR_HI), + upper_32_bits(wptr_gpu_addr)); + + /* set the wb address whether it's enabled or not */ + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_ADDR_HI), + upper_32_bits(ring->rptr_gpu_addr) & 0xFFFFFFFF); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_ADDR_LO), + lower_32_bits(ring->rptr_gpu_addr) & 0xFFFFFFFC); + + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1); + if (amdgpu_sriov_vf(adev)) + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, WPTR_POLL_ENABLE, 1); + else + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, WPTR_POLL_ENABLE, 0); - /* setup the wptr shadow polling */ - wptr_gpu_addr = ring->wptr_gpu_addr; - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_POLL_ADDR_LO), - lower_32_bits(wptr_gpu_addr)); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_POLL_ADDR_HI), - upper_32_bits(wptr_gpu_addr)); - - /* set the wb address whether it's enabled or not */ - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_ADDR_HI), - upper_32_bits(ring->rptr_gpu_addr) & 0xFFFFFFFF); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_RPTR_ADDR_LO), - lower_32_bits(ring->rptr_gpu_addr) & 0xFFFFFFFC); - - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RPTR_WRITEBACK_ENABLE, 1); - if (amdgpu_sriov_vf(adev)) - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, WPTR_POLL_ENABLE, 1); - else - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, WPTR_POLL_ENABLE, 0); - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, MCU_WPTR_POLL_ENABLE, 1); + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, MCU_WPTR_POLL_ENABLE, 1); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_BASE), ring->gpu_addr >> 8); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_BASE_HI), ring->gpu_addr >> 40); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_BASE), ring->gpu_addr >> 8); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_BASE_HI), ring->gpu_addr >> 40); + if (!restore) ring->wptr = 0; - /* before programing wptr to a less value, need set minor_ptr_update first */ - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_MINOR_PTR_UPDATE), 1); + /* before programing wptr to a less value, need set minor_ptr_update first */ + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_MINOR_PTR_UPDATE), 1); - if (!amdgpu_sriov_vf(adev)) { /* only bare-metal use register write for wptr */ - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR), lower_32_bits(ring->wptr) << 2); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2); - } + if (!amdgpu_sriov_vf(adev)) { /* only bare-metal use register write for wptr */ + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR), lower_32_bits(ring->wptr) << 2); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_WPTR_HI), upper_32_bits(ring->wptr) << 2); + } - doorbell = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL)); - doorbell_offset = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL_OFFSET)); + doorbell = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL)); + doorbell_offset = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL_OFFSET)); - if (ring->use_doorbell) { - doorbell = REG_SET_FIELD(doorbell, SDMA0_QUEUE0_DOORBELL, ENABLE, 1); - doorbell_offset = REG_SET_FIELD(doorbell_offset, SDMA0_QUEUE0_DOORBELL_OFFSET, - OFFSET, ring->doorbell_index); - } else { - doorbell = REG_SET_FIELD(doorbell, SDMA0_QUEUE0_DOORBELL, ENABLE, 0); - } - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL), doorbell); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL_OFFSET), doorbell_offset); - - if (i == 0) - adev->nbio.funcs->sdma_doorbell_range(adev, i, ring->use_doorbell, - ring->doorbell_index, - adev->doorbell_index.sdma_doorbell_range * adev->sdma.num_instances); - - if (amdgpu_sriov_vf(adev)) - sdma_v7_0_ring_set_wptr(ring); - - /* set minor_ptr_update to 0 after wptr programed */ - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_MINOR_PTR_UPDATE), 0); - - /* Set up sdma hang watchdog */ - tmp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_WATCHDOG_CNTL)); - /* 100ms per unit */ - tmp = REG_SET_FIELD(tmp, SDMA0_WATCHDOG_CNTL, QUEUE_HANG_COUNT, - max(adev->usec_timeout/100000, 1)); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_WATCHDOG_CNTL), tmp); - - /* Set up RESP_MODE to non-copy addresses */ - tmp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_CNTL)); - tmp = REG_SET_FIELD(tmp, SDMA0_UTCL1_CNTL, RESP_MODE, 3); - tmp = REG_SET_FIELD(tmp, SDMA0_UTCL1_CNTL, REDO_DELAY, 9); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_CNTL), tmp); - - /* program default cache read and write policy */ - tmp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_PAGE)); - /* clean read policy and write policy bits */ - tmp &= 0xFF0FFF; - tmp |= ((CACHE_READ_POLICY_L2__DEFAULT << 12) | - (CACHE_WRITE_POLICY_L2__DEFAULT << 14)); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_PAGE), tmp); - - if (!amdgpu_sriov_vf(adev)) { - /* unhalt engine */ - tmp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_MCU_CNTL)); - tmp = REG_SET_FIELD(tmp, SDMA0_MCU_CNTL, HALT, 0); - tmp = REG_SET_FIELD(tmp, SDMA0_MCU_CNTL, RESET, 0); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_MCU_CNTL), tmp); - } + if (ring->use_doorbell) { + doorbell = REG_SET_FIELD(doorbell, SDMA0_QUEUE0_DOORBELL, ENABLE, 1); + doorbell_offset = REG_SET_FIELD(doorbell_offset, SDMA0_QUEUE0_DOORBELL_OFFSET, + OFFSET, ring->doorbell_index); + } else { + doorbell = REG_SET_FIELD(doorbell, SDMA0_QUEUE0_DOORBELL, ENABLE, 0); + } + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL), doorbell); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_DOORBELL_OFFSET), doorbell_offset); - /* enable DMA RB */ - rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_ENABLE, 1); - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL), rb_cntl); + if (i == 0) + adev->nbio.funcs->sdma_doorbell_range(adev, i, ring->use_doorbell, + ring->doorbell_index, + adev->doorbell_index.sdma_doorbell_range * adev->sdma.num_instances); - ib_cntl = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_IB_CNTL)); - ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_QUEUE0_IB_CNTL, IB_ENABLE, 1); + if (amdgpu_sriov_vf(adev)) + sdma_v7_0_ring_set_wptr(ring); + + /* set minor_ptr_update to 0 after wptr programed */ + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_MINOR_PTR_UPDATE), 0); + + /* Set up sdma hang watchdog */ + temp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_WATCHDOG_CNTL)); + /* 100ms per unit */ + temp = REG_SET_FIELD(temp, SDMA0_WATCHDOG_CNTL, QUEUE_HANG_COUNT, + max(adev->usec_timeout/100000, 1)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_WATCHDOG_CNTL), temp); + + /* Set up RESP_MODE to non-copy addresses */ + temp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_CNTL)); + temp = REG_SET_FIELD(temp, SDMA0_UTCL1_CNTL, RESP_MODE, 3); + temp = REG_SET_FIELD(temp, SDMA0_UTCL1_CNTL, REDO_DELAY, 9); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_CNTL), temp); + + /* program default cache read and write policy */ + temp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_PAGE)); + /* clean read policy and write policy bits */ + temp &= 0xFF0FFF; + temp |= ((CACHE_READ_POLICY_L2__DEFAULT << 12) | + (CACHE_WRITE_POLICY_L2__DEFAULT << 14)); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_UTCL1_PAGE), temp); + + if (!amdgpu_sriov_vf(adev)) { + /* unhalt engine */ + temp = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_MCU_CNTL)); + temp = REG_SET_FIELD(temp, SDMA0_MCU_CNTL, HALT, 0); + temp = REG_SET_FIELD(temp, SDMA0_MCU_CNTL, RESET, 0); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_MCU_CNTL), temp); + } + + /* enable DMA RB */ + rb_cntl = REG_SET_FIELD(rb_cntl, SDMA0_QUEUE0_RB_CNTL, RB_ENABLE, 1); + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_RB_CNTL), rb_cntl); + + ib_cntl = RREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_IB_CNTL)); + ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_QUEUE0_IB_CNTL, IB_ENABLE, 1); #ifdef __BIG_ENDIAN - ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_QUEUE0_IB_CNTL, IB_SWAP_ENABLE, 1); + ib_cntl = REG_SET_FIELD(ib_cntl, SDMA0_QUEUE0_IB_CNTL, IB_SWAP_ENABLE, 1); #endif - /* enable DMA IBs */ - WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_IB_CNTL), ib_cntl); + /* enable DMA IBs */ + WREG32_SOC15_IP(GC, sdma_v7_0_get_reg_offset(adev, i, regSDMA0_QUEUE0_IB_CNTL), ib_cntl); + ring->sched.ready = true; - ring->sched.ready = true; + if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence doesn't need below to lines */ + sdma_v7_0_ctx_switch_enable(adev, true); + sdma_v7_0_enable(adev, true); + } - if (amdgpu_sriov_vf(adev)) { /* bare-metal sequence doesn't need below to lines */ - sdma_v7_0_ctx_switch_enable(adev, true); - sdma_v7_0_enable(adev, true); - } + r = amdgpu_ring_test_helper(ring); + if (r) + ring->sched.ready = false; - r = amdgpu_ring_test_helper(ring); - if (r) { - ring->sched.ready = false; - return r; - } + return r; +} + +/** + * sdma_v7_0_gfx_resume - setup and start the async dma engines + * + * @adev: amdgpu_device pointer + * + * Set up the gfx DMA ring buffers and enable them. + * Returns 0 for success, error for failure. + */ +static int sdma_v7_0_gfx_resume(struct amdgpu_device *adev) +{ + int i, r; + for (i = 0; i < adev->sdma.num_instances; i++) { + r = sdma_v7_0_gfx_resume_instance(adev, i, false); + if (r) + return r; } return 0; + } /** @@ -806,6 +829,31 @@ static bool sdma_v7_0_check_soft_reset(struct amdgpu_ip_block *ip_block) return false; } +static int sdma_v7_0_reset_queue(struct amdgpu_ring *ring, unsigned int vmid) +{ + struct amdgpu_device *adev = ring->adev; + int i, r; + + if (amdgpu_sriov_vf(adev)) + return -EINVAL; + + for (i = 0; i < adev->sdma.num_instances; i++) { + if (ring == &adev->sdma.instance[i].ring) + break; + } + + if (i == adev->sdma.num_instances) { + DRM_ERROR("sdma instance not found\n"); + return -EINVAL; + } + + r = amdgpu_mes_reset_legacy_queue(adev, ring, vmid, true); + if (r) + return r; + + return sdma_v7_0_gfx_resume_instance(adev, i, true); +} + /** * sdma_v7_0_start - setup and start the async dma engines * @@ -1060,7 +1108,7 @@ static int sdma_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: if (!ring->is_mes_queue) @@ -1316,6 +1364,13 @@ static int sdma_v7_0_sw_init(struct amdgpu_ip_block *ip_block) return r; } + adev->sdma.supported_reset = + amdgpu_get_soft_full_reset_mask(&adev->sdma.instance[0].ring); + adev->sdma.supported_reset |= AMDGPU_RESET_TYPE_PER_QUEUE; + + r = amdgpu_sdma_sysfs_reset_mask_init(adev); + if (r) + return r; /* Allocate memory for SDMA IP Dump buffer */ ptr = kcalloc(adev->sdma.num_instances * reg_count, sizeof(uint32_t), GFP_KERNEL); if (ptr) @@ -1334,6 +1389,7 @@ static int sdma_v7_0_sw_fini(struct amdgpu_ip_block *ip_block) for (i = 0; i < adev->sdma.num_instances; i++) amdgpu_ring_fini(&adev->sdma.instance[i].ring); + amdgpu_sdma_sysfs_reset_mask_fini(adev); amdgpu_sdma_destroy_inst_ctx(adev, true); if (adev->firmware.load_type == AMDGPU_FW_LOAD_DIRECT) @@ -1524,13 +1580,13 @@ static int sdma_v7_0_process_illegal_inst_irq(struct amdgpu_device *adev, return 0; } -static int sdma_v7_0_set_clockgating_state(void *handle, +static int sdma_v7_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int sdma_v7_0_set_powergating_state(void *handle, +static int sdma_v7_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -1636,6 +1692,7 @@ static const struct amdgpu_ring_funcs sdma_v7_0_ring_funcs = { .emit_reg_write_reg_wait = sdma_v7_0_ring_emit_reg_write_reg_wait, .init_cond_exec = sdma_v7_0_ring_init_cond_exec, .preempt_ib = sdma_v7_0_ring_preempt_ib, + .reset = sdma_v7_0_reset_queue, }; static void sdma_v7_0_set_ring_funcs(struct amdgpu_device *adev) diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c index 00f63d3fbea7157e486350018e2c3e975cbaa1d0..77ef7da2e4fe43fc028402b32d9136b4a45bb51f 100644 --- a/drivers/gpu/drm/amd/amdgpu/si.c +++ b/drivers/gpu/drm/amd/amdgpu/si.c @@ -2649,13 +2649,13 @@ static bool si_common_is_idle(void *handle) return true; } -static int si_common_set_clockgating_state(void *handle, +static int si_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int si_common_set_powergating_state(void *handle, +static int si_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/si_dma.c b/drivers/gpu/drm/amd/amdgpu/si_dma.c index 47647a6083e8b11acb8b957c181f6b592382c145..dbd78d5345a4279c1b5b640ec2bf8c1e59402368 100644 --- a/drivers/gpu/drm/amd/amdgpu/si_dma.c +++ b/drivers/gpu/drm/amd/amdgpu/si_dma.c @@ -286,7 +286,7 @@ static int si_dma_ring_test_ib(struct amdgpu_ring *ring, long timeout) r = -EINVAL; err1: - amdgpu_ib_free(adev, &ib, NULL); + amdgpu_ib_free(&ib, NULL); dma_fence_put(f); err0: amdgpu_device_wb_free(adev, index); @@ -629,13 +629,13 @@ static int si_dma_process_trap_irq(struct amdgpu_device *adev, return 0; } -static int si_dma_set_clockgating_state(void *handle, +static int si_dma_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { u32 orig, data, offset; int i; bool enable; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; enable = (state == AMD_CG_STATE_GATE); @@ -672,12 +672,12 @@ static int si_dma_set_clockgating_state(void *handle, return 0; } -static int si_dma_set_powergating_state(void *handle, +static int si_dma_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { u32 tmp; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; WREG32(DMA_PGFSM_WRITE, 0x00002000); WREG32(DMA_PGFSM_CONFIG, 0x100010ff); diff --git a/drivers/gpu/drm/amd/amdgpu/si_ih.c b/drivers/gpu/drm/amd/amdgpu/si_ih.c index 2ec1ebe4db11fc5edf2a5b3be9301099f3d12073..a32b6243c1f8732e3cfac289be009b45ef53c3a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/si_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/si_ih.c @@ -263,13 +263,13 @@ static int si_ih_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int si_ih_set_clockgating_state(void *handle, +static int si_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int si_ih_set_powergating_state(void *handle, +static int si_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index ede072758dabf1569993980717df8ba5c5322708..a59b4c36cad73378abfe9273dd7857273afcab31 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc15.c +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c @@ -171,6 +171,24 @@ static const struct amdgpu_video_codecs vcn_4_0_3_video_codecs_encode = { .codec_array = NULL, }; +static const struct amdgpu_video_codecs vcn_5_0_1_video_codecs_encode_vcn0 = { + .codec_count = 0, + .codec_array = NULL, +}; + +static const struct amdgpu_video_codec_info vcn_5_0_1_video_codecs_decode_array_vcn0[] = { + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 4096, 52)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 186)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG, 16384, 16384, 0)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_VP9, 8192, 4352, 0)}, + {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)}, +}; + +static const struct amdgpu_video_codecs vcn_5_0_1_video_codecs_decode_vcn0 = { + .codec_count = ARRAY_SIZE(vcn_5_0_1_video_codecs_decode_array_vcn0), + .codec_array = vcn_5_0_1_video_codecs_decode_array_vcn0, +}; + static int soc15_query_video_codecs(struct amdgpu_device *adev, bool encode, const struct amdgpu_video_codecs **codecs) { @@ -209,6 +227,12 @@ static int soc15_query_video_codecs(struct amdgpu_device *adev, bool encode, else *codecs = &vcn_4_0_3_video_codecs_decode; return 0; + case IP_VERSION(5, 0, 1): + if (encode) + *codecs = &vcn_5_0_1_video_codecs_encode_vcn0; + else + *codecs = &vcn_5_0_1_video_codecs_decode_vcn0; + return 0; default: return -EINVAL; } @@ -327,6 +351,7 @@ static u32 soc15_get_xclk(struct amdgpu_device *adev) if (amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(12, 0, 0) || amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(12, 0, 1) || amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 6) || + amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 12) || amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 14)) return 10000; if (amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(10, 0, 0) || @@ -556,6 +581,7 @@ soc15_asic_reset_method(struct amdgpu_device *adev) break; case IP_VERSION(13, 0, 6): case IP_VERSION(13, 0, 14): + case IP_VERSION(13, 0, 12): /* Use gpu_recovery param to target a reset method. * Enable triggering of GPU reset only if specified * by module parameter. @@ -1177,6 +1203,7 @@ static int soc15_common_early_init(struct amdgpu_ip_block *ip_block) break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): adev->asic_funcs = &aqua_vanjaram_asic_funcs; adev->cg_flags = AMD_CG_SUPPORT_GFX_MGCG | AMD_CG_SUPPORT_GFX_CGCG | @@ -1385,10 +1412,10 @@ static void soc15_update_drm_light_sleep(struct amdgpu_device *adev, bool enable WREG32(SOC15_REG_OFFSET(MP0, 0, mmMP0_MISC_LIGHT_SLEEP_CTRL), data); } -static int soc15_common_set_clockgating_state(void *handle, +static int soc15_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1453,6 +1480,7 @@ static void soc15_common_get_clockgating_state(void *handle, u64 *flags) if ((amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 2)) && (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 6)) && + (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 12)) && (amdgpu_ip_version(adev, MP0_HWIP, 0) != IP_VERSION(13, 0, 14))) { /* AMD_CG_SUPPORT_DRM_MGCG */ data = RREG32(SOC15_REG_OFFSET(MP0, 0, mmMP0_MISC_CGTT_CTRL0)); @@ -1473,7 +1501,7 @@ static void soc15_common_get_clockgating_state(void *handle, u64 *flags) adev->df.funcs->get_clockgating_state(adev, flags); } -static int soc15_common_set_powergating_state(void *handle, +static int soc15_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* todo */ diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c index d6999835918fa0d8d4d9fe0f6a52cb56af6f4804..62ad67d0b598f576ff5af435675622745eb36af9 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc21.c +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c @@ -928,10 +928,10 @@ static bool soc21_common_is_idle(void *handle) return true; } -static int soc21_common_set_clockgating_state(void *handle, +static int soc21_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { case IP_VERSION(4, 3, 0): @@ -954,10 +954,10 @@ static int soc21_common_set_clockgating_state(void *handle, return 0; } -static int soc21_common_set_powergating_state(void *handle, +static int soc21_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; switch (amdgpu_ip_version(adev, LSDMA_HWIP, 0)) { case IP_VERSION(6, 0, 0): diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c index be96de92b2f5df99343904db5acb3374d680aac1..6b8e078ee7c7519c5ebdcd76ba56c5064117dca0 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc24.c +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c @@ -444,8 +444,18 @@ static int soc24_common_late_init(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; - if (amdgpu_sriov_vf(adev)) + if (amdgpu_sriov_vf(adev)) { xgpu_nv_mailbox_get_irq(adev); + } else { + if (adev->nbio.ras && + adev->nbio.ras_err_event_athub_irq.funcs) + /* don't need to fail gpu late init + * if enabling athub_err_event interrupt failed + * nbif v6_3_1 only support fatal error hanlding + * just enable the interrupt directly + */ + amdgpu_irq_get(adev, &adev->nbio.ras_err_event_athub_irq, 0); + } /* Enable selfring doorbell aperture late because doorbell BAR * aperture will change if resize BAR successfully in gmc sw_init. @@ -501,8 +511,13 @@ static int soc24_common_hw_fini(struct amdgpu_ip_block *ip_block) adev->nbio.funcs->enable_doorbell_aperture(adev, false); adev->nbio.funcs->enable_doorbell_selfring_aperture(adev, false); - if (amdgpu_sriov_vf(adev)) + if (amdgpu_sriov_vf(adev)) { xgpu_nv_mailbox_put_irq(adev); + } else { + if (adev->nbio.ras && + adev->nbio.ras_err_event_athub_irq.funcs) + amdgpu_irq_put(adev, &adev->nbio.ras_err_event_athub_irq, 0); + } return 0; } @@ -522,10 +537,10 @@ static bool soc24_common_is_idle(void *handle) return true; } -static int soc24_common_set_clockgating_state(void *handle, +static int soc24_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; switch (amdgpu_ip_version(adev, NBIO_HWIP, 0)) { case IP_VERSION(6, 3, 1): @@ -542,10 +557,10 @@ static int soc24_common_set_clockgating_state(void *handle, return 0; } -static int soc24_common_set_powergating_state(void *handle, +static int soc24_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; switch (amdgpu_ip_version(adev, LSDMA_HWIP, 0)) { case IP_VERSION(7, 0, 0): diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h index 21b71a427b1fdfa5cf713f3ed187d9dcde3a65a7..64891f0993666e62b26aa95f5412ffc217ba45c6 100644 --- a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h +++ b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h @@ -30,6 +30,9 @@ #define RSP_ID_MASK (1U << 31) #define RSP_ID(cmdId) (((uint32_t)(cmdId)) | RSP_ID_MASK) +/* invalid node instance value */ +#define TA_RAS_INV_NODE 0xffff + /* RAS related enumerations */ /**********************************************************/ enum ras_command { diff --git a/drivers/gpu/drm/amd/amdgpu/ta_secureDisplay_if.h b/drivers/gpu/drm/amd/amdgpu/ta_secureDisplay_if.h index 00d8bdb8254fb5d8e7aafc3a10867b5fb95d70d4..9ec2e03d41c72d312d7947d1671b4b3b5df435dd 100644 --- a/drivers/gpu/drm/amd/amdgpu/ta_secureDisplay_if.h +++ b/drivers/gpu/drm/amd/amdgpu/ta_secureDisplay_if.h @@ -31,10 +31,12 @@ * Secure Display Command ID */ enum ta_securedisplay_command { - /* Query whether TA is responding used only for validation purpose */ + /* Query whether TA is responding. It is used only for validation purpose */ TA_SECUREDISPLAY_COMMAND__QUERY_TA = 1, /* Send region of Interest and CRC value to I2C */ TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC = 2, + /* V2 to send multiple regions of Interest and CRC value to I2C */ + TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC_V2 = 3, /* Maximum Command ID */ TA_SECUREDISPLAY_COMMAND__MAX_ID = 0x7FFFFFFF, }; @@ -83,6 +85,8 @@ enum ta_securedisplay_ta_query_cmd_ret { enum ta_securedisplay_buffer_size { /* 15 bytes = 8 byte (ROI) + 6 byte(CRC) + 1 byte(phy_id) */ TA_SECUREDISPLAY_I2C_BUFFER_SIZE = 15, + /* 16 bytes = 8 byte (ROI) + 6 byte(CRC) + 1 byte(phy_id) + 1 byte(roi_idx) */ + TA_SECUREDISPLAY_V2_I2C_BUFFER_SIZE = 16, }; /** Input/output structures for Secure Display commands */ @@ -95,7 +99,15 @@ enum ta_securedisplay_buffer_size { * Physical ID to determine which DIO scratch register should be used to get ROI */ struct ta_securedisplay_send_roi_crc_input { - uint32_t phy_id; /* Physical ID */ + /* Physical ID */ + uint32_t phy_id; +}; + +struct ta_securedisplay_send_roi_crc_v2_input { + /* Physical ID */ + uint32_t phy_id; + /* Region of interest index */ + uint8_t roi_idx; }; /** @union ta_securedisplay_cmd_input @@ -104,6 +116,8 @@ struct ta_securedisplay_send_roi_crc_input { union ta_securedisplay_cmd_input { /* send ROI and CRC input buffer format */ struct ta_securedisplay_send_roi_crc_input send_roi_crc; + /* send ROI and CRC input buffer format, v2 adds a ROI index */ + struct ta_securedisplay_send_roi_crc_v2_input send_roi_crc_v2; uint32_t reserved[4]; }; @@ -128,6 +142,10 @@ struct ta_securedisplay_send_roi_crc_output { uint8_t reserved; }; +struct ta_securedisplay_send_roi_crc_v2_output { + uint8_t i2c_buf[TA_SECUREDISPLAY_V2_I2C_BUFFER_SIZE]; /* I2C buffer */ +}; + /** @union ta_securedisplay_cmd_output * Output buffer */ @@ -136,6 +154,8 @@ union ta_securedisplay_cmd_output { struct ta_securedisplay_query_ta_output query_ta; /* Send ROI CRC output buffer format used only for validation purpose */ struct ta_securedisplay_send_roi_crc_output send_roi_crc; + /* Send ROI CRC output buffer format used only for validation purpose */ + struct ta_securedisplay_send_roi_crc_v2_output send_roi_crc_v2; uint32_t reserved[4]; }; diff --git a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c index 5a04a677013808405162cfe76960eb33190f4952..0968e551f7b5f1e86283bb192ca493da645a1638 100644 --- a/drivers/gpu/drm/amd/amdgpu/tonga_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/tonga_ih.c @@ -448,13 +448,13 @@ static int tonga_ih_soft_reset(struct amdgpu_ip_block *ip_block) return 0; } -static int tonga_ih_set_clockgating_state(void *handle, +static int tonga_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int tonga_ih_set_powergating_state(void *handle, +static int tonga_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c index 1a8ea834efa6bdba10f0d8af1f7ce9330c50eece..a7b9c358a2d4c5101a815131c6f0387fbc1ea55e 100644 --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.c @@ -173,156 +173,96 @@ static void umc_v12_0_query_ras_error_count(struct amdgpu_device *adev, umc_v12_0_reset_error_count(adev); } -static void umc_v12_0_convert_error_address(struct amdgpu_device *adev, +static int umc_v12_0_convert_error_address(struct amdgpu_device *adev, struct ras_err_data *err_data, - struct ta_ras_query_address_input *addr_in) + struct ta_ras_query_address_input *addr_in, + struct ta_ras_query_address_output *addr_out, + bool dump_addr) { - uint32_t col, row, row_xor, bank, channel_index; - uint64_t soc_pa, retired_page, column, err_addr; - struct ta_ras_query_address_output addr_out; + uint32_t col, col_lower, row, row_lower, bank; + uint32_t channel_index = 0, umc_inst = 0; + uint32_t i, loop_bits[UMC_V12_0_RETIRE_LOOP_BITS]; + uint64_t soc_pa, column, err_addr; + struct ta_ras_query_address_output addr_out_tmp; + struct ta_ras_query_address_output *paddr_out; + enum amdgpu_memory_partition nps = AMDGPU_NPS1_PARTITION_MODE; + int ret = 0; + + if (!addr_out) + paddr_out = &addr_out_tmp; + else + paddr_out = addr_out; - err_addr = addr_in->ma.err_addr; - addr_in->addr_type = TA_RAS_MCA_TO_PA; - if (psp_ras_query_address(&adev->psp, addr_in, &addr_out)) { - dev_warn(adev->dev, "Failed to query RAS physical address for 0x%llx", - err_addr); + err_addr = bank = 0; + if (addr_in) { + err_addr = addr_in->ma.err_addr; + addr_in->addr_type = TA_RAS_MCA_TO_PA; + ret = psp_ras_query_address(&adev->psp, addr_in, paddr_out); + if (ret) { + dev_warn(adev->dev, "Failed to query RAS physical address for 0x%llx", + err_addr); - return; - } + goto out; + } - soc_pa = addr_out.pa.pa; - bank = addr_out.pa.bank; - channel_index = addr_out.pa.channel_idx; - - col = (err_addr >> 1) & 0x1fULL; - row = (err_addr >> 10) & 0x3fffULL; - row_xor = row ^ (0x1ULL << 13); - /* clear [C3 C2] in soc physical address */ - soc_pa &= ~(0x3ULL << UMC_V12_0_PA_C2_BIT); - /* clear [C4] in soc physical address */ - soc_pa &= ~(0x1ULL << UMC_V12_0_PA_C4_BIT); - - /* loop for all possibilities of [C4 C3 C2] */ - for (column = 0; column < UMC_V12_0_NA_MAP_PA_NUM; column++) { - retired_page = soc_pa | ((column & 0x3) << UMC_V12_0_PA_C2_BIT); - retired_page |= (((column & 0x4) >> 2) << UMC_V12_0_PA_C4_BIT); - /* include column bit 0 and 1 */ - col &= 0x3; - col |= (column << 2); - dev_info(adev->dev, - "Error Address(PA):0x%-10llx Row:0x%-4x Col:0x%-2x Bank:0x%x Channel:0x%x\n", - retired_page, row, col, bank, channel_index); - amdgpu_umc_fill_error_record(err_data, err_addr, - retired_page, channel_index, addr_in->ma.umc_inst); - - /* shift R13 bit */ - retired_page ^= (0x1ULL << UMC_V12_0_PA_R13_BIT); - dev_info(adev->dev, - "Error Address(PA):0x%-10llx Row:0x%-4x Col:0x%-2x Bank:0x%x Channel:0x%x\n", - retired_page, row_xor, col, bank, channel_index); - amdgpu_umc_fill_error_record(err_data, err_addr, - retired_page, channel_index, addr_in->ma.umc_inst); + bank = paddr_out->pa.bank; + /* no need to care about umc inst if addr_in is NULL */ + umc_inst = addr_in->ma.umc_inst; } -} -static void umc_v12_0_dump_addr_info(struct amdgpu_device *adev, - struct ta_ras_query_address_output *addr_out, - uint64_t err_addr) -{ - uint32_t col, row, row_xor, bank, channel_index; - uint64_t soc_pa, retired_page, column; - - soc_pa = addr_out->pa.pa; - bank = addr_out->pa.bank; - channel_index = addr_out->pa.channel_idx; - - col = (err_addr >> 1) & 0x1fULL; - row = (err_addr >> 10) & 0x3fffULL; - row_xor = row ^ (0x1ULL << 13); - /* clear [C3 C2] in soc physical address */ - soc_pa &= ~(0x3ULL << UMC_V12_0_PA_C2_BIT); - /* clear [C4] in soc physical address */ - soc_pa &= ~(0x1ULL << UMC_V12_0_PA_C4_BIT); - - /* loop for all possibilities of [C4 C3 C2] */ - for (column = 0; column < UMC_V12_0_NA_MAP_PA_NUM; column++) { - retired_page = soc_pa | ((column & 0x3) << UMC_V12_0_PA_C2_BIT); - retired_page |= (((column & 0x4) >> 2) << UMC_V12_0_PA_C4_BIT); - /* include column bit 0 and 1 */ - col &= 0x3; - col |= (column << 2); - dev_info(adev->dev, - "Error Address(PA):0x%-10llx Row:0x%-4x Col:0x%-2x Bank:0x%x Channel:0x%x\n", - retired_page, row, col, bank, channel_index); - - /* shift R13 bit */ - retired_page ^= (0x1ULL << UMC_V12_0_PA_R13_BIT); - dev_info(adev->dev, - "Error Address(PA):0x%-10llx Row:0x%-4x Col:0x%-2x Bank:0x%x Channel:0x%x\n", - retired_page, row_xor, col, bank, channel_index); - } -} + loop_bits[0] = UMC_V12_0_PA_C2_BIT; + loop_bits[1] = UMC_V12_0_PA_C3_BIT; + loop_bits[2] = UMC_V12_0_PA_C4_BIT; + loop_bits[3] = UMC_V12_0_PA_R13_BIT; -static int umc_v12_0_lookup_bad_pages_in_a_row(struct amdgpu_device *adev, - uint64_t pa_addr, uint64_t *pfns, int len) -{ - uint64_t soc_pa, retired_page, column; - uint32_t pos = 0; - - soc_pa = pa_addr; - /* clear [C3 C2] in soc physical address */ - soc_pa &= ~(0x3ULL << UMC_V12_0_PA_C2_BIT); - /* clear [C4] in soc physical address */ - soc_pa &= ~(0x1ULL << UMC_V12_0_PA_C4_BIT); - - /* loop for all possibilities of [C4 C3 C2] */ - for (column = 0; column < UMC_V12_0_NA_MAP_PA_NUM; column++) { - retired_page = soc_pa | ((column & 0x3) << UMC_V12_0_PA_C2_BIT); - retired_page |= (((column & 0x4) >> 2) << UMC_V12_0_PA_C4_BIT); - - if (pos >= len) - return 0; - pfns[pos++] = retired_page >> AMDGPU_GPU_PAGE_SHIFT; - - /* shift R13 bit */ - retired_page ^= (0x1ULL << UMC_V12_0_PA_R13_BIT); - - if (pos >= len) - return 0; - pfns[pos++] = retired_page >> AMDGPU_GPU_PAGE_SHIFT; + if (adev->gmc.gmc_funcs->query_mem_partition_mode) + nps = adev->gmc.gmc_funcs->query_mem_partition_mode(adev); + /* other nps modes are taken as nps1 */ + if (nps == AMDGPU_NPS4_PARTITION_MODE) { + loop_bits[0] = UMC_V12_0_PA_CH4_BIT; + loop_bits[1] = UMC_V12_0_PA_CH5_BIT; + loop_bits[2] = UMC_V12_0_PA_B0_BIT; + loop_bits[3] = UMC_V12_0_PA_R11_BIT; } - return pos; -} - -static int umc_v12_0_convert_mca_to_addr(struct amdgpu_device *adev, - uint64_t err_addr, uint32_t ch, uint32_t umc, - uint32_t node, uint32_t socket, - uint64_t *addr, bool dump_addr) -{ - struct ta_ras_query_address_input addr_in; - struct ta_ras_query_address_output addr_out; - - memset(&addr_in, 0, sizeof(addr_in)); - addr_in.ma.err_addr = err_addr; - addr_in.ma.ch_inst = ch; - addr_in.ma.umc_inst = umc; - addr_in.ma.node_inst = node; - addr_in.ma.socket_id = socket; - addr_in.addr_type = TA_RAS_MCA_TO_PA; - if (psp_ras_query_address(&adev->psp, &addr_in, &addr_out)) { - dev_warn(adev->dev, "Failed to query RAS physical address for 0x%llx", - err_addr); - return -EINVAL; + soc_pa = paddr_out->pa.pa; + channel_index = paddr_out->pa.channel_idx; + /* clear loop bits in soc physical address */ + for (i = 0; i < UMC_V12_0_RETIRE_LOOP_BITS; i++) + soc_pa &= ~BIT_ULL(loop_bits[i]); + + paddr_out->pa.pa = soc_pa; + /* get column bit 0 and 1 in mca address */ + col_lower = (err_addr >> 1) & 0x3ULL; + /* MA_R13_BIT will be handled later */ + row_lower = (err_addr >> UMC_V12_0_MA_R0_BIT) & 0x1fffULL; + + if (!err_data && !dump_addr) + goto out; + + /* loop for all possibilities of retired bits */ + for (column = 0; column < UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL; column++) { + soc_pa = paddr_out->pa.pa; + for (i = 0; i < UMC_V12_0_RETIRE_LOOP_BITS; i++) + soc_pa |= (((column >> i) & 0x1ULL) << loop_bits[i]); + + col = ((column & 0x7) << 2) | col_lower; + /* add row bit 13 */ + row = ((column >> 3) << 13) | row_lower; + + if (dump_addr) + dev_info(adev->dev, + "Error Address(PA):0x%-10llx Row:0x%-4x Col:0x%-2x Bank:0x%x Channel:0x%x\n", + soc_pa, row, col, bank, channel_index); + + if (err_data) + amdgpu_umc_fill_error_record(err_data, err_addr, + soc_pa, channel_index, umc_inst); } - if (dump_addr) - umc_v12_0_dump_addr_info(adev, &addr_out, err_addr); - - *addr = addr_out.pa.pa; - - return 0; +out: + return ret; } static int umc_v12_0_query_error_address(struct amdgpu_device *adev, @@ -374,7 +314,7 @@ static int umc_v12_0_query_error_address(struct amdgpu_device *adev, addr_in.ma.umc_inst = umc_inst; addr_in.ma.node_inst = node_inst; - umc_v12_0_convert_error_address(adev, err_data, &addr_in); + umc_v12_0_convert_error_address(adev, err_data, &addr_in, NULL, true); } /* clear umc status */ @@ -526,6 +466,9 @@ static int umc_v12_0_update_ecc_status(struct amdgpu_device *adev, uint64_t page_pfn[UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL]; uint64_t err_addr, pa_addr = 0; struct ras_ecc_err *ecc_err; + struct ta_ras_query_address_output addr_out; + enum amdgpu_memory_partition nps = AMDGPU_NPS1_PARTITION_MODE; + uint32_t shift_bit = UMC_V12_0_PA_C4_BIT; int count, ret, i; hwid = REG_GET_FIELD(ipid, MCMP1_IPIDT0, HardwareID); @@ -552,10 +495,10 @@ static int umc_v12_0_update_ecc_status(struct amdgpu_device *adev, MCA_IPID_2_UMC_CH(ipid), err_addr); - ret = umc_v12_0_convert_mca_to_addr(adev, + ret = amdgpu_umc_mca_to_addr(adev, err_addr, MCA_IPID_2_UMC_CH(ipid), MCA_IPID_2_UMC_INST(ipid), MCA_IPID_2_DIE_ID(ipid), - MCA_IPID_2_SOCKET_ID(ipid), &pa_addr, true); + MCA_IPID_2_SOCKET_ID(ipid), &addr_out, true); if (ret) return ret; @@ -563,14 +506,21 @@ static int umc_v12_0_update_ecc_status(struct amdgpu_device *adev, if (!ecc_err) return -ENOMEM; + pa_addr = addr_out.pa.pa; ecc_err->status = status; ecc_err->ipid = ipid; ecc_err->addr = addr; - ecc_err->pa_pfn = UMC_V12_ADDR_MASK_BAD_COLS(pa_addr) >> AMDGPU_GPU_PAGE_SHIFT; + ecc_err->pa_pfn = pa_addr >> AMDGPU_GPU_PAGE_SHIFT; + ecc_err->channel_idx = addr_out.pa.channel_idx; + + if (adev->gmc.gmc_funcs->query_mem_partition_mode) + nps = adev->gmc.gmc_funcs->query_mem_partition_mode(adev); + if (nps == AMDGPU_NPS4_PARTITION_MODE) + shift_bit = UMC_V12_0_PA_B0_BIT; /* If converted pa_pfn is 0, use pa C4 pfn. */ if (!ecc_err->pa_pfn) - ecc_err->pa_pfn = BIT_ULL(UMC_V12_0_PA_C4_BIT) >> AMDGPU_GPU_PAGE_SHIFT; + ecc_err->pa_pfn = BIT_ULL(shift_bit) >> AMDGPU_GPU_PAGE_SHIFT; ret = amdgpu_umc_logs_ecc_err(adev, &con->umc_ecc_log.de_page_tree, ecc_err); if (ret) { @@ -586,7 +536,7 @@ static int umc_v12_0_update_ecc_status(struct amdgpu_device *adev, con->umc_ecc_log.de_queried_count++; memset(page_pfn, 0, sizeof(page_pfn)); - count = umc_v12_0_lookup_bad_pages_in_a_row(adev, + count = amdgpu_umc_lookup_bad_pages_in_a_row(adev, pa_addr, page_pfn, ARRAY_SIZE(page_pfn)); if (count <= 0) { @@ -629,7 +579,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev, return -EINVAL; memset(page_pfn, 0, sizeof(page_pfn)); - count = umc_v12_0_lookup_bad_pages_in_a_row(adev, + count = amdgpu_umc_lookup_bad_pages_in_a_row(adev, ecc_err->pa_pfn << AMDGPU_GPU_PAGE_SHIFT, page_pfn, ARRAY_SIZE(page_pfn)); @@ -637,7 +587,7 @@ static int umc_v12_0_fill_error_record(struct amdgpu_device *adev, ret = amdgpu_umc_fill_error_record(err_data, ecc_err->addr, page_pfn[i] << AMDGPU_GPU_PAGE_SHIFT, - MCA_IPID_2_UMC_CH(ecc_err->ipid), + ecc_err->channel_idx, MCA_IPID_2_UMC_INST(ecc_err->ipid)); if (ret) break; @@ -676,6 +626,31 @@ static void umc_v12_0_query_ras_ecc_err_addr(struct amdgpu_device *adev, mutex_unlock(&con->umc_ecc_log.lock); } +static uint32_t umc_v12_0_get_die_id(struct amdgpu_device *adev, + uint64_t mca_addr, uint64_t retired_page) +{ + uint32_t die = 0; + + /* we only calculate die id for nps1 mode right now */ + die += ((((retired_page >> 12) & 0x1ULL)^ + ((retired_page >> 20) & 0x1ULL) ^ + ((retired_page >> 27) & 0x1ULL) ^ + ((retired_page >> 34) & 0x1ULL) ^ + ((retired_page >> 41) & 0x1ULL)) << 0); + + /* the original PA_C4 and PA_R13 may be cleared in retired_page, so + * get them from mca_addr. + */ + die += ((((retired_page >> 13) & 0x1ULL) ^ + ((mca_addr >> 5) & 0x1ULL) ^ + ((retired_page >> 28) & 0x1ULL) ^ + ((mca_addr >> 23) & 0x1ULL) ^ + ((retired_page >> 42) & 0x1ULL)) << 1); + die &= 3; + + return die; +} + struct amdgpu_umc_ras umc_v12_0_ras = { .ras_block = { .hw_ops = &umc_v12_0_ras_hw_ops, @@ -686,5 +661,7 @@ struct amdgpu_umc_ras umc_v12_0_ras = { .ecc_info_query_ras_error_address = umc_v12_0_query_ras_ecc_err_addr, .check_ecc_err_status = umc_v12_0_check_ecc_err_status, .update_ecc_status = umc_v12_0_update_ecc_status, + .convert_ras_err_addr = umc_v12_0_convert_error_address, + .get_die_id_from_pa = umc_v12_0_get_die_id, }; diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.h b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.h index be5598d76c1db2cba6518b61191fd8f59babf7ed..9298018d938f76a25bdcbedc2a874e394e4695ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/umc_v12_0.h +++ b/drivers/gpu/drm/amd/amdgpu/umc_v12_0.h @@ -55,12 +55,24 @@ #define UMC_V12_0_NA_MAP_PA_NUM 8 /* R13 bit shift should be considered, double the number */ #define UMC_V12_0_BAD_PAGE_NUM_PER_CHANNEL (UMC_V12_0_NA_MAP_PA_NUM * 2) +/* C2, C3, C4, R13, four bits in MCA address are looped in retirement */ +#define UMC_V12_0_RETIRE_LOOP_BITS 4 /* column bits in SOC physical address */ #define UMC_V12_0_PA_C2_BIT 15 +#define UMC_V12_0_PA_C3_BIT 16 #define UMC_V12_0_PA_C4_BIT 21 /* row bits in SOC physical address */ +#define UMC_V12_0_PA_R0_BIT 22 +#define UMC_V12_0_PA_R11_BIT 33 #define UMC_V12_0_PA_R13_BIT 35 +/* channel bit in SOC physical address */ +#define UMC_V12_0_PA_CH4_BIT 12 +#define UMC_V12_0_PA_CH5_BIT 13 +/* bank bit in SOC physical address */ +#define UMC_V12_0_PA_B0_BIT 19 +/* row bits in MCA address */ +#define UMC_V12_0_MA_R0_BIT 10 #define MCA_UMC_HWID_V12_0 0x96 #define MCA_UMC_MCATYPE_V12_0 0x0 @@ -81,11 +93,6 @@ (((REG_GET_FIELD(ipid, MCMP1_IPIDT0, InstanceIdLo) & 0x1) << 2) | \ (REG_GET_FIELD(ipid, MCMP1_IPIDT0, InstanceIdHi) & 0x03)) -#define UMC_V12_ADDR_MASK_BAD_COLS(addr) \ - ((addr) & ~((0x3ULL << UMC_V12_0_PA_C2_BIT) | \ - (0x1ULL << UMC_V12_0_PA_C4_BIT) | \ - (0x1ULL << UMC_V12_0_PA_R13_BIT))) - bool umc_v12_0_is_deferred_error(struct amdgpu_device *adev, uint64_t mc_umc_status); bool umc_v12_0_is_uncorrectable_error(struct amdgpu_device *adev, uint64_t mc_umc_status); bool umc_v12_0_is_correctable_error(struct amdgpu_device *adev, uint64_t mc_umc_status); diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v8_14.c b/drivers/gpu/drm/amd/amdgpu/umc_v8_14.c new file mode 100644 index 0000000000000000000000000000000000000000..eaca10a3c4a9df1de92c5512a580589ba2821d46 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/umc_v8_14.c @@ -0,0 +1,160 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ +#include "umc_v8_14.h" +#include "amdgpu_ras.h" +#include "amdgpu_umc.h" +#include "amdgpu.h" +#include "umc/umc_8_14_0_offset.h" +#include "umc/umc_8_14_0_sh_mask.h" + +static inline uint32_t get_umc_v8_14_reg_offset(struct amdgpu_device *adev, + uint32_t umc_inst, + uint32_t ch_inst) +{ + return adev->umc.channel_offs * ch_inst + UMC_V8_14_INST_DIST * umc_inst; +} + +static int umc_v8_14_clear_error_count_per_channel(struct amdgpu_device *adev, + uint32_t node_inst, uint32_t umc_inst, + uint32_t ch_inst, void *data) +{ + uint32_t ecc_err_cnt_addr; + uint32_t umc_reg_offset = + get_umc_v8_14_reg_offset(adev, umc_inst, ch_inst); + + ecc_err_cnt_addr = + SOC15_REG_OFFSET(UMC, 0, regUMCCH0_GeccErrCnt); + + /* clear error count */ + WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4, + UMC_V8_14_CE_CNT_INIT); + + return 0; +} + +static void umc_v8_14_clear_error_count(struct amdgpu_device *adev) +{ + amdgpu_umc_loop_channels(adev, + umc_v8_14_clear_error_count_per_channel, NULL); +} + +static void umc_v8_14_query_correctable_error_count(struct amdgpu_device *adev, + uint32_t umc_reg_offset, + unsigned long *error_count) +{ + uint32_t ecc_err_cnt, ecc_err_cnt_addr; + + /* UMC 8_14 registers */ + ecc_err_cnt_addr = + SOC15_REG_OFFSET(UMC, 0, regUMCCH0_GeccErrCnt); + + ecc_err_cnt = RREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4); + *error_count += + (REG_GET_FIELD(ecc_err_cnt, UMCCH0_GeccErrCnt, GeccErrCnt) - + UMC_V8_14_CE_CNT_INIT); +} + +static void umc_v8_14_query_uncorrectable_error_count(struct amdgpu_device *adev, + uint32_t umc_reg_offset, + unsigned long *error_count) +{ + uint32_t ecc_err_cnt, ecc_err_cnt_addr; + /* UMC 8_14 registers */ + ecc_err_cnt_addr = + SOC15_REG_OFFSET(UMC, 0, regUMCCH0_GeccErrCnt); + + ecc_err_cnt = RREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4); + *error_count += + (REG_GET_FIELD(ecc_err_cnt, UMCCH0_GeccErrCnt, GeccUnCorrErrCnt) - + UMC_V8_14_CE_CNT_INIT); +} + +static int umc_v8_14_query_error_count_per_channel(struct amdgpu_device *adev, + uint32_t node_inst, uint32_t umc_inst, + uint32_t ch_inst, void *data) +{ + struct ras_err_data *err_data = (struct ras_err_data *)data; + uint32_t umc_reg_offset = + get_umc_v8_14_reg_offset(adev, umc_inst, ch_inst); + + umc_v8_14_query_correctable_error_count(adev, + umc_reg_offset, + &(err_data->ce_count)); + umc_v8_14_query_uncorrectable_error_count(adev, + umc_reg_offset, + &(err_data->ue_count)); + + return 0; +} + +static void umc_v8_14_query_ras_error_count(struct amdgpu_device *adev, + void *ras_error_status) +{ + amdgpu_umc_loop_channels(adev, + umc_v8_14_query_error_count_per_channel, ras_error_status); + + umc_v8_14_clear_error_count(adev); +} + +static int umc_v8_14_err_cnt_init_per_channel(struct amdgpu_device *adev, + uint32_t node_inst, uint32_t umc_inst, + uint32_t ch_inst, void *data) +{ + uint32_t ecc_err_cnt_sel, ecc_err_cnt_sel_addr; + uint32_t ecc_err_cnt_addr; + uint32_t umc_reg_offset = + get_umc_v8_14_reg_offset(adev, umc_inst, ch_inst); + + ecc_err_cnt_sel_addr = + SOC15_REG_OFFSET(UMC, 0, regUMCCH0_GeccErrCntSel); + ecc_err_cnt_addr = + SOC15_REG_OFFSET(UMC, 0, regUMCCH0_GeccErrCnt); + + ecc_err_cnt_sel = RREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4); + + /* set ce error interrupt type to APIC based interrupt */ + ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_GeccErrCntSel, + GeccErrInt, 0x1); + WREG32_PCIE((ecc_err_cnt_sel_addr + umc_reg_offset) * 4, ecc_err_cnt_sel); + /* set error count to initial value */ + WREG32_PCIE((ecc_err_cnt_addr + umc_reg_offset) * 4, UMC_V8_14_CE_CNT_INIT); + + return 0; +} + +static void umc_v8_14_err_cnt_init(struct amdgpu_device *adev) +{ + amdgpu_umc_loop_channels(adev, + umc_v8_14_err_cnt_init_per_channel, NULL); +} + +const struct amdgpu_ras_block_hw_ops umc_v8_14_ras_hw_ops = { + .query_ras_error_count = umc_v8_14_query_ras_error_count, +}; + +struct amdgpu_umc_ras umc_v8_14_ras = { + .ras_block = { + .hw_ops = &umc_v8_14_ras_hw_ops, + }, + .err_cnt_init = umc_v8_14_err_cnt_init, +}; diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v8_14.h b/drivers/gpu/drm/amd/amdgpu/umc_v8_14.h new file mode 100644 index 0000000000000000000000000000000000000000..20a258f0017aa121aa5f97237b912badc2f76235 --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/umc_v8_14.h @@ -0,0 +1,51 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ +#ifndef __UMC_V8_14_H__ +#define __UMC_V8_14_H__ + +#include "soc15_common.h" +#include "amdgpu.h" + +/* number of umc channel instance with memory map register access */ +#define UMC_V8_14_CHANNEL_INSTANCE_NUM 2 +/* number of umc instance with memory map register access */ +#define UMC_V8_14_UMC_INSTANCE_NUM(adev) ((adev)->umc.node_inst_num) + +/* Total channel instances for all available umc nodes */ +#define UMC_V8_14_TOTAL_CHANNEL_NUM(adev) \ + (UMC_V8_14_CHANNEL_INSTANCE_NUM * (adev)->gmc.num_umc) + +/* UMC register per channel offset */ +#define UMC_V8_14_PER_CHANNEL_OFFSET 0x400 + +#define UMC_V8_14_INST_DIST 0x40000 + +/* EccErrCnt max value */ +#define UMC_V8_14_CE_CNT_MAX 0xffff +/* umc ce interrupt threshold */ +#define UMC_V8_14_CE_INT_THRESHOLD 0xffff +/* umc ce count initial value */ +#define UMC_V8_14_CE_CNT_INIT (UMC_V8_14_CE_CNT_MAX - UMC_V8_14_CE_INT_THRESHOLD) + +extern struct amdgpu_umc_ras umc_v8_14_ras; +#endif diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c index bdbca25d80c498bd1e4376263cf5ea26acca0118..5830e799c0a363f9aa128417e36538344a58015d 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c @@ -790,13 +790,13 @@ static int uvd_v3_1_soft_reset(struct amdgpu_ip_block *ip_block) return uvd_v3_1_start(adev); } -static int uvd_v3_1_set_clockgating_state(void *handle, +static int uvd_v3_1_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int uvd_v3_1_set_powergating_state(void *handle, +static int uvd_v3_1_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c index a836dc9cfcadeba8651e7192a3914d73f38c55b5..f93079e092158311eff5e3c3c8b6c2699ce8bc0b 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c @@ -44,7 +44,7 @@ static void uvd_v4_2_set_ring_funcs(struct amdgpu_device *adev); static void uvd_v4_2_set_irq_funcs(struct amdgpu_device *adev); static int uvd_v4_2_start(struct amdgpu_device *adev); static void uvd_v4_2_stop(struct amdgpu_device *adev); -static int uvd_v4_2_set_clockgating_state(void *handle, +static int uvd_v4_2_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); static void uvd_v4_2_set_dcm(struct amdgpu_device *adev, bool sw_mode); @@ -708,13 +708,13 @@ static int uvd_v4_2_process_interrupt(struct amdgpu_device *adev, return 0; } -static int uvd_v4_2_set_clockgating_state(void *handle, +static int uvd_v4_2_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int uvd_v4_2_set_powergating_state(void *handle, +static int uvd_v4_2_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the UVD block. @@ -724,7 +724,7 @@ static int uvd_v4_2_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_PG_STATE_GATE) { uvd_v4_2_stop(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c index ab55fae3569e4918be0dc23d900850b9be6087c9..050a0f30939084edd9da92efc0453ce5e613b431 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c @@ -42,7 +42,7 @@ static void uvd_v5_0_set_ring_funcs(struct amdgpu_device *adev); static void uvd_v5_0_set_irq_funcs(struct amdgpu_device *adev); static int uvd_v5_0_start(struct amdgpu_device *adev); static void uvd_v5_0_stop(struct amdgpu_device *adev); -static int uvd_v5_0_set_clockgating_state(void *handle, +static int uvd_v5_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); static void uvd_v5_0_enable_mgcg(struct amdgpu_device *adev, bool enable); @@ -155,7 +155,7 @@ static int uvd_v5_0_hw_init(struct amdgpu_ip_block *ip_block) int r; amdgpu_asic_set_uvd_clocks(adev, 10000, 10000); - uvd_v5_0_set_clockgating_state(adev, AMD_CG_STATE_UNGATE); + uvd_v5_0_set_clockgating_state(ip_block, AMD_CG_STATE_UNGATE); uvd_v5_0_enable_mgcg(adev, true); r = amdgpu_ring_test_helper(ring); @@ -790,16 +790,11 @@ static void uvd_v5_0_enable_mgcg(struct amdgpu_device *adev, } } -static int uvd_v5_0_set_clockgating_state(void *handle, +static int uvd_v5_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); - struct amdgpu_ip_block *ip_block; - - ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_UVD); - if (!ip_block) - return -EINVAL; if (enable) { /* wait for STATUS to clear */ @@ -817,7 +812,7 @@ static int uvd_v5_0_set_clockgating_state(void *handle, return 0; } -static int uvd_v5_0_set_powergating_state(void *handle, +static int uvd_v5_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the UVD block. @@ -827,7 +822,7 @@ static int uvd_v5_0_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret = 0; if (state == AMD_PG_STATE_GATE) { diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c index 39f8c3d3a135f448da8fa578c7c21505dd0cb3e4..d9d036ee51fb7f95dd86041010957530ebbbba47 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c @@ -48,7 +48,7 @@ static void uvd_v6_0_set_irq_funcs(struct amdgpu_device *adev); static int uvd_v6_0_start(struct amdgpu_device *adev); static void uvd_v6_0_stop(struct amdgpu_device *adev); static void uvd_v6_0_set_sw_clock_gating(struct amdgpu_device *adev); -static int uvd_v6_0_set_clockgating_state(void *handle, +static int uvd_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); static void uvd_v6_0_enable_mgcg(struct amdgpu_device *adev, bool enable); @@ -467,7 +467,7 @@ static int uvd_v6_0_hw_init(struct amdgpu_ip_block *ip_block) int i, r; amdgpu_asic_set_uvd_clocks(adev, 10000, 10000); - uvd_v6_0_set_clockgating_state(adev, AMD_CG_STATE_UNGATE); + uvd_v6_0_set_clockgating_state(ip_block, AMD_CG_STATE_UNGATE); uvd_v6_0_enable_mgcg(adev, true); r = amdgpu_ring_test_helper(ring); @@ -1450,17 +1450,12 @@ static void uvd_v6_0_enable_mgcg(struct amdgpu_device *adev, } } -static int uvd_v6_0_set_clockgating_state(void *handle, +static int uvd_v6_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; - struct amdgpu_ip_block *ip_block; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); - ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_UVD); - if (!ip_block) - return -EINVAL; - if (enable) { /* wait for STATUS to clear */ if (uvd_v6_0_wait_for_idle(ip_block)) @@ -1476,7 +1471,7 @@ static int uvd_v6_0_set_clockgating_state(void *handle, return 0; } -static int uvd_v6_0_set_powergating_state(void *handle, +static int uvd_v6_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the UVD block. @@ -1486,7 +1481,7 @@ static int uvd_v6_0_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret = 0; WREG32(mmUVD_POWER_STATUS, UVD_POWER_STATUS__UVD_PG_EN_MASK); diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c index 079131aeb2f78f9849d572b5ea771610f35f9ea6..9d237b5937fb04a5b3c06387a64297b65ae94703 100644 --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c @@ -1288,7 +1288,7 @@ static int uvd_v7_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p, struct amdgpu_job *job, struct amdgpu_ib *ib) { - struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched); + struct amdgpu_ring *ring = amdgpu_job_ring(job); unsigned i; /* No patching necessary for the first instance */ @@ -1511,7 +1511,7 @@ static int uvd_v7_0_process_interrupt(struct amdgpu_device *adev, return 0; } -static int uvd_v7_0_set_clockgating_state(void *handle, +static int uvd_v7_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { /* needed for driver unload*/ diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c index c1ed91b39415438025410c61fd151be534555294..c633b7ff294384aac994707d4d03be00538cd436 100644 --- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c @@ -578,13 +578,13 @@ static int vce_v2_0_process_interrupt(struct amdgpu_device *adev, return 0; } -static int vce_v2_0_set_clockgating_state(void *handle, +static int vce_v2_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { bool gate = false; bool sw_cg = false; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_CG_STATE_GATE) { gate = true; @@ -596,7 +596,7 @@ static int vce_v2_0_set_clockgating_state(void *handle, return 0; } -static int vce_v2_0_set_powergating_state(void *handle, +static int vce_v2_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the VCE block. @@ -606,7 +606,7 @@ static int vce_v2_0_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_PG_STATE_GATE) return vce_v2_0_stop(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c index 6bb318a06f1976e7cbdb07bafde331843189bcba..f8bddcd19b6881419e3d8cafd155ee5cb9be4971 100644 --- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c @@ -65,7 +65,7 @@ static void vce_v3_0_mc_resume(struct amdgpu_device *adev, int idx); static void vce_v3_0_set_ring_funcs(struct amdgpu_device *adev); static void vce_v3_0_set_irq_funcs(struct amdgpu_device *adev); static int vce_v3_0_wait_for_idle(struct amdgpu_ip_block *ip_block); -static int vce_v3_0_set_clockgating_state(void *handle, +static int vce_v3_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); /** * vce_v3_0_ring_get_rptr - get read pointer @@ -497,7 +497,7 @@ static int vce_v3_0_hw_fini(struct amdgpu_ip_block *ip_block) return r; vce_v3_0_stop(adev); - return vce_v3_0_set_clockgating_state(adev, AMD_CG_STATE_GATE); + return vce_v3_0_set_clockgating_state(ip_block, AMD_CG_STATE_GATE); } static int vce_v3_0_suspend(struct amdgpu_ip_block *ip_block) @@ -760,10 +760,10 @@ static int vce_v3_0_process_interrupt(struct amdgpu_device *adev, return 0; } -static int vce_v3_0_set_clockgating_state(void *handle, +static int vce_v3_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); int i; @@ -801,7 +801,7 @@ static int vce_v3_0_set_clockgating_state(void *handle, return 0; } -static int vce_v3_0_set_powergating_state(void *handle, +static int vce_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the VCE block. @@ -811,7 +811,7 @@ static int vce_v3_0_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret = 0; if (state == AMD_PG_STATE_GATE) { diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c index 79ee555768a589d9e357a9cd195202f44cfc9319..335bda64ff5bc21d1ddffa118d0c7930fa918b2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vce_v4_0.c @@ -684,14 +684,14 @@ static void vce_v4_0_mc_resume(struct amdgpu_device *adev) ~VCE_SYS_INT_EN__VCE_SYS_INT_TRAP_INTERRUPT_EN_MASK); } -static int vce_v4_0_set_clockgating_state(void *handle, +static int vce_v4_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { /* needed for driver unload*/ return 0; } -static int vce_v4_0_set_powergating_state(void *handle, +static int vce_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the VCE block. @@ -701,7 +701,7 @@ static int vce_v4_0_set_powergating_state(void *handle, * revisit this when there is a cleaner line between * the smc and the hw blocks */ - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == AMD_PG_STATE_GATE) return vce_v4_0_stop(adev); diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index 10e99c926fb8b0921de960c25b4867b3999f80a3..5ea96c9835170de4d6fa604608aab710803bd15d 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -85,7 +85,8 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev); static void vcn_v1_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void vcn_v1_0_set_enc_ring_funcs(struct amdgpu_device *adev); static void vcn_v1_0_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v1_0_set_powergating_state(void *handle, enum amd_powergating_state state); +static int vcn_v1_0_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state); static int vcn_v1_0_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -281,7 +282,7 @@ static int vcn_v1_0_hw_fini(struct amdgpu_ip_block *ip_block) if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, 0, mmUVD_STATUS))) { - vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v1_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } return 0; @@ -303,7 +304,7 @@ static int vcn_v1_0_suspend(struct amdgpu_ip_block *ip_block) idle_work_unexecuted = cancel_delayed_work_sync(&adev->vcn.idle_work); if (idle_work_unexecuted) { if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + amdgpu_dpm_enable_vcn(adev, false, 0); } r = vcn_v1_0_hw_fini(ip_block); @@ -344,7 +345,7 @@ static int vcn_v1_0_resume(struct amdgpu_ip_block *ip_block) */ static void vcn_v1_0_mc_resume_spg_mode(struct amdgpu_device *adev) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[0]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[0].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -411,7 +412,7 @@ static void vcn_v1_0_mc_resume_spg_mode(struct amdgpu_device *adev) static void vcn_v1_0_mc_resume_dpg_mode(struct amdgpu_device *adev) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[0]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[0].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -1394,15 +1395,15 @@ static int vcn_v1_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int vcn_v1_0_set_clockgating_state(void *handle, +static int vcn_v1_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); if (enable) { /* wait for STATUS to clear */ - if (!vcn_v1_0_is_idle(handle)) + if (!vcn_v1_0_is_idle(adev)) return -EBUSY; vcn_v1_0_enable_clock_gating(adev); } else { @@ -1799,7 +1800,7 @@ static void vcn_v1_0_dec_ring_insert_nop(struct amdgpu_ring *ring, uint32_t coun } } -static int vcn_v1_0_set_powergating_state(void *handle, +static int vcn_v1_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the VCN block. @@ -1810,7 +1811,7 @@ static int vcn_v1_0_set_powergating_state(void *handle, * the smc and the hw blocks */ int ret; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (state == adev->vcn.cur_state) return 0; @@ -1856,7 +1857,7 @@ static void vcn_v1_0_idle_work_handler(struct work_struct *work) if (fences == 0) { amdgpu_gfx_off_ctrl(adev, true); if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + amdgpu_dpm_enable_vcn(adev, false, 0); else amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_GATE); @@ -1886,7 +1887,7 @@ void vcn_v1_0_set_pg_for_begin_use(struct amdgpu_ring *ring, bool set_clocks) if (set_clocks) { amdgpu_gfx_off_ctrl(adev, false); if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + amdgpu_dpm_enable_vcn(adev, true, 0); else amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_UNGATE); diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c index e0322cbca3ecf671d383a4a74645329681c38e14..e42cfc731ad8e2bf10fbc79f903134d873601146 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c @@ -92,7 +92,7 @@ static const struct amdgpu_hwip_reg_entry vcn_reg_list_2_0[] = { static void vcn_v2_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void vcn_v2_0_set_enc_ring_funcs(struct amdgpu_device *adev); static void vcn_v2_0_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v2_0_set_powergating_state(void *handle, +static int vcn_v2_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v2_0_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -318,7 +318,7 @@ static int vcn_v2_0_hw_fini(struct amdgpu_ip_block *ip_block) if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, 0, mmUVD_STATUS))) - vcn_v2_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v2_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); return 0; } @@ -372,7 +372,7 @@ static int vcn_v2_0_resume(struct amdgpu_ip_block *ip_block) */ static void vcn_v2_0_mc_resume(struct amdgpu_device *adev) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[0]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[0].fw->size + 4); uint32_t offset; if (amdgpu_sriov_vf(adev)) @@ -428,7 +428,7 @@ static void vcn_v2_0_mc_resume(struct amdgpu_device *adev) static void vcn_v2_0_mc_resume_dpg_mode(struct amdgpu_device *adev, bool indirect) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[0]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[0].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -978,7 +978,7 @@ static int vcn_v2_0_start(struct amdgpu_device *adev) int i, j, r; if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + amdgpu_dpm_enable_vcn(adev, true, 0); if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) return vcn_v2_0_start_dpg_mode(adev, adev->vcn.indirect_sram); @@ -1235,7 +1235,7 @@ static int vcn_v2_0_stop(struct amdgpu_device *adev) power_off: if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + amdgpu_dpm_enable_vcn(adev, false, 0); return 0; } @@ -1335,10 +1335,10 @@ static int vcn_v2_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int vcn_v2_0_set_clockgating_state(void *handle, +static int vcn_v2_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); if (amdgpu_sriov_vf(adev)) @@ -1346,7 +1346,7 @@ static int vcn_v2_0_set_clockgating_state(void *handle, if (enable) { /* wait for STATUS to clear */ - if (!vcn_v2_0_is_idle(handle)) + if (!vcn_v2_0_is_idle(adev)) return -EBUSY; vcn_v2_0_enable_clock_gating(adev); } else { @@ -1796,7 +1796,7 @@ int vcn_v2_0_dec_ring_test_ring(struct amdgpu_ring *ring) } -static int vcn_v2_0_set_powergating_state(void *handle, +static int vcn_v2_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { /* This doesn't actually powergate the VCN block. @@ -1807,7 +1807,7 @@ static int vcn_v2_0_set_powergating_state(void *handle, * the smc and the hw blocks */ int ret; - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) { adev->vcn.cur_state = AMD_PG_STATE_UNGATE; @@ -1920,7 +1920,7 @@ static int vcn_v2_0_start_sriov(struct amdgpu_device *adev) init_table += header->vcn_table_offset; - size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[0]->size + 4); + size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[0].fw->size + 4); MMSCH_V2_0_INSERT_DIRECT_RD_MOD_WT( SOC15_REG_OFFSET(UVD, i, mmUVD_STATUS), diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c index 6aa08281d094581c5f30353d5363e2dfff3ce9c3..b518202955cad6a67c72fc6b4ccdcf1130f05a4b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c @@ -95,7 +95,7 @@ static const struct amdgpu_hwip_reg_entry vcn_reg_list_2_5[] = { static void vcn_v2_5_set_dec_ring_funcs(struct amdgpu_device *adev); static void vcn_v2_5_set_enc_ring_funcs(struct amdgpu_device *adev); static void vcn_v2_5_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v2_5_set_powergating_state(void *handle, +static int vcn_v2_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v2_5_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -399,7 +399,7 @@ static int vcn_v2_5_hw_fini(struct amdgpu_ip_block *ip_block) if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, i, mmUVD_STATUS))) - vcn_v2_5_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v2_5_set_powergating_state(ip_block, AMD_PG_STATE_GATE); if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN)) amdgpu_irq_put(adev, &adev->vcn.inst[i].ras_poison_irq, 0); @@ -465,7 +465,7 @@ static void vcn_v2_5_mc_resume(struct amdgpu_device *adev) if (adev->vcn.harvest_config & (1 << i)) continue; - size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[i]->size + 4); + size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[i].fw->size + 4); /* cache window 0: fw */ if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { WREG32_SOC15(VCN, i, mmUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW, @@ -514,7 +514,7 @@ static void vcn_v2_5_mc_resume(struct amdgpu_device *adev) static void vcn_v2_5_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_idx, bool indirect) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[inst_idx]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[inst_idx].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -1012,8 +1012,10 @@ static int vcn_v2_5_start(struct amdgpu_device *adev) uint32_t rb_bufsz, tmp; int i, j, k, r; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) @@ -1285,7 +1287,7 @@ static int vcn_v2_5_sriov_start(struct amdgpu_device *adev) SOC15_REG_OFFSET(VCN, i, mmUVD_STATUS), ~UVD_STATUS__UVD_BUSY, UVD_STATUS__UVD_BUSY); - size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[i]->size + 4); + size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[i].fw->size + 4); /* mc resume*/ if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { MMSCH_V1_0_INSERT_DIRECT_WT( @@ -1485,8 +1487,10 @@ static int vcn_v2_5_stop(struct amdgpu_device *adev) ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); } - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -1778,6 +1782,7 @@ static bool vcn_v2_5_is_idle(void *handle) for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) continue; + ret &= (RREG32_SOC15(VCN, i, mmUVD_STATUS) == UVD_STATUS__IDLE); } @@ -1801,17 +1806,17 @@ static int vcn_v2_5_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int vcn_v2_5_set_clockgating_state(void *handle, +static int vcn_v2_5_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE); if (amdgpu_sriov_vf(adev)) return 0; if (enable) { - if (!vcn_v2_5_is_idle(handle)) + if (!vcn_v2_5_is_idle(adev)) return -EBUSY; vcn_v2_5_enable_clock_gating(adev); } else { @@ -1821,10 +1826,10 @@ static int vcn_v2_5_set_clockgating_state(void *handle, return 0; } -static int vcn_v2_5_set_powergating_state(void *handle, +static int vcn_v2_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (amdgpu_sriov_vf(adev)) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c index 6732ad7f16f549c94a5c2bdb7106d1157f2d874a..63ddd4cca9109cc25313537eb8c85baec60a8d33 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c @@ -105,7 +105,7 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev); static void vcn_v3_0_set_dec_ring_funcs(struct amdgpu_device *adev); static void vcn_v3_0_set_enc_ring_funcs(struct amdgpu_device *adev); static void vcn_v3_0_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v3_0_set_powergating_state(void *handle, +static int vcn_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v3_0_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -430,9 +430,9 @@ static int vcn_v3_0_hw_fini(struct amdgpu_ip_block *ip_block) if (!amdgpu_sriov_vf(adev)) { if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || - (adev->vcn.cur_state != AMD_PG_STATE_GATE && - RREG32_SOC15(VCN, i, mmUVD_STATUS))) { - vcn_v3_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + (adev->vcn.cur_state != AMD_PG_STATE_GATE && + RREG32_SOC15(VCN, i, mmUVD_STATUS))) { + vcn_v3_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } } } @@ -490,7 +490,7 @@ static int vcn_v3_0_resume(struct amdgpu_ip_block *ip_block) */ static void vcn_v3_0_mc_resume(struct amdgpu_device *adev, int inst) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[inst]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[inst].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -540,7 +540,7 @@ static void vcn_v3_0_mc_resume(struct amdgpu_device *adev, int inst) static void vcn_v3_0_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_idx, bool indirect) { - uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[inst_idx]->size + 4); + uint32_t size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[inst_idx].fw->size + 4); uint32_t offset; /* cache window 0: fw */ @@ -1141,8 +1141,10 @@ static int vcn_v3_0_start(struct amdgpu_device *adev) uint32_t rb_bufsz, tmp; int i, j, k, r; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) @@ -1373,7 +1375,7 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev) mmUVD_STATUS), ~UVD_STATUS__UVD_BUSY, UVD_STATUS__UVD_BUSY); - cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[i]->size + 4); + cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[i].fw->size + 4); if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { MMSCH_V3_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i, @@ -1632,8 +1634,10 @@ static int vcn_v3_0_stop(struct amdgpu_device *adev) vcn_v3_0_enable_static_power_gating(adev, i); } - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -2132,10 +2136,10 @@ static int vcn_v3_0_wait_for_idle(struct amdgpu_ip_block *ip_block) return ret; } -static int vcn_v3_0_set_clockgating_state(void *handle, +static int vcn_v3_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; int i; @@ -2155,10 +2159,10 @@ static int vcn_v3_0_set_clockgating_state(void *handle, return 0; } -static int vcn_v3_0_set_powergating_state(void *handle, +static int vcn_v3_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; /* for SRIOV, guest should not control VCN Power-gating diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c index fcc8511e91ee0ffe3edf53cfaba810d3cfb60f2d..00551d6f0370198723c09a3d8adbe13914448110 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c @@ -96,7 +96,7 @@ static int amdgpu_ih_clientid_vcns[] = { static int vcn_v4_0_start_sriov(struct amdgpu_device *adev); static void vcn_v4_0_set_unified_ring_funcs(struct amdgpu_device *adev); static void vcn_v4_0_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v4_0_set_powergating_state(void *handle, +static int vcn_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v4_0_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -366,9 +366,9 @@ static int vcn_v4_0_hw_fini(struct amdgpu_ip_block *ip_block) continue; if (!amdgpu_sriov_vf(adev)) { if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || - (adev->vcn.cur_state != AMD_PG_STATE_GATE && - RREG32_SOC15(VCN, i, regUVD_STATUS))) { - vcn_v4_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + (adev->vcn.cur_state != AMD_PG_STATE_GATE && + RREG32_SOC15(VCN, i, regUVD_STATUS))) { + vcn_v4_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } } if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__VCN)) @@ -431,7 +431,7 @@ static void vcn_v4_0_mc_resume(struct amdgpu_device *adev, int inst) uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -491,7 +491,7 @@ static void vcn_v4_0_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_idx { uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst_idx]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -1097,8 +1097,10 @@ static int vcn_v4_0_start(struct amdgpu_device *adev) uint32_t tmp; int i, j, k, r; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) @@ -1341,7 +1343,7 @@ static int vcn_v4_0_start_sriov(struct amdgpu_device *adev) regUVD_STATUS), ~UVD_STATUS__UVD_BUSY, UVD_STATUS__UVD_BUSY); - cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[i]->size + 4); + cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[i].fw->size + 4); if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, i, @@ -1623,8 +1625,10 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev) vcn_v4_0_enable_static_power_gating(adev, i); } - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -2007,14 +2011,15 @@ static int vcn_v4_0_wait_for_idle(struct amdgpu_ip_block *ip_block) /** * vcn_v4_0_set_clockgating_state - set VCN block clockgating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: clock gating state * * Set VCN block clockgating state */ -static int vcn_v4_0_set_clockgating_state(void *handle, enum amd_clockgating_state state) +static int vcn_v4_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, + enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; int i; @@ -2037,14 +2042,15 @@ static int vcn_v4_0_set_clockgating_state(void *handle, enum amd_clockgating_sta /** * vcn_v4_0_set_powergating_state - set VCN block powergating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: power gating state * * Set VCN block powergating state */ -static int vcn_v4_0_set_powergating_state(void *handle, enum amd_powergating_state state) +static int vcn_v4_0_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; /* for SRIOV, guest should not control VCN Power-gating diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c index 3f69b9b2bcd07988af54d43fae49cf7d3799eeec..ecdc027f822037ec150aa1d9d57841b67b9fc951 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c @@ -87,7 +87,7 @@ static const struct amdgpu_hwip_reg_entry vcn_reg_list_4_0_3[] = { static int vcn_v4_0_3_start_sriov(struct amdgpu_device *adev); static void vcn_v4_0_3_set_unified_ring_funcs(struct amdgpu_device *adev); static void vcn_v4_0_3_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v4_0_3_set_powergating_state(void *handle, +static int vcn_v4_0_3_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v4_0_3_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -349,7 +349,7 @@ static int vcn_v4_0_3_hw_fini(struct amdgpu_ip_block *ip_block) cancel_delayed_work_sync(&adev->vcn.idle_work); if (adev->vcn.cur_state != AMD_PG_STATE_GATE) - vcn_v4_0_3_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v4_0_3_set_powergating_state(ip_block, AMD_PG_STATE_GATE); return 0; } @@ -407,7 +407,7 @@ static void vcn_v4_0_3_mc_resume(struct amdgpu_device *adev, int inst_idx) uint32_t offset, size, vcn_inst; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst_idx]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); vcn_inst = GET_INST(VCN, inst_idx); @@ -482,7 +482,7 @@ static void vcn_v4_0_3_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_i uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst_idx]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -957,6 +957,8 @@ static int vcn_v4_0_3_start_sriov(struct amdgpu_device *adev) for (i = 0; i < adev->vcn.num_vcn_inst; i++) { vcn_inst = GET_INST(VCN, i); + vcn_v4_0_3_fw_shared_init(adev, vcn_inst); + memset(&header, 0, sizeof(struct mmsch_v4_0_3_init_header)); header.version = MMSCH_VERSION; header.total_size = sizeof(struct mmsch_v4_0_3_init_header) >> 2; @@ -969,7 +971,7 @@ static int vcn_v4_0_3_start_sriov(struct amdgpu_device *adev) MMSCH_V4_0_INSERT_DIRECT_RD_MOD_WT(SOC15_REG_OFFSET(VCN, 0, regUVD_STATUS), ~UVD_STATUS__UVD_BUSY, UVD_STATUS__UVD_BUSY); - cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.fw[i]->size + 4); + cache_size = AMDGPU_GPU_PAGE_ALIGN(adev->vcn.inst[i].fw->size + 4); if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { MMSCH_V4_0_INSERT_DIRECT_WT(SOC15_REG_OFFSET(VCN, 0, @@ -1121,8 +1123,10 @@ static int vcn_v4_0_3_start(struct amdgpu_device *adev) int i, j, k, r, vcn_inst; uint32_t tmp; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { @@ -1395,8 +1399,10 @@ static int vcn_v4_0_3_stop(struct amdgpu_device *adev) vcn_v4_0_3_enable_clock_gating(adev, i); } Done: - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -1616,15 +1622,15 @@ static int vcn_v4_0_3_wait_for_idle(struct amdgpu_ip_block *ip_block) /* vcn_v4_0_3_set_clockgating_state - set VCN block clockgating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: clock gating state * * Set VCN block clockgating state */ -static int vcn_v4_0_3_set_clockgating_state(void *handle, +static int vcn_v4_0_3_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = state == AMD_CG_STATE_GATE; int i; @@ -1644,15 +1650,15 @@ static int vcn_v4_0_3_set_clockgating_state(void *handle, /** * vcn_v4_0_3_set_powergating_state - set VCN block powergating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: power gating state * * Set VCN block powergating state */ -static int vcn_v4_0_3_set_powergating_state(void *handle, +static int vcn_v4_0_3_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; /* for SRIOV, guest should not control VCN Power-gating @@ -1911,9 +1917,94 @@ static const struct amdgpu_ras_block_hw_ops vcn_v4_0_3_ras_hw_ops = { .reset_ras_error_count = vcn_v4_0_3_reset_ras_error_count, }; +static int vcn_v4_0_3_aca_bank_parser(struct aca_handle *handle, struct aca_bank *bank, + enum aca_smu_type type, void *data) +{ + struct aca_bank_info info; + u64 misc0; + int ret; + + ret = aca_bank_info_decode(bank, &info); + if (ret) + return ret; + + misc0 = bank->regs[ACA_REG_IDX_MISC0]; + switch (type) { + case ACA_SMU_TYPE_UE: + ret = aca_error_cache_log_bank_error(handle, &info, ACA_ERROR_TYPE_UE, + 1ULL); + break; + case ACA_SMU_TYPE_CE: + ret = aca_error_cache_log_bank_error(handle, &info, ACA_ERROR_TYPE_CE, + ACA_REG__MISC0__ERRCNT(misc0)); + break; + default: + return -EINVAL; + } + + return ret; +} + +/* reference to smu driver if header file */ +static int vcn_v4_0_3_err_codes[] = { + 14, 15, /* VCN */ +}; + +static bool vcn_v4_0_3_aca_bank_is_valid(struct aca_handle *handle, struct aca_bank *bank, + enum aca_smu_type type, void *data) +{ + u32 instlo; + + instlo = ACA_REG__IPID__INSTANCEIDLO(bank->regs[ACA_REG_IDX_IPID]); + instlo &= GENMASK(31, 1); + + if (instlo != mmSMNAID_AID0_MCA_SMU) + return false; + + if (aca_bank_check_error_codes(handle->adev, bank, + vcn_v4_0_3_err_codes, + ARRAY_SIZE(vcn_v4_0_3_err_codes))) + return false; + + return true; +} + +static const struct aca_bank_ops vcn_v4_0_3_aca_bank_ops = { + .aca_bank_parser = vcn_v4_0_3_aca_bank_parser, + .aca_bank_is_valid = vcn_v4_0_3_aca_bank_is_valid, +}; + +static const struct aca_info vcn_v4_0_3_aca_info = { + .hwip = ACA_HWIP_TYPE_SMU, + .mask = ACA_ERROR_UE_MASK, + .bank_ops = &vcn_v4_0_3_aca_bank_ops, +}; + +static int vcn_v4_0_3_ras_late_init(struct amdgpu_device *adev, struct ras_common_if *ras_block) +{ + int r; + + r = amdgpu_ras_block_late_init(adev, ras_block); + if (r) + return r; + + r = amdgpu_ras_bind_aca(adev, AMDGPU_RAS_BLOCK__VCN, + &vcn_v4_0_3_aca_info, NULL); + if (r) + goto late_fini; + + return 0; + +late_fini: + amdgpu_ras_block_late_fini(adev, ras_block); + + return r; +} + static struct amdgpu_vcn_ras vcn_v4_0_3_ras = { .ras_block = { .hw_ops = &vcn_v4_0_3_ras_hw_ops, + .ras_late_init = vcn_v4_0_3_ras_late_init, }, }; diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c index 71961fb3f7ff5f62ff250d31ff197ac33c3f9ae2..23d3c16c9d9f290b24e0a9c8bad3fd3d9640ad7e 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c @@ -95,7 +95,7 @@ static int amdgpu_ih_clientid_vcns[] = { static void vcn_v4_0_5_set_unified_ring_funcs(struct amdgpu_device *adev); static void vcn_v4_0_5_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v4_0_5_set_powergating_state(void *handle, +static int vcn_v4_0_5_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v4_0_5_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -309,7 +309,7 @@ static int vcn_v4_0_5_hw_fini(struct amdgpu_ip_block *ip_block) if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, i, regUVD_STATUS))) { - vcn_v4_0_5_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v4_0_5_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } } } @@ -370,7 +370,7 @@ static void vcn_v4_0_5_mc_resume(struct amdgpu_device *adev, int inst) uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -431,7 +431,7 @@ static void vcn_v4_0_5_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_i uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst_idx]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -1000,8 +1000,10 @@ static int vcn_v4_0_5_start(struct amdgpu_device *adev) uint32_t tmp; int i, j, k, r; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) @@ -1277,8 +1279,10 @@ static int vcn_v4_0_5_stop(struct amdgpu_device *adev) vcn_v4_0_5_enable_static_power_gating(adev, i); } - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -1492,14 +1496,15 @@ static int vcn_v4_0_5_wait_for_idle(struct amdgpu_ip_block *ip_block) /** * vcn_v4_0_5_set_clockgating_state - set VCN block clockgating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: clock gating state * * Set VCN block clockgating state */ -static int vcn_v4_0_5_set_clockgating_state(void *handle, enum amd_clockgating_state state) +static int vcn_v4_0_5_set_clockgating_state(struct amdgpu_ip_block *ip_block, + enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE) ? true : false; int i; @@ -1522,14 +1527,15 @@ static int vcn_v4_0_5_set_clockgating_state(void *handle, enum amd_clockgating_s /** * vcn_v4_0_5_set_powergating_state - set VCN block powergating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: power gating state * * Set VCN block powergating state */ -static int vcn_v4_0_5_set_powergating_state(void *handle, enum amd_powergating_state state) +static int vcn_v4_0_5_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (state == adev->vcn.cur_state) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c index bd3d2bbdc16bb6e6205fd7e65d90b5ebde3871ec..b6d78381ebfbc75b617bfe2e264a409a3efbef60 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c @@ -32,7 +32,7 @@ #include "vcn/vcn_5_0_0_offset.h" #include "vcn/vcn_5_0_0_sh_mask.h" -#include "ivsrcid/vcn/irqsrcs_vcn_4_0.h" +#include "ivsrcid/vcn/irqsrcs_vcn_5_0.h" #include "vcn_v5_0_0.h" #include @@ -78,7 +78,7 @@ static int amdgpu_ih_clientid_vcns[] = { static void vcn_v5_0_0_set_unified_ring_funcs(struct amdgpu_device *adev); static void vcn_v5_0_0_set_irq_funcs(struct amdgpu_device *adev); -static int vcn_v5_0_0_set_powergating_state(void *handle, +static int vcn_v5_0_0_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); static int vcn_v5_0_0_pause_dpg_mode(struct amdgpu_device *adev, int inst_idx, struct dpg_pause_state *new_state); @@ -105,6 +105,21 @@ static int vcn_v5_0_0_early_init(struct amdgpu_ip_block *ip_block) return amdgpu_vcn_early_init(adev); } +void vcn_v5_0_0_alloc_ip_dump(struct amdgpu_device *adev) +{ + uint32_t reg_count = ARRAY_SIZE(vcn_reg_list_5_0); + uint32_t *ptr; + + /* Allocate memory for VCN IP Dump buffer */ + ptr = kcalloc(adev->vcn.num_vcn_inst * reg_count, sizeof(uint32_t), GFP_KERNEL); + if (!ptr) { + DRM_ERROR("Failed to allocate memory for VCN IP Dump\n"); + adev->vcn.ip_dump = NULL; + } else { + adev->vcn.ip_dump = ptr; + } +} + /** * vcn_v5_0_0_sw_init - sw init for VCN block * @@ -117,8 +132,6 @@ static int vcn_v5_0_0_sw_init(struct amdgpu_ip_block *ip_block) struct amdgpu_ring *ring; struct amdgpu_device *adev = ip_block->adev; int i, r; - uint32_t reg_count = ARRAY_SIZE(vcn_reg_list_5_0); - uint32_t *ptr; r = amdgpu_vcn_sw_init(adev); if (r) @@ -140,13 +153,13 @@ static int vcn_v5_0_0_sw_init(struct amdgpu_ip_block *ip_block) /* VCN UNIFIED TRAP */ r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[i], - VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst[i].irq); + VCN_5_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst[i].irq); if (r) return r; /* VCN POISON TRAP */ r = amdgpu_irq_add_id(adev, amdgpu_ih_clientid_vcns[i], - VCN_4_0__SRCID_UVD_POISON, &adev->vcn.inst[i].irq); + VCN_5_0__SRCID_UVD_POISON, &adev->vcn.inst[i].irq); if (r) return r; @@ -177,14 +190,7 @@ static int vcn_v5_0_0_sw_init(struct amdgpu_ip_block *ip_block) if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) adev->vcn.pause_dpg_mode = vcn_v5_0_0_pause_dpg_mode; - /* Allocate memory for VCN IP Dump buffer */ - ptr = kcalloc(adev->vcn.num_vcn_inst * reg_count, sizeof(uint32_t), GFP_KERNEL); - if (!ptr) { - DRM_ERROR("Failed to allocate memory for VCN IP Dump\n"); - adev->vcn.ip_dump = NULL; - } else { - adev->vcn.ip_dump = ptr; - } + vcn_v5_0_0_alloc_ip_dump(adev); r = amdgpu_vcn_sysfs_reset_mask_init(adev); if (r) @@ -283,7 +289,7 @@ static int vcn_v5_0_0_hw_fini(struct amdgpu_ip_block *ip_block) if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, i, regUVD_STATUS))) { - vcn_v5_0_0_set_powergating_state(adev, AMD_PG_STATE_GATE); + vcn_v5_0_0_set_powergating_state(ip_block, AMD_PG_STATE_GATE); } } } @@ -344,7 +350,7 @@ static void vcn_v5_0_0_mc_resume(struct amdgpu_device *adev, int inst) uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -405,7 +411,7 @@ static void vcn_v5_0_0_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_i uint32_t offset, size; const struct common_firmware_header *hdr; - hdr = (const struct common_firmware_header *)adev->vcn.fw[inst_idx]->data; + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); /* cache window 0: fw */ @@ -771,8 +777,10 @@ static int vcn_v5_0_0_start(struct amdgpu_device *adev) uint32_t tmp; int i, j, k, r; - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, true); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, true, i); + } for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { if (adev->vcn.harvest_config & (1 << i)) @@ -1018,8 +1026,10 @@ static int vcn_v5_0_0_stop(struct amdgpu_device *adev) vcn_v5_0_0_enable_static_power_gating(adev, i); } - if (adev->pm.dpm_enabled) - amdgpu_dpm_enable_uvd(adev, false); + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_vcn(adev, false, i); + } return 0; } @@ -1229,14 +1239,15 @@ static int vcn_v5_0_0_wait_for_idle(struct amdgpu_ip_block *ip_block) /** * vcn_v5_0_0_set_clockgating_state - set VCN block clockgating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: clock gating state * * Set VCN block clockgating state */ -static int vcn_v5_0_0_set_clockgating_state(void *handle, enum amd_clockgating_state state) +static int vcn_v5_0_0_set_clockgating_state(struct amdgpu_ip_block *ip_block, + enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; bool enable = (state == AMD_CG_STATE_GATE) ? true : false; int i; @@ -1259,14 +1270,15 @@ static int vcn_v5_0_0_set_clockgating_state(void *handle, enum amd_clockgating_s /** * vcn_v5_0_0_set_powergating_state - set VCN block powergating state * - * @handle: amdgpu_device pointer + * @ip_block: amdgpu_ip_block pointer * @state: power gating state * * Set VCN block powergating state */ -static int vcn_v5_0_0_set_powergating_state(void *handle, enum amd_powergating_state state) +static int vcn_v5_0_0_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; int ret; if (state == adev->vcn.cur_state) @@ -1312,10 +1324,10 @@ static int vcn_v5_0_0_process_interrupt(struct amdgpu_device *adev, struct amdgp DRM_DEBUG("IH: VCN TRAP\n"); switch (entry->src_id) { - case VCN_4_0__SRCID__UVD_ENC_GENERAL_PURPOSE: + case VCN_5_0__SRCID__UVD_ENC_GENERAL_PURPOSE: amdgpu_fence_process(&adev->vcn.inst[ip_instance].ring_enc[0]); break; - case VCN_4_0__SRCID_UVD_POISON: + case VCN_5_0__SRCID_UVD_POISON: amdgpu_vcn_process_poison_irq(adev, source, entry); break; default: @@ -1351,7 +1363,8 @@ static void vcn_v5_0_0_set_irq_funcs(struct amdgpu_device *adev) } } -static void vcn_v5_0_print_ip_state(struct amdgpu_ip_block *ip_block, struct drm_printer *p) +void vcn_v5_0_0_print_ip_state(struct amdgpu_ip_block *ip_block, + struct drm_printer *p) { struct amdgpu_device *adev = ip_block->adev; int i, j; @@ -1383,7 +1396,7 @@ static void vcn_v5_0_print_ip_state(struct amdgpu_ip_block *ip_block, struct drm } } -static void vcn_v5_0_dump_ip_state(struct amdgpu_ip_block *ip_block) +void vcn_v5_0_0_dump_ip_state(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; int i, j; @@ -1424,8 +1437,8 @@ static const struct amd_ip_funcs vcn_v5_0_0_ip_funcs = { .wait_for_idle = vcn_v5_0_0_wait_for_idle, .set_clockgating_state = vcn_v5_0_0_set_clockgating_state, .set_powergating_state = vcn_v5_0_0_set_powergating_state, - .dump_ip_state = vcn_v5_0_dump_ip_state, - .print_ip_state = vcn_v5_0_print_ip_state, + .dump_ip_state = vcn_v5_0_0_dump_ip_state, + .print_ip_state = vcn_v5_0_0_print_ip_state, }; const struct amdgpu_ip_block_version vcn_v5_0_0_ip_block = { diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.h b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.h index 51bbccd4360ffa1f665f99a0481d877cbf615a2e..b8927652bc50ca4874053b776bcc5a2e2f6830de 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.h +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.h @@ -32,6 +32,11 @@ #define VCN_VID_IP_ADDRESS 0x0 #define VCN_AON_IP_ADDRESS 0x30000 +void vcn_v5_0_0_alloc_ip_dump(struct amdgpu_device *adev); +void vcn_v5_0_0_print_ip_state(struct amdgpu_ip_block *ip_block, + struct drm_printer *p); +void vcn_v5_0_0_dump_ip_state(struct amdgpu_ip_block *ip_block); + extern const struct amdgpu_ip_block_version vcn_v5_0_0_ip_block; #endif /* __VCN_V5_0_0_H__ */ diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c new file mode 100644 index 0000000000000000000000000000000000000000..8b463c977d08f2e42721d9a16857f8b194dd5a5f --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c @@ -0,0 +1,1118 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#include +#include "amdgpu.h" +#include "amdgpu_vcn.h" +#include "amdgpu_pm.h" +#include "soc15.h" +#include "soc15d.h" +#include "soc15_hw_ip.h" +#include "vcn_v2_0.h" + +#include "vcn/vcn_5_0_0_offset.h" +#include "vcn/vcn_5_0_0_sh_mask.h" +#include "ivsrcid/vcn/irqsrcs_vcn_5_0.h" +#include "vcn_v5_0_0.h" +#include "vcn_v5_0_1.h" + +#include + +static void vcn_v5_0_1_set_unified_ring_funcs(struct amdgpu_device *adev); +static void vcn_v5_0_1_set_irq_funcs(struct amdgpu_device *adev); +static int vcn_v5_0_1_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state); +static void vcn_v5_0_1_unified_ring_set_wptr(struct amdgpu_ring *ring); + +/** + * vcn_v5_0_1_early_init - set function pointers and load microcode + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Set ring and irq function pointers + * Load microcode from filesystem + */ +static int vcn_v5_0_1_early_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + + /* re-use enc ring as unified ring */ + adev->vcn.num_enc_rings = 1; + + vcn_v5_0_1_set_unified_ring_funcs(adev); + vcn_v5_0_1_set_irq_funcs(adev); + + return amdgpu_vcn_early_init(adev); +} + +/** + * vcn_v5_0_1_sw_init - sw init for VCN block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Load firmware and sw initialization + */ +static int vcn_v5_0_1_sw_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + struct amdgpu_ring *ring; + int i, r, vcn_inst; + + r = amdgpu_vcn_sw_init(adev); + if (r) + return r; + + amdgpu_vcn_setup_ucode(adev); + + r = amdgpu_vcn_resume(adev); + if (r) + return r; + + /* VCN UNIFIED TRAP */ + r = amdgpu_irq_add_id(adev, SOC15_IH_CLIENTID_VCN, + VCN_5_0__SRCID__UVD_ENC_GENERAL_PURPOSE, &adev->vcn.inst->irq); + if (r) + return r; + + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + volatile struct amdgpu_vcn5_fw_shared *fw_shared; + + vcn_inst = GET_INST(VCN, i); + + ring = &adev->vcn.inst[i].ring_enc[0]; + ring->use_doorbell = true; + ring->doorbell_index = (adev->doorbell_index.vcn.vcn_ring0_1 << 1) + 9 * vcn_inst; + + ring->vm_hub = AMDGPU_MMHUB0(adev->vcn.inst[i].aid_id); + sprintf(ring->name, "vcn_unified_%d", adev->vcn.inst[i].aid_id); + + r = amdgpu_ring_init(adev, ring, 512, &adev->vcn.inst[i].irq, 0, + AMDGPU_RING_PRIO_DEFAULT, &adev->vcn.inst[i].sched_score); + if (r) + return r; + + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE); + fw_shared->sq.is_enabled = true; + + if (amdgpu_vcnfw_log) + amdgpu_vcn_fwlog_init(&adev->vcn.inst[i]); + } + + /* TODO: Add queue reset mask when FW fully supports it */ + adev->vcn.supported_reset = + amdgpu_get_soft_full_reset_mask(&adev->vcn.inst[0].ring_enc[0]); + + vcn_v5_0_0_alloc_ip_dump(adev); + + return amdgpu_vcn_sysfs_reset_mask_init(adev); +} + +/** + * vcn_v5_0_1_sw_fini - sw fini for VCN block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * VCN suspend and free up sw allocation + */ +static int vcn_v5_0_1_sw_fini(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int i, r, idx; + + if (drm_dev_enter(adev_to_drm(adev), &idx)) { + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + volatile struct amdgpu_vcn4_fw_shared *fw_shared; + + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + fw_shared->present_flag_0 = 0; + fw_shared->sq.is_enabled = 0; + } + + drm_dev_exit(idx); + } + + r = amdgpu_vcn_suspend(adev); + if (r) + return r; + + r = amdgpu_vcn_sw_fini(adev); + + amdgpu_vcn_sysfs_reset_mask_fini(adev); + + kfree(adev->vcn.ip_dump); + + return r; +} + +/** + * vcn_v5_0_1_hw_init - start and test VCN block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Initialize the hardware, boot up the VCPU and do some testing + */ +static int vcn_v5_0_1_hw_init(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + struct amdgpu_ring *ring; + int i, r, vcn_inst; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + vcn_inst = GET_INST(VCN, i); + ring = &adev->vcn.inst[i].ring_enc[0]; + + if (ring->use_doorbell) + adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell, + ((adev->doorbell_index.vcn.vcn_ring0_1 << 1) + + 9 * vcn_inst), + adev->vcn.inst[i].aid_id); + + r = amdgpu_ring_test_helper(ring); + if (r) + return r; + } + + return 0; +} + +/** + * vcn_v5_0_1_hw_fini - stop the hardware block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Stop the VCN block, mark ring as not ready any more + */ +static int vcn_v5_0_1_hw_fini(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + + cancel_delayed_work_sync(&adev->vcn.idle_work); + + return 0; +} + +/** + * vcn_v5_0_1_suspend - suspend VCN block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * HW fini and suspend VCN block + */ +static int vcn_v5_0_1_suspend(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int r; + + r = vcn_v5_0_1_hw_fini(ip_block); + if (r) + return r; + + r = amdgpu_vcn_suspend(adev); + + return r; +} + +/** + * vcn_v5_0_1_resume - resume VCN block + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Resume firmware and hw init VCN block + */ +static int vcn_v5_0_1_resume(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int r; + + r = amdgpu_vcn_resume(adev); + if (r) + return r; + + r = vcn_v5_0_1_hw_init(ip_block); + + return r; +} + +/** + * vcn_v5_0_1_mc_resume - memory controller programming + * + * @adev: amdgpu_device pointer + * @inst: instance number + * + * Let the VCN memory controller know it's offsets + */ +static void vcn_v5_0_1_mc_resume(struct amdgpu_device *adev, int inst) +{ + uint32_t offset, size, vcn_inst; + const struct common_firmware_header *hdr; + + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst].fw->data; + size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); + + vcn_inst = GET_INST(VCN, inst); + /* cache window 0: fw */ + if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW, + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCN + inst].tmr_mc_addr_lo)); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE_64BIT_BAR_HIGH, + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCN + inst].tmr_mc_addr_hi)); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_OFFSET0, 0); + offset = 0; + } else { + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW, + lower_32_bits(adev->vcn.inst[inst].gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE_64BIT_BAR_HIGH, + upper_32_bits(adev->vcn.inst[inst].gpu_addr)); + offset = size; + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_OFFSET0, + AMDGPU_UVD_FIRMWARE_OFFSET >> 3); + } + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_SIZE0, size); + + /* cache window 1: stack */ + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_LOW, + lower_32_bits(adev->vcn.inst[inst].gpu_addr + offset)); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_HIGH, + upper_32_bits(adev->vcn.inst[inst].gpu_addr + offset)); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_OFFSET1, 0); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_SIZE1, AMDGPU_VCN_STACK_SIZE); + + /* cache window 2: context */ + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE2_64BIT_BAR_LOW, + lower_32_bits(adev->vcn.inst[inst].gpu_addr + offset + AMDGPU_VCN_STACK_SIZE)); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_CACHE2_64BIT_BAR_HIGH, + upper_32_bits(adev->vcn.inst[inst].gpu_addr + offset + AMDGPU_VCN_STACK_SIZE)); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_OFFSET2, 0); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_CACHE_SIZE2, AMDGPU_VCN_CONTEXT_SIZE); + + /* non-cache window */ + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_NC0_64BIT_BAR_LOW, + lower_32_bits(adev->vcn.inst[inst].fw_shared.gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_VCPU_NC0_64BIT_BAR_HIGH, + upper_32_bits(adev->vcn.inst[inst].fw_shared.gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_NONCACHE_OFFSET0, 0); + WREG32_SOC15(VCN, vcn_inst, regUVD_VCPU_NONCACHE_SIZE0, + AMDGPU_GPU_PAGE_ALIGN(sizeof(struct amdgpu_vcn4_fw_shared))); +} + +/** + * vcn_v5_0_1_mc_resume_dpg_mode - memory controller programming for dpg mode + * + * @adev: amdgpu_device pointer + * @inst_idx: instance number index + * @indirect: indirectly write sram + * + * Let the VCN memory controller know it's offsets with dpg mode + */ +static void vcn_v5_0_1_mc_resume_dpg_mode(struct amdgpu_device *adev, int inst_idx, bool indirect) +{ + uint32_t offset, size; + const struct common_firmware_header *hdr; + + hdr = (const struct common_firmware_header *)adev->vcn.inst[inst_idx].fw->data; + size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); + + /* cache window 0: fw */ + if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) { + if (!indirect) { + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW), + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCN + + inst_idx].tmr_mc_addr_lo), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_HIGH), + (adev->firmware.ucode[AMDGPU_UCODE_ID_VCN + + inst_idx].tmr_mc_addr_hi), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET0), 0, 0, indirect); + } else { + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_HIGH), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET0), 0, 0, indirect); + } + offset = 0; + } else { + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_LOW), + lower_32_bits(adev->vcn.inst[inst_idx].gpu_addr), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE_64BIT_BAR_HIGH), + upper_32_bits(adev->vcn.inst[inst_idx].gpu_addr), 0, indirect); + offset = size; + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET0), + AMDGPU_UVD_FIRMWARE_OFFSET >> 3, 0, indirect); + } + + if (!indirect) + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_SIZE0), size, 0, indirect); + else + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_SIZE0), 0, 0, indirect); + + /* cache window 1: stack */ + if (!indirect) { + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_LOW), + lower_32_bits(adev->vcn.inst[inst_idx].gpu_addr + offset), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_HIGH), + upper_32_bits(adev->vcn.inst[inst_idx].gpu_addr + offset), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET1), 0, 0, indirect); + } else { + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_LOW), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE1_64BIT_BAR_HIGH), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET1), 0, 0, indirect); + } + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_SIZE1), AMDGPU_VCN_STACK_SIZE, 0, indirect); + + /* cache window 2: context */ + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE2_64BIT_BAR_LOW), + lower_32_bits(adev->vcn.inst[inst_idx].gpu_addr + offset + + AMDGPU_VCN_STACK_SIZE), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_CACHE2_64BIT_BAR_HIGH), + upper_32_bits(adev->vcn.inst[inst_idx].gpu_addr + offset + + AMDGPU_VCN_STACK_SIZE), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_OFFSET2), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CACHE_SIZE2), AMDGPU_VCN_CONTEXT_SIZE, 0, indirect); + + /* non-cache window */ + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_NC0_64BIT_BAR_LOW), + lower_32_bits(adev->vcn.inst[inst_idx].fw_shared.gpu_addr), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_VCPU_NC0_64BIT_BAR_HIGH), + upper_32_bits(adev->vcn.inst[inst_idx].fw_shared.gpu_addr), 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_NONCACHE_OFFSET0), 0, 0, indirect); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_NONCACHE_SIZE0), + AMDGPU_GPU_PAGE_ALIGN(sizeof(struct amdgpu_vcn4_fw_shared)), 0, indirect); + + /* VCN global tiling registers */ + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_GFX10_ADDR_CONFIG), adev->gfx.config.gb_addr_config, 0, indirect); +} + +/** + * vcn_v5_0_1_disable_clock_gating - disable VCN clock gating + * + * @adev: amdgpu_device pointer + * @inst: instance number + * + * Disable clock gating for VCN block + */ +static void vcn_v5_0_1_disable_clock_gating(struct amdgpu_device *adev, int inst) +{ +} + +/** + * vcn_v5_0_1_enable_clock_gating - enable VCN clock gating + * + * @adev: amdgpu_device pointer + * @inst: instance number + * + * Enable clock gating for VCN block + */ +static void vcn_v5_0_1_enable_clock_gating(struct amdgpu_device *adev, int inst) +{ +} + +/** + * vcn_v5_0_1_start_dpg_mode - VCN start with dpg mode + * + * @adev: amdgpu_device pointer + * @inst_idx: instance number index + * @indirect: indirectly write sram + * + * Start VCN block with dpg mode + */ +static int vcn_v5_0_1_start_dpg_mode(struct amdgpu_device *adev, int inst_idx, bool indirect) +{ + volatile struct amdgpu_vcn4_fw_shared *fw_shared = + adev->vcn.inst[inst_idx].fw_shared.cpu_addr; + struct amdgpu_ring *ring; + int vcn_inst; + uint32_t tmp; + + vcn_inst = GET_INST(VCN, inst_idx); + + /* disable register anti-hang mechanism */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_POWER_STATUS), 1, + ~UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); + + /* enable dynamic power gating mode */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_POWER_STATUS); + tmp |= UVD_POWER_STATUS__UVD_PG_MODE_MASK; + WREG32_SOC15(VCN, vcn_inst, regUVD_POWER_STATUS, tmp); + + if (indirect) { + adev->vcn.inst[inst_idx].dpg_sram_curr_addr = + (uint32_t *)adev->vcn.inst[inst_idx].dpg_sram_cpu_addr; + /* Use dummy register 0xDEADBEEF passing AID selection to PSP FW */ + WREG32_SOC24_DPG_MODE(inst_idx, 0xDEADBEEF, + adev->vcn.inst[inst_idx].aid_id, 0, true); + } + + /* enable VCPU clock */ + tmp = (0xFF << UVD_VCPU_CNTL__PRB_TIMEOUT_VAL__SHIFT); + tmp |= UVD_VCPU_CNTL__CLK_EN_MASK | UVD_VCPU_CNTL__BLK_RST_MASK; + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CNTL), tmp, 0, indirect); + + /* disable master interrupt */ + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_MASTINT_EN), 0, 0, indirect); + + /* setup regUVD_LMI_CTRL */ + tmp = (UVD_LMI_CTRL__WRITE_CLEAN_TIMER_EN_MASK | + UVD_LMI_CTRL__REQ_MODE_MASK | + UVD_LMI_CTRL__CRC_RESET_MASK | + UVD_LMI_CTRL__MASK_MC_URGENT_MASK | + UVD_LMI_CTRL__DATA_COHERENCY_EN_MASK | + UVD_LMI_CTRL__VCPU_DATA_COHERENCY_EN_MASK | + (8 << UVD_LMI_CTRL__WRITE_CLEAN_TIMER__SHIFT) | + 0x00100000L); + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_CTRL), tmp, 0, indirect); + + vcn_v5_0_1_mc_resume_dpg_mode(adev, inst_idx, indirect); + + tmp = (0xFF << UVD_VCPU_CNTL__PRB_TIMEOUT_VAL__SHIFT); + tmp |= UVD_VCPU_CNTL__CLK_EN_MASK; + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_VCPU_CNTL), tmp, 0, indirect); + + /* enable LMI MC and UMC channels */ + tmp = 0x1f << UVD_LMI_CTRL2__RE_OFLD_MIF_WR_REQ_NUM__SHIFT; + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_LMI_CTRL2), tmp, 0, indirect); + + /* enable master interrupt */ + WREG32_SOC24_DPG_MODE(inst_idx, SOC24_DPG_MODE_OFFSET( + VCN, 0, regUVD_MASTINT_EN), + UVD_MASTINT_EN__VCPU_EN_MASK, 0, indirect); + + if (indirect) + amdgpu_vcn_psp_update_sram(adev, inst_idx, AMDGPU_UCODE_ID_VCN0_RAM); + + ring = &adev->vcn.inst[inst_idx].ring_enc[0]; + + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_BASE_LO, lower_32_bits(ring->gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_BASE_HI, upper_32_bits(ring->gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_SIZE, ring->ring_size / sizeof(uint32_t)); + + tmp = RREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE); + tmp &= ~(VCN_RB_ENABLE__RB1_EN_MASK); + WREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE, tmp); + fw_shared->sq.queue_mode |= FW_QUEUE_RING_RESET; + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_RPTR, 0); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR, 0); + + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_RB_RPTR); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR, tmp); + ring->wptr = RREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR); + + tmp = RREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE); + tmp |= VCN_RB_ENABLE__RB1_EN_MASK; + WREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE, tmp); + fw_shared->sq.queue_mode &= ~(FW_QUEUE_RING_RESET | FW_QUEUE_DPG_HOLD_OFF); + + WREG32_SOC15(VCN, vcn_inst, regVCN_RB1_DB_CTRL, + ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT | + VCN_RB1_DB_CTRL__EN_MASK); + /* Read DB_CTRL to flush the write DB_CTRL command. */ + RREG32_SOC15(VCN, vcn_inst, regVCN_RB1_DB_CTRL); + + return 0; +} + +/** + * vcn_v5_0_1_start - VCN start + * + * @adev: amdgpu_device pointer + * + * Start VCN block + */ +static int vcn_v5_0_1_start(struct amdgpu_device *adev) +{ + volatile struct amdgpu_vcn4_fw_shared *fw_shared; + struct amdgpu_ring *ring; + uint32_t tmp; + int i, j, k, r, vcn_inst; + + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, true); + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + r = vcn_v5_0_1_start_dpg_mode(adev, i, adev->vcn.indirect_sram); + continue; + } + + vcn_inst = GET_INST(VCN, i); + + /* set VCN status busy */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_STATUS) | UVD_STATUS__UVD_BUSY; + WREG32_SOC15(VCN, vcn_inst, regUVD_STATUS, tmp); + + /* enable VCPU clock */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), + UVD_VCPU_CNTL__CLK_EN_MASK, ~UVD_VCPU_CNTL__CLK_EN_MASK); + + /* disable master interrupt */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_MASTINT_EN), 0, + ~UVD_MASTINT_EN__VCPU_EN_MASK); + + /* enable LMI MC and UMC channels */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_LMI_CTRL2), 0, + ~UVD_LMI_CTRL2__STALL_ARB_UMC_MASK); + + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET); + tmp &= ~UVD_SOFT_RESET__LMI_SOFT_RESET_MASK; + tmp &= ~UVD_SOFT_RESET__LMI_UMC_SOFT_RESET_MASK; + WREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET, tmp); + + /* setup regUVD_LMI_CTRL */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_LMI_CTRL); + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_CTRL, tmp | + UVD_LMI_CTRL__WRITE_CLEAN_TIMER_EN_MASK | + UVD_LMI_CTRL__MASK_MC_URGENT_MASK | + UVD_LMI_CTRL__DATA_COHERENCY_EN_MASK | + UVD_LMI_CTRL__VCPU_DATA_COHERENCY_EN_MASK); + + vcn_v5_0_1_mc_resume(adev, i); + + /* VCN global tiling registers */ + WREG32_SOC15(VCN, vcn_inst, regUVD_GFX10_ADDR_CONFIG, + adev->gfx.config.gb_addr_config); + + /* unblock VCPU register access */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_RB_ARB_CTRL), 0, + ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK); + + /* release VCPU reset to boot */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), 0, + ~UVD_VCPU_CNTL__BLK_RST_MASK); + + for (j = 0; j < 10; ++j) { + uint32_t status; + + for (k = 0; k < 100; ++k) { + status = RREG32_SOC15(VCN, vcn_inst, regUVD_STATUS); + if (status & 2) + break; + mdelay(100); + if (amdgpu_emu_mode == 1) + msleep(20); + } + + if (amdgpu_emu_mode == 1) { + r = -1; + if (status & 2) { + r = 0; + break; + } + } else { + r = 0; + if (status & 2) + break; + + dev_err(adev->dev, + "VCN[%d] is not responding, trying to reset the VCPU!!!\n", i); + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), + UVD_VCPU_CNTL__BLK_RST_MASK, + ~UVD_VCPU_CNTL__BLK_RST_MASK); + mdelay(10); + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), 0, + ~UVD_VCPU_CNTL__BLK_RST_MASK); + + mdelay(10); + r = -1; + } + } + + if (r) { + dev_err(adev->dev, "VCN[%d] is not responding, giving up!!!\n", i); + return r; + } + + /* enable master interrupt */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_MASTINT_EN), + UVD_MASTINT_EN__VCPU_EN_MASK, + ~UVD_MASTINT_EN__VCPU_EN_MASK); + + /* clear the busy bit of VCN_STATUS */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_STATUS), 0, + ~(2 << UVD_STATUS__VCPU_REPORT__SHIFT)); + + ring = &adev->vcn.inst[i].ring_enc[0]; + + WREG32_SOC15(VCN, vcn_inst, regVCN_RB1_DB_CTRL, + ring->doorbell_index << VCN_RB1_DB_CTRL__OFFSET__SHIFT | + VCN_RB1_DB_CTRL__EN_MASK); + + /* Read DB_CTRL to flush the write DB_CTRL command. */ + RREG32_SOC15(VCN, vcn_inst, regVCN_RB1_DB_CTRL); + + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_BASE_LO, ring->gpu_addr); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_BASE_HI, upper_32_bits(ring->gpu_addr)); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_SIZE, ring->ring_size / 4); + + tmp = RREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE); + tmp &= ~(VCN_RB_ENABLE__RB1_EN_MASK); + WREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE, tmp); + fw_shared->sq.queue_mode |= FW_QUEUE_RING_RESET; + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_RPTR, 0); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR, 0); + + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_RB_RPTR); + WREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR, tmp); + ring->wptr = RREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR); + + tmp = RREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE); + tmp |= VCN_RB_ENABLE__RB1_EN_MASK; + WREG32_SOC15(VCN, vcn_inst, regVCN_RB_ENABLE, tmp); + fw_shared->sq.queue_mode &= ~(FW_QUEUE_RING_RESET | FW_QUEUE_DPG_HOLD_OFF); + } + + return 0; +} + +/** + * vcn_v5_0_1_stop_dpg_mode - VCN stop with dpg mode + * + * @adev: amdgpu_device pointer + * @inst_idx: instance number index + * + * Stop VCN block with dpg mode + */ +static void vcn_v5_0_1_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx) +{ + uint32_t tmp; + int vcn_inst; + + vcn_inst = GET_INST(VCN, inst_idx); + + /* Wait for power status to be 1 */ + SOC15_WAIT_ON_RREG(VCN, vcn_inst, regUVD_POWER_STATUS, 1, + UVD_POWER_STATUS__UVD_POWER_STATUS_MASK); + + /* wait for read ptr to be equal to write ptr */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_RB_WPTR); + SOC15_WAIT_ON_RREG(VCN, vcn_inst, regUVD_RB_RPTR, tmp, 0xFFFFFFFF); + + /* disable dynamic power gating mode */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_POWER_STATUS), 0, + ~UVD_POWER_STATUS__UVD_PG_MODE_MASK); +} + +/** + * vcn_v5_0_1_stop - VCN stop + * + * @adev: amdgpu_device pointer + * + * Stop VCN block + */ +static int vcn_v5_0_1_stop(struct amdgpu_device *adev) +{ + volatile struct amdgpu_vcn4_fw_shared *fw_shared; + uint32_t tmp; + int i, r = 0, vcn_inst; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + vcn_inst = GET_INST(VCN, i); + + fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr; + fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF; + + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + vcn_v5_0_1_stop_dpg_mode(adev, i); + continue; + } + + /* wait for vcn idle */ + r = SOC15_WAIT_ON_RREG(VCN, vcn_inst, regUVD_STATUS, UVD_STATUS__IDLE, 0x7); + if (r) + return r; + + tmp = UVD_LMI_STATUS__VCPU_LMI_WRITE_CLEAN_MASK | + UVD_LMI_STATUS__READ_CLEAN_MASK | + UVD_LMI_STATUS__WRITE_CLEAN_MASK | + UVD_LMI_STATUS__WRITE_CLEAN_RAW_MASK; + r = SOC15_WAIT_ON_RREG(VCN, vcn_inst, regUVD_LMI_STATUS, tmp, tmp); + if (r) + return r; + + /* disable LMI UMC channel */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_LMI_CTRL2); + tmp |= UVD_LMI_CTRL2__STALL_ARB_UMC_MASK; + WREG32_SOC15(VCN, vcn_inst, regUVD_LMI_CTRL2, tmp); + tmp = UVD_LMI_STATUS__UMC_READ_CLEAN_RAW_MASK | + UVD_LMI_STATUS__UMC_WRITE_CLEAN_RAW_MASK; + r = SOC15_WAIT_ON_RREG(VCN, vcn_inst, regUVD_LMI_STATUS, tmp, tmp); + if (r) + return r; + + /* block VCPU register access */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_RB_ARB_CTRL), + UVD_RB_ARB_CTRL__VCPU_DIS_MASK, + ~UVD_RB_ARB_CTRL__VCPU_DIS_MASK); + + /* reset VCPU */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), + UVD_VCPU_CNTL__BLK_RST_MASK, + ~UVD_VCPU_CNTL__BLK_RST_MASK); + + /* disable VCPU clock */ + WREG32_P(SOC15_REG_OFFSET(VCN, vcn_inst, regUVD_VCPU_CNTL), 0, + ~(UVD_VCPU_CNTL__CLK_EN_MASK)); + + /* apply soft reset */ + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET); + tmp |= UVD_SOFT_RESET__LMI_UMC_SOFT_RESET_MASK; + WREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET, tmp); + tmp = RREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET); + tmp |= UVD_SOFT_RESET__LMI_SOFT_RESET_MASK; + WREG32_SOC15(VCN, vcn_inst, regUVD_SOFT_RESET, tmp); + + /* clear status */ + WREG32_SOC15(VCN, vcn_inst, regUVD_STATUS, 0); + } + + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + + return 0; +} + +/** + * vcn_v5_0_1_unified_ring_get_rptr - get unified read pointer + * + * @ring: amdgpu_ring pointer + * + * Returns the current hardware unified read pointer + */ +static uint64_t vcn_v5_0_1_unified_ring_get_rptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + if (ring != &adev->vcn.inst[ring->me].ring_enc[0]) + DRM_ERROR("wrong ring id is identified in %s", __func__); + + return RREG32_SOC15(VCN, GET_INST(VCN, ring->me), regUVD_RB_RPTR); +} + +/** + * vcn_v5_0_1_unified_ring_get_wptr - get unified write pointer + * + * @ring: amdgpu_ring pointer + * + * Returns the current hardware unified write pointer + */ +static uint64_t vcn_v5_0_1_unified_ring_get_wptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + if (ring != &adev->vcn.inst[ring->me].ring_enc[0]) + DRM_ERROR("wrong ring id is identified in %s", __func__); + + if (ring->use_doorbell) + return *ring->wptr_cpu_addr; + else + return RREG32_SOC15(VCN, GET_INST(VCN, ring->me), regUVD_RB_WPTR); +} + +/** + * vcn_v5_0_1_unified_ring_set_wptr - set enc write pointer + * + * @ring: amdgpu_ring pointer + * + * Commits the enc write pointer to the hardware + */ +static void vcn_v5_0_1_unified_ring_set_wptr(struct amdgpu_ring *ring) +{ + struct amdgpu_device *adev = ring->adev; + + if (ring != &adev->vcn.inst[ring->me].ring_enc[0]) + DRM_ERROR("wrong ring id is identified in %s", __func__); + + if (ring->use_doorbell) { + *ring->wptr_cpu_addr = lower_32_bits(ring->wptr); + WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr)); + } else { + WREG32_SOC15(VCN, GET_INST(VCN, ring->me), regUVD_RB_WPTR, + lower_32_bits(ring->wptr)); + } +} + +static const struct amdgpu_ring_funcs vcn_v5_0_1_unified_ring_vm_funcs = { + .type = AMDGPU_RING_TYPE_VCN_ENC, + .align_mask = 0x3f, + .nop = VCN_ENC_CMD_NO_OP, + .get_rptr = vcn_v5_0_1_unified_ring_get_rptr, + .get_wptr = vcn_v5_0_1_unified_ring_get_wptr, + .set_wptr = vcn_v5_0_1_unified_ring_set_wptr, + .emit_frame_size = + SOC15_FLUSH_GPU_TLB_NUM_WREG * 3 + + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 4 + + 4 + /* vcn_v2_0_enc_ring_emit_vm_flush */ + 5 + 5 + /* vcn_v2_0_enc_ring_emit_fence x2 vm fence */ + 1, /* vcn_v2_0_enc_ring_insert_end */ + .emit_ib_size = 5, /* vcn_v2_0_enc_ring_emit_ib */ + .emit_ib = vcn_v2_0_enc_ring_emit_ib, + .emit_fence = vcn_v2_0_enc_ring_emit_fence, + .emit_vm_flush = vcn_v2_0_enc_ring_emit_vm_flush, + .test_ring = amdgpu_vcn_enc_ring_test_ring, + .test_ib = amdgpu_vcn_unified_ring_test_ib, + .insert_nop = amdgpu_ring_insert_nop, + .insert_end = vcn_v2_0_enc_ring_insert_end, + .pad_ib = amdgpu_ring_generic_pad_ib, + .begin_use = amdgpu_vcn_ring_begin_use, + .end_use = amdgpu_vcn_ring_end_use, + .emit_wreg = vcn_v2_0_enc_ring_emit_wreg, + .emit_reg_wait = vcn_v2_0_enc_ring_emit_reg_wait, + .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper, +}; + +/** + * vcn_v5_0_1_set_unified_ring_funcs - set unified ring functions + * + * @adev: amdgpu_device pointer + * + * Set unified ring functions + */ +static void vcn_v5_0_1_set_unified_ring_funcs(struct amdgpu_device *adev) +{ + int i, vcn_inst; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + adev->vcn.inst[i].ring_enc[0].funcs = &vcn_v5_0_1_unified_ring_vm_funcs; + adev->vcn.inst[i].ring_enc[0].me = i; + vcn_inst = GET_INST(VCN, i); + adev->vcn.inst[i].aid_id = vcn_inst / adev->vcn.num_inst_per_aid; + } +} + +/** + * vcn_v5_0_1_is_idle - check VCN block is idle + * + * @handle: amdgpu_device pointer + * + * Check whether VCN block is idle + */ +static bool vcn_v5_0_1_is_idle(void *handle) +{ + struct amdgpu_device *adev = (struct amdgpu_device *)handle; + int i, ret = 1; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) + ret &= (RREG32_SOC15(VCN, GET_INST(VCN, i), regUVD_STATUS) == UVD_STATUS__IDLE); + + return ret; +} + +/** + * vcn_v5_0_1_wait_for_idle - wait for VCN block idle + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * + * Wait for VCN block idle + */ +static int vcn_v5_0_1_wait_for_idle(struct amdgpu_ip_block *ip_block) +{ + struct amdgpu_device *adev = ip_block->adev; + int i, ret = 0; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + ret = SOC15_WAIT_ON_RREG(VCN, GET_INST(VCN, i), regUVD_STATUS, UVD_STATUS__IDLE, + UVD_STATUS__IDLE); + if (ret) + return ret; + } + + return ret; +} + +/** + * vcn_v5_0_1_set_clockgating_state - set VCN block clockgating state + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * @state: clock gating state + * + * Set VCN block clockgating state + */ +static int vcn_v5_0_1_set_clockgating_state(struct amdgpu_ip_block *ip_block, + enum amd_clockgating_state state) +{ + struct amdgpu_device *adev = ip_block->adev; + bool enable = state == AMD_CG_STATE_GATE; + int i; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) { + if (enable) { + if (RREG32_SOC15(VCN, GET_INST(VCN, i), regUVD_STATUS) != UVD_STATUS__IDLE) + return -EBUSY; + vcn_v5_0_1_enable_clock_gating(adev, i); + } else { + vcn_v5_0_1_disable_clock_gating(adev, i); + } + } + + return 0; +} + +/** + * vcn_v5_0_1_set_powergating_state - set VCN block powergating state + * + * @ip_block: Pointer to the amdgpu_ip_block for this hw instance. + * @state: power gating state + * + * Set VCN block powergating state + */ +static int vcn_v5_0_1_set_powergating_state(struct amdgpu_ip_block *ip_block, + enum amd_powergating_state state) +{ + struct amdgpu_device *adev = ip_block->adev; + int ret; + + if (state == adev->vcn.cur_state) + return 0; + + if (state == AMD_PG_STATE_GATE) + ret = vcn_v5_0_1_stop(adev); + else + ret = vcn_v5_0_1_start(adev); + + if (!ret) + adev->vcn.cur_state = state; + + return ret; +} + +/** + * vcn_v5_0_1_process_interrupt - process VCN block interrupt + * + * @adev: amdgpu_device pointer + * @source: interrupt sources + * @entry: interrupt entry from clients and sources + * + * Process VCN block interrupt + */ +static int vcn_v5_0_1_process_interrupt(struct amdgpu_device *adev, struct amdgpu_irq_src *source, + struct amdgpu_iv_entry *entry) +{ + uint32_t i, inst; + + i = node_id_to_phys_map[entry->node_id]; + + DRM_DEV_DEBUG(adev->dev, "IH: VCN TRAP\n"); + + for (inst = 0; inst < adev->vcn.num_vcn_inst; ++inst) + if (adev->vcn.inst[inst].aid_id == i) + break; + if (inst >= adev->vcn.num_vcn_inst) { + dev_WARN_ONCE(adev->dev, 1, + "Interrupt received for unknown VCN instance %d", + entry->node_id); + return 0; + } + + switch (entry->src_id) { + case VCN_5_0__SRCID__UVD_ENC_GENERAL_PURPOSE: + amdgpu_fence_process(&adev->vcn.inst[inst].ring_enc[0]); + break; + default: + DRM_DEV_ERROR(adev->dev, "Unhandled interrupt: %d %d\n", + entry->src_id, entry->src_data[0]); + break; + } + + return 0; +} + +static const struct amdgpu_irq_src_funcs vcn_v5_0_1_irq_funcs = { + .process = vcn_v5_0_1_process_interrupt, +}; + +/** + * vcn_v5_0_1_set_irq_funcs - set VCN block interrupt irq functions + * + * @adev: amdgpu_device pointer + * + * Set VCN block interrupt irq functions + */ +static void vcn_v5_0_1_set_irq_funcs(struct amdgpu_device *adev) +{ + int i; + + for (i = 0; i < adev->vcn.num_vcn_inst; ++i) + adev->vcn.inst->irq.num_types++; + adev->vcn.inst->irq.funcs = &vcn_v5_0_1_irq_funcs; +} + +static const struct amd_ip_funcs vcn_v5_0_1_ip_funcs = { + .name = "vcn_v5_0_1", + .early_init = vcn_v5_0_1_early_init, + .late_init = NULL, + .sw_init = vcn_v5_0_1_sw_init, + .sw_fini = vcn_v5_0_1_sw_fini, + .hw_init = vcn_v5_0_1_hw_init, + .hw_fini = vcn_v5_0_1_hw_fini, + .suspend = vcn_v5_0_1_suspend, + .resume = vcn_v5_0_1_resume, + .is_idle = vcn_v5_0_1_is_idle, + .wait_for_idle = vcn_v5_0_1_wait_for_idle, + .check_soft_reset = NULL, + .pre_soft_reset = NULL, + .soft_reset = NULL, + .post_soft_reset = NULL, + .set_clockgating_state = vcn_v5_0_1_set_clockgating_state, + .set_powergating_state = vcn_v5_0_1_set_powergating_state, + .dump_ip_state = vcn_v5_0_0_dump_ip_state, + .print_ip_state = vcn_v5_0_0_print_ip_state, +}; + +const struct amdgpu_ip_block_version vcn_v5_0_1_ip_block = { + .type = AMD_IP_BLOCK_TYPE_VCN, + .major = 5, + .minor = 0, + .rev = 1, + .funcs = &vcn_v5_0_1_ip_funcs, +}; diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.h b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.h new file mode 100644 index 0000000000000000000000000000000000000000..82ac709f44bfb2c4bbfd75648d2f7f3d18722e8a --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.h @@ -0,0 +1,29 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __VCN_v5_0_1_H__ +#define __VCN_v5_0_1_H__ + +extern const struct amdgpu_ip_block_version vcn_v5_0_1_ip_block; + +#endif /* __VCN_v5_0_1_H__ */ diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c index 0fedadd0a6a43e30034401377f850aaa24728072..98fc6941159e1925e04e08a57a2c6189fd7f099e 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c @@ -364,9 +364,8 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev, * this should allow us to catchup. */ tmp = (wptr + 32) & ih->ptr_mask; - dev_warn(adev->dev, "IH ring buffer overflow " - "(0x%08X, 0x%08X, 0x%08X)\n", - wptr, ih->rptr, tmp); + dev_warn_ratelimited(adev->dev, "%s ring buffer overflow (0x%08X, 0x%08X, 0x%08X)\n", + amdgpu_ih_ring_name(adev, ih), wptr, ih->rptr, tmp); ih->rptr = tmp; tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); @@ -605,10 +604,10 @@ static void vega10_ih_update_clockgating_state(struct amdgpu_device *adev, } } -static int vega10_ih_set_clockgating_state(void *handle, +static int vega10_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; vega10_ih_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); @@ -616,7 +615,7 @@ static int vega10_ih_set_clockgating_state(void *handle, } -static int vega10_ih_set_powergating_state(void *handle, +static int vega10_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c index 1c9aff742e4328fcfee8b677c621a59d62805f7b..e9e3b2ed4b7bfe6c62a2c88e09f671852cbade2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c @@ -366,6 +366,7 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev) /* Enable IH Retry CAM */ if (amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 0) || amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2) || + amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 4) || amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 5)) WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL_ALDEBARAN, ENABLE, 1); @@ -443,9 +444,8 @@ static u32 vega20_ih_get_wptr(struct amdgpu_device *adev, * this should allow us to catchup. */ tmp = (wptr + 32) & ih->ptr_mask; - dev_warn(adev->dev, "IH ring buffer overflow " - "(0x%08X, 0x%08X, 0x%08X)\n", - wptr, ih->rptr, tmp); + dev_warn_ratelimited(adev->dev, "%s ring buffer overflow (0x%08X, 0x%08X, 0x%08X)\n", + amdgpu_ih_ring_name(adev, ih), wptr, ih->rptr, tmp); ih->rptr = tmp; tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl); @@ -697,10 +697,10 @@ static void vega20_ih_update_clockgating_state(struct amdgpu_device *adev, } } -static int vega20_ih_set_clockgating_state(void *handle, +static int vega20_ih_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; vega20_ih_update_clockgating_state(adev, state == AMD_CG_STATE_GATE); @@ -708,7 +708,7 @@ static int vega20_ih_set_clockgating_state(void *handle, } -static int vega20_ih_set_powergating_state(void *handle, +static int vega20_ih_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c index a83505815d398cd39271e4ce27adaf9cc655c7f2..06615f1603317310d03c2a9115ebf503a01c1e6f 100644 --- a/drivers/gpu/drm/amd/amdgpu/vi.c +++ b/drivers/gpu/drm/amd/amdgpu/vi.c @@ -1945,10 +1945,10 @@ static int vi_common_set_clockgating_state_by_smu(void *handle, return 0; } -static int vi_common_set_clockgating_state(void *handle, +static int vi_common_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { - struct amdgpu_device *adev = (struct amdgpu_device *)handle; + struct amdgpu_device *adev = ip_block->adev; if (amdgpu_sriov_vf(adev)) return 0; @@ -1988,7 +1988,7 @@ static int vi_common_set_clockgating_state(void *handle, return 0; } -static int vi_common_set_powergating_state(void *handle, +static int vi_common_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 02f7ba8c93cd45080d0b1f059242f3c9033f4412..388b44ed5928c201d2de1b64dd7cb4d122efaa77 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -4122,3 +4122,494 @@ static const uint32_t cwsr_trap_gfx12_hex[] = { 0xbf9f0000, 0xbf9f0000, 0xbf9f0000, 0x00000000, }; + +static const uint32_t cwsr_trap_gfx9_5_0_hex[] = { + 0xbf820001, 0xbf8202d8, + 0xb8f8f802, 0x8978ff78, + 0x00020006, 0xb8fbf803, + 0x866eff78, 0x00002000, + 0xbf840008, 0xbf0d986d, + 0xbf85001f, 0x866eff7b, + 0x00000400, 0xbf850061, + 0xbf8e0010, 0xb8fbf803, + 0xbf82fffa, 0x866eff7b, + 0x03800900, 0xbf850015, + 0x866eff7b, 0x000071ff, + 0xbf840008, 0x866fff7b, + 0x00007080, 0xbf840001, + 0xbeee1a87, 0xb8eff801, + 0x8e6e8c6e, 0x866e6f6e, + 0xbf85000a, 0xbf0d986d, + 0xbf850003, 0x866eff6d, + 0x00ff0000, 0xbf850005, + 0xbf0d986d, 0xbf850004, + 0x866eff7b, 0x00000400, + 0xbf850046, 0xbeed1a9d, + 0xb8faf807, 0x867aff7a, + 0x001f8000, 0x8e7a8b7a, + 0x8979ff79, 0xfc000000, + 0x87797a79, 0xba7ff807, + 0x00000000, 0xb8faf812, + 0xb8fbf813, 0x8efa887a, + 0xbf0d8f7b, 0xbf840002, + 0x877bff7b, 0xffff0000, + 0xc0031cfd, 0x00000010, + 0xc0071bbd, 0x00000000, + 0xc0071ebd, 0x00000008, + 0xbf8cc07f, 0x8e739773, + 0x8979ff79, 0x01800000, + 0x87797379, 0xbf0d986d, + 0xbf840009, 0xbf0d9879, + 0xbf850007, 0x896dff6d, + 0x01ff0000, 0xba7f0583, + 0x00000000, 0xbf0d9d6d, + 0xbeed189d, 0xbf840012, + 0xbef91898, 0xbeed189d, + 0x86ee6e6e, 0xbf840001, + 0xbe801d6e, 0x866eff6d, + 0x01ff0000, 0xbf850005, + 0x8778ff78, 0x00002000, + 0x80ec886c, 0x82ed806d, + 0xbf820005, 0x866eff6d, + 0x01000000, 0xbf850002, + 0x806c846c, 0x826d806d, + 0x866dff6d, 0x0000ffff, + 0x8f7a8b79, 0x867aff7a, + 0x001f8000, 0xb97af807, + 0x86fe7e7e, 0x86ea6a6a, + 0x8f6e8378, 0xb96ee0c2, + 0xbf800002, 0xb9780002, + 0xbe801f6c, 0x866dff6d, + 0x0000ffff, 0xbefa0080, + 0xb97a0283, 0xb8faf807, + 0x867aff7a, 0x001f8000, + 0x8e7a8b7a, 0x8979ff79, + 0xfc000000, 0x87797a79, + 0xba7ff807, 0x00000000, + 0xbeee007e, 0xbeef007f, + 0xbefe0180, 0xbf900004, + 0x877a8478, 0xb97af802, + 0xbf8e0002, 0xbf88fffe, + 0xb8fa2985, 0x807a817a, + 0x8e7a8a7a, 0x8e7a817a, + 0xb8fb1605, 0x807b817b, + 0x8e7b867b, 0x807a7b7a, + 0x807a7e7a, 0x827b807f, + 0x867bff7b, 0x0000ffff, + 0xc04b1c3d, 0x00000050, + 0xbf8cc07f, 0xc04b1d3d, + 0x00000060, 0xbf8cc07f, + 0xc0431e7d, 0x00000074, + 0xbf8cc07f, 0xbef4007e, + 0x8675ff7f, 0x0000ffff, + 0x8775ff75, 0x00040000, + 0xbef60080, 0xbef700ff, + 0x00807fac, 0xbef1007c, + 0xbef00080, 0xb8f02985, + 0x80708170, 0x8e708a70, + 0x8e708170, 0xb8fa1605, + 0x807a817a, 0x8e7a867a, + 0x80707a70, 0xbef60084, + 0xbef600ff, 0x01000000, + 0xbefe007c, 0xbefc0070, + 0xc0611c7a, 0x0000007c, + 0xbf8cc07f, 0x80708470, + 0xbefc007e, 0xbefe007c, + 0xbefc0070, 0xc0611b3a, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0xbefe007c, 0xbefc0070, + 0xc0611b7a, 0x0000007c, + 0xbf8cc07f, 0x80708470, + 0xbefc007e, 0xbefe007c, + 0xbefc0070, 0xc0611bba, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0xbefe007c, 0xbefc0070, + 0xc0611bfa, 0x0000007c, + 0xbf8cc07f, 0x80708470, + 0xbefc007e, 0xbefe007c, + 0xbefc0070, 0xc0611e3a, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0xb8fbf803, 0xbefe007c, + 0xbefc0070, 0xc0611efa, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0xbefe007c, 0xbefc0070, + 0xc0611a3a, 0x0000007c, + 0xbf8cc07f, 0x80708470, + 0xbefc007e, 0xbefe007c, + 0xbefc0070, 0xc0611a7a, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0xb8f1f801, 0xbefe007c, + 0xbefc0070, 0xc0611c7a, + 0x0000007c, 0xbf8cc07f, + 0x80708470, 0xbefc007e, + 0x867aff7f, 0x04000000, + 0xbeef0080, 0x876f6f7a, + 0xb8f02985, 0x80708170, + 0x8e708a70, 0x8e708170, + 0xb8fb1605, 0x807b817b, + 0x8e7b847b, 0x8e76827b, + 0xbef600ff, 0x01000000, + 0xbef20174, 0x80747074, + 0x82758075, 0xbefc0080, + 0xbf800000, 0xbe802b00, + 0xbe822b02, 0xbe842b04, + 0xbe862b06, 0xbe882b08, + 0xbe8a2b0a, 0xbe8c2b0c, + 0xbe8e2b0e, 0xc06b003a, + 0x00000000, 0xbf8cc07f, + 0xc06b013a, 0x00000010, + 0xbf8cc07f, 0xc06b023a, + 0x00000020, 0xbf8cc07f, + 0xc06b033a, 0x00000030, + 0xbf8cc07f, 0x8074c074, + 0x82758075, 0x807c907c, + 0xbf0a7b7c, 0xbf85ffe7, + 0xbef40172, 0xbef00080, + 0xbefe00c1, 0xbeff00c1, + 0xbee80080, 0xbee90080, + 0xbef600ff, 0x01000000, + 0x867aff78, 0x00400000, + 0xbf850003, 0xb8faf803, + 0x897a7aff, 0x10000000, + 0xbf85004d, 0xbe840080, + 0xd2890000, 0x00000900, + 0x80048104, 0xd2890001, + 0x00000900, 0x80048104, + 0xd2890002, 0x00000900, + 0x80048104, 0xd2890003, + 0x00000900, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0xbe840080, 0xd2890000, + 0x00000901, 0x80048104, + 0xd2890001, 0x00000901, + 0x80048104, 0xd2890002, + 0x00000901, 0x80048104, + 0xd2890003, 0x00000901, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0xbe840080, + 0xd2890000, 0x00000902, + 0x80048104, 0xd2890001, + 0x00000902, 0x80048104, + 0xd2890002, 0x00000902, + 0x80048104, 0xd2890003, + 0x00000902, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0xbe840080, 0xd2890000, + 0x00000903, 0x80048104, + 0xd2890001, 0x00000903, + 0x80048104, 0xd2890002, + 0x00000903, 0x80048104, + 0xd2890003, 0x00000903, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0xbf820008, + 0xe0724000, 0x701d0000, + 0xe0724100, 0x701d0100, + 0xe0724200, 0x701d0200, + 0xe0724300, 0x701d0300, + 0xbefe00c1, 0xbeff00c1, + 0xb8fb5306, 0x867bc17b, + 0xbf840052, 0xbf8a0000, + 0x867aff6f, 0x04000000, + 0xbf84004e, 0x8e7b867b, + 0x8e7b827b, 0xbef6007b, + 0xb8f02985, 0x80708170, + 0x8e708a70, 0x8e708170, + 0xb8fa1605, 0x807a817a, + 0x8e7a867a, 0x80707a70, + 0x8070ff70, 0x00000080, + 0xbef600ff, 0x01000000, + 0xbefc0080, 0xd28c0002, + 0x000100c1, 0xd28d0003, + 0x000204c1, 0x867aff78, + 0x00400000, 0xbf850003, + 0xb8faf803, 0x897a7aff, + 0x10000000, 0xbf85001d, + 0x24040682, 0xd86c0000, + 0x00000002, 0xbf8cc07f, + 0xbe840080, 0xd2890000, + 0x00000900, 0x80048104, + 0xd2890001, 0x00000900, + 0x80048104, 0xd2890002, + 0x00000900, 0x80048104, + 0xd2890003, 0x00000900, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0x680404ff, + 0x00000100, 0xd0c9006a, + 0x0000f702, 0xbf87ffe5, + 0xbf820016, 0xd1060002, + 0x00011103, 0x7e0602ff, + 0x00000200, 0xbefc00ff, + 0x00010000, 0xbe800077, + 0x8677ff77, 0xff7fffff, + 0x8777ff77, 0x00058000, + 0xd8ec0000, 0x00000002, + 0xbf8cc07f, 0xe0765000, + 0x701d0002, 0x68040702, + 0xd0c9006a, 0x0000f702, + 0xbefe016a, 0xbf87fff6, + 0xbef70000, 0xbef000ff, + 0x00000400, 0xbefe00c1, + 0xbeff00c1, 0xb8fb2b05, + 0x807b817b, 0x8e7b827b, + 0xbef600ff, 0x01000000, + 0xbefc0084, 0xbf0a7b7c, + 0xbf84006d, 0xbf11017c, + 0x807bff7b, 0x00001000, + 0x867aff78, 0x00400000, + 0xbf850003, 0xb8faf803, + 0x897a7aff, 0x10000000, + 0xbf850051, 0xbe840080, + 0xd2890000, 0x00000900, + 0x80048104, 0xd2890001, + 0x00000900, 0x80048104, + 0xd2890002, 0x00000900, + 0x80048104, 0xd2890003, + 0x00000900, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0xbe840080, 0xd2890000, + 0x00000901, 0x80048104, + 0xd2890001, 0x00000901, + 0x80048104, 0xd2890002, + 0x00000901, 0x80048104, + 0xd2890003, 0x00000901, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0xbe840080, + 0xd2890000, 0x00000902, + 0x80048104, 0xd2890001, + 0x00000902, 0x80048104, + 0xd2890002, 0x00000902, + 0x80048104, 0xd2890003, + 0x00000902, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0xbe840080, 0xd2890000, + 0x00000903, 0x80048104, + 0xd2890001, 0x00000903, + 0x80048104, 0xd2890002, + 0x00000903, 0x80048104, + 0xd2890003, 0x00000903, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0x807c847c, + 0xbf0a7b7c, 0xbf85ffb1, + 0xbf9c0000, 0xbf820012, + 0x7e000300, 0x7e020301, + 0x7e040302, 0x7e060303, + 0xe0724000, 0x701d0000, + 0xe0724100, 0x701d0100, + 0xe0724200, 0x701d0200, + 0xe0724300, 0x701d0300, + 0x807c847c, 0x8070ff70, + 0x00000400, 0xbf0a7b7c, + 0xbf85ffef, 0xbf9c0000, + 0xb8fb2985, 0x807b817b, + 0x8e7b837b, 0xb8fa2b05, + 0x807a817a, 0x8e7a827a, + 0x80fb7a7b, 0x867b7b7b, + 0xbf84007a, 0x807bff7b, + 0x00001000, 0xbefc0080, + 0xbf11017c, 0x867aff78, + 0x00400000, 0xbf850003, + 0xb8faf803, 0x897a7aff, + 0x10000000, 0xbf850059, + 0xd3d84000, 0x18000100, + 0xd3d84001, 0x18000101, + 0xd3d84002, 0x18000102, + 0xd3d84003, 0x18000103, + 0xbe840080, 0xd2890000, + 0x00000900, 0x80048104, + 0xd2890001, 0x00000900, + 0x80048104, 0xd2890002, + 0x00000900, 0x80048104, + 0xd2890003, 0x00000900, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0xbe840080, + 0xd2890000, 0x00000901, + 0x80048104, 0xd2890001, + 0x00000901, 0x80048104, + 0xd2890002, 0x00000901, + 0x80048104, 0xd2890003, + 0x00000901, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0xbe840080, 0xd2890000, + 0x00000902, 0x80048104, + 0xd2890001, 0x00000902, + 0x80048104, 0xd2890002, + 0x00000902, 0x80048104, + 0xd2890003, 0x00000902, + 0x80048104, 0xc069003a, + 0x00000070, 0xbf8cc07f, + 0x80709070, 0xbf06c004, + 0xbf84ffee, 0xbe840080, + 0xd2890000, 0x00000903, + 0x80048104, 0xd2890001, + 0x00000903, 0x80048104, + 0xd2890002, 0x00000903, + 0x80048104, 0xd2890003, + 0x00000903, 0x80048104, + 0xc069003a, 0x00000070, + 0xbf8cc07f, 0x80709070, + 0xbf06c004, 0xbf84ffee, + 0x807c847c, 0xbf0a7b7c, + 0xbf85ffa9, 0xbf9c0000, + 0xbf820016, 0xd3d84000, + 0x18000100, 0xd3d84001, + 0x18000101, 0xd3d84002, + 0x18000102, 0xd3d84003, + 0x18000103, 0xe0724000, + 0x701d0000, 0xe0724100, + 0x701d0100, 0xe0724200, + 0x701d0200, 0xe0724300, + 0x701d0300, 0x807c847c, + 0x8070ff70, 0x00000400, + 0xbf0a7b7c, 0xbf85ffeb, + 0xbf9c0000, 0xbf8200f4, + 0xbef4007e, 0x8675ff7f, + 0x0000ffff, 0x8775ff75, + 0x00040000, 0xbef60080, + 0xbef700ff, 0x00807fac, + 0x866eff7f, 0x04000000, + 0xbf840025, 0xbefe00c1, + 0xbeff00c1, 0xb8ef5306, + 0x866fc16f, 0xbf840020, + 0x8e6f866f, 0x8e6f826f, + 0xbef6006f, 0xb8f82985, + 0x80788178, 0x8e788a78, + 0x8e788178, 0xb8ee1605, + 0x806e816e, 0x8e6e866e, + 0x80786e78, 0x8078ff78, + 0x00000080, 0xbef600ff, + 0x01000000, 0xbefc0080, + 0xe0510000, 0x781d0000, + 0xe0510100, 0x781d0000, + 0xe0510200, 0x781d0000, + 0xe0510300, 0x781d0000, + 0xe0510400, 0x781d0000, + 0x807cff7c, 0x00000500, + 0x8078ff78, 0x00000500, + 0xbf0a6f7c, 0xbf85fff0, + 0xbefe00c1, 0xbeff00c1, + 0xbef600ff, 0x01000000, + 0xb8ef2b05, 0x806f816f, + 0x8e6f826f, 0x806fff6f, + 0x00008000, 0xbef80080, + 0xbeee0078, 0x8078ff78, + 0x00000400, 0xbefc0084, + 0xbf11087c, 0xe0524000, + 0x781d0000, 0xe0524100, + 0x781d0100, 0xe0524200, + 0x781d0200, 0xe0524300, + 0x781d0300, 0xbf8c0f70, + 0x7e000300, 0x7e020301, + 0x7e040302, 0x7e060303, + 0x807c847c, 0x8078ff78, + 0x00000400, 0xbf0a6f7c, + 0xbf85ffee, 0xb8ef2985, + 0x806f816f, 0x8e6f836f, + 0xb8f92b05, 0x80798179, + 0x8e798279, 0x80ef796f, + 0x866f6f6f, 0xbf84001a, + 0x806fff6f, 0x00008000, + 0xbefc0080, 0xbf11087c, + 0xe0524000, 0x781d0000, + 0xe0524100, 0x781d0100, + 0xe0524200, 0x781d0200, + 0xe0524300, 0x781d0300, + 0xbf8c0f70, 0xd3d94000, + 0x18000100, 0xd3d94001, + 0x18000101, 0xd3d94002, + 0x18000102, 0xd3d94003, + 0x18000103, 0x807c847c, + 0x8078ff78, 0x00000400, + 0xbf0a6f7c, 0xbf85ffea, + 0xbf9c0000, 0xe0524000, + 0x6e1d0000, 0xe0524100, + 0x6e1d0100, 0xe0524200, + 0x6e1d0200, 0xe0524300, + 0x6e1d0300, 0xbf8c0f70, + 0xb8f82985, 0x80788178, + 0x8e788a78, 0x8e788178, + 0xb8ee1605, 0x806e816e, + 0x8e6e866e, 0x80786e78, + 0x80f8c078, 0xb8ef1605, + 0x806f816f, 0x8e6f846f, + 0x8e76826f, 0xbef600ff, + 0x01000000, 0xbefc006f, + 0xc031003a, 0x00000078, + 0x80f8c078, 0xbf8cc07f, + 0x80fc907c, 0xbf800000, + 0xbe802d00, 0xbe822d02, + 0xbe842d04, 0xbe862d06, + 0xbe882d08, 0xbe8a2d0a, + 0xbe8c2d0c, 0xbe8e2d0e, + 0xbf06807c, 0xbf84fff0, + 0xb8f82985, 0x80788178, + 0x8e788a78, 0x8e788178, + 0xb8ee1605, 0x806e816e, + 0x8e6e866e, 0x80786e78, + 0xbef60084, 0xbef600ff, + 0x01000000, 0xc0211bfa, + 0x00000078, 0x80788478, + 0xc0211b3a, 0x00000078, + 0x80788478, 0xc0211b7a, + 0x00000078, 0x80788478, + 0xc0211c3a, 0x00000078, + 0x80788478, 0xc0211c7a, + 0x00000078, 0x80788478, + 0xc0211eba, 0x00000078, + 0x80788478, 0xc0211efa, + 0x00000078, 0x80788478, + 0xc0211a3a, 0x00000078, + 0x80788478, 0xc0211a7a, + 0x00000078, 0x80788478, + 0xc0211cfa, 0x00000078, + 0x80788478, 0xbf8cc07f, + 0xbefc006f, 0xbefe0070, + 0xbeff0071, 0x866f7bff, + 0x000003ff, 0xb96f4803, + 0x866f7bff, 0xfffff800, + 0x8f6f8b6f, 0xb96fa2c3, + 0xb973f801, 0xb8ee2985, + 0x806e816e, 0x8e6e8a6e, + 0x8e6e816e, 0xb8ef1605, + 0x806f816f, 0x8e6f866f, + 0x806e6f6e, 0x806e746e, + 0x826f8075, 0x866fff6f, + 0x0000ffff, 0xc00b1c37, + 0x00000050, 0xc00b1d37, + 0x00000060, 0xc0031e77, + 0x00000074, 0xbf8cc07f, + 0x8f6e8b79, 0x866eff6e, + 0x001f8000, 0xb96ef807, + 0x866dff6d, 0x0000ffff, + 0x86fe7e7e, 0x86ea6a6a, + 0x8f6e837a, 0xb96ee0c2, + 0xbf800002, 0xb97a0002, + 0xbf8a0000, 0xbe801f6c, + 0xbf9b0000, 0x00000000, +}; diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm index bb26338204f4ba84b5ae41a781e1becdf9ad72bb..0eabb7a8cab94201ecddff8dfb15ab2140461c7a 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm @@ -37,17 +37,28 @@ * gc_9_4_3: * cpp -DASIC_FAMILY=GC_9_4_3 cwsr_trap_handler_gfx9.asm -P -o gc_9_4_3.sp3 * sp3 gc_9_4_3.sp3 -hex gc_9_4_3.hex + * + * gc_9_5_0: + * cpp -DASIC_FAMILY=GC_9_5_0 cwsr_trap_handler_gfx9.asm -P -o gc_9_5_0.sp3 + * sp3 gc_9_5_0.sp3 -hex gc_9_5_0.hex */ #define CHIP_VEGAM 18 #define CHIP_ARCTURUS 23 #define CHIP_ALDEBARAN 25 #define CHIP_GC_9_4_3 26 +#define CHIP_GC_9_5_0 27 var ACK_SQC_STORE = 1 //workaround for suspected SQC store bug causing incorrect stores under concurrency var SAVE_AFTER_XNACK_ERROR = 1 //workaround for TCP store failure after XNACK error when ALLOW_REPLAY=0, for debugger var SINGLE_STEP_MISSED_WORKAROUND = (ASIC_FAMILY <= CHIP_ALDEBARAN) //workaround for lost MODE.DEBUG_EN exception when SAVECTX raised +#if ASIC_FAMILY < CHIP_GC_9_4_3 +#define VMEM_MODIFIERS slc:1 glc:1 +#else +#define VMEM_MODIFIERS sc0:1 nt:1 +#endif + /**************************************************************************/ /* variables */ /**************************************************************************/ @@ -62,7 +73,13 @@ var SQ_WAVE_STATUS_ALLOW_REPLAY_MASK = 0x400000 var SQ_WAVE_STATUS_ECC_ERR_MASK = 0x20000 var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT = 12 +#if ASIC_FAMILY >= CHIP_GC_9_5_0 +var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE = 11 +var LDS_RESTORE_GRANULARITY_BYTES = 1280 +#else var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE = 9 +var LDS_RESTORE_GRANULARITY_BYTES = 512 +#endif var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE = 6 var SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SIZE = 3 //FIXME sq.blk still has 4 bits at this time while SQ programming guide has 3 bits var SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SHIFT = 24 @@ -557,12 +574,21 @@ if SAVE_AFTER_XNACK_ERROR v_lshlrev_b32 v2, 2, v3 L_SAVE_LDS_LOOP_SQC: +#if ASIC_FAMILY < CHIP_GC_9_5_0 ds_read2_b32 v[0:1], v2 offset0:0 offset1:0x40 s_waitcnt lgkmcnt(0) - write_vgprs_to_mem_with_sqc(v0, 2, s_save_buf_rsrc0, s_save_mem_offset) v_add_u32 v2, 0x200, v2 +#else + // gfx950 needs to save in multiple of 256 bytes. + ds_read_b32 v0, v2 + s_waitcnt lgkmcnt(0) + write_vgprs_to_mem_with_sqc(v0, 1, s_save_buf_rsrc0, s_save_mem_offset) + + v_add_u32 v2, 0x100, v2 +#endif + v_cmp_lt_u32 vcc[0:1], v2, s_save_alloc_size s_cbranch_vccnz L_SAVE_LDS_LOOP_SQC @@ -581,11 +607,14 @@ end L_SAVE_LDS_LOOP_VECTOR: ds_read_b64 v[0:1], v2 //x =LDS[a], byte address s_waitcnt lgkmcnt(0) - buffer_store_dwordx2 v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset offen:1 glc:1 slc:1 + buffer_store_dwordx2 v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset VMEM_MODIFIERS offen:1 // s_waitcnt vmcnt(0) // v_add_u32 v2, vcc[0:1], v2, v3 v_add_u32 v2, v2, v3 v_cmp_lt_u32 vcc[0:1], v2, s_save_alloc_size +#if ASIC_FAMILY >= CHIP_GC_9_5_0 + s_mov_b64 exec, vcc +#endif s_cbranch_vccnz L_SAVE_LDS_LOOP_VECTOR // restore rsrc3 @@ -748,8 +777,13 @@ L_RESTORE: L_RESTORE_LDS_LOOP: buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 // first 64DW buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 offset:256 // second 64DW - s_add_u32 m0, m0, 256*2 // 128 DW - s_add_u32 s_restore_mem_offset, s_restore_mem_offset, 256*2 //mem offset increased by 128DW +#if ASIC_FAMILY >= CHIP_GC_9_5_0 + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 offset:512 // third 64DW + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 offset:768 // forth 64DW + buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 offset:1024 // fifth 64DW +#endif + s_add_u32 m0, m0, LDS_RESTORE_GRANULARITY_BYTES // 128/320 DW + s_add_u32 s_restore_mem_offset, s_restore_mem_offset, LDS_RESTORE_GRANULARITY_BYTES //mem offset increased by 128/320 DW s_cmp_lt_u32 m0, s_restore_alloc_size //scc=(m0 < s_restore_alloc_size) ? 1 : 0 s_cbranch_scc1 L_RESTORE_LDS_LOOP //LDS restore is complete? @@ -979,17 +1013,17 @@ L_TCP_STORE_CHECK_DONE: end function write_4vgprs_to_mem(s_rsrc, s_mem_offset) - buffer_store_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1 - buffer_store_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256 - buffer_store_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*2 - buffer_store_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*3 + buffer_store_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS + buffer_store_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256 + buffer_store_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2 + buffer_store_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3 end function read_4vgprs_from_mem(s_rsrc, s_mem_offset) - buffer_load_dword v0, v0, s_rsrc, s_mem_offset slc:1 glc:1 - buffer_load_dword v1, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256 - buffer_load_dword v2, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*2 - buffer_load_dword v3, v0, s_rsrc, s_mem_offset slc:1 glc:1 offset:256*3 + buffer_load_dword v0, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS + buffer_load_dword v1, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256 + buffer_load_dword v2, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*2 + buffer_load_dword v3, v0, s_rsrc, s_mem_offset VMEM_MODIFIERS offset:256*3 s_waitcnt vmcnt(0) end diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 7b826a136ceb9228aa227cbed8e10772af77eb77..693469c18c609b196034a82d324b510c54da9e14 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1423,6 +1423,7 @@ err: static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, + bool cache_line_size_missing, struct kfd_gpu_cache_info *pcache_info) { struct amdgpu_device *adev = kdev->adev; @@ -1437,6 +1438,8 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_tcp_per_wpg / 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_tcp_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* Scalar L1 Instruction Cache per SQC */ @@ -1449,6 +1452,8 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_sqc_per_wgp * 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_instruction_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* Scalar L1 Data Cache per SQC */ @@ -1460,6 +1465,8 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.gc_num_sqc_per_wgp * 2; pcache_info[i].cache_line_size = adev->gfx.config.gc_scalar_data_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 64; i++; } /* GL1 Data Cache per SA */ @@ -1472,7 +1479,8 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_DATA_CACHE | CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; - pcache_info[i].cache_line_size = 0; + if (cache_line_size_missing) + pcache_info[i].cache_line_size = 128; i++; } /* L2 Data Cache per GPU (Total Tex Cache) */ @@ -1484,6 +1492,8 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; pcache_info[i].cache_line_size = adev->gfx.config.gc_tcc_cache_line_size; + if (cache_line_size_missing && !pcache_info[i].cache_line_size) + pcache_info[i].cache_line_size = 128; i++; } /* L3 Data Cache per GPU */ @@ -1494,7 +1504,7 @@ static int kfd_fill_gpu_cache_info_from_gfx_config(struct kfd_dev *kdev, CRAT_CACHE_FLAGS_DATA_CACHE | CRAT_CACHE_FLAGS_SIMD_CACHE); pcache_info[i].num_cu_shared = adev->gfx.config.max_cu_per_sh; - pcache_info[i].cache_line_size = 0; + pcache_info[i].cache_line_size = 64; i++; } return i; @@ -1569,6 +1579,7 @@ static int kfd_fill_gpu_cache_info_from_gfx_config_v2(struct kfd_dev *kdev, int kfd_get_gpu_cache_info(struct kfd_node *kdev, struct kfd_gpu_cache_info **pcache_info) { int num_of_cache_types = 0; + bool cache_line_size_missing = false; switch (kdev->adev->asic_type) { case CHIP_KAVERI: @@ -1628,6 +1639,7 @@ int kfd_get_gpu_cache_info(struct kfd_node *kdev, struct kfd_gpu_cache_info **pc break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): num_of_cache_types = kfd_fill_gpu_cache_info_from_gfx_config_v2(kdev->kfd, *pcache_info); @@ -1692,10 +1704,17 @@ int kfd_get_gpu_cache_info(struct kfd_node *kdev, struct kfd_gpu_cache_info **pc case IP_VERSION(11, 5, 0): case IP_VERSION(11, 5, 1): case IP_VERSION(11, 5, 2): + /* Cacheline size not available in IP discovery for gc11. + * kfd_fill_gpu_cache_info_from_gfx_config to hard code it + */ + cache_line_size_missing = true; + fallthrough; case IP_VERSION(12, 0, 0): case IP_VERSION(12, 0, 1): num_of_cache_types = - kfd_fill_gpu_cache_info_from_gfx_config(kdev->kfd, *pcache_info); + kfd_fill_gpu_cache_info_from_gfx_config(kdev->kfd, + cache_line_size_missing, + *pcache_info); break; default: *pcache_info = dummy_cache_info; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h index 924d0fd85dfb88bb03e96cbd6af03bb75c03d2da..27aa1a5b120ff7f3261190a395b528ce903c1219 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h @@ -79,6 +79,7 @@ static inline bool kfd_dbg_is_per_vmid_supported(struct kfd_node *dev) return (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2) || KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 5, 0) || KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0)); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 9b51dd75fefc7de2502937e9f9e0352a77cf190f..a29374c8640565a3549cc9dc4b130b9ede5d893b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -85,6 +85,7 @@ static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd) case IP_VERSION(4, 4, 0):/* ALDEBARAN */ case IP_VERSION(4, 4, 2): case IP_VERSION(4, 4, 5): + case IP_VERSION(4, 4, 4): case IP_VERSION(5, 0, 0):/* NAVI10 */ case IP_VERSION(5, 0, 1):/* CYAN_SKILLFISH */ case IP_VERSION(5, 0, 2):/* NAVI14 */ @@ -152,6 +153,7 @@ static void kfd_device_info_set_event_interrupt_class(struct kfd_dev *kfd) break; case IP_VERSION(9, 4, 3): /* GC 9.4.3 */ case IP_VERSION(9, 4, 4): /* GC 9.4.4 */ + case IP_VERSION(9, 5, 0): /* GC 9.5.0 */ kfd->device_info.event_interrupt_class = &event_interrupt_class_v9_4_3; break; @@ -356,6 +358,10 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf) gfx_target_version = 90402; f2g = &gc_9_4_3_kfd2kgd; break; + case IP_VERSION(9, 5, 0): + gfx_target_version = 90500; + f2g = &gc_9_4_3_kfd2kgd; + break; /* Navi10 */ case IP_VERSION(10, 1, 10): gfx_target_version = 100100; @@ -515,6 +521,10 @@ static void kfd_cwsr_init(struct kfd_dev *kfd) > KFD_CWSR_TMA_OFFSET); kfd->cwsr_isa = cwsr_trap_gfx9_4_3_hex; kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_4_3_hex); + } else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 5, 0)) { + BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_5_0_hex) > PAGE_SIZE); + kfd->cwsr_isa = cwsr_trap_gfx9_5_0_hex; + kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_5_0_hex); } else if (KFD_GC_VERSION(kfd) < IP_VERSION(10, 1, 1)) { BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) > KFD_CWSR_TMA_OFFSET); @@ -567,6 +577,7 @@ static int kfd_gws_init(struct kfd_node *node) && kfd->mec2_fw_version >= 0x28) || (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 3) || KFD_GC_VERSION(node) == IP_VERSION(9, 4, 4)) || + (KFD_GC_VERSION(node) == IP_VERSION(9, 5, 0)) || (KFD_GC_VERSION(node) >= IP_VERSION(10, 3, 0) && KFD_GC_VERSION(node) < IP_VERSION(11, 0, 0) && kfd->mec2_fw_version >= 0x6b) || @@ -638,6 +649,14 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes) struct kfd_node *knode; unsigned int i; + /* + * flush_work ensures that there are no outstanding + * work-queue items that will access interrupt_ring. New work items + * can't be created because we stopped interrupt handling above. + */ + flush_workqueue(kfd->ih_wq); + destroy_workqueue(kfd->ih_wq); + for (i = 0; i < num_nodes; i++) { knode = kfd->nodes[i]; device_queue_manager_uninit(knode->dqm); @@ -733,14 +752,14 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, last_vmid_kfd = fls(gpu_resources->compute_vmid_bitmap)-1; vmid_num_kfd = last_vmid_kfd - first_vmid_kfd + 1; - /* For GFX9.4.3, we need special handling for VMIDs depending on - * partition mode. + /* For multi-partition capable GPUs, we need special handling for VMIDs + * depending on partition mode. * In CPX mode, the VMID range needs to be shared between XCDs. * Additionally, there are 13 VMIDs (3-15) available for KFD. To * divide them equally, we change starting VMID to 4 and not use * VMID 3. - * If the VMID range changes for GFX9.4.3, then this code MUST be - * revisited. + * If the VMID range changes for multi-partition capable GPUs, then + * this code MUST be revisited. */ if (kfd->adev->xcp_mgr) { partition_mode = amdgpu_xcp_query_partition_mode(kfd->adev->xcp_mgr, @@ -805,14 +824,12 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, kfd->hive_id = kfd->adev->gmc.xgmi.hive_id; /* - * For GFX9.4.3, the KFD abstracts all partitions within a socket as - * xGMI connected in the topology so assign a unique hive id per - * device based on the pci device location if device is in PCIe mode. + * For multi-partition capable GPUs, the KFD abstracts all partitions + * within a socket as xGMI connected in the topology so assign a unique + * hive id per device based on the pci device location if device is in + * PCIe mode. */ - if (!kfd->hive_id && - (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 4)) && - kfd->num_nodes > 1) + if (!kfd->hive_id && kfd->num_nodes > 1) kfd->hive_id = pci_dev_id(kfd->adev->pdev); kfd->noretry = kfd->adev->gmc.noretry; @@ -850,12 +867,11 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, KFD_XCP_MEMORY_SIZE(node->adev, node->node_id) >> 20); } - if ((KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 4)) && - partition_mode == AMDGPU_CPX_PARTITION_MODE && + if (partition_mode == AMDGPU_CPX_PARTITION_MODE && kfd->num_nodes != 1) { - /* For GFX9.4.3 and CPX mode, first XCD gets VMID range - * 4-9 and second XCD gets VMID range 10-15. + /* For multi-partition capable GPUs and CPX mode, first + * XCD gets VMID range 4-9 and second XCD gets VMID + * range 10-15. */ node->vm_info.first_vmid_kfd = (i%2 == 0) ? @@ -879,8 +895,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, amdgpu_amdkfd_get_local_mem_info(kfd->adev, &node->local_mem_info, node->xcp); - if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 4)) + if (kfd->adev->xcp_mgr) kfd_setup_interrupt_bitmap(node, i); /* Initialize the KFD node */ @@ -1059,21 +1074,6 @@ static int kfd_resume(struct kfd_node *node) return err; } -static inline void kfd_queue_work(struct workqueue_struct *wq, - struct work_struct *work) -{ - int cpu, new_cpu; - - cpu = new_cpu = smp_processor_id(); - do { - new_cpu = cpumask_next(new_cpu, cpu_online_mask) % nr_cpu_ids; - if (cpu_to_node(new_cpu) == numa_node_id()) - break; - } while (cpu != new_cpu); - - queue_work_on(new_cpu, wq, work); -} - /* This is called directly from KGD at ISR. */ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry) { @@ -1099,7 +1099,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry) patched_ihre, &is_patched) && enqueue_ih_ring_entry(node, is_patched ? patched_ihre : ih_ring_entry)) { - kfd_queue_work(node->ih_wq, &node->interrupt_work); + queue_work(node->kfd->ih_wq, &node->interrupt_work); spin_unlock_irqrestore(&node->interrupt_lock, flags); return; } @@ -1514,6 +1514,73 @@ bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id) return kfd_compute_active(node); } +/** + * kgd2kfd_vmfault_fast_path() - KFD vm page fault interrupt handling fast path for gmc v9 + * @adev: amdgpu device + * @entry: vm fault interrupt vector + * @retry_fault: if this is retry fault + * + * retry fault - + * with CAM enabled, adev primary ring + * | gmc_v9_0_process_interrupt() + * adev soft_ring + * | gmc_v9_0_process_interrupt() worker failed to recover page fault + * KFD node ih_fifo + * | KFD interrupt_wq worker + * kfd_signal_vm_fault_event + * + * without CAM, adev primary ring1 + * | gmc_v9_0_process_interrupt worker failed to recvoer page fault + * KFD node ih_fifo + * | KFD interrupt_wq worker + * kfd_signal_vm_fault_event + * + * no-retry fault - + * adev primary ring + * | gmc_v9_0_process_interrupt() + * KFD node ih_fifo + * | KFD interrupt_wq worker + * kfd_signal_vm_fault_event + * + * fast path - After kfd_signal_vm_fault_event, gmc_v9_0_process_interrupt drop the page fault + * of same process, don't copy interrupt to KFD node ih_fifo. + * With gdb debugger enabled, need convert the retry fault to no-retry fault for + * debugger, cannot use the fast path. + * + * Return: + * true - use the fast path to handle this fault + * false - use normal path to handle it + */ +bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, + bool retry_fault) +{ + struct kfd_process *p; + u32 cam_index; + + if (entry->ih == &adev->irq.ih_soft || entry->ih == &adev->irq.ih1) { + p = kfd_lookup_process_by_pasid(entry->pasid); + if (!p) + return true; + + if (p->gpu_page_fault && !p->debug_trap_enabled) { + if (retry_fault && adev->irq.retry_cam_enabled) { + cam_index = entry->src_data[2] & 0x3ff; + WDOORBELL32(adev->irq.retry_cam_doorbell_index, cam_index); + } + + kfd_unref_process(p); + return true; + } + + /* + * This is the first page fault, set flag and then signal user space + */ + p->gpu_page_fault = true; + kfd_unref_process(p); + } + return false; +} + #if defined(CONFIG_DEBUG_FS) /* This function will send a package to HIQ to hang the HWS diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index c79fe9069e220e46e9508785972458c8499393b7..1405e8affd4840caaf18e67afe5a723923e0829d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -207,6 +207,21 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q, if (!down_read_trylock(&adev->reset_domain->sem)) return -EIO; + if (!pdd->proc_ctx_cpu_ptr) { + r = amdgpu_amdkfd_alloc_gtt_mem(adev, + AMDGPU_MES_PROC_CTX_SIZE, + &pdd->proc_ctx_bo, + &pdd->proc_ctx_gpu_addr, + &pdd->proc_ctx_cpu_ptr, + false); + if (r) { + dev_err(adev->dev, + "failed to allocate process context bo\n"); + return r; + } + memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE); + } + memset(&queue_input, 0x0, sizeof(struct mes_add_queue_input)); queue_input.process_id = qpd->pqm->process->pasid; queue_input.page_table_base_addr = qpd->page_table_base; @@ -2373,6 +2388,9 @@ static int wait_on_destroy_queue(struct device_queue_manager *dqm, q->process); int ret = 0; + if (WARN_ON(!pdd)) + return ret; + if (pdd->qpd.is_debug) return ret; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c index 210bcc048f4c511b9cfbd57034964055546135fb..67137e674f1d08dc21ae9cdd10c83c0592e29e88 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c @@ -64,7 +64,8 @@ static int update_qpd_v9(struct device_queue_manager *dqm, qpd->sh_mem_config |= 1 << SH_MEM_CONFIG__RETRY_DISABLE__SHIFT; if (KFD_GC_VERSION(dqm->dev->kfd) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(dqm->dev->kfd) == IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(dqm->dev->kfd) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dqm->dev->kfd) == IP_VERSION(9, 5, 0)) qpd->sh_mem_config |= (1 << SH_MEM_CONFIG__F8_MODE__SHIFT); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index ea37922492093534d4018be7f6c28fe48b50d10b..d075f24e5f9f39aff4d3d82d351e4d5fc4377e86 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -748,6 +748,16 @@ void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id, uint64_t *slots = page_slots(p->signal_page); uint32_t id; + /* + * If id is valid but slot is not signaled, GPU may signal the same event twice + * before driver have chance to process the first interrupt, then signal slot is + * auto-reset after set_event wakeup the user space, just drop the second event as + * the application only need wakeup once. + */ + if ((valid_id_bits > 31 || (1U << valid_id_bits) >= KFD_SIGNAL_EVENT_LIMIT) && + partial_id < KFD_SIGNAL_EVENT_LIMIT && slots[partial_id] == UNSIGNALED_EVENT_SLOT) + goto out_unlock; + if (valid_id_bits) pr_debug_ratelimited("Partial ID invalid: %u (%u valid bits)\n", partial_id, valid_id_bits); @@ -776,6 +786,7 @@ void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id, } } +out_unlock: rcu_read_unlock(); kfd_unref_process(p); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c index d46a13156ee9d7f2b3213e32a3c9ffa869e19ac5..0cb5c582ce7dc42a68240e09fe0c62734b2a0a98 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c @@ -184,6 +184,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev, } else { reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET; } + amdgpu_ras_set_err_poison(dev->adev, AMDGPU_RAS_BLOCK__GFX); break; case SOC15_IH_CLIENTID_VMC: case SOC15_IH_CLIENTID_VMC1: @@ -213,6 +214,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev, } else { reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET; } + amdgpu_ras_set_err_poison(dev->adev, AMDGPU_RAS_BLOCK__SDMA); break; default: dev_warn(dev->adev->dev, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c index 9b6b6e88259348a17e0fff9a1b15f83d098b892c..783c2f5a04e4bd016143a3bafe867fe474f80b39 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c @@ -46,7 +46,7 @@ #include #include "kfd_priv.h" -#define KFD_IH_NUM_ENTRIES 8192 +#define KFD_IH_NUM_ENTRIES 16384 static void interrupt_wq(struct work_struct *); @@ -62,11 +62,14 @@ int kfd_interrupt_init(struct kfd_node *node) return r; } - node->ih_wq = alloc_workqueue("KFD IH", WQ_HIGHPRI, 1); - if (unlikely(!node->ih_wq)) { - kfifo_free(&node->ih_fifo); - dev_err(node->adev->dev, "Failed to allocate KFD IH workqueue\n"); - return -ENOMEM; + if (!node->kfd->ih_wq) { + node->kfd->ih_wq = alloc_workqueue("KFD IH", WQ_HIGHPRI | WQ_UNBOUND, + node->kfd->num_nodes); + if (unlikely(!node->kfd->ih_wq)) { + kfifo_free(&node->ih_fifo); + dev_err(node->adev->dev, "Failed to allocate KFD IH workqueue\n"); + return -ENOMEM; + } } spin_lock_init(&node->interrupt_lock); @@ -96,16 +99,6 @@ void kfd_interrupt_exit(struct kfd_node *node) spin_lock_irqsave(&node->interrupt_lock, flags); node->interrupts_active = false; spin_unlock_irqrestore(&node->interrupt_lock, flags); - - /* - * flush_work ensures that there are no outstanding - * work-queue items that will access interrupt_ring. New work items - * can't be created because we stopped interrupt handling above. - */ - flush_workqueue(node->ih_wq); - - destroy_workqueue(node->ih_wq); - kfifo_free(&node->ih_fifo); } @@ -114,55 +107,48 @@ void kfd_interrupt_exit(struct kfd_node *node) */ bool enqueue_ih_ring_entry(struct kfd_node *node, const void *ih_ring_entry) { - int count; - - count = kfifo_in(&node->ih_fifo, ih_ring_entry, - node->kfd->device_info.ih_ring_entry_size); - if (count != node->kfd->device_info.ih_ring_entry_size) { - dev_dbg_ratelimited(node->adev->dev, - "Interrupt ring overflow, dropping interrupt %d\n", - count); + if (kfifo_is_full(&node->ih_fifo)) { + dev_warn_ratelimited(node->adev->dev, "KFD node %d ih_fifo overflow\n", + node->node_id); return false; } + kfifo_in(&node->ih_fifo, ih_ring_entry, node->kfd->device_info.ih_ring_entry_size); return true; } /* * Assumption: single reader/writer. This function is not re-entrant */ -static bool dequeue_ih_ring_entry(struct kfd_node *node, void *ih_ring_entry) +static bool dequeue_ih_ring_entry(struct kfd_node *node, u32 **ih_ring_entry) { int count; - count = kfifo_out(&node->ih_fifo, ih_ring_entry, - node->kfd->device_info.ih_ring_entry_size); - - WARN_ON(count && count != node->kfd->device_info.ih_ring_entry_size); + if (kfifo_is_empty(&node->ih_fifo)) + return false; + count = kfifo_out_linear_ptr(&node->ih_fifo, ih_ring_entry, + node->kfd->device_info.ih_ring_entry_size); + WARN_ON(count != node->kfd->device_info.ih_ring_entry_size); return count == node->kfd->device_info.ih_ring_entry_size; } static void interrupt_wq(struct work_struct *work) { - struct kfd_node *dev = container_of(work, struct kfd_node, - interrupt_work); - uint32_t ih_ring_entry[KFD_MAX_RING_ENTRY_SIZE]; + struct kfd_node *dev = container_of(work, struct kfd_node, interrupt_work); + uint32_t *ih_ring_entry; unsigned long start_jiffies = jiffies; - if (dev->kfd->device_info.ih_ring_entry_size > sizeof(ih_ring_entry)) { - dev_err_once(dev->adev->dev, "Ring entry too small\n"); - return; - } - - while (dequeue_ih_ring_entry(dev, ih_ring_entry)) { + while (dequeue_ih_ring_entry(dev, &ih_ring_entry)) { dev->kfd->device_info.event_interrupt_class->interrupt_wq(dev, ih_ring_entry); + kfifo_skip_count(&dev->ih_fifo, dev->kfd->device_info.ih_ring_entry_size); + if (time_is_before_jiffies(start_jiffies + HZ)) { /* If we spent more than a second processing signals, * reschedule the worker to avoid soft-lockup warnings */ - queue_work(dev->ih_wq, &dev->interrupt_work); + queue_work(dev->kfd->ih_wq, &dev->interrupt_work); break; } } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index eacfeb32f35d6612e1afb5919f1cdc43c1b77d44..4b275937d05e93a41c6cae7173b10307728e1bb1 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -306,7 +306,7 @@ svm_migrate_copy_to_vram(struct kfd_node *node, struct svm_range *prange, spage = migrate_pfn_to_page(migrate->src[i]); if (spage && !is_zone_device_page(spage)) { src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE, - DMA_TO_DEVICE); + DMA_BIDIRECTIONAL); r = dma_mapping_error(dev, src[i]); if (r) { dev_err(dev, "%s: fail %d dma_map_page\n", @@ -629,7 +629,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, goto out_oom; } - dst[i] = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_FROM_DEVICE); + dst[i] = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL); r = dma_mapping_error(dev, dst[i]); if (r) { dev_err(adev->dev, "%s: fail %d dma_map_page\n", __func__, r); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c index 84e8ea3a8a0c940561c9f97eb62922d8d3311ecf..ff417d5361c42e7c080b9eb1892a941d5157bae9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c @@ -78,7 +78,8 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd, m->compute_static_thread_mgmt_se2 = se_mask[2]; m->compute_static_thread_mgmt_se3 = se_mask[3]; if (KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 3) && - KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 4)) { + KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 4) && + KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 5, 0)) { m->compute_static_thread_mgmt_se4 = se_mask[4]; m->compute_static_thread_mgmt_se5 = se_mask[5]; m->compute_static_thread_mgmt_se6 = se_mask[6]; @@ -301,7 +302,8 @@ static void update_mqd(struct mqd_manager *mm, void *mqd, m->cp_hqd_ctx_save_control = 0; if (KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 3) && - KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 4, 4) && + KFD_GC_VERSION(mm->dev) != IP_VERSION(9, 5, 0)) update_cu_mask(mm, mqd, minfo, 0); set_priority(m, q); @@ -885,7 +887,8 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type, mqd->debugfs_show_mqd = debugfs_show_mqd; #endif if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4)) { + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 5, 0)) { mqd->init_mqd = init_mqd_v9_4_3; mqd->load_mqd = load_mqd_v9_4_3; mqd->update_mqd = update_mqd_v9_4_3; @@ -909,8 +912,10 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type, #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd; #endif + mqd->check_preemption_failed = check_preemption_failed; if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4)) { + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 5, 0)) { mqd->init_mqd = init_mqd_hiq_v9_4_3; mqd->load_mqd = hiq_load_mqd_kiq_v9_4_3; mqd->destroy_mqd = destroy_hiq_mqd_v9_4_3; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c index 37930629edc5de6393ff26903f73fd0bf4700427..4984b41cd372131cb52543c35d3e12e4dd35fec6 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c @@ -28,6 +28,10 @@ #include "kfd_kernel_queue.h" #include "kfd_priv.h" +#define OVER_SUBSCRIPTION_PROCESS_COUNT (1 << 0) +#define OVER_SUBSCRIPTION_COMPUTE_QUEUE_COUNT (1 << 1) +#define OVER_SUBSCRIPTION_GWS_QUEUE_COUNT (1 << 2) + static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes, unsigned int buffer_size_bytes) { @@ -40,7 +44,7 @@ static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes, static void pm_calc_rlib_size(struct packet_manager *pm, unsigned int *rlib_size, - bool *over_subscription) + int *over_subscription) { unsigned int process_count, queue_count, compute_queue_count, gws_queue_count; unsigned int map_queue_size; @@ -58,17 +62,20 @@ static void pm_calc_rlib_size(struct packet_manager *pm, * hws_max_conc_proc has been done in * kgd2kfd_device_init(). */ - *over_subscription = false; + *over_subscription = 0; if (node->max_proc_per_quantum > 1) max_proc_per_quantum = node->max_proc_per_quantum; - if ((process_count > max_proc_per_quantum) || - compute_queue_count > get_cp_queues_num(pm->dqm) || - gws_queue_count > 1) { - *over_subscription = true; + if (process_count > max_proc_per_quantum) + *over_subscription |= OVER_SUBSCRIPTION_PROCESS_COUNT; + if (compute_queue_count > get_cp_queues_num(pm->dqm)) + *over_subscription |= OVER_SUBSCRIPTION_COMPUTE_QUEUE_COUNT; + if (gws_queue_count > 1) + *over_subscription |= OVER_SUBSCRIPTION_GWS_QUEUE_COUNT; + + if (*over_subscription) dev_dbg(dev, "Over subscribed runlist\n"); - } map_queue_size = pm->pmf->map_queues_size; /* calculate run list ib allocation size */ @@ -89,7 +96,7 @@ static int pm_allocate_runlist_ib(struct packet_manager *pm, unsigned int **rl_buffer, uint64_t *rl_gpu_buffer, unsigned int *rl_buffer_size, - bool *is_over_subscription) + int *is_over_subscription) { struct kfd_node *node = pm->dqm->dev; struct device *dev = node->adev->dev; @@ -134,7 +141,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm, struct qcm_process_device *qpd; struct queue *q; struct kernel_queue *kq; - bool is_over_subscription; + int is_over_subscription; rl_wptr = retval = processes_mapped = 0; @@ -213,15 +220,20 @@ static int pm_create_runlist_ib(struct packet_manager *pm, if (is_over_subscription) { if (!pm->is_over_subscription) - dev_warn( - dev, - "Runlist is getting oversubscribed. Expect reduced ROCm performance.\n"); + dev_warn(dev, "Runlist is getting oversubscribed due to%s%s%s. Expect reduced ROCm performance.\n", + is_over_subscription & OVER_SUBSCRIPTION_PROCESS_COUNT ? + " too many processes." : "", + is_over_subscription & OVER_SUBSCRIPTION_COMPUTE_QUEUE_COUNT ? + " too many queues." : "", + is_over_subscription & OVER_SUBSCRIPTION_GWS_QUEUE_COUNT ? + " multiple processes using cooperative launch." : ""); + retval = pm->pmf->runlist(pm, &rl_buffer[rl_wptr], *rl_gpu_addr, alloc_size_bytes / sizeof(uint32_t), true); } - pm->is_over_subscription = is_over_subscription; + pm->is_over_subscription = !!is_over_subscription; for (i = 0; i < alloc_size_bytes / sizeof(uint32_t); i++) pr_debug("0x%2X ", rl_buffer[i]); @@ -248,7 +260,8 @@ int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm) default: if (KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 2) || KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 5, 0)) pm->pmf = &kfd_aldebaran_pm_funcs; else if (KFD_GC_VERSION(dqm->dev) >= IP_VERSION(9, 0, 1)) pm->pmf = &kfd_v9_pm_funcs; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 9e5ca0b93b2a256f511ca07820f89724bf888bf0..c32b255c0eb2d4886098d04ea878936c2c76d141 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -207,7 +207,8 @@ enum cache_policy { #define KFD_SUPPORT_XNACK_PER_PROCESS(dev)\ ((KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 2)) || \ (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3)) || \ - (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4))) + (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4)) || \ + (KFD_GC_VERSION(dev) == IP_VERSION(9, 5, 0))) struct kfd_node; @@ -273,7 +274,6 @@ struct kfd_node { /* Interrupts */ struct kfifo ih_fifo; - struct workqueue_struct *ih_wq; struct work_struct interrupt_work; spinlock_t interrupt_lock; @@ -366,6 +366,8 @@ struct kfd_dev { struct kfd_node *nodes[MAX_KFD_NODES]; unsigned int num_nodes; + struct workqueue_struct *ih_wq; + /* Kernel doorbells for KFD device */ struct amdgpu_bo *doorbells; @@ -1002,6 +1004,9 @@ struct kfd_process { struct semaphore runtime_enable_sema; bool is_runtime_retry; struct kfd_runtime_info runtime_info; + + /* if gpu page fault sent to KFD */ + bool gpu_page_fault; }; #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */ @@ -1150,7 +1155,8 @@ static inline struct kfd_node *kfd_node_by_irq_ids(struct amdgpu_device *adev, uint32_t i; if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3) && - KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 4) && + KFD_GC_VERSION(dev) != IP_VERSION(9, 5, 0)) return dev->nodes[0]; for (i = 0; i < dev->num_nodes; i++) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 87cd52cf4ee995918050b7bcd93838e894fff98c..0976b5b0e8e8c28e74d2e0e3cad8352ec5b9a315 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1076,7 +1076,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p) kfd_free_process_doorbells(pdd->dev->kfd, pdd); - if (pdd->dev->kfd->shared_resources.enable_mes) + if (pdd->dev->kfd->shared_resources.enable_mes && + pdd->proc_ctx_cpu_ptr) amdgpu_amdkfd_free_gtt_mem(pdd->dev->adev, &pdd->proc_ctx_bo); /* @@ -1608,7 +1609,6 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev, struct kfd_process *p) { struct kfd_process_device *pdd = NULL; - int retval = 0; if (WARN_ON_ONCE(p->n_pdds >= MAX_GPU_INSTANCE)) return NULL; @@ -1632,21 +1632,6 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev, pdd->user_gpu_id = dev->id; atomic64_set(&pdd->evict_duration_counter, 0); - if (dev->kfd->shared_resources.enable_mes) { - retval = amdgpu_amdkfd_alloc_gtt_mem(dev->adev, - AMDGPU_MES_PROC_CTX_SIZE, - &pdd->proc_ctx_bo, - &pdd->proc_ctx_gpu_addr, - &pdd->proc_ctx_cpu_ptr, - false); - if (retval) { - dev_err(dev->adev->dev, - "failed to allocate process context bo\n"); - goto err_free_pdd; - } - memset(pdd->proc_ctx_cpu_ptr, 0, AMDGPU_MES_PROC_CTX_SIZE); - } - p->pdds[p->n_pdds++] = pdd; if (kfd_dbg_is_per_vmid_supported(pdd->dev)) pdd->spi_dbg_override = pdd->dev->kfd2kgd->disable_debug_trap( @@ -1658,10 +1643,6 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_node *dev, idr_init(&pdd->alloc_idr); return pdd; - -err_free_pdd: - kfree(pdd); - return NULL; } /** @@ -2146,10 +2127,11 @@ int kfd_process_drain_interrupts(struct kfd_process_device *pdd) irq_drain_fence[3] = pdd->process->pasid; /* - * For GFX 9.4.3, send the NodeId also in IH cookie DW[3] + * For GFX 9.4.3/9.5.0, send the NodeId also in IH cookie DW[3] */ if (KFD_GC_VERSION(pdd->dev->kfd) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(pdd->dev->kfd) == IP_VERSION(9, 4, 4)) { + KFD_GC_VERSION(pdd->dev->kfd) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(pdd->dev->kfd) == IP_VERSION(9, 5, 0)) { node_id = ffs(pdd->dev->interrupt_bitmap) - 1; irq_drain_fence[3] |= node_id << 16; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c index c76db22a1000578378a8f5183c59ab3b13bce524..9df56f8e09f910680d7f6f37710fe65abb0bdf30 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c @@ -131,8 +131,9 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid, if (!gws && pdd->qpd.num_gws == 0) return -EINVAL; - if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3) && - KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 4) && + if ((KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3) && + KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 4) && + KFD_GC_VERSION(dev) != IP_VERSION(9, 5, 0)) && !dev->kfd->shared_resources.enable_mes) { if (gws) ret = amdgpu_amdkfd_add_gws_to_process(pdd->process->kgd_process_info, @@ -197,6 +198,7 @@ static void pqm_clean_queue_resource(struct process_queue_manager *pqm, if (pqn->q->gws) { if (KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3) && KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 4) && + KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 5, 0) && !dev->kfd->shared_resources.enable_mes) amdgpu_amdkfd_remove_gws_from_process( pqm->process->kgd_process_info, pqn->q->gws); @@ -212,13 +214,17 @@ static void pqm_clean_queue_resource(struct process_queue_manager *pqm, void pqm_uninit(struct process_queue_manager *pqm) { struct process_queue_node *pqn, *next; - struct kfd_process_device *pdd; list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) { if (pqn->q) { - pdd = kfd_get_process_device_data(pqn->q->device, pqm->process); - kfd_queue_unref_bo_vas(pdd, &pqn->q->properties); - kfd_queue_release_buffers(pdd, &pqn->q->properties); + struct kfd_process_device *pdd = kfd_get_process_device_data(pqn->q->device, + pqm->process); + if (pdd) { + kfd_queue_unref_bo_vas(pdd, &pqn->q->properties); + kfd_queue_release_buffers(pdd, &pqn->q->properties); + } else { + WARN_ON(!pdd); + } pqm_clean_queue_resource(pqm, pqn); } @@ -316,11 +322,12 @@ int pqm_create_queue(struct process_queue_manager *pqm, unsigned int max_queues = 127; /* HWS limit */ /* - * On GFX 9.4.3, increase the number of queues that - * can be created to 255. No HWS limit on GFX 9.4.3. + * On GFX 9.4.3/9.5.0, increase the number of queues that + * can be created to 255. No HWS limit on GFX 9.4.3/9.5.0. */ if (KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(dev) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(dev) == IP_VERSION(9, 5, 0)) max_queues = 255; q = NULL; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c index ad29634f8b44caed8757c3d5182f924935a31e32..ecccd7adbab4d266b3bfc85d934f5a1126809c01 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c @@ -394,7 +394,8 @@ static u32 kfd_get_vgpr_size_per_cu(u32 gfxv) if ((gfxv / 100 * 100) == 90400 || /* GFX_VERSION_AQUA_VANJARAM */ gfxv == 90010 || /* GFX_VERSION_ALDEBARAN */ - gfxv == 90008) /* GFX_VERSION_ARCTURUS */ + gfxv == 90008 || /* GFX_VERSION_ARCTURUS */ + gfxv == 90500) vgpr_size = 0x80000; else if (gfxv == 110000 || /* GFX_VERSION_PLUM_BONITO */ gfxv == 110001 || /* GFX_VERSION_WHEAT_NAS */ @@ -405,9 +406,10 @@ static u32 kfd_get_vgpr_size_per_cu(u32 gfxv) return vgpr_size; } -#define WG_CONTEXT_DATA_SIZE_PER_CU(gfxv) \ +#define WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props) \ (kfd_get_vgpr_size_per_cu(gfxv) + SGPR_SIZE_PER_CU +\ - LDS_SIZE_PER_CU + HWREG_SIZE_PER_CU) + (((gfxv) == 90500) ? (props->lds_size_in_kb << 10) : LDS_SIZE_PER_CU) +\ + HWREG_SIZE_PER_CU) #define CNTL_STACK_BYTES_PER_WAVE(gfxv) \ ((gfxv) >= 100100 ? 12 : 8) /* GFX_VERSION_NAVI10*/ @@ -431,7 +433,7 @@ void kfd_queue_ctx_save_restore_size(struct kfd_topology_device *dev) min(cu_num * 40, props->array_count / props->simd_arrays_per_engine * 512) : cu_num * 32; - wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv), PAGE_SIZE); + wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props), PAGE_SIZE); ctl_stack_size = wave_num * CNTL_STACK_BYTES_PER_WAVE(gfxv) + 8; ctl_stack_size = ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER + ctl_stack_size, PAGE_SIZE); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 3e2911895c740dd53fcd8d2f49d17a818250589b..bd3e20d981e0c8c345e63df3b3c4457ff080e2ea 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1195,6 +1195,7 @@ svm_range_get_pte_flags(struct kfd_node *node, struct kfd_node *bo_node; uint32_t flags = prange->flags; uint32_t mapping_flags = 0; + uint32_t gc_ip_version = KFD_GC_VERSION(node); uint64_t pte_flags; bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN); bool coherent = flags & (KFD_IOCTL_SVM_FLAG_COHERENT | KFD_IOCTL_SVM_FLAG_EXT_COHERENT); @@ -1204,7 +1205,7 @@ svm_range_get_pte_flags(struct kfd_node *node, if (domain == SVM_RANGE_VRAM_DOMAIN) bo_node = prange->svm_bo->node; - switch (amdgpu_ip_version(node->adev, GC_HWIP, 0)) { + switch (gc_ip_version) { case IP_VERSION(9, 4, 1): if (domain == SVM_RANGE_VRAM_DOMAIN) { if (bo_node == node) { @@ -1241,8 +1242,10 @@ svm_range_get_pte_flags(struct kfd_node *node, break; case IP_VERSION(9, 4, 3): case IP_VERSION(9, 4, 4): + case IP_VERSION(9, 5, 0): if (ext_coherent) - mtype_local = node->adev->rev_id ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_UC; + mtype_local = (gc_ip_version < IP_VERSION(9, 5, 0) && !node->adev->rev_id) ? + AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_CC; else mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC : amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW; @@ -1257,9 +1260,13 @@ svm_range_get_pte_flags(struct kfd_node *node, */ else if (svm_nodes_in_same_hive(bo_node, node) && !ext_coherent) mapping_flags |= AMDGPU_VM_MTYPE_NC; - /* PCIe P2P or extended system scope coherence */ - else + /* PCIe P2P on GPUs pre-9.5.0 */ + else if (gc_ip_version < IP_VERSION(9, 5, 0) && + !svm_nodes_in_same_hive(bo_node, node)) mapping_flags |= AMDGPU_VM_MTYPE_UC; + /* Other remote memory */ + else + mapping_flags |= ext_coherent ? AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC; /* system memory accessed by the APU */ } else if (node->adev->flags & AMD_IS_APU) { /* On NUMA systems, locality is determined per-page @@ -1271,7 +1278,10 @@ svm_range_get_pte_flags(struct kfd_node *node, mapping_flags |= ext_coherent ? AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC; /* system memory accessed by the dGPU */ } else { - mapping_flags |= AMDGPU_VM_MTYPE_UC; + if (gc_ip_version < IP_VERSION(9, 5, 0)) + mapping_flags |= AMDGPU_VM_MTYPE_UC; + else + mapping_flags |= AMDGPU_VM_MTYPE_NC; } break; case IP_VERSION(12, 0, 0): @@ -1299,7 +1309,7 @@ svm_range_get_pte_flags(struct kfd_node *node, pte_flags = AMDGPU_PTE_VALID; pte_flags |= (domain == SVM_RANGE_VRAM_DOMAIN) ? 0 : AMDGPU_PTE_SYSTEM; pte_flags |= snoop ? AMDGPU_PTE_SNOOPED : 0; - if (KFD_GC_VERSION(node) >= IP_VERSION(12, 0, 0)) + if (gc_ip_version >= IP_VERSION(12, 0, 0)) pte_flags |= AMDGPU_PTE_IS_PTE; pte_flags |= amdgpu_gem_va_map_flags(node->adev, mapping_flags); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 9476e30d6baa1b5acd8b98050a506d1cf8b54f6b..ceb9fb475ef137d00a17e8390222d2fe55533dd7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1714,7 +1714,8 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext, pcache->cacheline_size = pcache_info[cache_type].cache_line_size; if (KFD_GC_VERSION(knode) == IP_VERSION(9, 4, 3) || - KFD_GC_VERSION(knode) == IP_VERSION(9, 4, 4)) + KFD_GC_VERSION(knode) == IP_VERSION(9, 4, 4) || + KFD_GC_VERSION(knode) == IP_VERSION(9, 5, 0)) mode = adev->gmc.gmc_funcs->query_mem_partition_mode(adev); else mode = UNKNOWN_MEMORY_PARTITION_MODE; @@ -1776,7 +1777,7 @@ static void kfd_fill_cache_non_crat_info(struct kfd_topology_device *dev, struct struct amdgpu_cu_info *cu_info = &kdev->adev->gfx.cu_info; struct amdgpu_gfx_config *gfx_info = &kdev->adev->gfx.config; int gpu_processor_id; - struct kfd_cache_properties *props_ext; + struct kfd_cache_properties *props_ext = NULL; int num_of_entries = 0; int num_of_cache_types = 0; struct kfd_gpu_cache_info cache_info[KFD_MAX_CACHE_TYPES]; diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 48be917e7bc550065acb383cd239c9d5c2ea7d83..b631b2eba0f0cd3fd9335ab4427298a6ff5fd8fc 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -93,6 +93,7 @@ #include #include #include +#include #include #include #include @@ -955,13 +956,13 @@ static void dm_dmub_outbox1_low_irq(void *interrupt_params) } } -static int dm_set_clockgating_state(void *handle, +static int dm_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int dm_set_powergating_state(void *handle, +static int dm_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -1036,8 +1037,10 @@ static int amdgpu_dm_audio_component_get_eld(struct device *kdev, int port, continue; *enabled = true; + mutex_lock(&connector->eld_mutex); ret = drm_eld_size(connector->eld); memcpy(buf, connector->eld, min(max_bytes, ret)); + mutex_unlock(&connector->eld_mutex); break; } @@ -2152,8 +2155,8 @@ static int amdgpu_dm_init(struct amdgpu_device *adev) } #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) - adev->dm.secure_display_ctxs = amdgpu_dm_crtc_secure_display_create_contexts(adev); - if (!adev->dm.secure_display_ctxs) + amdgpu_dm_crtc_secure_display_create_contexts(adev); + if (!adev->dm.secure_display_ctx.crtc_ctx) DRM_ERROR("amdgpu: failed to initialize secure display contexts.\n"); #endif @@ -2197,15 +2200,15 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev) amdgpu_dm_destroy_drm_device(&adev->dm); #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) - if (adev->dm.secure_display_ctxs) { + if (adev->dm.secure_display_ctx.crtc_ctx) { for (i = 0; i < adev->mode_info.num_crtc; i++) { - if (adev->dm.secure_display_ctxs[i].crtc) { - flush_work(&adev->dm.secure_display_ctxs[i].notify_ta_work); - flush_work(&adev->dm.secure_display_ctxs[i].forward_roi_work); + if (adev->dm.secure_display_ctx.crtc_ctx[i].crtc) { + flush_work(&adev->dm.secure_display_ctx.crtc_ctx[i].notify_ta_work); + flush_work(&adev->dm.secure_display_ctx.crtc_ctx[i].forward_roi_work); } } - kfree(adev->dm.secure_display_ctxs); - adev->dm.secure_display_ctxs = NULL; + kfree(adev->dm.secure_display_ctx.crtc_ctx); + adev->dm.secure_display_ctx.crtc_ctx = NULL; } #endif if (adev->dm.hdcp_workqueue) { @@ -2338,7 +2341,8 @@ static int load_dmcu_fw(struct amdgpu_device *adev) return 0; } - r = amdgpu_ucode_request(adev, &adev->dm.fw_dmcu, "%s", fw_name_dmcu); + r = amdgpu_ucode_request(adev, &adev->dm.fw_dmcu, AMDGPU_UCODE_REQUIRED, + "%s", fw_name_dmcu); if (r == -ENODEV) { /* DMCU firmware is not necessary, so don't raise a fuss if it's missing */ DRM_DEBUG_KMS("dm: DMCU firmware not found\n"); @@ -3457,6 +3461,7 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector) struct drm_connector *conn_base; struct amdgpu_device *adev; struct drm_luminance_range_info *luminance_range; + int min_input_signal_override; if (aconnector->bl_idx == -1 || aconnector->dc_link->connector_signal != SIGNAL_TYPE_EDP) @@ -3493,6 +3498,10 @@ static void update_connector_ext_caps(struct amdgpu_dm_connector *aconnector) caps->aux_min_input_signal = 0; caps->aux_max_input_signal = 512; } + + min_input_signal_override = drm_get_panel_min_brightness_quirk(aconnector->drm_edid); + if (min_input_signal_override >= 0) + caps->min_input_signal = min_input_signal_override; } void amdgpu_dm_update_connector_after_detect( @@ -5306,7 +5315,8 @@ static int dm_init_microcode(struct amdgpu_device *adev) /* ASIC doesn't support DMUB. */ return 0; } - r = amdgpu_ucode_request(adev, &adev->dm.dmub_fw, "%s", fw_name_dmub); + r = amdgpu_ucode_request(adev, &adev->dm.dmub_fw, AMDGPU_UCODE_REQUIRED, + "%s", fw_name_dmub); return r; } @@ -5522,8 +5532,7 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev, const u64 tiling_flags, struct dc_plane_info *plane_info, struct dc_plane_address *address, - bool tmz_surface, - bool force_disable_dcc) + bool tmz_surface) { const struct drm_framebuffer *fb = plane_state->fb; const struct amdgpu_framebuffer *afb = @@ -5622,7 +5631,7 @@ fill_dc_plane_info_and_addr(struct amdgpu_device *adev, &plane_info->tiling_info, &plane_info->plane_size, &plane_info->dcc, address, - tmz_surface, force_disable_dcc); + tmz_surface); if (ret) return ret; @@ -5643,7 +5652,6 @@ static int fill_dc_plane_attributes(struct amdgpu_device *adev, struct dc_scaling_info scaling_info; struct dc_plane_info plane_info; int ret; - bool force_disable_dcc = false; ret = amdgpu_dm_plane_fill_dc_scaling_info(adev, plane_state, &scaling_info); if (ret) @@ -5654,13 +5662,11 @@ static int fill_dc_plane_attributes(struct amdgpu_device *adev, dc_plane_state->clip_rect = scaling_info.clip_rect; dc_plane_state->scaling_quality = scaling_info.scaling_quality; - force_disable_dcc = adev->asic_type == CHIP_RAVEN && adev->in_suspend; ret = fill_dc_plane_info_and_addr(adev, plane_state, afb->tiling_flags, &plane_info, &dc_plane_state->address, - afb->tmz_surface, - force_disable_dcc); + afb->tmz_surface); if (ret) return ret; @@ -9097,7 +9103,7 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state, afb->tiling_flags, &bundle->plane_infos[planes_count], &bundle->flip_addrs[planes_count].address, - afb->tmz_surface, false); + afb->tmz_surface); drm_dbg_state(state->dev, "plane: id=%d dcc_en=%d\n", new_plane_state->plane->index, diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h index 6464a8378387c70f062a31ed5a1099c68ebac38b..e46e1365fe910c52ab73a2be6e90e59ccaeb2182 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -541,12 +541,12 @@ struct amdgpu_display_manager { #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) /** - * @secure_display_ctxs: + * @secure_display_ctx: * - * Store the ROI information and the work_struct to command dmub and psp for - * all crtcs. + * Store secure display relevant info. e.g. the ROI information + * , the work_struct to command dmub, etc. */ - struct secure_display_context *secure_display_ctxs; + struct secure_display_context secure_display_ctx; #endif /** * @hpd_rx_offload_wq: diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c index f936a35fa9ebb716b8082d8e53cf0b4053e1a586..6fdc306a4a86bbbfcca21cb090311539791f6c6b 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c @@ -83,12 +83,221 @@ const char *const *amdgpu_dm_crtc_get_crc_sources(struct drm_crtc *crtc, } #ifdef CONFIG_DRM_AMD_SECURE_DISPLAY +static void update_phy_id_mapping(struct amdgpu_device *adev) +{ + struct drm_device *ddev = adev_to_drm(adev); + struct amdgpu_display_manager *dm = &adev->dm; + struct drm_connector *connector; + struct amdgpu_dm_connector *aconnector; + struct amdgpu_dm_connector *sort_connector[AMDGPU_DM_MAX_CRTC] = {NULL}; + struct drm_connector_list_iter iter; + uint8_t idx = 0, idx_2 = 0, connector_cnt = 0; + + dm->secure_display_ctx.phy_mapping_updated = false; + + mutex_lock(&ddev->mode_config.mutex); + drm_connector_list_iter_begin(ddev, &iter); + drm_for_each_connector_iter(connector, &iter) { + + if (connector->status != connector_status_connected) + continue; + + if (idx >= AMDGPU_DM_MAX_CRTC) { + DRM_WARN("%s connected connectors exceed max crtc\n", __func__); + mutex_unlock(&ddev->mode_config.mutex); + return; + } + + aconnector = to_amdgpu_dm_connector(connector); + + sort_connector[idx] = aconnector; + idx++; + connector_cnt++; + } + drm_connector_list_iter_end(&iter); + + /* sort connectors by link_enc_hw_instance first */ + for (idx = connector_cnt; idx > 1 ; idx--) { + for (idx_2 = 0; idx_2 < (idx - 1); idx_2++) { + if (sort_connector[idx_2]->dc_link->link_enc_hw_inst > + sort_connector[idx_2 + 1]->dc_link->link_enc_hw_inst) + swap(sort_connector[idx_2], sort_connector[idx_2 + 1]); + } + } + + /* + * Sort mst connectors by RAD. mst connectors with the same enc_hw_instance are already + * sorted together above. + */ + for (idx = 0; idx < connector_cnt; /*Do nothing*/) { + if (sort_connector[idx]->mst_root) { + uint8_t i, j, k; + uint8_t mst_con_cnt = 1; + + for (idx_2 = (idx + 1); idx_2 < connector_cnt; idx_2++) { + if (sort_connector[idx_2]->mst_root == sort_connector[idx]->mst_root) + mst_con_cnt++; + else + break; + } + + for (i = mst_con_cnt; i > 1; i--) { + for (j = idx; j < (idx + i - 2); j++) { + int mstb_lct = sort_connector[j]->mst_output_port->parent->lct; + int next_mstb_lct = sort_connector[j + 1]->mst_output_port->parent->lct; + u8 *rad; + u8 *next_rad; + bool swap = false; + + /* Sort by mst tree depth first. Then compare RAD if depth is the same*/ + if (mstb_lct > next_mstb_lct) { + swap = true; + } else if (mstb_lct == next_mstb_lct) { + if (mstb_lct == 1) { + if (sort_connector[j]->mst_output_port->port_num > sort_connector[j + 1]->mst_output_port->port_num) + swap = true; + } else if (mstb_lct > 1) { + rad = sort_connector[j]->mst_output_port->parent->rad; + next_rad = sort_connector[j + 1]->mst_output_port->parent->rad; + + for (k = 0; k < mstb_lct - 1; k++) { + int shift = (k % 2) ? 0 : 4; + int port_num = (rad[k / 2] >> shift) & 0xf; + int next_port_num = (next_rad[k / 2] >> shift) & 0xf; + + if (port_num > next_port_num) { + swap = true; + break; + } + } + } else { + DRM_ERROR("MST LCT shouldn't be set as < 1"); + mutex_unlock(&ddev->mode_config.mutex); + return; + } + } + + if (swap) + swap(sort_connector[j], sort_connector[j + 1]); + } + } + + idx += mst_con_cnt; + } else { + idx++; + } + } + + /* Complete sorting. Assign relavant result to dm->secure_display_ctx.phy_id_mapping[]*/ + memset(dm->secure_display_ctx.phy_id_mapping, 0, sizeof(dm->secure_display_ctx.phy_id_mapping)); + for (idx = 0; idx < connector_cnt; idx++) { + aconnector = sort_connector[idx]; + + dm->secure_display_ctx.phy_id_mapping[idx].assigned = true; + dm->secure_display_ctx.phy_id_mapping[idx].is_mst = false; + dm->secure_display_ctx.phy_id_mapping[idx].enc_hw_inst = aconnector->dc_link->link_enc_hw_inst; + + if (sort_connector[idx]->mst_root) { + dm->secure_display_ctx.phy_id_mapping[idx].is_mst = true; + dm->secure_display_ctx.phy_id_mapping[idx].lct = aconnector->mst_output_port->parent->lct; + dm->secure_display_ctx.phy_id_mapping[idx].port_num = aconnector->mst_output_port->port_num; + memcpy(dm->secure_display_ctx.phy_id_mapping[idx].rad, + aconnector->mst_output_port->parent->rad, sizeof(aconnector->mst_output_port->parent->rad)); + } + } + mutex_unlock(&ddev->mode_config.mutex); + + dm->secure_display_ctx.phy_id_mapping_cnt = connector_cnt; + dm->secure_display_ctx.phy_mapping_updated = true; +} + +static bool get_phy_id(struct amdgpu_display_manager *dm, + struct amdgpu_dm_connector *aconnector, uint8_t *phy_id) +{ + int idx, idx_2; + bool found = false; + + /* + * Assume secure display start after all connectors are probed. The connection + * config is static as well + */ + if (!dm->secure_display_ctx.phy_mapping_updated) { + DRM_WARN("%s Should update the phy id table before get it's value", __func__); + return false; + } + + for (idx = 0; idx < dm->secure_display_ctx.phy_id_mapping_cnt; idx++) { + if (!dm->secure_display_ctx.phy_id_mapping[idx].assigned) { + DRM_ERROR("phy_id_mapping[%d] should be assigned", idx); + return false; + } + + if (aconnector->dc_link->link_enc_hw_inst == + dm->secure_display_ctx.phy_id_mapping[idx].enc_hw_inst) { + if (!dm->secure_display_ctx.phy_id_mapping[idx].is_mst) { + found = true; + goto out; + } else { + /* Could caused by wrongly pass mst root connector */ + if (!aconnector->mst_output_port) { + DRM_ERROR("%s Check mst case but connector without a port assigned", __func__); + return false; + } + + if (aconnector->mst_root && + aconnector->mst_root->mst_mgr.mst_primary == NULL) { + DRM_WARN("%s pass in a stale mst connector", __func__); + } + + if (aconnector->mst_output_port->parent->lct == dm->secure_display_ctx.phy_id_mapping[idx].lct && + aconnector->mst_output_port->port_num == dm->secure_display_ctx.phy_id_mapping[idx].port_num) { + if (aconnector->mst_output_port->parent->lct == 1) { + found = true; + goto out; + } else if (aconnector->mst_output_port->parent->lct > 1) { + /* Check RAD */ + for (idx_2 = 0; idx_2 < aconnector->mst_output_port->parent->lct - 1; idx_2++) { + int shift = (idx_2 % 2) ? 0 : 4; + int port_num = (aconnector->mst_output_port->parent->rad[idx_2 / 2] >> shift) & 0xf; + int port_num2 = (dm->secure_display_ctx.phy_id_mapping[idx].rad[idx_2 / 2] >> shift) & 0xf; + + if (port_num != port_num2) + break; + } + + if (idx_2 == aconnector->mst_output_port->parent->lct - 1) { + found = true; + goto out; + } + } else { + DRM_ERROR("lCT should be >= 1"); + return false; + } + } + } + } + } + +out: + if (found) { + DRM_DEBUG_DRIVER("Associated secure display PHY ID as %d", idx); + *phy_id = idx; + } else { + DRM_WARN("Can't find associated phy ID"); + return false; + } + + return true; +} + static void amdgpu_dm_set_crc_window_default(struct drm_crtc *crtc, struct dc_stream_state *stream) { struct drm_device *drm_dev = crtc->dev; struct amdgpu_display_manager *dm = &drm_to_adev(drm_dev)->dm; struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc); bool was_activated; + struct amdgpu_dm_connector *aconnector; + uint8_t phy_id; spin_lock_irq(&drm_dev->event_lock); was_activated = acrtc->dm_irq_params.window_param.activated; @@ -104,24 +313,32 @@ static void amdgpu_dm_set_crc_window_default(struct drm_crtc *crtc, struct dc_st /* Disable secure_display if it was enabled */ if (was_activated) { /* stop ROI update on this crtc */ - flush_work(&dm->secure_display_ctxs[crtc->index].notify_ta_work); - flush_work(&dm->secure_display_ctxs[crtc->index].forward_roi_work); - dc_stream_forward_crc_window(stream, NULL, true); + flush_work(&dm->secure_display_ctx.crtc_ctx[crtc->index].notify_ta_work); + flush_work(&dm->secure_display_ctx.crtc_ctx[crtc->index].forward_roi_work); + + aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context; + + if (aconnector && get_phy_id(dm, aconnector, &phy_id)) + dc_stream_forward_crc_window(stream, NULL, phy_id, true); + else + DRM_DEBUG_DRIVER("%s Can't find matching phy id", __func__); } } static void amdgpu_dm_crtc_notify_ta_to_read(struct work_struct *work) { - struct secure_display_context *secure_display_ctx; + struct secure_display_crtc_context *crtc_ctx; struct psp_context *psp; struct ta_securedisplay_cmd *securedisplay_cmd; struct drm_crtc *crtc; struct dc_stream_state *stream; + struct amdgpu_dm_connector *aconnector; uint8_t phy_inst; + struct amdgpu_display_manager *dm; int ret; - secure_display_ctx = container_of(work, struct secure_display_context, notify_ta_work); - crtc = secure_display_ctx->crtc; + crtc_ctx = container_of(work, struct secure_display_crtc_context, notify_ta_work); + crtc = crtc_ctx->crtc; if (!crtc) return; @@ -133,8 +350,19 @@ static void amdgpu_dm_crtc_notify_ta_to_read(struct work_struct *work) return; } + dm = &drm_to_adev(crtc->dev)->dm; stream = to_amdgpu_crtc(crtc)->dm_irq_params.stream; - phy_inst = stream->link->link_enc_hw_inst; + aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context; + if (!aconnector) + return; + + mutex_lock(&crtc->dev->mode_config.mutex); + if (!get_phy_id(dm, aconnector, &phy_inst)) { + DRM_WARN("%s Can't find mapping phy id!", __func__); + mutex_unlock(&crtc->dev->mode_config.mutex); + return; + } + mutex_unlock(&crtc->dev->mode_config.mutex); /* need lock for multiple crtcs to use the command buffer */ mutex_lock(&psp->securedisplay_context.mutex); @@ -160,22 +388,37 @@ static void amdgpu_dm_crtc_notify_ta_to_read(struct work_struct *work) static void amdgpu_dm_forward_crc_window(struct work_struct *work) { - struct secure_display_context *secure_display_ctx; + struct secure_display_crtc_context *crtc_ctx; struct amdgpu_display_manager *dm; struct drm_crtc *crtc; struct dc_stream_state *stream; + struct amdgpu_dm_connector *aconnector; + uint8_t phy_id; - secure_display_ctx = container_of(work, struct secure_display_context, forward_roi_work); - crtc = secure_display_ctx->crtc; + crtc_ctx = container_of(work, struct secure_display_crtc_context, forward_roi_work); + crtc = crtc_ctx->crtc; if (!crtc) return; dm = &drm_to_adev(crtc->dev)->dm; stream = to_amdgpu_crtc(crtc)->dm_irq_params.stream; + aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context; + + if (!aconnector) + return; + + mutex_lock(&crtc->dev->mode_config.mutex); + if (!get_phy_id(dm, aconnector, &phy_id)) { + DRM_WARN("%s Can't find mapping phy id!", __func__); + mutex_unlock(&crtc->dev->mode_config.mutex); + return; + } + mutex_unlock(&crtc->dev->mode_config.mutex); mutex_lock(&dm->dc_lock); - dc_stream_forward_crc_window(stream, &secure_display_ctx->rect, false); + dc_stream_forward_crc_window(stream, &crtc_ctx->rect, + phy_id, false); mutex_unlock(&dm->dc_lock); } @@ -258,6 +501,10 @@ int amdgpu_dm_crtc_set_crc_source(struct drm_crtc *crtc, const char *src_name) struct drm_crtc_commit *commit; struct dm_crtc_state *crtc_state; struct drm_device *drm_dev = crtc->dev; +#if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) + struct amdgpu_device *adev = drm_to_adev(drm_dev); + struct amdgpu_display_manager *dm = &adev->dm; +#endif struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc); struct drm_dp_aux *aux = NULL; bool enable = false; @@ -402,6 +649,12 @@ int amdgpu_dm_crtc_set_crc_source(struct drm_crtc *crtc, const char *src_name) /* Reset crc_skipped on dm state */ crtc_state->crc_skip_count = 0; +#if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) + /* Initialize phy id mapping table for secure display*/ + if (!dm->secure_display_ctx.phy_mapping_updated) + update_phy_id_mapping(adev); +#endif + cleanup: if (commit) drm_crtc_commit_put(commit); @@ -472,7 +725,7 @@ void amdgpu_dm_crtc_handle_crc_window_irq(struct drm_crtc *crtc) enum amdgpu_dm_pipe_crc_source cur_crc_src; struct amdgpu_crtc *acrtc = NULL; struct amdgpu_device *adev = NULL; - struct secure_display_context *secure_display_ctx = NULL; + struct secure_display_crtc_context *crtc_ctx = NULL; unsigned long flags1; if (crtc == NULL) @@ -498,23 +751,23 @@ void amdgpu_dm_crtc_handle_crc_window_irq(struct drm_crtc *crtc) goto cleanup; } - secure_display_ctx = &adev->dm.secure_display_ctxs[acrtc->crtc_id]; - if (WARN_ON(secure_display_ctx->crtc != crtc)) { - /* We have set the crtc when creating secure_display_context, + crtc_ctx = &adev->dm.secure_display_ctx.crtc_ctx[acrtc->crtc_id]; + if (WARN_ON(crtc_ctx->crtc != crtc)) { + /* We have set the crtc when creating secure_display_crtc_context, * don't expect it to be changed here. */ - secure_display_ctx->crtc = crtc; + crtc_ctx->crtc = crtc; } if (acrtc->dm_irq_params.window_param.update_win) { /* prepare work for dmub to update ROI */ - secure_display_ctx->rect.x = acrtc->dm_irq_params.window_param.x_start; - secure_display_ctx->rect.y = acrtc->dm_irq_params.window_param.y_start; - secure_display_ctx->rect.width = acrtc->dm_irq_params.window_param.x_end - + crtc_ctx->rect.x = acrtc->dm_irq_params.window_param.x_start; + crtc_ctx->rect.y = acrtc->dm_irq_params.window_param.y_start; + crtc_ctx->rect.width = acrtc->dm_irq_params.window_param.x_end - acrtc->dm_irq_params.window_param.x_start; - secure_display_ctx->rect.height = acrtc->dm_irq_params.window_param.y_end - + crtc_ctx->rect.height = acrtc->dm_irq_params.window_param.y_end - acrtc->dm_irq_params.window_param.y_start; - schedule_work(&secure_display_ctx->forward_roi_work); + schedule_work(&crtc_ctx->forward_roi_work); acrtc->dm_irq_params.window_param.update_win = false; @@ -527,32 +780,34 @@ void amdgpu_dm_crtc_handle_crc_window_irq(struct drm_crtc *crtc) } else { /* prepare work for psp to read ROI/CRC and send to I2C */ - schedule_work(&secure_display_ctx->notify_ta_work); + schedule_work(&crtc_ctx->notify_ta_work); } cleanup: spin_unlock_irqrestore(&drm_dev->event_lock, flags1); } -struct secure_display_context * -amdgpu_dm_crtc_secure_display_create_contexts(struct amdgpu_device *adev) +void amdgpu_dm_crtc_secure_display_create_contexts(struct amdgpu_device *adev) { - struct secure_display_context *secure_display_ctxs = NULL; + struct secure_display_crtc_context *crtc_ctx = NULL; int i; - secure_display_ctxs = kcalloc(adev->mode_info.num_crtc, - sizeof(struct secure_display_context), + crtc_ctx = kcalloc(adev->mode_info.num_crtc, + sizeof(struct secure_display_crtc_context), GFP_KERNEL); - if (!secure_display_ctxs) - return NULL; + if (!crtc_ctx) { + adev->dm.secure_display_ctx.crtc_ctx = NULL; + return; + } for (i = 0; i < adev->mode_info.num_crtc; i++) { - INIT_WORK(&secure_display_ctxs[i].forward_roi_work, amdgpu_dm_forward_crc_window); - INIT_WORK(&secure_display_ctxs[i].notify_ta_work, amdgpu_dm_crtc_notify_ta_to_read); - secure_display_ctxs[i].crtc = &adev->mode_info.crtcs[i]->base; + INIT_WORK(&crtc_ctx[i].forward_roi_work, amdgpu_dm_forward_crc_window); + INIT_WORK(&crtc_ctx[i].notify_ta_work, amdgpu_dm_crtc_notify_ta_to_read); + crtc_ctx[i].crtc = &adev->mode_info.crtcs[i]->base; } - return secure_display_ctxs; + adev->dm.secure_display_ctx.crtc_ctx = crtc_ctx; + return; } #endif diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.h index 748e80ef40d0a541433a36c774bb1d6868367805..4ec2510bc3129cdee9fc1051ca12dcda14fd066d 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.h @@ -40,6 +40,17 @@ enum amdgpu_dm_pipe_crc_source { }; #ifdef CONFIG_DRM_AMD_SECURE_DISPLAY +#define MAX_CRTC 6 + +struct phy_id_mapping { + bool assigned; + bool is_mst; + uint8_t enc_hw_inst; + u8 lct; + u8 port_num; + u8 rad[8]; +}; + struct crc_window_param { uint16_t x_start; uint16_t y_start; @@ -53,7 +64,7 @@ struct crc_window_param { int skip_frame_cnt; }; -struct secure_display_context { +struct secure_display_crtc_context { /* work to notify PSP TA*/ struct work_struct notify_ta_work; @@ -65,6 +76,15 @@ struct secure_display_context { /* Region of Interest (ROI) */ struct rect rect; }; + +struct secure_display_context { + + struct secure_display_crtc_context *crtc_ctx; + + bool phy_mapping_updated; + int phy_id_mapping_cnt; + struct phy_id_mapping phy_id_mapping[MAX_CRTC]; +}; #endif static inline bool amdgpu_dm_is_valid_crc_source(enum amdgpu_dm_pipe_crc_source source) @@ -95,8 +115,7 @@ void amdgpu_dm_crtc_handle_crc_irq(struct drm_crtc *crtc); #ifdef CONFIG_DRM_AMD_SECURE_DISPLAY bool amdgpu_dm_crc_window_is_activated(struct drm_crtc *crtc); void amdgpu_dm_crtc_handle_crc_window_irq(struct drm_crtc *crtc); -struct secure_display_context *amdgpu_dm_crtc_secure_display_create_contexts( - struct amdgpu_device *adev); +void amdgpu_dm_crtc_secure_display_create_contexts(struct amdgpu_device *adev); #else #define amdgpu_dm_crc_window_is_activated(x) #define amdgpu_dm_crtc_handle_crc_window_irq(x) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c index 6a97bb2d9160155301b028af882b4d9eea30c323..bc3e7a82b402c467d340262f75f375ce3ebc8a45 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c @@ -258,7 +258,7 @@ static ssize_t dp_link_settings_write(struct file *f, const char __user *buf, struct dc_link *link = connector->dc_link; struct amdgpu_device *adev = drm_to_adev(connector->base.dev); struct dc *dc = (struct dc *)link->dc; - struct dc_link_settings prefer_link_settings; + struct dc_link_settings prefer_link_settings = {0}; char *wr_buf = NULL; const uint32_t wr_buf_size = 40; /* 0: lane_count; 1: link_rate */ @@ -389,7 +389,7 @@ static ssize_t dp_mst_link_setting(struct file *f, const char __user *buf, struct dc_link *link = aconnector->dc_link; struct amdgpu_device *adev = drm_to_adev(aconnector->base.dev); struct dc *dc = (struct dc *)link->dc; - struct dc_link_settings prefer_link_settings; + struct dc_link_settings prefer_link_settings = {0}; char *wr_buf = NULL; const uint32_t wr_buf_size = 40; /* 0: lane_count; 1: link_rate */ @@ -613,7 +613,7 @@ static ssize_t dp_phy_settings_write(struct file *f, const char __user *buf, uint32_t wr_buf_size = 40; long param[3]; bool use_prefer_link_setting; - struct link_training_settings link_lane_settings; + struct link_training_settings link_lane_settings = {0}; int max_param_num = 3; uint8_t param_nums = 0; int r = 0; @@ -768,7 +768,7 @@ static ssize_t dp_phy_test_pattern_debugfs_write(struct file *f, const char __us LINK_RATE_UNKNOWN, LINK_SPREAD_DISABLED}; struct dc_link_settings cur_link_settings = {LANE_COUNT_UNKNOWN, LINK_RATE_UNKNOWN, LINK_SPREAD_DISABLED}; - struct link_training_settings link_training_settings; + struct link_training_settings link_training_settings = {0}; int i; if (size == 0) @@ -902,9 +902,10 @@ static int dmub_tracebuffer_show(struct seq_file *m, void *data) { struct amdgpu_device *adev = m->private; struct dmub_srv_fb_info *fb_info = adev->dm.dmub_fb_info; + struct dmub_fw_meta_info *fw_meta_info = NULL; struct dmub_debugfs_trace_entry *entries; uint8_t *tbuf_base; - uint32_t tbuf_size, max_entries, num_entries, i; + uint32_t tbuf_size, max_entries, num_entries, first_entry, i; if (!fb_info) return 0; @@ -913,20 +914,42 @@ static int dmub_tracebuffer_show(struct seq_file *m, void *data) if (!tbuf_base) return 0; - tbuf_size = fb_info->fb[DMUB_WINDOW_5_TRACEBUFF].size; + if (adev->dm.dmub_srv) + fw_meta_info = &adev->dm.dmub_srv->meta_info; + + tbuf_size = fw_meta_info ? fw_meta_info->trace_buffer_size : + DMUB_TRACE_BUFFER_SIZE; max_entries = (tbuf_size - sizeof(struct dmub_debugfs_trace_header)) / sizeof(struct dmub_debugfs_trace_entry); num_entries = ((struct dmub_debugfs_trace_header *)tbuf_base)->entry_count; + /* DMCUB tracebuffer is a ring. If it rolled over, print a hint that + * entries are being overwritten. + */ + if (num_entries > max_entries) + seq_printf(m, "...\n"); + + first_entry = num_entries % max_entries; num_entries = min(num_entries, max_entries); entries = (struct dmub_debugfs_trace_entry *)(tbuf_base + sizeof(struct dmub_debugfs_trace_header)); - for (i = 0; i < num_entries; ++i) { + /* To print entries chronologically, start from the first entry till the + * top of buffer, then from base of buffer to first entry. + */ + for (i = first_entry; i < num_entries; ++i) { + struct dmub_debugfs_trace_entry *entry = &entries[i]; + + seq_printf(m, + "trace_code=%u tick_count=%u param0=%u param1=%u\n", + entry->trace_code, entry->tick_count, entry->param0, + entry->param1); + } + for (i = 0; i < first_entry; ++i) { struct dmub_debugfs_trace_entry *entry = &entries[i]; seq_printf(m, diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c index 6e43594906130c1cbea3e510e87d2013903b1bb5..56fa2b9b5d7a3485173b8341207104a4a0215103 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c @@ -590,11 +590,12 @@ dm_dp_add_mst_connector(struct drm_dp_mst_topology_mgr *mgr, amdgpu_dm_set_mst_status(&aconnector->mst_status, MST_PROBE, true); - if (drm_connector_init( + if (drm_connector_dynamic_init( dev, connector, &dm_dp_mst_connector_funcs, - DRM_MODE_CONNECTOR_DisplayPort)) { + DRM_MODE_CONNECTOR_DisplayPort, + NULL)) { kfree(aconnector); return NULL; } @@ -1688,16 +1689,16 @@ clean_exit: return ret; } -static unsigned int kbps_from_pbn(unsigned int pbn) +static uint32_t kbps_from_pbn(unsigned int pbn) { - unsigned int kbps = pbn; + uint64_t kbps = (uint64_t)pbn; kbps *= (1000000 / PEAK_FACTOR_X1000); kbps *= 8; kbps *= 54; kbps /= 64; - return kbps; + return (uint32_t)kbps; } static bool is_dsc_common_config_possible(struct dc_stream_state *stream, diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c index 495e3cd70426db0182cb2811bc6d5d09f52f8a4b..1ec4e4b9ea94551c2be45fca65068206ec6e094a 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c @@ -26,6 +26,7 @@ #include #include +#include "drm/drm_framebuffer.h" #include #include #include @@ -309,8 +310,7 @@ static int amdgpu_dm_plane_fill_gfx9_plane_attributes_from_modifiers(struct amdg const struct plane_size *plane_size, union dc_tiling_info *tiling_info, struct dc_plane_dcc_param *dcc, - struct dc_plane_address *address, - const bool force_disable_dcc) + struct dc_plane_address *address) { const uint64_t modifier = afb->base.modifier; int ret = 0; @@ -318,7 +318,7 @@ static int amdgpu_dm_plane_fill_gfx9_plane_attributes_from_modifiers(struct amdg amdgpu_dm_plane_fill_gfx9_tiling_info_from_modifier(adev, tiling_info, modifier); tiling_info->gfx9.swizzle = amdgpu_dm_plane_modifier_gfx9_swizzle_mode(modifier); - if (amdgpu_dm_plane_modifier_has_dcc(modifier) && !force_disable_dcc) { + if (amdgpu_dm_plane_modifier_has_dcc(modifier)) { uint64_t dcc_address = afb->address + afb->base.offsets[1]; bool independent_64b_blks = AMD_FMT_MOD_GET(DCC_INDEPENDENT_64B, modifier); bool independent_128b_blks = AMD_FMT_MOD_GET(DCC_INDEPENDENT_128B, modifier); @@ -360,8 +360,7 @@ static int amdgpu_dm_plane_fill_gfx12_plane_attributes_from_modifiers(struct amd const struct plane_size *plane_size, union dc_tiling_info *tiling_info, struct dc_plane_dcc_param *dcc, - struct dc_plane_address *address, - const bool force_disable_dcc) + struct dc_plane_address *address) { const uint64_t modifier = afb->base.modifier; int ret = 0; @@ -371,7 +370,7 @@ static int amdgpu_dm_plane_fill_gfx12_plane_attributes_from_modifiers(struct amd tiling_info->gfx9.swizzle = amdgpu_dm_plane_modifier_gfx9_swizzle_mode(modifier); - if (amdgpu_dm_plane_modifier_has_dcc(modifier) && !force_disable_dcc) { + if (amdgpu_dm_plane_modifier_has_dcc(modifier)) { int max_compressed_block = AMD_FMT_MOD_GET(DCC_MAX_COMPRESSED_BLOCK, modifier); dcc->enable = 1; @@ -839,8 +838,7 @@ int amdgpu_dm_plane_fill_plane_buffer_attributes(struct amdgpu_device *adev, struct plane_size *plane_size, struct dc_plane_dcc_param *dcc, struct dc_plane_address *address, - bool tmz_surface, - bool force_disable_dcc) + bool tmz_surface) { const struct drm_framebuffer *fb = &afb->base; int ret; @@ -900,16 +898,14 @@ int amdgpu_dm_plane_fill_plane_buffer_attributes(struct amdgpu_device *adev, ret = amdgpu_dm_plane_fill_gfx12_plane_attributes_from_modifiers(adev, afb, format, rotation, plane_size, tiling_info, dcc, - address, - force_disable_dcc); + address); if (ret) return ret; } else if (adev->family >= AMDGPU_FAMILY_AI) { ret = amdgpu_dm_plane_fill_gfx9_plane_attributes_from_modifiers(adev, afb, format, rotation, plane_size, tiling_info, dcc, - address, - force_disable_dcc); + address); if (ret) return ret; } else { @@ -1000,14 +996,13 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane, dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { struct dc_plane_state *plane_state = dm_plane_state_new->dc_state; - bool force_disable_dcc = !plane_state->dcc.enable; amdgpu_dm_plane_fill_plane_buffer_attributes( adev, afb, plane_state->format, plane_state->rotation, afb->tiling_flags, &plane_state->tiling_info, &plane_state->plane_size, &plane_state->dcc, &plane_state->address, - afb->tmz_surface, force_disable_dcc); + afb->tmz_surface); } return 0; @@ -1421,6 +1416,20 @@ static void amdgpu_dm_plane_atomic_async_update(struct drm_plane *plane, amdgpu_dm_plane_handle_cursor_update(plane, old_state); } +static void amdgpu_dm_plane_panic_flush(struct drm_plane *plane) +{ + struct dm_plane_state *dm_plane_state = to_dm_plane_state(plane->state); + struct drm_framebuffer *fb = plane->state->fb; + struct dc_plane_state *dc_plane_state; + + if (!dm_plane_state || !dm_plane_state->dc_state) + return; + + dc_plane_state = dm_plane_state->dc_state; + + dc_plane_force_update_for_panic(dc_plane_state, fb->modifier ? true : false); +} + static const struct drm_plane_helper_funcs dm_plane_helper_funcs = { .prepare_fb = amdgpu_dm_plane_helper_prepare_fb, .cleanup_fb = amdgpu_dm_plane_helper_cleanup_fb, @@ -1429,6 +1438,16 @@ static const struct drm_plane_helper_funcs dm_plane_helper_funcs = { .atomic_async_update = amdgpu_dm_plane_atomic_async_update }; +static const struct drm_plane_helper_funcs dm_primary_plane_helper_funcs = { + .prepare_fb = amdgpu_dm_plane_helper_prepare_fb, + .cleanup_fb = amdgpu_dm_plane_helper_cleanup_fb, + .atomic_check = amdgpu_dm_plane_atomic_check, + .atomic_async_check = amdgpu_dm_plane_atomic_async_check, + .atomic_async_update = amdgpu_dm_plane_atomic_async_update, + .get_scanout_buffer = amdgpu_display_get_scanout_buffer, + .panic_flush = amdgpu_dm_plane_panic_flush, +}; + static void amdgpu_dm_plane_drm_plane_reset(struct drm_plane *plane) { struct dm_plane_state *amdgpu_state = NULL; @@ -1855,7 +1874,10 @@ int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm, plane->type != DRM_PLANE_TYPE_CURSOR) drm_plane_enable_fb_damage_clips(plane); - drm_plane_helper_add(plane, &dm_plane_helper_funcs); + if (plane->type == DRM_PLANE_TYPE_PRIMARY) + drm_plane_helper_add(plane, &dm_primary_plane_helper_funcs); + else + drm_plane_helper_add(plane, &dm_plane_helper_funcs); #ifdef AMD_PRIVATE_COLOR dm_atomic_plane_attach_color_mgmt_properties(dm, plane); diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.h index 6498359bff6f68725d488e6dce88ce8da8e8a644..2eef13b1c05a4b80d1b85cf9038c47977ec5f542 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.h @@ -51,8 +51,7 @@ int amdgpu_dm_plane_fill_plane_buffer_attributes(struct amdgpu_device *adev, struct plane_size *plane_size, struct dc_plane_dcc_param *dcc, struct dc_plane_address *address, - bool tmz_surface, - bool force_disable_dcc); + bool tmz_surface); int amdgpu_dm_plane_init(struct amdgpu_display_manager *dm, struct drm_plane *plane, diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile index ab1132bc896a3264907dbdc3a0599ef8b9a2e559..d9955c5d2e5ed59d0ae3105d7e56a9adbe2bde3d 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/Makefile @@ -174,7 +174,7 @@ AMD_DISPLAY_FILES += $(AMD_DAL_CLK_MGR_DCN32) ############################################################################### # DCN35 ############################################################################### -CLK_MGR_DCN35 = dcn35_smu.o dcn35_clk_mgr.o +CLK_MGR_DCN35 = dcn35_smu.o dcn351_clk_mgr.o dcn35_clk_mgr.o AMD_DAL_CLK_MGR_DCN35 = $(addprefix $(AMDDALPATH)/dc/clk_mgr/dcn35/,$(CLK_MGR_DCN35)) diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c index 0e243f4344d050e9199c36a837dea9ca038021e1..4c3e58c730b11c23af96e05e5f3df319f32ecf07 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c @@ -355,8 +355,11 @@ struct clk_mgr *dc_clk_mgr_create(struct dc_context *ctx, struct pp_smu_funcs *p BREAK_TO_DEBUGGER(); return NULL; } + if (ctx->dce_version == DCN_VERSION_3_51) + dcn351_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); + else + dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); - dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); return &clk_mgr->base.base; } break; diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn201/dcn201_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn201/dcn201_clk_mgr.c index 7920f6f1aa6215b438c3fda60123db4b306c77e2..76c612ecfe3cdb311a57f1f1f3095ad5095653a0 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn201/dcn201_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn201/dcn201_clk_mgr.c @@ -34,8 +34,8 @@ #include "dm_services.h" #include "cyan_skillfish_ip_offset.h" -#include "dcn/dcn_2_0_3_offset.h" -#include "dcn/dcn_2_0_3_sh_mask.h" +#include "dcn/dcn_2_0_1_offset.h" +#include "dcn/dcn_2_0_1_sh_mask.h" #include "clk/clk_11_0_1_offset.h" #include "clk/clk_11_0_1_sh_mask.h" diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c new file mode 100644 index 0000000000000000000000000000000000000000..6a6ae618650b6de429963a0daad76d18ea00b7cb --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn351_clk_mgr.c @@ -0,0 +1,140 @@ +/* + * Copyright 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + * Authors: AMD + * + */ + +#include "core_types.h" +#include "dcn35_clk_mgr.h" + +#define DCN_BASE__INST0_SEG1 0x000000C0 +#define mmCLK1_CLK_PLL_REQ 0x16E37 + +#define mmCLK1_CLK0_DFS_CNTL 0x16E69 +#define mmCLK1_CLK1_DFS_CNTL 0x16E6C +#define mmCLK1_CLK2_DFS_CNTL 0x16E6F +#define mmCLK1_CLK3_DFS_CNTL 0x16E72 +#define mmCLK1_CLK4_DFS_CNTL 0x16E75 +#define mmCLK1_CLK5_DFS_CNTL 0x16E78 + +#define mmCLK1_CLK0_CURRENT_CNT 0x16EFC +#define mmCLK1_CLK1_CURRENT_CNT 0x16EFD +#define mmCLK1_CLK2_CURRENT_CNT 0x16EFE +#define mmCLK1_CLK3_CURRENT_CNT 0x16EFF +#define mmCLK1_CLK4_CURRENT_CNT 0x16F00 +#define mmCLK1_CLK5_CURRENT_CNT 0x16F01 + +#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A +#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93 +#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C +#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5 +#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE +#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7 + +#define mmCLK1_CLK0_DS_CNTL 0x16E83 +#define mmCLK1_CLK1_DS_CNTL 0x16E8C +#define mmCLK1_CLK2_DS_CNTL 0x16E95 +#define mmCLK1_CLK3_DS_CNTL 0x16E9E +#define mmCLK1_CLK4_DS_CNTL 0x16EA7 +#define mmCLK1_CLK5_DS_CNTL 0x16EB0 + +#define mmCLK1_CLK0_ALLOW_DS 0x16E84 +#define mmCLK1_CLK1_ALLOW_DS 0x16E8D +#define mmCLK1_CLK2_ALLOW_DS 0x16E96 +#define mmCLK1_CLK3_ALLOW_DS 0x16E9F +#define mmCLK1_CLK4_ALLOW_DS 0x16EA8 +#define mmCLK1_CLK5_ALLOW_DS 0x16EB1 + +#define mmCLK5_spll_field_8 0x1B04B +#define mmDENTIST_DISPCLK_CNTL 0x0124 +#define regDENTIST_DISPCLK_CNTL 0x0064 +#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1 + +#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 +#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc +#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 +#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL +#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L +#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L + +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L + +// DENTIST_DISPCLK_CNTL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L + +#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L + +#define REG(reg) \ + (clk_mgr->regs->reg) + +#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg + +#define BASE(seg) BASE_INNER(seg) + +#define SR(reg_name)\ + .reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \ + reg ## reg_name + +#define CLK_SR_DCN35(reg_name)\ + .reg_name = mm ## reg_name + +static const struct clk_mgr_registers clk_mgr_regs_dcn351 = { + CLK_REG_LIST_DCN35() +}; + +static const struct clk_mgr_shift clk_mgr_shift_dcn351 = { + CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT) +}; + +static const struct clk_mgr_mask clk_mgr_mask_dcn351 = { + CLK_COMMON_MASK_SH_LIST_DCN32(_MASK) +}; + +#define TO_CLK_MGR_DCN35(clk_mgr)\ + container_of(clk_mgr, struct clk_mgr_dcn35, base) + + +void dcn351_clk_mgr_construct( + struct dc_context *ctx, + struct clk_mgr_dcn35 *clk_mgr, + struct pp_smu_funcs *pp_smu, + struct dccg *dccg) +{ + /*register offset changed*/ + clk_mgr->base.regs = &clk_mgr_regs_dcn351; + clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn351; + clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn351; + + dcn35_clk_mgr_construct(ctx, clk_mgr, pp_smu, dccg); + +} + + diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c index b77333817f18954ca8abe1e03760a52953701b07..2a74140d7ebff47bfcd9383f22ab6610ea535912 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c @@ -36,15 +36,11 @@ #include "dcn20/dcn20_clk_mgr.h" - - #include "reg_helper.h" #include "core_types.h" #include "dcn35_smu.h" #include "dm_helpers.h" -/* TODO: remove this include once we ported over remaining clk mgr functions*/ -#include "dcn30/dcn30_clk_mgr.h" #include "dcn31/dcn31_clk_mgr.h" #include "dc_dmub_srv.h" @@ -55,35 +51,102 @@ #define DC_LOGGER \ clk_mgr->base.base.ctx->logger +#define DCN_BASE__INST0_SEG1 0x000000C0 +#define mmCLK1_CLK_PLL_REQ 0x16E37 + +#define mmCLK1_CLK0_DFS_CNTL 0x16E69 +#define mmCLK1_CLK1_DFS_CNTL 0x16E6C +#define mmCLK1_CLK2_DFS_CNTL 0x16E6F +#define mmCLK1_CLK3_DFS_CNTL 0x16E72 +#define mmCLK1_CLK4_DFS_CNTL 0x16E75 +#define mmCLK1_CLK5_DFS_CNTL 0x16E78 + +#define mmCLK1_CLK0_CURRENT_CNT 0x16EFB +#define mmCLK1_CLK1_CURRENT_CNT 0x16EFC +#define mmCLK1_CLK2_CURRENT_CNT 0x16EFD +#define mmCLK1_CLK3_CURRENT_CNT 0x16EFE +#define mmCLK1_CLK4_CURRENT_CNT 0x16EFF +#define mmCLK1_CLK5_CURRENT_CNT 0x16F00 + +#define mmCLK1_CLK0_BYPASS_CNTL 0x16E8A +#define mmCLK1_CLK1_BYPASS_CNTL 0x16E93 +#define mmCLK1_CLK2_BYPASS_CNTL 0x16E9C +#define mmCLK1_CLK3_BYPASS_CNTL 0x16EA5 +#define mmCLK1_CLK4_BYPASS_CNTL 0x16EAE +#define mmCLK1_CLK5_BYPASS_CNTL 0x16EB7 + +#define mmCLK1_CLK0_DS_CNTL 0x16E83 +#define mmCLK1_CLK1_DS_CNTL 0x16E8C +#define mmCLK1_CLK2_DS_CNTL 0x16E95 +#define mmCLK1_CLK3_DS_CNTL 0x16E9E +#define mmCLK1_CLK4_DS_CNTL 0x16EA7 +#define mmCLK1_CLK5_DS_CNTL 0x16EB0 + +#define mmCLK1_CLK0_ALLOW_DS 0x16E84 +#define mmCLK1_CLK1_ALLOW_DS 0x16E8D +#define mmCLK1_CLK2_ALLOW_DS 0x16E96 +#define mmCLK1_CLK3_ALLOW_DS 0x16E9F +#define mmCLK1_CLK4_ALLOW_DS 0x16EA8 +#define mmCLK1_CLK5_ALLOW_DS 0x16EB1 + +#define mmCLK5_spll_field_8 0x1B04B +#define mmDENTIST_DISPCLK_CNTL 0x0124 +#define regDENTIST_DISPCLK_CNTL 0x0064 +#define regDENTIST_DISPCLK_CNTL_BASE_IDX 1 + +#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 +#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc +#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 +#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL +#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L +#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L + +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L +#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L +// DENTIST_DISPCLK_CNTL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER__SHIFT 0x0 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER__SHIFT 0x8 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE__SHIFT 0x13 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE__SHIFT 0x14 +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER__SHIFT 0x18 +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_WDIVIDER_MASK 0x0000007FL +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_RDIVIDER_MASK 0x00007F00L +#define DENTIST_DISPCLK_CNTL__DENTIST_DISPCLK_CHG_DONE_MASK 0x00080000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_CHG_DONE_MASK 0x00100000L +#define DENTIST_DISPCLK_CNTL__DENTIST_DPPCLK_WDIVIDER_MASK 0x7F000000L + +#define CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L + +#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0 +#undef FN +#define FN(reg_name, field_name) \ + clk_mgr->clk_mgr_shift->field_name, clk_mgr->clk_mgr_mask->field_name -#define regCLK1_CLK_PLL_REQ 0x0237 -#define regCLK1_CLK_PLL_REQ_BASE_IDX 0 +#define REG(reg) \ + (clk_mgr->regs->reg) -#define CLK1_CLK_PLL_REQ__FbMult_int__SHIFT 0x0 -#define CLK1_CLK_PLL_REQ__PllSpineDiv__SHIFT 0xc -#define CLK1_CLK_PLL_REQ__FbMult_frac__SHIFT 0x10 -#define CLK1_CLK_PLL_REQ__FbMult_int_MASK 0x000001FFL -#define CLK1_CLK_PLL_REQ__PllSpineDiv_MASK 0x0000F000L -#define CLK1_CLK_PLL_REQ__FbMult_frac_MASK 0xFFFF0000L +#define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg -#define regCLK1_CLK2_BYPASS_CNTL 0x029c -#define regCLK1_CLK2_BYPASS_CNTL_BASE_IDX 0 +#define BASE(seg) BASE_INNER(seg) -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL__SHIFT 0x0 -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV__SHIFT 0x10 -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK 0x00000007L -#define CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_DIV_MASK 0x000F0000L +#define SR(reg_name)\ + .reg_name = BASE(reg ## reg_name ## _BASE_IDX) + \ + reg ## reg_name -#define regCLK5_0_CLK5_spll_field_8 0x464b -#define regCLK5_0_CLK5_spll_field_8_BASE_IDX 0 +#define CLK_SR_DCN35(reg_name)\ + .reg_name = mm ## reg_name -#define CLK5_0_CLK5_spll_field_8__spll_ssc_en__SHIFT 0xd -#define CLK5_0_CLK5_spll_field_8__spll_ssc_en_MASK 0x00002000L +static const struct clk_mgr_registers clk_mgr_regs_dcn35 = { + CLK_REG_LIST_DCN35() +}; -#define SMU_VER_THRESHOLD 0x5D4A00 //93.74.0 +static const struct clk_mgr_shift clk_mgr_shift_dcn35 = { + CLK_COMMON_MASK_SH_LIST_DCN32(__SHIFT) +}; -#define REG(reg_name) \ - (ctx->clk_reg_offsets[reg ## reg_name ## _BASE_IDX] + reg ## reg_name) +static const struct clk_mgr_mask clk_mgr_mask_dcn35 = { + CLK_COMMON_MASK_SH_LIST_DCN32(_MASK) +}; #define TO_CLK_MGR_DCN35(clk_mgr)\ container_of(clk_mgr, struct clk_mgr_dcn35, base) @@ -452,7 +515,6 @@ static int get_vco_frequency_from_reg(struct clk_mgr_internal *clk_mgr) struct fixed31_32 pll_req; unsigned int fbmult_frac_val = 0; unsigned int fbmult_int_val = 0; - struct dc_context *ctx = clk_mgr->base.ctx; /* * Register value of fbmult is in 8.16 format, we are converting to 314.32 @@ -512,12 +574,12 @@ static void dcn35_dump_clk_registers(struct clk_state_registers_and_bypass *regs static bool dcn35_is_spll_ssc_enabled(struct clk_mgr *clk_mgr_base) { struct clk_mgr_internal *clk_mgr = TO_CLK_MGR_INTERNAL(clk_mgr_base); - struct dc_context *ctx = clk_mgr->base.ctx; + uint32_t ssc_enable; - REG_GET(CLK5_0_CLK5_spll_field_8, spll_ssc_en, &ssc_enable); + ssc_enable = REG_READ(CLK5_spll_field_8) & CLK5_spll_field_8__spll_ssc_en_MASK; - return ssc_enable == 1; + return ssc_enable != 0; } static void init_clk_states(struct clk_mgr *clk_mgr) @@ -632,6 +694,7 @@ static struct wm_table lpddr5_wm_table = { }; static DpmClocks_t_dcn35 dummy_clocks; +static DpmClocks_t_dcn351 dummy_clocks_dcn351; static struct dcn35_watermarks dummy_wms = { 0 }; @@ -642,10 +705,10 @@ static struct dcn35_ss_info_table ss_info_table = { static void dcn35_read_ss_info_from_lut(struct clk_mgr_internal *clk_mgr) { - struct dc_context *ctx = clk_mgr->base.ctx; - uint32_t clock_source; + uint32_t clock_source = 0; + + clock_source = REG_READ(CLK1_CLK2_BYPASS_CNTL) & CLK1_CLK2_BYPASS_CNTL__CLK2_BYPASS_SEL_MASK; - REG_GET(CLK1_CLK2_BYPASS_CNTL, CLK2_BYPASS_SEL, &clock_source); // If it's DFS mode, clock_source is 0. if (dcn35_is_spll_ssc_enabled(&clk_mgr->base) && (clock_source < ARRAY_SIZE(ss_info_table.ss_percentage))) { clk_mgr->dprefclk_ss_percentage = ss_info_table.ss_percentage[clock_source]; @@ -755,6 +818,22 @@ static void dcn35_get_dpm_table_from_smu(struct clk_mgr_internal *clk_mgr, dcn35_smu_transfer_dpm_table_smu_2_dram(clk_mgr); } +static void dcn351_get_dpm_table_from_smu(struct clk_mgr_internal *clk_mgr, + struct dcn351_smu_dpm_clks *smu_dpm_clks) +{ + DpmClocks_t_dcn351 *table = smu_dpm_clks->dpm_clks; + + if (!clk_mgr->smu_ver) + return; + if (!table || smu_dpm_clks->mc_address.quad_part == 0) + return; + memset(table, 0, sizeof(*table)); + dcn35_smu_set_dram_addr_high(clk_mgr, + smu_dpm_clks->mc_address.high_part); + dcn35_smu_set_dram_addr_low(clk_mgr, + smu_dpm_clks->mc_address.low_part); + dcn35_smu_transfer_dpm_table_smu_2_dram(clk_mgr); +} static uint32_t find_max_clk_value(const uint32_t clocks[], uint32_t num_clocks) { uint32_t max = 0; @@ -1093,6 +1172,57 @@ struct clk_mgr_funcs dcn35_fpga_funcs = { .get_dtb_ref_clk_frequency = dcn31_get_dtb_ref_freq_khz, }; +static void translate_to_DpmClocks_t_dcn35(struct dcn351_smu_dpm_clks *smu_dpm_clks_a, + struct dcn35_smu_dpm_clks *smu_dpm_clks_b) +{ + /*translate two structures and only take need clock tables*/ + uint8_t i; + + if (smu_dpm_clks_a == NULL || smu_dpm_clks_b == NULL || + smu_dpm_clks_a->dpm_clks == NULL || smu_dpm_clks_b->dpm_clks == NULL) + return; + + for (i = 0; i < NUM_DCFCLK_DPM_LEVELS; i++) + smu_dpm_clks_b->dpm_clks->DcfClocks[i] = smu_dpm_clks_a->dpm_clks->DcfClocks[i]; + + for (i = 0; i < NUM_DISPCLK_DPM_LEVELS; i++) + smu_dpm_clks_b->dpm_clks->DispClocks[i] = smu_dpm_clks_a->dpm_clks->DispClocks[i]; + + for (i = 0; i < NUM_DPPCLK_DPM_LEVELS; i++) + smu_dpm_clks_b->dpm_clks->DppClocks[i] = smu_dpm_clks_a->dpm_clks->DppClocks[i]; + + for (i = 0; i < NUM_FCLK_DPM_LEVELS; i++) { + smu_dpm_clks_b->dpm_clks->FclkClocks_Freq[i] = smu_dpm_clks_a->dpm_clks->FclkClocks_Freq[i]; + smu_dpm_clks_b->dpm_clks->FclkClocks_Voltage[i] = smu_dpm_clks_a->dpm_clks->FclkClocks_Voltage[i]; + } + for (i = 0; i < NUM_MEM_PSTATE_LEVELS; i++) { + smu_dpm_clks_b->dpm_clks->MemPstateTable[i].MemClk = + smu_dpm_clks_a->dpm_clks->MemPstateTable[i].MemClk; + smu_dpm_clks_b->dpm_clks->MemPstateTable[i].UClk = + smu_dpm_clks_a->dpm_clks->MemPstateTable[i].UClk; + smu_dpm_clks_b->dpm_clks->MemPstateTable[i].Voltage = + smu_dpm_clks_a->dpm_clks->MemPstateTable[i].Voltage; + smu_dpm_clks_b->dpm_clks->MemPstateTable[i].WckRatio = + smu_dpm_clks_a->dpm_clks->MemPstateTable[i].WckRatio; + } + smu_dpm_clks_b->dpm_clks->MaxGfxClk = smu_dpm_clks_a->dpm_clks->MaxGfxClk; + smu_dpm_clks_b->dpm_clks->MinGfxClk = smu_dpm_clks_a->dpm_clks->MinGfxClk; + smu_dpm_clks_b->dpm_clks->NumDcfClkLevelsEnabled = + smu_dpm_clks_a->dpm_clks->NumDcfClkLevelsEnabled; + smu_dpm_clks_b->dpm_clks->NumDispClkLevelsEnabled = + smu_dpm_clks_a->dpm_clks->NumDispClkLevelsEnabled; + smu_dpm_clks_b->dpm_clks->NumFclkLevelsEnabled = + smu_dpm_clks_a->dpm_clks->NumFclkLevelsEnabled; + smu_dpm_clks_b->dpm_clks->NumMemPstatesEnabled = + smu_dpm_clks_a->dpm_clks->NumMemPstatesEnabled; + smu_dpm_clks_b->dpm_clks->NumSocClkLevelsEnabled = + smu_dpm_clks_a->dpm_clks->NumSocClkLevelsEnabled; + + for (i = 0; i < NUM_SOC_VOLTAGE_LEVELS; i++) { + smu_dpm_clks_b->dpm_clks->SocClocks[i] = smu_dpm_clks_a->dpm_clks->SocClocks[i]; + smu_dpm_clks_b->dpm_clks->SocVoltage[i] = smu_dpm_clks_a->dpm_clks->SocVoltage[i]; + } +} void dcn35_clk_mgr_construct( struct dc_context *ctx, struct clk_mgr_dcn35 *clk_mgr, @@ -1100,6 +1230,7 @@ void dcn35_clk_mgr_construct( struct dccg *dccg) { struct dcn35_smu_dpm_clks smu_dpm_clks = { 0 }; + struct dcn351_smu_dpm_clks smu_dpm_clks_dcn351 = { 0 }; clk_mgr->base.base.ctx = ctx; clk_mgr->base.base.funcs = &dcn35_funcs; @@ -1112,6 +1243,12 @@ void dcn35_clk_mgr_construct( clk_mgr->base.dprefclk_ss_divider = 1000; clk_mgr->base.ss_on_dprefclk = false; clk_mgr->base.dfs_ref_freq_khz = 48000; + if (ctx->dce_version == DCN_VERSION_3_5) { + clk_mgr->base.regs = &clk_mgr_regs_dcn35; + clk_mgr->base.clk_mgr_shift = &clk_mgr_shift_dcn35; + clk_mgr->base.clk_mgr_mask = &clk_mgr_mask_dcn35; + } + clk_mgr->smu_wm_set.wm_set = (struct dcn35_watermarks *)dm_helpers_allocate_gpu_mem( clk_mgr->base.base.ctx, @@ -1130,14 +1267,24 @@ void dcn35_clk_mgr_construct( DC_MEM_ALLOC_TYPE_GART, sizeof(DpmClocks_t_dcn35), &smu_dpm_clks.mc_address.quad_part); - if (smu_dpm_clks.dpm_clks == NULL) { smu_dpm_clks.dpm_clks = &dummy_clocks; smu_dpm_clks.mc_address.quad_part = 0; } - ASSERT(smu_dpm_clks.dpm_clks); + if (ctx->dce_version == DCN_VERSION_3_51) { + smu_dpm_clks_dcn351.dpm_clks = (DpmClocks_t_dcn351 *)dm_helpers_allocate_gpu_mem( + clk_mgr->base.base.ctx, + DC_MEM_ALLOC_TYPE_GART, + sizeof(DpmClocks_t_dcn351), + &smu_dpm_clks_dcn351.mc_address.quad_part); + if (smu_dpm_clks_dcn351.dpm_clks == NULL) { + smu_dpm_clks_dcn351.dpm_clks = &dummy_clocks_dcn351; + smu_dpm_clks_dcn351.mc_address.quad_part = 0; + } + } + clk_mgr->base.smu_ver = dcn35_smu_get_smu_version(&clk_mgr->base); if (clk_mgr->base.smu_ver) @@ -1166,7 +1313,11 @@ void dcn35_clk_mgr_construct( if (clk_mgr->base.base.ctx->dc->debug.pstate_enabled) { int i; - dcn35_get_dpm_table_from_smu(&clk_mgr->base, &smu_dpm_clks); + if (ctx->dce_version == DCN_VERSION_3_51) { + dcn351_get_dpm_table_from_smu(&clk_mgr->base, &smu_dpm_clks_dcn351); + translate_to_DpmClocks_t_dcn35(&smu_dpm_clks_dcn351, &smu_dpm_clks); + } else + dcn35_get_dpm_table_from_smu(&clk_mgr->base, &smu_dpm_clks); DC_LOG_SMU("NumDcfClkLevelsEnabled: %d\n" "NumDispClkLevelsEnabled: %d\n" "NumSocClkLevelsEnabled: %d\n" @@ -1227,6 +1378,10 @@ void dcn35_clk_mgr_construct( dm_helpers_free_gpu_mem(clk_mgr->base.base.ctx, DC_MEM_ALLOC_TYPE_GART, smu_dpm_clks.dpm_clks); + if (smu_dpm_clks_dcn351.dpm_clks && smu_dpm_clks_dcn351.mc_address.quad_part != 0) + dm_helpers_free_gpu_mem(clk_mgr->base.base.ctx, DC_MEM_ALLOC_TYPE_GART, + smu_dpm_clks_dcn351.dpm_clks); + if (ctx->dc->config.disable_ips != DMUB_IPS_DISABLE_ALL) { bool ips_support = false; diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h index 1203dc605b12c434807fe7e96d4c2a5ee7f7d558..a12a9bf90806edf9ddaeccd3423e563cd0955793 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.h @@ -60,4 +60,8 @@ void dcn35_clk_mgr_construct(struct dc_context *ctx, void dcn35_clk_mgr_destroy(struct clk_mgr_internal *clk_mgr_int); +void dcn351_clk_mgr_construct(struct dc_context *ctx, + struct clk_mgr_dcn35 *clk_mgr, + struct pp_smu_funcs *pp_smu, + struct dccg *dccg); #endif //__DCN35_CLK_MGR_H__ diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.h index 3fae13c73934ce2fd84d8ad67a971107fbe5b06f..ab9d21ba0c43cda3eb9cfc489b5ac279c72015aa 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.h +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.h @@ -126,18 +126,31 @@ typedef struct { uint32_t MaxGfxClk; } DpmClocks_t_dcn35; - -// Throttler Status Bitmask - - - - - - - - - - +typedef struct { + uint32_t DcfClocks[NUM_DCFCLK_DPM_LEVELS]; + uint32_t DispClocks[NUM_DISPCLK_DPM_LEVELS]; + uint32_t DppClocks[NUM_DPPCLK_DPM_LEVELS]; + uint32_t SocClocks[NUM_SOCCLK_DPM_LEVELS]; + uint32_t VClocks0[NUM_VCN_DPM_LEVELS]; + uint32_t VClocks1[NUM_VCN_DPM_LEVELS]; + uint32_t DClocks0[NUM_VCN_DPM_LEVELS]; + uint32_t DClocks1[NUM_VCN_DPM_LEVELS]; + uint32_t VPEClocks[NUM_VPE_DPM_LEVELS]; + uint32_t FclkClocks_Freq[NUM_FCLK_DPM_LEVELS]; + uint32_t FclkClocks_Voltage[NUM_FCLK_DPM_LEVELS]; + uint32_t SocVoltage[NUM_SOC_VOLTAGE_LEVELS]; + MemPstateTable_t MemPstateTable[NUM_MEM_PSTATE_LEVELS]; + uint8_t NumDcfClkLevelsEnabled; + uint8_t NumDispClkLevelsEnabled; // Applies to both Dispclk and Dppclk + uint8_t NumSocClkLevelsEnabled; + uint8_t Vcn0ClkLevelsEnabled; // Applies to both Vclk0 and Dclk0 + uint8_t Vcn1ClkLevelsEnabled; // Applies to both Vclk1 and Dclk1 + uint8_t VpeClkLevelsEnabled; + uint8_t NumMemPstatesEnabled; + uint8_t NumFclkLevelsEnabled; + uint32_t MinGfxClk; + uint32_t MaxGfxClk; +} DpmClocks_t_dcn351; #define TABLE_BIOS_IF 0 // Called by BIOS #define TABLE_WATERMARKS 1 // Called by DAL through VBIOS @@ -163,6 +176,10 @@ struct dcn35_smu_dpm_clks { union large_integer mc_address; }; +struct dcn351_smu_dpm_clks { + DpmClocks_t_dcn351 *dpm_clks; + union large_integer mc_address; +}; /* TODO: taken from vgh, may not be correct */ struct display_idle_optimization { unsigned int df_request_disabled : 1; diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c index 8cfc5f4359374d1d93b5b66405d2c71f630fcc0c..5b4e1e8a9ae204e9479ec8dc459bac5f2c3c8e77 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.c @@ -141,6 +141,20 @@ static bool dcn401_is_ppclk_idle_dpm_enabled(struct clk_mgr_internal *clk_mgr, P return ppclk_idle_dpm_enabled; } +static bool dcn401_is_df_throttle_opt_enabled(struct clk_mgr_internal *clk_mgr) +{ + bool is_df_throttle_opt_enabled = false; + + if (ASICREV_IS_GC_12_0_1_A0(clk_mgr->base.ctx->asic_id.hw_internal_rev) && + clk_mgr->smu_ver >= 0x663500) { + is_df_throttle_opt_enabled = !clk_mgr->base.ctx->dc->debug.force_subvp_df_throttle; + } + + is_df_throttle_opt_enabled &= clk_mgr->smu_present; + + return is_df_throttle_opt_enabled; +} + /* Query SMU for all clock states for a particular clock */ static void dcn401_init_single_clock(struct clk_mgr_internal *clk_mgr, PPCLK_e clk, unsigned int *entry_0, unsigned int *num_levels) @@ -869,6 +883,12 @@ static void dcn401_execute_block_sequence(struct clk_mgr *clk_mgr_base, unsigned params->update_idle_hardmin_params.uclk_mhz, params->update_idle_hardmin_params.fclk_mhz); break; + case CLK_MGR401_UPDATE_SUBVP_HARDMINS: + dcn401_smu_set_subvp_uclk_fclk_hardmin( + clk_mgr_internal, + params->update_idle_hardmin_params.uclk_mhz, + params->update_idle_hardmin_params.fclk_mhz); + break; case CLK_MGR401_UPDATE_DEEP_SLEEP_DCFCLK: dcn401_smu_set_min_deep_sleep_dcef_clk( clk_mgr_internal, @@ -945,15 +965,21 @@ static unsigned int dcn401_build_update_bandwidth_clocks_sequence( bool update_active_uclk = false; bool update_idle_fclk = false; bool update_idle_uclk = false; + bool update_subvp_prefetch_dramclk = false; + bool update_subvp_prefetch_fclk = false; bool is_idle_dpm_enabled = dcn401_is_ppclk_dpm_enabled(clk_mgr_internal, PPCLK_UCLK) && dcn401_is_ppclk_dpm_enabled(clk_mgr_internal, PPCLK_FCLK) && dcn401_is_ppclk_idle_dpm_enabled(clk_mgr_internal, PPCLK_UCLK) && dcn401_is_ppclk_idle_dpm_enabled(clk_mgr_internal, PPCLK_FCLK); + bool is_df_throttle_opt_enabled = is_idle_dpm_enabled && + dcn401_is_df_throttle_opt_enabled(clk_mgr_internal); int total_plane_count = clk_mgr_helper_get_active_plane_cnt(dc, context); int active_uclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.dramclk_khz); int active_fclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.fclk_khz); int idle_uclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.idle_dramclk_khz); int idle_fclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.idle_fclk_khz); + int subvp_prefetch_dramclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.subvp_prefetch_dramclk_khz); + int subvp_prefetch_fclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.subvp_prefetch_fclk_khz); unsigned int num_steps = 0; @@ -1109,6 +1135,12 @@ static unsigned int dcn401_build_update_bandwidth_clocks_sequence( } } + if (should_set_clock(safe_to_lower, new_clocks->subvp_prefetch_dramclk_khz, clk_mgr_base->clks.subvp_prefetch_dramclk_khz)) { + clk_mgr_base->clks.subvp_prefetch_dramclk_khz = new_clocks->subvp_prefetch_dramclk_khz; + update_subvp_prefetch_dramclk = true; + subvp_prefetch_dramclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.subvp_prefetch_dramclk_khz); + } + /* FCLK */ /* Always update saved value, even if new value not set due to P-State switching unsupported */ if (should_set_clock(safe_to_lower, new_clocks->fclk_khz, clk_mgr_base->clks.fclk_khz)) { @@ -1129,6 +1161,12 @@ static unsigned int dcn401_build_update_bandwidth_clocks_sequence( } } + if (should_set_clock(safe_to_lower, new_clocks->subvp_prefetch_fclk_khz, clk_mgr_base->clks.subvp_prefetch_fclk_khz)) { + clk_mgr_base->clks.subvp_prefetch_fclk_khz = new_clocks->subvp_prefetch_fclk_khz; + update_subvp_prefetch_fclk = true; + subvp_prefetch_fclk_mhz = khz_to_mhz_ceil(clk_mgr_base->clks.subvp_prefetch_fclk_khz); + } + /* When idle DPM is enabled, need to send active and idle hardmins separately */ /* CLK_MGR401_UPDATE_ACTIVE_HARDMINS */ if ((update_active_uclk || update_active_fclk) && is_idle_dpm_enabled) { @@ -1146,6 +1184,14 @@ static unsigned int dcn401_build_update_bandwidth_clocks_sequence( num_steps++; } + /* CLK_MGR401_UPDATE_SUBVP_HARDMINS */ + if ((update_subvp_prefetch_dramclk || update_subvp_prefetch_fclk) && is_df_throttle_opt_enabled) { + block_sequence[num_steps].params.update_idle_hardmin_params.uclk_mhz = subvp_prefetch_dramclk_mhz; + block_sequence[num_steps].params.update_idle_hardmin_params.fclk_mhz = subvp_prefetch_fclk_mhz; + block_sequence[num_steps].func = CLK_MGR401_UPDATE_SUBVP_HARDMINS; + num_steps++; + } + /* set UCLK to requested value if P-State switching is supported, or to re-enable P-State switching */ if (update_active_uclk || update_idle_uclk) { if (!is_idle_dpm_enabled) { diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.h index 8b0461992b2296353700ffefa7cc9e93850628e2..6c9ae5ca2c7e96975e384d3b76799ffd65c839d4 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.h +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr.h @@ -90,6 +90,7 @@ enum dcn401_clk_mgr_block_sequence_func { CLK_MGR401_UPDATE_DTBCLK_DTO, CLK_MGR401_UPDATE_DENTIST, CLK_MGR401_UPDATE_PSR_WAIT_LOOP, + CLK_MGR401_UPDATE_SUBVP_HARDMINS, }; struct dcn401_clk_mgr_block_sequence { diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.c index 7700477d019b080200c405274e9d2b06ec75eb1a..b02a41179b41dd938335101933213a2c2d3b4db6 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.c @@ -21,6 +21,11 @@ #define smu_print(str, ...) {DC_LOG_SMU(str, ##__VA_ARGS__); } +/* temporary define */ +#ifndef DALSMC_MSG_SubvpUclkFclk +#define DALSMC_MSG_SubvpUclkFclk 0x1B +#endif + /* * Function to be used instead of REG_WAIT macro because the wait ends when * the register is NOT EQUAL to zero, and because the translation in msg_if.h @@ -296,6 +301,24 @@ bool dcn401_smu_set_active_uclk_fclk_hardmin(struct clk_mgr_internal *clk_mgr, return success; } +bool dcn401_smu_set_subvp_uclk_fclk_hardmin(struct clk_mgr_internal *clk_mgr, + uint16_t uclk_freq_mhz, + uint16_t fclk_freq_mhz) +{ + uint32_t response = 0; + bool success; + + /* 15:0 for uclk, 32:16 for fclk */ + uint32_t param = (fclk_freq_mhz << 16) | uclk_freq_mhz; + + smu_print("SMU Set active hardmin by freq: uclk_freq_mhz = %d MHz, fclk_freq_mhz = %d MHz\n", uclk_freq_mhz, fclk_freq_mhz); + + success = dcn401_smu_send_msg_with_param(clk_mgr, + DALSMC_MSG_SubvpUclkFclk, param, &response); + + return success; +} + void dcn401_smu_set_min_deep_sleep_dcef_clk(struct clk_mgr_internal *clk_mgr, uint32_t freq_mhz) { smu_print("SMU Set min deep sleep dcef clk: freq_mhz = %d MHz\n", freq_mhz); diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.h b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.h index 651fb8d6286455f2e9e9f1aebb51852dc9ee6b55..42cf7885a7cb01500c0c595e9b898517ec9025a5 100644 --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.h +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn401/dcn401_clk_mgr_smu_msg.h @@ -23,6 +23,9 @@ bool dcn401_smu_set_idle_uclk_fclk_hardmin(struct clk_mgr_internal *clk_mgr, bool dcn401_smu_set_active_uclk_fclk_hardmin(struct clk_mgr_internal *clk_mgr, uint16_t uclk_freq_mhz, uint16_t fclk_freq_mhz); +bool dcn401_smu_set_subvp_uclk_fclk_hardmin(struct clk_mgr_internal *clk_mgr, + uint16_t uclk_freq_mhz, + uint16_t fclk_freq_mhz); void dcn401_smu_set_min_deep_sleep_dcef_clk(struct clk_mgr_internal *clk_mgr, uint32_t freq_mhz); void dcn401_smu_set_num_of_displays(struct clk_mgr_internal *clk_mgr, uint32_t num_displays); diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index 49fe7dcf9372e49d3ac32fe74d618fd6617e8cee..dfa36368ae63f616b110fbddd95e03314ea54bbe 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -579,7 +579,7 @@ dc_stream_forward_dmcu_crc_window(struct dmcu *dmcu, bool dc_stream_forward_crc_window(struct dc_stream_state *stream, - struct rect *rect, bool is_stop) + struct rect *rect, uint8_t phy_id, bool is_stop) { struct dmcu *dmcu; struct dc_dmub_srv *dmub_srv; @@ -598,7 +598,7 @@ dc_stream_forward_crc_window(struct dc_stream_state *stream, if (i == MAX_PIPES) return false; - mux_mapping.phy_output_num = stream->link->link_enc_hw_inst; + mux_mapping.phy_output_num = phy_id; mux_mapping.otg_output_num = pipe->stream_res.tg->inst; dmcu = dc->res_pool->dmcu; @@ -2153,6 +2153,11 @@ enum dc_status dc_commit_streams(struct dc *dc, struct dc_commit_streams_params struct dc_stream_state *stream = params->streams[i]; struct dc_stream_status *status = dc_stream_get_status(stream); + /* revalidate streams */ + res = dc_validate_stream(dc, stream); + if (res != DC_OK) + return res; + dc_stream_log(dc, stream); set[i].stream = stream; @@ -2982,6 +2987,10 @@ static void copy_surface_update_to_plane( if (srf_update->cursor_csc_color_matrix) surface->cursor_csc_color_matrix = *srf_update->cursor_csc_color_matrix; + + if (srf_update->bias_and_scale.bias_and_scale_valid) + surface->bias_and_scale = + srf_update->bias_and_scale; } static void copy_stream_update_to_stream(struct dc *dc, @@ -5307,11 +5316,13 @@ void dc_set_power_state(struct dc *dc, enum dc_acpi_cm_power_state power_state) dc->vm_pa_config.valid) { dc->hwss.init_sys_ctx(dc->hwseq, dc, &dc->vm_pa_config); } - + /*mark d0 last*/ + dc->power_state = power_state; break; default: ASSERT(dc->current_state->stream_count == 0); - + /*mark d3 first*/ + dc->power_state = power_state; dc_dmub_srv_notify_fw_dc_power_state(dc->ctx->dmub_srv, power_state); dc_state_destruct(dc->current_state); @@ -6056,7 +6067,7 @@ void dc_query_current_properties(struct dc *dc, struct dc_current_properties *pr bool subvp_sw_cursor_req = false; for (i = 0; i < dc->current_state->stream_count; i++) { - if (check_subvp_sw_cursor_fallback_req(dc, dc->current_state->streams[i])) { + if (check_subvp_sw_cursor_fallback_req(dc, dc->current_state->streams[i]) && !dc->current_state->streams[i]->hw_cursor_req) { subvp_sw_cursor_req = true; break; } diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c index 55dc482d9b3660d8274e37b78f5144562bc134d7..701b7e4f29204ca4d68dc1af5d3278adb61ebbee 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_stream.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_stream.c @@ -605,17 +605,6 @@ bool dc_stream_remove_writeback(struct dc *dc, return true; } -bool dc_stream_warmup_writeback(struct dc *dc, - int num_dwb, - struct dc_writeback_info *wb_info) -{ - dc_exit_ips_for_hw_access(dc); - - if (dc->hwss.mmhubbub_warmup) - return dc->hwss.mmhubbub_warmup(dc, num_dwb, wb_info); - else - return false; -} uint32_t dc_stream_get_vblank_counter(const struct dc_stream_state *stream) { uint8_t i; diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c index ccbb15f1638c8b73e45602333d8e3d42b6548f26..f3471d45b312fa22fff39ec530205d878d8b291a 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c @@ -83,13 +83,6 @@ uint8_t dc_plane_get_pipe_mask(struct dc_state *dc_state, const struct dc_plane /******************************************************************************* * Public functions ******************************************************************************/ -void enable_surface_flip_reporting(struct dc_plane_state *plane_state, - uint32_t controller_id) -{ - plane_state->irq_source = controller_id + DC_IRQ_SOURCE_PFLIP1 - 1; - /*register_flip_interrupt(surface);*/ -} - struct dc_plane_state *dc_create_plane_state(const struct dc *dc) { struct dc_plane_state *plane_state = kvzalloc(sizeof(*plane_state), @@ -277,4 +270,50 @@ void dc_3dlut_func_retain(struct dc_3dlut *lut) kref_get(&lut->refcount); } +void dc_plane_force_update_for_panic(struct dc_plane_state *plane_state, + bool clear_tiling) +{ + struct dc *dc; + int i; + + if (!plane_state) + return; + + dc = plane_state->ctx->dc; + if (!dc || !dc->current_state) + return; + + for (i = 0; i < dc->res_pool->pipe_count; i++) { + struct pipe_ctx *pipe_ctx = &dc->current_state->res_ctx.pipe_ctx[i]; + + if (!pipe_ctx) + continue; + + if (dc->ctx->dce_version >= DCE_VERSION_MAX) { + struct hubp *hubp = pipe_ctx->plane_res.hubp; + if (!hubp) + continue; + /* if framebuffer is tiled, disable tiling */ + if (clear_tiling && hubp->funcs->hubp_clear_tiling) + hubp->funcs->hubp_clear_tiling(hubp); + + /* force page flip to see the new content of the framebuffer */ + hubp->funcs->hubp_program_surface_flip_and_addr(hubp, + &plane_state->address, + true); + } else { + struct mem_input *mi = pipe_ctx->plane_res.mi; + if (!mi) + continue; + /* if framebuffer is tiled, disable tiling */ + if (clear_tiling && mi->funcs->mem_input_clear_tiling) + mi->funcs->mem_input_clear_tiling(mi); + + /* force page flip to see the new content of the framebuffer */ + mi->funcs->mem_input_program_surface_flip_and_addr(mi, + &plane_state->address, + true); + } + } +} diff --git a/drivers/gpu/drm/amd/display/dc/dc.h b/drivers/gpu/drm/amd/display/dc/dc.h index e9b9126c0401705da8e4d96fee96e18c06ea2773..8c6347413038d8fe2cba59e32c30f263aca7f418 100644 --- a/drivers/gpu/drm/amd/display/dc/dc.h +++ b/drivers/gpu/drm/amd/display/dc/dc.h @@ -55,7 +55,7 @@ struct aux_payload; struct set_config_cmd_payload; struct dmub_notification; -#define DC_VER "3.2.310" +#define DC_VER "3.2.314" #define MAX_SURFACES 3 #define MAX_PLANES 6 @@ -463,6 +463,7 @@ struct dc_config { bool enable_auto_dpm_test_logs; unsigned int disable_ips; unsigned int disable_ips_in_vpb; + bool disable_ips_in_dpms_off; bool usb4_bw_alloc_support; bool allow_0_dtb_clk; bool use_assr_psp_message; @@ -628,6 +629,8 @@ struct dc_clocks { int bw_dispclk_khz; int idle_dramclk_khz; int idle_fclk_khz; + int subvp_prefetch_dramclk_khz; + int subvp_prefetch_fclk_khz; }; struct dc_bw_validation_profile { @@ -1055,6 +1058,7 @@ struct dc_debug_options { bool dml21_force_pstate_method; uint32_t dml21_force_pstate_method_values[MAX_PIPES]; uint32_t dml21_disable_pstate_method_mask; + union fw_assisted_mclk_switch_version fams_version; union dmub_fams2_global_feature_config fams2_config; bool enable_legacy_clock_update; unsigned int force_cositing; @@ -1070,6 +1074,7 @@ struct dc_debug_options { bool skip_full_updated_if_possible; unsigned int enable_oled_edp_power_up_opt; bool enable_hblank_borrow; + bool force_subvp_df_throttle; }; @@ -1526,6 +1531,7 @@ struct dc_surface_update { const struct dc_cm2_parameters *cm2_params; const struct dc_csc_transform *cursor_csc_color_matrix; unsigned int sdr_white_level_nits; + struct dc_bias_and_scale bias_and_scale; }; /* diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c index f90fc154549a8041b1aafd24e0b2cca4f1cf8033..44ff9abe2880f362ff7a05e1c9edc3f2f13c4334 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c +++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c @@ -1245,7 +1245,7 @@ static int count_active_streams(const struct dc *dc) for (i = 0; i < dc->current_state->stream_count; ++i) { struct dc_stream_state *stream = dc->current_state->streams[i]; - if (stream && !stream->dpms_off) + if (stream && (!stream->dpms_off || dc->config.disable_ips_in_dpms_off)) count += 1; } @@ -1694,10 +1694,10 @@ void dc_dmub_srv_fams2_update_config(struct dc *dc, { uint8_t num_cmds = 1; uint32_t i; - union dmub_rb_cmd cmd[MAX_STREAMS + 1]; + union dmub_rb_cmd cmd[2 * MAX_STREAMS + 1]; struct dmub_rb_cmd_fams2 *global_cmd = &cmd[0].fams2_config; - memset(cmd, 0, sizeof(union dmub_rb_cmd) * (MAX_STREAMS + 1)); + memset(cmd, 0, sizeof(union dmub_rb_cmd) * (2 * MAX_STREAMS + 1)); /* fill in generic command header */ global_cmd->header.type = DMUB_CMD__FW_ASSISTED_MCLK_SWITCH; global_cmd->header.sub_type = DMUB_CMD__FAMS2_CONFIG; @@ -1714,17 +1714,26 @@ void dc_dmub_srv_fams2_update_config(struct dc *dc, /* construct per-stream configs */ for (i = 0; i < context->bw_ctx.bw.dcn.fams2_global_config.num_streams; i++) { - struct dmub_rb_cmd_fams2 *stream_cmd = &cmd[i+1].fams2_config; + struct dmub_rb_cmd_fams2 *stream_base_cmd = &cmd[i+1].fams2_config; + struct dmub_rb_cmd_fams2 *stream_sub_state_cmd = &cmd[i+1+context->bw_ctx.bw.dcn.fams2_global_config.num_streams].fams2_config; /* configure command header */ - stream_cmd->header.type = DMUB_CMD__FW_ASSISTED_MCLK_SWITCH; - stream_cmd->header.sub_type = DMUB_CMD__FAMS2_CONFIG; - stream_cmd->header.payload_bytes = sizeof(struct dmub_rb_cmd_fams2) - sizeof(struct dmub_cmd_header); - stream_cmd->header.multi_cmd_pending = 1; - /* copy stream static state */ - memcpy(&stream_cmd->config.stream, - &context->bw_ctx.bw.dcn.fams2_stream_params[i], - sizeof(struct dmub_fams2_stream_static_state)); + stream_base_cmd->header.type = DMUB_CMD__FW_ASSISTED_MCLK_SWITCH; + stream_base_cmd->header.sub_type = DMUB_CMD__FAMS2_CONFIG; + stream_base_cmd->header.payload_bytes = sizeof(struct dmub_rb_cmd_fams2) - sizeof(struct dmub_cmd_header); + stream_base_cmd->header.multi_cmd_pending = 1; + stream_sub_state_cmd->header.type = DMUB_CMD__FW_ASSISTED_MCLK_SWITCH; + stream_sub_state_cmd->header.sub_type = DMUB_CMD__FAMS2_CONFIG; + stream_sub_state_cmd->header.payload_bytes = sizeof(struct dmub_rb_cmd_fams2) - sizeof(struct dmub_cmd_header); + stream_sub_state_cmd->header.multi_cmd_pending = 1; + /* copy stream static base state */ + memcpy(&stream_base_cmd->config, + &context->bw_ctx.bw.dcn.fams2_stream_base_params[i], + sizeof(union dmub_cmd_fams2_config)); + /* copy stream static sub state */ + memcpy(&stream_sub_state_cmd->config, + &context->bw_ctx.bw.dcn.fams2_stream_sub_params[i], + sizeof(union dmub_cmd_fams2_config)); } } @@ -1735,8 +1744,8 @@ void dc_dmub_srv_fams2_update_config(struct dc *dc, if (enable && context->bw_ctx.bw.dcn.fams2_global_config.features.bits.enable) { /* set multi pending for global, and unset for last stream cmd */ global_cmd->header.multi_cmd_pending = 1; - cmd[context->bw_ctx.bw.dcn.fams2_global_config.num_streams].fams2_config.header.multi_cmd_pending = 0; - num_cmds += context->bw_ctx.bw.dcn.fams2_global_config.num_streams; + cmd[2 * context->bw_ctx.bw.dcn.fams2_global_config.num_streams].fams2_config.header.multi_cmd_pending = 0; + num_cmds += 2 * context->bw_ctx.bw.dcn.fams2_global_config.num_streams; } dm_execute_dmub_cmd_list(dc->ctx, num_cmds, cmd, DM_DMUB_WAIT_TYPE_WAIT); diff --git a/drivers/gpu/drm/amd/display/dc/dc_plane.h b/drivers/gpu/drm/amd/display/dc/dc_plane.h index bd37ec82b42d13bb27825ffe8b6ef67c3661ce76..fabcefeda288cf6543d123bd5861228d725f1853 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_plane.h +++ b/drivers/gpu/drm/amd/display/dc/dc_plane.h @@ -34,4 +34,7 @@ const struct dc_plane_status *dc_plane_get_status( void dc_plane_state_retain(struct dc_plane_state *plane_state); void dc_plane_state_release(struct dc_plane_state *plane_state); +void dc_plane_force_update_for_panic(struct dc_plane_state *plane_state, + bool clear_tiling); + #endif /* _DC_PLANE_H_ */ diff --git a/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c b/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c index 0e310fd48b5c528363a75fc9722319dac2bbbc8a..3518eb1b8cd1cda5c2470fb85df032a81a70aaa5 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c +++ b/drivers/gpu/drm/amd/display/dc/dc_spl_translate.c @@ -64,6 +64,13 @@ static void populate_inits_from_splinits(struct scl_inits *inits, inits->h_c = dc_fixpt_from_int_dy(spl_inits->h_filter_init_int_c, spl_inits->h_filter_init_frac_c >> 5, 0, 19); inits->v_c = dc_fixpt_from_int_dy(spl_inits->v_filter_init_int_c, spl_inits->v_filter_init_frac_c >> 5, 0, 19); } +static void populate_splformat_from_format(enum spl_pixel_format *spl_pixel_format, const enum pixel_format pixel_format) +{ + if (pixel_format < PIXEL_FORMAT_INVALID) + *spl_pixel_format = (enum spl_pixel_format)pixel_format; + else + *spl_pixel_format = SPL_PIXEL_FORMAT_INVALID; +} /// @brief Translate SPL input parameters from pipe context /// @param pipe_ctx /// @param spl_in @@ -89,7 +96,7 @@ void translate_SPL_in_params_from_pipe_ctx(struct pipe_ctx *pipe_ctx, struct spl spl_in->callbacks = dcn2_spl_callbacks; } // Make format field from spl_in point to plane_res scl_data format - spl_in->basic_in.format = (enum spl_pixel_format)pipe_ctx->plane_res.scl_data.format; + populate_splformat_from_format(&spl_in->basic_in.format, pipe_ctx->plane_res.scl_data.format); // Make view_format from basic_out point to view_format from stream spl_in->basic_out.view_format = (enum spl_view_3d)stream->view_format; // Populate spl input basic input clip rect from plane state clip rect @@ -108,12 +115,14 @@ void translate_SPL_in_params_from_pipe_ctx(struct pipe_ctx *pipe_ctx, struct spl spl_in->basic_in.horizontal_mirror = plane_state->horizontal_mirror; // Calculate horizontal splits and split index - spl_in->basic_in.mpc_combine_h = resource_get_mpc_slice_count(pipe_ctx); + spl_in->basic_in.num_h_slices_recout_width_align.use_recout_width_aligned = false; + spl_in->basic_in.num_h_slices_recout_width_align.num_slices_recout_width.mpc_num_h_slices = + resource_get_mpc_slice_count(pipe_ctx); if (stream->view_format == VIEW_3D_FORMAT_SIDE_BY_SIDE) - spl_in->basic_in.mpc_combine_v = 0; + spl_in->basic_in.mpc_h_slice_index = 0; else - spl_in->basic_in.mpc_combine_v = resource_get_mpc_slice_index(pipe_ctx); + spl_in->basic_in.mpc_h_slice_index = resource_get_mpc_slice_index(pipe_ctx); populate_splrect_from_rect(&spl_in->basic_out.odm_slice_rect, &odm_slice_src); spl_in->basic_out.odm_combine_factor = 0; diff --git a/drivers/gpu/drm/amd/display/dc/dc_stream.h b/drivers/gpu/drm/amd/display/dc/dc_stream.h index 413970588a26da915b350e4a62d29f8e8329240b..713884103aeac973dddcd13b2e102337069e3964 100644 --- a/drivers/gpu/drm/amd/display/dc/dc_stream.h +++ b/drivers/gpu/drm/amd/display/dc/dc_stream.h @@ -447,10 +447,6 @@ enum dc_status dc_stream_add_dsc_to_resource(struct dc *dc, struct dc_state *state, struct dc_stream_state *stream); -bool dc_stream_warmup_writeback(struct dc *dc, - int num_dwb, - struct dc_writeback_info *wb_info); - bool dc_stream_dmdata_status_done(struct dc *dc, struct dc_stream_state *stream); bool dc_stream_set_dynamic_metadata(struct dc *dc, @@ -541,6 +537,7 @@ bool dc_stream_get_crtc_position(struct dc *dc, #if defined(CONFIG_DRM_AMD_SECURE_DISPLAY) bool dc_stream_forward_crc_window(struct dc_stream_state *stream, struct rect *rect, + uint8_t phy_id, bool is_stop); #endif diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c index b700608e4240302be2a49d458c5099acac3b52be..077337698e0adcacedb5e1d702e0d79f45995b94 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c @@ -1105,6 +1105,9 @@ static bool dcn401_program_pix_clk( &dto_params); } else { + if (pll_settings->actual_pix_clk_100hz > 6000000UL) + return false; + /* disables DP DTO when provided with TMDS signal type */ clock_source->ctx->dc->res_pool->dccg->funcs->set_dp_dto( clock_source->ctx->dc->res_pool->dccg, diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c index f5e1d9caee4c822255b14d1b44970568aa72882f..ebd174be5786b14f93f581341b287e10977b2486 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_mem_input.c @@ -481,7 +481,6 @@ static void program_tiling( } } - static void program_size_and_rotation( struct dce_mem_input *dce_mi, enum dc_rotation_angle rotation, @@ -627,6 +626,27 @@ static void program_grph_pixel_format( GRPH_PRESCALE_B_SIGN, sign); } +static void dce_mi_clear_tiling( + struct mem_input *mi) +{ + struct dce_mem_input *dce_mi = TO_DCE_MEM_INPUT(mi); + + if (dce_mi->masks->GRPH_SW_MODE) { /* GFX9 */ + REG_UPDATE(GRPH_CONTROL, + GRPH_SW_MODE, DC_SW_LINEAR); + } + + if (dce_mi->masks->GRPH_MICRO_TILE_MODE) { /* GFX8 */ + REG_UPDATE(GRPH_CONTROL, + GRPH_ARRAY_MODE, DC_SW_LINEAR); + } + + if (dce_mi->masks->GRPH_ARRAY_MODE) { /* GFX6 but reuses gfx8 struct */ + REG_UPDATE(GRPH_CONTROL, + GRPH_ARRAY_MODE, DC_SW_LINEAR); + } +} + static void dce_mi_program_surface_config( struct mem_input *mi, enum surface_pixel_format format, @@ -884,7 +904,8 @@ static const struct mem_input_funcs dce_mi_funcs = { .mem_input_program_pte_vm = dce_mi_program_pte_vm, .mem_input_program_surface_config = dce_mi_program_surface_config, - .mem_input_is_flip_pending = dce_mi_is_flip_pending + .mem_input_is_flip_pending = dce_mi_is_flip_pending, + .mem_input_clear_tiling = dce_mi_clear_tiling, }; #if defined(CONFIG_DRM_AMD_DC_SI) @@ -897,7 +918,8 @@ static const struct mem_input_funcs dce60_mi_funcs = { .mem_input_program_pte_vm = dce_mi_program_pte_vm, .mem_input_program_surface_config = dce60_mi_program_surface_config, - .mem_input_is_flip_pending = dce_mi_is_flip_pending + .mem_input_is_flip_pending = dce_mi_is_flip_pending, + .mem_input_clear_tiling = dce_mi_clear_tiling, }; #endif @@ -910,7 +932,8 @@ static const struct mem_input_funcs dce112_mi_funcs = { .mem_input_program_pte_vm = dce_mi_program_pte_vm, .mem_input_program_surface_config = dce_mi_program_surface_config, - .mem_input_is_flip_pending = dce_mi_is_flip_pending + .mem_input_is_flip_pending = dce_mi_is_flip_pending, + .mem_input_clear_tiling = dce_mi_clear_tiling, }; static const struct mem_input_funcs dce120_mi_funcs = { @@ -922,7 +945,8 @@ static const struct mem_input_funcs dce120_mi_funcs = { .mem_input_program_pte_vm = dce_mi_program_pte_vm, .mem_input_program_surface_config = dce_mi_program_surface_config, - .mem_input_is_flip_pending = dce_mi_is_flip_pending + .mem_input_is_flip_pending = dce_mi_is_flip_pending, + .mem_input_clear_tiling = dce_mi_clear_tiling, }; void dce_mem_input_construct( diff --git a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c index cae18f8c1c9a085f3c24f3e28d434d874e2a3ef2..88c75c243bf8aeb8988619abd86cab67dd0baebf 100644 --- a/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c +++ b/drivers/gpu/drm/amd/display/dc/dce/dmub_psr.c @@ -390,8 +390,7 @@ static bool dmub_psr_copy_settings(struct dmub_psr *dmub, !memcmp(link->dpcd_caps.sink_dev_id_str, DP_SINK_DEVICE_STR_ID_1, sizeof(DP_SINK_DEVICE_STR_ID_1))) link->psr_settings.force_ffu_mode = 1; - else - link->psr_settings.force_ffu_mode = 0; + copy_settings_data->force_ffu_mode = link->psr_settings.force_ffu_mode; if (((link->dpcd_caps.fec_cap.bits.FEC_CAPABLE && diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c index 5738989847268c10d529258e3346ecabe4502b7d..f9961a6446f3ef89690bf890865facff4a40be9d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c +++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c @@ -168,31 +168,33 @@ void dcn31_panel_cntl_construct( struct dcn31_panel_cntl *dcn31_panel_cntl, const struct panel_cntl_init_data *init_data) { - uint8_t pwrseq_inst = 0xF; dcn31_panel_cntl->base.funcs = &dcn31_link_panel_cntl_funcs; dcn31_panel_cntl->base.ctx = init_data->ctx; dcn31_panel_cntl->base.inst = init_data->inst; - switch (init_data->eng_id) { - case ENGINE_ID_DIGA: - pwrseq_inst = 0; - break; - case ENGINE_ID_DIGB: - pwrseq_inst = 1; - break; - default: - DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", init_data->eng_id); - ASSERT(false); - break; - } - - if (dcn31_panel_cntl->base.ctx->dc->config.support_edp0_on_dp1) + if (dcn31_panel_cntl->base.ctx->dc->config.support_edp0_on_dp1) { //If supported, power sequencer mapping shall follow the DIG instance + uint8_t pwrseq_inst = 0xF; + + switch (init_data->eng_id) { + case ENGINE_ID_DIGA: + pwrseq_inst = 0; + break; + case ENGINE_ID_DIGB: + pwrseq_inst = 1; + break; + default: + DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", init_data->eng_id); + ASSERT(false); + break; + } + dcn31_panel_cntl->base.pwrseq_inst = pwrseq_inst; - else + } else { /* If not supported, pwrseq will be assigned in order, * so first pwrseq will be assigned to first panel instance (legacy behavior) */ dcn31_panel_cntl->base.pwrseq_inst = dcn31_panel_cntl->base.inst; + } } diff --git a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c index 39525721c976bc8f60091e9bd37a3df98b92fdd3..f1235bf9a5965f8b96875ea9856a30bc59d0ffe2 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.c @@ -1312,138 +1312,6 @@ bool dcn_validate_bandwidth( return false; } -static unsigned int dcn_find_normalized_clock_vdd_Level( - const struct dc *dc, - enum dm_pp_clock_type clocks_type, - int clocks_in_khz) -{ - int vdd_level = dcn_bw_v_min0p65; - - if (clocks_in_khz == 0)/*todo some clock not in the considerations*/ - return vdd_level; - - switch (clocks_type) { - case DM_PP_CLOCK_TYPE_DISPLAY_CLK: - if (clocks_in_khz > dc->dcn_soc->max_dispclk_vmax0p9*1000) { - vdd_level = dcn_bw_v_max0p91; - BREAK_TO_DEBUGGER(); - } else if (clocks_in_khz > dc->dcn_soc->max_dispclk_vnom0p8*1000) { - vdd_level = dcn_bw_v_max0p9; - } else if (clocks_in_khz > dc->dcn_soc->max_dispclk_vmid0p72*1000) { - vdd_level = dcn_bw_v_nom0p8; - } else if (clocks_in_khz > dc->dcn_soc->max_dispclk_vmin0p65*1000) { - vdd_level = dcn_bw_v_mid0p72; - } else - vdd_level = dcn_bw_v_min0p65; - break; - case DM_PP_CLOCK_TYPE_DISPLAYPHYCLK: - if (clocks_in_khz > dc->dcn_soc->phyclkv_max0p9*1000) { - vdd_level = dcn_bw_v_max0p91; - BREAK_TO_DEBUGGER(); - } else if (clocks_in_khz > dc->dcn_soc->phyclkv_nom0p8*1000) { - vdd_level = dcn_bw_v_max0p9; - } else if (clocks_in_khz > dc->dcn_soc->phyclkv_mid0p72*1000) { - vdd_level = dcn_bw_v_nom0p8; - } else if (clocks_in_khz > dc->dcn_soc->phyclkv_min0p65*1000) { - vdd_level = dcn_bw_v_mid0p72; - } else - vdd_level = dcn_bw_v_min0p65; - break; - - case DM_PP_CLOCK_TYPE_DPPCLK: - if (clocks_in_khz > dc->dcn_soc->max_dppclk_vmax0p9*1000) { - vdd_level = dcn_bw_v_max0p91; - BREAK_TO_DEBUGGER(); - } else if (clocks_in_khz > dc->dcn_soc->max_dppclk_vnom0p8*1000) { - vdd_level = dcn_bw_v_max0p9; - } else if (clocks_in_khz > dc->dcn_soc->max_dppclk_vmid0p72*1000) { - vdd_level = dcn_bw_v_nom0p8; - } else if (clocks_in_khz > dc->dcn_soc->max_dppclk_vmin0p65*1000) { - vdd_level = dcn_bw_v_mid0p72; - } else - vdd_level = dcn_bw_v_min0p65; - break; - - case DM_PP_CLOCK_TYPE_MEMORY_CLK: - { - unsigned factor = (ddr4_dram_factor_single_Channel * dc->dcn_soc->number_of_channels); - - if (clocks_in_khz > dc->dcn_soc->fabric_and_dram_bandwidth_vmax0p9*1000000/factor) { - vdd_level = dcn_bw_v_max0p91; - BREAK_TO_DEBUGGER(); - } else if (clocks_in_khz > dc->dcn_soc->fabric_and_dram_bandwidth_vnom0p8*1000000/factor) { - vdd_level = dcn_bw_v_max0p9; - } else if (clocks_in_khz > dc->dcn_soc->fabric_and_dram_bandwidth_vmid0p72*1000000/factor) { - vdd_level = dcn_bw_v_nom0p8; - } else if (clocks_in_khz > dc->dcn_soc->fabric_and_dram_bandwidth_vmin0p65*1000000/factor) { - vdd_level = dcn_bw_v_mid0p72; - } else - vdd_level = dcn_bw_v_min0p65; - } - break; - - case DM_PP_CLOCK_TYPE_DCFCLK: - if (clocks_in_khz > dc->dcn_soc->dcfclkv_max0p9*1000) { - vdd_level = dcn_bw_v_max0p91; - BREAK_TO_DEBUGGER(); - } else if (clocks_in_khz > dc->dcn_soc->dcfclkv_nom0p8*1000) { - vdd_level = dcn_bw_v_max0p9; - } else if (clocks_in_khz > dc->dcn_soc->dcfclkv_mid0p72*1000) { - vdd_level = dcn_bw_v_nom0p8; - } else if (clocks_in_khz > dc->dcn_soc->dcfclkv_min0p65*1000) { - vdd_level = dcn_bw_v_mid0p72; - } else - vdd_level = dcn_bw_v_min0p65; - break; - - default: - break; - } - return vdd_level; -} - -unsigned int dcn_find_dcfclk_suits_all( - const struct dc *dc, - struct dc_clocks *clocks) -{ - unsigned vdd_level, vdd_level_temp; - unsigned dcf_clk; - - /*find a common supported voltage level*/ - vdd_level = dcn_find_normalized_clock_vdd_Level( - dc, DM_PP_CLOCK_TYPE_DISPLAY_CLK, clocks->dispclk_khz); - vdd_level_temp = dcn_find_normalized_clock_vdd_Level( - dc, DM_PP_CLOCK_TYPE_DISPLAYPHYCLK, clocks->phyclk_khz); - - vdd_level = dcn_bw_max(vdd_level, vdd_level_temp); - vdd_level_temp = dcn_find_normalized_clock_vdd_Level( - dc, DM_PP_CLOCK_TYPE_DPPCLK, clocks->dppclk_khz); - vdd_level = dcn_bw_max(vdd_level, vdd_level_temp); - - vdd_level_temp = dcn_find_normalized_clock_vdd_Level( - dc, DM_PP_CLOCK_TYPE_MEMORY_CLK, clocks->fclk_khz); - vdd_level = dcn_bw_max(vdd_level, vdd_level_temp); - vdd_level_temp = dcn_find_normalized_clock_vdd_Level( - dc, DM_PP_CLOCK_TYPE_DCFCLK, clocks->dcfclk_khz); - - /*find that level conresponding dcfclk*/ - vdd_level = dcn_bw_max(vdd_level, vdd_level_temp); - if (vdd_level == dcn_bw_v_max0p91) { - BREAK_TO_DEBUGGER(); - dcf_clk = dc->dcn_soc->dcfclkv_max0p9*1000; - } else if (vdd_level == dcn_bw_v_max0p9) - dcf_clk = dc->dcn_soc->dcfclkv_max0p9*1000; - else if (vdd_level == dcn_bw_v_nom0p8) - dcf_clk = dc->dcn_soc->dcfclkv_nom0p8*1000; - else if (vdd_level == dcn_bw_v_mid0p72) - dcf_clk = dc->dcn_soc->dcfclkv_mid0p72*1000; - else - dcf_clk = dc->dcn_soc->dcfclkv_min0p65*1000; - - DC_LOG_BANDWIDTH_CALCS("\tdcf_clk for voltage = %d\n", dcf_clk); - return dcf_clk; -} - void dcn_bw_update_from_pplib_fclks( struct dc *dc, struct dm_pp_clock_levels_with_voltage *fclks) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c index 76d3bb3c91550c47913f087cb39d7a3b6d3c705c..8d4873f80df023073ce949e1b11b7c856e1f235b 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c @@ -1562,6 +1562,7 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, dml_print("DML_DLG: %s: disp_dlg_regs->dst_y_per_row_vblank = 0x%x\n", __func__, disp_dlg_regs->dst_y_per_row_vblank); dml_print("DML_DLG: %s: disp_dlg_regs->dst_y_per_vm_flip = 0x%x\n", __func__, disp_dlg_regs->dst_y_per_vm_flip); dml_print("DML_DLG: %s: disp_dlg_regs->dst_y_per_row_flip = 0x%x\n", __func__, disp_dlg_regs->dst_y_per_row_flip); + disp_dlg_regs->refcyc_per_pte_group_vblank_l = (unsigned int)(dst_y_per_row_vblank * (double)htotal * ref_freq_to_pix_freq / (double)dpte_groups_per_row_ub_l); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index c4378e620cbf9129ec61a93bfb75e65c44354693..d9c27ebe12ee08d6330eb199cd8ca9c8489fa5b2 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -73,9 +73,8 @@ AMD_DAL_DML2 = $(addprefix $(AMDDALPATH)/dc/dml2/,$(DML2)) AMD_DISPLAY_FILES += $(AMD_DAL_DML2) -CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml_top.o := $(dml2_ccflags) -CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml_top_mcache.o := $(dml2_ccflags) -CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_optimization := $(dml2_ccflags) +CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_interfaces.o := $(dml2_ccflags) +CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.o := $(dml2_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.o := $(dml2_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.o := $(dml2_ccflags) $(frame_warn_flag) CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_factory.o := $(dml2_ccflags) @@ -94,9 +93,8 @@ CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/dml21_translation_helper.o := $(dml2_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/dml21_utils.o := $(dml2_ccflags) CFLAGS_$(AMDDALPATH)/dc/dml2/dml21/inc/dml2_debug.o := $(dml2_ccflags) -CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml_top.o := $(dml2_rcflags) -CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml_top_mcache.o := $(dml2_rcflags) -CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.o := $(dml2_rcflags) +CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_interfaces.o := $(dml2_rcflags) +CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.o := $(dml2_rcflags) CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.o := $(dml2_rcflags) CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.o := $(dml2_rcflags) CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/src/dml2_core/dml2_core_factory.o := $(dml2_rcflags) @@ -113,9 +111,8 @@ CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/dml21_translation_helper.o := $(dml2_r CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/dml21_utils.o := $(dml2_rcflags) CFLAGS_REMOVE_$(AMDDALPATH)/dc/dml2/dml21/inc/dml2_debug.o := $(dml2_rcflags) -DML21 := src/dml2_top/dml_top.o -DML21 += src/dml2_top/dml_top_mcache.o -DML21 += src/dml2_top/dml2_top_optimization.o +DML21 := src/dml2_top/dml2_top_interfaces.o +DML21 += src/dml2_top/dml2_top_soc15.o DML21 += src/inc/dml2_debug.o DML21 += src/dml2_core/dml2_core_dcn4.o DML21 += src/dml2_core/dml2_core_factory.o diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c index 8dabb1ac0b684d9db105e405aaffb334d37d33ea..35bc917631aed46c83d1f3c7b009684ef843e3ef 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c @@ -6301,9 +6301,9 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.meta_row_bandwidth_this_state, mode_lib->ms.dpte_row_bandwidth_this_state, mode_lib->ms.NoOfDPPThisState, - mode_lib->ms.UrgentBurstFactorLuma, - mode_lib->ms.UrgentBurstFactorChroma, - mode_lib->ms.UrgentBurstFactorCursor); + mode_lib->ms.UrgentBurstFactorLuma[j], + mode_lib->ms.UrgentBurstFactorChroma[j], + mode_lib->ms.UrgentBurstFactorCursor[j]); s->VMDataOnlyReturnBWPerState = dml_get_return_bw_mbps_vm_only( &mode_lib->ms.soc, @@ -6434,7 +6434,7 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) /* Output */ &mode_lib->ms.UrgentBurstFactorCursorPre[k], &mode_lib->ms.UrgentBurstFactorLumaPre[k], - &mode_lib->ms.UrgentBurstFactorChroma[k], + &mode_lib->ms.UrgentBurstFactorChromaPre[k], &mode_lib->ms.NotUrgentLatencyHidingPre[k]); mode_lib->ms.cursor_bw_pre[k] = mode_lib->ms.cache_display_cfg.plane.NumberOfCursors[k] * mode_lib->ms.cache_display_cfg.plane.CursorWidth[k] * @@ -6458,9 +6458,9 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.cursor_bw_pre, mode_lib->ms.prefetch_vmrow_bw, mode_lib->ms.NoOfDPPThisState, - mode_lib->ms.UrgentBurstFactorLuma, - mode_lib->ms.UrgentBurstFactorChroma, - mode_lib->ms.UrgentBurstFactorCursor, + mode_lib->ms.UrgentBurstFactorLuma[j], + mode_lib->ms.UrgentBurstFactorChroma[j], + mode_lib->ms.UrgentBurstFactorCursor[j], mode_lib->ms.UrgentBurstFactorLumaPre, mode_lib->ms.UrgentBurstFactorChromaPre, mode_lib->ms.UrgentBurstFactorCursorPre, @@ -6517,9 +6517,9 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.cursor_bw, mode_lib->ms.cursor_bw_pre, mode_lib->ms.NoOfDPPThisState, - mode_lib->ms.UrgentBurstFactorLuma, - mode_lib->ms.UrgentBurstFactorChroma, - mode_lib->ms.UrgentBurstFactorCursor, + mode_lib->ms.UrgentBurstFactorLuma[j], + mode_lib->ms.UrgentBurstFactorChroma[j], + mode_lib->ms.UrgentBurstFactorCursor[j], mode_lib->ms.UrgentBurstFactorLumaPre, mode_lib->ms.UrgentBurstFactorChromaPre, mode_lib->ms.UrgentBurstFactorCursorPre); @@ -6586,9 +6586,9 @@ static void dml_prefetch_check(struct display_mode_lib_st *mode_lib) mode_lib->ms.cursor_bw_pre, mode_lib->ms.prefetch_vmrow_bw, mode_lib->ms.NoOfDPP[j], // VBA_ERROR DPPPerSurface is not assigned at this point, should use NoOfDpp here - mode_lib->ms.UrgentBurstFactorLuma, - mode_lib->ms.UrgentBurstFactorChroma, - mode_lib->ms.UrgentBurstFactorCursor, + mode_lib->ms.UrgentBurstFactorLuma[j], + mode_lib->ms.UrgentBurstFactorChroma[j], + mode_lib->ms.UrgentBurstFactorCursor[j], mode_lib->ms.UrgentBurstFactorLumaPre, mode_lib->ms.UrgentBurstFactorChromaPre, mode_lib->ms.UrgentBurstFactorCursorPre, @@ -7809,9 +7809,9 @@ dml_bool_t dml_core_mode_support(struct display_mode_lib_st *mode_lib) mode_lib->ms.DETBufferSizeYThisState[k], mode_lib->ms.DETBufferSizeCThisState[k], /* Output */ - &mode_lib->ms.UrgentBurstFactorCursor[k], - &mode_lib->ms.UrgentBurstFactorLuma[k], - &mode_lib->ms.UrgentBurstFactorChroma[k], + &mode_lib->ms.UrgentBurstFactorCursor[j][k], + &mode_lib->ms.UrgentBurstFactorLuma[j][k], + &mode_lib->ms.UrgentBurstFactorChroma[j][k], &mode_lib->ms.NotUrgentLatencyHiding[k]); } @@ -8318,7 +8318,7 @@ void dml_core_mode_programming(struct display_mode_lib_st *mode_lib, const struc if (clk_cfg->dcfclk_option != dml_use_override_freq) locals->Dcfclk = mode_lib->ms.DCFCLK; else - locals->Dcfclk = clk_cfg->dcfclk_freq_mhz; + locals->Dcfclk = clk_cfg->dcfclk_mhz; #ifdef __DML_VBA_DEBUG__ dml_print_dml_policy(&mode_lib->ms.policy); @@ -8371,7 +8371,7 @@ void dml_core_mode_programming(struct display_mode_lib_st *mode_lib, const struc if (clk_cfg->dispclk_option == dml_use_required_freq) locals->Dispclk = locals->Dispclk_calculated; else if (clk_cfg->dispclk_option == dml_use_override_freq) - locals->Dispclk = clk_cfg->dispclk_freq_mhz; + locals->Dispclk = clk_cfg->dispclk_mhz; else locals->Dispclk = mode_lib->ms.state.dispclk_mhz; #ifdef __DML_VBA_DEBUG__ @@ -8412,7 +8412,7 @@ void dml_core_mode_programming(struct display_mode_lib_st *mode_lib, const struc if (clk_cfg->dppclk_option[k] == dml_use_required_freq) locals->Dppclk[k] = locals->Dppclk_calculated[k]; else if (clk_cfg->dppclk_option[k] == dml_use_override_freq) - locals->Dppclk[k] = clk_cfg->dppclk_freq_mhz[k]; + locals->Dppclk[k] = clk_cfg->dppclk_mhz[k]; else locals->Dppclk[k] = mode_lib->ms.state.dppclk_mhz; #ifdef __DML_VBA_DEBUG__ @@ -9190,6 +9190,8 @@ void dml_core_mode_programming(struct display_mode_lib_st *mode_lib, const struc &locals->FractionOfUrgentBandwidth, &s->dummy_boolean[0]); // dml_bool_t *PrefetchBandwidthSupport + + if (s->VRatioPrefetchMoreThanMax != false || s->DestinationLineTimesForPrefetchLessThan2 != false) { dml_print("DML::%s: VRatioPrefetchMoreThanMax = %u\n", __func__, s->VRatioPrefetchMoreThanMax); dml_print("DML::%s: DestinationLineTimesForPrefetchLessThan2 = %u\n", __func__, s->DestinationLineTimesForPrefetchLessThan2); @@ -9204,6 +9206,7 @@ void dml_core_mode_programming(struct display_mode_lib_st *mode_lib, const struc } } + if (locals->PrefetchModeSupported == true && mode_lib->ms.support.ImmediateFlipSupport == true) { locals->BandwidthAvailableForImmediateFlip = CalculateBandwidthAvailableForImmediateFlip( mode_lib->ms.num_active_planes, diff --git a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h index f951936bb579e62ef862d71aec94dcf0b39619fa..dd3f43181a6ef44bfcee3b2da90ade052b4fd601 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h @@ -28,6 +28,7 @@ #define __DISPLAY_MODE_CORE_STRUCT_H__ #include "display_mode_lib_defines.h" +#include "dml_top_display_cfg_types.h" enum dml_project_id { dml_project_invalid = 0, @@ -49,7 +50,9 @@ enum dml_use_mall_for_pstate_change_mode { dml_use_mall_pstate_change_disable = 0, dml_use_mall_pstate_change_full_frame = 1, dml_use_mall_pstate_change_sub_viewport = 2, - dml_use_mall_pstate_change_phantom_pipe = 3 + dml_use_mall_pstate_change_phantom_pipe = 3, + dml_use_mall_pstate_change_phantom_pipe_no_data_return = 4, + dml_use_mall_pstate_change_imall = 5 }; enum dml_use_mall_for_static_screen_mode { dml_use_mall_static_screen_disable = 0, @@ -171,7 +174,11 @@ enum dml_swizzle_mode { dml_sw_256kb_z_x = 28, dml_sw_256kb_s_x = 29, dml_sw_256kb_d_x = 30, - dml_sw_256kb_r_x = 31 + dml_sw_256kb_r_x = 31, + dml_sw_256b_2d = 32, + dml_sw_4kb_2d = 33, + dml_sw_64kb_2d = 34, + dml_sw_256kb_2d = 35 }; enum dml_lb_depth { dml_lb_6 = 0, @@ -223,24 +230,28 @@ enum dml_mpc_use_policy { dml_mpc_disabled = 0, dml_mpc_as_possible = 1, dml_mpc_as_needed_for_voltage = 2, - dml_mpc_as_needed_for_pstate_and_voltage = 3 + dml_mpc_as_needed_for_pstate_and_voltage = 3, + dml_mpc_as_needed = 4, + dml_mpc_2to1 = 5 }; enum dml_odm_use_policy { dml_odm_use_policy_bypass = 0, dml_odm_use_policy_combine_as_needed = 1, dml_odm_use_policy_combine_2to1 = 2, - dml_odm_use_policy_combine_4to1 = 3, - dml_odm_use_policy_split_1to2 = 4, - dml_odm_use_policy_mso_1to2 = 5, - dml_odm_use_policy_mso_1to4 = 6 + dml_odm_use_policy_combine_3to1 = 3, + dml_odm_use_policy_combine_4to1 = 4, + dml_odm_use_policy_split_1to2 = 5, + dml_odm_use_policy_mso_1to2 = 6, + dml_odm_use_policy_mso_1to4 = 7 }; enum dml_odm_mode { dml_odm_mode_bypass = 0, dml_odm_mode_combine_2to1 = 1, - dml_odm_mode_combine_4to1 = 2, - dml_odm_mode_split_1to2 = 3, - dml_odm_mode_mso_1to2 = 4, - dml_odm_mode_mso_1to4 = 5 + dml_odm_mode_combine_3to1 = 2, + dml_odm_mode_combine_4to1 = 3, + dml_odm_mode_split_1to2 = 4, + dml_odm_mode_mso_1to2 = 5, + dml_odm_mode_mso_1to4 = 6 }; enum dml_writeback_configuration { dml_whole_buffer_for_single_stream_no_interleave = 0, @@ -289,6 +300,17 @@ struct soc_state_bounding_box_st { dml_float_t fclk_change_latency_us; dml_float_t usr_retraining_latency_us; dml_bool_t use_ideal_dram_bw_strobe; + dml_float_t g6_temp_read_blackout_us; + + struct { + dml_uint_t urgent_ramp_uclk_cycles; + dml_uint_t trip_to_memory_uclk_cycles; + dml_uint_t meta_trip_to_memory_uclk_cycles; + dml_uint_t maximum_latency_when_urgent_uclk_cycles; + dml_uint_t average_latency_when_urgent_uclk_cycles; + dml_uint_t maximum_latency_when_non_urgent_uclk_cycles; + dml_uint_t average_latency_when_non_urgent_uclk_cycles; + } dml_dcn401_uclk_dpm_dependent_soc_qos_params; }; struct soc_bounding_box_st { @@ -297,7 +319,7 @@ struct soc_bounding_box_st { dml_float_t pcierefclk_mhz; dml_float_t refclk_mhz; dml_float_t amclk_mhz; - dml_float_t max_outstanding_reqs; + dml_uint_t max_outstanding_reqs; dml_float_t pct_ideal_sdp_bw_after_urgent; dml_float_t pct_ideal_fabric_bw_after_urgent; dml_float_t pct_ideal_dram_bw_after_urgent_pixel_only; @@ -308,6 +330,16 @@ struct soc_bounding_box_st { dml_float_t max_avg_fabric_bw_use_normal_percent; dml_float_t max_avg_dram_bw_use_normal_percent; dml_float_t max_avg_dram_bw_use_normal_strobe_percent; + + dml_float_t svp_prefetch_pct_ideal_sdp_bw_after_urgent; + dml_float_t svp_prefetch_pct_ideal_fabric_bw_after_urgent; + dml_float_t svp_prefetch_pct_ideal_dram_bw_after_urgent_pixel_only; + dml_float_t svp_prefetch_pct_ideal_dram_bw_after_urgent_pixel_and_vm; + dml_float_t svp_prefetch_pct_ideal_dram_bw_after_urgent_vm_only; + dml_float_t svp_prefetch_max_avg_sdp_bw_use_normal_percent; + dml_float_t svp_prefetch_max_avg_fabric_bw_use_normal_percent; + dml_float_t svp_prefetch_max_avg_dram_bw_use_normal_percent; + dml_uint_t round_trip_ping_latency_dcfclk_cycles; dml_uint_t urgent_out_of_order_return_per_channel_pixel_only_bytes; dml_uint_t urgent_out_of_order_return_per_channel_pixel_and_vm_bytes; @@ -324,6 +356,26 @@ struct soc_bounding_box_st { dml_uint_t mall_allocated_for_dcn_mbytes; dml_float_t dispclk_dppclk_vco_speed_mhz; dml_bool_t do_urgent_latency_adjustment; + + dml_uint_t mem_word_bytes; + dml_uint_t num_dcc_mcaches; + dml_uint_t mcache_size_bytes; + dml_uint_t mcache_line_size_bytes; + + struct { + dml_bool_t UseNewDCN401SOCParameters; + dml_uint_t df_qos_response_time_fclk_cycles; + dml_uint_t max_round_trip_to_furthest_cs_fclk_cycles; + dml_uint_t mall_overhead_fclk_cycles; + dml_uint_t meta_trip_adder_fclk_cycles; + dml_uint_t average_transport_distance_fclk_cycles; + dml_float_t umc_urgent_ramp_latency_margin; + dml_float_t umc_max_latency_margin; + dml_float_t umc_average_latency_margin; + dml_float_t fabric_max_transport_latency_margin; + dml_float_t fabric_average_transport_latency_margin; + } dml_dcn401_soc_qos_params; + }; struct ip_params_st { @@ -515,6 +567,10 @@ struct dml_plane_cfg_st { dml_uint_t CursorWidth[__DML_NUM_PLANES__]; dml_uint_t CursorBPP[__DML_NUM_PLANES__]; + dml_bool_t setup_for_tdlut[__DML_NUM_PLANES__]; + enum dml2_tdlut_addressing_mode tdlut_addressing_mode[__DML_NUM_PLANES__]; + enum dml2_tdlut_width_mode tdlut_width_mode[__DML_NUM_PLANES__]; + enum dml_use_mall_for_static_screen_mode UseMALLForStaticScreen[__DML_NUM_PLANES__]; enum dml_use_mall_for_pstate_change_mode UseMALLForPStateChange[__DML_NUM_PLANES__]; @@ -604,6 +660,17 @@ struct dml_hw_resource_st { dml_float_t DLGRefClkFreqMHz; /// dcfclk_option); dml_print("DML: clk_cfg: dispclk_option = %d\n", clk_cfg->dispclk_option); - dml_print("DML: clk_cfg: dcfclk_freq_mhz = %f\n", clk_cfg->dcfclk_freq_mhz); - dml_print("DML: clk_cfg: dispclk_freq_mhz = %f\n", clk_cfg->dispclk_freq_mhz); + dml_print("DML: clk_cfg: dcfclk_mhz = %f\n", clk_cfg->dcfclk_mhz); + dml_print("DML: clk_cfg: dispclk_mhz = %f\n", clk_cfg->dispclk_mhz); for (dml_uint_t i = 0; i < DCN_DML__NUM_PLANE; i++) { dml_print("DML: clk_cfg: i=%d, dppclk_option = %d\n", i, clk_cfg->dppclk_option[i]); - dml_print("DML: clk_cfg: i=%d, dppclk_freq_mhz = %f\n", i, clk_cfg->dppclk_freq_mhz[i]); + dml_print("DML: clk_cfg: i=%d, dppclk_mhz = %f\n", i, clk_cfg->dppclk_mhz[i]); } } diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c index c6a5a861467976bf61b3d4e7ff04ee6c3e6bdc55..efb09990549658fa16d302d32cc9d740f52fdb27 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.c @@ -1077,6 +1077,8 @@ void dml21_copy_clocks_to_dc_state(struct dml2_context *in_ctx, struct dc_state context->bw_ctx.bw.dcn.clk.dtbclk_en = in_ctx->v21.mode_programming.programming->min_clocks.dcn4x.dtbrefclk_khz > 0; context->bw_ctx.bw.dcn.clk.ref_dtbclk_khz = in_ctx->v21.mode_programming.programming->min_clocks.dcn4x.dtbrefclk_khz; context->bw_ctx.bw.dcn.clk.socclk_khz = in_ctx->v21.mode_programming.programming->min_clocks.dcn4x.socclk_khz; + context->bw_ctx.bw.dcn.clk.subvp_prefetch_dramclk_khz = in_ctx->v21.mode_programming.programming->min_clocks.dcn4x.svp_prefetch_no_throttle.uclk_khz; + context->bw_ctx.bw.dcn.clk.subvp_prefetch_fclk_khz = in_ctx->v21.mode_programming.programming->min_clocks.dcn4x.svp_prefetch_no_throttle.fclk_khz; } void dml21_extract_legacy_watermark_set(const struct dc *in_dc, struct dcn_watermarks *watermark, enum dml2_dchub_watermark_reg_set_index reg_set_idx, struct dml2_context *in_ctx) @@ -1226,22 +1228,22 @@ void dml21_set_dc_p_state_type( bool sub_vp_enabled) { switch (stream_programming->uclk_pstate_method) { - case dml2_uclk_pstate_support_method_vactive: - case dml2_uclk_pstate_support_method_fw_vactive_drr: + case dml2_pstate_method_vactive: + case dml2_pstate_method_fw_vactive_drr: pipe_ctx->p_state_type = P_STATE_V_ACTIVE; break; - case dml2_uclk_pstate_support_method_vblank: - case dml2_uclk_pstate_support_method_fw_vblank_drr: + case dml2_pstate_method_vblank: + case dml2_pstate_method_fw_vblank_drr: if (sub_vp_enabled) pipe_ctx->p_state_type = P_STATE_V_BLANK_SUB_VP; else pipe_ctx->p_state_type = P_STATE_V_BLANK; break; - case dml2_uclk_pstate_support_method_fw_subvp_phantom: - case dml2_uclk_pstate_support_method_fw_subvp_phantom_drr: + case dml2_pstate_method_fw_svp: + case dml2_pstate_method_fw_svp_drr: pipe_ctx->p_state_type = P_STATE_SUB_VP; break; - case dml2_uclk_pstate_support_method_fw_drr: + case dml2_pstate_method_fw_drr: if (sub_vp_enabled) pipe_ctx->p_state_type = P_STATE_DRR_SUB_VP; else diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c index 51d491bffa324119060a3a00431cce2f9a41905d..cb966f8d3216fde05f67edd1f9d9e71339ea0161 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_utils.c @@ -482,7 +482,8 @@ void dml21_build_fams2_programming(const struct dc *dc, unsigned int num_fams2_streams = 0; /* reset fams2 data */ - memset(&context->bw_ctx.bw.dcn.fams2_stream_params, 0, sizeof(struct dmub_fams2_stream_static_state) * DML2_MAX_PLANES); + memset(&context->bw_ctx.bw.dcn.fams2_stream_base_params, 0, sizeof(union dmub_cmd_fams2_config) * DML2_MAX_PLANES); + memset(&context->bw_ctx.bw.dcn.fams2_stream_sub_params, 0, sizeof(union dmub_cmd_fams2_config) * DML2_MAX_PLANES); memset(&context->bw_ctx.bw.dcn.fams2_global_config, 0, sizeof(struct dmub_cmd_fams2_global_config)); if (dml_ctx->v21.mode_programming.programming->fams2_required) { @@ -490,8 +491,10 @@ void dml21_build_fams2_programming(const struct dc *dc, int dml_stream_idx; struct dc_stream_state *phantom_stream; struct dc_stream_status *phantom_status; + enum fams2_stream_type type = 0; - struct dmub_fams2_stream_static_state *static_state = &context->bw_ctx.bw.dcn.fams2_stream_params[num_fams2_streams]; + union dmub_cmd_fams2_config *static_base_state = &context->bw_ctx.bw.dcn.fams2_stream_base_params[num_fams2_streams]; + union dmub_cmd_fams2_config *static_sub_state = &context->bw_ctx.bw.dcn.fams2_stream_sub_params[num_fams2_streams]; struct dc_stream_state *stream = context->streams[i]; @@ -508,28 +511,38 @@ void dml21_build_fams2_programming(const struct dc *dc, } /* copy static state from PMO */ - memcpy(static_state, - &dml_ctx->v21.mode_programming.programming->stream_programming[dml_stream_idx].fams2_params, - sizeof(struct dmub_fams2_stream_static_state)); - - /* get information from context */ - static_state->num_planes = context->stream_status[i].plane_count; - static_state->otg_inst = context->stream_status[i].primary_otg_inst; - - /* populate pipe masks for planes */ - for (j = 0; j < context->stream_status[i].plane_count; j++) { - for (k = 0; k < dc->res_pool->pipe_count; k++) { - if (context->res_ctx.pipe_ctx[k].stream && - context->res_ctx.pipe_ctx[k].stream->stream_id == stream->stream_id && - context->res_ctx.pipe_ctx[k].plane_state == context->stream_status[i].plane_states[j]) { - static_state->pipe_mask |= (1 << k); - static_state->plane_pipe_masks[j] |= (1 << k); + memcpy(static_base_state, + &dml_ctx->v21.mode_programming.programming->stream_programming[dml_stream_idx].fams2_base_params, + sizeof(union dmub_cmd_fams2_config)); + memcpy(static_sub_state, + &dml_ctx->v21.mode_programming.programming->stream_programming[dml_stream_idx].fams2_sub_params, + sizeof(union dmub_cmd_fams2_config)); + + switch (dc->debug.fams_version.minor) { + case 1: + default: + type = static_base_state->stream_v1.base.type; + + /* get information from context */ + static_base_state->stream_v1.base.num_planes = context->stream_status[i].plane_count; + static_base_state->stream_v1.base.otg_inst = context->stream_status[i].primary_otg_inst; + + /* populate pipe masks for planes */ + for (j = 0; j < context->stream_status[i].plane_count; j++) { + for (k = 0; k < dc->res_pool->pipe_count; k++) { + if (context->res_ctx.pipe_ctx[k].stream && + context->res_ctx.pipe_ctx[k].stream->stream_id == stream->stream_id && + context->res_ctx.pipe_ctx[k].plane_state == context->stream_status[i].plane_states[j]) { + static_base_state->stream_v1.base.pipe_mask |= (1 << k); + static_base_state->stream_v1.base.plane_pipe_masks[j] |= (1 << k); + } } } } + /* get per method programming */ - switch (static_state->type) { + switch (type) { case FAMS2_STREAM_TYPE_VBLANK: case FAMS2_STREAM_TYPE_VACTIVE: case FAMS2_STREAM_TYPE_DRR: @@ -543,16 +556,27 @@ void dml21_build_fams2_programming(const struct dc *dc, /* phantom status should always be present */ ASSERT(phantom_status); - static_state->sub_state.subvp.phantom_otg_inst = phantom_status->primary_otg_inst; + if (!phantom_status) + break; - /* populate pipe masks for phantom planes */ - for (j = 0; j < phantom_status->plane_count; j++) { - for (k = 0; k < dc->res_pool->pipe_count; k++) { - if (context->res_ctx.pipe_ctx[k].stream && - context->res_ctx.pipe_ctx[k].stream->stream_id == phantom_stream->stream_id && - context->res_ctx.pipe_ctx[k].plane_state == phantom_status->plane_states[j]) { - static_state->sub_state.subvp.phantom_pipe_mask |= (1 << k); - static_state->sub_state.subvp.phantom_plane_pipe_masks[j] |= (1 << k); + switch (dc->debug.fams_version.minor) { + case 1: + default: + static_sub_state->stream_v1.sub_state.subvp.phantom_otg_inst = phantom_status->primary_otg_inst; + + /* populate pipe masks for phantom planes */ + for (j = 0; j < phantom_status->plane_count; j++) { + for (k = 0; k < dc->res_pool->pipe_count; k++) { + if (context->res_ctx.pipe_ctx[k].stream && + context->res_ctx.pipe_ctx[k].stream->stream_id == phantom_stream->stream_id && + context->res_ctx.pipe_ctx[k].plane_state == phantom_status->plane_states[j]) { + switch (dc->debug.fams_version.minor) { + case 1: + default: + static_sub_state->stream_v1.sub_state.subvp.phantom_pipe_mask |= (1 << k); + static_sub_state->stream_v1.sub_state.subvp.phantom_plane_pipe_masks[j] |= (1 << k); + } + } } } } diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/bounding_boxes/dcn4_soc_bb.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/bounding_boxes/dcn4_soc_bb.h index 8ef7977841de008c8f5ca8774a3f673d0f2abc80..793e1c038efd0e9ad8be900b9658b4d227d40e43 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/bounding_boxes/dcn4_soc_bb.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/bounding_boxes/dcn4_soc_bb.h @@ -344,6 +344,7 @@ static const struct dml2_ip_capabilities dml2_dcn401_max_ip_caps = { .config_return_buffer_segment_size_in_kbytes = 64, .meta_fifo_size_in_kentries = 22, .compressed_buffer_segment_size_in_kbytes = 64, + .cursor_buffer_size = 24, .max_flip_time_us = 80, .max_flip_time_lines = 32, .hostvm_mode = 0, @@ -354,7 +355,7 @@ static const struct dml2_ip_capabilities dml2_dcn401_max_ip_caps = { .fams2 = { .max_allow_delay_us = 100 * 1000, - .scheduling_delay_us = 125, + .scheduling_delay_us = 550, .vertical_interrupt_ack_delay_us = 40, .allow_programming_delay_us = 18, .min_allow_width_us = 20, diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_display_cfg_types.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_display_cfg_types.h index b132f676a68dc9a4111801386a2b447f9386f2c9..5e1ab6d97640420cc1917fe53ccf6b2332f9910d 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_display_cfg_types.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_display_cfg_types.h @@ -10,9 +10,10 @@ #define DML2_MAX_PLANES 8 #define DML2_MAX_DCN_PIPES 8 #define DML2_MAX_MCACHES 8 // assume plane is going to be supported by a max of 8 mcaches +#define DML2_MAX_WRITEBACK 3 enum dml2_swizzle_mode { - dml2_sw_linear, + dml2_sw_linear, // SW_LINEAR accepts 256 byte aligned pitch and also 128 byte aligned pitch if DCC is not enabled dml2_sw_256b_2d, dml2_sw_4kb_2d, dml2_sw_64kb_2d, @@ -24,7 +25,8 @@ enum dml2_swizzle_mode { dml2_gfx11_sw_64kb_d_x, dml2_gfx11_sw_64kb_r_x, dml2_gfx11_sw_256kb_d_x, - dml2_gfx11_sw_256kb_r_x + dml2_gfx11_sw_256kb_r_x, + }; enum dml2_source_format_class { @@ -38,7 +40,13 @@ enum dml2_source_format_class { dml2_rgbe_alpha = 9, dml2_rgbe = 10, dml2_mono_8 = 11, - dml2_mono_16 = 12 + dml2_mono_16 = 12, + dml2_422_planar_8 = 13, + dml2_422_planar_10 = 14, + dml2_422_planar_12 = 15, + dml2_422_packed_8 = 16, + dml2_422_packed_10 = 17, + dml2_422_packed_12 = 18 }; enum dml2_rotation_angle { @@ -121,15 +129,6 @@ enum dml2_dsc_enable_option { dml2_dsc_enable_if_necessary = 2 }; -enum dml2_pstate_support_method { - dml2_pstate_method_uninitialized, - dml2_pstate_method_not_supported, - dml2_pstate_method_vactive, - dml2_pstate_method_vblank, - dml2_pstate_method_svp, - dml2_pstate_method_drr -}; - enum dml2_tdlut_addressing_mode { dml2_tdlut_sw_linear = 0, dml2_tdlut_simple_linear = 1 @@ -287,22 +286,23 @@ struct dml2_link_output_cfg { bool validate_output; // Do not validate the link configuration for this display stream. }; -struct dml2_writeback_cfg { - bool enable; +struct dml2_writeback_info { enum dml2_source_format_class pixel_format; - unsigned int active_writebacks_per_surface; + unsigned long input_width; + unsigned long input_height; + unsigned long output_width; + unsigned long output_height; + unsigned long v_taps; + unsigned long h_taps; + unsigned long v_taps_chroma; + unsigned long h_taps_chroma; + double h_ratio; + double v_ratio; +}; - struct { - bool enabled; - unsigned long input_width; - unsigned long input_height; - unsigned long output_width; - unsigned long output_height; - unsigned long v_taps; - unsigned long h_taps; - double h_ratio; - double v_ratio; - } scaling_info; +struct dml2_writeback_cfg { + unsigned int active_writebacks_per_stream; + struct dml2_writeback_info writeback_stream[DML2_MAX_WRITEBACK]; }; struct dml2_plane_parameters { diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_soc_parameter_types.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_soc_parameter_types.h index ebd8abe894a9a8024dee24e3900cbfef4b973404..5f0bc42d1d2f7757d235ae7b504d34f21e7746f8 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_soc_parameter_types.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_soc_parameter_types.h @@ -167,11 +167,13 @@ struct dml2_ip_capabilities { unsigned int max_num_dp2p0_streams; unsigned int max_num_hdmi_frl_outputs; unsigned int max_num_dp2p0_outputs; + unsigned int max_num_wb; unsigned int rob_buffer_size_kbytes; unsigned int config_return_buffer_size_in_kbytes; unsigned int config_return_buffer_segment_size_in_kbytes; unsigned int meta_fifo_size_in_kentries; unsigned int compressed_buffer_segment_size_in_kbytes; + unsigned int cursor_buffer_size; unsigned int max_flip_time_us; unsigned int max_flip_time_lines; unsigned int hostvm_mode; diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h index eeb96c45565841e1e1069cb92757eb32b76fd8b2..d2d053f2354d00998e8cd656ce7595e67077a612 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h @@ -26,20 +26,14 @@ enum dml2_project_id { dml2_project_dcn4x_stage2_auto_drr_svp = 3, }; -enum dml2_dram_clock_change_support { - dml2_dram_clock_change_vactive = 0, - dml2_dram_clock_change_vblank = 1, - dml2_dram_clock_change_vblank_and_vactive = 2, - dml2_dram_clock_change_drr = 3, - dml2_dram_clock_change_mall_svp = 4, - dml2_dram_clock_change_mall_full_frame = 6, - dml2_dram_clock_change_unsupported = 7 -}; - -enum dml2_fclock_change_support { - dml2_fclock_change_vactive = 0, - dml2_fclock_change_vblank = 1, - dml2_fclock_change_unsupported = 2 +enum dml2_pstate_change_support { + dml2_pstate_change_vactive = 0, + dml2_pstate_change_vblank = 1, + dml2_pstate_change_vblank_and_vactive = 2, + dml2_pstate_change_drr = 3, + dml2_pstate_change_mall_svp = 4, + dml2_pstate_change_mall_full_frame = 6, + dml2_pstate_change_unsupported = 7 }; enum dml2_output_type_and_rate__type { @@ -202,24 +196,23 @@ struct dml2_mcache_surface_allocation { } informative; }; -enum dml2_uclk_pstate_support_method { - dml2_uclk_pstate_support_method_not_supported = 0, - /* hw */ - dml2_uclk_pstate_support_method_vactive = 1, - dml2_uclk_pstate_support_method_vblank = 2, - dml2_uclk_pstate_support_method_reserved_hw = 5, - /* fw */ - dml2_uclk_pstate_support_method_fw_subvp_phantom = 6, - dml2_uclk_pstate_support_method_reserved_fw = 10, - /* fw w/drr */ - dml2_uclk_pstate_support_method_fw_vactive_drr = 11, - dml2_uclk_pstate_support_method_fw_vblank_drr = 12, - dml2_uclk_pstate_support_method_fw_subvp_phantom_drr = 13, - dml2_uclk_pstate_support_method_reserved_fw_drr_fixed = 20, - dml2_uclk_pstate_support_method_fw_drr = 21, - dml2_uclk_pstate_support_method_reserved_fw_drr_var = 22, - - dml2_uclk_pstate_support_method_count +enum dml2_pstate_method { + dml2_pstate_method_na = 0, + /* hw exclusive modes */ + dml2_pstate_method_vactive = 1, + dml2_pstate_method_vblank = 2, + dml2_pstate_method_reserved_hw = 5, + /* fw assisted exclusive modes */ + dml2_pstate_method_fw_svp = 6, + dml2_pstate_method_reserved_fw = 10, + /* fw assisted modes requiring drr modulation */ + dml2_pstate_method_fw_vactive_drr = 11, + dml2_pstate_method_fw_vblank_drr = 12, + dml2_pstate_method_fw_svp_drr = 13, + dml2_pstate_method_reserved_fw_drr_clamped = 20, + dml2_pstate_method_fw_drr = 21, + dml2_pstate_method_reserved_fw_drr_var = 22, + dml2_pstate_method_count }; struct dml2_per_plane_programming { @@ -241,7 +234,7 @@ struct dml2_per_plane_programming { // If a stream is using odm split, then this value is always 1 unsigned int num_dpps_required; - enum dml2_uclk_pstate_support_method uclk_pstate_support_method; + enum dml2_pstate_method uclk_pstate_support_method; // MALL size requirements for MALL SS and SubVP unsigned int surface_size_mall_bytes; @@ -281,7 +274,7 @@ struct dml2_per_stream_programming { unsigned int num_odms_required; - enum dml2_uclk_pstate_support_method uclk_pstate_method; + enum dml2_pstate_method uclk_pstate_method; struct { bool enabled; @@ -289,7 +282,8 @@ struct dml2_per_stream_programming { union dml2_global_sync_programming global_sync; } phantom_stream; - struct dmub_fams2_stream_static_state fams2_params; + union dmub_cmd_fams2_config fams2_base_params; + union dmub_cmd_fams2_config fams2_sub_params; }; //----------------- @@ -339,7 +333,7 @@ struct dml2_mode_support_info { bool DCCMetaBufferSizeNotExceeded; bool TotalVerticalActiveBandwidthSupport; bool VActiveBandwidthSupport; - enum dml2_fclock_change_support FCLKChangeSupport[DML2_MAX_PLANES]; + enum dml2_pstate_change_support FCLKChangeSupport[DML2_MAX_PLANES]; bool USRRetrainingSupport; bool PrefetchSupported; bool DynamicMetadataSupported; @@ -361,6 +355,7 @@ struct dml2_mode_support_info { unsigned int AlignedYPitch[DML2_MAX_PLANES]; unsigned int AlignedCPitch[DML2_MAX_PLANES]; bool g6_temp_read_support; + bool temp_read_or_ppt_support; }; // dml2_mode_support_info struct dml2_display_cfg_programming { @@ -392,6 +387,11 @@ struct dml2_display_cfg_programming { unsigned long fclk_khz; unsigned long dcfclk_khz; } svp_prefetch; + struct { + unsigned long uclk_khz; + unsigned long fclk_khz; + unsigned long dcfclk_khz; + } svp_prefetch_no_throttle; unsigned long deepsleep_dcfclk_khz; unsigned long dispclk_khz; @@ -444,7 +444,7 @@ struct dml2_display_cfg_programming { double pstate_change_us; double fclk_pstate_change_us; double usr_retraining_us; - double g6_temp_read_watermark_us; + double temp_read_or_ppt_watermark_us; } watermarks; struct { @@ -653,6 +653,7 @@ struct dml2_display_cfg_programming { double DisplayPipeLineDeliveryTimeLumaPrefetch[DML2_MAX_PLANES]; double DisplayPipeLineDeliveryTimeChromaPrefetch[DML2_MAX_PLANES]; + double WritebackRequiredBandwidth; double WritebackAllowDRAMClockChangeEndPosition[DML2_MAX_PLANES]; double WritebackAllowFCLKChangeEndPosition[DML2_MAX_PLANES]; double DSCCLK_calculated[DML2_MAX_PLANES]; @@ -662,6 +663,7 @@ struct dml2_display_cfg_programming { double MaxActiveDRAMClockChangeLatencySupported[DML2_MAX_PLANES]; unsigned int PrefetchMode[DML2_MAX_PLANES]; // LEGACY_ONLY bool ROBUrgencyAvoidance; + double LowestPrefetchMargin; } misc; struct dml2_mode_support_info mode_support_info; @@ -675,6 +677,7 @@ struct dml2_display_cfg_programming { bool failed_mcache_validation; bool failed_dpmm; bool failed_mode_programming; + bool failed_map_watermarks; } informative; }; diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c index 3d41ffde91c1b56b2bf9ba51e064c4845fba3c34..d68b4567e218aabc0bf7c35658e257e2c8e1567c 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4.c @@ -9,7 +9,7 @@ #include "dml2_debug.h" #include "lib_float_math.h" -static const struct dml2_core_ip_params core_dcn4_ip_caps_base = { +struct dml2_core_ip_params core_dcn4_ip_caps_base = { // Hardcoded values for DCN3x .vblank_nom_default_us = 668, .remote_iommu_outstanding_translations = 256, @@ -90,6 +90,7 @@ static void patch_ip_caps_with_explicit_ip_params(struct dml2_ip_capabilities *i ip_caps->config_return_buffer_segment_size_in_kbytes = ip_params->config_return_buffer_segment_size_in_kbytes; ip_caps->meta_fifo_size_in_kentries = ip_params->meta_fifo_size_in_kentries; ip_caps->compressed_buffer_segment_size_in_kbytes = ip_params->compressed_buffer_segment_size_in_kbytes; + ip_caps->cursor_buffer_size = ip_params->cursor_buffer_size; ip_caps->max_flip_time_us = ip_params->max_flip_time_us; ip_caps->max_flip_time_lines = ip_params->max_flip_time_lines; ip_caps->hostvm_mode = ip_params->hostvm_mode; @@ -114,6 +115,7 @@ static void patch_ip_params_with_ip_caps(struct dml2_core_ip_params *ip_params, ip_params->config_return_buffer_segment_size_in_kbytes = ip_caps->config_return_buffer_segment_size_in_kbytes; ip_params->meta_fifo_size_in_kentries = ip_caps->meta_fifo_size_in_kentries; ip_params->compressed_buffer_segment_size_in_kbytes = ip_caps->compressed_buffer_segment_size_in_kbytes; + ip_params->cursor_buffer_size = ip_caps->cursor_buffer_size; ip_params->max_flip_time_us = ip_caps->max_flip_time_us; ip_params->max_flip_time_lines = ip_caps->max_flip_time_lines; ip_params->hostvm_mode = ip_caps->hostvm_mode; @@ -316,28 +318,9 @@ static void pack_mode_programming_params_with_implicit_subvp(struct dml2_core_in // Setup the appropriate p-state strategy if (display_cfg->stage3.performed && display_cfg->stage3.success) { - switch (display_cfg->stage3.pstate_switch_modes[plane_index]) { - case dml2_uclk_pstate_support_method_vactive: - case dml2_uclk_pstate_support_method_vblank: - case dml2_uclk_pstate_support_method_fw_subvp_phantom: - case dml2_uclk_pstate_support_method_fw_drr: - case dml2_uclk_pstate_support_method_fw_vactive_drr: - case dml2_uclk_pstate_support_method_fw_vblank_drr: - case dml2_uclk_pstate_support_method_fw_subvp_phantom_drr: - programming->plane_programming[plane_index].uclk_pstate_support_method = display_cfg->stage3.pstate_switch_modes[plane_index]; - break; - case dml2_uclk_pstate_support_method_reserved_hw: - case dml2_uclk_pstate_support_method_reserved_fw: - case dml2_uclk_pstate_support_method_reserved_fw_drr_fixed: - case dml2_uclk_pstate_support_method_reserved_fw_drr_var: - case dml2_uclk_pstate_support_method_not_supported: - case dml2_uclk_pstate_support_method_count: - default: - programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_not_supported; - break; - } + programming->plane_programming[plane_index].uclk_pstate_support_method = display_cfg->stage3.pstate_switch_modes[plane_index]; } else { - programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_not_supported; + programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_na; } dml2_core_calcs_get_mall_allocation(&core->clean_me_up.mode_lib, &programming->plane_programming[plane_index].surface_size_mall_bytes, dml_internal_pipe_index); @@ -360,7 +343,8 @@ static void pack_mode_programming_params_with_implicit_subvp(struct dml2_core_in /* unconditionally populate fams2 params */ dml2_core_calcs_get_stream_fams2_programming(&core->clean_me_up.mode_lib, display_cfg, - &programming->stream_programming[main_plane->stream_index].fams2_params, + &programming->stream_programming[main_plane->stream_index].fams2_base_params, + &programming->stream_programming[main_plane->stream_index].fams2_sub_params, programming->stream_programming[main_plane->stream_index].uclk_pstate_method, plane_index); @@ -572,18 +556,18 @@ bool core_dcn4_mode_programming(struct dml2_core_mode_programming_in_out *in_out in_out->programming->plane_programming[plane_index].num_dpps_required = core->clean_me_up.mode_lib.mp.NoOfDPP[plane_index]; if (in_out->programming->display_config.plane_descriptors[plane_index].overrides.legacy_svp_config == dml2_svp_mode_override_main_pipe) - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_fw_subvp_phantom; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_fw_svp; else if (in_out->programming->display_config.plane_descriptors[plane_index].overrides.legacy_svp_config == dml2_svp_mode_override_phantom_pipe) - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_fw_subvp_phantom; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_fw_svp; else if (in_out->programming->display_config.plane_descriptors[plane_index].overrides.legacy_svp_config == dml2_svp_mode_override_phantom_pipe_no_data_return) - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_fw_subvp_phantom; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_fw_svp; else { if (core->clean_me_up.mode_lib.mp.MaxActiveDRAMClockChangeLatencySupported[plane_index] >= core->clean_me_up.mode_lib.soc.power_management_parameters.dram_clk_change_blackout_us) - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_vactive; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_vactive; else if (core->clean_me_up.mode_lib.mp.TWait[plane_index] >= core->clean_me_up.mode_lib.soc.power_management_parameters.dram_clk_change_blackout_us) - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_vblank; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_vblank; else - in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_uclk_pstate_support_method_not_supported; + in_out->programming->plane_programming[plane_index].uclk_pstate_support_method = dml2_pstate_method_na; } dml2_core_calcs_get_mall_allocation(&core->clean_me_up.mode_lib, &in_out->programming->plane_programming[plane_index].surface_size_mall_bytes, dml_internal_pipe_index); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c index 601320b1be81717b201e1b24e00d3d81844e04ac..b9ec243cf9ba5b45a430f2147ecd8b843e14fbd3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c @@ -11,6 +11,9 @@ #define DML2_MAX_FMT_420_BUFFER_WIDTH 4096 #define DML_MAX_NUM_OF_SLICES_PER_DSC 4 +#define DML_MAX_COMPRESSION_RATIO 4 +//#define DML_MODE_SUPPORT_USE_DPM_DRAM_BW +//#define DML_GLOBAL_PREFETCH_CHECK #define ALLOW_SDPIF_RATE_LIMIT_PRE_CSTATE const char *dml2_core_internal_bw_type_str(enum dml2_core_internal_bw_type bw_type) @@ -132,9 +135,9 @@ static void dml2_print_mode_support_info(const struct dml2_core_internal_mode_su dml2_printf("DML: support: DynamicMetadataSupported = %d\n", support->DynamicMetadataSupported); if (!fail_only || support->VRatioInPrefetchSupported == 0) dml2_printf("DML: support: VRatioInPrefetchSupported = %d\n", support->VRatioInPrefetchSupported); - if (!fail_only || support->PTEBufferSizeNotExceeded == 1) + if (!fail_only || support->PTEBufferSizeNotExceeded == 0) dml2_printf("DML: support: PTEBufferSizeNotExceeded = %d\n", support->PTEBufferSizeNotExceeded); - if (!fail_only || support->DCCMetaBufferSizeNotExceeded == 1) + if (!fail_only || support->DCCMetaBufferSizeNotExceeded == 0) dml2_printf("DML: support: DCCMetaBufferSizeNotExceeded = %d\n", support->DCCMetaBufferSizeNotExceeded); if (!fail_only || support->ExceededMALLSize == 1) dml2_printf("DML: support: ExceededMALLSize = %d\n", support->ExceededMALLSize); @@ -315,12 +318,11 @@ dml_get_var_func(meta_trip_memory_us, double, mode_lib->mp.MetaTripToMemory); dml_get_var_func(wm_fclk_change, double, mode_lib->mp.Watermark.FCLKChangeWatermark); dml_get_var_func(wm_usr_retraining, double, mode_lib->mp.Watermark.USRRetrainingWatermark); -dml_get_var_func(wm_g6_temp_read, double, mode_lib->mp.Watermark.g6_temp_read_watermark_us); +dml_get_var_func(wm_temp_read_or_ppt, double, mode_lib->mp.Watermark.temp_read_or_ppt_watermark_us); dml_get_var_func(wm_dram_clock_change, double, mode_lib->mp.Watermark.DRAMClockChangeWatermark); dml_get_var_func(fraction_of_urgent_bandwidth, double, mode_lib->mp.FractionOfUrgentBandwidth); dml_get_var_func(fraction_of_urgent_bandwidth_imm_flip, double, mode_lib->mp.FractionOfUrgentBandwidthImmediateFlip); dml_get_var_func(fraction_of_urgent_bandwidth_mall, double, mode_lib->mp.FractionOfUrgentBandwidthMALL); -dml_get_var_func(urgent_latency, double, mode_lib->mp.UrgentLatency); dml_get_var_func(wm_writeback_dram_clock_change, double, mode_lib->mp.Watermark.WritebackDRAMClockChangeWatermark); dml_get_var_func(wm_writeback_fclk_change, double, mode_lib->mp.Watermark.WritebackFCLKChangeWatermark); dml_get_var_func(stutter_efficiency, double, mode_lib->mp.StutterEfficiency); @@ -355,7 +357,9 @@ dml_get_var_func(svp_prefetch_urg_bw_available_sdp, double, mode_lib->mp.urg_ban dml_get_var_func(svp_prefetch_urg_bw_available_dram, double, mode_lib->mp.urg_bandwidth_available[dml2_core_internal_soc_state_svp_prefetch][dml2_core_internal_bw_dram]); dml_get_var_func(svp_prefetch_urg_bw_available_dram_vm_only, double, mode_lib->mp.urg_bandwidth_available_vm_only[dml2_core_internal_soc_state_svp_prefetch]); +dml_get_var_func(urgent_latency, double, mode_lib->mp.UrgentLatency); dml_get_var_func(max_urgent_latency_us, double, mode_lib->ms.support.max_urgent_latency_us); +dml_get_var_func(max_non_urgent_latency_us, double, mode_lib->ms.support.max_non_urgent_latency_us); dml_get_var_func(avg_non_urgent_latency_us, double, mode_lib->ms.support.avg_non_urgent_latency_us); dml_get_var_func(avg_urgent_latency_us, double, mode_lib->ms.support.avg_urgent_latency_us); @@ -466,6 +470,24 @@ static bool dml_is_420(enum dml2_source_format_class source_format) case dml2_420_12: val = 1; break; + case dml2_422_planar_8: + val = 0; + break; + case dml2_422_planar_10: + val = 0; + break; + case dml2_422_planar_12: + val = 0; + break; + case dml2_422_packed_8: + val = 0; + break; + case dml2_422_packed_10: + val = 0; + break; + case dml2_422_packed_12: + val = 0; + break; case dml2_rgbe_alpha: val = 0; break; @@ -487,32 +509,31 @@ static bool dml_is_420(enum dml2_source_format_class source_format) static unsigned int dml_get_tile_block_size_bytes(enum dml2_swizzle_mode sw_mode) { - switch (sw_mode) { - case (dml2_sw_linear): - return 256; break; - case (dml2_sw_256b_2d): - return 256; break; - case (dml2_sw_4kb_2d): - return 4096; break; - case (dml2_sw_64kb_2d): - return 65536; break; - case (dml2_sw_256kb_2d): - return 262144; break; - case (dml2_gfx11_sw_linear): - return 256; break; - case (dml2_gfx11_sw_64kb_d): - return 65536; break; - case (dml2_gfx11_sw_64kb_d_t): - return 65536; break; - case (dml2_gfx11_sw_64kb_d_x): - return 65536; break; - case (dml2_gfx11_sw_64kb_r_x): - return 65536; break; - case (dml2_gfx11_sw_256kb_d_x): - return 262144; break; - case (dml2_gfx11_sw_256kb_r_x): - return 262144; break; - default: + if (sw_mode == dml2_sw_linear) + return 256; + else if (sw_mode == dml2_sw_256b_2d) + return 256; + else if (sw_mode == dml2_sw_4kb_2d) + return 4096; + else if (sw_mode == dml2_sw_64kb_2d) + return 65536; + else if (sw_mode == dml2_sw_256kb_2d) + return 262144; + else if (sw_mode == dml2_gfx11_sw_linear) + return 256; + else if (sw_mode == dml2_gfx11_sw_64kb_d) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_d_t) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_d_x) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_r_x) + return 65536; + else if (sw_mode == dml2_gfx11_sw_256kb_d_x) + return 262144; + else if (sw_mode == dml2_gfx11_sw_256kb_r_x) + return 262144; + else { DML2_ASSERT(0); return 256; } @@ -820,7 +841,7 @@ static void CalculateSwathWidth( // Output unsigned int req_per_swath_ub_l[], unsigned int req_per_swath_ub_c[], - unsigned int SwathWidthSingleDPPY[], + unsigned int SwathWidthSingleDPPY[], // post-rotated plane width unsigned int SwathWidthSingleDPPC[], unsigned int SwathWidthY[], // per-pipe unsigned int SwathWidthC[], // per-pipe @@ -1403,7 +1424,6 @@ static unsigned int dscceComputeDelay( // N422/N420 operate at 2 pixels per clock unsigned int pixelsPerClock, padding_pixels, ssm_group_priming_delay, ssm_pipeline_delay, obsm_pipeline_delay, slice_padded_pixels, ixd_plus_padding, ixd_plus_padding_groups, cycles_per_group, group_delay, pipeline_delay, pixels, additional_group_delay, lines_to_reach_ixd, groups_to_reach_ixd, slice_width_groups, initial_xmit_delay, number_of_lines_to_reach_ixd, slice_width_modified; - if (pixelFormat == dml2_420) pixelsPerClock = 2; // #all other modes operate at 1 pixel per clock @@ -1428,7 +1448,6 @@ static unsigned int dscceComputeDelay( } } - //sub-stream multiplexer balance fifo priming delay in groups as per dsc standard if (bpc == 8) ssm_group_priming_delay = 83; @@ -1447,9 +1466,6 @@ static unsigned int dscceComputeDelay( //determine number of padded pixels in the last group of a slice line, computed as slice_padded_pixels = 3 * slice_width_groups - slice_width_modified; - - - //determine integer number of complete slice lines required to reach initial transmit delay without ssm delay considered number_of_lines_to_reach_ixd = initial_xmit_delay / slice_width_modified; @@ -1463,7 +1479,6 @@ static unsigned int dscceComputeDelay( //number of groups required for a slice to reach initial transmit delay is the sum of the padded initial transmit delay plus the ssm group priming delay groups_to_reach_ixd = ixd_plus_padding_groups + ssm_group_priming_delay; - //number of lines required to reach padded initial transmit delay in groups in slices to the left of the last horizontal slice //needs to be rounded up as a complete slice lines are buffered prior to initial transmit delay being reached in the last horizontal slice lines_to_reach_ixd = (groups_to_reach_ixd + slice_width_groups - 1) / slice_width_groups; //round up lines to reach ixd to next @@ -1506,7 +1521,6 @@ static unsigned int dscceComputeDelay( return pixels; } - //updated in dcn4 static unsigned int dscComputeDelay(enum dml2_output_format_class pixelFormat, enum dml2_output_encoder_class Output) { @@ -2090,7 +2104,6 @@ static void CalculateDCCConfiguration( yuv420 = 1; else yuv420 = 0; - horz_div_l = 1; horz_div_c = 1; vert_div_l = 1; @@ -2561,8 +2574,7 @@ static void calculate_mcache_setting( if (*p->num_mcaches_l) { l->avg_mcache_element_size_l = l->meta_row_width_l / *p->num_mcaches_l; } - - if (l->is_dual_plane && *p->num_mcaches_c) { + if (l->is_dual_plane) { l->avg_mcache_element_size_c = l->meta_row_width_c / *p->num_mcaches_c; if (!p->imall_enable || (*p->mall_comb_mcache_l == *p->mall_comb_mcache_c)) { @@ -2682,12 +2694,12 @@ static double dml_get_return_bandwidth_available( bool is_avg_bw, bool is_hvm_en, bool is_hvm_only, - double dcflk_mhz, + double dcfclk_mhz, double fclk_mhz, double dram_bw_mbps) { double return_bw_mbps = 0.; - double ideal_sdp_bandwidth = (double)soc->return_bus_width_bytes * dcflk_mhz; + double ideal_sdp_bandwidth = (double)soc->return_bus_width_bytes * dcfclk_mhz; double ideal_fabric_bandwidth = fclk_mhz * (double)soc->fabric_datapath_to_dcn_data_return_bytes; double ideal_dram_bandwidth = dram_bw_mbps; //dram_speed_mts * soc->clk_table.dram_config.channel_count * soc->clk_table.dram_config.channel_width_bytes; @@ -2753,7 +2765,7 @@ static double dml_get_return_bandwidth_available( dml2_printf("DML::%s: is_hvm_only = %u\n", __func__, is_hvm_only); dml2_printf("DML::%s: state_type = %s\n", __func__, dml2_core_internal_soc_state_type_str(state_type)); dml2_printf("DML::%s: bw_type = %s\n", __func__, dml2_core_internal_bw_type_str(bw_type)); - dml2_printf("DML::%s: dcflk_mhz = %f\n", __func__, dcflk_mhz); + dml2_printf("DML::%s: dcfclk_mhz = %f\n", __func__, dcfclk_mhz); dml2_printf("DML::%s: fclk_mhz = %f\n", __func__, fclk_mhz); dml2_printf("DML::%s: ideal_sdp_bandwidth = %f\n", __func__, ideal_sdp_bandwidth); dml2_printf("DML::%s: ideal_fabric_bandwidth = %f\n", __func__, ideal_fabric_bandwidth); @@ -3816,8 +3828,8 @@ static void CalculateSwathAndDETConfiguration(struct dml2_core_internal_scratch p->SwathHeightC[k] = MaximumSwathHeightC[k] / 2; RoundedUpSwathSizeBytesY[k] = p->full_swath_bytes_l[k] / 2; RoundedUpSwathSizeBytesC[k] = p->full_swath_bytes_c[k] / 2; - p->request_size_bytes_luma[k] = ((p->BytePerPixY[k] == 2) == dml_is_vertical_rotation(p->display_cfg->plane_descriptors[k].composition.rotation_angle)) ? 128 : 64; - p->request_size_bytes_chroma[k] = ((p->BytePerPixC[k] == 2) == dml_is_vertical_rotation(p->display_cfg->plane_descriptors[k].composition.rotation_angle)) ? 128 : 64; + p->request_size_bytes_luma[k] = ((p->BytePerPixY[k] == 2) == dml_is_vertical_rotation(p->display_cfg->plane_descriptors[k].composition.rotation_angle)) ? 128 : 64;; + p->request_size_bytes_chroma[k] = ((p->BytePerPixC[k] == 2) == dml_is_vertical_rotation(p->display_cfg->plane_descriptors[k].composition.rotation_angle)) ? 128 : 64;; } if (p->SwathHeightC[k] == 0) @@ -5069,20 +5081,18 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch s->trip_to_mem = 0.0; *p->Tvm_trips = 0.0; *p->Tr0_trips = 0.0; - s->Tvm_no_trip_oto = 0.0; - s->Tr0_no_trip_oto = 0.0; s->Tvm_trips_rounded = 0.0; s->Tr0_trips_rounded = 0.0; s->max_Tsw = 0.0; s->Lsw_oto = 0.0; - s->Tpre_rounded = 0.0; + *p->Tpre_rounded = 0.0; s->prefetch_bw_equ = 0.0; s->Tvm_equ = 0.0; s->Tr0_equ = 0.0; s->Tdmbf = 0.0; s->Tdmec = 0.0; s->Tdmsks = 0.0; - s->prefetch_sw_bytes = 0.0; + *p->prefetch_sw_bytes = 0.0; s->prefetch_bw_pr = 0.0; s->bytes_pp = 0.0; s->dep_bytes = 0.0; @@ -5207,6 +5217,7 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch dml2_printf("DML::%s: setup_for_tdlut = %u\n", __func__, p->setup_for_tdlut); dml2_printf("DML::%s: tdlut_opt_time = %f\n", __func__, p->tdlut_opt_time); dml2_printf("DML::%s: tdlut_pte_bytes_per_frame = %u\n", __func__, p->tdlut_pte_bytes_per_frame); + dml2_printf("DML::%s: tdlut_drain_time = %f\n", __func__, p->tdlut_drain_time); #endif if (p->OutputFormat == dml2_420 || (p->myPipe->InterlaceEnable && p->myPipe->ProgressiveToInterlaceUnitInOPP)) @@ -5277,23 +5288,8 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch s->bytes_pp = p->myPipe->BytePerPixelY + p->myPipe->BytePerPixelC; } - s->prefetch_bw_pr = s->bytes_pp * p->myPipe->PixelClock / (double)p->myPipe->DPPPerSurface; - if (p->myPipe->VRatio < 1.0) - s->prefetch_bw_pr = p->myPipe->VRatio * s->prefetch_bw_pr; - s->max_Tsw = (math_max2(p->PrefetchSourceLinesY, p->PrefetchSourceLinesC) * s->LineTime); - - s->prefetch_sw_bytes = p->PrefetchSourceLinesY * p->swath_width_luma_ub * p->myPipe->BytePerPixelY + p->PrefetchSourceLinesC * p->swath_width_chroma_ub * p->myPipe->BytePerPixelC; - s->prefetch_bw_pr = s->prefetch_bw_pr * p->mall_prefetch_sdp_overhead_factor; - s->prefetch_sw_bytes = s->prefetch_sw_bytes * p->mall_prefetch_sdp_overhead_factor; - s->prefetch_bw_oto = math_max2(s->prefetch_bw_pr, s->prefetch_sw_bytes / s->max_Tsw); - - s->min_Lsw_oto = math_max2(p->PrefetchSourceLinesY, p->PrefetchSourceLinesC) / __DML2_CALCS_MAX_VRATIO_PRE_OTO__; - s->min_Lsw_oto = math_max2(s->min_Lsw_oto, 2.0); - s->min_Lsw_oto = math_max2(s->min_Lsw_oto, p->tdlut_drain_time / s->LineTime); - - s->min_Lsw_equ = math_max2(p->PrefetchSourceLinesY, p->PrefetchSourceLinesC) / __DML2_CALCS_MAX_VRATIO_PRE_EQU__; - s->min_Lsw_equ = math_max2(s->min_Lsw_equ, 2.0); - s->min_Lsw_equ = math_max2(s->min_Lsw_equ, p->tdlut_drain_time / s->LineTime); + *p->prefetch_sw_bytes = p->PrefetchSourceLinesY * p->swath_width_luma_ub * p->myPipe->BytePerPixelY + p->PrefetchSourceLinesC * p->swath_width_chroma_ub * p->myPipe->BytePerPixelC; + *p->prefetch_sw_bytes = *p->prefetch_sw_bytes * p->mall_prefetch_sdp_overhead_factor; vm_bytes = p->vm_bytes; // vm_bytes is dpde0_bytes_per_frame_ub_l + dpde0_bytes_per_frame_ub_c + 2*extra_dpde_bytes; extra_tdpe_bytes = (unsigned int)math_max2(0, (p->display_cfg->gpuvm_max_page_table_levels - 1) * 128); @@ -5302,57 +5298,103 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch vm_bytes = vm_bytes + p->tdlut_pte_bytes_per_frame + (p->display_cfg->gpuvm_enable ? extra_tdpe_bytes : 0); tdlut_row_bytes = (unsigned long) math_ceil2(p->tdlut_bytes_per_frame/2.0, 1.0); + + s->min_Lsw_oto = math_max2(p->PrefetchSourceLinesY, p->PrefetchSourceLinesC) / __DML2_CALCS_MAX_VRATIO_PRE_OTO__; + s->min_Lsw_oto = math_max2(s->min_Lsw_oto, p->tdlut_drain_time / s->LineTime); + s->min_Lsw_oto = math_max2(s->min_Lsw_oto, 2.0); + + // use vactive swath bw for prefetch oto and also cap prefetch_bw_oto to max_vratio_oto + // Note: in prefetch calculation, acounting is done mostly per-pipe. + // vactive swath bw represents the per-surface (aka per dml plane) bw to move vratio_l/c lines of bytes_l/c per line time + s->per_pipe_vactive_sw_bw = p->vactive_sw_bw_l / (double)p->myPipe->DPPPerSurface; + + // one-to-one prefetch bw as one line of bytes per line time (as per vratio_pre_l/c = 1) + s->prefetch_bw_oto = (p->swath_width_luma_ub * p->myPipe->BytePerPixelY) / s->LineTime; + + if (p->myPipe->BytePerPixelC > 0) { + s->per_pipe_vactive_sw_bw += p->vactive_sw_bw_c / (double)p->myPipe->DPPPerSurface; + s->prefetch_bw_oto += (p->swath_width_chroma_ub * p->myPipe->BytePerPixelC) / s->LineTime; + } + + s->prefetch_bw_oto = math_max2(s->per_pipe_vactive_sw_bw, s->prefetch_bw_oto) * p->mall_prefetch_sdp_overhead_factor; + + s->prefetch_bw_oto = math_min2(s->prefetch_bw_oto, *p->prefetch_sw_bytes/(s->min_Lsw_oto*s->LineTime)); + + s->Lsw_oto = math_ceil2(4.0 * *p->prefetch_sw_bytes / s->prefetch_bw_oto / s->LineTime, 1.0) / 4.0; + s->prefetch_bw_oto = math_max3(s->prefetch_bw_oto, p->vm_bytes * p->HostVMInefficiencyFactor / (31 * s->LineTime) - *p->Tno_bw, (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) / (15 * s->LineTime)); - s->Lsw_oto = math_ceil2(4.0 * math_max2(s->prefetch_sw_bytes / s->prefetch_bw_oto / s->LineTime, s->min_Lsw_oto), 1.0) / 4.0; + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: vactive_sw_bw_l = %f\n", __func__, p->vactive_sw_bw_l); + dml2_printf("DML::%s: vactive_sw_bw_c = %f\n", __func__, p->vactive_sw_bw_c); + dml2_printf("DML::%s: per_pipe_vactive_sw_bw = %f\n", __func__, s->per_pipe_vactive_sw_bw); +#endif if (p->display_cfg->gpuvm_enable == true) { - s->Tvm_no_trip_oto = math_max2( + s->Tvm_oto = math_max3( + *p->Tvm_trips, *p->Tno_bw + vm_bytes * p->HostVMInefficiencyFactor / s->prefetch_bw_oto, s->LineTime / 4.0); - s->Tvm_oto = math_max2( - *p->Tvm_trips, - s->Tvm_no_trip_oto); + #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: Tvm_oto max0 = %f\n", __func__, *p->Tvm_trips); dml2_printf("DML::%s: Tvm_oto max1 = %f\n", __func__, *p->Tno_bw + vm_bytes * p->HostVMInefficiencyFactor / s->prefetch_bw_oto); dml2_printf("DML::%s: Tvm_oto max2 = %f\n", __func__, s->LineTime / 4.0); #endif } else { - s->Tvm_no_trip_oto = s->Tvm_trips_rounded; s->Tvm_oto = s->Tvm_trips_rounded; } if ((p->display_cfg->gpuvm_enable == true || p->setup_for_tdlut || dcc_mrq_enable)) { - s->Tr0_no_trip_oto = math_max2( + s->Tr0_oto = math_max3( + *p->Tr0_trips, (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) / s->prefetch_bw_oto, s->LineTime / 4.0); - s->Tr0_oto = math_max2( - *p->Tr0_trips, - s->Tr0_no_trip_oto); #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: Tr0_oto max0 = %f\n", __func__, *p->Tr0_trips); dml2_printf("DML::%s: Tr0_oto max1 = %f\n", __func__, (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) / s->prefetch_bw_oto); dml2_printf("DML::%s: Tr0_oto max2 = %f\n", __func__, s->LineTime / 4); #endif - } else { - s->Tr0_no_trip_oto = (s->LineTime - s->Tvm_oto) / 4.0; - s->Tr0_oto = s->Tr0_no_trip_oto; - } + } else + s->Tr0_oto = s->LineTime / 4.0; s->Tvm_oto_lines = math_ceil2(4.0 * s->Tvm_oto / s->LineTime, 1) / 4.0; s->Tr0_oto_lines = math_ceil2(4.0 * s->Tr0_oto / s->LineTime, 1) / 4.0; s->dst_y_prefetch_oto = s->Tvm_oto_lines + 2 * s->Tr0_oto_lines + s->Lsw_oto; +#ifdef DML_GLOBAL_PREFETCH_CHECK + dml2_printf("DML::%s: impacted_Tpre = %f\n", __func__, p->impacted_dst_y_pre); + if (p->impacted_dst_y_pre > 0) { + dml2_printf("DML::%s: dst_y_prefetch_oto = %f\n", __func__, s->dst_y_prefetch_oto); + s->dst_y_prefetch_oto = math_max2(s->dst_y_prefetch_oto, p->impacted_dst_y_pre); + dml2_printf("DML::%s: dst_y_prefetch_oto = %f (impacted)\n", __func__, s->dst_y_prefetch_oto); + } +#endif + *p->Tpre_oto = s->dst_y_prefetch_oto * s->LineTime; + //To (time for delay after scaler) in line time Lo = (unsigned int)(*p->DSTYAfterScaler + (double)*p->DSTXAfterScaler / (double)p->myPipe->HTotal); + s->min_Lsw_equ = math_max2(p->PrefetchSourceLinesY, p->PrefetchSourceLinesC) / __DML2_CALCS_MAX_VRATIO_PRE_EQU__; + s->min_Lsw_equ = math_max2(s->min_Lsw_equ, p->tdlut_drain_time / s->LineTime); + s->min_Lsw_equ = math_max2(s->min_Lsw_equ, 2.0); //Tpre_equ in line time if (p->DynamicMetadataVMEnabled && p->DynamicMetadataEnable) s->dst_y_prefetch_equ = p->VStartup - (*p->TSetup + math_max2(p->TCalc, *p->Tvm_trips) + s->TWait_p) / s->LineTime - Lo; else s->dst_y_prefetch_equ = p->VStartup - (*p->TSetup + math_max2(p->TCalc, p->ExtraLatencyPrefetch) + s->TWait_p) / s->LineTime - Lo; + +#ifdef DML_GLOBAL_PREFETCH_CHECK + s->dst_y_prefetch_equ_impacted = math_max2(p->impacted_dst_y_pre, s->dst_y_prefetch_equ); + + s->dst_y_prefetch_equ_impacted = math_min2(s->dst_y_prefetch_equ_impacted, 63.75); // limit to the reg limit of U6.2 for DST_Y_PREFETCH + + if (s->dst_y_prefetch_equ_impacted > s->dst_y_prefetch_equ) + s->dst_y_prefetch_equ -= s->dst_y_prefetch_equ_impacted - s->dst_y_prefetch_equ; +#endif + s->dst_y_prefetch_equ = math_min2(s->dst_y_prefetch_equ, 63.75); // limit to the reg limit of U6.2 for DST_Y_PREFETCH #ifdef __DML_VBA_DEBUG__ @@ -5370,7 +5412,7 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch dml2_printf("DML::%s: BytePerPixelC = %u\n", __func__, p->myPipe->BytePerPixelC); dml2_printf("DML::%s: PrefetchSourceLinesC = %f\n", __func__, p->PrefetchSourceLinesC); dml2_printf("DML::%s: swath_width_chroma_ub = %u\n", __func__, p->swath_width_chroma_ub); - dml2_printf("DML::%s: prefetch_sw_bytes = %f\n", __func__, s->prefetch_sw_bytes); + dml2_printf("DML::%s: prefetch_sw_bytes = %f\n", __func__, *p->prefetch_sw_bytes); dml2_printf("DML::%s: max_Tsw = %f\n", __func__, s->max_Tsw); dml2_printf("DML::%s: bytes_pp = %f\n", __func__, s->bytes_pp); dml2_printf("DML::%s: vm_bytes = %u\n", __func__, vm_bytes); @@ -5394,7 +5436,7 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch #endif double Tpre = s->dst_y_prefetch_equ * s->LineTime; s->dst_y_prefetch_equ = math_floor2(4.0 * (s->dst_y_prefetch_equ + 0.125), 1) / 4.0; - s->Tpre_rounded = s->dst_y_prefetch_equ * s->LineTime; + *p->Tpre_rounded = s->dst_y_prefetch_equ * s->LineTime; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: dst_y_prefetch_equ: %f (after round)\n", __func__, s->dst_y_prefetch_equ); @@ -5420,7 +5462,7 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch dml2_printf("DML::%s: vm_bytes: %f (hvm inefficiency scaled)\n", __func__, vm_bytes*p->HostVMInefficiencyFactor); dml2_printf("DML::%s: row_bytes: %f (hvm inefficiency scaled, 1 row)\n", __func__, p->PixelPTEBytesPerRow*p->HostVMInefficiencyFactor+p->meta_row_bytes+tdlut_row_bytes); dml2_printf("DML::%s: Tno_bw: %f\n", __func__, *p->Tno_bw); - dml2_printf("DML::%s: Tpre=%f Tpre_rounded: %f, delta=%f\n", __func__, Tpre, s->Tpre_rounded, (s->Tpre_rounded - Tpre)); + dml2_printf("DML::%s: Tpre=%f Tpre_rounded: %f, delta=%f\n", __func__, Tpre, *p->Tpre_rounded, (*p->Tpre_rounded - Tpre)); dml2_printf("DML::%s: Tvm_trips=%f Tvm_trips_rounded: %f, delta=%f\n", __func__, *p->Tvm_trips, s->Tvm_trips_rounded, (s->Tvm_trips_rounded - *p->Tvm_trips)); #endif @@ -5434,78 +5476,85 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch // Tpre_rounded is Tpre rounding to 2-bit fraction // Tvm_trips_rounded is Tvm_trips ceiling to 1/4 line time // Tr0_trips_rounded is Tr0_trips ceiling to 1/4 line time - // So that means prefetch bw calculated can be higher since the total time availabe for prefetch is less - bool min_Lsw_equ_ok = s->Tpre_rounded >= s->Tvm_trips_rounded + 2.0*s->Tr0_trips_rounded + s->min_Lsw_equ*s->LineTime; + // So that means prefetch bw calculated can be higher since the total time available for prefetch is less + bool min_Lsw_equ_ok = *p->Tpre_rounded >= s->Tvm_trips_rounded + 2.0*s->Tr0_trips_rounded + s->min_Lsw_equ*s->LineTime; + bool tpre_gt_req_latency = true; +#if 0 + // Check that Tpre_rounded is big enough if all of the stages of the prefetch are time constrained. + // The terms Tvm_trips_rounded and Tr0_trips_rounded represent the min time constraints for the VM and row stages. + // Normally, these terms cover the overall time constraint for Tpre >= (Tex + max{Ttrip, Turg}), but if these terms are at their minimum, an explicit check is necessary. + tpre_gt_req_latency = *p->Tpre_rounded > (math_max2(p->Turg, s->trip_to_mem) + p->ExtraLatencyPrefetch); +#endif - if (s->dst_y_prefetch_equ > 1 && min_Lsw_equ_ok) { + if (s->dst_y_prefetch_equ > 1 && min_Lsw_equ_ok && tpre_gt_req_latency) { s->prefetch_bw1 = 0.; s->prefetch_bw2 = 0.; s->prefetch_bw3 = 0.; s->prefetch_bw4 = 0.; // prefetch_bw1: VM + 2*R0 + SW - if (s->Tpre_rounded - *p->Tno_bw > 0) { + if (*p->Tpre_rounded - *p->Tno_bw > 0) { s->prefetch_bw1 = (vm_bytes * p->HostVMInefficiencyFactor + 2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) - + s->prefetch_sw_bytes) - / (s->Tpre_rounded - *p->Tno_bw); - s->Tsw_est1 = s->prefetch_sw_bytes / s->prefetch_bw1; + + *p->prefetch_sw_bytes) + / (*p->Tpre_rounded - *p->Tno_bw); + s->Tsw_est1 = *p->prefetch_sw_bytes / s->prefetch_bw1; } else s->prefetch_bw1 = 0; dml2_printf("DML::%s: prefetch_bw1: %f\n", __func__, s->prefetch_bw1); - if ((s->Tsw_est1 < s->min_Lsw_equ * s->LineTime) && (s->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw > 0)) { + if ((s->Tsw_est1 < s->min_Lsw_equ * s->LineTime) && (*p->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw > 0)) { s->prefetch_bw1 = (vm_bytes * p->HostVMInefficiencyFactor + 2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes)) / - (s->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw); + (*p->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw); #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: vm and 2 rows bytes = %f\n", __func__, (vm_bytes * p->HostVMInefficiencyFactor + 2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes))); - dml2_printf("DML::%s: Tpre_rounded = %f\n", __func__, s->Tpre_rounded); + dml2_printf("DML::%s: Tpre_rounded = %f\n", __func__, *p->Tpre_rounded); dml2_printf("DML::%s: minus term = %f\n", __func__, s->min_Lsw_equ * s->LineTime + 0.75 * s->LineTime + *p->Tno_bw); dml2_printf("DML::%s: min_Lsw_equ = %f\n", __func__, s->min_Lsw_equ); dml2_printf("DML::%s: LineTime = %f\n", __func__, s->LineTime); dml2_printf("DML::%s: Tno_bw = %f\n", __func__, *p->Tno_bw); - dml2_printf("DML::%s: Time to fetch vm and 2 rows = %f\n", __func__, (s->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw)); + dml2_printf("DML::%s: Time to fetch vm and 2 rows = %f\n", __func__, (*p->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.75 * s->LineTime - *p->Tno_bw)); dml2_printf("DML::%s: prefetch_bw1: %f (updated)\n", __func__, s->prefetch_bw1); #endif } // prefetch_bw2: VM + SW - if (s->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded > 0) { - s->prefetch_bw2 = (vm_bytes * p->HostVMInefficiencyFactor + s->prefetch_sw_bytes) / - (s->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded); - s->Tsw_est2 = s->prefetch_sw_bytes / s->prefetch_bw2; + if (*p->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded > 0) { + s->prefetch_bw2 = (vm_bytes * p->HostVMInefficiencyFactor + *p->prefetch_sw_bytes) / + (*p->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded); + s->Tsw_est2 = *p->prefetch_sw_bytes / s->prefetch_bw2; } else s->prefetch_bw2 = 0; dml2_printf("DML::%s: prefetch_bw2: %f\n", __func__, s->prefetch_bw2); - if ((s->Tsw_est2 < s->min_Lsw_equ * s->LineTime) && ((s->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded - s->min_Lsw_equ * s->LineTime - 0.25 * s->LineTime) > 0)) { - s->prefetch_bw2 = vm_bytes * p->HostVMInefficiencyFactor / (s->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded - s->min_Lsw_equ * s->LineTime - 0.25 * s->LineTime); + if ((s->Tsw_est2 < s->min_Lsw_equ * s->LineTime) && ((*p->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded - s->min_Lsw_equ * s->LineTime - 0.25 * s->LineTime) > 0)) { + s->prefetch_bw2 = vm_bytes * p->HostVMInefficiencyFactor / (*p->Tpre_rounded - *p->Tno_bw - 2.0 * s->Tr0_trips_rounded - s->min_Lsw_equ * s->LineTime - 0.25 * s->LineTime); dml2_printf("DML::%s: prefetch_bw2: %f (updated)\n", __func__, s->prefetch_bw2); } // prefetch_bw3: 2*R0 + SW - if (s->Tpre_rounded - s->Tvm_trips_rounded > 0) { - s->prefetch_bw3 = (2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) + s->prefetch_sw_bytes) / - (s->Tpre_rounded - s->Tvm_trips_rounded); - s->Tsw_est3 = s->prefetch_sw_bytes / s->prefetch_bw3; + if (*p->Tpre_rounded - s->Tvm_trips_rounded > 0) { + s->prefetch_bw3 = (2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes) + *p->prefetch_sw_bytes) / + (*p->Tpre_rounded - s->Tvm_trips_rounded); + s->Tsw_est3 = *p->prefetch_sw_bytes / s->prefetch_bw3; } else s->prefetch_bw3 = 0; dml2_printf("DML::%s: prefetch_bw3: %f\n", __func__, s->prefetch_bw3); - if ((s->Tsw_est3 < s->min_Lsw_equ * s->LineTime) && ((s->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.5 * s->LineTime - s->Tvm_trips_rounded) > 0)) { - s->prefetch_bw3 = (2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes)) / (s->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.5 * s->LineTime - s->Tvm_trips_rounded); + if ((s->Tsw_est3 < s->min_Lsw_equ * s->LineTime) && ((*p->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.5 * s->LineTime - s->Tvm_trips_rounded) > 0)) { + s->prefetch_bw3 = (2 * (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + p->meta_row_bytes + tdlut_row_bytes)) / (*p->Tpre_rounded - s->min_Lsw_equ * s->LineTime - 0.5 * s->LineTime - s->Tvm_trips_rounded); dml2_printf("DML::%s: prefetch_bw3: %f (updated)\n", __func__, s->prefetch_bw3); } // prefetch_bw4: SW - if (s->Tpre_rounded - s->Tvm_trips_rounded - 2 * s->Tr0_trips_rounded > 0) - s->prefetch_bw4 = s->prefetch_sw_bytes / (s->Tpre_rounded - s->Tvm_trips_rounded - 2 * s->Tr0_trips_rounded); + if (*p->Tpre_rounded - s->Tvm_trips_rounded - 2 * s->Tr0_trips_rounded > 0) + s->prefetch_bw4 = *p->prefetch_sw_bytes / (*p->Tpre_rounded - s->Tvm_trips_rounded - 2 * s->Tr0_trips_rounded); else s->prefetch_bw4 = 0; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: Tno_bw: %f\n", __func__, *p->Tno_bw); - dml2_printf("DML::%s: Tpre=%f Tpre_rounded: %f, delta=%f\n", __func__, Tpre, s->Tpre_rounded, (s->Tpre_rounded - Tpre)); + dml2_printf("DML::%s: Tpre=%f Tpre_rounded: %f, delta=%f\n", __func__, Tpre, *p->Tpre_rounded, (*p->Tpre_rounded - Tpre)); dml2_printf("DML::%s: Tvm_trips=%f Tvm_trips_rounded: %f, delta=%f\n", __func__, *p->Tvm_trips, s->Tvm_trips_rounded, (s->Tvm_trips_rounded - *p->Tvm_trips)); dml2_printf("DML::%s: Tr0_trips=%f Tr0_trips_rounded: %f, delta=%f\n", __func__, *p->Tr0_trips, s->Tr0_trips_rounded, (s->Tr0_trips_rounded - *p->Tr0_trips)); dml2_printf("DML::%s: Tsw_est1: %f\n", __func__, s->Tsw_est1); @@ -5617,9 +5666,6 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch dml2_printf("DML::%s: Tvm_equ = %f\n", __func__, s->Tvm_equ); dml2_printf("DML::%s: Tr0_equ = %f\n", __func__, s->Tr0_equ); #endif - // Lsw = dst_y_prefetch - (dst_y_per_vm_vblank + 2*dst_y_per_row_vblank) - s->Lsw_equ = s->dst_y_prefetch_equ - math_ceil2(4.0 * (s->Tvm_equ + 2 * s->Tr0_equ) / s->LineTime, 1.0) / 4.0; - // Use the more stressful prefetch schedule if (s->dst_y_prefetch_oto < s->dst_y_prefetch_equ) { *p->dst_y_prefetch = s->dst_y_prefetch_oto; @@ -5628,28 +5674,29 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch *p->dst_y_per_vm_vblank = math_ceil2(4.0 * s->TimeForFetchingVM / s->LineTime, 1.0) / 4.0; *p->dst_y_per_row_vblank = math_ceil2(4.0 * s->TimeForFetchingRowInVBlank / s->LineTime, 1.0) / 4.0; - s->dst_y_per_vm_no_trip_vblank = math_ceil2(4.0 * s->Tvm_no_trip_oto / s->LineTime, 1.0) / 4.0; - s->dst_y_per_row_no_trip_vblank = math_ceil2(4.0 * s->Tr0_no_trip_oto / s->LineTime, 1.0) / 4.0; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: Using oto scheduling for prefetch\n", __func__); #endif + } else { *p->dst_y_prefetch = s->dst_y_prefetch_equ; + + if (s->dst_y_prefetch_equ < s->dst_y_prefetch_equ_impacted) + *p->dst_y_prefetch = s->dst_y_prefetch_equ_impacted; + s->TimeForFetchingVM = s->Tvm_equ; s->TimeForFetchingRowInVBlank = s->Tr0_equ; - *p->dst_y_per_vm_vblank = math_ceil2(4.0 * s->TimeForFetchingVM / s->LineTime, 1.0) / 4.0; - *p->dst_y_per_row_vblank = math_ceil2(4.0 * s->TimeForFetchingRowInVBlank / s->LineTime, 1.0) / 4.0; - s->dst_y_per_vm_no_trip_vblank = *p->dst_y_per_vm_vblank; - s->dst_y_per_row_no_trip_vblank = *p->dst_y_per_row_vblank; + *p->dst_y_per_vm_vblank = math_ceil2(4.0 * s->TimeForFetchingVM / s->LineTime, 1.0) / 4.0; + *p->dst_y_per_row_vblank = math_ceil2(4.0 * s->TimeForFetchingRowInVBlank / s->LineTime, 1.0) / 4.0; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: Using equ bw scheduling for prefetch\n", __func__); #endif } - /* take worst case Lsw to calculate bandwidth requirement regardless of schedule */ - s->LinesToRequestPrefetchPixelData = math_min2(s->Lsw_equ, s->Lsw_oto); // Lsw + // Lsw = dst_y_prefetch - (dst_y_per_vm_vblank + 2*dst_y_per_row_vblank) + s->LinesToRequestPrefetchPixelData = *p->dst_y_prefetch - *p->dst_y_per_vm_vblank - 2 * *p->dst_y_per_row_vblank; // Lsw s->cursor_prefetch_bytes = (unsigned int)math_max2(p->cursor_bytes_per_chunk, 4 * p->cursor_bytes_per_line); *p->prefetch_cursor_bw = p->num_cursors * s->cursor_prefetch_bytes / (s->LinesToRequestPrefetchPixelData * s->LineTime); @@ -5749,8 +5796,10 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch } else { dml2_printf("DML::%s: No time to prefetch! dst_y_prefetch_equ = %f (should be > 1)\n", __func__, s->dst_y_prefetch_equ); - dml2_printf("DML::%s: No time to prefetch! min_Lsw_equ_ok = %d, Tpre_rounded (%f) should be >= Tvm_trips_rounded (%f) + 2.0*Tr0_trips_rounded (%f) + min_Tsw_equ (%f)\n", - __func__, min_Lsw_equ_ok, s->Tpre_rounded, s->Tvm_trips_rounded, 2.0*s->Tr0_trips_rounded, s->min_Lsw_equ*s->LineTime); + dml2_printf("DML::%s: No time to prefetch! min_Lsw_equ_ok = %d, Tpre_rounded (%f) should be >= Tvm_trips_rounded (%f) + 2.0*Tr0_trips_rounded (%f) + min_Tsw_equ (%f)\n", + __func__, min_Lsw_equ_ok, *p->Tpre_rounded, s->Tvm_trips_rounded, 2.0*s->Tr0_trips_rounded, s->min_Lsw_equ*s->LineTime); + dml2_printf("DML::%s: No time to prefetch! min_Lsw_equ_ok = %d, Tpre_rounded+Tvm_trips_rounded+2.0*Tr0_trips_rounded+min_Tsw_equ (%f) should be > \n", + __func__, tpre_gt_req_latency, (s->min_Lsw_equ*s->LineTime + s->Tvm_trips_rounded + 2.0*s->Tr0_trips_rounded), p->Turg, s->trip_to_mem, p->ExtraLatencyPrefetch); s->NoTimeToPrefetch = true; s->TimeForFetchingVM = 0; s->TimeForFetchingRowInVBlank = 0; @@ -5769,13 +5818,13 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch if (vm_bytes == 0) { prefetch_vm_bw = 0; - } else if (s->dst_y_per_vm_no_trip_vblank > 0) { + } else if (*p->dst_y_per_vm_vblank > 0) { #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: HostVMInefficiencyFactor = %f\n", __func__, p->HostVMInefficiencyFactor); dml2_printf("DML::%s: dst_y_per_vm_vblank = %f\n", __func__, *p->dst_y_per_vm_vblank); dml2_printf("DML::%s: LineTime = %f\n", __func__, s->LineTime); #endif - prefetch_vm_bw = vm_bytes * p->HostVMInefficiencyFactor / (s->dst_y_per_vm_no_trip_vblank * s->LineTime); + prefetch_vm_bw = vm_bytes * p->HostVMInefficiencyFactor / (*p->dst_y_per_vm_vblank * s->LineTime); #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: prefetch_vm_bw = %f\n", __func__, prefetch_vm_bw); #endif @@ -5787,8 +5836,8 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch if (p->PixelPTEBytesPerRow == 0 && tdlut_row_bytes == 0) { prefetch_row_bw = 0; - } else if (s->dst_y_per_row_no_trip_vblank > 0) { - prefetch_row_bw = (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + tdlut_row_bytes) / (s->dst_y_per_row_no_trip_vblank * s->LineTime); + } else if (*p->dst_y_per_row_vblank > 0) { + prefetch_row_bw = (p->PixelPTEBytesPerRow * p->HostVMInefficiencyFactor + tdlut_row_bytes) / (*p->dst_y_per_row_vblank * s->LineTime); #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: PixelPTEBytesPerRow = %u\n", __func__, p->PixelPTEBytesPerRow); @@ -5828,6 +5877,171 @@ static bool CalculatePrefetchSchedule(struct dml2_core_internal_scratch *scratch return s->NoTimeToPrefetch; } +static unsigned int get_num_lb_source_lines(unsigned int max_line_buffer_lines, + unsigned int line_buffer_size_bits, + unsigned int num_pipes, + unsigned int vp_width, + unsigned int vp_height, + double h_ratio, + enum dml2_rotation_angle rotation_angle) +{ + unsigned int num_lb_source_lines = 0; + double lb_bit_per_pixel = 57.0; + unsigned recin_width = vp_width/num_pipes; + + if (dml_is_vertical_rotation(rotation_angle)) + recin_width = vp_height/num_pipes; + + num_lb_source_lines = (unsigned int) math_min2((double) max_line_buffer_lines, + math_floor2(line_buffer_size_bits / lb_bit_per_pixel / (recin_width / math_max2(h_ratio, 1.0)), 1.0)); + + return num_lb_source_lines; +} + +static unsigned int find_max_impact_plane(unsigned int this_plane_idx, unsigned int num_planes, unsigned int Trpd_dcfclk_cycles[]) +{ + int max_value = -1; + int max_idx = -1; + for (unsigned int i = 0; i < num_planes; i++) { + if (i != this_plane_idx && (int) Trpd_dcfclk_cycles[i] > max_value) { + max_value = Trpd_dcfclk_cycles[i]; + max_idx = i; + } + } + if (max_idx <= 0) { + dml2_assert(max_idx >= 0); + max_idx = this_plane_idx; + } + + return max_idx; +} + +static double calculate_impacted_Tsw(unsigned int exclude_plane_idx, unsigned int num_planes, double *prefetch_swath_bytes, double bw_mbps) +{ + double sum = 0.; + for (unsigned int i = 0; i < num_planes; i++) { + if (i != exclude_plane_idx) { + sum += prefetch_swath_bytes[i]; + } + } + return sum / bw_mbps; +} + +// a global check against the aggregate effect of the per plane prefetch schedule +static bool CheckGlobalPrefetchAdmissibility(struct dml2_core_internal_scratch *scratch, + struct dml2_core_calcs_CheckGlobalPrefetchAdmissibility_params *p) +{ + struct dml2_core_calcs_CheckGlobalPrefetchAdmissibility_locals *s = &scratch->CheckGlobalPrefetchAdmissibility_locals; + unsigned int i, k; + + memset(s, 0, sizeof(struct dml2_core_calcs_CheckGlobalPrefetchAdmissibility_locals)); + + *p->recalc_prefetch_schedule = 0; + s->prefetch_global_check_passed = 1; + // worst case if the rob and cdb is fully hogged + s->max_Trpd_dcfclk_cycles = (unsigned int) math_ceil2((p->rob_buffer_size_kbytes*1024 + p->compressed_buffer_size_kbytes*DML_MAX_COMPRESSION_RATIO*1024)/64.0, 1.0); +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: num_active_planes = %d\n", __func__, p->num_active_planes); + dml2_printf("DML::%s: rob_buffer_size_kbytes = %d\n", __func__, p->rob_buffer_size_kbytes); + dml2_printf("DML::%s: compressed_buffer_size_kbytes = %d\n", __func__, p->compressed_buffer_size_kbytes); + dml2_printf("DML::%s: estimated_urg_bandwidth_required_mbps = %f\n", __func__, p->estimated_urg_bandwidth_required_mbps); + dml2_printf("DML::%s: estimated_dcfclk_mhz = %f\n", __func__, p->estimated_dcfclk_mhz); + dml2_printf("DML::%s: max_Trpd_dcfclk_cycles = %u\n", __func__, s->max_Trpd_dcfclk_cycles); +#endif + + // calculate the return impact from each plane, request is 256B per dcfclk + for (i = 0; i < p->num_active_planes; i++) { + s->src_detile_buf_size_bytes_l[i] = p->detile_buffer_size_bytes_l[i]; + s->src_detile_buf_size_bytes_c[i] = p->detile_buffer_size_bytes_c[i]; + s->src_swath_bytes_l[i] = p->full_swath_bytes_l[i]; + s->src_swath_bytes_c[i] = p->full_swath_bytes_c[i]; + + if (p->pixel_format[i] == dml2_420_10) { + s->src_detile_buf_size_bytes_l[i] = (unsigned int) (s->src_detile_buf_size_bytes_l[i] * 1.5); + s->src_detile_buf_size_bytes_c[i] = (unsigned int) (s->src_detile_buf_size_bytes_c[i] * 1.5); + s->src_swath_bytes_l[i] = (unsigned int) (s->src_swath_bytes_l[i] * 1.5); + s->src_swath_bytes_c[i] = (unsigned int) (s->src_swath_bytes_c[i] * 1.5); + } + + s->burst_bytes_to_fill_det = (unsigned int) (math_floor2(s->src_detile_buf_size_bytes_l[i] / p->chunk_bytes_l, 1) * p->chunk_bytes_l); + s->burst_bytes_to_fill_det += (unsigned int) (math_floor2(p->lb_source_lines_l[i] / p->swath_height_l[i], 1) * s->src_swath_bytes_l[i]); + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: i=%u pixel_format = %d\n", __func__, i, p->pixel_format[i]); + dml2_printf("DML::%s: i=%u chunk_bytes_l = %d\n", __func__, i, p->chunk_bytes_l); + dml2_printf("DML::%s: i=%u lb_source_lines_l = %d\n", __func__, i, p->lb_source_lines_l[i]); + dml2_printf("DML::%s: i=%u src_detile_buf_size_bytes_l=%d\n", __func__, i, s->src_detile_buf_size_bytes_l[i]); + dml2_printf("DML::%s: i=%u src_swath_bytes_l=%d\n", __func__, i, s->src_swath_bytes_l[i]); + dml2_printf("DML::%s: i=%u burst_bytes_to_fill_det=%d (luma)\n", __func__, i, s->burst_bytes_to_fill_det); +#endif + + if (s->src_swath_bytes_c[i] > 0) { // dual_plane + s->burst_bytes_to_fill_det += (unsigned int) (math_floor2(s->src_detile_buf_size_bytes_c[i] / p->chunk_bytes_c, 1) * p->chunk_bytes_c); + + if (p->pixel_format[i] == dml2_422_planar_8 || p->pixel_format[i] == dml2_422_planar_10 || p->pixel_format[i] == dml2_422_planar_12) { + s->burst_bytes_to_fill_det += (unsigned int) (math_floor2(p->lb_source_lines_c[i] / p->swath_height_c[i], 1) * s->src_swath_bytes_c[i]); + } + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: i=%u chunk_bytes_c = %d\n", __func__, i, p->chunk_bytes_c); + dml2_printf("DML::%s: i=%u lb_source_lines_c = %d\n", __func__, i, p->lb_source_lines_c[i]); + dml2_printf("DML::%s: i=%u src_detile_buf_size_bytes_c=%d\n", __func__, i, s->src_detile_buf_size_bytes_c[i]); + dml2_printf("DML::%s: i=%u src_swath_bytes_c=%d\n", __func__, i, s->src_swath_bytes_c[i]); +#endif + } + + s->time_to_fill_det_us = (double) s->burst_bytes_to_fill_det / (256 * p->estimated_dcfclk_mhz); // fill time assume full burst at request rate + s->accumulated_return_path_dcfclk_cycles[i] = (unsigned int) math_ceil2(((DML_MAX_COMPRESSION_RATIO-1) * 64 * p->estimated_dcfclk_mhz) * s->time_to_fill_det_us / 64.0, 1.0); //for 64B per DCFClk + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: i=%u burst_bytes_to_fill_det=%d\n", __func__, i, s->burst_bytes_to_fill_det); + dml2_printf("DML::%s: i=%u time_to_fill_det_us=%f\n", __func__, i, s->time_to_fill_det_us); + dml2_printf("DML::%s: i=%u accumulated_return_path_dcfclk_cycles=%u\n", __func__, i, s->accumulated_return_path_dcfclk_cycles[i]); +#endif + // clamping to worst case delay which is one which occupy the full rob+cdb + if (s->accumulated_return_path_dcfclk_cycles[i] > s->max_Trpd_dcfclk_cycles) + s->accumulated_return_path_dcfclk_cycles[i] = s->max_Trpd_dcfclk_cycles; + } + + // Figure out the impacted prefetch time for each plane + // if impacted_Tre is > equ bw Tpre, we need to fail the prefetch schedule as we need a higher state to support the bw + for (i = 0; i < p->num_active_planes; i++) { + k = find_max_impact_plane(i, p->num_active_planes, s->accumulated_return_path_dcfclk_cycles); // plane k causes most impact to plane i + // the rest of planes (except for k) complete for bw + p->impacted_dst_y_pre[i] = s->accumulated_return_path_dcfclk_cycles[k]/p->estimated_dcfclk_mhz; + p->impacted_dst_y_pre[i] += calculate_impacted_Tsw(k, p->num_active_planes, p->prefetch_sw_bytes, p->estimated_urg_bandwidth_required_mbps); + p->impacted_dst_y_pre[i] = math_ceil2(p->impacted_dst_y_pre[i] / p->line_time[i], 0.25); + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: i=%u impacted_Tpre=%f (k=%u)\n", __func__, i, p->impacted_dst_y_pre[i], k); +#endif + } + + if (p->Tpre_rounded != NULL && p->Tpre_oto != NULL) { + for (i = 0; i < p->num_active_planes; i++) { + if (p->impacted_dst_y_pre[i] > p->dst_y_prefetch[i]) { + s->prefetch_global_check_passed = 0; + *p->recalc_prefetch_schedule = 1; + } +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: i=%u Tpre_rounded=%f\n", __func__, i, p->Tpre_rounded[i]); + dml2_printf("DML::%s: i=%u Tpre_oto=%f\n", __func__, i, p->Tpre_oto[i]); +#endif + } + } else { + // likely a mode programming calls, assume support, and no recalc - not used anyways + s->prefetch_global_check_passed = 1; + *p->recalc_prefetch_schedule = 0; + } + +#ifdef __DML_VBA_DEBUG__ + dml2_printf("DML::%s: prefetch_global_check_passed=%u\n", __func__, s->prefetch_global_check_passed); + dml2_printf("DML::%s: recalc_prefetch_schedule=%u\n", __func__, *p->recalc_prefetch_schedule); +#endif + + return s->prefetch_global_check_passed; +} + static void calculate_peak_bandwidth_required( struct dml2_core_internal_scratch *s, struct dml2_core_calcs_calculate_peak_bandwidth_required_params *p) @@ -6046,7 +6260,7 @@ static void check_urgent_bandwidth_support( double *frac_urg_bandwidth_nom, double *frac_urg_bandwidth_mall, bool *vactive_bandwidth_support_ok, // vactive ok - bool *bandwidth_support_ok, // max of vm, prefetch, vactive all ok + bool *bandwidth_support_ok,// max of vm, prefetch, vactive all ok unsigned int mall_allocated_for_dcn_mbytes, double non_urg_bandwidth_required[dml2_core_internal_soc_state_max][dml2_core_internal_bw_max], @@ -6116,7 +6330,6 @@ static void check_urgent_bandwidth_support( } } #endif - } static double get_bandwidth_available_for_immediate_flip(enum dml2_core_internal_soc_state_type eval_state, @@ -6438,7 +6651,7 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( p->Watermark->Z8StutterExitWatermark += p->mmSOCParameters.max_urgent_latency_us + p->mmSOCParameters.df_response_time_us; p->Watermark->Z8StutterEnterPlusExitWatermark += p->mmSOCParameters.max_urgent_latency_us + p->mmSOCParameters.df_response_time_us; } - p->Watermark->g6_temp_read_watermark_us = p->mmSOCParameters.g6_temp_read_blackout_us + p->Watermark->UrgentWatermark; + p->Watermark->temp_read_or_ppt_watermark_us = p->mmSOCParameters.g6_temp_read_blackout_us + p->Watermark->UrgentWatermark; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: UrgentLatency = %f\n", __func__, p->mmSOCParameters.UrgentLatency); @@ -6454,12 +6667,12 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( dml2_printf("DML::%s: StutterEnterPlusExitWatermark = %f\n", __func__, p->Watermark->StutterEnterPlusExitWatermark); dml2_printf("DML::%s: Z8StutterExitWatermark = %f\n", __func__, p->Watermark->Z8StutterExitWatermark); dml2_printf("DML::%s: Z8StutterEnterPlusExitWatermark = %f\n", __func__, p->Watermark->Z8StutterEnterPlusExitWatermark); - dml2_printf("DML::%s: g6_temp_read_watermark_us = %f\n", __func__, p->Watermark->g6_temp_read_watermark_us); + dml2_printf("DML::%s: temp_read_or_ppt_watermark_us = %f\n", __func__, p->Watermark->temp_read_or_ppt_watermark_us); #endif s->TotalActiveWriteback = 0; for (unsigned int k = 0; k < p->NumberOfActiveSurfaces; ++k) { - if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { + if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { s->TotalActiveWriteback = s->TotalActiveWriteback + 1; } } @@ -6522,7 +6735,7 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( s->LBLatencyHidingSourceLinesC[k] = (unsigned int)(math_min2((double)p->MaxLineBufferLines, math_floor2((double)p->LineBufferSize / LBBitPerPixel / ((double)p->SwathWidthC[k] / math_max2(h_ratio_c, 1.0)), 1)) - (v_taps_c - 1)); #ifdef __DML_VBA_DEBUG__ - dml2_printf("DML::%s: k=%u, MaxLineBufferLines= %u\n", __func__, k, p->MaxLineBufferLines); + dml2_printf("DML::%s: k=%u, MaxLineBufferLines = %u\n", __func__, k, p->MaxLineBufferLines); dml2_printf("DML::%s: k=%u, LineBufferSize = %u\n", __func__, k, p->LineBufferSize); dml2_printf("DML::%s: k=%u, LBBitPerPixel = %u\n", __func__, k, LBBitPerPixel); dml2_printf("DML::%s: k=%u, HRatio = %f\n", __func__, k, h_ratio); @@ -6563,7 +6776,7 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( s->ActiveDRAMClockChangeLatencyMargin[k] = s->ActiveClockChangeLatencyHiding - p->Watermark->DRAMClockChangeWatermark; s->ActiveFCLKChangeLatencyMargin[k] = s->ActiveClockChangeLatencyHiding - p->Watermark->FCLKChangeWatermark; s->USRRetrainingLatencyMargin[k] = s->ActiveClockChangeLatencyHiding - p->Watermark->USRRetrainingWatermark; - s->g6_temp_read_latency_margin[k] = s->ActiveClockChangeLatencyHiding - p->Watermark->g6_temp_read_watermark_us; + s->g6_temp_read_latency_margin[k] = s->ActiveClockChangeLatencyHiding - p->Watermark->temp_read_or_ppt_watermark_us; if (p->VActiveLatencyHidingMargin) p->VActiveLatencyHidingMargin[k] = s->ActiveDRAMClockChangeLatencyMargin[k]; @@ -6571,9 +6784,12 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( if (p->VActiveLatencyHidingUs) p->VActiveLatencyHidingUs[k] = s->ActiveClockChangeLatencyHiding; - if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.enable) { - s->WritebackLatencyHiding = (double)p->WritebackInterfaceBufferSize * 1024.0 / ((double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height * (double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width / ((double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height * (double)h_total / pixel_clock_mhz) * 4.0); - if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format == dml2_444_64) { + if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { + s->WritebackLatencyHiding = (double)p->WritebackInterfaceBufferSize * 1024.0 + / ((double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height + * (double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width + / ((double)p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_height * (double)h_total / pixel_clock_mhz) * 4.0); + if (p->display_cfg->stream_descriptors[p->display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].pixel_format == dml2_444_64) { s->WritebackLatencyHiding = s->WritebackLatencyHiding / 2; } s->WritebackDRAMClockChangeLatencyMargin = s->WritebackLatencyHiding - p->Watermark->WritebackDRAMClockChangeWatermark; @@ -6588,36 +6804,36 @@ static void CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport( uclk_pstate_change_strategy = p->display_cfg->plane_descriptors[k].overrides.uclk_pstate_change_strategy; reserved_vblank_time_us = (double)p->display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns / 1000; - p->FCLKChangeSupport[k] = dml2_fclock_change_unsupported; + p->FCLKChangeSupport[k] = dml2_pstate_change_unsupported; if (s->ActiveFCLKChangeLatencyMargin[k] > 0) - p->FCLKChangeSupport[k] = dml2_fclock_change_vactive; + p->FCLKChangeSupport[k] = dml2_pstate_change_vactive; else if (reserved_vblank_time_us >= p->mmSOCParameters.FCLKChangeLatency) - p->FCLKChangeSupport[k] = dml2_fclock_change_vblank; + p->FCLKChangeSupport[k] = dml2_pstate_change_vblank; - if (p->FCLKChangeSupport[k] == dml2_fclock_change_unsupported) + if (p->FCLKChangeSupport[k] == dml2_pstate_change_unsupported) *p->global_fclk_change_supported = false; - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_unsupported; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_unsupported; if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_auto) { if (p->display_cfg->overrides.all_streams_blanked || (s->ActiveDRAMClockChangeLatencyMargin[k] > 0 && reserved_vblank_time_us >= p->mmSOCParameters.DRAMClockChangeLatency)) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_vblank_and_vactive; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_vblank_and_vactive; else if (s->ActiveDRAMClockChangeLatencyMargin[k] > 0) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_vactive; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_vactive; else if (reserved_vblank_time_us >= p->mmSOCParameters.DRAMClockChangeLatency) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_vblank; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_vblank; } else if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_force_vactive && s->ActiveDRAMClockChangeLatencyMargin[k] > 0) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_vactive; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_vactive; else if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_force_vblank && reserved_vblank_time_us >= p->mmSOCParameters.DRAMClockChangeLatency) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_vblank; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_vblank; else if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_force_drr) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_drr; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_drr; else if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_force_mall_svp) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_mall_svp; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_mall_svp; else if (uclk_pstate_change_strategy == dml2_uclk_pstate_change_strategy_force_mall_full_frame) - p->DRAMClockChangeSupport[k] = dml2_dram_clock_change_mall_full_frame; + p->DRAMClockChangeSupport[k] = dml2_pstate_change_mall_full_frame; - if (p->DRAMClockChangeSupport[k] == dml2_dram_clock_change_unsupported) + if (p->DRAMClockChangeSupport[k] == dml2_pstate_change_unsupported) *p->global_dram_clock_change_supported = false; s->dst_y_pstate = (unsigned int)(math_ceil2((p->mmSOCParameters.DRAMClockChangeLatency + p->mmSOCParameters.UrgentLatency) / (h_total / pixel_clock_mhz), 1)); @@ -6915,8 +7131,7 @@ struct dml2_core_internal_g6_temp_read_blackouts_table { } entries[DML_MAX_CLK_TABLE_SIZE]; }; -static const struct dml2_core_internal_g6_temp_read_blackouts_table - core_dcn4_g6_temp_read_blackout_table = { +struct dml2_core_internal_g6_temp_read_blackouts_table core_dcn4_g6_temp_read_blackout_table = { .entries = { { .uclk_khz = 96000, @@ -7036,6 +7251,9 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out struct dml2_core_calcs_CalculateVMRowAndSwath_params *CalculateVMRowAndSwath_params = &mode_lib->scratch.CalculateVMRowAndSwath_params; struct dml2_core_calcs_CalculateSwathAndDETConfiguration_params *CalculateSwathAndDETConfiguration_params = &mode_lib->scratch.CalculateSwathAndDETConfiguration_params; struct dml2_core_calcs_CalculatePrefetchSchedule_params *CalculatePrefetchSchedule_params = &mode_lib->scratch.CalculatePrefetchSchedule_params; +#ifdef DML_GLOBAL_PREFETCH_CHECK + struct dml2_core_calcs_CheckGlobalPrefetchAdmissibility_params *CheckGlobalPrefetchAdmissibility_params = &mode_lib->scratch.CheckGlobalPrefetchAdmissibility_params; +#endif struct dml2_core_calcs_calculate_tdlut_setting_params *calculate_tdlut_setting_params = &mode_lib->scratch.calculate_tdlut_setting_params; struct dml2_core_calcs_calculate_mcache_setting_params *calculate_mcache_setting_params = &mode_lib->scratch.calculate_mcache_setting_params; struct dml2_core_calcs_calculate_peak_bandwidth_required_params *calculate_peak_bandwidth_params = &mode_lib->scratch.calculate_peak_bandwidth_params; @@ -7083,12 +7301,6 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out for (k = 0; k < mode_lib->ms.num_active_planes; k++) dml2_printf("DML::%s: plane_%d: reserved_vblank_time_ns = %u\n", __func__, k, display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns); - - // dml2_printf_dml_policy(&mode_lib->ms.policy); - // dml2_printf_dml_display_cfg_timing(&display_cfg->timing, mode_lib->ms.num_active_planes); - // dml2_printf_dml_display_cfg_plane(&display_cfg->plane, mode_lib->ms.num_active_planes); - // dml2_printf_dml_display_cfg_surface(&display_cfg->surface, mode_lib->ms.num_active_planes); - // dml2_printf_dml_display_cfg_output(&display_cfg->output, mode_lib->ms.num_active_planes); #endif CalculateMaxDETAndMinCompressedBufferSize( @@ -7183,8 +7395,8 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out } for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - mode_lib->ms.SurfaceReadBandwidthLuma[k] = mode_lib->ms.SwathWidthYSingleDPP[k] * math_ceil2(mode_lib->ms.BytePerPixelY[k], 1.0) / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio; - mode_lib->ms.SurfaceReadBandwidthChroma[k] = mode_lib->ms.SwathWidthCSingleDPP[k] * math_ceil2(mode_lib->ms.BytePerPixelC[k], 2.0) / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio; + mode_lib->ms.vactive_sw_bw_l[k] = mode_lib->ms.SwathWidthYSingleDPP[k] * math_ceil2(mode_lib->ms.BytePerPixelY[k], 1.0) / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio; + mode_lib->ms.vactive_sw_bw_c[k] = mode_lib->ms.SwathWidthCSingleDPP[k] * math_ceil2(mode_lib->ms.BytePerPixelC[k], 2.0) / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio; mode_lib->ms.cursor_bw[k] = display_cfg->plane_descriptors[k].cursor.num_cursors * display_cfg->plane_descriptors[k].cursor.cursor_width * display_cfg->plane_descriptors[k].cursor.cursor_bpp / 8.0 / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)); @@ -7194,35 +7406,35 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out old_ReadBandwidthChroma = mode_lib->ms.SwathWidthYSingleDPP[k] / 2 * math_ceil2(mode_lib->ms.BytePerPixelInDETC[k], 2.0) / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio / 2.0; dml2_printf("DML::%s: k=%u, old_ReadBandwidthLuma = %f\n", __func__, k, old_ReadBandwidthLuma); dml2_printf("DML::%s: k=%u, old_ReadBandwidthChroma = %f\n", __func__, k, old_ReadBandwidthChroma); - dml2_printf("DML::%s: k=%u, ReadBandwidthLuma = %f\n", __func__, k, mode_lib->ms.SurfaceReadBandwidthLuma[k]); - dml2_printf("DML::%s: k=%u, ReadBandwidthChroma = %f\n", __func__, k, mode_lib->ms.SurfaceReadBandwidthChroma[k]); + dml2_printf("DML::%s: k=%u, vactive_sw_bw_l = %f\n", __func__, k, mode_lib->ms.vactive_sw_bw_l[k]); + dml2_printf("DML::%s: k=%u, vactive_sw_bw_c = %f\n", __func__, k, mode_lib->ms.vactive_sw_bw_c[k]); #endif } // Writeback bandwidth for (k = 0; k < mode_lib->ms.num_active_planes; k++) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true && display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format == dml2_444_64) { - mode_lib->ms.WriteBandwidth[k] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height - * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width - / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0 && display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].pixel_format == dml2_444_64) { + mode_lib->ms.WriteBandwidth[k][0] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height + * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width + / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_height * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * 8.0; - } else if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { - mode_lib->ms.WriteBandwidth[k] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height - * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width - / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height + } else if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { + mode_lib->ms.WriteBandwidth[k][0] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height + * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width + / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_height * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * 4.0; } else { - mode_lib->ms.WriteBandwidth[k] = 0.0; + mode_lib->ms.WriteBandwidth[k][0] = 0.0; } } /*Writeback Latency support check*/ mode_lib->ms.support.WritebackLatencySupport = true; for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true && - (mode_lib->ms.WriteBandwidth[k] > mode_lib->ip.writeback_interface_buffer_size_kbytes * 1024 / ((double)mode_lib->soc.qos_parameters.writeback.base_latency_us))) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0 && + (mode_lib->ms.WriteBandwidth[k][0] > mode_lib->ip.writeback_interface_buffer_size_kbytes * 1024 / ((double)mode_lib->soc.qos_parameters.writeback.base_latency_us))) { mode_lib->ms.support.WritebackLatencySupport = false; } } @@ -7231,19 +7443,19 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out /* Writeback Scale Ratio and Taps Support Check */ mode_lib->ms.support.WritebackScaleRatioAndTapsSupport = true; for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio > mode_lib->ip.writeback_max_hscl_ratio - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio > mode_lib->ip.writeback_max_vscl_ratio - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio < mode_lib->ip.writeback_min_hscl_ratio - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio < mode_lib->ip.writeback_min_vscl_ratio - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_taps > (unsigned int) mode_lib->ip.writeback_max_hscl_taps - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps > (unsigned int) mode_lib->ip.writeback_max_vscl_taps - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio > (unsigned int)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_taps - || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio > (unsigned int)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps - || (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_taps > 2.0 && ((display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_taps % 2) == 1))) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio > mode_lib->ip.writeback_max_hscl_ratio + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio > mode_lib->ip.writeback_max_vscl_ratio + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio < mode_lib->ip.writeback_min_hscl_ratio + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio < mode_lib->ip.writeback_min_vscl_ratio + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_taps > (unsigned int) mode_lib->ip.writeback_max_hscl_taps + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps > (unsigned int) mode_lib->ip.writeback_max_vscl_taps + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio > (unsigned int)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_taps + || display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio > (unsigned int)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps + || (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_taps > 2.0 && ((display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_taps % 2) == 1))) { mode_lib->ms.support.WritebackScaleRatioAndTapsSupport = false; } - if (2.0 * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height * (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps - 1) * 57 > mode_lib->ip.writeback_line_buffer_buffer_size) { + if (2.0 * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height * (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps - 1) * 57 > mode_lib->ip.writeback_line_buffer_buffer_size) { mode_lib->ms.support.WritebackScaleRatioAndTapsSupport = false; } } @@ -7423,8 +7635,8 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out CalculateSwathAndDETConfiguration_params->nomDETInKByte = mode_lib->ms.NomDETInKByte; CalculateSwathAndDETConfiguration_params->ConfigReturnBufferSegmentSizeInkByte = mode_lib->ip.config_return_buffer_segment_size_in_kbytes; CalculateSwathAndDETConfiguration_params->CompressedBufferSegmentSizeInkByte = mode_lib->ip.compressed_buffer_segment_size_in_kbytes; - CalculateSwathAndDETConfiguration_params->ReadBandwidthLuma = mode_lib->ms.SurfaceReadBandwidthLuma; - CalculateSwathAndDETConfiguration_params->ReadBandwidthChroma = mode_lib->ms.SurfaceReadBandwidthChroma; + CalculateSwathAndDETConfiguration_params->ReadBandwidthLuma = mode_lib->ms.vactive_sw_bw_l; + CalculateSwathAndDETConfiguration_params->ReadBandwidthChroma = mode_lib->ms.vactive_sw_bw_c; CalculateSwathAndDETConfiguration_params->MaximumSwathWidthLuma = mode_lib->ms.MaximumSwathWidthLuma; CalculateSwathAndDETConfiguration_params->MaximumSwathWidthChroma = mode_lib->ms.MaximumSwathWidthChroma; CalculateSwathAndDETConfiguration_params->Read256BytesBlockHeightY = mode_lib->ms.Read256BlockHeightY; @@ -7671,16 +7883,16 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out //DISPCLK/DPPCLK mode_lib->ms.WritebackRequiredDISPCLK = 0; for (k = 0; k < mode_lib->ms.num_active_planes; ++k) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { mode_lib->ms.WritebackRequiredDISPCLK = math_max2(mode_lib->ms.WritebackRequiredDISPCLK, - CalculateWriteBackDISPCLK(display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format, + CalculateWriteBackDISPCLK(display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].pixel_format, ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000), - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_taps, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_width, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_taps, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_width, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width, display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total, mode_lib->ip.writeback_line_buffer_buffer_size)); } @@ -7712,7 +7924,7 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out if (!s->stream_visited[display_cfg->plane_descriptors[k].stream_index]) { s->stream_visited[display_cfg->plane_descriptors[k].stream_index] = 1; - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) s->TotalNumberOfActiveWriteback = s->TotalNumberOfActiveWriteback + 1; s->TotalNumberOfActiveOTG = s->TotalNumberOfActiveOTG + 1; @@ -8256,23 +8468,23 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out mode_lib->ms.PSCL_FACTOR, mode_lib->ms.PSCL_FACTOR_CHROMA, mode_lib->ms.RequiredDPPCLK, - mode_lib->ms.SurfaceReadBandwidthLuma, - mode_lib->ms.SurfaceReadBandwidthChroma, + mode_lib->ms.vactive_sw_bw_l, + mode_lib->ms.vactive_sw_bw_c, mode_lib->soc.return_bus_width_bytes, /* Output */ &mode_lib->ms.dcfclk_deepsleep); for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { mode_lib->ms.WritebackDelayTime[k] = mode_lib->soc.qos_parameters.writeback.base_latency_us + CalculateWriteBackDelay( - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].pixel_format, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_height, display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total) / mode_lib->ms.RequiredDISPCLK; } else { mode_lib->ms.WritebackDelayTime[k] = 0.0; @@ -8349,7 +8561,7 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out dml2_printf("DML::%s: mode_lib->ms.DCFCLK = %f\n", __func__, mode_lib->ms.DCFCLK); dml2_printf("DML::%s: mode_lib->ms.FabricClock = %f\n", __func__, mode_lib->ms.FabricClock); dml2_printf("DML::%s: mode_lib->ms.uclk_freq_mhz = %f\n", __func__, mode_lib->ms.uclk_freq_mhz); - dml2_printf("DML::%s: urgent latency tolerance = %f\n", __func__, ((mode_lib->ip.rob_buffer_size_kbytes - mode_lib->ip.pixel_chunk_size_kbytes) * 1024 / (mode_lib->ms.DCFCLK * mode_lib->soc.return_bus_width_bytes))); + dml2_printf("DML::%s: urgent latency tolarance = %f\n", __func__, ((mode_lib->ip.rob_buffer_size_kbytes - mode_lib->ip.pixel_chunk_size_kbytes) * 1024 / (mode_lib->ms.DCFCLK * mode_lib->soc.return_bus_width_bytes))); #endif mode_lib->ms.support.OutstandingRequestsSupport = true; @@ -8367,6 +8579,13 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out + mode_lib->soc.qos_parameters.qos_params.dcn4x.average_transport_distance_fclk_cycles / mode_lib->ms.FabricClock) * (1 + mode_lib->soc.qos_parameters.qos_params.dcn4x.fabric_average_transport_latency_margin / 100.0); + mode_lib->ms.support.max_non_urgent_latency_us + = mode_lib->soc.qos_parameters.qos_params.dcn4x.per_uclk_dpm_params[mode_lib->ms.qos_param_index].maximum_latency_when_non_urgent_uclk_cycles + / mode_lib->ms.uclk_freq_mhz * (1 + mode_lib->soc.qos_parameters.qos_params.dcn4x.umc_max_latency_margin / 100.0) + + mode_lib->soc.qos_parameters.qos_params.dcn4x.mall_overhead_fclk_cycles / mode_lib->ms.FabricClock + + mode_lib->soc.qos_parameters.qos_params.dcn4x.max_round_trip_to_furthest_cs_fclk_cycles / mode_lib->ms.FabricClock + * (1 + mode_lib->soc.qos_parameters.qos_params.dcn4x.fabric_max_transport_latency_margin / 100.0); + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { if (mode_lib->soc.qos_parameters.qos_type == dml2_qos_param_type_dcn4x) { @@ -8408,7 +8627,7 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out } memset(calculate_mcache_setting_params, 0, sizeof(struct dml2_core_calcs_calculate_mcache_setting_params)); - if (mode_lib->soc.mall_allocated_for_dcn_mbytes == 0 || mode_lib->ip.dcn_mrq_present) { + if (mode_lib->soc.mcache_size_bytes == 0 || mode_lib->ip.dcn_mrq_present) { for (k = 0; k < mode_lib->ms.num_active_planes; k++) { mode_lib->ms.mall_prefetch_sdp_overhead_factor[k] = 1.0; mode_lib->ms.mall_prefetch_dram_overhead_factor[k] = 1.0; @@ -8515,8 +8734,11 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out display_cfg->hostvm_enable, mode_lib->ms.MaxDCFCLK, mode_lib->ms.MaxFabricClock, +#ifdef DML_MODE_SUPPORT_USE_DPM_DRAM_BW + mode_lib->ms.dram_bw_mbps); +#else mode_lib->ms.max_dram_bw_mbps); - +#endif // Average BW support check calculate_avg_bandwidth_required( @@ -8524,8 +8746,8 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out // input display_cfg, mode_lib->ms.num_active_planes, - mode_lib->ms.SurfaceReadBandwidthLuma, - mode_lib->ms.SurfaceReadBandwidthChroma, + mode_lib->ms.vactive_sw_bw_l, + mode_lib->ms.vactive_sw_bw_c, mode_lib->ms.cursor_bw, mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p0, mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p1, @@ -8638,9 +8860,32 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out &mode_lib->ms.ExtraLatency_sr, &mode_lib->ms.ExtraLatencyPrefetch); - { + for (k = 0; k < mode_lib->ms.num_active_planes; k++) + s->impacted_dst_y_pre[k] = 0; + + s->recalc_prefetch_schedule = 0; + s->recalc_prefetch_done = 0; + do { mode_lib->ms.support.PrefetchSupported = true; + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { + s->line_times[k] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000); + s->pixel_format[k] = display_cfg->plane_descriptors[k].pixel_format; + + s->lb_source_lines_l[k] = get_num_lb_source_lines(mode_lib->ip.max_line_buffer_lines, mode_lib->ip.line_buffer_size_bits, + mode_lib->ms.NoOfDPP[k], + display_cfg->plane_descriptors[k].composition.viewport.plane0.width, + display_cfg->plane_descriptors[k].composition.viewport.plane0.height, + display_cfg->plane_descriptors[k].composition.scaler_info.plane0.h_ratio, + display_cfg->plane_descriptors[k].composition.rotation_angle); + + s->lb_source_lines_c[k] = get_num_lb_source_lines(mode_lib->ip.max_line_buffer_lines, mode_lib->ip.line_buffer_size_bits, + mode_lib->ms.NoOfDPP[k], + display_cfg->plane_descriptors[k].composition.viewport.plane1.width, + display_cfg->plane_descriptors[k].composition.viewport.plane1.height, + display_cfg->plane_descriptors[k].composition.scaler_info.plane1.h_ratio, + display_cfg->plane_descriptors[k].composition.rotation_angle); + struct dml2_core_internal_DmlPipe *myPipe = &s->myPipe; mode_lib->ms.TWait[k] = CalculateTWait( @@ -8730,6 +8975,9 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out CalculatePrefetchSchedule_params->mrq_present = mode_lib->ip.dcn_mrq_present; CalculatePrefetchSchedule_params->meta_row_bytes = mode_lib->ms.meta_row_bytes[k]; CalculatePrefetchSchedule_params->mall_prefetch_sdp_overhead_factor = mode_lib->ms.mall_prefetch_sdp_overhead_factor[k]; + CalculatePrefetchSchedule_params->impacted_dst_y_pre = s->impacted_dst_y_pre[k]; + CalculatePrefetchSchedule_params->vactive_sw_bw_l = mode_lib->ms.vactive_sw_bw_l[k]; + CalculatePrefetchSchedule_params->vactive_sw_bw_c = mode_lib->ms.vactive_sw_bw_c[k]; // output CalculatePrefetchSchedule_params->DSTXAfterScaler = &s->DSTXAfterScaler[k]; @@ -8758,6 +9006,9 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out CalculatePrefetchSchedule_params->VUpdateWidthPix = &s->dummy_integer[1]; CalculatePrefetchSchedule_params->VReadyOffsetPix = &s->dummy_integer[2]; CalculatePrefetchSchedule_params->prefetch_cursor_bw = &mode_lib->ms.prefetch_cursor_bw[k]; + CalculatePrefetchSchedule_params->prefetch_sw_bytes = &s->prefetch_sw_bytes[k]; + CalculatePrefetchSchedule_params->Tpre_rounded = &s->Tpre_rounded[k]; + CalculatePrefetchSchedule_params->Tpre_oto = &s->Tpre_oto[k]; mode_lib->ms.NoTimeForPrefetch[k] = CalculatePrefetchSchedule(&mode_lib->scratch, CalculatePrefetchSchedule_params); @@ -8789,7 +9040,7 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out } mode_lib->ms.support.VRatioInPrefetchSupported = true; - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { if (mode_lib->ms.VRatioPreY[k] > __DML2_CALCS_MAX_VRATIO_PRE__ || mode_lib->ms.VRatioPreC[k] > __DML2_CALCS_MAX_VRATIO_PRE__) { mode_lib->ms.support.VRatioInPrefetchSupported = false; @@ -8799,10 +9050,14 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out } } + mode_lib->ms.support.PrefetchSupported &= mode_lib->ms.support.VRatioInPrefetchSupported; + + // By default, do not recalc prefetch schedule + s->recalc_prefetch_schedule = 0; + // Only do urg vs prefetch bandwidth check, flip schedule check, power saving feature support check IF the Prefetch Schedule Check is ok if (mode_lib->ms.support.PrefetchSupported) { - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - double line_time_us = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000); + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { // Calculate Urgent burst factor for prefetch #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: k=%d, Calling CalculateUrgentBurstFactor (for prefetch)\n", __func__, k); @@ -8815,7 +9070,7 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out mode_lib->ms.swath_width_chroma_ub[k], mode_lib->ms.SwathHeightY[k], mode_lib->ms.SwathHeightC[k], - line_time_us, + s->line_times[k], mode_lib->ms.UrgLatency, mode_lib->ms.VRatioPreY[k], mode_lib->ms.VRatioPreC[k], @@ -8852,8 +9107,8 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out calculate_peak_bandwidth_params->mall_prefetch_sdp_overhead_factor = mode_lib->ms.mall_prefetch_sdp_overhead_factor; calculate_peak_bandwidth_params->mall_prefetch_dram_overhead_factor = mode_lib->ms.mall_prefetch_dram_overhead_factor; - calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->ms.SurfaceReadBandwidthLuma; - calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->ms.SurfaceReadBandwidthChroma; + calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->ms.vactive_sw_bw_l; + calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->ms.vactive_sw_bw_c; calculate_peak_bandwidth_params->prefetch_bandwidth_l = mode_lib->ms.RequiredPrefetchPixelDataBWLuma; calculate_peak_bandwidth_params->prefetch_bandwidth_c = mode_lib->ms.RequiredPrefetchPixelDataBWChroma; calculate_peak_bandwidth_params->excess_vactive_fill_bw_l = mode_lib->ms.excess_vactive_fill_bw_l; @@ -8899,127 +9154,164 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out } } +#ifdef DML_GLOBAL_PREFETCH_CHECK + if (mode_lib->ms.support.PrefetchSupported && mode_lib->ms.num_active_planes > 1 && s->recalc_prefetch_done == 0) { + CheckGlobalPrefetchAdmissibility_params->num_active_planes = mode_lib->ms.num_active_planes; + CheckGlobalPrefetchAdmissibility_params->pixel_format = s->pixel_format; + CheckGlobalPrefetchAdmissibility_params->chunk_bytes_l = mode_lib->ip.pixel_chunk_size_kbytes * 1024; + CheckGlobalPrefetchAdmissibility_params->chunk_bytes_c = mode_lib->ip.pixel_chunk_size_kbytes * 1024; + CheckGlobalPrefetchAdmissibility_params->lb_source_lines_l = s->lb_source_lines_l; + CheckGlobalPrefetchAdmissibility_params->lb_source_lines_c = s->lb_source_lines_c; + CheckGlobalPrefetchAdmissibility_params->swath_height_l = mode_lib->ms.SwathHeightY; + CheckGlobalPrefetchAdmissibility_params->swath_height_c = mode_lib->ms.SwathHeightC; + CheckGlobalPrefetchAdmissibility_params->rob_buffer_size_kbytes = mode_lib->ip.rob_buffer_size_kbytes; + CheckGlobalPrefetchAdmissibility_params->compressed_buffer_size_kbytes = mode_lib->ms.CompressedBufferSizeInkByte; + CheckGlobalPrefetchAdmissibility_params->detile_buffer_size_bytes_l = mode_lib->ms.DETBufferSizeY; + CheckGlobalPrefetchAdmissibility_params->detile_buffer_size_bytes_c = mode_lib->ms.DETBufferSizeC; + CheckGlobalPrefetchAdmissibility_params->full_swath_bytes_l = s->full_swath_bytes_l; + CheckGlobalPrefetchAdmissibility_params->full_swath_bytes_c = s->full_swath_bytes_c; + CheckGlobalPrefetchAdmissibility_params->prefetch_sw_bytes = s->prefetch_sw_bytes; + CheckGlobalPrefetchAdmissibility_params->Tpre_rounded = s->Tpre_rounded; + CheckGlobalPrefetchAdmissibility_params->Tpre_oto = s->Tpre_oto; + CheckGlobalPrefetchAdmissibility_params->estimated_urg_bandwidth_required_mbps = mode_lib->ms.support.urg_bandwidth_required[dml2_core_internal_soc_state_sys_active][dml2_core_internal_bw_sdp]; + CheckGlobalPrefetchAdmissibility_params->line_time = s->line_times; + CheckGlobalPrefetchAdmissibility_params->dst_y_prefetch = mode_lib->ms.dst_y_prefetch; + if (CheckGlobalPrefetchAdmissibility_params->estimated_urg_bandwidth_required_mbps < 10 * 1024) + CheckGlobalPrefetchAdmissibility_params->estimated_urg_bandwidth_required_mbps = 10 * 1024; + + CheckGlobalPrefetchAdmissibility_params->estimated_dcfclk_mhz = (CheckGlobalPrefetchAdmissibility_params->estimated_urg_bandwidth_required_mbps / (double) mode_lib->soc.return_bus_width_bytes) / + ((double)mode_lib->soc.qos_parameters.derate_table.system_active_urgent.dcfclk_derate_percent / 100.0); + + // if recalc_prefetch_schedule is set, recalculate the prefetch schedule with the new impacted_Tpre, prefetch should be possible + CheckGlobalPrefetchAdmissibility_params->recalc_prefetch_schedule = &s->recalc_prefetch_schedule; + CheckGlobalPrefetchAdmissibility_params->impacted_dst_y_pre = s->impacted_dst_y_pre; + mode_lib->ms.support.PrefetchSupported = CheckGlobalPrefetchAdmissibility(&mode_lib->scratch, CheckGlobalPrefetchAdmissibility_params); + s->recalc_prefetch_done = 1; + s->recalc_prefetch_schedule = 1; + } +#endif + } // prefetch schedule ok, do urg bw and flip schedule + } while (s->recalc_prefetch_schedule); - // Both prefetch schedule and BW okay - if (mode_lib->ms.support.PrefetchSupported == true && mode_lib->ms.support.VRatioInPrefetchSupported == true) { - mode_lib->ms.BandwidthAvailableForImmediateFlip = - get_bandwidth_available_for_immediate_flip( - dml2_core_internal_soc_state_sys_active, - mode_lib->ms.support.urg_bandwidth_required_qual, // no flip - mode_lib->ms.support.urg_bandwidth_available); - - mode_lib->ms.TotImmediateFlipBytes = 0; - for (k = 0; k < mode_lib->ms.num_active_planes; k++) { - if (display_cfg->plane_descriptors[k].immediate_flip) { - s->per_pipe_flip_bytes[k] = get_pipe_flip_bytes( - s->HostVMInefficiencyFactor, - mode_lib->ms.vm_bytes[k], - mode_lib->ms.DPTEBytesPerRow[k], - mode_lib->ms.meta_row_bytes[k]); - } else { - s->per_pipe_flip_bytes[k] = 0; - } - mode_lib->ms.TotImmediateFlipBytes += s->per_pipe_flip_bytes[k] * mode_lib->ms.NoOfDPP[k]; + // Flip Schedule + // Both prefetch schedule and BW okay + if (mode_lib->ms.support.PrefetchSupported == true) { + mode_lib->ms.BandwidthAvailableForImmediateFlip = + get_bandwidth_available_for_immediate_flip( + dml2_core_internal_soc_state_sys_active, + mode_lib->ms.support.urg_bandwidth_required_qual, // no flip + mode_lib->ms.support.urg_bandwidth_available); - } + mode_lib->ms.TotImmediateFlipBytes = 0; + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { + if (display_cfg->plane_descriptors[k].immediate_flip) { + s->per_pipe_flip_bytes[k] = get_pipe_flip_bytes( + s->HostVMInefficiencyFactor, + mode_lib->ms.vm_bytes[k], + mode_lib->ms.DPTEBytesPerRow[k], + mode_lib->ms.meta_row_bytes[k]); + } else { + s->per_pipe_flip_bytes[k] = 0; + } + mode_lib->ms.TotImmediateFlipBytes += s->per_pipe_flip_bytes[k] * mode_lib->ms.NoOfDPP[k]; - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - CalculateFlipSchedule( - &mode_lib->scratch, - display_cfg->plane_descriptors[k].immediate_flip, - 1, // use_lb_flip_bw - s->HostVMInefficiencyFactor, - s->Tvm_trips_flip[k], - s->Tr0_trips_flip[k], - s->Tvm_trips_flip_rounded[k], - s->Tr0_trips_flip_rounded[k], - display_cfg->gpuvm_enable, - mode_lib->ms.vm_bytes[k], - mode_lib->ms.DPTEBytesPerRow[k], - mode_lib->ms.BandwidthAvailableForImmediateFlip, - mode_lib->ms.TotImmediateFlipBytes, - display_cfg->plane_descriptors[k].pixel_format, - (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)), - display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio, - display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio, - mode_lib->ms.Tno_bw_flip[k], - mode_lib->ms.dpte_row_height[k], - mode_lib->ms.dpte_row_height_chroma[k], - mode_lib->ms.use_one_row_for_frame_flip[k], - mode_lib->ip.max_flip_time_us, - mode_lib->ip.max_flip_time_lines, - s->per_pipe_flip_bytes[k], - mode_lib->ms.meta_row_bytes[k], - s->meta_row_height_luma[k], - s->meta_row_height_chroma[k], - mode_lib->ip.dcn_mrq_present && display_cfg->plane_descriptors[k].surface.dcc.enable, - - /* Output */ - &mode_lib->ms.dst_y_per_vm_flip[k], - &mode_lib->ms.dst_y_per_row_flip[k], - &mode_lib->ms.final_flip_bw[k], - &mode_lib->ms.ImmediateFlipSupportedForPipe[k]); - } + } - calculate_peak_bandwidth_params->urg_vactive_bandwidth_required = s->dummy_bw; - calculate_peak_bandwidth_params->urg_bandwidth_required = mode_lib->ms.support.urg_bandwidth_required_flip; - calculate_peak_bandwidth_params->urg_bandwidth_required_qual = s->dummy_bw; - calculate_peak_bandwidth_params->non_urg_bandwidth_required = mode_lib->ms.support.non_urg_bandwidth_required_flip; - calculate_peak_bandwidth_params->surface_avg_vactive_required_bw = s->surface_dummy_bw; - calculate_peak_bandwidth_params->surface_peak_required_bw = mode_lib->ms.surface_peak_required_bw; - - calculate_peak_bandwidth_params->display_cfg = display_cfg; - calculate_peak_bandwidth_params->inc_flip_bw = 1; - calculate_peak_bandwidth_params->num_active_planes = mode_lib->ms.num_active_planes; - calculate_peak_bandwidth_params->num_of_dpp = mode_lib->ms.NoOfDPP; - calculate_peak_bandwidth_params->dcc_dram_bw_nom_overhead_factor_p0 = mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p0; - calculate_peak_bandwidth_params->dcc_dram_bw_nom_overhead_factor_p1 = mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p1; - calculate_peak_bandwidth_params->dcc_dram_bw_pref_overhead_factor_p0 = mode_lib->ms.dcc_dram_bw_pref_overhead_factor_p0; - calculate_peak_bandwidth_params->dcc_dram_bw_pref_overhead_factor_p1 = mode_lib->ms.dcc_dram_bw_pref_overhead_factor_p1; - calculate_peak_bandwidth_params->mall_prefetch_sdp_overhead_factor = mode_lib->ms.mall_prefetch_sdp_overhead_factor; - calculate_peak_bandwidth_params->mall_prefetch_dram_overhead_factor = mode_lib->ms.mall_prefetch_dram_overhead_factor; - - calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->ms.SurfaceReadBandwidthLuma; - calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->ms.SurfaceReadBandwidthChroma; - calculate_peak_bandwidth_params->prefetch_bandwidth_l = mode_lib->ms.RequiredPrefetchPixelDataBWLuma; - calculate_peak_bandwidth_params->prefetch_bandwidth_c = mode_lib->ms.RequiredPrefetchPixelDataBWChroma; - calculate_peak_bandwidth_params->excess_vactive_fill_bw_l = mode_lib->ms.excess_vactive_fill_bw_l; - calculate_peak_bandwidth_params->excess_vactive_fill_bw_c = mode_lib->ms.excess_vactive_fill_bw_c; - calculate_peak_bandwidth_params->cursor_bw = mode_lib->ms.cursor_bw; - calculate_peak_bandwidth_params->dpte_row_bw = mode_lib->ms.dpte_row_bw; - calculate_peak_bandwidth_params->meta_row_bw = mode_lib->ms.meta_row_bw; - calculate_peak_bandwidth_params->prefetch_cursor_bw = mode_lib->ms.prefetch_cursor_bw; - calculate_peak_bandwidth_params->prefetch_vmrow_bw = mode_lib->ms.prefetch_vmrow_bw; - calculate_peak_bandwidth_params->flip_bw = mode_lib->ms.final_flip_bw; - calculate_peak_bandwidth_params->urgent_burst_factor_l = mode_lib->ms.UrgentBurstFactorLuma; - calculate_peak_bandwidth_params->urgent_burst_factor_c = mode_lib->ms.UrgentBurstFactorChroma; - calculate_peak_bandwidth_params->urgent_burst_factor_cursor = mode_lib->ms.UrgentBurstFactorCursor; - calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_l = mode_lib->ms.UrgentBurstFactorLumaPre; - calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_c = mode_lib->ms.UrgentBurstFactorChromaPre; - calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_cursor = mode_lib->ms.UrgentBurstFactorCursorPre; - - calculate_peak_bandwidth_required( - &mode_lib->scratch, - calculate_peak_bandwidth_params); - - calculate_immediate_flip_bandwidth_support( - &s->dummy_single[0], // double* frac_urg_bandwidth_flip - &mode_lib->ms.support.ImmediateFlipSupport, - - dml2_core_internal_soc_state_sys_active, - mode_lib->ms.support.urg_bandwidth_required_flip, - mode_lib->ms.support.non_urg_bandwidth_required_flip, - mode_lib->ms.support.urg_bandwidth_available); - - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { - if (display_cfg->plane_descriptors[k].immediate_flip == true && mode_lib->ms.ImmediateFlipSupportedForPipe[k] == false) - mode_lib->ms.support.ImmediateFlipSupport = false; - } + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { + CalculateFlipSchedule( + &mode_lib->scratch, + display_cfg->plane_descriptors[k].immediate_flip, + 1, // use_lb_flip_bw + s->HostVMInefficiencyFactor, + s->Tvm_trips_flip[k], + s->Tr0_trips_flip[k], + s->Tvm_trips_flip_rounded[k], + s->Tr0_trips_flip_rounded[k], + display_cfg->gpuvm_enable, + mode_lib->ms.vm_bytes[k], + mode_lib->ms.DPTEBytesPerRow[k], + mode_lib->ms.BandwidthAvailableForImmediateFlip, + mode_lib->ms.TotImmediateFlipBytes, + display_cfg->plane_descriptors[k].pixel_format, + (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)), + display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio, + display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio, + mode_lib->ms.Tno_bw_flip[k], + mode_lib->ms.dpte_row_height[k], + mode_lib->ms.dpte_row_height_chroma[k], + mode_lib->ms.use_one_row_for_frame_flip[k], + mode_lib->ip.max_flip_time_us, + mode_lib->ip.max_flip_time_lines, + s->per_pipe_flip_bytes[k], + mode_lib->ms.meta_row_bytes[k], + s->meta_row_height_luma[k], + s->meta_row_height_chroma[k], + mode_lib->ip.dcn_mrq_present && display_cfg->plane_descriptors[k].surface.dcc.enable, + + /* Output */ + &mode_lib->ms.dst_y_per_vm_flip[k], + &mode_lib->ms.dst_y_per_row_flip[k], + &mode_lib->ms.final_flip_bw[k], + &mode_lib->ms.ImmediateFlipSupportedForPipe[k]); + } + + calculate_peak_bandwidth_params->urg_vactive_bandwidth_required = s->dummy_bw; + calculate_peak_bandwidth_params->urg_bandwidth_required = mode_lib->ms.support.urg_bandwidth_required_flip; + calculate_peak_bandwidth_params->urg_bandwidth_required_qual = s->dummy_bw; + calculate_peak_bandwidth_params->non_urg_bandwidth_required = mode_lib->ms.support.non_urg_bandwidth_required_flip; + calculate_peak_bandwidth_params->surface_avg_vactive_required_bw = s->surface_dummy_bw; + calculate_peak_bandwidth_params->surface_peak_required_bw = mode_lib->ms.surface_peak_required_bw; + + calculate_peak_bandwidth_params->display_cfg = display_cfg; + calculate_peak_bandwidth_params->inc_flip_bw = 1; + calculate_peak_bandwidth_params->num_active_planes = mode_lib->ms.num_active_planes; + calculate_peak_bandwidth_params->num_of_dpp = mode_lib->ms.NoOfDPP; + calculate_peak_bandwidth_params->dcc_dram_bw_nom_overhead_factor_p0 = mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p0; + calculate_peak_bandwidth_params->dcc_dram_bw_nom_overhead_factor_p1 = mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p1; + calculate_peak_bandwidth_params->dcc_dram_bw_pref_overhead_factor_p0 = mode_lib->ms.dcc_dram_bw_pref_overhead_factor_p0; + calculate_peak_bandwidth_params->dcc_dram_bw_pref_overhead_factor_p1 = mode_lib->ms.dcc_dram_bw_pref_overhead_factor_p1; + calculate_peak_bandwidth_params->mall_prefetch_sdp_overhead_factor = mode_lib->ms.mall_prefetch_sdp_overhead_factor; + calculate_peak_bandwidth_params->mall_prefetch_dram_overhead_factor = mode_lib->ms.mall_prefetch_dram_overhead_factor; + + calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->ms.vactive_sw_bw_l; + calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->ms.vactive_sw_bw_c; + calculate_peak_bandwidth_params->prefetch_bandwidth_l = mode_lib->ms.RequiredPrefetchPixelDataBWLuma; + calculate_peak_bandwidth_params->prefetch_bandwidth_c = mode_lib->ms.RequiredPrefetchPixelDataBWChroma; + calculate_peak_bandwidth_params->excess_vactive_fill_bw_l = mode_lib->ms.excess_vactive_fill_bw_l; + calculate_peak_bandwidth_params->excess_vactive_fill_bw_c = mode_lib->ms.excess_vactive_fill_bw_c; + calculate_peak_bandwidth_params->cursor_bw = mode_lib->ms.cursor_bw; + calculate_peak_bandwidth_params->dpte_row_bw = mode_lib->ms.dpte_row_bw; + calculate_peak_bandwidth_params->meta_row_bw = mode_lib->ms.meta_row_bw; + calculate_peak_bandwidth_params->prefetch_cursor_bw = mode_lib->ms.prefetch_cursor_bw; + calculate_peak_bandwidth_params->prefetch_vmrow_bw = mode_lib->ms.prefetch_vmrow_bw; + calculate_peak_bandwidth_params->flip_bw = mode_lib->ms.final_flip_bw; + calculate_peak_bandwidth_params->urgent_burst_factor_l = mode_lib->ms.UrgentBurstFactorLuma; + calculate_peak_bandwidth_params->urgent_burst_factor_c = mode_lib->ms.UrgentBurstFactorChroma; + calculate_peak_bandwidth_params->urgent_burst_factor_cursor = mode_lib->ms.UrgentBurstFactorCursor; + calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_l = mode_lib->ms.UrgentBurstFactorLumaPre; + calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_c = mode_lib->ms.UrgentBurstFactorChromaPre; + calculate_peak_bandwidth_params->urgent_burst_factor_prefetch_cursor = mode_lib->ms.UrgentBurstFactorCursorPre; + + calculate_peak_bandwidth_required( + &mode_lib->scratch, + calculate_peak_bandwidth_params); + + calculate_immediate_flip_bandwidth_support( + &s->dummy_single[0], // double* frac_urg_bandwidth_flip + &mode_lib->ms.support.ImmediateFlipSupport, + + dml2_core_internal_soc_state_sys_active, + mode_lib->ms.support.urg_bandwidth_required_flip, + mode_lib->ms.support.non_urg_bandwidth_required_flip, + mode_lib->ms.support.urg_bandwidth_available); - } else { // if prefetch not support, assume iflip is not supported too + for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { + if (display_cfg->plane_descriptors[k].immediate_flip == true && mode_lib->ms.ImmediateFlipSupportedForPipe[k] == false) mode_lib->ms.support.ImmediateFlipSupport = false; - } - } // prefetch schedule + } + + } else { // if prefetch not support, assume iflip is not supported too + mode_lib->ms.support.ImmediateFlipSupport = false; } s->mSOCParameters.UrgentLatency = mode_lib->ms.UrgLatency; @@ -9116,8 +9408,8 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out s->pstate_bytes_required_c, mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p0, mode_lib->ms.dcc_dram_bw_nom_overhead_factor_p1, - mode_lib->ms.SurfaceReadBandwidthLuma, - mode_lib->ms.SurfaceReadBandwidthChroma, + mode_lib->ms.vactive_sw_bw_l, + mode_lib->ms.vactive_sw_bw_c, mode_lib->ms.surface_avg_vactive_required_bw, mode_lib->ms.surface_peak_required_bw, /* outputs */ @@ -9187,12 +9479,12 @@ static bool dml_core_mode_support(struct dml2_core_calcs_mode_support_ex *in_out dml2_printf("DML::%s: ModeSupport = %u\n", __func__, mode_lib->ms.support.ModeSupport); dml2_printf("DML::%s: ImmediateFlipSupport = %u\n", __func__, mode_lib->ms.support.ImmediateFlipSupport); - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { mode_lib->ms.support.MPCCombineEnable[k] = mode_lib->ms.MPCCombine[k]; mode_lib->ms.support.DPPPerSurface[k] = mode_lib->ms.NoOfDPP[k]; } - for (k = 0; k <= mode_lib->ms.num_active_planes - 1; k++) { + for (k = 0; k < mode_lib->ms.num_active_planes; k++) { mode_lib->ms.support.ODMMode[k] = mode_lib->ms.ODMMode[k]; mode_lib->ms.support.DSCEnabled[k] = mode_lib->ms.RequiresDSC[k]; mode_lib->ms.support.FECEnabled[k] = mode_lib->ms.RequiresFEC[k]; @@ -9229,7 +9521,7 @@ unsigned int dml2_core_calcs_mode_support_ex(struct dml2_core_calcs_mode_support dml2_printf("DML::%s: is_mode_support = %u (min_clk_index=%d)\n", __func__, result, in_out_params->min_clk_index); for (unsigned int k = 0; k < in_out_params->in_display_cfg->num_planes; k++) - dml2_printf("DML::%s: plane_%d: reserved_vblank_time_ns = %u\n", __func__, k, in_out_params->in_display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns); + dml2_printf("DML::%s: plane_%d: reserved_vblank_time_ns = %u\n", __func__, k, in_out_params->in_display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns); dml2_printf("DML::%s: ------------- DONE ----------\n", __func__); @@ -9882,7 +10174,7 @@ static void CalculateStutterEfficiency(struct dml2_core_internal_scratch *scratc if (!dml_is_phantom_pipe(&p->display_cfg->plane_descriptors[k])) { if (!l->stream_visited[p->display_cfg->plane_descriptors[k].stream_index]) { - if (p->display_cfg->stream_descriptors[k].writeback.enable) + if (p->display_cfg->stream_descriptors[k].writeback.active_writebacks_per_stream > 0) l->TotalActiveWriteback = l->TotalActiveWriteback + 1; if (TotalNumberOfActiveOTG == 0) { // first otg @@ -9984,6 +10276,7 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex struct dml2_core_calcs_CalculateSwathAndDETConfiguration_params *CalculateSwathAndDETConfiguration_params = &mode_lib->scratch.CalculateSwathAndDETConfiguration_params; struct dml2_core_calcs_CalculateStutterEfficiency_params *CalculateStutterEfficiency_params = &mode_lib->scratch.CalculateStutterEfficiency_params; struct dml2_core_calcs_CalculatePrefetchSchedule_params *CalculatePrefetchSchedule_params = &mode_lib->scratch.CalculatePrefetchSchedule_params; + struct dml2_core_calcs_CheckGlobalPrefetchAdmissibility_params *CheckGlobalPrefetchAdmissibility_params = &mode_lib->scratch.CheckGlobalPrefetchAdmissibility_params; struct dml2_core_calcs_calculate_mcache_setting_params *calculate_mcache_setting_params = &mode_lib->scratch.calculate_mcache_setting_params; struct dml2_core_calcs_calculate_tdlut_setting_params *calculate_tdlut_setting_params = &mode_lib->scratch.calculate_tdlut_setting_params; struct dml2_core_shared_CalculateMetaAndPTETimes_params *CalculateMetaAndPTETimes_params = &mode_lib->scratch.CalculateMetaAndPTETimes_params; @@ -10198,10 +10491,10 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex for (k = 0; k < s->num_active_planes; ++k) { mode_lib->mp.cursor_bw[k] = display_cfg->plane_descriptors[k].cursor.num_cursors * display_cfg->plane_descriptors[k].cursor.cursor_width * display_cfg->plane_descriptors[k].cursor.cursor_bpp / 8.0 / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)); - mode_lib->mp.SurfaceReadBandwidthLuma[k] = mode_lib->mp.SwathWidthSingleDPPY[k] * mode_lib->mp.BytePerPixelY[k] / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio; - mode_lib->mp.SurfaceReadBandwidthChroma[k] = mode_lib->mp.SwathWidthSingleDPPC[k] * mode_lib->mp.BytePerPixelC[k] / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio; - dml2_printf("DML::%s: ReadBandwidthSurfaceLuma[%i] = %fBps\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthLuma[k]); - dml2_printf("DML::%s: ReadBandwidthSurfaceChroma[%i] = %fBps\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthChroma[k]); + mode_lib->mp.vactive_sw_bw_l[k] = mode_lib->mp.SwathWidthSingleDPPY[k] * mode_lib->mp.BytePerPixelY[k] / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio; + mode_lib->mp.vactive_sw_bw_c[k] = mode_lib->mp.SwathWidthSingleDPPC[k] * mode_lib->mp.BytePerPixelC[k] / (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio; + dml2_printf("DML::%s: vactive_sw_bw_l[%i] = %fBps\n", __func__, k, mode_lib->mp.vactive_sw_bw_l[k]); + dml2_printf("DML::%s: vactive_sw_bw_c[%i] = %fBps\n", __func__, k, mode_lib->mp.vactive_sw_bw_c[k]); } CalculateSwathAndDETConfiguration_params->display_cfg = display_cfg; @@ -10217,8 +10510,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex CalculateSwathAndDETConfiguration_params->nomDETInKByte = s->NomDETInKByte; CalculateSwathAndDETConfiguration_params->ConfigReturnBufferSegmentSizeInkByte = mode_lib->ip.config_return_buffer_segment_size_in_kbytes; CalculateSwathAndDETConfiguration_params->CompressedBufferSegmentSizeInkByte = mode_lib->ip.compressed_buffer_segment_size_in_kbytes; - CalculateSwathAndDETConfiguration_params->ReadBandwidthLuma = mode_lib->mp.SurfaceReadBandwidthLuma; - CalculateSwathAndDETConfiguration_params->ReadBandwidthChroma = mode_lib->mp.SurfaceReadBandwidthChroma; + CalculateSwathAndDETConfiguration_params->ReadBandwidthLuma = mode_lib->mp.vactive_sw_bw_l; + CalculateSwathAndDETConfiguration_params->ReadBandwidthChroma = mode_lib->mp.vactive_sw_bw_c; CalculateSwathAndDETConfiguration_params->MaximumSwathWidthLuma = s->dummy_single_array[0]; CalculateSwathAndDETConfiguration_params->MaximumSwathWidthChroma = s->dummy_single_array[1]; CalculateSwathAndDETConfiguration_params->Read256BytesBlockHeightY = mode_lib->mp.Read256BlockHeightY; @@ -10583,17 +10876,17 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex mode_lib->mp.TCalc = 24.0 / mode_lib->mp.DCFCLKDeepSleep; for (k = 0; k < s->num_active_planes; ++k) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { mode_lib->mp.WritebackDelay[k] = mode_lib->soc.qos_parameters.writeback.base_latency_us + CalculateWriteBackDelay( - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.h_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_ratio, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.v_taps, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height, - display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].pixel_format, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].h_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_ratio, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].v_taps, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_width, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].output_height, + display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.writeback_stream[0].input_height, display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total) / mode_lib->mp.Dispclk; } else mode_lib->mp.WritebackDelay[k] = 0; @@ -10679,10 +10972,25 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex for (k = 0; k < s->num_active_planes; ++k) { bool cursor_not_enough_urgent_latency_hiding = 0; - double line_time_us = 0.0; - - line_time_us = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / + s->line_times[k] = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000); + + s->pixel_format[k] = display_cfg->plane_descriptors[k].pixel_format; + + s->lb_source_lines_l[k] = get_num_lb_source_lines(mode_lib->ip.max_line_buffer_lines, mode_lib->ip.line_buffer_size_bits, + mode_lib->mp.NoOfDPP[k], + display_cfg->plane_descriptors[k].composition.viewport.plane0.width, + display_cfg->plane_descriptors[k].composition.viewport.plane0.height, + display_cfg->plane_descriptors[k].composition.scaler_info.plane0.h_ratio, + display_cfg->plane_descriptors[k].composition.rotation_angle); + + s->lb_source_lines_c[k] = get_num_lb_source_lines(mode_lib->ip.max_line_buffer_lines, mode_lib->ip.line_buffer_size_bits, + mode_lib->mp.NoOfDPP[k], + display_cfg->plane_descriptors[k].composition.viewport.plane1.width, + display_cfg->plane_descriptors[k].composition.viewport.plane1.height, + display_cfg->plane_descriptors[k].composition.scaler_info.plane1.h_ratio, + display_cfg->plane_descriptors[k].composition.rotation_angle); + if (display_cfg->plane_descriptors[k].cursor.num_cursors > 0) { calculate_cursor_req_attributes( display_cfg->plane_descriptors[k].cursor.cursor_width, @@ -10699,7 +11007,7 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex display_cfg->plane_descriptors[k].cursor.cursor_width, s->cursor_bytes_per_chunk[k], s->cursor_lines_per_chunk[k], - line_time_us, + s->line_times[k], mode_lib->mp.UrgentLatency, // output @@ -10714,7 +11022,7 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex mode_lib->mp.swath_width_chroma_ub[k], mode_lib->mp.SwathHeightY[k], mode_lib->mp.SwathHeightC[k], - line_time_us, + s->line_times[k], mode_lib->mp.UrgentLatency, display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio, display_cfg->plane_descriptors[k].composition.scaler_info.plane1.v_ratio, @@ -10752,6 +11060,35 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex dml2_printf("DML::%s: immediate_flip_required = %u\n", __func__, s->immediate_flip_required); #endif + if (s->num_active_planes > 1) { + CheckGlobalPrefetchAdmissibility_params->num_active_planes = s->num_active_planes; + CheckGlobalPrefetchAdmissibility_params->pixel_format = s->pixel_format; + CheckGlobalPrefetchAdmissibility_params->chunk_bytes_l = mode_lib->ip.pixel_chunk_size_kbytes * 1024; + CheckGlobalPrefetchAdmissibility_params->chunk_bytes_c = mode_lib->ip.pixel_chunk_size_kbytes * 1024; + CheckGlobalPrefetchAdmissibility_params->lb_source_lines_l = s->lb_source_lines_l; + CheckGlobalPrefetchAdmissibility_params->lb_source_lines_c = s->lb_source_lines_c; + CheckGlobalPrefetchAdmissibility_params->swath_height_l = mode_lib->mp.SwathHeightY; + CheckGlobalPrefetchAdmissibility_params->swath_height_c = mode_lib->mp.SwathHeightC; + CheckGlobalPrefetchAdmissibility_params->rob_buffer_size_kbytes = mode_lib->ip.rob_buffer_size_kbytes; + CheckGlobalPrefetchAdmissibility_params->compressed_buffer_size_kbytes = mode_lib->mp.CompressedBufferSizeInkByte; + CheckGlobalPrefetchAdmissibility_params->detile_buffer_size_bytes_l = mode_lib->mp.DETBufferSizeY; + CheckGlobalPrefetchAdmissibility_params->detile_buffer_size_bytes_c = mode_lib->mp.DETBufferSizeC; + CheckGlobalPrefetchAdmissibility_params->full_swath_bytes_l = s->full_swath_bytes_l; + CheckGlobalPrefetchAdmissibility_params->full_swath_bytes_c = s->full_swath_bytes_c; + CheckGlobalPrefetchAdmissibility_params->prefetch_sw_bytes = s->prefetch_sw_bytes; + CheckGlobalPrefetchAdmissibility_params->Tpre_rounded = 0; // don't care + CheckGlobalPrefetchAdmissibility_params->Tpre_oto = 0; // don't care + CheckGlobalPrefetchAdmissibility_params->estimated_urg_bandwidth_required_mbps = mode_lib->mp.urg_bandwidth_available[dml2_core_internal_soc_state_sys_active][dml2_core_internal_bw_sdp]; + CheckGlobalPrefetchAdmissibility_params->estimated_dcfclk_mhz = mode_lib->mp.Dcfclk; + CheckGlobalPrefetchAdmissibility_params->line_time = s->line_times; + CheckGlobalPrefetchAdmissibility_params->dst_y_prefetch = mode_lib->mp.dst_y_prefetch; + + // if recalc_prefetch_schedule is set, recalculate the prefetch schedule with the new impacted_Tpre, prefetch should be possible + CheckGlobalPrefetchAdmissibility_params->recalc_prefetch_schedule = &s->dummy_boolean[0]; + CheckGlobalPrefetchAdmissibility_params->impacted_dst_y_pre = s->impacted_dst_y_pre; + CheckGlobalPrefetchAdmissibility(&mode_lib->scratch, CheckGlobalPrefetchAdmissibility_params); // dont care about the check output for mode programming + } + { s->DestinationLineTimesForPrefetchLessThan2 = false; s->VRatioPrefetchMoreThanMax = false; @@ -10763,11 +11100,11 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex dml2_printf("DML::%s: k=%d MaxVStartupLines = %u\n", __func__, k, s->MaxVStartupLines[k]); mode_lib->mp.TWait[k] = CalculateTWait( - display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns, - mode_lib->mp.UrgentLatency, - mode_lib->mp.TripToMemory, - !dml_is_phantom_pipe(&display_cfg->plane_descriptors[k]) && display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.drr_config.enabled ? - get_g6_temp_read_blackout_us(&mode_lib->soc, (unsigned int)(mode_lib->mp.uclk_freq_mhz * 1000), in_out_params->min_clk_index) : 0.0); + display_cfg->plane_descriptors[k].overrides.reserved_vblank_time_ns, + mode_lib->mp.UrgentLatency, + mode_lib->mp.TripToMemory, + !dml_is_phantom_pipe(&display_cfg->plane_descriptors[k]) && display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.drr_config.enabled ? + get_g6_temp_read_blackout_us(&mode_lib->soc, (unsigned int)(mode_lib->mp.uclk_freq_mhz * 1000), in_out_params->min_clk_index) : 0.0); myPipe->Dppclk = mode_lib->mp.Dppclk[k]; myPipe->Dispclk = mode_lib->mp.Dispclk; @@ -10848,6 +11185,9 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex CalculatePrefetchSchedule_params->mrq_present = mode_lib->ip.dcn_mrq_present; CalculatePrefetchSchedule_params->meta_row_bytes = mode_lib->mp.meta_row_bytes[k]; CalculatePrefetchSchedule_params->mall_prefetch_sdp_overhead_factor = mode_lib->mp.mall_prefetch_sdp_overhead_factor[k]; + CalculatePrefetchSchedule_params->impacted_dst_y_pre = s->impacted_dst_y_pre[k]; + CalculatePrefetchSchedule_params->vactive_sw_bw_l = mode_lib->mp.vactive_sw_bw_l[k]; + CalculatePrefetchSchedule_params->vactive_sw_bw_c = mode_lib->mp.vactive_sw_bw_c[k]; // output CalculatePrefetchSchedule_params->DSTXAfterScaler = &mode_lib->mp.DSTXAfterScaler[k]; @@ -10876,9 +11216,17 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex CalculatePrefetchSchedule_params->VUpdateWidthPix = &mode_lib->mp.VUpdateWidthPix[k]; CalculatePrefetchSchedule_params->VReadyOffsetPix = &mode_lib->mp.VReadyOffsetPix[k]; CalculatePrefetchSchedule_params->prefetch_cursor_bw = &mode_lib->mp.prefetch_cursor_bw[k]; + CalculatePrefetchSchedule_params->prefetch_sw_bytes = &s->prefetch_sw_bytes[k]; + CalculatePrefetchSchedule_params->Tpre_rounded = &s->Tpre_rounded[k]; + CalculatePrefetchSchedule_params->Tpre_oto = &s->Tpre_oto[k]; mode_lib->mp.NoTimeToPrefetch[k] = CalculatePrefetchSchedule(&mode_lib->scratch, CalculatePrefetchSchedule_params); + if (s->impacted_dst_y_pre[k] > 0) + mode_lib->mp.impacted_prefetch_margin_us[k] = (mode_lib->mp.dst_y_prefetch[k] - s->impacted_dst_y_pre[k]) * s->line_times[k]; + else + mode_lib->mp.impacted_prefetch_margin_us[k] = 0; + #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: k=%0u NoTimeToPrefetch=%0d\n", __func__, k, mode_lib->mp.NoTimeToPrefetch[k]); #endif @@ -10956,8 +11304,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex dml2_printf("DML::%s: k=%0u VRatioY=%f\n", __func__, k, display_cfg->plane_descriptors[k].composition.scaler_info.plane0.v_ratio); dml2_printf("DML::%s: k=%0u prefetch_vmrow_bw=%f\n", __func__, k, mode_lib->mp.prefetch_vmrow_bw[k]); - dml2_printf("DML::%s: k=%0u ReadBandwidthSurfaceLuma=%f\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthLuma[k]); - dml2_printf("DML::%s: k=%0u ReadBandwidthSurfaceChroma=%f\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthChroma[k]); + dml2_printf("DML::%s: k=%0u vactive_sw_bw_l=%f\n", __func__, k, mode_lib->mp.vactive_sw_bw_l[k]); + dml2_printf("DML::%s: k=%0u vactive_sw_bw_c=%f\n", __func__, k, mode_lib->mp.vactive_sw_bw_c[k]); dml2_printf("DML::%s: k=%0u cursor_bw=%f\n", __func__, k, mode_lib->mp.cursor_bw[k]); dml2_printf("DML::%s: k=%0u dpte_row_bw=%f\n", __func__, k, mode_lib->mp.dpte_row_bw[k]); dml2_printf("DML::%s: k=%0u meta_row_bw=%f\n", __func__, k, mode_lib->mp.meta_row_bw[k]); @@ -10988,8 +11336,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex calculate_peak_bandwidth_params->mall_prefetch_sdp_overhead_factor = mode_lib->mp.mall_prefetch_sdp_overhead_factor; calculate_peak_bandwidth_params->mall_prefetch_dram_overhead_factor = mode_lib->mp.mall_prefetch_dram_overhead_factor; - calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->mp.SurfaceReadBandwidthLuma; - calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->mp.SurfaceReadBandwidthChroma; + calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->mp.vactive_sw_bw_l; + calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->mp.vactive_sw_bw_c; calculate_peak_bandwidth_params->prefetch_bandwidth_l = mode_lib->mp.RequiredPrefetchPixelDataBWLuma; calculate_peak_bandwidth_params->prefetch_bandwidth_c = mode_lib->mp.RequiredPrefetchPixelDataBWChroma; calculate_peak_bandwidth_params->excess_vactive_fill_bw_l = mode_lib->mp.excess_vactive_fill_bw_l; @@ -11120,8 +11468,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex calculate_peak_bandwidth_params->mall_prefetch_sdp_overhead_factor = mode_lib->mp.mall_prefetch_sdp_overhead_factor; calculate_peak_bandwidth_params->mall_prefetch_dram_overhead_factor = mode_lib->mp.mall_prefetch_dram_overhead_factor; - calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->mp.SurfaceReadBandwidthLuma; - calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->mp.SurfaceReadBandwidthChroma; + calculate_peak_bandwidth_params->surface_read_bandwidth_l = mode_lib->mp.vactive_sw_bw_l; + calculate_peak_bandwidth_params->surface_read_bandwidth_c = mode_lib->mp.vactive_sw_bw_c; calculate_peak_bandwidth_params->prefetch_bandwidth_l = mode_lib->mp.RequiredPrefetchPixelDataBWLuma; calculate_peak_bandwidth_params->prefetch_bandwidth_c = mode_lib->mp.RequiredPrefetchPixelDataBWChroma; calculate_peak_bandwidth_params->excess_vactive_fill_bw_l = mode_lib->mp.excess_vactive_fill_bw_l; @@ -11238,8 +11586,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex s->mmSOCParameters.USRRetrainingLatency = 0; s->mmSOCParameters.SMNLatency = 0; s->mmSOCParameters.g6_temp_read_blackout_us = get_g6_temp_read_blackout_us(&mode_lib->soc, (unsigned int)(mode_lib->mp.uclk_freq_mhz * 1000), in_out_params->min_clk_index); - s->mmSOCParameters.max_urgent_latency_us = get_max_urgent_latency_us(&mode_lib->soc.qos_parameters.qos_params.dcn4x, mode_lib->ms.uclk_freq_mhz, mode_lib->ms.FabricClock, in_out_params->min_clk_index); - s->mmSOCParameters.df_response_time_us = mode_lib->soc.qos_parameters.qos_params.dcn4x.df_qos_response_time_fclk_cycles / mode_lib->ms.FabricClock; + s->mmSOCParameters.max_urgent_latency_us = get_max_urgent_latency_us(&mode_lib->soc.qos_parameters.qos_params.dcn4x, mode_lib->mp.uclk_freq_mhz, mode_lib->mp.FabricClock, in_out_params->min_clk_index); + s->mmSOCParameters.df_response_time_us = mode_lib->soc.qos_parameters.qos_params.dcn4x.df_qos_response_time_fclk_cycles / mode_lib->mp.FabricClock; s->mmSOCParameters.qos_type = mode_lib->soc.qos_parameters.qos_type; CalculateWatermarks_params->display_cfg = display_cfg; @@ -11289,7 +11637,7 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(&mode_lib->scratch, CalculateWatermarks_params); for (k = 0; k < s->num_active_planes; ++k) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { + if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.active_writebacks_per_stream > 0) { mode_lib->mp.WritebackAllowDRAMClockChangeEndPosition[k] = math_max2(0, mode_lib->mp.VStartupMin[k] * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000) - mode_lib->mp.Watermark.WritebackDRAMClockChangeWatermark); mode_lib->mp.WritebackAllowFCLKChangeEndPosition[k] = math_max2(0, mode_lib->mp.VStartupMin[k] * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total / @@ -11475,25 +11823,25 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex //Maximum Bandwidth Used s->TotalWRBandwidth = 0; - s->WRBandwidth = 0; - for (k = 0; k < s->num_active_planes; ++k) { - if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true && display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.pixel_format == dml2_444_32) { - s->WRBandwidth = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width / - (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * 4; - } else if (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.enable == true) { - s->WRBandwidth = display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_height * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.output_width / - (display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.h_total * display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].writeback.scaling_info.input_height / ((double)display_cfg->stream_descriptors[display_cfg->plane_descriptors[k].stream_index].timing.pixel_clock_khz / 1000)) * 8; + for (k = 0; k < display_cfg->num_streams; ++k) { + s->WRBandwidth = 0; + if (display_cfg->stream_descriptors[k].writeback.active_writebacks_per_stream > 0) { + s->WRBandwidth = display_cfg->stream_descriptors[k].writeback.writeback_stream[0].output_height + * display_cfg->stream_descriptors[k].writeback.writeback_stream[0].output_width / + (display_cfg->stream_descriptors[k].timing.h_total * display_cfg->stream_descriptors[k].writeback.writeback_stream[0].input_height + / ((double)display_cfg->stream_descriptors[k].timing.pixel_clock_khz / 1000)) + * (display_cfg->stream_descriptors[k].writeback.writeback_stream[0].pixel_format == dml2_444_32 ? 4.0 : 8.0); + s->TotalWRBandwidth = s->TotalWRBandwidth + s->WRBandwidth; } - s->TotalWRBandwidth = s->TotalWRBandwidth + s->WRBandwidth; } mode_lib->mp.TotalDataReadBandwidth = 0; for (k = 0; k < s->num_active_planes; ++k) { - mode_lib->mp.TotalDataReadBandwidth = mode_lib->mp.TotalDataReadBandwidth + mode_lib->mp.SurfaceReadBandwidthLuma[k] + mode_lib->mp.SurfaceReadBandwidthChroma[k]; + mode_lib->mp.TotalDataReadBandwidth = mode_lib->mp.TotalDataReadBandwidth + mode_lib->mp.vactive_sw_bw_l[k] + mode_lib->mp.vactive_sw_bw_c[k]; #ifdef __DML_VBA_DEBUG__ dml2_printf("DML::%s: k=%u, TotalDataReadBandwidth = %f\n", __func__, k, mode_lib->mp.TotalDataReadBandwidth); - dml2_printf("DML::%s: k=%u, ReadBandwidthSurfaceLuma = %f\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthLuma[k]); - dml2_printf("DML::%s: k=%u, ReadBandwidthSurfaceChroma = %f\n", __func__, k, mode_lib->mp.SurfaceReadBandwidthChroma[k]); + dml2_printf("DML::%s: k=%u, vactive_sw_bw_l = %f\n", __func__, k, mode_lib->mp.vactive_sw_bw_l[k]); + dml2_printf("DML::%s: k=%u, vactive_sw_bw_c = %f\n", __func__, k, mode_lib->mp.vactive_sw_bw_c[k]); #endif } @@ -11530,8 +11878,8 @@ static bool dml_core_mode_programming(struct dml2_core_calcs_mode_programming_ex CalculateStutterEfficiency_params->BlockWidth256BytesC = mode_lib->mp.Read256BlockWidthC; CalculateStutterEfficiency_params->DCCYMaxUncompressedBlock = mode_lib->mp.DCCYMaxUncompressedBlock; CalculateStutterEfficiency_params->DCCCMaxUncompressedBlock = mode_lib->mp.DCCCMaxUncompressedBlock; - CalculateStutterEfficiency_params->ReadBandwidthSurfaceLuma = mode_lib->mp.SurfaceReadBandwidthLuma; - CalculateStutterEfficiency_params->ReadBandwidthSurfaceChroma = mode_lib->mp.SurfaceReadBandwidthChroma; + CalculateStutterEfficiency_params->ReadBandwidthSurfaceLuma = mode_lib->mp.vactive_sw_bw_l; + CalculateStutterEfficiency_params->ReadBandwidthSurfaceChroma = mode_lib->mp.vactive_sw_bw_c; CalculateStutterEfficiency_params->dpte_row_bw = mode_lib->mp.dpte_row_bw; CalculateStutterEfficiency_params->meta_row_bw = mode_lib->mp.meta_row_bw; CalculateStutterEfficiency_params->rob_alloc_compressed = mode_lib->ip.dcn_mrq_present; @@ -11742,7 +12090,7 @@ static void rq_dlg_get_wm_regs(const struct dml2_display_cfg *display_cfg, const wm_regs->fclk_pstate = (int unsigned)(mode_lib->mp.Watermark.FCLKChangeWatermark * refclk_freq_in_mhz); wm_regs->sr_enter = (int unsigned)(mode_lib->mp.Watermark.StutterEnterPlusExitWatermark * refclk_freq_in_mhz); wm_regs->sr_exit = (int unsigned)(mode_lib->mp.Watermark.StutterExitWatermark * refclk_freq_in_mhz); - wm_regs->temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.g6_temp_read_watermark_us * refclk_freq_in_mhz); + wm_regs->temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.temp_read_or_ppt_watermark_us * refclk_freq_in_mhz); wm_regs->uclk_pstate = (int unsigned)(mode_lib->mp.Watermark.DRAMClockChangeWatermark * refclk_freq_in_mhz); wm_regs->urgent = (int unsigned)(mode_lib->mp.Watermark.UrgentWatermark * refclk_freq_in_mhz); wm_regs->usr = (int unsigned)(mode_lib->mp.Watermark.USRRetrainingWatermark * refclk_freq_in_mhz); @@ -12321,14 +12669,18 @@ void dml2_core_calcs_get_global_fams2_programming(const struct dml2_core_interna void dml2_core_calcs_get_stream_fams2_programming(const struct dml2_core_internal_display_mode_lib *mode_lib, const struct display_configuation_with_meta *display_cfg, - struct dmub_fams2_stream_static_state *fams2_programming, - enum dml2_uclk_pstate_support_method pstate_method, + union dmub_cmd_fams2_config *fams2_base_programming, + union dmub_cmd_fams2_config *fams2_sub_programming, + enum dml2_pstate_method pstate_method, int plane_index) { const struct dml2_plane_parameters *plane_descriptor = &display_cfg->display_config.plane_descriptors[plane_index]; const struct dml2_stream_parameters *stream_descriptor = &display_cfg->display_config.stream_descriptors[plane_descriptor->stream_index]; const struct dml2_fams2_meta *stream_fams2_meta = &display_cfg->stage3.stream_fams2_meta[plane_descriptor->stream_index]; + struct dmub_fams2_cmd_stream_static_base_state *base_programming = &fams2_base_programming->stream_v1.base; + union dmub_fams2_cmd_stream_static_sub_state *sub_programming = &fams2_sub_programming->stream_v1.sub_state; + unsigned int i; if (display_cfg->display_config.overrides.all_streams_blanked) { @@ -12337,110 +12689,110 @@ void dml2_core_calcs_get_stream_fams2_programming(const struct dml2_core_interna } /* from display configuration */ - fams2_programming->htotal = (uint16_t)stream_descriptor->timing.h_total; - fams2_programming->vtotal = (uint16_t)stream_descriptor->timing.v_total; - fams2_programming->vblank_start = (uint16_t)(stream_fams2_meta->nom_vtotal - + base_programming->htotal = (uint16_t)stream_descriptor->timing.h_total; + base_programming->vtotal = (uint16_t)stream_descriptor->timing.v_total; + base_programming->vblank_start = (uint16_t)(stream_fams2_meta->nom_vtotal - stream_descriptor->timing.v_front_porch); - fams2_programming->vblank_end = (uint16_t)(stream_fams2_meta->nom_vtotal - + base_programming->vblank_end = (uint16_t)(stream_fams2_meta->nom_vtotal - stream_descriptor->timing.v_front_porch - stream_descriptor->timing.v_active); - fams2_programming->config.bits.is_drr = stream_descriptor->timing.drr_config.enabled; + base_programming->config.bits.is_drr = stream_descriptor->timing.drr_config.enabled; /* from meta */ - fams2_programming->otg_vline_time_ns = + base_programming->otg_vline_time_ns = (unsigned int)(stream_fams2_meta->otg_vline_time_us * 1000.0); - fams2_programming->scheduling_delay_otg_vlines = (uint8_t)stream_fams2_meta->scheduling_delay_otg_vlines; - fams2_programming->contention_delay_otg_vlines = (uint8_t)stream_fams2_meta->contention_delay_otg_vlines; - fams2_programming->vline_int_ack_delay_otg_vlines = (uint8_t)stream_fams2_meta->vertical_interrupt_ack_delay_otg_vlines; - fams2_programming->drr_keepout_otg_vline = (uint16_t)(stream_fams2_meta->nom_vtotal - + base_programming->scheduling_delay_otg_vlines = (uint8_t)stream_fams2_meta->scheduling_delay_otg_vlines; + base_programming->contention_delay_otg_vlines = (uint8_t)stream_fams2_meta->contention_delay_otg_vlines; + base_programming->vline_int_ack_delay_otg_vlines = (uint8_t)stream_fams2_meta->vertical_interrupt_ack_delay_otg_vlines; + base_programming->drr_keepout_otg_vline = (uint16_t)(stream_fams2_meta->nom_vtotal - stream_descriptor->timing.v_front_porch - stream_fams2_meta->method_drr.programming_delay_otg_vlines); - fams2_programming->allow_to_target_delay_otg_vlines = (uint8_t)stream_fams2_meta->allow_to_target_delay_otg_vlines; - fams2_programming->max_vtotal = (uint16_t)stream_fams2_meta->max_vtotal; + base_programming->allow_to_target_delay_otg_vlines = (uint8_t)stream_fams2_meta->allow_to_target_delay_otg_vlines; + base_programming->max_vtotal = (uint16_t)stream_fams2_meta->max_vtotal; /* from core */ - fams2_programming->config.bits.min_ttu_vblank_usable = true; + base_programming->config.bits.min_ttu_vblank_usable = true; for (i = 0; i < display_cfg->display_config.num_planes; i++) { /* check if all planes support p-state in blank */ if (display_cfg->display_config.plane_descriptors[i].stream_index == plane_descriptor->stream_index && mode_lib->mp.MinTTUVBlank[i] <= mode_lib->mp.Watermark.DRAMClockChangeWatermark) { - fams2_programming->config.bits.min_ttu_vblank_usable = false; + base_programming->config.bits.min_ttu_vblank_usable = false; break; } } switch (pstate_method) { - case dml2_uclk_pstate_support_method_vactive: - case dml2_uclk_pstate_support_method_fw_vactive_drr: + case dml2_pstate_method_vactive: + case dml2_pstate_method_fw_vactive_drr: /* legacy vactive */ - fams2_programming->type = FAMS2_STREAM_TYPE_VACTIVE; - fams2_programming->sub_state.legacy.vactive_det_fill_delay_otg_vlines = - (uint8_t)stream_fams2_meta->method_vactive.max_vactive_det_fill_delay_otg_vlines; - fams2_programming->allow_start_otg_vline = - (uint16_t)stream_fams2_meta->method_vactive.common.allow_start_otg_vline; - fams2_programming->allow_end_otg_vline = - (uint16_t)stream_fams2_meta->method_vactive.common.allow_end_otg_vline; - fams2_programming->config.bits.clamp_vtotal_min = true; + base_programming->type = FAMS2_STREAM_TYPE_VACTIVE; + sub_programming->legacy.vactive_det_fill_delay_otg_vlines = + (uint8_t)stream_fams2_meta->method_vactive.max_vactive_det_fill_delay_otg_vlines; + base_programming->allow_start_otg_vline = + (uint16_t)stream_fams2_meta->method_vactive.common.allow_start_otg_vline; + base_programming->allow_end_otg_vline = + (uint16_t)stream_fams2_meta->method_vactive.common.allow_end_otg_vline; + base_programming->config.bits.clamp_vtotal_min = true; break; - case dml2_uclk_pstate_support_method_vblank: - case dml2_uclk_pstate_support_method_fw_vblank_drr: + case dml2_pstate_method_vblank: + case dml2_pstate_method_fw_vblank_drr: /* legacy vblank */ - fams2_programming->type = FAMS2_STREAM_TYPE_VBLANK; - fams2_programming->allow_start_otg_vline = - (uint16_t)stream_fams2_meta->method_vblank.common.allow_start_otg_vline; - fams2_programming->allow_end_otg_vline = - (uint16_t)stream_fams2_meta->method_vblank.common.allow_end_otg_vline; - fams2_programming->config.bits.clamp_vtotal_min = true; + base_programming->type = FAMS2_STREAM_TYPE_VBLANK; + base_programming->allow_start_otg_vline = + (uint16_t)stream_fams2_meta->method_vblank.common.allow_start_otg_vline; + base_programming->allow_end_otg_vline = + (uint16_t)stream_fams2_meta->method_vblank.common.allow_end_otg_vline; + base_programming->config.bits.clamp_vtotal_min = true; break; - case dml2_uclk_pstate_support_method_fw_drr: + case dml2_pstate_method_fw_drr: /* drr */ - fams2_programming->type = FAMS2_STREAM_TYPE_DRR; - fams2_programming->sub_state.drr.programming_delay_otg_vlines = - (uint8_t)stream_fams2_meta->method_drr.programming_delay_otg_vlines; - fams2_programming->sub_state.drr.nom_stretched_vtotal = - (uint16_t)stream_fams2_meta->method_drr.stretched_vtotal; - fams2_programming->allow_start_otg_vline = - (uint16_t)stream_fams2_meta->method_drr.common.allow_start_otg_vline; - fams2_programming->allow_end_otg_vline = - (uint16_t)stream_fams2_meta->method_drr.common.allow_end_otg_vline; + base_programming->type = FAMS2_STREAM_TYPE_DRR; + sub_programming->drr.programming_delay_otg_vlines = + (uint8_t)stream_fams2_meta->method_drr.programming_delay_otg_vlines; + sub_programming->drr.nom_stretched_vtotal = + (uint16_t)stream_fams2_meta->method_drr.stretched_vtotal; + base_programming->allow_start_otg_vline = + (uint16_t)stream_fams2_meta->method_drr.common.allow_start_otg_vline; + base_programming->allow_end_otg_vline = + (uint16_t)stream_fams2_meta->method_drr.common.allow_end_otg_vline; /* drr only clamps to vtotal min for single display */ - fams2_programming->config.bits.clamp_vtotal_min = display_cfg->display_config.num_streams == 1; - fams2_programming->sub_state.drr.only_stretch_if_required = true; + base_programming->config.bits.clamp_vtotal_min = display_cfg->display_config.num_streams == 1; + sub_programming->drr.only_stretch_if_required = true; break; - case dml2_uclk_pstate_support_method_fw_subvp_phantom: - case dml2_uclk_pstate_support_method_fw_subvp_phantom_drr: + case dml2_pstate_method_fw_svp: + case dml2_pstate_method_fw_svp_drr: /* subvp */ - fams2_programming->type = FAMS2_STREAM_TYPE_SUBVP; - fams2_programming->sub_state.subvp.vratio_numerator = - (uint16_t)(plane_descriptor->composition.scaler_info.plane0.v_ratio * 1000.0); - fams2_programming->sub_state.subvp.vratio_denominator = 1000; - fams2_programming->sub_state.subvp.programming_delay_otg_vlines = - (uint8_t)stream_fams2_meta->method_subvp.programming_delay_otg_vlines; - fams2_programming->sub_state.subvp.prefetch_to_mall_otg_vlines = - (uint8_t)stream_fams2_meta->method_subvp.prefetch_to_mall_delay_otg_vlines; - fams2_programming->sub_state.subvp.phantom_vtotal = - (uint16_t)stream_fams2_meta->method_subvp.phantom_vtotal; - fams2_programming->sub_state.subvp.phantom_vactive = - (uint16_t)stream_fams2_meta->method_subvp.phantom_vactive; - fams2_programming->sub_state.subvp.config.bits.is_multi_planar = - plane_descriptor->surface.plane1.height > 0; - fams2_programming->sub_state.subvp.config.bits.is_yuv420 = - plane_descriptor->pixel_format == dml2_420_8 || - plane_descriptor->pixel_format == dml2_420_10 || - plane_descriptor->pixel_format == dml2_420_12; - - fams2_programming->allow_start_otg_vline = - (uint16_t)stream_fams2_meta->method_subvp.common.allow_start_otg_vline; - fams2_programming->allow_end_otg_vline = - (uint16_t)stream_fams2_meta->method_subvp.common.allow_end_otg_vline; - fams2_programming->config.bits.clamp_vtotal_min = true; + base_programming->type = FAMS2_STREAM_TYPE_SUBVP; + sub_programming->subvp.vratio_numerator = + (uint16_t)(plane_descriptor->composition.scaler_info.plane0.v_ratio * 1000.0); + sub_programming->subvp.vratio_denominator = 1000; + sub_programming->subvp.programming_delay_otg_vlines = + (uint8_t)stream_fams2_meta->method_subvp.programming_delay_otg_vlines; + sub_programming->subvp.prefetch_to_mall_otg_vlines = + (uint8_t)stream_fams2_meta->method_subvp.prefetch_to_mall_delay_otg_vlines; + sub_programming->subvp.phantom_vtotal = + (uint16_t)stream_fams2_meta->method_subvp.phantom_vtotal; + sub_programming->subvp.phantom_vactive = + (uint16_t)stream_fams2_meta->method_subvp.phantom_vactive; + sub_programming->subvp.config.bits.is_multi_planar = + plane_descriptor->surface.plane1.height > 0; + sub_programming->subvp.config.bits.is_yuv420 = + plane_descriptor->pixel_format == dml2_420_8 || + plane_descriptor->pixel_format == dml2_420_10 || + plane_descriptor->pixel_format == dml2_420_12; + + base_programming->allow_start_otg_vline = + (uint16_t)stream_fams2_meta->method_subvp.common.allow_start_otg_vline; + base_programming->allow_end_otg_vline = + (uint16_t)stream_fams2_meta->method_subvp.common.allow_end_otg_vline; + base_programming->config.bits.clamp_vtotal_min = true; break; - case dml2_uclk_pstate_support_method_reserved_hw: - case dml2_uclk_pstate_support_method_reserved_fw: - case dml2_uclk_pstate_support_method_reserved_fw_drr_fixed: - case dml2_uclk_pstate_support_method_reserved_fw_drr_var: - case dml2_uclk_pstate_support_method_not_supported: - case dml2_uclk_pstate_support_method_count: + case dml2_pstate_method_reserved_hw: + case dml2_pstate_method_reserved_fw: + case dml2_pstate_method_reserved_fw_drr_clamped: + case dml2_pstate_method_reserved_fw_drr_var: + case dml2_pstate_method_na: + case dml2_pstate_method_count: default: /* this should never happen */ break; @@ -12569,6 +12921,8 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.mode_support_info.InvalidCombinationOfMALLUseForPState = mode_lib->ms.support.InvalidCombinationOfMALLUseForPState; out->informative.mode_support_info.ExceededMALLSize = mode_lib->ms.support.ExceededMALLSize; out->informative.mode_support_info.EnoughWritebackUnits = mode_lib->ms.support.EnoughWritebackUnits; + out->informative.mode_support_info.temp_read_or_ppt_support = mode_lib->ms.support.temp_read_or_ppt_support; + out->informative.mode_support_info.g6_temp_read_support = mode_lib->ms.support.g6_temp_read_support; out->informative.mode_support_info.ExceededMultistreamSlots = mode_lib->ms.support.ExceededMultistreamSlots; out->informative.mode_support_info.NotEnoughDSCUnits = mode_lib->ms.support.NotEnoughDSCUnits; @@ -12662,7 +13016,7 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.watermarks.pstate_change_us = dml_get_wm_dram_clock_change(mode_lib); out->informative.watermarks.fclk_pstate_change_us = dml_get_wm_fclk_change(mode_lib); out->informative.watermarks.usr_retraining_us = dml_get_wm_usr_retraining(mode_lib); - out->informative.watermarks.g6_temp_read_watermark_us = dml_get_wm_g6_temp_read(mode_lib); + out->informative.watermarks.temp_read_or_ppt_watermark_us = dml_get_wm_temp_read_or_ppt(mode_lib); out->informative.mall.total_surface_size_in_mall_bytes = 0; for (k = 0; k < out->display_config.num_planes; ++k) @@ -12745,6 +13099,8 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.qos.max_active_fclk_change_latency_supported = dml_get_fclk_change_latency(mode_lib); + out->informative.misc.LowestPrefetchMargin = 10 * 1000 * 1000; + for (k = 0; k < out->display_config.num_planes; k++) { if ((out->display_config.plane_descriptors->overrides.reserved_vblank_time_ns >= 1000.0 * mode_lib->soc.power_management_parameters.dram_clk_change_blackout_us) @@ -12824,6 +13180,7 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.misc.DisplayPipeLineDeliveryTimeLumaPrefetch[k] = mode_lib->mp.DisplayPipeLineDeliveryTimeLumaPrefetch[k]; out->informative.misc.DisplayPipeLineDeliveryTimeChromaPrefetch[k] = mode_lib->mp.DisplayPipeLineDeliveryTimeChromaPrefetch[k]; + out->informative.misc.WritebackRequiredBandwidth = mode_lib->scratch.dml_core_mode_programming_locals.TotalWRBandwidth / 1000.0; out->informative.misc.WritebackAllowDRAMClockChangeEndPosition[k] = mode_lib->mp.WritebackAllowDRAMClockChangeEndPosition[k]; out->informative.misc.WritebackAllowFCLKChangeEndPosition[k] = mode_lib->mp.WritebackAllowFCLKChangeEndPosition[k]; out->informative.misc.DSCCLK_calculated[k] = mode_lib->mp.DSCCLK[k]; @@ -12831,6 +13188,9 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.misc.PTE_BUFFER_MODE[k] = mode_lib->mp.PTE_BUFFER_MODE[k]; out->informative.misc.DSCDelay[k] = mode_lib->mp.DSCDelay[k]; out->informative.misc.MaxActiveDRAMClockChangeLatencySupported[k] = mode_lib->mp.MaxActiveDRAMClockChangeLatencySupported[k]; + + if (mode_lib->mp.impacted_prefetch_margin_us[k] < out->informative.misc.LowestPrefetchMargin) + out->informative.misc.LowestPrefetchMargin = mode_lib->mp.impacted_prefetch_margin_us[k]; } // For this DV informative layer, all pipes in the same planes will just use the same id @@ -12853,16 +13213,11 @@ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mod out->informative.non_optimized_mcache_allocation[k].global_mcache_ids_plane1[n] = k; } } - - out->informative.qos.max_non_urgent_latency_us = mode_lib->soc.qos_parameters.qos_params.dcn4x.per_uclk_dpm_params[mode_lib->mp.qos_param_index].maximum_latency_when_non_urgent_uclk_cycles - / mode_lib->mp.uclk_freq_mhz * (1 + mode_lib->soc.qos_parameters.qos_params.dcn4x.umc_max_latency_margin / 100.0) - + mode_lib->soc.qos_parameters.qos_params.dcn4x.mall_overhead_fclk_cycles / mode_lib->mp.FabricClock - + mode_lib->soc.qos_parameters.qos_params.dcn4x.max_round_trip_to_furthest_cs_fclk_cycles / mode_lib->mp.FabricClock - * (1 + mode_lib->soc.qos_parameters.qos_params.dcn4x.fabric_max_transport_latency_margin / 100.0); + out->informative.qos.max_non_urgent_latency_us = dml_get_max_non_urgent_latency_us(mode_lib); if (mode_lib->soc.qos_parameters.qos_type == dml2_qos_param_type_dcn4x) { if (((mode_lib->ip.rob_buffer_size_kbytes - mode_lib->ip.pixel_chunk_size_kbytes) * 1024 - / mode_lib->mp.non_urg_bandwidth_required[dml2_core_internal_soc_state_sys_active][dml2_core_internal_bw_sdp]) >= out->informative.qos.max_non_urgent_latency_us) { + / mode_lib->ms.support.non_urg_bandwidth_required[dml2_core_internal_soc_state_sys_active][dml2_core_internal_bw_sdp]) >= out->informative.qos.max_non_urgent_latency_us) { out->informative.misc.ROBUrgencyAvoidance = true; } else { out->informative.misc.ROBUrgencyAvoidance = false; diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.h index df2d1550a14b0db3f731e20a5f10e775cc70b515..27ef0e096b25435add4e58c4782dc560e55c5059 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.h @@ -28,7 +28,7 @@ void dml2_core_calcs_get_plane_support_info(const struct dml2_display_cfg *displ void dml2_core_calcs_get_informative(const struct dml2_core_internal_display_mode_lib *mode_lib, struct dml2_display_cfg_programming *out); void dml2_core_calcs_get_stream_support_info(const struct dml2_display_cfg *display_cfg, const struct dml2_core_internal_display_mode_lib *mode_lib, struct core_stream_support_info *out, int plane_index); void dml2_core_calcs_get_mall_allocation(struct dml2_core_internal_display_mode_lib *mode_lib, unsigned int *out, int pipe_index); -void dml2_core_calcs_get_stream_fams2_programming(const struct dml2_core_internal_display_mode_lib *mode_lib, const struct display_configuation_with_meta *display_cfg, struct dmub_fams2_stream_static_state *fams2_programming, enum dml2_uclk_pstate_support_method pstate_method, int plane_index); +void dml2_core_calcs_get_stream_fams2_programming(const struct dml2_core_internal_display_mode_lib *mode_lib, const struct display_configuation_with_meta *display_cfg, union dmub_cmd_fams2_config *fams2_base_programming, union dmub_cmd_fams2_config *fams2_sub_programming, enum dml2_pstate_method pstate_method, int plane_index); void dml2_core_calcs_get_global_fams2_programming(const struct dml2_core_internal_display_mode_lib *mode_lib, const struct display_configuation_with_meta *display_cfg, struct dmub_cmd_fams2_global_config *fams2_global_config); void dml2_core_calcs_get_dpte_row_height(unsigned int *dpte_row_height, struct dml2_core_internal_display_mode_lib *mode_lib, bool is_plane1, enum dml2_source_format_class SourcePixelFormat, enum dml2_swizzle_mode SurfaceTiling, enum dml2_rotation_angle ScanDirection, unsigned int pitch, unsigned int GPUVMMinPageSizeKBytes); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_shared_types.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_shared_types.h index cbdfbd5a0bdea151e879c62d5b7f695c32b63ea9..4f54e54102ef669b14c609f20ad05a788c07e871 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_shared_types.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_shared_types.h @@ -201,7 +201,7 @@ struct dml2_core_internal_watermarks { double Z8StutterExitWatermark; double Z8StutterEnterPlusExitWatermark; double USRRetrainingWatermark; - double g6_temp_read_watermark_us; + double temp_read_or_ppt_watermark_us; }; struct dml2_core_internal_mode_support_info { @@ -252,8 +252,8 @@ struct dml2_core_internal_mode_support_info { bool PTEBufferSizeNotExceeded; bool DCCMetaBufferSizeNotExceeded; - enum dml2_dram_clock_change_support DRAMClockChangeSupport[DML2_MAX_PLANES]; - enum dml2_fclock_change_support FCLKChangeSupport[DML2_MAX_PLANES]; + enum dml2_pstate_change_support DRAMClockChangeSupport[DML2_MAX_PLANES]; + enum dml2_pstate_change_support FCLKChangeSupport[DML2_MAX_PLANES]; bool global_dram_clock_change_supported; bool global_fclk_change_supported; bool USRRetrainingSupport; @@ -318,12 +318,15 @@ struct dml2_core_internal_mode_support_info { bool avg_bandwidth_support_ok[dml2_core_internal_soc_state_max][dml2_core_internal_bw_max]; double max_urgent_latency_us; + double max_non_urgent_latency_us; double avg_non_urgent_latency_us; double avg_urgent_latency_us; + double df_response_time_us; bool incorrect_imall_usage; bool g6_temp_read_support; + bool temp_read_or_ppt_support; struct dml2_core_internal_watermarks watermarks; }; @@ -378,8 +381,8 @@ struct dml2_core_internal_mode_support { unsigned int DETBufferSizeC[DML2_MAX_PLANES]; unsigned int SwathHeightY[DML2_MAX_PLANES]; unsigned int SwathHeightC[DML2_MAX_PLANES]; - unsigned int SwathWidthY[DML2_MAX_PLANES]; - unsigned int SwathWidthC[DML2_MAX_PLANES]; + unsigned int SwathWidthY[DML2_MAX_PLANES]; // per-pipe + unsigned int SwathWidthC[DML2_MAX_PLANES]; // per-pipe // ---------------------------------- // Intermediates/Informational @@ -476,9 +479,9 @@ struct dml2_core_internal_mode_support { // Bandwidth Related Info double BandwidthAvailableForImmediateFlip; - double SurfaceReadBandwidthLuma[DML2_MAX_PLANES]; // no dcc overhead, for the plane - double SurfaceReadBandwidthChroma[DML2_MAX_PLANES]; - double WriteBandwidth[DML2_MAX_PLANES]; + double vactive_sw_bw_l[DML2_MAX_PLANES]; // no dcc overhead, for the plane + double vactive_sw_bw_c[DML2_MAX_PLANES]; + double WriteBandwidth[DML2_MAX_PLANES][DML2_MAX_WRITEBACK]; double RequiredPrefetchPixelDataBWLuma[DML2_MAX_PLANES]; double RequiredPrefetchPixelDataBWChroma[DML2_MAX_PLANES]; double cursor_bw[DML2_MAX_PLANES]; @@ -539,7 +542,7 @@ struct dml2_core_internal_mode_program { unsigned int qos_param_index; // to access the uclk dependent dpm table unsigned int active_min_uclk_dpm_index; // to access the min_clk table double FabricClock; /// DynamicMetadataSupported); if (!fail_only || support->VRatioInPrefetchSupported == 0) dml2_printf("DML: support: VRatioInPrefetchSupported = %d\n", support->VRatioInPrefetchSupported); - if (!fail_only || support->PTEBufferSizeNotExceeded == 1) + if (!fail_only || support->PTEBufferSizeNotExceeded == 0) dml2_printf("DML: support: PTEBufferSizeNotExceeded = %d\n", support->PTEBufferSizeNotExceeded); - if (!fail_only || support->DCCMetaBufferSizeNotExceeded == 1) + if (!fail_only || support->DCCMetaBufferSizeNotExceeded == 0) dml2_printf("DML: support: DCCMetaBufferSizeNotExceeded = %d\n", support->DCCMetaBufferSizeNotExceeded); if (!fail_only || support->ExceededMALLSize == 1) dml2_printf("DML: support: ExceededMALLSize = %d\n", support->ExceededMALLSize); @@ -280,39 +424,49 @@ bool dml2_core_utils_is_phantom_pipe(const struct dml2_plane_parameters *plane_c return is_phantom; } -unsigned int dml2_core_utils_get_tile_block_size_bytes(enum dml2_swizzle_mode sw_mode) -{ - switch (sw_mode) { - case (dml2_sw_linear): - return 256; break; - case (dml2_sw_256b_2d): - return 256; break; - case (dml2_sw_4kb_2d): - return 4096; break; - case (dml2_sw_64kb_2d): - return 65536; break; - case (dml2_sw_256kb_2d): - return 262144; break; - case (dml2_gfx11_sw_linear): - return 256; break; - case (dml2_gfx11_sw_64kb_d): - return 65536; break; - case (dml2_gfx11_sw_64kb_d_t): - return 65536; break; - case (dml2_gfx11_sw_64kb_d_x): - return 65536; break; - case (dml2_gfx11_sw_64kb_r_x): - return 65536; break; - case (dml2_gfx11_sw_256kb_d_x): - return 262144; break; - case (dml2_gfx11_sw_256kb_r_x): - return 262144; break; - default: +unsigned int dml2_core_utils_get_tile_block_size_bytes(enum dml2_swizzle_mode sw_mode, unsigned int byte_per_pixel) +{ + + if (sw_mode == dml2_sw_linear) + return 256; + else if (sw_mode == dml2_sw_256b_2d) + return 256; + else if (sw_mode == dml2_sw_4kb_2d) + return 4096; + else if (sw_mode == dml2_sw_64kb_2d) + return 65536; + else if (sw_mode == dml2_sw_256kb_2d) + return 262144; + else if (sw_mode == dml2_gfx11_sw_linear) + return 256; + else if (sw_mode == dml2_gfx11_sw_64kb_d) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_d_t) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_d_x) + return 65536; + else if (sw_mode == dml2_gfx11_sw_64kb_r_x) + return 65536; + else if (sw_mode == dml2_gfx11_sw_256kb_d_x) + return 262144; + else if (sw_mode == dml2_gfx11_sw_256kb_r_x) + return 262144; + else { DML2_ASSERT(0); return 256; }; } +bool dml2_core_utils_get_segment_horizontal_contiguous(enum dml2_swizzle_mode sw_mode, unsigned int byte_per_pixel) +{ + return (byte_per_pixel != 2); +} + +bool dml2_core_utils_is_linear(enum dml2_swizzle_mode sw_mode) +{ + return (sw_mode == dml2_sw_linear || sw_mode == dml2_sw_linear_256b || sw_mode == dml2_linear_64elements); +}; + bool dml2_core_utils_is_vertical_rotation(enum dml2_rotation_angle Scan) { @@ -325,7 +479,6 @@ bool dml2_core_utils_is_vertical_rotation(enum dml2_rotation_angle Scan) return is_vert; } - int unsigned dml2_core_utils_get_gfx_version(enum dml2_swizzle_mode sw_mode) { int unsigned version = 0; @@ -334,17 +487,17 @@ int unsigned dml2_core_utils_get_gfx_version(enum dml2_swizzle_mode sw_mode) sw_mode == dml2_sw_256b_2d || sw_mode == dml2_sw_4kb_2d || sw_mode == dml2_sw_64kb_2d || - sw_mode == dml2_sw_256kb_2d) { + sw_mode == dml2_sw_256kb_2d) version = 12; - } else if (sw_mode == dml2_gfx11_sw_linear || + else if (sw_mode == dml2_gfx11_sw_linear || sw_mode == dml2_gfx11_sw_64kb_d || sw_mode == dml2_gfx11_sw_64kb_d_t || sw_mode == dml2_gfx11_sw_64kb_d_x || sw_mode == dml2_gfx11_sw_64kb_r_x || sw_mode == dml2_gfx11_sw_256kb_d_x || - sw_mode == dml2_gfx11_sw_256kb_r_x) { + sw_mode == dml2_gfx11_sw_256kb_r_x) version = 11; - } else { + else { dml2_printf("ERROR: Invalid sw_mode setting! val=%u\n", sw_mode); DML2_ASSERT(0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_utils.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_utils.h index a5cc6a07167aed32f17649bc95d4d032d47fef1f..95f0d017add455f57aff99b48ed8dffc18096822 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_utils.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_utils.h @@ -11,6 +11,8 @@ double dml2_core_utils_div_rem(double dividend, unsigned int divisor, unsigned int *remainder); const char *dml2_core_utils_internal_bw_type_str(enum dml2_core_internal_bw_type bw_type); bool dml2_core_utils_is_420(enum dml2_source_format_class source_format); +bool dml2_core_utils_is_422_planar(enum dml2_source_format_class source_format); +bool dml2_core_utils_is_422_packed(enum dml2_source_format_class source_format); void dml2_core_utils_print_mode_support_info(const struct dml2_core_internal_mode_support_info *support, bool fail_only); const char *dml2_core_utils_internal_soc_state_type_str(enum dml2_core_internal_soc_state_type dml2_core_internal_soc_state_type); void dml2_core_utils_get_stream_output_bpp(double *out_bpp, const struct dml2_display_cfg *display_cfg); @@ -18,8 +20,10 @@ unsigned int dml2_core_utils_round_to_multiple(unsigned int num, unsigned int mu unsigned int dml2_core_util_get_num_active_pipes(int unsigned num_planes, const struct core_display_cfg_support_info *cfg_support_info); void dml2_core_utils_pipe_plane_mapping(const struct core_display_cfg_support_info *cfg_support_info, unsigned int *pipe_plane); bool dml2_core_utils_is_phantom_pipe(const struct dml2_plane_parameters *plane_cfg); -unsigned int dml2_core_utils_get_tile_block_size_bytes(enum dml2_swizzle_mode sw_mode); +unsigned int dml2_core_utils_get_tile_block_size_bytes(enum dml2_swizzle_mode sw_mode, unsigned int byte_per_pixel); +bool dml2_core_utils_get_segment_horizontal_contiguous(enum dml2_swizzle_mode sw_mode, unsigned int byte_per_pixel); bool dml2_core_utils_is_vertical_rotation(enum dml2_rotation_angle Scan); +bool dml2_core_utils_is_linear(enum dml2_swizzle_mode sw_mode); int unsigned dml2_core_utils_get_gfx_version(enum dml2_swizzle_mode sw_mode); unsigned int dml2_core_utils_get_qos_param_index(unsigned long uclk_freq_khz, const struct dml2_dcn4_uclk_dpm_dependent_qos_params *per_uclk_dpm_params); unsigned int dml2_core_utils_get_active_min_uclk_dpm_index(unsigned long uclk_freq_khz, const struct dml2_soc_state_table *clk_table); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c index 8869ea0893128d0b0275b9b99c44157497c3e84c..8a78b9adfc623432ef4f01f27f304f36e00f2013 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c @@ -96,6 +96,7 @@ static void calculate_svp_prefetch_minimums(struct dml2_dpmm_map_mode_to_soc_dpm double min_uclk_latency; const struct dml2_core_mode_support_result *mode_support_result = &in_out->display_cfg->mode_support_result; + /* assumes DF throttling is enabled */ min_uclk_avg = dram_bw_kbps_to_uclk_khz(mode_support_result->global.svp_prefetch.average_bw_dram_kbps, &in_out->soc_bb->clk_table.dram_config); min_uclk_avg = (double)min_uclk_avg / ((double)in_out->soc_bb->qos_parameters.derate_table.dcn_mall_prefetch_average.dram_derate_percent_pixel / 100); @@ -125,6 +126,37 @@ static void calculate_svp_prefetch_minimums(struct dml2_dpmm_map_mode_to_soc_dpm in_out->programming->min_clocks.dcn4x.svp_prefetch.uclk_khz = dml_round_up(min_uclk_bw > min_uclk_latency ? min_uclk_bw : min_uclk_latency); in_out->programming->min_clocks.dcn4x.svp_prefetch.fclk_khz = dml_round_up(min_fclk_bw > min_fclk_latency ? min_fclk_bw : min_fclk_latency); in_out->programming->min_clocks.dcn4x.svp_prefetch.dcfclk_khz = dml_round_up(min_dcfclk_bw > min_dcfclk_latency ? min_dcfclk_bw : min_dcfclk_latency); + + /* assumes DF throttling is disabled */ + min_uclk_avg = dram_bw_kbps_to_uclk_khz(mode_support_result->global.svp_prefetch.average_bw_dram_kbps, &in_out->soc_bb->clk_table.dram_config); + min_uclk_avg = (double)min_uclk_avg / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_average.dram_derate_percent_pixel / 100); + + min_uclk_urgent = dram_bw_kbps_to_uclk_khz(mode_support_result->global.svp_prefetch.urgent_bw_dram_kbps, &in_out->soc_bb->clk_table.dram_config); + min_uclk_urgent = (double)min_uclk_urgent / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_urgent.dram_derate_percent_pixel / 100); + + min_uclk_bw = min_uclk_urgent > min_uclk_avg ? min_uclk_urgent : min_uclk_avg; + + min_fclk_avg = (double)mode_support_result->global.svp_prefetch.average_bw_sdp_kbps / in_out->soc_bb->fabric_datapath_to_dcn_data_return_bytes; + min_fclk_avg = (double)min_fclk_avg / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_average.fclk_derate_percent / 100); + + min_fclk_urgent = (double)mode_support_result->global.svp_prefetch.urgent_bw_sdp_kbps / in_out->soc_bb->fabric_datapath_to_dcn_data_return_bytes; + min_fclk_urgent = (double)min_fclk_urgent / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_urgent.fclk_derate_percent / 100); + + min_fclk_bw = min_fclk_urgent > min_fclk_avg ? min_fclk_urgent : min_fclk_avg; + + min_dcfclk_avg = (double)mode_support_result->global.svp_prefetch.average_bw_sdp_kbps / in_out->soc_bb->return_bus_width_bytes; + min_dcfclk_avg = (double)min_dcfclk_avg / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_average.dcfclk_derate_percent / 100); + + min_dcfclk_urgent = (double)mode_support_result->global.svp_prefetch.urgent_bw_sdp_kbps / in_out->soc_bb->return_bus_width_bytes; + min_dcfclk_urgent = (double)min_dcfclk_urgent / ((double)in_out->soc_bb->qos_parameters.derate_table.system_active_urgent.dcfclk_derate_percent / 100); + + min_dcfclk_bw = min_dcfclk_urgent > min_dcfclk_avg ? min_dcfclk_urgent : min_dcfclk_avg; + + get_minimum_clocks_for_latency(in_out, &min_uclk_latency, &min_fclk_latency, &min_dcfclk_latency); + + in_out->programming->min_clocks.dcn4x.svp_prefetch_no_throttle.uclk_khz = dml_round_up(min_uclk_bw > min_uclk_latency ? min_uclk_bw : min_uclk_latency); + in_out->programming->min_clocks.dcn4x.svp_prefetch_no_throttle.fclk_khz = dml_round_up(min_fclk_bw > min_fclk_latency ? min_fclk_bw : min_fclk_latency); + in_out->programming->min_clocks.dcn4x.svp_prefetch_no_throttle.dcfclk_khz = dml_round_up(min_dcfclk_bw > min_dcfclk_latency ? min_dcfclk_bw : min_dcfclk_latency); } static void calculate_idle_minimums(struct dml2_dpmm_map_mode_to_soc_dpm_params_in_out *in_out) @@ -180,7 +212,7 @@ static bool add_margin_and_round_to_dfs_grainularity(double clock_khz, double ma clock_khz *= 1.0 + margin; - divider = (unsigned int)((int)DFS_DIVIDER_RANGE_SCALE_FACTOR * (vco_freq_khz / clock_khz)); + divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR * (vco_freq_khz / clock_khz)); /* we want to floor here to get higher clock than required rather than lower */ if (divider < DFS_DIVIDER_RANGE_2_START) { @@ -272,6 +304,17 @@ static bool map_soc_min_clocks_to_dpm_fine_grained(struct dml2_display_cfg_progr if (result) result = round_up_to_next_dpm(&display_cfg->min_clocks.dcn4x.idle.uclk_khz, &state_table->uclk); + /* these clocks are optional, so they can fail to map, in which case map all to 0 */ + if (result) { + if (!round_up_to_next_dpm(&display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.dcfclk_khz, &state_table->dcfclk) || + !round_up_to_next_dpm(&display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.fclk_khz, &state_table->fclk) || + !round_up_to_next_dpm(&display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.uclk_khz, &state_table->uclk)) { + display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.dcfclk_khz = 0; + display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.fclk_khz = 0; + display_cfg->min_clocks.dcn4x.svp_prefetch_no_throttle.uclk_khz = 0; + } + } + return result; } @@ -711,7 +754,7 @@ bool dpmm_dcn4_map_watermarks(struct dml2_dpmm_map_watermarks_params_in_out *in_ dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].fclk_pstate = (int unsigned)(mode_lib->mp.Watermark.FCLKChangeWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].sr_enter = (int unsigned)(mode_lib->mp.Watermark.StutterEnterPlusExitWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].sr_exit = (int unsigned)(mode_lib->mp.Watermark.StutterExitWatermark * refclk_freq_in_mhz); - dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.g6_temp_read_watermark_us * refclk_freq_in_mhz); + dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.temp_read_or_ppt_watermark_us * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].uclk_pstate = (int unsigned)(mode_lib->mp.Watermark.DRAMClockChangeWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].urgent = (int unsigned)(mode_lib->mp.Watermark.UrgentWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_A].usr = (int unsigned)(mode_lib->mp.Watermark.USRRetrainingWatermark * refclk_freq_in_mhz); @@ -725,7 +768,7 @@ bool dpmm_dcn4_map_watermarks(struct dml2_dpmm_map_watermarks_params_in_out *in_ dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].fclk_pstate = (int unsigned)(mode_lib->mp.Watermark.FCLKChangeWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].sr_enter = (int unsigned)(mode_lib->mp.Watermark.StutterEnterPlusExitWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].sr_exit = (int unsigned)(mode_lib->mp.Watermark.StutterExitWatermark * refclk_freq_in_mhz); - dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.g6_temp_read_watermark_us * refclk_freq_in_mhz); + dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].temp_read_or_ppt = (int unsigned)(mode_lib->mp.Watermark.temp_read_or_ppt_watermark_us * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].uclk_pstate = (int unsigned)(mode_lib->mp.Watermark.DRAMClockChangeWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].urgent = (int unsigned)(mode_lib->mp.Watermark.UrgentWatermark * refclk_freq_in_mhz); dchubbub_regs->wm_regs[DML2_DCHUB_WATERMARK_SET_B].usr = (int unsigned)(mode_lib->mp.Watermark.USRRetrainingWatermark * refclk_freq_in_mhz); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c index 92269f0e50ed249fd76f049c1e9c08a6bf9483a0..1efbc0329f85a2a47ac19a5ff1c96bf3b5e86de2 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c @@ -13,32 +13,32 @@ static const double MIN_BLANK_STUTTER_FACTOR = 3.0; static const struct dml2_pmo_pstate_strategy base_strategy_list_1_display[] = { // VActive Preferred { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_na, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then SVP { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_svp, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_svp, dml2_pstate_method_na, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = false, }, // Then DRR { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_drr, dml2_pstate_method_na, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Finally VBlank, but allow base clocks for latency to increase /* { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, */ @@ -49,56 +49,56 @@ static const int base_strategy_list_1_display_size = sizeof(base_strategy_list_1 static const struct dml2_pmo_pstate_strategy base_strategy_list_2_display[] = { // VActive only is preferred { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then VActive + VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = false, }, // Then VBlank only { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = false, }, // Then SVP + VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_svp, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_svp, dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = false, }, // Then SVP + DRR { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_svp, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_svp, dml2_pstate_method_fw_drr, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then SVP + SVP { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_svp, dml2_pmo_pstate_strategy_fw_svp, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_svp, dml2_pstate_method_fw_svp, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then DRR + VActive { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_fw_drr, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Then DRR + DRR { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, // Finally VBlank, but allow base clocks for latency to increase /* { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_na, dml2_pstate_method_na }, .allow_state_increase = true, }, */ @@ -109,32 +109,32 @@ static const int base_strategy_list_2_display_size = sizeof(base_strategy_list_2 static const struct dml2_pmo_pstate_strategy base_strategy_list_3_display[] = { // All VActive { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_na }, .allow_state_increase = true, }, // VActive + 1 VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vblank, dml2_pstate_method_na }, .allow_state_increase = false, }, // All VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_na }, .allow_state_increase = false, }, // All DRR { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr, dml2_pstate_method_na }, .allow_state_increase = true, }, // All VBlank, with state increase allowed /* { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_na }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_na }, .allow_state_increase = true, }, */ @@ -145,32 +145,32 @@ static const int base_strategy_list_3_display_size = sizeof(base_strategy_list_3 static const struct dml2_pmo_pstate_strategy base_strategy_list_4_display[] = { // All VActive { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vactive }, .allow_state_increase = true, }, // VActive + 1 VBlank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vactive, dml2_pmo_pstate_strategy_vblank }, + .per_stream_pstate_method = { dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vactive, dml2_pstate_method_vblank }, .allow_state_increase = false, }, // All Vblank { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank }, .allow_state_increase = false, }, // All DRR { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr, dml2_pmo_pstate_strategy_fw_drr }, + .per_stream_pstate_method = { dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr, dml2_pstate_method_fw_drr }, .allow_state_increase = true, }, // All VBlank, with state increase allowed /* { - .per_stream_pstate_method = { dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank, dml2_pmo_pstate_strategy_vblank }, + .per_stream_pstate_method = { dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank, dml2_pstate_method_vblank }, .allow_state_increase = true, }, */ @@ -355,29 +355,30 @@ bool pmo_dcn4_fams2_optimize_dcc_mcache(struct dml2_pmo_optimize_dcc_mcache_in_o return result; } -static enum dml2_pmo_pstate_method convert_strategy_to_drr_variant(const enum dml2_pmo_pstate_method base_strategy) +static enum dml2_pstate_method convert_strategy_to_drr_variant(const enum dml2_pstate_method base_strategy) { - enum dml2_pmo_pstate_method variant_strategy = 0; + enum dml2_pstate_method variant_strategy = 0; switch (base_strategy) { - case dml2_pmo_pstate_strategy_vactive: - variant_strategy = dml2_pmo_pstate_strategy_fw_vactive_drr; + case dml2_pstate_method_vactive: + variant_strategy = dml2_pstate_method_fw_vactive_drr; break; - case dml2_pmo_pstate_strategy_vblank: - variant_strategy = dml2_pmo_pstate_strategy_fw_vblank_drr; + case dml2_pstate_method_vblank: + variant_strategy = dml2_pstate_method_fw_vblank_drr; break; - case dml2_pmo_pstate_strategy_fw_svp: - variant_strategy = dml2_pmo_pstate_strategy_fw_svp_drr; + case dml2_pstate_method_fw_svp: + variant_strategy = dml2_pstate_method_fw_svp_drr; break; - case dml2_pmo_pstate_strategy_fw_vactive_drr: - case dml2_pmo_pstate_strategy_fw_vblank_drr: - case dml2_pmo_pstate_strategy_fw_svp_drr: - case dml2_pmo_pstate_strategy_fw_drr: - case dml2_pmo_pstate_strategy_reserved_hw: - case dml2_pmo_pstate_strategy_reserved_fw: - case dml2_pmo_pstate_strategy_reserved_fw_drr_clamped: - case dml2_pmo_pstate_strategy_reserved_fw_drr_var: - case dml2_pmo_pstate_strategy_na: + case dml2_pstate_method_fw_vactive_drr: + case dml2_pstate_method_fw_vblank_drr: + case dml2_pstate_method_fw_svp_drr: + case dml2_pstate_method_fw_drr: + case dml2_pstate_method_reserved_hw: + case dml2_pstate_method_reserved_fw: + case dml2_pstate_method_reserved_fw_drr_clamped: + case dml2_pstate_method_reserved_fw_drr_var: + case dml2_pstate_method_count: + case dml2_pstate_method_na: default: /* no variant for this mode */ variant_strategy = base_strategy; @@ -419,23 +420,22 @@ static unsigned int get_num_expanded_strategies( static void insert_strategy_into_expanded_list( const struct dml2_pmo_pstate_strategy *per_stream_pstate_strategy, - int stream_count, - struct dml2_pmo_init_data *init_data) + const int stream_count, + struct dml2_pmo_pstate_strategy *expanded_strategy_list, + unsigned int *num_expanded_strategies) { - struct dml2_pmo_pstate_strategy *expanded_strategy_list = NULL; - - expanded_strategy_list = get_expanded_strategy_list(init_data, stream_count); - - if (expanded_strategy_list) { - memcpy(&expanded_strategy_list[init_data->pmo_dcn4.num_expanded_strategies_per_list[stream_count - 1]], per_stream_pstate_strategy, sizeof(struct dml2_pmo_pstate_strategy)); + if (expanded_strategy_list && num_expanded_strategies) { + memcpy(&expanded_strategy_list[*num_expanded_strategies], per_stream_pstate_strategy, sizeof(struct dml2_pmo_pstate_strategy)); - init_data->pmo_dcn4.num_expanded_strategies_per_list[stream_count - 1]++; + (*num_expanded_strategies)++; } } -static void expand_base_strategy(struct dml2_pmo_instance *pmo, +static void expand_base_strategy( const struct dml2_pmo_pstate_strategy *base_strategy, - unsigned int stream_count) + const unsigned int stream_count, + struct dml2_pmo_pstate_strategy *expanded_strategy_list, + unsigned int *num_expanded_strategies) { bool skip_to_next_stream; bool expanded_strategy_added; @@ -473,7 +473,7 @@ static void expand_base_strategy(struct dml2_pmo_instance *pmo, if (i >= stream_count - 1) { /* insert into strategy list */ - insert_strategy_into_expanded_list(&cur_strategy_list, stream_count, &pmo->init_data); + insert_strategy_into_expanded_list(&cur_strategy_list, stream_count, expanded_strategy_list, num_expanded_strategies); expanded_strategy_added = true; } else { /* skip to next stream */ @@ -512,9 +512,9 @@ static void expand_base_strategy(struct dml2_pmo_instance *pmo, static bool is_variant_method_valid(const struct dml2_pmo_pstate_strategy *base_strategy, const struct dml2_pmo_pstate_strategy *variant_strategy, - unsigned int num_streams_per_base_method[PMO_DCN4_MAX_DISPLAYS], - unsigned int num_streams_per_variant_method[PMO_DCN4_MAX_DISPLAYS], - unsigned int stream_count) + const unsigned int num_streams_per_base_method[PMO_DCN4_MAX_DISPLAYS], + const unsigned int num_streams_per_variant_method[PMO_DCN4_MAX_DISPLAYS], + const unsigned int stream_count) { bool valid = true; unsigned int i; @@ -522,7 +522,7 @@ static bool is_variant_method_valid(const struct dml2_pmo_pstate_strategy *base_ /* check all restrictions are met */ for (i = 0; i < stream_count; i++) { /* vblank + vblank_drr variants are invalid */ - if (base_strategy->per_stream_pstate_method[i] == dml2_pmo_pstate_strategy_vblank && + if (base_strategy->per_stream_pstate_method[i] == dml2_pstate_method_vblank && ((num_streams_per_base_method[i] > 0 && num_streams_per_variant_method[i] > 0) || num_streams_per_variant_method[i] > 1)) { valid = false; @@ -533,9 +533,12 @@ static bool is_variant_method_valid(const struct dml2_pmo_pstate_strategy *base_ return valid; } -static void expand_variant_strategy(struct dml2_pmo_instance *pmo, +static void expand_variant_strategy( const struct dml2_pmo_pstate_strategy *base_strategy, - unsigned int stream_count) + const unsigned int stream_count, + const bool should_permute, + struct dml2_pmo_pstate_strategy *expanded_strategy_list, + unsigned int *num_expanded_strategies) { bool variant_found; unsigned int i, j; @@ -544,7 +547,7 @@ static void expand_variant_strategy(struct dml2_pmo_instance *pmo, unsigned int num_streams_per_method[PMO_DCN4_MAX_DISPLAYS] = { 0 }; unsigned int num_streams_per_base_method[PMO_DCN4_MAX_DISPLAYS] = { 0 }; unsigned int num_streams_per_variant_method[PMO_DCN4_MAX_DISPLAYS] = { 0 }; - enum dml2_pmo_pstate_method per_stream_variant_method[DML2_MAX_PLANES]; + enum dml2_pstate_method per_stream_variant_method[DML2_MAX_PLANES]; struct dml2_pmo_pstate_strategy variant_strategy = { 0 }; /* determine number of displays per method */ @@ -585,7 +588,13 @@ static void expand_variant_strategy(struct dml2_pmo_instance *pmo, } if (variant_found && is_variant_method_valid(base_strategy, &variant_strategy, num_streams_per_base_method, num_streams_per_variant_method, stream_count)) { - expand_base_strategy(pmo, &variant_strategy, stream_count); + if (should_permute) { + /* permutations are permitted, proceed to expand */ + expand_base_strategy(&variant_strategy, stream_count, expanded_strategy_list, num_expanded_strategies); + } else { + /* no permutations allowed, so add to list now */ + insert_strategy_into_expanded_list(&variant_strategy, stream_count, expanded_strategy_list, num_expanded_strategies); + } } /* rollback to earliest method with bases remaining */ @@ -612,18 +621,19 @@ static void expand_variant_strategy(struct dml2_pmo_instance *pmo, } } -static void expand_base_strategies( - struct dml2_pmo_instance *pmo, - const struct dml2_pmo_pstate_strategy *base_strategies_list, - const unsigned int num_base_strategies, - unsigned int stream_count) +void pmo_dcn4_fams2_expand_base_pstate_strategies( + const struct dml2_pmo_pstate_strategy *base_strategies_list, + const unsigned int num_base_strategies, + const unsigned int stream_count, + struct dml2_pmo_pstate_strategy *expanded_strategy_list, + unsigned int *num_expanded_strategies) { unsigned int i; /* expand every explicit base strategy (except all DRR) */ for (i = 0; i < num_base_strategies; i++) { - expand_base_strategy(pmo, &base_strategies_list[i], stream_count); - expand_variant_strategy(pmo, &base_strategies_list[i], stream_count); + expand_base_strategy(&base_strategies_list[i], stream_count, expanded_strategy_list, num_expanded_strategies); + expand_variant_strategy(&base_strategies_list[i], stream_count, true, expanded_strategy_list, num_expanded_strategies); } } @@ -652,25 +662,45 @@ bool pmo_dcn4_fams2_initialize(struct dml2_pmo_initialize_in_out *in_out) DML2_ASSERT(base_strategy_list_1_display_size <= PMO_DCN4_MAX_BASE_STRATEGIES); /* populate list */ - expand_base_strategies(pmo, base_strategy_list_1_display, base_strategy_list_1_display_size, 1); + pmo_dcn4_fams2_expand_base_pstate_strategies( + base_strategy_list_1_display, + base_strategy_list_1_display_size, + i, + pmo->init_data.pmo_dcn4.expanded_strategy_list_1_display, + &pmo->init_data.pmo_dcn4.num_expanded_strategies_per_list[i - 1]); break; case 2: DML2_ASSERT(base_strategy_list_2_display_size <= PMO_DCN4_MAX_BASE_STRATEGIES); /* populate list */ - expand_base_strategies(pmo, base_strategy_list_2_display, base_strategy_list_2_display_size, 2); + pmo_dcn4_fams2_expand_base_pstate_strategies( + base_strategy_list_2_display, + base_strategy_list_2_display_size, + i, + pmo->init_data.pmo_dcn4.expanded_strategy_list_2_display, + &pmo->init_data.pmo_dcn4.num_expanded_strategies_per_list[i - 1]); break; case 3: DML2_ASSERT(base_strategy_list_3_display_size <= PMO_DCN4_MAX_BASE_STRATEGIES); /* populate list */ - expand_base_strategies(pmo, base_strategy_list_3_display, base_strategy_list_3_display_size, 3); + pmo_dcn4_fams2_expand_base_pstate_strategies( + base_strategy_list_3_display, + base_strategy_list_3_display_size, + i, + pmo->init_data.pmo_dcn4.expanded_strategy_list_3_display, + &pmo->init_data.pmo_dcn4.num_expanded_strategies_per_list[i - 1]); break; case 4: DML2_ASSERT(base_strategy_list_4_display_size <= PMO_DCN4_MAX_BASE_STRATEGIES); /* populate list */ - expand_base_strategies(pmo, base_strategy_list_4_display, base_strategy_list_4_display_size, 4); + pmo_dcn4_fams2_expand_base_pstate_strategies( + base_strategy_list_4_display, + base_strategy_list_4_display_size, + i, + pmo->init_data.pmo_dcn4.expanded_strategy_list_4_display, + &pmo->init_data.pmo_dcn4.num_expanded_strategies_per_list[i - 1]); break; } } @@ -941,11 +971,8 @@ static void build_synchronized_timing_groups( /* find synchronizable timing groups */ for (j = i + 1; j < display_config->display_config.num_streams; j++) { if (memcmp(master_timing, - &display_config->display_config.stream_descriptors[j].timing, - sizeof(struct dml2_timing_cfg)) == 0 && - display_config->display_config.stream_descriptors[i].output.output_encoder == display_config->display_config.stream_descriptors[j].output.output_encoder && - (display_config->display_config.stream_descriptors[i].output.output_encoder != dml2_hdmi || //hdmi requires formats match - display_config->display_config.stream_descriptors[i].output.output_format == display_config->display_config.stream_descriptors[j].output.output_format)) { + &display_config->display_config.stream_descriptors[j].timing, + sizeof(struct dml2_timing_cfg)) == 0) { set_bit_in_bitfield(&pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], j); set_bit_in_bitfield(&stream_mapped_mask, j); } @@ -1106,24 +1133,73 @@ static void insert_into_candidate_list(const struct dml2_pmo_pstate_strategy *ps scratch->pmo_dcn4.num_pstate_candidates++; } -static bool all_planes_match_method(const struct display_configuation_with_meta *display_cfg, int plane_mask, enum dml2_pmo_pstate_method method) +static enum dml2_pstate_method uclk_pstate_strategy_override_to_pstate_method(const enum dml2_uclk_pstate_change_strategy override_strategy) { - unsigned char i; - enum dml2_uclk_pstate_change_strategy matching_strategy = (enum dml2_uclk_pstate_change_strategy) dml2_pmo_pstate_strategy_na; + enum dml2_pstate_method method = dml2_pstate_method_na; + + switch (override_strategy) { + case dml2_uclk_pstate_change_strategy_force_vactive: + method = dml2_pstate_method_vactive; + break; + case dml2_uclk_pstate_change_strategy_force_vblank: + method = dml2_pstate_method_vblank; + break; + case dml2_uclk_pstate_change_strategy_force_drr: + method = dml2_pstate_method_fw_drr; + break; + case dml2_uclk_pstate_change_strategy_force_mall_svp: + method = dml2_pstate_method_fw_svp; + break; + case dml2_uclk_pstate_change_strategy_force_mall_full_frame: + case dml2_uclk_pstate_change_strategy_auto: + default: + method = dml2_pstate_method_na; + } - if (method == dml2_pmo_pstate_strategy_vactive || method == dml2_pmo_pstate_strategy_fw_vactive_drr) - matching_strategy = dml2_uclk_pstate_change_strategy_force_vactive; - else if (method == dml2_pmo_pstate_strategy_vblank || method == dml2_pmo_pstate_strategy_fw_vblank_drr) - matching_strategy = dml2_uclk_pstate_change_strategy_force_vblank; - else if (method == dml2_pmo_pstate_strategy_fw_svp) - matching_strategy = dml2_uclk_pstate_change_strategy_force_mall_svp; - else if (method == dml2_pmo_pstate_strategy_fw_drr) - matching_strategy = dml2_uclk_pstate_change_strategy_force_drr; + return method; +} + +static enum dml2_uclk_pstate_change_strategy pstate_method_to_uclk_pstate_strategy_override(const enum dml2_pstate_method method) +{ + enum dml2_uclk_pstate_change_strategy override_strategy = dml2_uclk_pstate_change_strategy_auto; + + switch (method) { + case dml2_pstate_method_vactive: + case dml2_pstate_method_fw_vactive_drr: + override_strategy = dml2_uclk_pstate_change_strategy_force_vactive; + break; + case dml2_pstate_method_vblank: + case dml2_pstate_method_fw_vblank_drr: + override_strategy = dml2_uclk_pstate_change_strategy_force_vblank; + break; + case dml2_pstate_method_fw_svp: + case dml2_pstate_method_fw_svp_drr: + override_strategy = dml2_uclk_pstate_change_strategy_force_mall_svp; + break; + case dml2_pstate_method_fw_drr: + override_strategy = dml2_uclk_pstate_change_strategy_force_drr; + break; + case dml2_pstate_method_reserved_hw: + case dml2_pstate_method_reserved_fw: + case dml2_pstate_method_reserved_fw_drr_clamped: + case dml2_pstate_method_reserved_fw_drr_var: + case dml2_pstate_method_count: + case dml2_pstate_method_na: + default: + override_strategy = dml2_uclk_pstate_change_strategy_auto; + } + + return override_strategy; +} + +static bool all_planes_match_method(const struct display_configuation_with_meta *display_cfg, int plane_mask, enum dml2_pstate_method method) +{ + unsigned char i; for (i = 0; i < DML2_MAX_PLANES; i++) { if (is_bit_set_in_bitfield(plane_mask, i)) { if (display_cfg->display_config.plane_descriptors[i].overrides.uclk_pstate_change_strategy != dml2_uclk_pstate_change_strategy_auto && - display_cfg->display_config.plane_descriptors[i].overrides.uclk_pstate_change_strategy != matching_strategy) + display_cfg->display_config.plane_descriptors[i].overrides.uclk_pstate_change_strategy != pstate_method_to_uclk_pstate_strategy_override(method)) return false; } } @@ -1149,32 +1225,33 @@ static void build_method_scheduling_params( static struct dml2_fams2_per_method_common_meta *get_per_method_common_meta( struct dml2_pmo_instance *pmo, - enum dml2_pmo_pstate_method stream_pstate_method, + enum dml2_pstate_method stream_pstate_method, int stream_idx) { struct dml2_fams2_per_method_common_meta *stream_method_fams2_meta = NULL; switch (stream_pstate_method) { - case dml2_pmo_pstate_strategy_vactive: - case dml2_pmo_pstate_strategy_fw_vactive_drr: + case dml2_pstate_method_vactive: + case dml2_pstate_method_fw_vactive_drr: stream_method_fams2_meta = &pmo->scratch.pmo_dcn4.stream_fams2_meta[stream_idx].method_vactive.common; break; - case dml2_pmo_pstate_strategy_vblank: - case dml2_pmo_pstate_strategy_fw_vblank_drr: + case dml2_pstate_method_vblank: + case dml2_pstate_method_fw_vblank_drr: stream_method_fams2_meta = &pmo->scratch.pmo_dcn4.stream_fams2_meta[stream_idx].method_vblank.common; break; - case dml2_pmo_pstate_strategy_fw_svp: - case dml2_pmo_pstate_strategy_fw_svp_drr: + case dml2_pstate_method_fw_svp: + case dml2_pstate_method_fw_svp_drr: stream_method_fams2_meta = &pmo->scratch.pmo_dcn4.stream_fams2_meta[stream_idx].method_subvp.common; break; - case dml2_pmo_pstate_strategy_fw_drr: + case dml2_pstate_method_fw_drr: stream_method_fams2_meta = &pmo->scratch.pmo_dcn4.stream_fams2_meta[stream_idx].method_drr.common; break; - case dml2_pmo_pstate_strategy_reserved_hw: - case dml2_pmo_pstate_strategy_reserved_fw: - case dml2_pmo_pstate_strategy_reserved_fw_drr_clamped: - case dml2_pmo_pstate_strategy_reserved_fw_drr_var: - case dml2_pmo_pstate_strategy_na: + case dml2_pstate_method_reserved_hw: + case dml2_pstate_method_reserved_fw: + case dml2_pstate_method_reserved_fw_drr_clamped: + case dml2_pstate_method_reserved_fw_drr_var: + case dml2_pstate_method_count: + case dml2_pstate_method_na: default: stream_method_fams2_meta = NULL; } @@ -1215,7 +1292,7 @@ static bool is_timing_group_schedulable( if (is_bit_set_in_bitfield(pmo->scratch.pmo_dcn4.synchronized_timing_group_masks[timing_group_idx], i)) { stream_method_fams2_meta = get_per_method_common_meta(pmo, pstate_strategy->per_stream_pstate_method[i], i); if (!stream_method_fams2_meta) - return false; + continue; if (group_fams2_meta->allow_start_otg_vline < stream_method_fams2_meta->allow_start_otg_vline) { /* set group allow start to larger otg vline */ @@ -1295,7 +1372,7 @@ static bool is_config_schedulable( if (j_disallow_us < jp1_disallow_us) { /* swap as A < B */ swap(s->pmo_dcn4.sorted_group_gtl_disallow_index[j], - s->pmo_dcn4.sorted_group_gtl_disallow_index[j+1]); + s->pmo_dcn4.sorted_group_gtl_disallow_index[j+1]); swapped = true; } } @@ -1354,7 +1431,7 @@ static bool is_config_schedulable( if (j_period_us < jp1_period_us) { /* swap as A < B */ swap(s->pmo_dcn4.sorted_group_gtl_period_index[j], - s->pmo_dcn4.sorted_group_gtl_period_index[j+1]); + s->pmo_dcn4.sorted_group_gtl_period_index[j+1]); swapped = true; } } @@ -1413,7 +1490,7 @@ static bool is_config_schedulable( static bool stream_matches_drr_policy(struct dml2_pmo_instance *pmo, const struct display_configuation_with_meta *display_cfg, - const enum dml2_pmo_pstate_method stream_pstate_method, + const enum dml2_pstate_method stream_pstate_method, unsigned int stream_index) { const struct dml2_stream_parameters *stream_descriptor = &display_cfg->display_config.stream_descriptors[stream_index]; @@ -1494,19 +1571,19 @@ static bool validate_pstate_support_strategy_cofunctionality(struct dml2_pmo_ins strategy_matches_drr_requirements &= stream_matches_drr_policy(pmo, display_cfg, pstate_strategy->per_stream_pstate_method[stream_index], stream_index); - if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp || - pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp_drr) { + if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp || + pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp_drr) { svp_count++; set_bit_in_bitfield(&svp_stream_mask, stream_index); - } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_drr) { + } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_drr) { drr_count++; set_bit_in_bitfield(&drr_stream_mask, stream_index); - } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vactive || - pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vactive_drr) { + } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_vactive || + pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vactive_drr) { vactive_count++; set_bit_in_bitfield(&vactive_stream_mask, stream_index); - } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vblank || - pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vblank_drr) { + } else if (pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_vblank || + pstate_strategy->per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vblank_drr) { vblank_count++; set_bit_in_bitfield(&vblank_stream_mask, stream_index); } @@ -1625,7 +1702,7 @@ static void build_fams2_meta_per_stream(struct dml2_pmo_instance *pmo, /* for single stream, guarantee at least an instant of allow */ stream_fams2_meta->method_vactive.max_vactive_det_fill_delay_otg_vlines = (unsigned int)math_floor( math_max2(0.0, - timing->v_active - stream_fams2_meta->min_allow_width_otg_vlines - stream_fams2_meta->dram_clk_change_blackout_otg_vlines)); + timing->v_active - math_max2(1.0, stream_fams2_meta->min_allow_width_otg_vlines) - stream_fams2_meta->dram_clk_change_blackout_otg_vlines)); } else { /* for multi stream, bound to a max fill time defined by IP caps */ stream_fams2_meta->method_vactive.max_vactive_det_fill_delay_otg_vlines = @@ -1738,8 +1815,10 @@ bool pmo_dcn4_fams2_init_for_pstate_support(struct dml2_pmo_init_for_pstate_supp struct display_configuation_with_meta *display_config; const struct dml2_plane_parameters *plane_descriptor; const struct dml2_pmo_pstate_strategy *strategy_list = NULL; + struct dml2_pmo_pstate_strategy override_base_strategy = { 0 }; unsigned int strategy_list_size = 0; unsigned char plane_index, stream_index, i; + bool build_override_strategy = true; state->performed = true; in_out->base_display_config->stage3.min_clk_index_for_latency = in_out->base_display_config->stage1.min_clk_index_for_latency; @@ -1763,7 +1842,11 @@ bool pmo_dcn4_fams2_init_for_pstate_support(struct dml2_pmo_init_for_pstate_supp set_bit_in_bitfield(&s->pmo_dcn4.stream_plane_mask[plane_descriptor->stream_index], plane_index); - state->pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_vactive; + state->pstate_switch_modes[plane_index] = dml2_pstate_method_vactive; + + build_override_strategy &= plane_descriptor->overrides.uclk_pstate_change_strategy != dml2_uclk_pstate_change_strategy_auto; + override_base_strategy.per_stream_pstate_method[plane_descriptor->stream_index] = + uclk_pstate_strategy_override_to_pstate_method(plane_descriptor->overrides.uclk_pstate_change_strategy); } // Figure out which streams can do vactive, and also build up implicit SVP and FAMS2 meta @@ -1781,13 +1864,30 @@ bool pmo_dcn4_fams2_init_for_pstate_support(struct dml2_pmo_init_for_pstate_supp /* get synchronized timing groups */ build_synchronized_timing_groups(pmo, display_config); - strategy_list = get_expanded_strategy_list(&pmo->init_data, display_config->display_config.num_streams); - if (!strategy_list) - return false; - - strategy_list_size = get_num_expanded_strategies(&pmo->init_data, display_config->display_config.num_streams); + if (build_override_strategy) { + /* build expanded override strategy list (no permutations) */ + override_base_strategy.allow_state_increase = true; + s->pmo_dcn4.num_expanded_override_strategies = 0; + insert_strategy_into_expanded_list(&override_base_strategy, + display_config->display_config.num_streams, + s->pmo_dcn4.expanded_override_strategy_list, + &s->pmo_dcn4.num_expanded_override_strategies); + expand_variant_strategy(&override_base_strategy, + display_config->display_config.num_streams, + false, + s->pmo_dcn4.expanded_override_strategy_list, + &s->pmo_dcn4.num_expanded_override_strategies); + + /* use override strategy list */ + strategy_list = s->pmo_dcn4.expanded_override_strategy_list; + strategy_list_size = s->pmo_dcn4.num_expanded_override_strategies; + } else { + /* use predefined strategy list */ + strategy_list = get_expanded_strategy_list(&pmo->init_data, display_config->display_config.num_streams); + strategy_list_size = get_num_expanded_strategies(&pmo->init_data, display_config->display_config.num_streams); + } - if (strategy_list_size == 0) + if (!strategy_list || strategy_list_size == 0) return false; s->pmo_dcn4.num_pstate_candidates = 0; @@ -1799,7 +1899,7 @@ bool pmo_dcn4_fams2_init_for_pstate_support(struct dml2_pmo_init_for_pstate_supp } if (s->pmo_dcn4.num_pstate_candidates > 0) { - s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.num_pstate_candidates - 1].allow_state_increase = true; + s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.num_pstate_candidates-1].allow_state_increase = true; s->pmo_dcn4.cur_pstate_candidate = -1; return true; } else { @@ -1832,7 +1932,7 @@ static void reset_display_configuration(struct display_configuation_with_meta *d // Reset strategy to auto plane->overrides.uclk_pstate_change_strategy = dml2_uclk_pstate_change_strategy_auto; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_not_supported; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_na; } } @@ -1849,7 +1949,7 @@ static void setup_planes_for_drr_by_mask(struct display_configuation_with_meta * plane->overrides.uclk_pstate_change_strategy = dml2_uclk_pstate_change_strategy_force_drr; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_fw_drr; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_fw_drr; } } @@ -1867,7 +1967,7 @@ static void setup_planes_for_svp_by_mask(struct display_configuation_with_meta * for (plane_index = 0; plane_index < display_config->display_config.num_planes; plane_index++) { if (is_bit_set_in_bitfield(plane_mask, plane_index)) { stream_index = (char)display_config->display_config.plane_descriptors[plane_index].stream_index; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_fw_subvp_phantom; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_fw_svp; } } @@ -1890,7 +1990,7 @@ static void setup_planes_for_svp_drr_by_mask(struct display_configuation_with_me for (plane_index = 0; plane_index < display_config->display_config.num_planes; plane_index++) { if (is_bit_set_in_bitfield(plane_mask, plane_index)) { stream_index = (char)display_config->display_config.plane_descriptors[plane_index].stream_index; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_fw_subvp_phantom_drr; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_fw_svp_drr; } } @@ -1915,7 +2015,7 @@ static void setup_planes_for_vblank_by_mask(struct display_configuation_with_met plane->overrides.reserved_vblank_time_ns = (long)math_max2(pmo->soc_bb->power_management_parameters.dram_clk_change_blackout_us * 1000.0, plane->overrides.reserved_vblank_time_ns); - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_vblank; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_vblank; } } @@ -1933,7 +2033,7 @@ static void setup_planes_for_vblank_drr_by_mask(struct display_configuation_with plane = &display_config->display_config.plane_descriptors[plane_index]; plane->overrides.reserved_vblank_time_ns = (long)(pmo->soc_bb->power_management_parameters.dram_clk_change_blackout_us * 1000); - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_fw_vblank_drr; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_fw_vblank_drr; } } } @@ -1949,7 +2049,7 @@ static void setup_planes_for_vactive_by_mask(struct display_configuation_with_me if (is_bit_set_in_bitfield(plane_mask, plane_index)) { stream_index = display_config->display_config.plane_descriptors[plane_index].stream_index; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_vactive; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_vactive; if (!pmo->options->disable_vactive_det_fill_bw_pad) { display_config->display_config.plane_descriptors[plane_index].overrides.max_vactive_det_fill_delay_us = @@ -1970,7 +2070,7 @@ static void setup_planes_for_vactive_drr_by_mask(struct display_configuation_wit if (is_bit_set_in_bitfield(plane_mask, plane_index)) { stream_index = display_config->display_config.plane_descriptors[plane_index].stream_index; - display_config->stage3.pstate_switch_modes[plane_index] = dml2_uclk_pstate_support_method_fw_vactive_drr; + display_config->stage3.pstate_switch_modes[plane_index] = dml2_pstate_method_fw_vactive_drr; if (!pmo->options->disable_vactive_det_fill_bw_pad) { display_config->display_config.plane_descriptors[plane_index].overrides.max_vactive_det_fill_delay_us = @@ -1992,26 +2092,26 @@ static bool setup_display_config(struct display_configuation_with_meta *display_ for (stream_index = 0; stream_index < display_config->display_config.num_streams; stream_index++) { - if (pmo->scratch.pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_na) { + if (pmo->scratch.pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_na) { success = false; break; - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vactive) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_vactive) { setup_planes_for_vactive_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vblank) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_vblank) { setup_planes_for_vblank_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp) { fams2_required = true; setup_planes_for_svp_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vactive_drr) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vactive_drr) { fams2_required = true; setup_planes_for_vactive_drr_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vblank_drr) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vblank_drr) { fams2_required = true; setup_planes_for_vblank_drr_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp_drr) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp_drr) { fams2_required = true; setup_planes_for_svp_drr_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); - } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_drr) { + } else if (scratch->pmo_dcn4.pstate_strategy_candidates[strategy_index].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_drr) { fams2_required = true; setup_planes_for_drr_by_mask(display_config, pmo, scratch->pmo_dcn4.stream_plane_mask[stream_index]); } @@ -2066,34 +2166,34 @@ bool pmo_dcn4_fams2_test_for_pstate_support(struct dml2_pmo_test_for_pstate_supp for (stream_index = 0; stream_index < in_out->base_display_config->display_config.num_streams; stream_index++) { struct dml2_fams2_meta *stream_fams2_meta = &s->pmo_dcn4.stream_fams2_meta[stream_index]; - if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vactive || - s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vactive_drr) { + if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_vactive || + s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vactive_drr) { if (get_vactive_pstate_margin(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index]) < (MIN_VACTIVE_MARGIN_PCT * in_out->instance->soc_bb->power_management_parameters.dram_clk_change_blackout_us) || get_vactive_det_fill_latency_delay_us(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index]) > stream_fams2_meta->method_vactive.max_vactive_det_fill_delay_us) { p_state_supported = false; break; } - } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_vblank || - s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_vblank_drr) { + } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_vblank || + s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_vblank_drr) { if (get_minimum_reserved_time_us_for_planes(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index]) < REQUIRED_RESERVED_TIME || get_vactive_pstate_margin(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index]) < MIN_VACTIVE_MARGIN_VBLANK) { p_state_supported = false; break; } - } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp || - s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_svp_drr) { + } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp || + s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_svp_drr) { if (in_out->base_display_config->stage3.stream_svp_meta[stream_index].valid == false) { p_state_supported = false; break; } - } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_fw_drr) { - if (!all_planes_match_method(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index], dml2_pmo_pstate_strategy_fw_drr) || + } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_fw_drr) { + if (!all_planes_match_method(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index], dml2_pstate_method_fw_drr) || get_vactive_pstate_margin(in_out->base_display_config, s->pmo_dcn4.stream_plane_mask[stream_index]) < MIN_VACTIVE_MARGIN_DRR) { p_state_supported = false; break; } - } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pmo_pstate_strategy_na) { + } else if (s->pmo_dcn4.pstate_strategy_candidates[s->pmo_dcn4.cur_pstate_candidate].per_stream_pstate_method[stream_index] == dml2_pstate_method_na) { p_state_supported = false; break; } diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.h index 0c25bd3e9ac021ce3f5f4ec21ada30d1275af06f..6baab7ad6ecc3e507bc3a2d0cab52ceeea283911 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.h @@ -23,4 +23,11 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in bool pmo_dcn4_fams2_test_for_stutter(struct dml2_pmo_test_for_stutter_in_out *in_out); bool pmo_dcn4_fams2_optimize_for_stutter(struct dml2_pmo_optimize_for_stutter_in_out *in_out); +void pmo_dcn4_fams2_expand_base_pstate_strategies( + const struct dml2_pmo_pstate_strategy *base_strategies_list, + const unsigned int num_base_strategies, + const unsigned int stream_count, + struct dml2_pmo_pstate_strategy *expanded_strategy_list, + unsigned int *num_expanded_strategies); + #endif diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_factory.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_factory.c index add51d41a515814c15044d33c2c8d902488766f5..7ed0242a4b3311e470b7e75b51b6cddb619c2514 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_factory.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_factory.c @@ -72,7 +72,6 @@ bool dml2_pmo_create(enum dml2_project_id project_id, struct dml2_pmo_instance * out->init_for_stutter = pmo_dcn4_fams2_init_for_stutter; out->test_for_stutter = pmo_dcn4_fams2_test_for_stutter; out->optimize_for_stutter = pmo_dcn4_fams2_optimize_for_stutter; - result = true; break; case dml2_project_invalid: diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_interfaces.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_interfaces.c new file mode 100644 index 0000000000000000000000000000000000000000..5f6dfc24df69dc1289f7bed143d8322afeb23b46 --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_interfaces.c @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: MIT +// +// Copyright 2024 Advanced Micro Devices, Inc. + +#include "dml_top.h" +#include "dml2_internal_shared_types.h" +#include "dml2_top_soc15.h" + +unsigned int dml2_get_instance_size_bytes(void) +{ + return sizeof(struct dml2_instance); +} + +bool dml2_initialize_instance(struct dml2_initialize_instance_in_out *in_out) +{ + switch (in_out->options.project_id) { + case dml2_project_dcn4x_stage1: + return false; + case dml2_project_dcn4x_stage2: + case dml2_project_dcn4x_stage2_auto_drr_svp: + return dml2_top_soc15_initialize_instance(in_out); + case dml2_project_invalid: + default: + return false; + } +} + +bool dml2_check_mode_supported(struct dml2_check_mode_supported_in_out *in_out) +{ + if (!in_out->dml2_instance->funcs.check_mode_supported) + return false; + + return in_out->dml2_instance->funcs.check_mode_supported(in_out); +} + +bool dml2_build_mode_programming(struct dml2_build_mode_programming_in_out *in_out) +{ + if (!in_out->dml2_instance->funcs.build_mode_programming) + return false; + + return in_out->dml2_instance->funcs.build_mode_programming(in_out); +} + +bool dml2_build_mcache_programming(struct dml2_build_mcache_programming_in_out *in_out) +{ + if (!in_out->dml2_instance->funcs.build_mcache_programming) + return false; + + return in_out->dml2_instance->funcs.build_mcache_programming(in_out); +} + diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.c new file mode 100644 index 0000000000000000000000000000000000000000..db0a30fdb58d720696d154096523d5834ddd19c7 --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.c @@ -0,0 +1,4 @@ +// SPDX-License-Identifier: MIT +// +// Copyright 2024 Advanced Micro Devices, Inc. + diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.h new file mode 100644 index 0000000000000000000000000000000000000000..14d0ae03dce6d5a27fde164025705f66beb2e98c --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_legacy.h @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: MIT +// +// Copyright 2024 Advanced Micro Devices, Inc. + +#ifndef __DML2_TOP_LEGACY_H__ +#define __DML2_TOP_LEGACY_H__ +#include "dml2_internal_shared_types.h" +bool dml2_top_legacy_initialize_instance(struct dml2_initialize_instance_in_out *in_out); +#endif /* __DML2_TOP_LEGACY_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.c deleted file mode 100644 index d0e026d981b503eada1e74f6de3fc43f5d7529fa..0000000000000000000000000000000000000000 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.c +++ /dev/null @@ -1,307 +0,0 @@ -// SPDX-License-Identifier: MIT -// -// Copyright 2024 Advanced Micro Devices, Inc. - -#include "dml2_top_optimization.h" -#include "dml2_internal_shared_types.h" -#include "dml_top_mcache.h" - -static void copy_display_configuration_with_meta(struct display_configuation_with_meta *dst, const struct display_configuation_with_meta *src) -{ - memcpy(dst, src, sizeof(struct display_configuation_with_meta)); -} - -bool dml2_top_optimization_init_function_min_clk_for_latency(const struct optimization_init_function_params *params) -{ - struct dml2_optimization_stage1_state *state = ¶ms->display_config->stage1; - - state->performed = true; - - return true; -} - -bool dml2_top_optimization_test_function_min_clk_for_latency(const struct optimization_test_function_params *params) -{ - struct dml2_optimization_stage1_state *state = ¶ms->display_config->stage1; - - return state->min_clk_index_for_latency == 0; -} - -bool dml2_top_optimization_optimize_function_min_clk_for_latency(const struct optimization_optimize_function_params *params) -{ - bool result = false; - - if (params->display_config->stage1.min_clk_index_for_latency > 0) { - copy_display_configuration_with_meta(params->optimized_display_config, params->display_config); - params->optimized_display_config->stage1.min_clk_index_for_latency--; - result = true; - } - - return result; -} - -bool dml2_top_optimization_test_function_mcache(const struct optimization_test_function_params *params) -{ - struct dml2_optimization_test_function_locals *l = params->locals; - bool mcache_success = false; - bool result = false; - - memset(l, 0, sizeof(struct dml2_optimization_test_function_locals)); - - l->test_mcache.calc_mcache_count_params.dml2_instance = params->dml; - l->test_mcache.calc_mcache_count_params.display_config = ¶ms->display_config->display_config; - l->test_mcache.calc_mcache_count_params.mcache_allocations = params->display_config->stage2.mcache_allocations; - - result = dml2_top_mcache_calc_mcache_count_and_offsets(&l->test_mcache.calc_mcache_count_params); // use core to get the basic mcache_allocations - - if (result) { - l->test_mcache.assign_global_mcache_ids_params.allocations = params->display_config->stage2.mcache_allocations; - l->test_mcache.assign_global_mcache_ids_params.num_allocations = params->display_config->display_config.num_planes; - - dml2_top_mcache_assign_global_mcache_ids(&l->test_mcache.assign_global_mcache_ids_params); - - l->test_mcache.validate_admissibility_params.dml2_instance = params->dml; - l->test_mcache.validate_admissibility_params.display_cfg = ¶ms->display_config->display_config; - l->test_mcache.validate_admissibility_params.mcache_allocations = params->display_config->stage2.mcache_allocations; - l->test_mcache.validate_admissibility_params.cfg_support_info = ¶ms->display_config->mode_support_result.cfg_support_info; - - mcache_success = dml2_top_mcache_validate_admissability(&l->test_mcache.validate_admissibility_params); // also find the shift to make mcache allocation works - - memcpy(params->display_config->stage2.per_plane_mcache_support, l->test_mcache.validate_admissibility_params.per_plane_status, sizeof(bool) * DML2_MAX_PLANES); - } - - return mcache_success; -} - -bool dml2_top_optimization_optimize_function_mcache(const struct optimization_optimize_function_params *params) -{ - struct dml2_optimization_optimize_function_locals *l = params->locals; - bool optimize_success = false; - - if (params->last_candidate_supported == false) - return false; - - copy_display_configuration_with_meta(params->optimized_display_config, params->display_config); - - l->optimize_mcache.optimize_mcache_params.instance = ¶ms->dml->pmo_instance; - l->optimize_mcache.optimize_mcache_params.dcc_mcache_supported = params->display_config->stage2.per_plane_mcache_support; - l->optimize_mcache.optimize_mcache_params.display_config = ¶ms->display_config->display_config; - l->optimize_mcache.optimize_mcache_params.optimized_display_cfg = ¶ms->optimized_display_config->display_config; - l->optimize_mcache.optimize_mcache_params.cfg_support_info = ¶ms->optimized_display_config->mode_support_result.cfg_support_info; - - optimize_success = params->dml->pmo_instance.optimize_dcc_mcache(&l->optimize_mcache.optimize_mcache_params); - - return optimize_success; -} - -bool dml2_top_optimization_init_function_vmin(const struct optimization_init_function_params *params) -{ - struct dml2_optimization_init_function_locals *l = params->locals; - - l->vmin.init_params.instance = ¶ms->dml->pmo_instance; - l->vmin.init_params.base_display_config = params->display_config; - return params->dml->pmo_instance.init_for_vmin(&l->vmin.init_params); -} - -bool dml2_top_optimization_test_function_vmin(const struct optimization_test_function_params *params) -{ - struct dml2_optimization_test_function_locals *l = params->locals; - - l->test_vmin.pmo_test_vmin_params.instance = ¶ms->dml->pmo_instance; - l->test_vmin.pmo_test_vmin_params.display_config = params->display_config; - l->test_vmin.pmo_test_vmin_params.vmin_limits = ¶ms->dml->soc_bbox.vmin_limit; - return params->dml->pmo_instance.test_for_vmin(&l->test_vmin.pmo_test_vmin_params); -} - -bool dml2_top_optimization_optimize_function_vmin(const struct optimization_optimize_function_params *params) -{ - struct dml2_optimization_optimize_function_locals *l = params->locals; - - if (params->last_candidate_supported == false) - return false; - - l->optimize_vmin.pmo_optimize_vmin_params.instance = ¶ms->dml->pmo_instance; - l->optimize_vmin.pmo_optimize_vmin_params.base_display_config = params->display_config; - l->optimize_vmin.pmo_optimize_vmin_params.optimized_display_config = params->optimized_display_config; - return params->dml->pmo_instance.optimize_for_vmin(&l->optimize_vmin.pmo_optimize_vmin_params); -} - -bool dml2_top_optimization_perform_optimization_phase(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params) -{ - bool test_passed = false; - bool optimize_succeeded = true; - bool candidate_validation_passed = true; - struct optimization_init_function_params init_params = { 0 }; - struct optimization_test_function_params test_params = { 0 }; - struct optimization_optimize_function_params optimize_params = { 0 }; - - if (!params->dml || - !params->optimize_function || - !params->test_function || - !params->display_config || - !params->optimized_display_config) - return false; - - copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, params->display_config); - - init_params.locals = &l->init_function_locals; - init_params.dml = params->dml; - init_params.display_config = &l->cur_candidate_display_cfg; - - if (params->init_function && !params->init_function(&init_params)) - return false; - - test_params.locals = &l->test_function_locals; - test_params.dml = params->dml; - test_params.display_config = &l->cur_candidate_display_cfg; - - test_passed = params->test_function(&test_params); - - while (!test_passed && optimize_succeeded) { - memset(&optimize_params, 0, sizeof(struct optimization_optimize_function_params)); - - optimize_params.locals = &l->optimize_function_locals; - optimize_params.dml = params->dml; - optimize_params.display_config = &l->cur_candidate_display_cfg; - optimize_params.optimized_display_config = &l->next_candidate_display_cfg; - optimize_params.last_candidate_supported = candidate_validation_passed; - - optimize_succeeded = params->optimize_function(&optimize_params); - - if (optimize_succeeded) { - l->mode_support_params.instance = ¶ms->dml->core_instance; - l->mode_support_params.display_cfg = &l->next_candidate_display_cfg; - l->mode_support_params.min_clk_table = ¶ms->dml->min_clk_table; - - if (l->next_candidate_display_cfg.stage3.performed) - l->mode_support_params.min_clk_index = l->next_candidate_display_cfg.stage3.min_clk_index_for_latency; - else - l->mode_support_params.min_clk_index = l->next_candidate_display_cfg.stage1.min_clk_index_for_latency; - - candidate_validation_passed = params->dml->core_instance.mode_support(&l->mode_support_params); - - l->next_candidate_display_cfg.mode_support_result = l->mode_support_params.mode_support_result; - } - - if (optimize_succeeded && candidate_validation_passed) { - memset(&test_params, 0, sizeof(struct optimization_test_function_params)); - test_params.locals = &l->test_function_locals; - test_params.dml = params->dml; - test_params.display_config = &l->next_candidate_display_cfg; - test_passed = params->test_function(&test_params); - - copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, &l->next_candidate_display_cfg); - - // If optimization is not all or nothing, then store partial progress in output - if (!params->all_or_nothing) - copy_display_configuration_with_meta(params->optimized_display_config, &l->next_candidate_display_cfg); - } - } - - if (test_passed) - copy_display_configuration_with_meta(params->optimized_display_config, &l->cur_candidate_display_cfg); - - return test_passed; -} - -bool dml2_top_optimization_perform_optimization_phase_1(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params) -{ - int highest_state, lowest_state, cur_state; - bool supported = false; - - if (!params->dml || - !params->optimize_function || - !params->test_function || - !params->display_config || - !params->optimized_display_config) - return false; - - copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, params->display_config); - highest_state = l->cur_candidate_display_cfg.stage1.min_clk_index_for_latency; - lowest_state = 0; - - while (highest_state > lowest_state) { - cur_state = (highest_state + lowest_state) / 2; - - l->mode_support_params.instance = ¶ms->dml->core_instance; - l->mode_support_params.display_cfg = &l->cur_candidate_display_cfg; - l->mode_support_params.min_clk_table = ¶ms->dml->min_clk_table; - l->mode_support_params.min_clk_index = cur_state; - - supported = params->dml->core_instance.mode_support(&l->mode_support_params); - - if (supported) { - l->cur_candidate_display_cfg.mode_support_result = l->mode_support_params.mode_support_result; - highest_state = cur_state; - } else { - lowest_state = cur_state + 1; - } - } - l->cur_candidate_display_cfg.stage1.min_clk_index_for_latency = lowest_state; - - copy_display_configuration_with_meta(params->optimized_display_config, &l->cur_candidate_display_cfg); - - return true; -} - -bool dml2_top_optimization_init_function_uclk_pstate(const struct optimization_init_function_params *params) -{ - struct dml2_optimization_init_function_locals *l = params->locals; - - l->uclk_pstate.init_params.instance = ¶ms->dml->pmo_instance; - l->uclk_pstate.init_params.base_display_config = params->display_config; - - return params->dml->pmo_instance.init_for_uclk_pstate(&l->uclk_pstate.init_params); -} - -bool dml2_top_optimization_test_function_uclk_pstate(const struct optimization_test_function_params *params) -{ - struct dml2_optimization_test_function_locals *l = params->locals; - - l->uclk_pstate.test_params.instance = ¶ms->dml->pmo_instance; - l->uclk_pstate.test_params.base_display_config = params->display_config; - - return params->dml->pmo_instance.test_for_uclk_pstate(&l->uclk_pstate.test_params); -} - -bool dml2_top_optimization_optimize_function_uclk_pstate(const struct optimization_optimize_function_params *params) -{ - struct dml2_optimization_optimize_function_locals *l = params->locals; - - l->uclk_pstate.optimize_params.instance = ¶ms->dml->pmo_instance; - l->uclk_pstate.optimize_params.base_display_config = params->display_config; - l->uclk_pstate.optimize_params.optimized_display_config = params->optimized_display_config; - l->uclk_pstate.optimize_params.last_candidate_failed = !params->last_candidate_supported; - - return params->dml->pmo_instance.optimize_for_uclk_pstate(&l->uclk_pstate.optimize_params); -} - -bool dml2_top_optimization_init_function_stutter(const struct optimization_init_function_params *params) -{ - struct dml2_optimization_init_function_locals *l = params->locals; - - l->uclk_pstate.init_params.instance = ¶ms->dml->pmo_instance; - l->uclk_pstate.init_params.base_display_config = params->display_config; - - return params->dml->pmo_instance.init_for_stutter(&l->stutter.stutter_params); -} - -bool dml2_top_optimization_test_function_stutter(const struct optimization_test_function_params *params) -{ - struct dml2_optimization_test_function_locals *l = params->locals; - - l->stutter.stutter_params.instance = ¶ms->dml->pmo_instance; - l->stutter.stutter_params.base_display_config = params->display_config; - return params->dml->pmo_instance.test_for_stutter(&l->stutter.stutter_params); -} - -bool dml2_top_optimization_optimize_function_stutter(const struct optimization_optimize_function_params *params) -{ - struct dml2_optimization_optimize_function_locals *l = params->locals; - - l->stutter.stutter_params.instance = ¶ms->dml->pmo_instance; - l->stutter.stutter_params.base_display_config = params->display_config; - l->stutter.stutter_params.optimized_display_config = params->optimized_display_config; - l->stutter.stutter_params.last_candidate_failed = !params->last_candidate_supported; - return params->dml->pmo_instance.optimize_for_stutter(&l->stutter.stutter_params); -} diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.h deleted file mode 100644 index 9f22ab33eab126d80efe1ee8611bc922f78f47c1..0000000000000000000000000000000000000000 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_optimization.h +++ /dev/null @@ -1,33 +0,0 @@ -// SPDX-License-Identifier: MIT -// -// Copyright 2024 Advanced Micro Devices, Inc. - -#ifndef __DML2_TOP_OPTIMIZATION_H__ -#define __DML2_TOP_OPTIMIZATION_H__ - -#include "dml2_external_lib_deps.h" -#include "dml2_internal_shared_types.h" - -bool dml2_top_optimization_perform_optimization_phase(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params); -bool dml2_top_optimization_perform_optimization_phase_1(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params); - -bool dml2_top_optimization_init_function_min_clk_for_latency(const struct optimization_init_function_params *params); -bool dml2_top_optimization_test_function_min_clk_for_latency(const struct optimization_test_function_params *params); -bool dml2_top_optimization_optimize_function_min_clk_for_latency(const struct optimization_optimize_function_params *params); - -bool dml2_top_optimization_test_function_mcache(const struct optimization_test_function_params *params); -bool dml2_top_optimization_optimize_function_mcache(const struct optimization_optimize_function_params *params); - -bool dml2_top_optimization_init_function_uclk_pstate(const struct optimization_init_function_params *params); -bool dml2_top_optimization_test_function_uclk_pstate(const struct optimization_test_function_params *params); -bool dml2_top_optimization_optimize_function_uclk_pstate(const struct optimization_optimize_function_params *params); - -bool dml2_top_optimization_init_function_vmin(const struct optimization_init_function_params *params); -bool dml2_top_optimization_test_function_vmin(const struct optimization_test_function_params *params); -bool dml2_top_optimization_optimize_function_vmin(const struct optimization_optimize_function_params *params); - -bool dml2_top_optimization_init_function_stutter(const struct optimization_init_function_params *params); -bool dml2_top_optimization_test_function_stutter(const struct optimization_test_function_params *params); -bool dml2_top_optimization_optimize_function_stutter(const struct optimization_optimize_function_params *params); - -#endif diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.c new file mode 100644 index 0000000000000000000000000000000000000000..b39029c0e56f1db6ec265039564f533991193420 --- /dev/null +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.c @@ -0,0 +1,1177 @@ +// SPDX-License-Identifier: MIT +// +// Copyright 2024 Advanced Micro Devices, Inc. + +#include "dml2_top_soc15.h" +#include "dml2_mcg_factory.h" +#include "dml2_dpmm_factory.h" +#include "dml2_core_factory.h" +#include "dml2_pmo_factory.h" +#include "lib_float_math.h" +#include "dml2_debug.h" +static void setup_unoptimized_display_config_with_meta(const struct dml2_instance *dml, struct display_configuation_with_meta *out, const struct dml2_display_cfg *display_config) +{ + memcpy(&out->display_config, display_config, sizeof(struct dml2_display_cfg)); + out->stage1.min_clk_index_for_latency = dml->min_clk_table.dram_bw_table.num_entries - 1; //dml->min_clk_table.clean_me_up.soc_bb.num_states - 1; +} + +static void setup_speculative_display_config_with_meta(const struct dml2_instance *dml, struct display_configuation_with_meta *out, const struct dml2_display_cfg *display_config) +{ + memcpy(&out->display_config, display_config, sizeof(struct dml2_display_cfg)); + out->stage1.min_clk_index_for_latency = 0; +} + +static void copy_display_configuration_with_meta(struct display_configuation_with_meta *dst, const struct display_configuation_with_meta *src) +{ + memcpy(dst, src, sizeof(struct display_configuation_with_meta)); +} + +static bool dml2_top_optimization_init_function_min_clk_for_latency(const struct optimization_init_function_params *params) +{ + struct dml2_optimization_stage1_state *state = ¶ms->display_config->stage1; + + state->performed = true; + + return true; +} + +static bool dml2_top_optimization_test_function_min_clk_for_latency(const struct optimization_test_function_params *params) +{ + struct dml2_optimization_stage1_state *state = ¶ms->display_config->stage1; + + return state->min_clk_index_for_latency == 0; +} + +static bool dml2_top_optimization_optimize_function_min_clk_for_latency(const struct optimization_optimize_function_params *params) +{ + bool result = false; + + if (params->display_config->stage1.min_clk_index_for_latency > 0) { + copy_display_configuration_with_meta(params->optimized_display_config, params->display_config); + params->optimized_display_config->stage1.min_clk_index_for_latency--; + result = true; + } + + return result; +} + +static bool dml2_top_optimization_test_function_mcache(const struct optimization_test_function_params *params) +{ + struct dml2_optimization_test_function_locals *l = params->locals; + bool mcache_success = false; + bool result = false; + + memset(l, 0, sizeof(struct dml2_optimization_test_function_locals)); + + l->test_mcache.calc_mcache_count_params.dml2_instance = params->dml; + l->test_mcache.calc_mcache_count_params.display_config = ¶ms->display_config->display_config; + l->test_mcache.calc_mcache_count_params.mcache_allocations = params->display_config->stage2.mcache_allocations; + + result = dml2_top_mcache_calc_mcache_count_and_offsets(&l->test_mcache.calc_mcache_count_params); // use core to get the basic mcache_allocations + + if (result) { + l->test_mcache.assign_global_mcache_ids_params.allocations = params->display_config->stage2.mcache_allocations; + l->test_mcache.assign_global_mcache_ids_params.num_allocations = params->display_config->display_config.num_planes; + + dml2_top_mcache_assign_global_mcache_ids(&l->test_mcache.assign_global_mcache_ids_params); + + l->test_mcache.validate_admissibility_params.dml2_instance = params->dml; + l->test_mcache.validate_admissibility_params.display_cfg = ¶ms->display_config->display_config; + l->test_mcache.validate_admissibility_params.mcache_allocations = params->display_config->stage2.mcache_allocations; + l->test_mcache.validate_admissibility_params.cfg_support_info = ¶ms->display_config->mode_support_result.cfg_support_info; + + mcache_success = dml2_top_mcache_validate_admissability(&l->test_mcache.validate_admissibility_params); // also find the shift to make mcache allocation works + + memcpy(params->display_config->stage2.per_plane_mcache_support, l->test_mcache.validate_admissibility_params.per_plane_status, sizeof(bool) * DML2_MAX_PLANES); + } + + return mcache_success; +} + +static bool dml2_top_optimization_optimize_function_mcache(const struct optimization_optimize_function_params *params) +{ + struct dml2_optimization_optimize_function_locals *l = params->locals; + bool optimize_success = false; + + if (params->last_candidate_supported == false) + return false; + + copy_display_configuration_with_meta(params->optimized_display_config, params->display_config); + + l->optimize_mcache.optimize_mcache_params.instance = ¶ms->dml->pmo_instance; + l->optimize_mcache.optimize_mcache_params.dcc_mcache_supported = params->display_config->stage2.per_plane_mcache_support; + l->optimize_mcache.optimize_mcache_params.display_config = ¶ms->display_config->display_config; + l->optimize_mcache.optimize_mcache_params.optimized_display_cfg = ¶ms->optimized_display_config->display_config; + l->optimize_mcache.optimize_mcache_params.cfg_support_info = ¶ms->optimized_display_config->mode_support_result.cfg_support_info; + + optimize_success = params->dml->pmo_instance.optimize_dcc_mcache(&l->optimize_mcache.optimize_mcache_params); + + return optimize_success; +} + +static bool dml2_top_optimization_init_function_vmin(const struct optimization_init_function_params *params) +{ + struct dml2_optimization_init_function_locals *l = params->locals; + + l->vmin.init_params.instance = ¶ms->dml->pmo_instance; + l->vmin.init_params.base_display_config = params->display_config; + return params->dml->pmo_instance.init_for_vmin(&l->vmin.init_params); +} + +static bool dml2_top_optimization_test_function_vmin(const struct optimization_test_function_params *params) +{ + struct dml2_optimization_test_function_locals *l = params->locals; + + l->test_vmin.pmo_test_vmin_params.instance = ¶ms->dml->pmo_instance; + l->test_vmin.pmo_test_vmin_params.display_config = params->display_config; + l->test_vmin.pmo_test_vmin_params.vmin_limits = ¶ms->dml->soc_bbox.vmin_limit; + return params->dml->pmo_instance.test_for_vmin(&l->test_vmin.pmo_test_vmin_params); +} + +static bool dml2_top_optimization_optimize_function_vmin(const struct optimization_optimize_function_params *params) +{ + struct dml2_optimization_optimize_function_locals *l = params->locals; + + if (params->last_candidate_supported == false) + return false; + + l->optimize_vmin.pmo_optimize_vmin_params.instance = ¶ms->dml->pmo_instance; + l->optimize_vmin.pmo_optimize_vmin_params.base_display_config = params->display_config; + l->optimize_vmin.pmo_optimize_vmin_params.optimized_display_config = params->optimized_display_config; + return params->dml->pmo_instance.optimize_for_vmin(&l->optimize_vmin.pmo_optimize_vmin_params); +} + +static bool dml2_top_optimization_init_function_uclk_pstate(const struct optimization_init_function_params *params) +{ + struct dml2_optimization_init_function_locals *l = params->locals; + + l->uclk_pstate.init_params.instance = ¶ms->dml->pmo_instance; + l->uclk_pstate.init_params.base_display_config = params->display_config; + + return params->dml->pmo_instance.init_for_uclk_pstate(&l->uclk_pstate.init_params); +} + +static bool dml2_top_optimization_test_function_uclk_pstate(const struct optimization_test_function_params *params) +{ + struct dml2_optimization_test_function_locals *l = params->locals; + + l->uclk_pstate.test_params.instance = ¶ms->dml->pmo_instance; + l->uclk_pstate.test_params.base_display_config = params->display_config; + + return params->dml->pmo_instance.test_for_uclk_pstate(&l->uclk_pstate.test_params); +} + +static bool dml2_top_optimization_optimize_function_uclk_pstate(const struct optimization_optimize_function_params *params) +{ + struct dml2_optimization_optimize_function_locals *l = params->locals; + + l->uclk_pstate.optimize_params.instance = ¶ms->dml->pmo_instance; + l->uclk_pstate.optimize_params.base_display_config = params->display_config; + l->uclk_pstate.optimize_params.optimized_display_config = params->optimized_display_config; + l->uclk_pstate.optimize_params.last_candidate_failed = !params->last_candidate_supported; + + return params->dml->pmo_instance.optimize_for_uclk_pstate(&l->uclk_pstate.optimize_params); +} + +static bool dml2_top_optimization_init_function_stutter(const struct optimization_init_function_params *params) +{ + struct dml2_optimization_init_function_locals *l = params->locals; + + l->uclk_pstate.init_params.instance = ¶ms->dml->pmo_instance; + l->uclk_pstate.init_params.base_display_config = params->display_config; + + return params->dml->pmo_instance.init_for_stutter(&l->stutter.stutter_params); +} + +static bool dml2_top_optimization_test_function_stutter(const struct optimization_test_function_params *params) +{ + struct dml2_optimization_test_function_locals *l = params->locals; + + l->stutter.stutter_params.instance = ¶ms->dml->pmo_instance; + l->stutter.stutter_params.base_display_config = params->display_config; + return params->dml->pmo_instance.test_for_stutter(&l->stutter.stutter_params); +} + +static bool dml2_top_optimization_optimize_function_stutter(const struct optimization_optimize_function_params *params) +{ + struct dml2_optimization_optimize_function_locals *l = params->locals; + + l->stutter.stutter_params.instance = ¶ms->dml->pmo_instance; + l->stutter.stutter_params.base_display_config = params->display_config; + l->stutter.stutter_params.optimized_display_config = params->optimized_display_config; + l->stutter.stutter_params.last_candidate_failed = !params->last_candidate_supported; + return params->dml->pmo_instance.optimize_for_stutter(&l->stutter.stutter_params); +} + +static bool dml2_top_optimization_perform_optimization_phase(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params) +{ + bool test_passed = false; + bool optimize_succeeded = true; + bool candidate_validation_passed = true; + struct optimization_init_function_params init_params = { 0 }; + struct optimization_test_function_params test_params = { 0 }; + struct optimization_optimize_function_params optimize_params = { 0 }; + + if (!params->dml || + !params->optimize_function || + !params->test_function || + !params->display_config || + !params->optimized_display_config) + return false; + + copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, params->display_config); + + init_params.locals = &l->init_function_locals; + init_params.dml = params->dml; + init_params.display_config = &l->cur_candidate_display_cfg; + + if (params->init_function && !params->init_function(&init_params)) + return false; + + test_params.locals = &l->test_function_locals; + test_params.dml = params->dml; + test_params.display_config = &l->cur_candidate_display_cfg; + + test_passed = params->test_function(&test_params); + + while (!test_passed && optimize_succeeded) { + memset(&optimize_params, 0, sizeof(struct optimization_optimize_function_params)); + + optimize_params.locals = &l->optimize_function_locals; + optimize_params.dml = params->dml; + optimize_params.display_config = &l->cur_candidate_display_cfg; + optimize_params.optimized_display_config = &l->next_candidate_display_cfg; + optimize_params.last_candidate_supported = candidate_validation_passed; + + optimize_succeeded = params->optimize_function(&optimize_params); + + if (optimize_succeeded) { + l->mode_support_params.instance = ¶ms->dml->core_instance; + l->mode_support_params.display_cfg = &l->next_candidate_display_cfg; + l->mode_support_params.min_clk_table = ¶ms->dml->min_clk_table; + + if (l->next_candidate_display_cfg.stage3.performed) + l->mode_support_params.min_clk_index = l->next_candidate_display_cfg.stage3.min_clk_index_for_latency; + else + l->mode_support_params.min_clk_index = l->next_candidate_display_cfg.stage1.min_clk_index_for_latency; + candidate_validation_passed = params->dml->core_instance.mode_support(&l->mode_support_params); + l->next_candidate_display_cfg.mode_support_result = l->mode_support_params.mode_support_result; + } + + if (optimize_succeeded && candidate_validation_passed) { + memset(&test_params, 0, sizeof(struct optimization_test_function_params)); + test_params.locals = &l->test_function_locals; + test_params.dml = params->dml; + test_params.display_config = &l->next_candidate_display_cfg; + test_passed = params->test_function(&test_params); + + copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, &l->next_candidate_display_cfg); + + // If optimization is not all or nothing, then store partial progress in output + if (!params->all_or_nothing) + copy_display_configuration_with_meta(params->optimized_display_config, &l->next_candidate_display_cfg); + } + } + + if (test_passed) + copy_display_configuration_with_meta(params->optimized_display_config, &l->cur_candidate_display_cfg); + + return test_passed; +} + +static bool dml2_top_optimization_perform_optimization_phase_1(struct dml2_optimization_phase_locals *l, const struct optimization_phase_params *params) +{ + int highest_state, lowest_state, cur_state; + bool supported = false; + + if (!params->dml || + !params->optimize_function || + !params->test_function || + !params->display_config || + !params->optimized_display_config) + return false; + + copy_display_configuration_with_meta(&l->cur_candidate_display_cfg, params->display_config); + highest_state = l->cur_candidate_display_cfg.stage1.min_clk_index_for_latency; + lowest_state = 0; + + while (highest_state > lowest_state) { + cur_state = (highest_state + lowest_state) / 2; + + l->mode_support_params.instance = ¶ms->dml->core_instance; + l->mode_support_params.display_cfg = &l->cur_candidate_display_cfg; + l->mode_support_params.min_clk_table = ¶ms->dml->min_clk_table; + l->mode_support_params.min_clk_index = cur_state; + supported = params->dml->core_instance.mode_support(&l->mode_support_params); + + if (supported) { + l->cur_candidate_display_cfg.mode_support_result = l->mode_support_params.mode_support_result; + highest_state = cur_state; + } else { + lowest_state = cur_state + 1; + } + } + l->cur_candidate_display_cfg.stage1.min_clk_index_for_latency = lowest_state; + + copy_display_configuration_with_meta(params->optimized_display_config, &l->cur_candidate_display_cfg); + + return true; +} + +/* +* Takes an input set of mcache boundaries and finds the appropriate setting of cache programming. +* Returns true if a valid set of programming can be made, and false otherwise. "Valid" means +* that the horizontal viewport does not span more than 2 cache slices. +* +* It optionally also can apply a constant shift to all the cache boundaries. +*/ +static const uint32_t MCACHE_ID_UNASSIGNED = 0xF; +static const uint32_t SPLIT_LOCATION_UNDEFINED = 0xFFFF; + +static bool calculate_first_second_splitting(const int *mcache_boundaries, int num_boundaries, int shift, + int pipe_h_vp_start, int pipe_h_vp_end, int *first_offset, int *second_offset) +{ + const int MAX_VP = 0xFFFFFF; + int left_cache_id; + int right_cache_id; + int range_start; + int range_end; + bool success = false; + + if (num_boundaries <= 1) { + if (first_offset && second_offset) { + *first_offset = 0; + *second_offset = -1; + } + success = true; + return success; + } else { + range_start = 0; + for (left_cache_id = 0; left_cache_id < num_boundaries; left_cache_id++) { + range_end = mcache_boundaries[left_cache_id] - shift - 1; + + if (range_start <= pipe_h_vp_start && pipe_h_vp_start <= range_end) + break; + + range_start = range_end + 1; + } + + range_end = MAX_VP; + for (right_cache_id = num_boundaries - 1; right_cache_id >= -1; right_cache_id--) { + if (right_cache_id >= 0) + range_start = mcache_boundaries[right_cache_id] - shift; + else + range_start = 0; + + if (range_start <= pipe_h_vp_end && pipe_h_vp_end <= range_end) { + break; + } + range_end = range_start - 1; + } + right_cache_id = (right_cache_id + 1) % num_boundaries; + + if (right_cache_id == left_cache_id) { + if (first_offset && second_offset) { + *first_offset = left_cache_id; + *second_offset = -1; + } + success = true; + } else if (right_cache_id == (left_cache_id + 1) % num_boundaries) { + if (first_offset && second_offset) { + *first_offset = left_cache_id; + *second_offset = right_cache_id; + } + success = true; + } + } + + return success; +} + +/* +* For a given set of pipe start/end x positions, checks to see it can support the input mcache splitting. +* It also attempts to "optimize" by finding a shift if the default 0 shift does not work. +*/ +static bool find_shift_for_valid_cache_id_assignment(int *mcache_boundaries, unsigned int num_boundaries, + int *pipe_vp_startx, int *pipe_vp_endx, unsigned int pipe_count, int shift_granularity, int *shift) +{ + int max_shift = 0xFFFF; + unsigned int pipe_index; + unsigned int i, slice_width; + bool success = false; + + for (i = 0; i < num_boundaries; i++) { + if (i == 0) + slice_width = mcache_boundaries[i]; + else + slice_width = mcache_boundaries[i] - mcache_boundaries[i - 1]; + + if (max_shift > (int)slice_width) { + max_shift = slice_width; + } + } + + for (*shift = 0; *shift <= max_shift; *shift += shift_granularity) { + success = true; + for (pipe_index = 0; pipe_index < pipe_count; pipe_index++) { + if (!calculate_first_second_splitting(mcache_boundaries, num_boundaries, *shift, + pipe_vp_startx[pipe_index], pipe_vp_endx[pipe_index], 0, 0)) { + success = false; + break; + } + } + if (success) + break; + } + + return success; +} + +/* +* Counts the number of elements inside input array within the given span length. +* Formally, what is the size of the largest subset of the array where the largest and smallest element +* differ no more than the span. +*/ +static unsigned int count_elements_in_span(int *array, unsigned int array_size, unsigned int span) +{ + unsigned int i; + unsigned int span_start_value; + unsigned int span_start_index; + unsigned int greatest_element_count; + + if (array_size == 0) + return 1; + + if (span == 0) + return array_size > 0 ? 1 : 0; + + span_start_value = 0; + span_start_index = 0; + greatest_element_count = 0; + + while (span_start_index < array_size) { + for (i = span_start_index; i < array_size; i++) { + if (array[i] - span_start_value <= span) { + if (i - span_start_index + 1 > greatest_element_count) { + greatest_element_count = i - span_start_index + 1; + } + } else + break; + } + + span_start_index++; + + if (span_start_index < array_size) { + span_start_value = array[span_start_index - 1] + 1; + } + } + + return greatest_element_count; +} + +static bool calculate_h_split_for_scaling_transform(int full_vp_width, int h_active, int num_pipes, + enum dml2_scaling_transform scaling_transform, int *pipe_vp_x_start, int *pipe_vp_x_end) +{ + int i, slice_width; + const char MAX_SCL_VP_OVERLAP = 3; + bool success = false; + + switch (scaling_transform) { + case dml2_scaling_transform_centered: + case dml2_scaling_transform_aspect_ratio: + case dml2_scaling_transform_fullscreen: + slice_width = full_vp_width / num_pipes; + for (i = 0; i < num_pipes; i++) { + pipe_vp_x_start[i] = i * slice_width; + pipe_vp_x_end[i] = (i + 1) * slice_width - 1; + + if (pipe_vp_x_start[i] < MAX_SCL_VP_OVERLAP) + pipe_vp_x_start[i] = 0; + else + pipe_vp_x_start[i] -= MAX_SCL_VP_OVERLAP; + + if (pipe_vp_x_end[i] > full_vp_width - MAX_SCL_VP_OVERLAP - 1) + pipe_vp_x_end[i] = full_vp_width - 1; + else + pipe_vp_x_end[i] += MAX_SCL_VP_OVERLAP; + } + break; + case dml2_scaling_transform_explicit: + default: + success = false; + break; + } + + return success; +} + +bool dml2_top_mcache_validate_admissability(struct top_mcache_validate_admissability_in_out *params) +{ + struct dml2_instance *dml = (struct dml2_instance *)params->dml2_instance; + struct dml2_top_mcache_validate_admissability_locals *l = &dml->scratch.mcache_validate_admissability_locals; + + const int MAX_PIXEL_OVERLAP = 6; + int max_per_pipe_vp_p0 = 0; + int max_per_pipe_vp_p1 = 0; + int temp, p0shift, p1shift; + unsigned int plane_index = 0; + unsigned int i; + unsigned int odm_combine_factor; + unsigned int mpc_combine_factor; + unsigned int num_dpps; + unsigned int num_boundaries; + enum dml2_scaling_transform scaling_transform; + const struct dml2_plane_parameters *plane; + const struct dml2_stream_parameters *stream; + + bool p0pass = false; + bool p1pass = false; + bool all_pass = true; + + for (plane_index = 0; plane_index < params->display_cfg->num_planes; plane_index++) { + if (!params->display_cfg->plane_descriptors[plane_index].surface.dcc.enable) + continue; + + plane = ¶ms->display_cfg->plane_descriptors[plane_index]; + stream = ¶ms->display_cfg->stream_descriptors[plane->stream_index]; + + num_dpps = odm_combine_factor = params->cfg_support_info->stream_support_info[plane->stream_index].odms_used; + + if (odm_combine_factor == 1) + num_dpps = mpc_combine_factor = (unsigned int)params->cfg_support_info->plane_support_info[plane_index].dpps_used; + else + mpc_combine_factor = 1; + + if (odm_combine_factor > 1) { + max_per_pipe_vp_p0 = plane->surface.plane0.width; + temp = (unsigned int)math_ceil(plane->composition.scaler_info.plane0.h_ratio * stream->timing.h_active / odm_combine_factor); + if (temp < max_per_pipe_vp_p0) + max_per_pipe_vp_p0 = temp; + + max_per_pipe_vp_p1 = plane->surface.plane1.width; + temp = (unsigned int)math_ceil(plane->composition.scaler_info.plane1.h_ratio * stream->timing.h_active / odm_combine_factor); + + if (temp < max_per_pipe_vp_p1) + max_per_pipe_vp_p1 = temp; + } else { + max_per_pipe_vp_p0 = plane->surface.plane0.width / mpc_combine_factor; + max_per_pipe_vp_p1 = plane->surface.plane1.width / mpc_combine_factor; + } + + max_per_pipe_vp_p0 += 2 * MAX_PIXEL_OVERLAP; + max_per_pipe_vp_p1 += MAX_PIXEL_OVERLAP; + + p0shift = 0; + p1shift = 0; + + // The last element in the unshifted boundary array will always be the first pixel outside the + // plane, which means theres no mcache associated with it, so -1 + num_boundaries = params->mcache_allocations[plane_index].num_mcaches_plane0 == 0 ? 0 : params->mcache_allocations[plane_index].num_mcaches_plane0 - 1; + if ((count_elements_in_span(params->mcache_allocations[plane_index].mcache_x_offsets_plane0, + num_boundaries, max_per_pipe_vp_p0) <= 1) && (num_boundaries <= num_dpps)) { + p0pass = true; + } + num_boundaries = params->mcache_allocations[plane_index].num_mcaches_plane1 == 0 ? 0 : params->mcache_allocations[plane_index].num_mcaches_plane1 - 1; + if ((count_elements_in_span(params->mcache_allocations[plane_index].mcache_x_offsets_plane1, + num_boundaries, max_per_pipe_vp_p1) <= 1) && (num_boundaries <= num_dpps)) { + p1pass = true; + } + + if (!p0pass || !p1pass) { + if (odm_combine_factor > 1) { + num_dpps = odm_combine_factor; + scaling_transform = plane->composition.scaling_transform; + } else { + num_dpps = mpc_combine_factor; + scaling_transform = dml2_scaling_transform_fullscreen; + } + + if (!p0pass) { + if (plane->composition.viewport.stationary) { + calculate_h_split_for_scaling_transform(plane->surface.plane0.width, + stream->timing.h_active, num_dpps, scaling_transform, + &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index]); + p0pass = find_shift_for_valid_cache_id_assignment(params->mcache_allocations[plane_index].mcache_x_offsets_plane0, + params->mcache_allocations[plane_index].num_mcaches_plane0, + &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index], num_dpps, + params->mcache_allocations[plane_index].shift_granularity.p0, &p0shift); + } + } + if (!p1pass) { + if (plane->composition.viewport.stationary) { + calculate_h_split_for_scaling_transform(plane->surface.plane1.width, + stream->timing.h_active, num_dpps, scaling_transform, + &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index]); + p1pass = find_shift_for_valid_cache_id_assignment(params->mcache_allocations[plane_index].mcache_x_offsets_plane1, + params->mcache_allocations[plane_index].num_mcaches_plane1, + &l->plane1.pipe_vp_startx[plane_index], &l->plane1.pipe_vp_endx[plane_index], num_dpps, + params->mcache_allocations[plane_index].shift_granularity.p1, &p1shift); + } + } + } + + if (p0pass && p1pass) { + for (i = 0; i < params->mcache_allocations[plane_index].num_mcaches_plane0; i++) { + params->mcache_allocations[plane_index].mcache_x_offsets_plane0[i] -= p0shift; + } + for (i = 0; i < params->mcache_allocations[plane_index].num_mcaches_plane1; i++) { + params->mcache_allocations[plane_index].mcache_x_offsets_plane1[i] -= p1shift; + } + } + + params->per_plane_status[plane_index] = p0pass && p1pass; + all_pass &= p0pass && p1pass; + } + + return all_pass; +} + +static void reset_mcache_allocations(struct dml2_hubp_pipe_mcache_regs *per_plane_pipe_mcache_regs) +{ + // Initialize all entries to special valid MCache ID and special valid split coordinate + per_plane_pipe_mcache_regs->main.p0.mcache_id_first = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->main.p0.mcache_id_second = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->main.p0.split_location = SPLIT_LOCATION_UNDEFINED; + + per_plane_pipe_mcache_regs->mall.p0.mcache_id_first = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->mall.p0.mcache_id_second = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->mall.p0.split_location = SPLIT_LOCATION_UNDEFINED; + + per_plane_pipe_mcache_regs->main.p1.mcache_id_first = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->main.p1.mcache_id_second = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->main.p1.split_location = SPLIT_LOCATION_UNDEFINED; + + per_plane_pipe_mcache_regs->mall.p1.mcache_id_first = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->mall.p1.mcache_id_second = MCACHE_ID_UNASSIGNED; + per_plane_pipe_mcache_regs->mall.p1.split_location = SPLIT_LOCATION_UNDEFINED; +} + +void dml2_top_mcache_assign_global_mcache_ids(struct top_mcache_assign_global_mcache_ids_in_out *params) +{ + int i; + unsigned int j; + int next_unused_cache_id = 0; + + for (i = 0; i < params->num_allocations; i++) { + if (!params->allocations[i].valid) + continue; + + for (j = 0; j < params->allocations[i].num_mcaches_plane0; j++) { + params->allocations[i].global_mcache_ids_plane0[j] = next_unused_cache_id++; + } + for (j = 0; j < params->allocations[i].num_mcaches_plane1; j++) { + params->allocations[i].global_mcache_ids_plane1[j] = next_unused_cache_id++; + } + + // The "psuedo-last" slice is always wrapped around + params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0] = + params->allocations[i].global_mcache_ids_plane0[0]; + params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1] = + params->allocations[i].global_mcache_ids_plane1[0]; + + // If we need dedicated caches for mall requesting, then we assign them here. + if (params->allocations[i].requires_dedicated_mall_mcache) { + for (j = 0; j < params->allocations[i].num_mcaches_plane0; j++) { + params->allocations[i].global_mcache_ids_mall_plane0[j] = next_unused_cache_id++; + } + for (j = 0; j < params->allocations[i].num_mcaches_plane1; j++) { + params->allocations[i].global_mcache_ids_mall_plane1[j] = next_unused_cache_id++; + } + + // The "psuedo-last" slice is always wrapped around + params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0] = + params->allocations[i].global_mcache_ids_mall_plane0[0]; + params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1] = + params->allocations[i].global_mcache_ids_mall_plane1[0]; + } + + // If P0 and P1 are sharing caches, then it means the largest mcache IDs for p0 and p1 can be the same + // since mcache IDs are always ascending, then it means the largest mcacheID of p1 should be the + // largest mcacheID of P0 + if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].num_mcaches_plane1 > 0 && + params->allocations[i].last_slice_sharing.plane0_plane1) { + params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1 - 1] = + params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0 - 1]; + } + + // If we need dedicated caches handle last slice sharing + if (params->allocations[i].requires_dedicated_mall_mcache) { + if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].num_mcaches_plane1 > 0 && + params->allocations[i].last_slice_sharing.plane0_plane1) { + params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1 - 1] = + params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0 - 1]; + } + // If mall_comb_mcache_l is set then it means that largest mcache ID for MALL p0 can be same as regular read p0 + if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].last_slice_sharing.mall_comb_mcache_p0) { + params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0 - 1] = + params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0 - 1]; + } + // If mall_comb_mcache_c is set then it means that largest mcache ID for MALL p1 can be same as regular + // read p1 (which can be same as regular read p0 if plane0_plane1 is also set) + if (params->allocations[i].num_mcaches_plane1 > 0 && params->allocations[i].last_slice_sharing.mall_comb_mcache_p1) { + params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1 - 1] = + params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1 - 1]; + } + } + + // If you don't need dedicated mall mcaches, the mall mcache assignments are identical to the normal requesting + if (!params->allocations[i].requires_dedicated_mall_mcache) { + memcpy(params->allocations[i].global_mcache_ids_mall_plane0, params->allocations[i].global_mcache_ids_plane0, + sizeof(params->allocations[i].global_mcache_ids_mall_plane0)); + memcpy(params->allocations[i].global_mcache_ids_mall_plane1, params->allocations[i].global_mcache_ids_plane1, + sizeof(params->allocations[i].global_mcache_ids_mall_plane1)); + } + } +} + +bool dml2_top_mcache_calc_mcache_count_and_offsets(struct top_mcache_calc_mcache_count_and_offsets_in_out *params) +{ + struct dml2_instance *dml = (struct dml2_instance *)params->dml2_instance; + struct dml2_top_mcache_verify_mcache_size_locals *l = &dml->scratch.mcache_verify_mcache_size_locals; + + unsigned int total_mcaches_required; + unsigned int i; + bool result = false; + + if (dml->soc_bbox.num_dcc_mcaches == 0) { + return true; + } + + total_mcaches_required = 0; + l->calc_mcache_params.instance = &dml->core_instance; + for (i = 0; i < params->display_config->num_planes; i++) { + if (!params->display_config->plane_descriptors[i].surface.dcc.enable) { + memset(¶ms->mcache_allocations[i], 0, sizeof(struct dml2_mcache_surface_allocation)); + continue; + } + + l->calc_mcache_params.plane_descriptor = ¶ms->display_config->plane_descriptors[i]; + l->calc_mcache_params.mcache_allocation = ¶ms->mcache_allocations[i]; + l->calc_mcache_params.plane_index = i; + + if (!dml->core_instance.calculate_mcache_allocation(&l->calc_mcache_params)) { + result = false; + break; + } + + if (params->mcache_allocations[i].valid) { + total_mcaches_required += params->mcache_allocations[i].num_mcaches_plane0 + params->mcache_allocations[i].num_mcaches_plane1; + if (params->mcache_allocations[i].last_slice_sharing.plane0_plane1) + total_mcaches_required--; + } + } + dml2_printf("DML_CORE_DCN3::%s: plane_%d, total_mcaches_required=%d\n", __func__, i, total_mcaches_required); + + if (total_mcaches_required > dml->soc_bbox.num_dcc_mcaches) { + result = false; + } else { + result = true; + } + + return result; +} + +static bool dml2_top_soc15_check_mode_supported(struct dml2_check_mode_supported_in_out *in_out) +{ + struct dml2_instance *dml = (struct dml2_instance *)in_out->dml2_instance; + struct dml2_check_mode_supported_locals *l = &dml->scratch.check_mode_supported_locals; + struct dml2_display_cfg_programming *dpmm_programming = &dml->dpmm_instance.dpmm_scratch.programming; + + bool result = false; + bool mcache_success = false; + memset(dpmm_programming, 0, sizeof(struct dml2_display_cfg_programming)); + + setup_unoptimized_display_config_with_meta(dml, &l->base_display_config_with_meta, in_out->display_config); + + l->mode_support_params.instance = &dml->core_instance; + l->mode_support_params.display_cfg = &l->base_display_config_with_meta; + l->mode_support_params.min_clk_table = &dml->min_clk_table; + l->mode_support_params.min_clk_index = l->base_display_config_with_meta.stage1.min_clk_index_for_latency; + result = dml->core_instance.mode_support(&l->mode_support_params); + l->base_display_config_with_meta.mode_support_result = l->mode_support_params.mode_support_result; + + if (result) { + struct optimization_phase_params mcache_phase = { + .dml = dml, + .display_config = &l->base_display_config_with_meta, + .test_function = dml2_top_optimization_test_function_mcache, + .optimize_function = dml2_top_optimization_optimize_function_mcache, + .optimized_display_config = &l->optimized_display_config_with_meta, + .all_or_nothing = false, + }; + mcache_success = dml2_top_optimization_perform_optimization_phase(&l->optimization_phase_locals, &mcache_phase); + } + + /* + * Call DPMM to map all requirements to minimum clock state + */ + if (result) { + l->dppm_map_mode_params.min_clk_table = &dml->min_clk_table; + l->dppm_map_mode_params.display_cfg = &l->base_display_config_with_meta; + l->dppm_map_mode_params.programming = dpmm_programming; + l->dppm_map_mode_params.soc_bb = &dml->soc_bbox; + l->dppm_map_mode_params.ip = &dml->core_instance.clean_me_up.mode_lib.ip; + result = dml->dpmm_instance.map_mode_to_soc_dpm(&l->dppm_map_mode_params); + } + + in_out->is_supported = mcache_success; + result = result && in_out->is_supported; + + return result; +} + +static bool dml2_top_soc15_build_mode_programming(struct dml2_build_mode_programming_in_out *in_out) +{ + struct dml2_instance *dml = (struct dml2_instance *)in_out->dml2_instance; + struct dml2_build_mode_programming_locals *l = &dml->scratch.build_mode_programming_locals; + + bool result = false; + bool mcache_success = false; + bool uclk_pstate_success = false; + bool vmin_success = false; + bool stutter_success = false; + unsigned int i; + + memset(l, 0, sizeof(struct dml2_build_mode_programming_locals)); + memset(in_out->programming, 0, sizeof(struct dml2_display_cfg_programming)); + + memcpy(&in_out->programming->display_config, in_out->display_config, sizeof(struct dml2_display_cfg)); + + setup_speculative_display_config_with_meta(dml, &l->base_display_config_with_meta, in_out->display_config); + + l->mode_support_params.instance = &dml->core_instance; + l->mode_support_params.display_cfg = &l->base_display_config_with_meta; + l->mode_support_params.min_clk_table = &dml->min_clk_table; + l->mode_support_params.min_clk_index = l->base_display_config_with_meta.stage1.min_clk_index_for_latency; + result = dml->core_instance.mode_support(&l->mode_support_params); + + l->base_display_config_with_meta.mode_support_result = l->mode_support_params.mode_support_result; + + if (!result) { + setup_unoptimized_display_config_with_meta(dml, &l->base_display_config_with_meta, in_out->display_config); + + l->mode_support_params.instance = &dml->core_instance; + l->mode_support_params.display_cfg = &l->base_display_config_with_meta; + l->mode_support_params.min_clk_table = &dml->min_clk_table; + l->mode_support_params.min_clk_index = l->base_display_config_with_meta.stage1.min_clk_index_for_latency; + result = dml->core_instance.mode_support(&l->mode_support_params); + l->base_display_config_with_meta.mode_support_result = l->mode_support_params.mode_support_result; + + if (!result) { + l->informative_params.instance = &dml->core_instance; + l->informative_params.programming = in_out->programming; + l->informative_params.mode_is_supported = false; + dml->core_instance.populate_informative(&l->informative_params); + + return false; + } + + /* + * Phase 1: Determine minimum clocks to satisfy latency requirements for this mode + */ + memset(&l->min_clock_for_latency_phase, 0, sizeof(struct optimization_phase_params)); + l->min_clock_for_latency_phase.dml = dml; + l->min_clock_for_latency_phase.display_config = &l->base_display_config_with_meta; + l->min_clock_for_latency_phase.init_function = dml2_top_optimization_init_function_min_clk_for_latency; + l->min_clock_for_latency_phase.test_function = dml2_top_optimization_test_function_min_clk_for_latency; + l->min_clock_for_latency_phase.optimize_function = dml2_top_optimization_optimize_function_min_clk_for_latency; + l->min_clock_for_latency_phase.optimized_display_config = &l->optimized_display_config_with_meta; + l->min_clock_for_latency_phase.all_or_nothing = false; + + dml2_top_optimization_perform_optimization_phase_1(&l->optimization_phase_locals, &l->min_clock_for_latency_phase); + + memcpy(&l->base_display_config_with_meta, &l->optimized_display_config_with_meta, sizeof(struct display_configuation_with_meta)); + } + + /* + * Phase 2: Satisfy DCC mcache requirements + */ + memset(&l->mcache_phase, 0, sizeof(struct optimization_phase_params)); + l->mcache_phase.dml = dml; + l->mcache_phase.display_config = &l->base_display_config_with_meta; + l->mcache_phase.test_function = dml2_top_optimization_test_function_mcache; + l->mcache_phase.optimize_function = dml2_top_optimization_optimize_function_mcache; + l->mcache_phase.optimized_display_config = &l->optimized_display_config_with_meta; + l->mcache_phase.all_or_nothing = true; + + mcache_success = dml2_top_optimization_perform_optimization_phase(&l->optimization_phase_locals, &l->mcache_phase); + + if (!mcache_success) { + l->informative_params.instance = &dml->core_instance; + l->informative_params.programming = in_out->programming; + l->informative_params.mode_is_supported = false; + + dml->core_instance.populate_informative(&l->informative_params); + + in_out->programming->informative.failed_mcache_validation = true; + return false; + } + + memcpy(&l->base_display_config_with_meta, &l->optimized_display_config_with_meta, sizeof(struct display_configuation_with_meta)); + + /* + * Phase 3: Optimize for Pstate + */ + memset(&l->uclk_pstate_phase, 0, sizeof(struct optimization_phase_params)); + l->uclk_pstate_phase.dml = dml; + l->uclk_pstate_phase.display_config = &l->base_display_config_with_meta; + l->uclk_pstate_phase.init_function = dml2_top_optimization_init_function_uclk_pstate; + l->uclk_pstate_phase.test_function = dml2_top_optimization_test_function_uclk_pstate; + l->uclk_pstate_phase.optimize_function = dml2_top_optimization_optimize_function_uclk_pstate; + l->uclk_pstate_phase.optimized_display_config = &l->optimized_display_config_with_meta; + l->uclk_pstate_phase.all_or_nothing = true; + + uclk_pstate_success = dml2_top_optimization_perform_optimization_phase(&l->optimization_phase_locals, &l->uclk_pstate_phase); + + if (uclk_pstate_success) { + memcpy(&l->base_display_config_with_meta, &l->optimized_display_config_with_meta, sizeof(struct display_configuation_with_meta)); + l->base_display_config_with_meta.stage3.success = true; + } + + /* + * Phase 4: Optimize for Vmin + */ + memset(&l->vmin_phase, 0, sizeof(struct optimization_phase_params)); + l->vmin_phase.dml = dml; + l->vmin_phase.display_config = &l->base_display_config_with_meta; + l->vmin_phase.init_function = dml2_top_optimization_init_function_vmin; + l->vmin_phase.test_function = dml2_top_optimization_test_function_vmin; + l->vmin_phase.optimize_function = dml2_top_optimization_optimize_function_vmin; + l->vmin_phase.optimized_display_config = &l->optimized_display_config_with_meta; + l->vmin_phase.all_or_nothing = false; + + vmin_success = dml2_top_optimization_perform_optimization_phase(&l->optimization_phase_locals, &l->vmin_phase); + + if (l->optimized_display_config_with_meta.stage4.performed) { + /* + * when performed is true, optimization has applied to + * optimized_display_config_with_meta and it has passed mode + * support. However it may or may not pass the test function to + * reach actual Vmin. As long as voltage is optimized even if it + * doesn't reach Vmin level, there is still power benefit so in + * this case we will still copy this optimization into base + * display config. + */ + memcpy(&l->base_display_config_with_meta, &l->optimized_display_config_with_meta, sizeof(struct display_configuation_with_meta)); + l->base_display_config_with_meta.stage4.success = vmin_success; + } + + /* + * Phase 5: Optimize for Stutter + */ + memset(&l->stutter_phase, 0, sizeof(struct optimization_phase_params)); + l->stutter_phase.dml = dml; + l->stutter_phase.display_config = &l->base_display_config_with_meta; + l->stutter_phase.init_function = dml2_top_optimization_init_function_stutter; + l->stutter_phase.test_function = dml2_top_optimization_test_function_stutter; + l->stutter_phase.optimize_function = dml2_top_optimization_optimize_function_stutter; + l->stutter_phase.optimized_display_config = &l->optimized_display_config_with_meta; + l->stutter_phase.all_or_nothing = true; + + stutter_success = dml2_top_optimization_perform_optimization_phase(&l->optimization_phase_locals, &l->stutter_phase); + + if (stutter_success) { + memcpy(&l->base_display_config_with_meta, &l->optimized_display_config_with_meta, sizeof(struct display_configuation_with_meta)); + l->base_display_config_with_meta.stage5.success = true; + } + + /* + * Populate mcache programming + */ + for (i = 0; i < in_out->display_config->num_planes; i++) { + in_out->programming->plane_programming[i].mcache_allocation = l->base_display_config_with_meta.stage2.mcache_allocations[i]; + } + + /* + * Call DPMM to map all requirements to minimum clock state + */ + if (result) { + l->dppm_map_mode_params.min_clk_table = &dml->min_clk_table; + l->dppm_map_mode_params.display_cfg = &l->base_display_config_with_meta; + l->dppm_map_mode_params.programming = in_out->programming; + l->dppm_map_mode_params.soc_bb = &dml->soc_bbox; + l->dppm_map_mode_params.ip = &dml->core_instance.clean_me_up.mode_lib.ip; + result = dml->dpmm_instance.map_mode_to_soc_dpm(&l->dppm_map_mode_params); + if (!result) + in_out->programming->informative.failed_dpmm = true; + } + + if (result) { + l->mode_programming_params.instance = &dml->core_instance; + l->mode_programming_params.display_cfg = &l->base_display_config_with_meta; + l->mode_programming_params.cfg_support_info = &l->base_display_config_with_meta.mode_support_result.cfg_support_info; + l->mode_programming_params.programming = in_out->programming; + result = dml->core_instance.mode_programming(&l->mode_programming_params); + if (!result) + in_out->programming->informative.failed_mode_programming = true; + } + + if (result) { + l->dppm_map_watermarks_params.core = &dml->core_instance; + l->dppm_map_watermarks_params.display_cfg = &l->base_display_config_with_meta; + l->dppm_map_watermarks_params.programming = in_out->programming; + result = dml->dpmm_instance.map_watermarks(&l->dppm_map_watermarks_params); + } + + l->informative_params.instance = &dml->core_instance; + l->informative_params.programming = in_out->programming; + l->informative_params.mode_is_supported = result; + + dml->core_instance.populate_informative(&l->informative_params); + + return result; +} + +bool dml2_top_soc15_build_mcache_programming(struct dml2_build_mcache_programming_in_out *params) +{ + bool success = true; + int config_index, pipe_index; + int first_offset, second_offset; + int free_per_plane_reg_index = 0; + + memset(params->per_plane_pipe_mcache_regs, 0, DML2_MAX_PLANES * DML2_MAX_DCN_PIPES * sizeof(struct dml2_hubp_pipe_mcache_regs *)); + + for (config_index = 0; config_index < params->num_configurations; config_index++) { + for (pipe_index = 0; pipe_index < params->mcache_configurations[config_index].num_pipes; pipe_index++) { + // Allocate storage for the mcache regs + params->per_plane_pipe_mcache_regs[config_index][pipe_index] = ¶ms->mcache_regs_set[free_per_plane_reg_index++]; + + reset_mcache_allocations(params->per_plane_pipe_mcache_regs[config_index][pipe_index]); + + if (params->mcache_configurations[config_index].plane_descriptor->surface.dcc.enable) { + // P0 always enabled + if (!calculate_first_second_splitting(params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0, + params->mcache_configurations[config_index].mcache_allocation->num_mcaches_plane0, + 0, + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_x_start, + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_x_start + + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_width - 1, + &first_offset, &second_offset)) { + success = false; + break; + } + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.mcache_id_first = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane0[first_offset]; + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.mcache_id_first = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane0[first_offset]; + + if (second_offset >= 0) { + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.mcache_id_second = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane0[second_offset]; + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.split_location = + params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0[first_offset] - 1; + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.mcache_id_second = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane0[second_offset]; + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.split_location = + params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0[first_offset] - 1; + } + + // Populate P1 if enabled + if (params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1_enabled) { + if (!calculate_first_second_splitting(params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1, + params->mcache_configurations[config_index].mcache_allocation->num_mcaches_plane1, + 0, + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_x_start, + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_x_start + + params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_width - 1, + &first_offset, &second_offset)) { + success = false; + break; + } + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.mcache_id_first = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane1[first_offset]; + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.mcache_id_first = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane1[first_offset]; + + if (second_offset >= 0) { + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.mcache_id_second = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane1[second_offset]; + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.split_location = + params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1[first_offset] - 1; + + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.mcache_id_second = + params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane1[second_offset]; + params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.split_location = + params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1[first_offset] - 1; + } + } + } + } + } + + return success; +} + +static const struct dml2_top_funcs soc15_funcs = { + .check_mode_supported = dml2_top_soc15_check_mode_supported, + .build_mode_programming = dml2_top_soc15_build_mode_programming, + .build_mcache_programming = dml2_top_soc15_build_mcache_programming, +}; + +bool dml2_top_soc15_initialize_instance(struct dml2_initialize_instance_in_out *in_out) +{ + struct dml2_instance *dml = (struct dml2_instance *)in_out->dml2_instance; + struct dml2_initialize_instance_locals *l = &dml->scratch.initialize_instance_locals; + struct dml2_core_initialize_in_out core_init_params = { 0 }; + struct dml2_mcg_build_min_clock_table_params_in_out mcg_build_min_clk_params = { 0 }; + struct dml2_pmo_initialize_in_out pmo_init_params = { 0 }; + bool result = false; + + memset(l, 0, sizeof(struct dml2_initialize_instance_locals)); + memset(dml, 0, sizeof(struct dml2_instance)); + + memcpy(&dml->ip_caps, &in_out->ip_caps, sizeof(struct dml2_ip_capabilities)); + memcpy(&dml->soc_bbox, &in_out->soc_bb, sizeof(struct dml2_soc_bb)); + + dml->project_id = in_out->options.project_id; + dml->pmo_options = in_out->options.pmo_options; + + // Initialize All Components + result = dml2_mcg_create(in_out->options.project_id, &dml->mcg_instance); + + if (result) + result = dml2_dpmm_create(in_out->options.project_id, &dml->dpmm_instance); + + if (result) + result = dml2_core_create(in_out->options.project_id, &dml->core_instance); + + if (result) { + mcg_build_min_clk_params.soc_bb = &in_out->soc_bb; + mcg_build_min_clk_params.min_clk_table = &dml->min_clk_table; + result = dml->mcg_instance.build_min_clock_table(&mcg_build_min_clk_params); + } + + if (result) { + core_init_params.project_id = in_out->options.project_id; + core_init_params.instance = &dml->core_instance; + core_init_params.minimum_clock_table = &dml->min_clk_table; + core_init_params.explicit_ip_bb = in_out->overrides.explicit_ip_bb; + core_init_params.explicit_ip_bb_size = in_out->overrides.explicit_ip_bb_size; + core_init_params.ip_caps = &in_out->ip_caps; + core_init_params.soc_bb = &in_out->soc_bb; + result = dml->core_instance.initialize(&core_init_params); + + if (core_init_params.explicit_ip_bb && core_init_params.explicit_ip_bb_size > 0) { + memcpy(&dml->ip_caps, &in_out->ip_caps, sizeof(struct dml2_ip_capabilities)); + } + } + + if (result) + result = dml2_pmo_create(in_out->options.project_id, &dml->pmo_instance); + + if (result) { + pmo_init_params.instance = &dml->pmo_instance; + pmo_init_params.soc_bb = &dml->soc_bbox; + pmo_init_params.ip_caps = &dml->ip_caps; + pmo_init_params.mcg_clock_table_size = dml->min_clk_table.dram_bw_table.num_entries; + pmo_init_params.options = &dml->pmo_options; + dml->pmo_instance.initialize(&pmo_init_params); + } + dml->funcs = soc15_funcs; + return result; +} diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.h similarity index 58% rename from drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.h rename to drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.h index 7b1f6f7143d07c587c10b8f80225a80964088881..6fda201af898f4e94390346e392528baedb67c4a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml2_top_soc15.h @@ -1,23 +1,13 @@ // SPDX-License-Identifier: MIT // // Copyright 2024 Advanced Micro Devices, Inc. - -#ifndef __DML_TOP_MCACHE_H__ -#define __DML_TOP_MCACHE_H__ - -#include "dml2_external_lib_deps.h" -#include "dml_top_display_cfg_types.h" -#include "dml_top_types.h" +#ifndef __DML2_TOP_SOC15_H__ +#define __DML2_TOP_SOC15_H__ #include "dml2_internal_shared_types.h" +bool dml2_top_soc15_initialize_instance(struct dml2_initialize_instance_in_out *in_out); bool dml2_top_mcache_calc_mcache_count_and_offsets(struct top_mcache_calc_mcache_count_and_offsets_in_out *params); - void dml2_top_mcache_assign_global_mcache_ids(struct top_mcache_assign_global_mcache_ids_in_out *params); - bool dml2_top_mcache_validate_admissability(struct top_mcache_validate_admissability_in_out *params); - -bool dml2_top_mcache_build_mcache_programming(struct dml2_build_mcache_programming_in_out *params); - -bool dml2_top_mcache_unit_test(void); - -#endif +bool dml2_top_soc15_build_mcache_programming(struct dml2_build_mcache_programming_in_out *params); +#endif /* __DML2_TOP_SOC15_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.c deleted file mode 100644 index a342ebfbe4e7fd852455ae09a08a5b99f91b22b4..0000000000000000000000000000000000000000 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_top/dml_top_mcache.c +++ /dev/null @@ -1,549 +0,0 @@ -// SPDX-License-Identifier: MIT -// -// Copyright 2024 Advanced Micro Devices, Inc. - -#include "dml2_debug.h" - -#include "dml_top_mcache.h" -#include "lib_float_math.h" - -#include "dml2_internal_shared_types.h" - -/* -* Takes an input set of mcache boundaries and finds the appropriate setting of cache programming. -* Returns true if a valid set of programming can be made, and false otherwise. "Valid" means -* that the horizontal viewport does not span more than 2 cache slices. -* -* It optionally also can apply a constant shift to all the cache boundaries. -*/ -static const uint32_t MCACHE_ID_UNASSIGNED = 0xF; -static const uint32_t SPLIT_LOCATION_UNDEFINED = 0xFFFF; - -static bool calculate_first_second_splitting(const int *mcache_boundaries, int num_boundaries, int shift, - int pipe_h_vp_start, int pipe_h_vp_end, int *first_offset, int *second_offset) -{ - const int MAX_VP = 0xFFFFFF; - int left_cache_id; - int right_cache_id; - int range_start; - int range_end; - bool success = false; - - if (num_boundaries <= 1) { - if (first_offset && second_offset) { - *first_offset = 0; - *second_offset = -1; - } - success = true; - return success; - } else { - range_start = 0; - for (left_cache_id = 0; left_cache_id < num_boundaries; left_cache_id++) { - range_end = mcache_boundaries[left_cache_id] - shift - 1; - - if (range_start <= pipe_h_vp_start && pipe_h_vp_start <= range_end) - break; - - range_start = range_end + 1; - } - - range_end = MAX_VP; - for (right_cache_id = num_boundaries - 1; right_cache_id >= -1; right_cache_id--) { - if (right_cache_id >= 0) - range_start = mcache_boundaries[right_cache_id] - shift; - else - range_start = 0; - - if (range_start <= pipe_h_vp_end && pipe_h_vp_end <= range_end) { - break; - } - range_end = range_start - 1; - } - right_cache_id = (right_cache_id + 1) % num_boundaries; - - if (right_cache_id == left_cache_id) { - if (first_offset && second_offset) { - *first_offset = left_cache_id; - *second_offset = -1; - } - success = true; - } else if (right_cache_id == (left_cache_id + 1) % num_boundaries) { - if (first_offset && second_offset) { - *first_offset = left_cache_id; - *second_offset = right_cache_id; - } - success = true; - } - } - - return success; -} - -/* -* For a given set of pipe start/end x positions, checks to see it can support the input mcache splitting. -* It also attempts to "optimize" by finding a shift if the default 0 shift does not work. -*/ -static bool find_shift_for_valid_cache_id_assignment(int *mcache_boundaries, unsigned int num_boundaries, - int *pipe_vp_startx, int *pipe_vp_endx, unsigned int pipe_count, int shift_granularity, int *shift) -{ - int max_shift = 0xFFFF; - unsigned int pipe_index; - unsigned int i, slice_width; - bool success = false; - - for (i = 0; i < num_boundaries; i++) { - if (i == 0) - slice_width = mcache_boundaries[i]; - else - slice_width = mcache_boundaries[i] - mcache_boundaries[i - 1]; - - if (max_shift > (int)slice_width) { - max_shift = slice_width; - } - } - - for (*shift = 0; *shift <= max_shift; *shift += shift_granularity) { - success = true; - for (pipe_index = 0; pipe_index < pipe_count; pipe_index++) { - if (!calculate_first_second_splitting(mcache_boundaries, num_boundaries, *shift, - pipe_vp_startx[pipe_index], pipe_vp_endx[pipe_index], 0, 0)) { - success = false; - break; - } - } - if (success) - break; - } - - return success; -} - -/* -* Counts the number of elements inside input array within the given span length. -* Formally, what is the size of the largest subset of the array where the largest and smallest element -* differ no more than the span. -*/ -static unsigned int count_elements_in_span(int *array, unsigned int array_size, unsigned int span) -{ - unsigned int i; - unsigned int span_start_value; - unsigned int span_start_index; - unsigned int greatest_element_count; - - if (array_size == 0) - return 1; - - if (span == 0) - return array_size > 0 ? 1 : 0; - - span_start_value = 0; - span_start_index = 0; - greatest_element_count = 0; - - while (span_start_index < array_size) { - for (i = span_start_index; i < array_size; i++) { - if (array[i] - span_start_value <= span) { - if (i - span_start_index + 1 > greatest_element_count) { - greatest_element_count = i - span_start_index + 1; - } - } else - break; - } - - span_start_index++; - - if (span_start_index < array_size) { - span_start_value = array[span_start_index - 1] + 1; - } - } - - return greatest_element_count; -} - -static bool calculate_h_split_for_scaling_transform(int full_vp_width, int h_active, int num_pipes, - enum dml2_scaling_transform scaling_transform, int *pipe_vp_x_start, int *pipe_vp_x_end) -{ - int i, slice_width; - const char MAX_SCL_VP_OVERLAP = 3; - bool success = false; - - switch (scaling_transform) { - case dml2_scaling_transform_centered: - case dml2_scaling_transform_aspect_ratio: - case dml2_scaling_transform_fullscreen: - slice_width = full_vp_width / num_pipes; - for (i = 0; i < num_pipes; i++) { - pipe_vp_x_start[i] = i * slice_width; - pipe_vp_x_end[i] = (i + 1) * slice_width - 1; - - if (pipe_vp_x_start[i] < MAX_SCL_VP_OVERLAP) - pipe_vp_x_start[i] = 0; - else - pipe_vp_x_start[i] -= MAX_SCL_VP_OVERLAP; - - if (pipe_vp_x_end[i] > full_vp_width - MAX_SCL_VP_OVERLAP - 1) - pipe_vp_x_end[i] = full_vp_width - 1; - else - pipe_vp_x_end[i] += MAX_SCL_VP_OVERLAP; - } - break; - case dml2_scaling_transform_explicit: - default: - success = false; - break; - } - - return success; -} - -bool dml2_top_mcache_validate_admissability(struct top_mcache_validate_admissability_in_out *params) -{ - struct dml2_instance *dml = (struct dml2_instance *)params->dml2_instance; - struct dml2_top_mcache_validate_admissability_locals *l = &dml->scratch.mcache_validate_admissability_locals; - - const int MAX_PIXEL_OVERLAP = 6; - int max_per_pipe_vp_p0 = 0; - int max_per_pipe_vp_p1 = 0; - int temp, p0shift, p1shift; - unsigned int plane_index = 0; - unsigned int i; - unsigned int odm_combine_factor; - unsigned int mpc_combine_factor; - unsigned int num_dpps; - unsigned int num_boundaries; - enum dml2_scaling_transform scaling_transform; - const struct dml2_plane_parameters *plane; - const struct dml2_stream_parameters *stream; - - bool p0pass = false; - bool p1pass = false; - bool all_pass = true; - - for (plane_index = 0; plane_index < params->display_cfg->num_planes; plane_index++) { - if (!params->display_cfg->plane_descriptors[plane_index].surface.dcc.enable) - continue; - - plane = ¶ms->display_cfg->plane_descriptors[plane_index]; - stream = ¶ms->display_cfg->stream_descriptors[plane->stream_index]; - - num_dpps = odm_combine_factor = params->cfg_support_info->stream_support_info[plane->stream_index].odms_used; - - if (odm_combine_factor == 1) - num_dpps = mpc_combine_factor = (unsigned int)params->cfg_support_info->plane_support_info[plane_index].dpps_used; - else - mpc_combine_factor = 1; - - if (odm_combine_factor > 1) { - max_per_pipe_vp_p0 = plane->surface.plane0.width; - temp = (unsigned int)math_ceil(plane->composition.scaler_info.plane0.h_ratio * stream->timing.h_active / odm_combine_factor); - - if (temp < max_per_pipe_vp_p0) - max_per_pipe_vp_p0 = temp; - - max_per_pipe_vp_p1 = plane->surface.plane1.width; - temp = (unsigned int)math_ceil(plane->composition.scaler_info.plane1.h_ratio * stream->timing.h_active / odm_combine_factor); - - if (temp < max_per_pipe_vp_p1) - max_per_pipe_vp_p1 = temp; - } else { - max_per_pipe_vp_p0 = plane->surface.plane0.width / mpc_combine_factor; - max_per_pipe_vp_p1 = plane->surface.plane1.width / mpc_combine_factor; - } - - max_per_pipe_vp_p0 += 2 * MAX_PIXEL_OVERLAP; - max_per_pipe_vp_p1 += MAX_PIXEL_OVERLAP; - - p0shift = 0; - p1shift = 0; - - // The last element in the unshifted boundary array will always be the first pixel outside the - // plane, which means theres no mcache associated with it, so -1 - num_boundaries = params->mcache_allocations[plane_index].num_mcaches_plane0 == 0 ? 0 : params->mcache_allocations[plane_index].num_mcaches_plane0 - 1; - if ((count_elements_in_span(params->mcache_allocations[plane_index].mcache_x_offsets_plane0, - num_boundaries, max_per_pipe_vp_p0) <= 1) && (num_boundaries <= num_dpps)) { - p0pass = true; - } - num_boundaries = params->mcache_allocations[plane_index].num_mcaches_plane1 == 0 ? 0 : params->mcache_allocations[plane_index].num_mcaches_plane1 - 1; - if ((count_elements_in_span(params->mcache_allocations[plane_index].mcache_x_offsets_plane1, - num_boundaries, max_per_pipe_vp_p1) <= 1) && (num_boundaries <= num_dpps)) { - p1pass = true; - } - - if (!p0pass || !p1pass) { - if (odm_combine_factor > 1) { - num_dpps = odm_combine_factor; - scaling_transform = plane->composition.scaling_transform; - } else { - num_dpps = mpc_combine_factor; - scaling_transform = dml2_scaling_transform_fullscreen; - } - - if (!p0pass) { - if (plane->composition.viewport.stationary) { - calculate_h_split_for_scaling_transform(plane->surface.plane0.width, - stream->timing.h_active, num_dpps, scaling_transform, - &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index]); - p0pass = find_shift_for_valid_cache_id_assignment(params->mcache_allocations[plane_index].mcache_x_offsets_plane0, - params->mcache_allocations[plane_index].num_mcaches_plane0, - &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index], num_dpps, - params->mcache_allocations[plane_index].shift_granularity.p0, &p0shift); - } - } - if (!p1pass) { - if (plane->composition.viewport.stationary) { - calculate_h_split_for_scaling_transform(plane->surface.plane1.width, - stream->timing.h_active, num_dpps, scaling_transform, - &l->plane0.pipe_vp_startx[plane_index], &l->plane0.pipe_vp_endx[plane_index]); - p1pass = find_shift_for_valid_cache_id_assignment(params->mcache_allocations[plane_index].mcache_x_offsets_plane1, - params->mcache_allocations[plane_index].num_mcaches_plane1, - &l->plane1.pipe_vp_startx[plane_index], &l->plane1.pipe_vp_endx[plane_index], num_dpps, - params->mcache_allocations[plane_index].shift_granularity.p1, &p1shift); - } - } - } - - if (p0pass && p1pass) { - for (i = 0; i < params->mcache_allocations[plane_index].num_mcaches_plane0; i++) { - params->mcache_allocations[plane_index].mcache_x_offsets_plane0[i] -= p0shift; - } - for (i = 0; i < params->mcache_allocations[plane_index].num_mcaches_plane1; i++) { - params->mcache_allocations[plane_index].mcache_x_offsets_plane1[i] -= p1shift; - } - } - - params->per_plane_status[plane_index] = p0pass && p1pass; - all_pass &= p0pass && p1pass; - } - - return all_pass; -} - -static void reset_mcache_allocations(struct dml2_hubp_pipe_mcache_regs *per_plane_pipe_mcache_regs) -{ - // Initialize all entries to special valid MCache ID and special valid split coordinate - per_plane_pipe_mcache_regs->main.p0.mcache_id_first = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->main.p0.mcache_id_second = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->main.p0.split_location = SPLIT_LOCATION_UNDEFINED; - - per_plane_pipe_mcache_regs->mall.p0.mcache_id_first = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->mall.p0.mcache_id_second = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->mall.p0.split_location = SPLIT_LOCATION_UNDEFINED; - - per_plane_pipe_mcache_regs->main.p1.mcache_id_first = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->main.p1.mcache_id_second = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->main.p1.split_location = SPLIT_LOCATION_UNDEFINED; - - per_plane_pipe_mcache_regs->mall.p1.mcache_id_first = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->mall.p1.mcache_id_second = MCACHE_ID_UNASSIGNED; - per_plane_pipe_mcache_regs->mall.p1.split_location = SPLIT_LOCATION_UNDEFINED; -} - -bool dml2_top_mcache_build_mcache_programming(struct dml2_build_mcache_programming_in_out *params) -{ - bool success = true; - int config_index, pipe_index; - int first_offset, second_offset; - int free_per_plane_reg_index = 0; - - memset(params->per_plane_pipe_mcache_regs, 0, DML2_MAX_PLANES * DML2_MAX_DCN_PIPES * sizeof(struct dml2_hubp_pipe_mcache_regs *)); - - for (config_index = 0; config_index < params->num_configurations; config_index++) { - for (pipe_index = 0; pipe_index < params->mcache_configurations[config_index].num_pipes; pipe_index++) { - // Allocate storage for the mcache regs - params->per_plane_pipe_mcache_regs[config_index][pipe_index] = ¶ms->mcache_regs_set[free_per_plane_reg_index++]; - - reset_mcache_allocations(params->per_plane_pipe_mcache_regs[config_index][pipe_index]); - - if (params->mcache_configurations[config_index].plane_descriptor->surface.dcc.enable) { - // P0 always enabled - if (!calculate_first_second_splitting(params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0, - params->mcache_configurations[config_index].mcache_allocation->num_mcaches_plane0, - 0, - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_x_start, - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_x_start + - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane0.viewport_width - 1, - &first_offset, &second_offset)) { - success = false; - break; - } - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.mcache_id_first = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane0[first_offset]; - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.mcache_id_first = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane0[first_offset]; - - if (second_offset >= 0) { - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.mcache_id_second = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane0[second_offset]; - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p0.split_location = - params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0[first_offset] - 1; - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.mcache_id_second = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane0[second_offset]; - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p0.split_location = - params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane0[first_offset] - 1; - } - - // Populate P1 if enabled - if (params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1_enabled) { - if (!calculate_first_second_splitting(params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1, - params->mcache_configurations[config_index].mcache_allocation->num_mcaches_plane1, - 0, - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_x_start, - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_x_start + - params->mcache_configurations[config_index].pipe_configurations[pipe_index].plane1.viewport_width - 1, - &first_offset, &second_offset)) { - success = false; - break; - } - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.mcache_id_first = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane1[first_offset]; - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.mcache_id_first = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane1[first_offset]; - - if (second_offset >= 0) { - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.mcache_id_second = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_plane1[second_offset]; - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->main.p1.split_location = - params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1[first_offset] - 1; - - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.mcache_id_second = - params->mcache_configurations[config_index].mcache_allocation->global_mcache_ids_mall_plane1[second_offset]; - params->per_plane_pipe_mcache_regs[config_index][pipe_index]->mall.p1.split_location = - params->mcache_configurations[config_index].mcache_allocation->mcache_x_offsets_plane1[first_offset] - 1; - } - } - } - } - } - - return success; -} - -void dml2_top_mcache_assign_global_mcache_ids(struct top_mcache_assign_global_mcache_ids_in_out *params) -{ - int i; - unsigned int j; - int next_unused_cache_id = 0; - - for (i = 0; i < params->num_allocations; i++) { - if (!params->allocations[i].valid) - continue; - - for (j = 0; j < params->allocations[i].num_mcaches_plane0; j++) { - params->allocations[i].global_mcache_ids_plane0[j] = next_unused_cache_id++; - } - for (j = 0; j < params->allocations[i].num_mcaches_plane1; j++) { - params->allocations[i].global_mcache_ids_plane1[j] = next_unused_cache_id++; - } - - // The "psuedo-last" slice is always wrapped around - params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0] = - params->allocations[i].global_mcache_ids_plane0[0]; - params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1] = - params->allocations[i].global_mcache_ids_plane1[0]; - - // If we need dedicated caches for mall requesting, then we assign them here. - if (params->allocations[i].requires_dedicated_mall_mcache) { - for (j = 0; j < params->allocations[i].num_mcaches_plane0; j++) { - params->allocations[i].global_mcache_ids_mall_plane0[j] = next_unused_cache_id++; - } - for (j = 0; j < params->allocations[i].num_mcaches_plane1; j++) { - params->allocations[i].global_mcache_ids_mall_plane1[j] = next_unused_cache_id++; - } - - // The "psuedo-last" slice is always wrapped around - params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0] = - params->allocations[i].global_mcache_ids_mall_plane0[0]; - params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1] = - params->allocations[i].global_mcache_ids_mall_plane1[0]; - } - - // If P0 and P1 are sharing caches, then it means the largest mcache IDs for p0 and p1 can be the same - // since mcache IDs are always ascending, then it means the largest mcacheID of p1 should be the - // largest mcacheID of P0 - if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].num_mcaches_plane1 > 0 && - params->allocations[i].last_slice_sharing.plane0_plane1) { - params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1 - 1] = - params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0 - 1]; - } - - // If we need dedicated caches handle last slice sharing - if (params->allocations[i].requires_dedicated_mall_mcache) { - if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].num_mcaches_plane1 > 0 && - params->allocations[i].last_slice_sharing.plane0_plane1) { - params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1 - 1] = - params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0 - 1]; - } - // If mall_comb_mcache_l is set then it means that largest mcache ID for MALL p0 can be same as regular read p0 - if (params->allocations[i].num_mcaches_plane0 > 0 && params->allocations[i].last_slice_sharing.mall_comb_mcache_p0) { - params->allocations[i].global_mcache_ids_mall_plane0[params->allocations[i].num_mcaches_plane0 - 1] = - params->allocations[i].global_mcache_ids_plane0[params->allocations[i].num_mcaches_plane0 - 1]; - } - // If mall_comb_mcache_c is set then it means that largest mcache ID for MALL p1 can be same as regular - // read p1 (which can be same as regular read p0 if plane0_plane1 is also set) - if (params->allocations[i].num_mcaches_plane1 > 0 && params->allocations[i].last_slice_sharing.mall_comb_mcache_p1) { - params->allocations[i].global_mcache_ids_mall_plane1[params->allocations[i].num_mcaches_plane1 - 1] = - params->allocations[i].global_mcache_ids_plane1[params->allocations[i].num_mcaches_plane1 - 1]; - } - } - - // If you don't need dedicated mall mcaches, the mall mcache assignments are identical to the normal requesting - if (!params->allocations[i].requires_dedicated_mall_mcache) { - memcpy(params->allocations[i].global_mcache_ids_mall_plane0, params->allocations[i].global_mcache_ids_plane0, - sizeof(params->allocations[i].global_mcache_ids_mall_plane0)); - memcpy(params->allocations[i].global_mcache_ids_mall_plane1, params->allocations[i].global_mcache_ids_plane1, - sizeof(params->allocations[i].global_mcache_ids_mall_plane1)); - } - } -} - -bool dml2_top_mcache_calc_mcache_count_and_offsets(struct top_mcache_calc_mcache_count_and_offsets_in_out *params) -{ - struct dml2_instance *dml = (struct dml2_instance *)params->dml2_instance; - struct dml2_top_mcache_verify_mcache_size_locals *l = &dml->scratch.mcache_verify_mcache_size_locals; - - unsigned int total_mcaches_required; - unsigned int i; - bool result = false; - - if (dml->soc_bbox.num_dcc_mcaches == 0) { - return true; - } - - total_mcaches_required = 0; - l->calc_mcache_params.instance = &dml->core_instance; - for (i = 0; i < params->display_config->num_planes; i++) { - if (!params->display_config->plane_descriptors[i].surface.dcc.enable) { - memset(¶ms->mcache_allocations[i], 0, sizeof(struct dml2_mcache_surface_allocation)); - continue; - } - - l->calc_mcache_params.plane_descriptor = ¶ms->display_config->plane_descriptors[i]; - l->calc_mcache_params.mcache_allocation = ¶ms->mcache_allocations[i]; - l->calc_mcache_params.plane_index = i; - - if (!dml->core_instance.calculate_mcache_allocation(&l->calc_mcache_params)) { - result = false; - break; - } - - if (params->mcache_allocations[i].valid) { - total_mcaches_required += params->mcache_allocations[i].num_mcaches_plane0 + params->mcache_allocations[i].num_mcaches_plane1; - if (params->mcache_allocations[i].last_slice_sharing.plane0_plane1) - total_mcaches_required--; - } - } - dml2_printf("DML_CORE_DCN3::%s: plane_%d, total_mcaches_required=%d\n", __func__, i, total_mcaches_required); - - if (total_mcaches_required > dml->soc_bbox.num_dcc_mcaches) { - result = false; - } else { - result = true; - } - - return result; -} diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.c index e9b8e10695ae0e219ab708c06a6bf94b8d1468a6..f95c7ff56f152ca13ec78dee53333aa3018cc8fd 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.c @@ -4,6 +4,11 @@ #include "dml2_debug.h" +int dml2_log_internal(const char *format, ...) +{ + return 0; +} + int dml2_printf(const char *format, ...) { #ifdef _DEBUG diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.h index d51a1b6c62f263fae6e7f3f6ce6acf4a20460cf0..a27792b56f7e9c190775fc4e835e418c6382fed3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_debug.h @@ -8,9 +8,53 @@ #ifdef _DEBUG #define DML2_ASSERT(condition) dml2_assert(condition) #else -#define DML2_ASSERT(condition) +#define DML2_ASSERT(condition) ((void)0) +#endif +/* + * DML_LOG_FATAL - fatal errors for unrecoverable DML states until a restart. + * DML_LOG_ERROR - unexpected but recoverable failures inside DML + * DML_LOG_WARN - unexpected inputs or events to DML + * DML_LOG_INFO - high level tracing of DML interfaces + * DML_LOG_DEBUG - detailed tracing of DML internal components + * DML_LOG_VERBOSE - detailed tracing of DML calculation procedure + */ +#if !defined(DML_LOG_LEVEL) +#if defined(_DEBUG) && defined(_DEBUG_PRINTS) +/* for backward compatibility with old macros */ +#define DML_LOG_LEVEL 5 +#else +#define DML_LOG_LEVEL 0 +#endif +#endif + +#define DML_LOG_FATAL(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#if DML_LOG_LEVEL >= 1 +#define DML_LOG_ERROR(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#else +#define DML_LOG_ERROR(fmt, ...) ((void)0) +#endif +#if DML_LOG_LEVEL >= 2 +#define DML_LOG_WARN(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#else +#define DML_LOG_WARN(fmt, ...) ((void)0) +#endif +#if DML_LOG_LEVEL >= 3 +#define DML_LOG_INFO(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#else +#define DML_LOG_INFO(fmt, ...) ((void)0) +#endif +#if DML_LOG_LEVEL >= 4 +#define DML_LOG_DEBUG(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#else +#define DML_LOG_DEBUG(fmt, ...) ((void)0) +#endif +#if DML_LOG_LEVEL >= 5 +#define DML_LOG_VERBOSE(fmt, ...) dml2_log_internal(fmt, ## __VA_ARGS__) +#else +#define DML_LOG_VERBOSE(fmt, ...) ((void)0) #endif +int dml2_log_internal(const char *format, ...); int dml2_printf(const char *format, ...); void dml2_assert(int condition); diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_internal_shared_types.h b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_internal_shared_types.h index aeac9f159fa5cd8868c6b4fcdb4e49896c43b58b..d94b310d6eec28390096f303320a12832acf10d0 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_internal_shared_types.h +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/inc/dml2_internal_shared_types.h @@ -8,7 +8,6 @@ #include "dml2_external_lib_deps.h" #include "dml_top_types.h" #include "dml2_core_shared_types.h" - /* * DML2 MCG Types and Interfaces */ @@ -63,7 +62,6 @@ struct dml2_mcg_build_min_clock_table_params_in_out { */ struct dml2_mcg_min_clock_table *min_clk_table; }; - struct dml2_mcg_instance { bool (*build_min_clock_table)(struct dml2_mcg_build_min_clock_table_params_in_out *in_out); bool (*unit_test)(void); @@ -81,7 +79,6 @@ struct dml2_dpmm_map_mode_to_soc_dpm_params_in_out { struct dml2_soc_bb *soc_bb; struct dml2_mcg_min_clock_table *min_clk_table; const struct display_configuation_with_meta *display_cfg; - struct { bool perform_pseudo_map; struct dml2_core_internal_soc_bb *soc_bb; @@ -309,7 +306,7 @@ struct dml2_optimization_stage3_state { // The pstate support mode for each plane // The number of valid elements == display_cfg.num_planes // The indexing of pstate_switch_modes matches plane_descriptors[] - enum dml2_uclk_pstate_support_method pstate_switch_modes[DML2_MAX_PLANES]; + enum dml2_pstate_method pstate_switch_modes[DML2_MAX_PLANES]; // Meta-data for implicit SVP generation, indexed by stream index struct dml2_implicit_svp_meta stream_svp_meta[DML2_MAX_PLANES]; @@ -356,6 +353,12 @@ struct display_configuation_with_meta { struct dml2_optimization_stage5_state stage5; }; +struct dml2_pmo_pstate_strategy { + enum dml2_pstate_method per_stream_pstate_method[DML2_MAX_PLANES]; + bool allow_state_increase; +}; + + struct dml2_core_mode_support_in_out { /* * Inputs @@ -365,7 +368,6 @@ struct dml2_core_mode_support_in_out { struct dml2_mcg_min_clock_table *min_clk_table; int min_clk_index; - /* * Outputs */ @@ -395,7 +397,6 @@ struct dml2_core_mode_programming_in_out { struct dml2_core_instance *instance; const struct display_configuation_with_meta *display_cfg; const struct core_display_cfg_support_info *cfg_support_info; - /* * Outputs (also Input the clk freq are also from programming struct) */ @@ -445,6 +446,7 @@ struct dml2_core_internal_state_intermediates { struct dml2_core_mode_support_locals { struct dml2_core_calcs_mode_support_ex mode_support_ex_params; struct dml2_display_cfg svp_expanded_display_cfg; + struct dml2_calculate_mcache_allocation_in_out calc_mcache_allocation_params; }; struct dml2_core_mode_programming_locals { @@ -600,34 +602,11 @@ struct dml2_pmo_optimize_for_stutter_in_out { struct display_configuation_with_meta *optimized_display_config; }; -enum dml2_pmo_pstate_method { - dml2_pmo_pstate_strategy_na = 0, - /* hw exclusive modes */ - dml2_pmo_pstate_strategy_vactive = 1, - dml2_pmo_pstate_strategy_vblank = 2, - dml2_pmo_pstate_strategy_reserved_hw = 5, - /* fw assisted exclusive modes */ - dml2_pmo_pstate_strategy_fw_svp = 6, - dml2_pmo_pstate_strategy_reserved_fw = 10, - /* fw assisted modes requiring drr modulation */ - dml2_pmo_pstate_strategy_fw_vactive_drr = 11, - dml2_pmo_pstate_strategy_fw_vblank_drr = 12, - dml2_pmo_pstate_strategy_fw_svp_drr = 13, - dml2_pmo_pstate_strategy_reserved_fw_drr_clamped = 20, - dml2_pmo_pstate_strategy_fw_drr = 21, - dml2_pmo_pstate_strategy_reserved_fw_drr_var = 22, -}; - -struct dml2_pmo_pstate_strategy { - enum dml2_pmo_pstate_method per_stream_pstate_method[DML2_MAX_PLANES]; - bool allow_state_increase; -}; - -#define PMO_NO_DRR_STRATEGY_MASK (((1 << (dml2_pmo_pstate_strategy_reserved_fw - dml2_pmo_pstate_strategy_na + 1)) - 1) << dml2_pmo_pstate_strategy_na) -#define PMO_DRR_STRATEGY_MASK (((1 << (dml2_pmo_pstate_strategy_reserved_fw_drr_var - dml2_pmo_pstate_strategy_fw_vactive_drr + 1)) - 1) << dml2_pmo_pstate_strategy_fw_vactive_drr) -#define PMO_DRR_CLAMPED_STRATEGY_MASK (((1 << (dml2_pmo_pstate_strategy_reserved_fw_drr_clamped - dml2_pmo_pstate_strategy_fw_vactive_drr + 1)) - 1) << dml2_pmo_pstate_strategy_fw_vactive_drr) -#define PMO_DRR_VAR_STRATEGY_MASK (((1 << (dml2_pmo_pstate_strategy_reserved_fw_drr_var - dml2_pmo_pstate_strategy_fw_drr + 1)) - 1) << dml2_pmo_pstate_strategy_fw_drr) -#define PMO_FW_STRATEGY_MASK (((1 << (dml2_pmo_pstate_strategy_reserved_fw_drr_var - dml2_pmo_pstate_strategy_fw_svp + 1)) - 1) << dml2_pmo_pstate_strategy_fw_svp) +#define PMO_NO_DRR_STRATEGY_MASK (((1 << (dml2_pstate_method_reserved_fw - dml2_pstate_method_na + 1)) - 1) << dml2_pstate_method_na) +#define PMO_DRR_STRATEGY_MASK (((1 << (dml2_pstate_method_reserved_fw_drr_var - dml2_pstate_method_fw_vactive_drr + 1)) - 1) << dml2_pstate_method_fw_vactive_drr) +#define PMO_DRR_CLAMPED_STRATEGY_MASK (((1 << (dml2_pstate_method_reserved_fw_drr_clamped - dml2_pstate_method_fw_vactive_drr + 1)) - 1) << dml2_pstate_method_fw_vactive_drr) +#define PMO_DRR_VAR_STRATEGY_MASK (((1 << (dml2_pstate_method_reserved_fw_drr_var - dml2_pstate_method_fw_drr + 1)) - 1) << dml2_pstate_method_fw_drr) +#define PMO_FW_STRATEGY_MASK (((1 << (dml2_pstate_method_reserved_fw_drr_var - dml2_pstate_method_fw_svp + 1)) - 1) << dml2_pstate_method_fw_svp) #define PMO_DCN4_MAX_DISPLAYS 4 #define PMO_DCN4_MAX_NUM_VARIANTS 2 @@ -645,6 +624,8 @@ struct dml2_pmo_scratch { int stream_mask; } pmo_dcn3; struct { + struct dml2_pmo_pstate_strategy expanded_override_strategy_list[2 * 2 * 2 * 2]; + unsigned int num_expanded_override_strategies; struct dml2_pmo_pstate_strategy pstate_strategy_candidates[DML2_PMO_PSTATE_CANDIDATE_LIST_SIZE]; int num_pstate_candidates; int cur_pstate_candidate; @@ -706,7 +687,6 @@ struct dml2_pmo_instance { int mpc_combine_limit; int odm_combine_limit; int mcg_clock_table_size; - union { struct { struct { @@ -963,7 +943,13 @@ struct dml2_top_mcache_validate_admissability_locals { struct dml2_top_display_cfg_support_info { const struct dml2_display_cfg *display_config; struct core_display_cfg_support_info core_info; - enum dml2_pstate_support_method per_plane_pstate_method[DML2_MAX_PLANES]; +}; + +struct dml2_top_funcs { + bool (*check_mode_supported)(struct dml2_check_mode_supported_in_out *in_out); + bool (*build_mode_programming)(struct dml2_build_mode_programming_in_out *in_out); + bool (*build_mcache_programming)(struct dml2_build_mcache_programming_in_out *in_out); + bool (*unit_test)(void); }; struct dml2_instance { @@ -978,8 +964,8 @@ struct dml2_instance { struct dml2_ip_capabilities ip_caps; struct dml2_mcg_min_clock_table min_clk_table; - struct dml2_pmo_options pmo_options; + struct dml2_top_funcs funcs; struct { struct dml2_initialize_instance_locals initialize_instance_locals; diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c index bde4250853b10cf94e6cf55cae897ca3c33a116b..b416320873e11142d4ab390495d563d23dc15cc7 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c @@ -553,13 +553,53 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc, } } - dml2_policy_build_synthetic_soc_states(s, p); - if (dml2->v20.dml_core_ctx.project == dml_project_dcn35) { - // Override last out_state with data from last in_state - // This will ensure that out_state contains max fclk - memcpy(&p->out_states->state_array[p->out_states->num_states - 1], - &p->in_states->state_array[p->in_states->num_states - 1], - sizeof(struct soc_state_bounding_box_st)); + if (dml2->v20.dml_core_ctx.project == dml_project_dcn35 || + dml2->v20.dml_core_ctx.project == dml_project_dcn351) { + int max_dcfclk_mhz = 0, max_dispclk_mhz = 0, max_dppclk_mhz = 0, max_phyclk_mhz = 0, + max_dtbclk_mhz = 0, max_fclk_mhz = 0, max_uclk_mhz = 0, max_socclk_mhz = 0; + + for (i = 0; i < p->in_states->num_states; i++) { + if (p->in_states->state_array[i].dcfclk_mhz > max_dcfclk_mhz) + max_dcfclk_mhz = (int)p->in_states->state_array[i].dcfclk_mhz; + if (p->in_states->state_array[i].fabricclk_mhz > max_fclk_mhz) + max_fclk_mhz = (int)p->in_states->state_array[i].fabricclk_mhz; + if (p->in_states->state_array[i].socclk_mhz > max_socclk_mhz) + max_socclk_mhz = (int)p->in_states->state_array[i].socclk_mhz; + if (p->in_states->state_array[i].dram_speed_mts > max_uclk_mhz) + max_uclk_mhz = (int)p->in_states->state_array[i].dram_speed_mts; + if (p->in_states->state_array[i].dispclk_mhz > max_dispclk_mhz) + max_dispclk_mhz = (int)p->in_states->state_array[i].dispclk_mhz; + if (p->in_states->state_array[i].dppclk_mhz > max_dppclk_mhz) + max_dppclk_mhz = (int)p->in_states->state_array[i].dppclk_mhz; + if (p->in_states->state_array[i].phyclk_mhz > max_phyclk_mhz) + max_phyclk_mhz = (int)p->in_states->state_array[i].phyclk_mhz; + if (p->in_states->state_array[i].dtbclk_mhz > max_dtbclk_mhz) + max_dtbclk_mhz = (int)p->in_states->state_array[i].dtbclk_mhz; + } + + for (i = 0; i < p->in_states->num_states; i++) { + /* Independent states - including base (unlisted) parameters from state 0. */ + p->out_states->state_array[i] = p->in_states->state_array[0]; + + p->out_states->state_array[i].dispclk_mhz = max_dispclk_mhz; + p->out_states->state_array[i].dppclk_mhz = max_dppclk_mhz; + p->out_states->state_array[i].dtbclk_mhz = max_dtbclk_mhz; + p->out_states->state_array[i].phyclk_mhz = max_phyclk_mhz; + + p->out_states->state_array[i].dscclk_mhz = max_dispclk_mhz / 3.0; + p->out_states->state_array[i].phyclk_mhz = max_phyclk_mhz; + p->out_states->state_array[i].dtbclk_mhz = max_dtbclk_mhz; + + /* Dependent states. */ + p->out_states->state_array[i].dram_speed_mts = p->in_states->state_array[i].dram_speed_mts; + p->out_states->state_array[i].fabricclk_mhz = p->in_states->state_array[i].fabricclk_mhz; + p->out_states->state_array[i].socclk_mhz = p->in_states->state_array[i].socclk_mhz; + p->out_states->state_array[i].dcfclk_mhz = p->in_states->state_array[i].dcfclk_mhz; + } + + p->out_states->num_states = p->in_states->num_states; + } else { + dml2_policy_build_synthetic_soc_states(s, p); } } diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c index 9190c1328d5b2df9a747414ffdb9f313dd7923b9..340791d40ecbfd8581435f2ebd7cdc54594b132f 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c @@ -531,14 +531,21 @@ static bool optimize_pstate_with_svp_and_drr(struct dml2_context *dml2, struct d static bool call_dml_mode_support_and_programming(struct dc_state *context) { unsigned int result = 0; - unsigned int min_state; + unsigned int min_state = 0; int min_state_for_g6_temp_read = 0; + + + if (!context) + return false; + struct dml2_context *dml2 = context->bw_ctx.dml2; struct dml2_wrapper_scratch *s = &dml2->v20.scratch; - min_state_for_g6_temp_read = calculate_lowest_supported_state_for_temp_read(dml2, context); + if (!context->streams[0]->sink->link->dc->caps.is_apu) { + min_state_for_g6_temp_read = calculate_lowest_supported_state_for_temp_read(dml2, context); - ASSERT(min_state_for_g6_temp_read >= 0); + ASSERT(min_state_for_g6_temp_read >= 0); + } if (!dml2->config.use_native_pstate_optimization) { result = optimize_pstate_with_svp_and_drr(dml2, context); @@ -549,14 +556,20 @@ static bool call_dml_mode_support_and_programming(struct dc_state *context) /* Upon trying to sett certain frequencies in FRL, min_state_for_g6_temp_read is reported as -1. This leads to an invalid value of min_state causing crashes later on. * Use the default logic for min_state only when min_state_for_g6_temp_read is a valid value. In other cases, use the value calculated by the DML directly. */ - if (min_state_for_g6_temp_read >= 0) - min_state = min_state_for_g6_temp_read > s->mode_support_params.out_lowest_state_idx ? min_state_for_g6_temp_read : s->mode_support_params.out_lowest_state_idx; - else - min_state = s->mode_support_params.out_lowest_state_idx; - - if (result) - result = dml_mode_programming(&dml2->v20.dml_core_ctx, min_state, &s->cur_display_config, true); + if (!context->streams[0]->sink->link->dc->caps.is_apu) { + if (min_state_for_g6_temp_read >= 0) + min_state = min_state_for_g6_temp_read > s->mode_support_params.out_lowest_state_idx ? min_state_for_g6_temp_read : s->mode_support_params.out_lowest_state_idx; + else + min_state = s->mode_support_params.out_lowest_state_idx; + } + if (result) { + if (!context->streams[0]->sink->link->dc->caps.is_apu) { + result = dml_mode_programming(&dml2->v20.dml_core_ctx, min_state, &s->cur_display_config, true); + } else { + result = dml_mode_programming(&dml2->v20.dml_core_ctx, s->mode_support_params.out_lowest_state_idx, &s->cur_display_config, true); + } + } return result; } @@ -685,6 +698,8 @@ static bool dml2_validate_only(struct dc_state *context) build_unoptimized_policy_settings(dml2->v20.dml_core_ctx.project, &dml2->v20.dml_core_ctx.policy); map_dc_state_into_dml_display_cfg(dml2, context, &dml2->v20.scratch.cur_display_config); + if (!dml2->config.skip_hw_state_mapping) + dml2_apply_det_buffer_allocation_policy(dml2, &dml2->v20.scratch.cur_display_config); result = pack_and_call_dml_mode_support_ex(dml2, &dml2->v20.scratch.cur_display_config, diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml_display_rq_dlg_calc.c b/drivers/gpu/drm/amd/display/dc/dml2/dml_display_rq_dlg_calc.c index 377ef6d01ae5d3027ce8db1424734936dc0ca624..00d22e5424699287efd5f34788ed84e17527ffbf 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml_display_rq_dlg_calc.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml_display_rq_dlg_calc.c @@ -427,18 +427,6 @@ void dml_rq_dlg_get_dlg_reg(dml_display_dlg_regs_st *disp_dlg_regs, dml_print("DML_DLG: %s: disp_dlg_regs->dst_y_per_vm_flip = 0x%x\n", __func__, disp_dlg_regs->dst_y_per_vm_flip); dml_print("DML_DLG: %s: disp_dlg_regs->dst_y_per_row_flip = 0x%x\n", __func__, disp_dlg_regs->dst_y_per_row_flip); - // hack for FPGA - /* NOTE: We dont have getenv defined in driver and it does not make any sense in the driver */ - /*char* fpga_env = getenv("FPGA_FPDIV"); - if(fpga_env !=NULL) - { - if(disp_dlg_regs->vratio_prefetch >= (dml_uint_t)dml_pow(2, 22)) - { - disp_dlg_regs->vratio_prefetch = (dml_uint_t)dml_pow(2, 22)-1; - dml_print("FPGA msg: vratio_prefetch exceed the max value, the register field is [21:0]\n"); - } - }*/ - disp_dlg_regs->refcyc_per_vm_group_vblank = (dml_uint_t)(dml_get_refcyc_per_vm_group_vblank_in_us(mode_lib, pipe_idx) * refclk_freq_in_mhz); disp_dlg_regs->refcyc_per_vm_group_flip = (dml_uint_t)(dml_get_refcyc_per_vm_group_flip_in_us(mode_lib, pipe_idx) * refclk_freq_in_mhz); disp_dlg_regs->refcyc_per_vm_req_vblank = (dml_uint_t)(dml_get_refcyc_per_vm_req_vblank_in_us(mode_lib, pipe_idx) * refclk_freq_in_mhz * dml_pow(2, 10)); diff --git a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.c b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.c index fae98cf520201d5afe11c8c17e57d0a5c6dc1a9e..bc058f6824385ef6477f5d3a0aef2f395fac50a7 100644 --- a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.c +++ b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.c @@ -270,16 +270,3 @@ void dcn30_dwbc_construct(struct dcn30_dwbc *dwbc30, dwbc30->dwbc_shift = dwbc_shift; dwbc30->dwbc_mask = dwbc_mask; } - -void dwb3_set_host_read_rate_control(struct dwbc *dwbc, bool host_read_delay) -{ - struct dcn30_dwbc *dwbc30 = TO_DCN30_DWBC(dwbc); - - /* - * Set maximum delay of host read access to DWBSCL LUT or OGAM LUT if there are no - * idle cycles in HW pipeline (in number of clock cycles times 4) - */ - REG_UPDATE(DWB_HOST_READ_CONTROL, DWB_HOST_READ_RATE_CONTROL, host_read_delay); - - DC_LOG_DWB("%s dwb3_rate_control at inst = %d", __func__, dwbc->inst); -} diff --git a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.h b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.h index 0f3f7c5fbaecf95222a3d3da64404e7867b7826d..7f053f49ec6a3204b0c21c607db526f999ea5725 100644 --- a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.h +++ b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_dwb.h @@ -914,7 +914,6 @@ bool dwb3_ogam_set_input_transfer_func( struct dwbc *dwbc, const struct dc_transfer_func *in_transfer_func_dwb_ogam); -void dwb3_set_host_read_rate_control(struct dwbc *dwbc, bool host_read_delay); #endif diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.c index 22ac2b7e49aeae66e6ce182f77be943df14151f4..f0ba944553df596cd2550cd107d06557b095d95e 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.c @@ -518,6 +518,20 @@ bool hubp1_program_surface_flip_and_addr( return true; } +void hubp1_clear_tiling(struct hubp *hubp) +{ + struct dcn10_hubp *hubp1 = TO_DCN10_HUBP(hubp); + + REG_UPDATE(DCHUBP_REQ_SIZE_CONFIG, SWATH_HEIGHT, 0); + REG_UPDATE(DCSURF_TILING_CONFIG, SW_MODE, DC_SW_LINEAR); + + REG_UPDATE_4(DCSURF_SURFACE_CONTROL, + PRIMARY_SURFACE_DCC_EN, 0, + PRIMARY_SURFACE_DCC_IND_64B_BLK, 0, + SECONDARY_SURFACE_DCC_EN, 0, + SECONDARY_SURFACE_DCC_IND_64B_BLK, 0); +} + void hubp1_dcc_control(struct hubp *hubp, bool enable, enum hubp_ind_block_size independent_64b_blks) { @@ -1363,6 +1377,7 @@ static const struct hubp_funcs dcn10_hubp_funcs = { .hubp_disable_control = hubp1_disable_control, .hubp_get_underflow_status = hubp1_get_underflow_status, .hubp_init = hubp1_init, + .hubp_clear_tiling = hubp1_clear_tiling, .dmdata_set_attributes = NULL, .dmdata_load = NULL, diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.h b/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.h index 69119b2fdce23b878f9148e62cfa37a7820ac30f..631350cd4f2ed6963d53dcced67f2edda1c60024 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.h +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn10/dcn10_hubp.h @@ -794,4 +794,6 @@ void hubp1_soft_reset(struct hubp *hubp, bool reset); void hubp1_set_flip_int(struct hubp *hubp); +void hubp1_clear_tiling(struct hubp *hubp); + #endif diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c index 0637e4c552d8a29cda4bb087b391b2d6d7305805..200194544bf0c25c0b36c4aa87f2cb9ea181d5f2 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.c @@ -406,6 +406,20 @@ void hubp2_program_rotation( H_MIRROR_EN, mirror); } +void hubp2_clear_tiling(struct hubp *hubp) +{ + struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); + + REG_UPDATE(DCHUBP_REQ_SIZE_CONFIG, SWATH_HEIGHT, 0); + REG_UPDATE(DCSURF_TILING_CONFIG, SW_MODE, DC_SW_LINEAR); + + REG_UPDATE_4(DCSURF_SURFACE_CONTROL, + PRIMARY_SURFACE_DCC_EN, 0, + PRIMARY_SURFACE_DCC_IND_64B_BLK, 0, + SECONDARY_SURFACE_DCC_EN, 0, + SECONDARY_SURFACE_DCC_IND_64B_BLK, 0); +} + void hubp2_dcc_control(struct hubp *hubp, bool enable, enum hubp_ind_block_size independent_64b_blks) { @@ -1676,6 +1690,7 @@ static struct hubp_funcs dcn20_hubp_funcs = { .hubp_in_blank = hubp1_in_blank, .hubp_soft_reset = hubp1_soft_reset, .hubp_set_flip_int = hubp1_set_flip_int, + .hubp_clear_tiling = hubp2_clear_tiling, }; diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.h b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.h index 18e194507e36ddad3a3106c55a136bc4dd7c663f..7fd9240868c3441fe84b07a031557e8106d88d57 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.h +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn20/dcn20_hubp.h @@ -409,6 +409,8 @@ void hubp2_read_state_common(struct hubp *hubp); void hubp2_read_state(struct hubp *hubp); +void hubp2_clear_tiling(struct hubp *hubp); + #endif /* __DC_MEM_INPUT_DCN20_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn201/dcn201_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn201/dcn201_hubp.c index cd2bfcc51276503db21c0b7e5d8567c53b16a2ae..d910e4a54c34abedc9d9431d6951527b0eadaf0b 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn201/dcn201_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn201/dcn201_hubp.c @@ -131,6 +131,7 @@ static struct hubp_funcs dcn201_hubp_funcs = { .hubp_clear_underflow = hubp1_clear_underflow, .hubp_set_flip_control_surface_gsl = hubp2_set_flip_control_surface_gsl, .hubp_init = hubp1_init, + .hubp_clear_tiling = hubp1_clear_tiling, }; bool dcn201_hubp_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn21/dcn21_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn21/dcn21_hubp.c index e13d69a22c1c7fc91fe69696408a46951deb1ceb..edbdb8c88d5c8527d7a2e207cc18a68df714f4a4 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn21/dcn21_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn21/dcn21_hubp.c @@ -837,6 +837,7 @@ static struct hubp_funcs dcn21_hubp_funcs = { .hubp_init = hubp21_init, .validate_dml_output = hubp21_validate_dml_output, .hubp_set_flip_int = hubp1_set_flip_int, + .hubp_clear_tiling = hubp1_clear_tiling, }; bool hubp21_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.c index 60a64d290352743f2d6ad1148be45d275df89bd0..3b16c3cda2c3e7ab4d13f8a9da2e970ffc9466f1 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.c @@ -334,6 +334,22 @@ void hubp3_program_tiling( } +void hubp3_clear_tiling(struct hubp *hubp) +{ + struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); + + REG_UPDATE(DCHUBP_REQ_SIZE_CONFIG, SWATH_HEIGHT, 0); + REG_UPDATE(DCSURF_TILING_CONFIG, SW_MODE, DC_SW_LINEAR); + + REG_UPDATE_6(DCSURF_SURFACE_CONTROL, + PRIMARY_SURFACE_DCC_EN, 0, + PRIMARY_SURFACE_DCC_IND_BLK, 0, + PRIMARY_SURFACE_DCC_IND_BLK_C, 0, + SECONDARY_SURFACE_DCC_EN, 0, + SECONDARY_SURFACE_DCC_IND_BLK, 0, + SECONDARY_SURFACE_DCC_IND_BLK_C, 0); +} + void hubp3_dcc_control(struct hubp *hubp, bool enable, enum hubp_ind_block_size blk_size) { @@ -512,6 +528,7 @@ static struct hubp_funcs dcn30_hubp_funcs = { .hubp_in_blank = hubp1_in_blank, .hubp_soft_reset = hubp1_soft_reset, .hubp_set_flip_int = hubp1_set_flip_int, + .hubp_clear_tiling = hubp3_clear_tiling, }; bool hubp3_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.h b/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.h index b010531a7fe886cd0c393c70ddfadcd0d40d0b39..cfb01bf340a1a6db449deaa94e3f87bc7b75edf4 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.h +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn30/dcn30_hubp.h @@ -297,6 +297,8 @@ void hubp3_read_state(struct hubp *hubp); void hubp3_init(struct hubp *hubp); +void hubp3_clear_tiling(struct hubp *hubp); + #endif /* __DC_HUBP_DCN30_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn31/dcn31_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn31/dcn31_hubp.c index 8394e8c069199fd7d63367a0aeaca723899bf99d..46b804ed05fba10399e13eb0eca85a4d92b0cd71 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn31/dcn31_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn31/dcn31_hubp.c @@ -96,6 +96,7 @@ static struct hubp_funcs dcn31_hubp_funcs = { .hubp_set_flip_int = hubp1_set_flip_int, .hubp_in_blank = hubp1_in_blank, .program_extended_blank = hubp31_program_extended_blank, + .hubp_clear_tiling = hubp3_clear_tiling, }; bool hubp31_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn32/dcn32_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn32/dcn32_hubp.c index ca5b4b28a66441bca370578d11d4e05097db9d4c..8b5bd73b8094a0c37337431b6dcfd513813916bf 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn32/dcn32_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn32/dcn32_hubp.c @@ -201,7 +201,8 @@ static struct hubp_funcs dcn32_hubp_funcs = { .hubp_update_force_cursor_pstate_disallow = hubp32_update_force_cursor_pstate_disallow, .phantom_hubp_post_enable = hubp32_phantom_hubp_post_enable, .hubp_update_mall_sel = hubp32_update_mall_sel, - .hubp_prepare_subvp_buffering = hubp32_prepare_subvp_buffering + .hubp_prepare_subvp_buffering = hubp32_prepare_subvp_buffering, + .hubp_clear_tiling = hubp3_clear_tiling, }; bool hubp32_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn35/dcn35_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn35/dcn35_hubp.c index d1f05b82b3dd5cb37db23abf994d2e2c6421bc07..eb62042dfafc2d5fcb445ad3c8d0b0a6324fe12b 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn35/dcn35_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn35/dcn35_hubp.c @@ -216,6 +216,7 @@ static struct hubp_funcs dcn35_hubp_funcs = { .hubp_set_flip_int = hubp1_set_flip_int, .hubp_in_blank = hubp1_in_blank, .program_extended_blank = hubp31_program_extended_blank_value, + .hubp_clear_tiling = hubp3_clear_tiling, }; bool hubp35_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c index b1ebf5053b4fc3286b0d03a1d8f506caa690d195..09f730cfbf8e2af4b6298a046ff744fb694c4544 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.c @@ -40,7 +40,7 @@ #define FN(reg_name, field_name) \ hubp2->hubp_shift->field_name, hubp2->hubp_mask->field_name -static void hubp401_program_3dlut_fl_addr(struct hubp *hubp, +void hubp401_program_3dlut_fl_addr(struct hubp *hubp, const struct dc_plane_address address) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); @@ -49,14 +49,14 @@ static void hubp401_program_3dlut_fl_addr(struct hubp *hubp, REG_WRITE(HUBP_3DLUT_ADDRESS_LOW, address.lut3d.addr.low_part); } -static void hubp401_program_3dlut_fl_dlg_param(struct hubp *hubp, int refcyc_per_3dlut_group) +void hubp401_program_3dlut_fl_dlg_param(struct hubp *hubp, int refcyc_per_3dlut_group) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE(HUBP_3DLUT_DLG_PARAM, REFCYC_PER_3DLUT_GROUP, refcyc_per_3dlut_group); } -static void hubp401_enable_3dlut_fl(struct hubp *hubp, bool enable) +void hubp401_enable_3dlut_fl(struct hubp *hubp, bool enable) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); @@ -72,28 +72,28 @@ int hubp401_get_3dlut_fl_done(struct hubp *hubp) return ret; } -static void hubp401_program_3dlut_fl_addressing_mode(struct hubp *hubp, enum hubp_3dlut_fl_addressing_mode addr_mode) +void hubp401_program_3dlut_fl_addressing_mode(struct hubp *hubp, enum hubp_3dlut_fl_addressing_mode addr_mode) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE(HUBP_3DLUT_CONTROL, HUBP_3DLUT_ADDRESSING_MODE, addr_mode); } -static void hubp401_program_3dlut_fl_width(struct hubp *hubp, enum hubp_3dlut_fl_width width) +void hubp401_program_3dlut_fl_width(struct hubp *hubp, enum hubp_3dlut_fl_width width) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE(HUBP_3DLUT_CONTROL, HUBP_3DLUT_WIDTH, width); } -static void hubp401_program_3dlut_fl_tmz_protected(struct hubp *hubp, bool protection_enabled) +void hubp401_program_3dlut_fl_tmz_protected(struct hubp *hubp, bool protection_enabled) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE(HUBP_3DLUT_CONTROL, HUBP_3DLUT_TMZ, protection_enabled ? 1 : 0); } -static void hubp401_program_3dlut_fl_crossbar(struct hubp *hubp, +void hubp401_program_3dlut_fl_crossbar(struct hubp *hubp, enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_y_g, enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cb_b, enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cr_r) @@ -106,21 +106,21 @@ static void hubp401_program_3dlut_fl_crossbar(struct hubp *hubp, HUBP_3DLUT_CROSSBAR_SELECT_CR_R, bit_slice_cr_r); } -static void hubp401_update_3dlut_fl_bias_scale(struct hubp *hubp, uint16_t bias, uint16_t scale) +void hubp401_update_3dlut_fl_bias_scale(struct hubp *hubp, uint16_t bias, uint16_t scale) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE_2(_3DLUT_FL_BIAS_SCALE, HUBP0_3DLUT_FL_BIAS, bias, HUBP0_3DLUT_FL_SCALE, scale); } -static void hubp401_program_3dlut_fl_mode(struct hubp *hubp, enum hubp_3dlut_fl_mode mode) +void hubp401_program_3dlut_fl_mode(struct hubp *hubp, enum hubp_3dlut_fl_mode mode) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); REG_UPDATE(_3DLUT_FL_CONFIG, HUBP0_3DLUT_FL_MODE, mode); } -static void hubp401_program_3dlut_fl_format(struct hubp *hubp, enum hubp_3dlut_fl_format format) +void hubp401_program_3dlut_fl_format(struct hubp *hubp, enum hubp_3dlut_fl_format format) { struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); @@ -508,6 +508,18 @@ bool hubp401_program_surface_flip_and_addr( return true; } +void hubp401_clear_tiling(struct hubp *hubp) +{ + struct dcn20_hubp *hubp2 = TO_DCN20_HUBP(hubp); + + REG_UPDATE(DCHUBP_REQ_SIZE_CONFIG, SWATH_HEIGHT, 0); + REG_UPDATE(DCSURF_TILING_CONFIG, SW_MODE, DC_SW_LINEAR); + + REG_UPDATE_2(DCSURF_SURFACE_CONTROL, + PRIMARY_SURFACE_DCC_EN, 0, + SECONDARY_SURFACE_DCC_EN, 0); +} + void hubp401_dcc_control(struct hubp *hubp, struct dc_plane_dcc_param *dcc) { @@ -1004,7 +1016,8 @@ static struct hubp_funcs dcn401_hubp_funcs = { .hubp_program_3dlut_fl_width = hubp401_program_3dlut_fl_width, .hubp_program_3dlut_fl_tmz_protected = hubp401_program_3dlut_fl_tmz_protected, .hubp_program_3dlut_fl_crossbar = hubp401_program_3dlut_fl_crossbar, - .hubp_get_3dlut_fl_done = hubp401_get_3dlut_fl_done + .hubp_get_3dlut_fl_done = hubp401_get_3dlut_fl_done, + .hubp_clear_tiling = hubp2_clear_tiling, }; bool hubp401_construct( diff --git a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.h b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.h index e52fdb5b0cd02f7db673657faa5876c7c55104ba..9b200a55bf9d36b30c9f2ef73c2ec95b67e913ee 100644 --- a/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.h +++ b/drivers/gpu/drm/amd/display/dc/hubp/dcn401/dcn401_hubp.h @@ -340,4 +340,29 @@ int hubp401_get_3dlut_fl_done(struct hubp *hubp); void hubp401_set_unbounded_requesting(struct hubp *hubp, bool enable); +void hubp401_update_3dlut_fl_bias_scale(struct hubp *hubp, uint16_t bias, uint16_t scale); + +void hubp401_program_3dlut_fl_crossbar(struct hubp *hubp, + enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_y_g, + enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cb_b, + enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cr_r); + +void hubp401_program_3dlut_fl_tmz_protected(struct hubp *hubp, bool protection_enabled); + +void hubp401_program_3dlut_fl_width(struct hubp *hubp, enum hubp_3dlut_fl_width width); + +void hubp401_program_3dlut_fl_addressing_mode(struct hubp *hubp, enum hubp_3dlut_fl_addressing_mode addr_mode); + +void hubp401_enable_3dlut_fl(struct hubp *hubp, bool enable); + +void hubp401_program_3dlut_fl_dlg_param(struct hubp *hubp, int refcyc_per_3dlut_group); + +void hubp401_program_3dlut_fl_addr(struct hubp *hubp, const struct dc_plane_address address); + +void hubp401_program_3dlut_fl_format(struct hubp *hubp, enum hubp_3dlut_fl_format format); + +void hubp401_program_3dlut_fl_mode(struct hubp *hubp, enum hubp_3dlut_fl_mode mode); + +void hubp401_clear_tiling(struct hubp *hubp); + #endif /* __DC_HUBP_DCN401_H__ */ diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_init.c index 0e8d32e3dbae1db1c9313fb4acd622b0903cc6b2..c32764aef8840062c4099c87e16bbce4378e6f22 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_init.c @@ -86,7 +86,6 @@ static const struct hw_sequencer_funcs dcn30_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn301/dcn301_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn301/dcn301_init.c index 780ce4c064aa5826dcebeab95143127565408722..dcb27cdbce7319ea8c513fa6deeaa963c62d3a1f 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn301/dcn301_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn301/dcn301_init.c @@ -86,7 +86,6 @@ static const struct hw_sequencer_funcs dcn301_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_init.c index 5f8f45b4872050bc04dcaa0115b90f7160103de6..fb2ffb6379317a06220961515edde964fa659229 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_init.c @@ -89,7 +89,6 @@ static const struct hw_sequencer_funcs dcn31_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, @@ -98,7 +97,7 @@ static const struct hw_sequencer_funcs dcn31_funcs = { .set_flip_control_gsl = dcn20_set_flip_control_gsl, .get_vupdate_offset_from_vsync = dcn10_get_vupdate_offset_from_vsync, .calc_vupdate_position = dcn10_calc_vupdate_position, - .set_backlight_level = dcn31_set_backlight_level, + .set_backlight_level = dcn21_set_backlight_level, .set_abm_immediate_disable = dcn21_set_abm_immediate_disable, .set_pipe = dcn21_set_pipe, .enable_lvds_link_output = dce110_enable_lvds_link_output, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_hwseq.c index 9b88eb72086db5595bd998a546acab90cb3ed03b..be26c925fdfa1a969b04a4cc9bbca2e09f0e158c 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_hwseq.c @@ -162,6 +162,8 @@ void dcn314_update_odm(struct dc *dc, struct dc_state *context, struct pipe_ctx int opp_inst[MAX_PIPES] = {0}; int odm_slice_width = resource_get_odm_slice_dst_width(pipe_ctx, false); int last_odm_slice_width = resource_get_odm_slice_dst_width(pipe_ctx, true); + struct mpc *mpc = dc->res_pool->mpc; + int i; opp_cnt = get_odm_config(pipe_ctx, opp_inst); @@ -174,6 +176,16 @@ void dcn314_update_odm(struct dc *dc, struct dc_state *context, struct pipe_ctx pipe_ctx->stream_res.tg->funcs->set_odm_bypass( pipe_ctx->stream_res.tg, &pipe_ctx->stream->timing); + if (mpc->funcs->set_out_rate_control) { + for (i = 0; i < opp_cnt; ++i) { + mpc->funcs->set_out_rate_control( + mpc, opp_inst[i], + false, + 0, + NULL); + } + } + for (odm_pipe = pipe_ctx->next_odm_pipe; odm_pipe; odm_pipe = odm_pipe->next_odm_pipe) { odm_pipe->stream_res.opp->funcs->opp_pipe_clock_control( odm_pipe->stream_res.opp, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_init.c index 6bdfbf22ce8728d2dd4ce064910d68ce176a2899..21ef03a76229a18f81b8d7b4498f10ecde7fc9da 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn314/dcn314_init.c @@ -91,7 +91,6 @@ static const struct hw_sequencer_funcs dcn314_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, @@ -100,7 +99,7 @@ static const struct hw_sequencer_funcs dcn314_funcs = { .set_flip_control_gsl = dcn20_set_flip_control_gsl, .get_vupdate_offset_from_vsync = dcn10_get_vupdate_offset_from_vsync, .calc_vupdate_position = dcn10_calc_vupdate_position, - .set_backlight_level = dcn31_set_backlight_level, + .set_backlight_level = dcn21_set_backlight_level, .set_abm_immediate_disable = dcn21_set_abm_immediate_disable, .set_pipe = dcn21_set_pipe, .enable_lvds_link_output = dce110_enable_lvds_link_output, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c index fa11f075d1f9aaeb51889eb7d9e48d70b6291868..2e6ecf939fbd3aba3507b0d5c4a2928a1ed9abf6 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c @@ -985,6 +985,7 @@ void dcn32_init_hw(struct dc *dc) dc->caps.dmub_caps.subvp_psr = dc->ctx->dmub_srv->dmub->feature_caps.subvp_psr_support; dc->caps.dmub_caps.gecc_enable = dc->ctx->dmub_srv->dmub->feature_caps.gecc_enable; dc->caps.dmub_caps.mclk_sw = dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch_ver; + dc->caps.dmub_caps.aux_backlight_support = dc->ctx->dmub_srv->dmub->feature_caps.abm_aux_backlight_support; /* for DCN401 testing only */ dc->caps.dmub_caps.fams_ver = dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch_ver; diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c index 5ecee7e320da964d966eddca812322f1d96f5791..e4d149eff10f5d78556f6b52e51a721890b6f648 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c @@ -87,7 +87,6 @@ static const struct hw_sequencer_funcs dcn32_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c index e599cdc465bfd2a1936f447696482930d6296846..d5f76cc69c73609e2cba52f1f50362673c374fca 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c @@ -426,6 +426,8 @@ void dcn35_update_odm(struct dc *dc, struct dc_state *context, struct pipe_ctx * int opp_inst[MAX_PIPES] = {0}; int odm_slice_width = resource_get_odm_slice_dst_width(pipe_ctx, false); int last_odm_slice_width = resource_get_odm_slice_dst_width(pipe_ctx, true); + struct mpc *mpc = dc->res_pool->mpc; + int i; opp_cnt = get_odm_config(pipe_ctx, opp_inst); @@ -438,6 +440,16 @@ void dcn35_update_odm(struct dc *dc, struct dc_state *context, struct pipe_ctx * pipe_ctx->stream_res.tg->funcs->set_odm_bypass( pipe_ctx->stream_res.tg, &pipe_ctx->stream->timing); + if (mpc->funcs->set_out_rate_control) { + for (i = 0; i < opp_cnt; ++i) { + mpc->funcs->set_out_rate_control( + mpc, opp_inst[i], + false, + 0, + NULL); + } + } + for (odm_pipe = pipe_ctx->next_odm_pipe; odm_pipe; odm_pipe = odm_pipe->next_odm_pipe) { odm_pipe->stream_res.opp->funcs->opp_pipe_clock_control( odm_pipe->stream_res.opp, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c index fd67779c27a948fe02bddb9477a1f8cc67fa2777..5ca8db2b2d032054a55b81d0eb73612840a2170c 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c @@ -92,7 +92,6 @@ static const struct hw_sequencer_funcs dcn35_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c index 3c275a1eff589b115ea55ad79255b2696ffff4d6..4f73e7f551acac906ccc431de8d044e803b0b65f 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn351/dcn351_init.c @@ -91,7 +91,6 @@ static const struct hw_sequencer_funcs dcn351_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c index 307782592789c86972d11877a8605d4be6f63f6f..8cb0fbd301d8598482a60a73602ab140b028e944 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c @@ -435,7 +435,8 @@ void dcn401_init_hw(struct dc *dc) dc->caps.dmub_caps.psr = dc->ctx->dmub_srv->dmub->feature_caps.psr; dc->caps.dmub_caps.mclk_sw = dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch_ver > 0; dc->caps.dmub_caps.fams_ver = dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch_ver; - dc->debug.fams2_config.bits.enable &= dc->ctx->dmub_srv->dmub->feature_caps.fw_assisted_mclk_switch_ver == 2; + dc->debug.fams2_config.bits.enable &= + dc->caps.dmub_caps.fams_ver == dc->debug.fams_version.ver; // sw & fw fams versions must match for support if ((!dc->debug.fams2_config.bits.enable && dc->res_pool->funcs->update_bw_bounding_box) || res_pool->ref_clocks.dchub_ref_clock_inKhz / 1000 != current_dchub_ref_freq) { /* update bounding box if FAMS2 disabled, or if dchub clk has changed */ @@ -821,7 +822,7 @@ enum dc_status dcn401_enable_stream_timing( int opp_inst[MAX_PIPES] = {0}; struct pipe_ctx *opp_heads[MAX_PIPES] = {0}; struct dc_crtc_timing patched_crtc_timing = stream->timing; - bool manual_mode; + bool manual_mode = false; unsigned int tmds_div = PIXEL_RATE_DIV_NA; unsigned int unused_div = PIXEL_RATE_DIV_NA; int odm_slice_width; diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_init.c index 23e4f208152efd06437132874279f3957cab97d9..b30f665d98a609918edaf3d544cf736d8ce3b091 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_init.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_init.c @@ -66,7 +66,6 @@ static const struct hw_sequencer_funcs dcn401_funcs = { .enable_writeback = dcn30_enable_writeback, .disable_writeback = dcn30_disable_writeback, .update_writeback = dcn30_update_writeback, - .mmhubbub_warmup = dcn30_mmhubbub_warmup, .dmdata_status_done = dcn20_dmdata_status_done, .program_dmdata_engine = dcn30_program_dmdata_engine, .set_dmdata_attributes = dcn20_set_dmdata_attributes, diff --git a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h index 66fdc5805d0a93dcf09b333b2bef54cc9e5b88dc..a5bb10d7b16032ff3dc84539e05ab4f6c651c68b 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h +++ b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h @@ -331,10 +331,6 @@ struct hw_sequencer_funcs { void (*disable_writeback)(struct dc *dc, unsigned int dwb_pipe_inst); - bool (*mmhubbub_warmup)(struct dc *dc, - unsigned int num_dwb, - struct dc_writeback_info *wb_info); - /* Clock Related */ enum dc_status (*set_clock)(struct dc *dc, enum dc_clock_type clock_type, diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h index 2edd5b38ce4f85ea591c357fedeb01dd7cea6f02..06a4201548882b7f51113d71fecd21c42ed5f10c 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h +++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h @@ -45,9 +45,6 @@ #define MAX_SVP_PHANTOM_STREAMS 2 #define MAX_SVP_PHANTOM_PLANES 2 -void enable_surface_flip_reporting(struct dc_plane_state *plane_state, - uint32_t controller_id); - #include "grph_object_id.h" #include "link_encoder.h" #include "stream_encoder.h" @@ -542,7 +539,8 @@ struct dcn_bw_output { bool legacy_svp_drr_stream_index_valid; struct dml2_mcache_surface_allocation mcache_allocations[DML2_MAX_PLANES]; struct dmub_cmd_fams2_global_config fams2_global_config; - struct dmub_fams2_stream_static_state fams2_stream_params[DML2_MAX_PLANES]; + union dmub_cmd_fams2_config fams2_stream_base_params[DML2_MAX_PLANES]; + union dmub_cmd_fams2_config fams2_stream_sub_params[DML2_MAX_PLANES]; struct dml2_display_arb_regs arb_regs; }; diff --git a/drivers/gpu/drm/amd/display/dc/inc/dcn_calcs.h b/drivers/gpu/drm/amd/display/dc/inc/dcn_calcs.h index 55529c5f471ce9917a7b52ebcbf022727d763c0b..d19a595c2be408347719b7f712f5175487a253b1 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/dcn_calcs.h +++ b/drivers/gpu/drm/amd/display/dc/inc/dcn_calcs.h @@ -624,10 +624,6 @@ bool dcn_validate_bandwidth( struct dc_state *context, bool fast_validate); -unsigned int dcn_find_dcfclk_suits_all( - const struct dc *dc, - struct dc_clocks *clocks); - void dcn_get_soc_clks( struct dc *dc, int *min_fclk_khz, diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h index c2dd061892f4d9741d9d4bda6eaa8cb04658ec73..7a1ca1e98059b0c4a256934e61fa2472a7900f5e 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/clk_mgr_internal.h @@ -166,6 +166,41 @@ enum dentist_divider_range { CLK_SR_DCN32(CLK1_CLK4_CURRENT_CNT), \ CLK_SR_DCN32(CLK4_CLK0_CURRENT_CNT) +#define CLK_REG_LIST_DCN35() \ + CLK_SR_DCN35(CLK1_CLK_PLL_REQ), \ + CLK_SR_DCN35(CLK1_CLK0_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK5_DFS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK1_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK2_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK3_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK4_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK5_CURRENT_CNT), \ + CLK_SR_DCN35(CLK1_CLK0_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_BYPASS_CNTL),\ + CLK_SR_DCN35(CLK1_CLK5_BYPASS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK1_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK2_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK3_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK4_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK5_DS_CNTL), \ + CLK_SR_DCN35(CLK1_CLK0_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK1_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK2_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK3_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK4_ALLOW_DS), \ + CLK_SR_DCN35(CLK1_CLK5_ALLOW_DS), \ + CLK_SR_DCN35(CLK5_spll_field_8), \ + SR(DENTIST_DISPCLK_CNTL), \ + #define CLK_COMMON_MASK_SH_LIST_DCN32(mask_sh) \ CLK_COMMON_MASK_SH_LIST_DCN20_BASE(mask_sh),\ CLK_SF(CLK1_CLK_PLL_REQ, FbMult_int, mask_sh),\ @@ -236,6 +271,7 @@ struct clk_mgr_registers { uint32_t CLK1_CLK2_DFS_CNTL; uint32_t CLK1_CLK3_DFS_CNTL; uint32_t CLK1_CLK4_DFS_CNTL; + uint32_t CLK1_CLK5_DFS_CNTL; uint32_t CLK2_CLK2_DFS_CNTL; uint32_t CLK1_CLK0_CURRENT_CNT; @@ -243,11 +279,34 @@ struct clk_mgr_registers { uint32_t CLK1_CLK2_CURRENT_CNT; uint32_t CLK1_CLK3_CURRENT_CNT; uint32_t CLK1_CLK4_CURRENT_CNT; + uint32_t CLK1_CLK5_CURRENT_CNT; uint32_t CLK0_CLK0_DFS_CNTL; uint32_t CLK0_CLK1_DFS_CNTL; uint32_t CLK0_CLK3_DFS_CNTL; uint32_t CLK0_CLK4_DFS_CNTL; + uint32_t CLK1_CLK0_BYPASS_CNTL; + uint32_t CLK1_CLK1_BYPASS_CNTL; + uint32_t CLK1_CLK2_BYPASS_CNTL; + uint32_t CLK1_CLK3_BYPASS_CNTL; + uint32_t CLK1_CLK4_BYPASS_CNTL; + uint32_t CLK1_CLK5_BYPASS_CNTL; + + uint32_t CLK1_CLK0_DS_CNTL; + uint32_t CLK1_CLK1_DS_CNTL; + uint32_t CLK1_CLK2_DS_CNTL; + uint32_t CLK1_CLK3_DS_CNTL; + uint32_t CLK1_CLK4_DS_CNTL; + uint32_t CLK1_CLK5_DS_CNTL; + + uint32_t CLK1_CLK0_ALLOW_DS; + uint32_t CLK1_CLK1_ALLOW_DS; + uint32_t CLK1_CLK2_ALLOW_DS; + uint32_t CLK1_CLK3_ALLOW_DS; + uint32_t CLK1_CLK4_ALLOW_DS; + uint32_t CLK1_CLK5_ALLOW_DS; + uint32_t CLK5_spll_field_8; + }; struct clk_mgr_shift { diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h b/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h index 16580d62427891fe91001f0e0da8ea301db6b216..d0878fc0cc948b7fcfa1eb6c6d6a3b3ccff467aa 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/hubp.h @@ -275,6 +275,7 @@ struct hubp_funcs { enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cb_b, enum hubp_3dlut_fl_crossbar_bit_slice bit_slice_cr_r); int (*hubp_get_3dlut_fl_done)(struct hubp *hubp); + void (*hubp_clear_tiling)(struct hubp *hubp); }; #endif diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/mem_input.h b/drivers/gpu/drm/amd/display/dc/inc/hw/mem_input.h index a8b44f398ce688f0042c42f6605e1e74777a9543..4f5d102455cac2151c2ee085f44f6b322a7ea902 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/hw/mem_input.h +++ b/drivers/gpu/drm/amd/display/dc/inc/hw/mem_input.h @@ -187,6 +187,8 @@ struct mem_input_funcs { const struct dc_cursor_position *pos, const struct dc_cursor_mi_param *param); + void (*mem_input_clear_tiling)( + struct mem_input *mem_input); }; #endif diff --git a/drivers/gpu/drm/amd/display/dc/irq/dcn201/irq_service_dcn201.c b/drivers/gpu/drm/amd/display/dc/irq/dcn201/irq_service_dcn201.c index 4fb9cd6708d5a8b3bb200c53e2857934bff8e9e4..1d61d475d36fe6d85a92ddd8679e53313c182433 100644 --- a/drivers/gpu/drm/amd/display/dc/irq/dcn201/irq_service_dcn201.c +++ b/drivers/gpu/drm/amd/display/dc/irq/dcn201/irq_service_dcn201.c @@ -30,8 +30,8 @@ #include "../dce110/irq_service_dce110.h" #include "irq_service_dcn201.h" -#include "dcn/dcn_2_0_3_offset.h" -#include "dcn/dcn_2_0_3_sh_mask.h" +#include "dcn/dcn_2_0_1_offset.h" +#include "dcn/dcn_2_0_1_sh_mask.h" #include "cyan_skillfish_ip_offset.h" #include "soc15_hw_ip.h" diff --git a/drivers/gpu/drm/amd/display/dc/link/link_dpms.c b/drivers/gpu/drm/amd/display/dc/link/link_dpms.c index 5d66bfc7fe6e6d4a488a5434682a2b358508cee5..60e64e0138a32e9a6281b6c2c57a66f68a6df2a1 100644 --- a/drivers/gpu/drm/amd/display/dc/link/link_dpms.c +++ b/drivers/gpu/drm/amd/display/dc/link/link_dpms.c @@ -1953,6 +1953,10 @@ static void enable_link_hdmi(struct pipe_ctx *pipe_ctx) stream->phy_pix_clk = stream->timing.pix_clk_100hz / 10; if (stream->phy_pix_clk > 340000) is_over_340mhz = true; + if (dc_is_tmds_signal(stream->signal) && stream->phy_pix_clk > 6000000UL) { + ASSERT(false); + return; + } if (dc_is_hdmi_signal(pipe_ctx->stream->signal)) { unsigned short masked_chip_caps = pipe_ctx->stream->link->chip_caps & diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c index 9dabaf682171d2f3945e9c58bac370cf45ddd065..ce9d799d2386df752fbcc967982d3f77e107aeec 100644 --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c @@ -1632,13 +1632,6 @@ static bool retrieve_link_cap(struct dc_link *link) sizeof(link->dpcd_caps.lttpr_caps.phy_repeater_cnt)); } - /* Read DP tunneling information. */ - if (link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA) { - status = dpcd_get_tunneling_device_data(link); - if (status != DC_OK) - dm_error("%s: Read tunneling device data failed.\n", __func__); - } - dpcd_set_source_specific_data(link); /* Sink may need to configure internals based on vendor, so allow some * time before proceeding with possibly vendor specific transactions @@ -1711,7 +1704,7 @@ static bool retrieve_link_cap(struct dc_link *link) link->dpcd_caps.dprx_feature.raw = dpcd_dprx_data; if (status != DC_OK) - dm_error("%s: Read DPRX caps data failed.\n", __func__); + dm_error("%s: Read DPRX feature list failed.\n", __func__); /* AdaptiveSyncCapability */ dpcd_dprx_data = 0; @@ -1726,15 +1719,13 @@ static bool retrieve_link_cap(struct dc_link *link) link->dpcd_caps.adaptive_sync_caps.dp_adap_sync_caps.raw = dpcd_dprx_data; if (status != DC_OK) - dm_error("%s: Read DPRX caps data failed. Addr:%#x\n", + dm_error("%s: Read DPRX feature list_1 failed. Addr:%#x\n", __func__, DP_DPRX_FEATURE_ENUMERATION_LIST_CONT_1); } - else { link->dpcd_caps.dprx_feature.raw = 0; } - /* Error condition checking... * It is impossible for Sink to report Max Lane Count = 0. * It is possible for Sink to report Max Link Rate = 0, if it is @@ -1918,6 +1909,7 @@ static bool retrieve_link_cap(struct dc_link *link) if (link->dpcd_caps.channel_coding_cap.bits.DP_128b_132b_SUPPORTED) { DC_LOG_DP2("128b/132b encoding is supported at link %d", link->link_index); + /* Read 128b/132b suppoerted link rates */ core_link_read_dpcd(link, DP_128B132B_SUPPORTED_LINK_RATES, &link->dpcd_caps.dp_128b_132b_supported_link_rates.raw, @@ -1965,6 +1957,13 @@ static bool retrieve_link_cap(struct dc_link *link) link->dpcd_caps.max_uncompressed_pixel_rate_cap.raw, sizeof(link->dpcd_caps.max_uncompressed_pixel_rate_cap.raw)); + /* Read DP tunneling information. */ + if (link->ep_type == DISPLAY_ENDPOINT_USB4_DPIA) { + status = dpcd_get_tunneling_device_data(link); + if (status != DC_OK) + dm_error("%s: Read DP tunneling device data failed.\n", __func__); + } + retrieve_cable_id(link); dpcd_write_cable_id_to_dprx(link); diff --git a/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.c b/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.c index fe26fde12eeb3c407a7ae75458c815f48c61d588..85298b8a1b5efa1abfdcdec2f66afbb35c1d7532 100644 --- a/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.c +++ b/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.c @@ -110,6 +110,23 @@ void mpc3_disable_dwb_mux( MPC_DWB0_MUX, 0xf); } +void mpc3_set_out_rate_control( + struct mpc *mpc, + int opp_id, + bool enable, + bool rate_2x_mode, + struct mpc_dwb_flow_control *flow_control) +{ + struct dcn30_mpc *mpc30 = TO_DCN30_MPC(mpc); + + /* Always disable mpc out rate and flow control. + * MPC flow rate control is not needed for DCN30 and above. + */ + REG_UPDATE_2(MUX[opp_id], + MPC_OUT_RATE_CONTROL_DISABLE, 1, + MPC_OUT_RATE_CONTROL, 0); +} + enum dc_lut_mode mpc3_get_ogam_current(struct mpc *mpc, int mpcc_id) { /*Contrary to DCN2 and DCN1 wherein a single status register field holds this info; @@ -1519,6 +1536,7 @@ static const struct mpc_funcs dcn30_mpc_funcs = { .set_dwb_mux = mpc3_set_dwb_mux, .disable_dwb_mux = mpc3_disable_dwb_mux, .is_dwb_idle = mpc3_is_dwb_idle, + .set_out_rate_control = mpc3_set_out_rate_control, .set_gamut_remap = mpc3_set_gamut_remap, .program_shaper = mpc3_program_shaper, .acquire_rmu = mpcc3_acquire_rmu, diff --git a/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.h b/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.h index ce93003dae011325efa0e1e3e5351646a5406017..103f29900a2c70380e8744ccc67758956b533687 100644 --- a/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.h +++ b/drivers/gpu/drm/amd/display/dc/mpc/dcn30/dcn30_mpc.h @@ -1085,6 +1085,13 @@ bool mpc3_is_dwb_idle( struct mpc *mpc, int dwb_id); +void mpc3_set_out_rate_control( + struct mpc *mpc, + int opp_id, + bool enable, + bool rate_2x_mode, + struct mpc_dwb_flow_control *flow_control); + void mpc3_power_on_ogam_lut( struct mpc *mpc, int mpcc_id, bool power_on); diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn401/dcn401_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn401/dcn401_optc.c index 783ca9acc76266b696a270b81dfd64c497a4cf6b..338a0cad23a52364e3811213c5a5e1fe033fb232 100644 --- a/drivers/gpu/drm/amd/display/dc/optc/dcn401/dcn401_optc.c +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn401/dcn401_optc.c @@ -315,7 +315,7 @@ void optc401_set_drr( struct drr_params amended_params = { 0 }; bool program_manual_trigger = false; - if (dc->caps.dmub_caps.fams_ver >= 2 && dc->debug.fams2_config.bits.enable) { + if (dc->caps.dmub_caps.fams_ver == dc->debug.fams_version.ver && dc->debug.fams2_config.bits.enable) { if (params != NULL && params->vertical_total_max > 0 && params->vertical_total_min > 0) { @@ -380,7 +380,7 @@ void optc401_set_vtotal_min_max(struct timing_generator *optc, int vtotal_min, i { struct dc *dc = optc->ctx->dc; - if (dc->caps.dmub_caps.fams_ver >= 2 && dc->debug.fams2_config.bits.enable) { + if (dc->caps.dmub_caps.fams_ver == dc->debug.fams_version.ver && dc->debug.fams2_config.bits.enable) { /* FAMS2 */ dc_dmub_srv_fams2_drr_update(dc, optc->inst, vtotal_min, diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c index 7a5b9aa5292cda3f35cbc5841c10841802f10d7c..5c616b1f7bf71b98c4398b935052953d5d806b10 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn20/dcn20_resource.c @@ -1509,60 +1509,9 @@ bool dcn20_split_stream_for_odm( next_odm_pipe->prev_odm_pipe = prev_odm_pipe; if (prev_odm_pipe->plane_state) { - struct scaler_data *sd = &prev_odm_pipe->plane_res.scl_data; - struct output_pixel_processor *opp = next_odm_pipe->stream_res.opp; - int new_width; - - /* HACTIVE halved for odm combine */ - sd->h_active /= 2; - /* Calculate new vp and recout for left pipe */ - /* Need at least 16 pixels width per side */ - if (sd->recout.x + 16 >= sd->h_active) - return false; - new_width = sd->h_active - sd->recout.x; - sd->viewport.width -= dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz, sd->recout.width - new_width)); - sd->viewport_c.width -= dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz_c, sd->recout.width - new_width)); - sd->recout.width = new_width; - - /* Calculate new vp and recout for right pipe */ - sd = &next_odm_pipe->plane_res.scl_data; - /* HACTIVE halved for odm combine */ - sd->h_active /= 2; - /* Need at least 16 pixels width per side */ - if (new_width <= 16) - return false; - new_width = sd->recout.width + sd->recout.x - sd->h_active; - sd->viewport.width -= dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz, sd->recout.width - new_width)); - sd->viewport_c.width -= dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz_c, sd->recout.width - new_width)); - sd->recout.width = new_width; - sd->viewport.x += dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz, sd->h_active - sd->recout.x)); - sd->viewport_c.x += dc_fixpt_floor(dc_fixpt_mul_int( - sd->ratios.horz_c, sd->h_active - sd->recout.x)); - sd->recout.x = 0; - - /* - * When odm is used in YcbCr422 or 420 colour space, a split screen - * will be seen with the previous calculations since the extra left - * edge pixel is accounted for in fmt but not in viewport. - * - * Below are calculations which fix the split by fixing the calculations - * if there is an extra left edge pixel. - */ - if (opp && opp->funcs->opp_get_left_edge_extra_pixel_count - && opp->funcs->opp_get_left_edge_extra_pixel_count( - opp, next_odm_pipe->stream->timing.pixel_encoding, - resource_is_pipe_type(next_odm_pipe, OTG_MASTER)) == 1) { - sd->h_active += 1; - sd->recout.width += 1; - sd->viewport.x -= dc_fixpt_ceil(dc_fixpt_mul_int(sd->ratios.horz, 1)); - sd->viewport_c.x -= dc_fixpt_ceil(dc_fixpt_mul_int(sd->ratios.horz, 1)); - sd->viewport_c.width += dc_fixpt_ceil(dc_fixpt_mul_int(sd->ratios.horz, 1)); - sd->viewport.width += dc_fixpt_ceil(dc_fixpt_mul_int(sd->ratios.horz, 1)); + if (!resource_build_scaling_params(prev_odm_pipe) || + !resource_build_scaling_params(next_odm_pipe)) { + return false; } } diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn201/dcn201_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn201/dcn201_resource.c index d3d67d36652308ab36a1c28391831c3a73ff9faf..9f37f0097feb42a92542efd29a1a9e82737cf51a 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn201/dcn201_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn201/dcn201_resource.c @@ -59,8 +59,8 @@ #include "cyan_skillfish_ip_offset.h" -#include "dcn/dcn_2_0_3_offset.h" -#include "dcn/dcn_2_0_3_sh_mask.h" +#include "dcn/dcn_2_0_1_offset.h" +#include "dcn/dcn_2_0_1_sh_mask.h" #include "dpcs/dpcs_2_0_3_offset.h" #include "dpcs/dpcs_2_0_3_sh_mask.h" diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c index 2a3dabfe3ceac46ce56f7c0e08425e4fd2a5e825..09f5b8b40791e50b42db0d4621c73e82d5679ac4 100644 --- a/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c @@ -726,6 +726,10 @@ static const struct dc_debug_options debug_defaults_drv = { .disable_unbounded_requesting = false, .enable_legacy_fast_update = false, .dcc_meta_propagation_delay_us = 10, + .fams_version = { + .minor = 1, + .major = 2, + }, //v2.1 .fams2_config = { .bits = { .enable = true, diff --git a/drivers/gpu/drm/amd/display/dc/spl/dc_spl.c b/drivers/gpu/drm/amd/display/dc/spl/dc_spl.c index 73a65913cb12476b6ce4db3eb056858e998e0568..1306ce0321e2dd0bcb2a3b074985259cc1a78ea9 100644 --- a/drivers/gpu/drm/amd/display/dc/spl/dc_spl.c +++ b/drivers/gpu/drm/amd/display/dc/spl/dc_spl.c @@ -137,15 +137,32 @@ static struct spl_rect calculate_mpc_slice_in_timing_active( struct spl_in *spl_in, struct spl_rect *plane_clip_rec) { - int mpc_slice_count = spl_in->basic_in.mpc_combine_h; - int mpc_slice_idx = spl_in->basic_in.mpc_combine_v; + bool use_recout_width_aligned = + spl_in->basic_in.num_h_slices_recout_width_align.use_recout_width_aligned; + int mpc_slice_count = + spl_in->basic_in.num_h_slices_recout_width_align.num_slices_recout_width.mpc_num_h_slices; + int recout_width_align = + spl_in->basic_in.num_h_slices_recout_width_align.num_slices_recout_width.mpc_recout_width_align; + int mpc_slice_idx = spl_in->basic_in.mpc_h_slice_index; int epimo = mpc_slice_count - plane_clip_rec->width % mpc_slice_count - 1; struct spl_rect mpc_rec; - mpc_rec.width = plane_clip_rec->width / mpc_slice_count; - mpc_rec.x = plane_clip_rec->x + mpc_rec.width * mpc_slice_idx; - mpc_rec.height = plane_clip_rec->height; - mpc_rec.y = plane_clip_rec->y; + if (use_recout_width_aligned) { + mpc_rec.width = recout_width_align; + if ((mpc_rec.width * (mpc_slice_idx + 1)) > plane_clip_rec->width) { + mpc_rec.width = plane_clip_rec->width % recout_width_align; + mpc_rec.x = plane_clip_rec->x + recout_width_align * mpc_slice_idx; + } else + mpc_rec.x = plane_clip_rec->x + mpc_rec.width * mpc_slice_idx; + mpc_rec.height = plane_clip_rec->height; + mpc_rec.y = plane_clip_rec->y; + + } else { + mpc_rec.width = plane_clip_rec->width / mpc_slice_count; + mpc_rec.x = plane_clip_rec->x + mpc_rec.width * mpc_slice_idx; + mpc_rec.height = plane_clip_rec->height; + mpc_rec.y = plane_clip_rec->y; + } SPL_ASSERT(mpc_slice_count == 1 || spl_in->basic_out.view_format != SPL_VIEW_3D_SIDE_BY_SIDE || mpc_rec.width % 2 == 0); @@ -546,6 +563,24 @@ static bool spl_is_rgb8(enum spl_pixel_format format) return false; } +static bool spl_is_video_format(enum spl_pixel_format format) +{ + if (format >= SPL_PIXEL_FORMAT_VIDEO_BEGIN + && format <= SPL_PIXEL_FORMAT_VIDEO_END) + return true; + else + return false; +} + +static bool spl_is_subsampled_format(enum spl_pixel_format format) +{ + if (format >= SPL_PIXEL_FORMAT_SUBSAMPLED_BEGIN + && format <= SPL_PIXEL_FORMAT_SUBSAMPLED_END) + return true; + else + return false; +} + /*Calculate inits and viewport */ static void spl_calculate_inits_and_viewports(struct spl_in *spl_in, struct spl_scratch *spl_scratch) @@ -590,7 +625,7 @@ static void spl_calculate_inits_and_viewports(struct spl_in *spl_in, spl_swap(flip_vert_scan_dir, flip_horz_scan_dir); } - if (spl_is_yuv420(spl_in->basic_in.format)) { + if (spl_is_subsampled_format(spl_in->basic_in.format)) { /* this gives the direction of the cositing (negative will move * left, right otherwise) */ @@ -678,7 +713,7 @@ static void spl_handle_3d_recout(struct spl_in *spl_in, struct spl_rect *recout) * since 3d is special and needs to calculate vp as if there is no recout offset * This may break with rotation, good thing we aren't mixing hw rotation and 3d */ - if (spl_in->basic_in.mpc_combine_v) { + if (spl_in->basic_in.mpc_h_slice_index) { SPL_ASSERT(spl_in->basic_in.rotation == SPL_ROTATION_ANGLE_0 || (spl_in->basic_out.view_format != SPL_VIEW_3D_TOP_AND_BOTTOM && spl_in->basic_out.view_format != SPL_VIEW_3D_SIDE_BY_SIDE)); @@ -698,24 +733,6 @@ static void spl_clamp_viewport(struct spl_rect *viewport) viewport->width = MIN_VIEWPORT_SIZE; } -static bool spl_dscl_is_420_format(enum spl_pixel_format format) -{ - if (format == SPL_PIXEL_FORMAT_420BPP8 || - format == SPL_PIXEL_FORMAT_420BPP10) - return true; - else - return false; -} - -static bool spl_dscl_is_video_format(enum spl_pixel_format format) -{ - if (format >= SPL_PIXEL_FORMAT_VIDEO_BEGIN - && format <= SPL_PIXEL_FORMAT_VIDEO_END) - return true; - else - return false; -} - static enum scl_mode spl_get_dscl_mode(const struct spl_in *spl_in, const struct spl_scaler_data *data, bool enable_isharp, bool enable_easf) @@ -732,8 +749,8 @@ static enum scl_mode spl_get_dscl_mode(const struct spl_in *spl_in, && !enable_isharp) return SCL_MODE_SCALING_444_BYPASS; - if (!spl_dscl_is_420_format(pixel_format)) { - if (spl_dscl_is_video_format(pixel_format)) + if (!spl_is_subsampled_format(pixel_format)) { + if (spl_is_video_format(pixel_format)) return SCL_MODE_SCALING_444_YCBCR_ENABLE; else return SCL_MODE_SCALING_444_RGB_ENABLE; @@ -756,7 +773,7 @@ static bool spl_choose_lls_policy(enum spl_pixel_format format, enum spl_transfer_func_predefined tf_predefined_type, enum linear_light_scaling *lls_pref) { - if (spl_is_yuv420(format)) { + if (spl_is_video_format(format)) { *lls_pref = LLS_PREF_NO; if ((tf_type == SPL_TF_TYPE_PREDEFINED) || (tf_type == SPL_TF_TYPE_DISTRIBUTED_POINTS)) @@ -815,7 +832,7 @@ static bool enable_easf(struct spl_in *spl_in, struct spl_scratch *spl_scratch) /* Check if video is in fullscreen mode */ static bool spl_is_video_fullscreen(struct spl_in *spl_in) { - if (spl_is_yuv420(spl_in->basic_in.format) && spl_in->is_fullscreen) + if (spl_is_video_format(spl_in->basic_in.format) && spl_in->is_fullscreen) return true; return false; } @@ -846,10 +863,10 @@ static bool spl_get_isharp_en(struct spl_in *spl_in, * Apply sharpness to RGB and YUV (NV12/P010) * surfaces based on policy setting */ - if (!spl_is_yuv420(spl_in->basic_in.format) && + if (!spl_is_video_format(spl_in->basic_in.format) && (spl_in->sharpen_policy == SHARPEN_YUV)) return enable_isharp; - else if ((spl_is_yuv420(spl_in->basic_in.format) && !fullscreen) && + else if ((spl_is_video_format(spl_in->basic_in.format) && !fullscreen) && (spl_in->sharpen_policy == SHARPEN_RGB_FULLSCREEN_YUV)) return enable_isharp; else if (!spl_in->is_fullscreen && @@ -882,8 +899,8 @@ static void spl_get_taps_non_adaptive_scaler( if (in_taps->v_taps == 0) { if (spl_fixpt_ceil(spl_scratch->scl_data.ratios.vert) > 1) - spl_scratch->scl_data.taps.v_taps = spl_min(spl_fixpt_ceil(spl_fixpt_mul_int( - spl_scratch->scl_data.ratios.vert, 2)), 8); + spl_scratch->scl_data.taps.v_taps = spl_min(2 * spl_fixpt_ceil( + spl_scratch->scl_data.ratios.vert), 8); else spl_scratch->scl_data.taps.v_taps = 4; } else @@ -891,8 +908,8 @@ static void spl_get_taps_non_adaptive_scaler( if (in_taps->v_taps_c == 0) { if (spl_fixpt_ceil(spl_scratch->scl_data.ratios.vert_c) > 1) - spl_scratch->scl_data.taps.v_taps_c = spl_min(spl_fixpt_ceil(spl_fixpt_mul_int( - spl_scratch->scl_data.ratios.vert_c, 2)), 8); + spl_scratch->scl_data.taps.v_taps_c = spl_min(2 * spl_fixpt_ceil( + spl_scratch->scl_data.ratios.vert_c), 8); else spl_scratch->scl_data.taps.v_taps_c = 4; } else @@ -932,7 +949,7 @@ static bool spl_get_optimal_number_of_taps( int min_taps_y, min_taps_c; enum lb_memory_config lb_config; bool skip_easf = false; - bool is_ycbcr = spl_dscl_is_video_format(spl_in->basic_in.format); + bool is_subsampled = spl_is_subsampled_format(spl_in->basic_in.format); if (spl_scratch->scl_data.viewport.width > spl_scratch->scl_data.h_active && max_downscale_src_width != 0 && @@ -964,7 +981,7 @@ static bool spl_get_optimal_number_of_taps( if (skip_easf) spl_get_taps_non_adaptive_scaler(spl_scratch, in_taps); else { - if (spl_is_yuv420(spl_in->basic_in.format)) { + if (spl_is_video_format(spl_in->basic_in.format)) { spl_scratch->scl_data.taps.h_taps = 6; spl_scratch->scl_data.taps.v_taps = 6; spl_scratch->scl_data.taps.h_taps_c = 4; @@ -982,8 +999,7 @@ static bool spl_get_optimal_number_of_taps( min_taps_c = spl_fixpt_ceil(spl_scratch->scl_data.ratios.vert_c); /* Use LB_MEMORY_CONFIG_3 for 4:2:0 */ - if ((spl_in->basic_in.format == SPL_PIXEL_FORMAT_420BPP8) - || (spl_in->basic_in.format == SPL_PIXEL_FORMAT_420BPP10)) + if (spl_is_yuv420(spl_in->basic_in.format)) lb_config = LB_MEMORY_CONFIG_3; else lb_config = LB_MEMORY_CONFIG_0; @@ -1039,13 +1055,11 @@ static bool spl_get_optimal_number_of_taps( if (spl_scratch->scl_data.taps.h_taps_c == 5) spl_scratch->scl_data.taps.h_taps_c = 4; - if (spl_is_yuv420(spl_in->basic_in.format)) { - if ((spl_scratch->scl_data.taps.h_taps <= 4) || - (spl_scratch->scl_data.taps.h_taps_c <= 3)) { + if (spl_is_video_format(spl_in->basic_in.format)) { + if (spl_scratch->scl_data.taps.h_taps <= 4) { *enable_easf_v = false; *enable_easf_h = false; - } else if ((spl_scratch->scl_data.taps.v_taps <= 3) || - (spl_scratch->scl_data.taps.v_taps_c <= 3)) { + } else if (spl_scratch->scl_data.taps.v_taps <= 3) { *enable_easf_v = false; *enable_easf_h = true; } else { @@ -1086,10 +1100,10 @@ static bool spl_get_optimal_number_of_taps( spl_scratch->scl_data.taps.h_taps = 1; spl_scratch->scl_data.taps.v_taps = 1; - if (IDENTITY_RATIO(spl_scratch->scl_data.ratios.horz_c) && !is_ycbcr) + if (IDENTITY_RATIO(spl_scratch->scl_data.ratios.horz_c) && !is_subsampled) spl_scratch->scl_data.taps.h_taps_c = 1; - if (IDENTITY_RATIO(spl_scratch->scl_data.ratios.vert_c) && !is_ycbcr) + if (IDENTITY_RATIO(spl_scratch->scl_data.ratios.vert_c) && !is_subsampled) spl_scratch->scl_data.taps.v_taps_c = 1; *enable_easf_v = false; @@ -1103,11 +1117,11 @@ static bool spl_get_optimal_number_of_taps( (IDENTITY_RATIO(spl_scratch->scl_data.ratios.vert))) spl_scratch->scl_data.taps.v_taps = 1; - if ((!*enable_easf_h) && !is_ycbcr && + if ((!*enable_easf_h) && !is_subsampled && (IDENTITY_RATIO(spl_scratch->scl_data.ratios.horz_c))) spl_scratch->scl_data.taps.h_taps_c = 1; - if ((!*enable_easf_v) && !is_ycbcr && + if ((!*enable_easf_v) && !is_subsampled && (IDENTITY_RATIO(spl_scratch->scl_data.ratios.vert_c))) spl_scratch->scl_data.taps.v_taps_c = 1; } @@ -1118,7 +1132,7 @@ static bool spl_get_optimal_number_of_taps( static void spl_set_black_color_data(enum spl_pixel_format format, struct scl_black_color *scl_black_color) { - bool ycbcr = spl_dscl_is_video_format(format); + bool ycbcr = spl_is_video_format(format); if (ycbcr) { scl_black_color->offset_rgb_y = BLACK_OFFSET_RGB_Y; scl_black_color->offset_rgb_cbcr = BLACK_OFFSET_CBCR; @@ -1585,7 +1599,7 @@ static void spl_set_easf_data(struct spl_scratch *spl_scratch, struct spl_out *s 0x0; // fp1.5.10, C3 coefficient } - if (spl_is_yuv420(format)) { /* TODO: 0 = RGB, 1 = YUV */ + if (spl_is_video_format(format)) { /* TODO: 0 = RGB, 1 = YUV */ dscl_prog_data->easf_matrix_mode = 1; /* * 2-bit, BF3 chroma mode correction calculation mode diff --git a/drivers/gpu/drm/amd/display/dc/spl/dc_spl_types.h b/drivers/gpu/drm/amd/display/dc/spl/dc_spl_types.h index 55d557df4aa5bfc13eb73e39684f7218f99d1464..467af9dd90ded18396cf3dc9a0b7e9d6e3c654be 100644 --- a/drivers/gpu/drm/amd/display/dc/spl/dc_spl_types.h +++ b/drivers/gpu/drm/amd/display/dc/spl/dc_spl_types.h @@ -63,13 +63,13 @@ enum spl_pixel_format { SPL_PIXEL_FORMAT_420BPP8, SPL_PIXEL_FORMAT_420BPP10, /*end of pixel format definition*/ - SPL_PIXEL_FORMAT_INVALID, - SPL_PIXEL_FORMAT_422BPP8, - SPL_PIXEL_FORMAT_422BPP10, SPL_PIXEL_FORMAT_GRPH_BEGIN = SPL_PIXEL_FORMAT_INDEX8, SPL_PIXEL_FORMAT_GRPH_END = SPL_PIXEL_FORMAT_FP16, + SPL_PIXEL_FORMAT_SUBSAMPLED_BEGIN = SPL_PIXEL_FORMAT_420BPP8, + SPL_PIXEL_FORMAT_SUBSAMPLED_END = SPL_PIXEL_FORMAT_420BPP10, SPL_PIXEL_FORMAT_VIDEO_BEGIN = SPL_PIXEL_FORMAT_420BPP8, SPL_PIXEL_FORMAT_VIDEO_END = SPL_PIXEL_FORMAT_420BPP10, + SPL_PIXEL_FORMAT_INVALID, SPL_PIXEL_FORMAT_UNKNOWN }; @@ -436,8 +436,14 @@ struct basic_in { struct spl_rect clip_rect; // Clip rect enum spl_rotation_angle rotation; // Rotation bool horizontal_mirror; // Horizontal mirror - int mpc_combine_h; // MPC Horizontal Combine Factor (split_count) - int mpc_combine_v; // MPC Vertical Combine Factor (split_idx) + struct { // previous mpc_combine_h - split count + bool use_recout_width_aligned; + union { + int mpc_num_h_slices; + int mpc_recout_width_align; + } num_slices_recout_width; + } num_h_slices_recout_width_align; + int mpc_h_slice_index; // previous mpc_combine_v - split_idx // Inputs for adaptive scaler - TODO enum spl_transfer_func_type tf_type; /* Transfer function type */ enum spl_transfer_func_predefined tf_predefined_type; /* Transfer function predefined type */ diff --git a/drivers/gpu/drm/amd/display/dmub/dmub_srv.h b/drivers/gpu/drm/amd/display/dmub/dmub_srv.h index b353c4ceb60dce88dbe86c3bec1535104682b792..4b3ccbca0da2724bc533c3791c2e949b72c9b31f 100644 --- a/drivers/gpu/drm/amd/display/dmub/dmub_srv.h +++ b/drivers/gpu/drm/amd/display/dmub/dmub_srv.h @@ -69,6 +69,9 @@ #define DMUB_PC_SNAPSHOT_COUNT 10 +/* Default tracebuffer size if meta is absent. */ +#define DMUB_TRACE_BUFFER_SIZE (64 * 1024) + /* Forward declarations */ struct dmub_srv; struct dmub_srv_common_regs; diff --git a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h index b800a507d1e074931beb4c5e4d08f5f21ae172e9..59990929e44e34d3936b0c598f65382e64591179 100644 --- a/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h +++ b/drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h @@ -475,11 +475,23 @@ union replay_hw_flags { * Use TPS3 signal when restore main link. */ uint32_t force_wakeup_by_tps3 : 1; + /** + * @is_alpm_initialized: Indicates whether ALPM is initialized + */ + uint32_t is_alpm_initialized : 1; } bitfields; uint32_t u32All; }; +union fw_assisted_mclk_switch_version { + struct { + uint8_t minor : 5; + uint8_t major : 3; + }; + uint8_t ver; +}; + /** * DMUB feature capabilities. * After DMUB init, driver will query FW capabilities prior to enabling certain features. @@ -1823,52 +1835,11 @@ enum fams2_stream_type { FAMS2_STREAM_TYPE_SUBVP = 4, }; -/* dynamic stream state */ -struct dmub_fams2_legacy_stream_dynamic_state { - uint8_t force_allow_at_vblank; - uint8_t pad[3]; -}; - -struct dmub_fams2_subvp_stream_dynamic_state { - uint16_t viewport_start_hubp_vline; - uint16_t viewport_height_hubp_vlines; - uint16_t viewport_start_c_hubp_vline; - uint16_t viewport_height_c_hubp_vlines; - uint16_t phantom_viewport_height_hubp_vlines; - uint16_t phantom_viewport_height_c_hubp_vlines; - uint16_t microschedule_start_otg_vline; - uint16_t mall_start_otg_vline; - uint16_t mall_start_hubp_vline; - uint16_t mall_start_c_hubp_vline; - uint8_t force_allow_at_vblank_only; - uint8_t pad[3]; -}; - -struct dmub_fams2_drr_stream_dynamic_state { - uint16_t stretched_vtotal; - uint8_t use_cur_vtotal; - uint8_t pad; -}; - -struct dmub_fams2_stream_dynamic_state { - uint64_t ref_tick; - uint32_t cur_vtotal; - uint16_t adjusted_allow_end_otg_vline; - uint8_t pad[2]; - struct dmub_optc_position ref_otg_pos; - struct dmub_optc_position target_otg_pos; - union { - struct dmub_fams2_legacy_stream_dynamic_state legacy; - struct dmub_fams2_subvp_stream_dynamic_state subvp; - struct dmub_fams2_drr_stream_dynamic_state drr; - } sub_state; -}; - /* static stream state */ struct dmub_fams2_legacy_stream_static_state { uint8_t vactive_det_fill_delay_otg_vlines; uint8_t programming_delay_otg_vlines; -}; +}; //v0 struct dmub_fams2_subvp_stream_static_state { uint16_t vratio_numerator; @@ -1887,14 +1858,59 @@ struct dmub_fams2_subvp_stream_static_state { uint8_t phantom_otg_inst; uint8_t phantom_pipe_mask; uint8_t phantom_plane_pipe_masks[DMUB_MAX_PHANTOM_PLANES]; // phantom pipe mask per plane (for flip passthrough) -}; +}; //v0 struct dmub_fams2_drr_stream_static_state { uint16_t nom_stretched_vtotal; uint8_t programming_delay_otg_vlines; uint8_t only_stretch_if_required; uint8_t pad[2]; -}; +}; //v0 + +struct dmub_fams2_cmd_legacy_stream_static_state { + uint16_t vactive_det_fill_delay_otg_vlines; + uint16_t programming_delay_otg_vlines; +}; //v1 + +struct dmub_fams2_cmd_subvp_stream_static_state { + uint16_t vratio_numerator; + uint16_t vratio_denominator; + uint16_t phantom_vtotal; + uint16_t phantom_vactive; + uint16_t programming_delay_otg_vlines; + uint16_t prefetch_to_mall_otg_vlines; + union { + struct { + uint8_t is_multi_planar : 1; + uint8_t is_yuv420 : 1; + } bits; + uint8_t all; + } config; + uint8_t phantom_otg_inst; + uint8_t phantom_pipe_mask; + uint8_t pad0; + uint8_t phantom_plane_pipe_masks[DMUB_MAX_PHANTOM_PLANES]; // phantom pipe mask per plane (for flip passthrough) + uint8_t pad1[4 - (DMUB_MAX_PHANTOM_PLANES % 4)]; +}; //v1 + +struct dmub_fams2_cmd_drr_stream_static_state { + uint16_t nom_stretched_vtotal; + uint16_t programming_delay_otg_vlines; + uint8_t only_stretch_if_required; + uint8_t pad[3]; +}; //v1 + +union dmub_fams2_stream_static_sub_state { + struct dmub_fams2_legacy_stream_static_state legacy; + struct dmub_fams2_subvp_stream_static_state subvp; + struct dmub_fams2_drr_stream_static_state drr; +}; //v0 + +union dmub_fams2_cmd_stream_static_sub_state { + struct dmub_fams2_cmd_legacy_stream_static_state legacy; + struct dmub_fams2_cmd_subvp_stream_static_state subvp; + struct dmub_fams2_cmd_drr_stream_static_state drr; +}; //v1 struct dmub_fams2_stream_static_state { enum fams2_stream_type type; @@ -1924,13 +1940,45 @@ struct dmub_fams2_stream_static_state { uint8_t pipe_mask; // pipe mask for the whole config uint8_t num_planes; uint8_t plane_pipe_masks[DMUB_MAX_PLANES]; // pipe mask per plane (for flip passthrough) - uint8_t pad[DMUB_MAX_PLANES % 4]; + uint8_t pad[4 - (DMUB_MAX_PLANES % 4)]; + union dmub_fams2_stream_static_sub_state sub_state; +}; //v0 + +struct dmub_fams2_cmd_stream_static_base_state { + enum fams2_stream_type type; + uint32_t otg_vline_time_ns; + uint32_t otg_vline_time_ticks; + uint16_t htotal; + uint16_t vtotal; // nominal vtotal + uint16_t vblank_start; + uint16_t vblank_end; + uint16_t max_vtotal; + uint16_t allow_start_otg_vline; + uint16_t allow_end_otg_vline; + uint16_t drr_keepout_otg_vline; // after this vline, vtotal cannot be changed + uint16_t scheduling_delay_otg_vlines; // min time to budget for ready to microschedule start + uint16_t contention_delay_otg_vlines; // time to budget for contention on execution + uint16_t vline_int_ack_delay_otg_vlines; // min time to budget for vertical interrupt firing + uint16_t allow_to_target_delay_otg_vlines; // time from allow vline to target vline union { - struct dmub_fams2_legacy_stream_static_state legacy; - struct dmub_fams2_subvp_stream_static_state subvp; - struct dmub_fams2_drr_stream_static_state drr; - } sub_state; -}; + struct { + uint8_t is_drr : 1; // stream is DRR enabled + uint8_t clamp_vtotal_min : 1; // clamp vtotal to min instead of nominal + uint8_t min_ttu_vblank_usable : 1; // if min ttu vblank is above wm, no force pstate is needed in blank + } bits; + uint8_t all; + } config; + uint8_t otg_inst; + uint8_t pipe_mask; // pipe mask for the whole config + uint8_t num_planes; + uint8_t plane_pipe_masks[DMUB_MAX_PLANES]; // pipe mask per plane (for flip passthrough) + uint8_t pad[4 - (DMUB_MAX_PLANES % 4)]; +}; //v1 + +struct dmub_fams2_stream_static_state_v1 { + struct dmub_fams2_cmd_stream_static_base_state base; + union dmub_fams2_cmd_stream_static_sub_state sub_state; +}; //v1 /** * enum dmub_fams2_allow_delay_check_mode - macroscheduler mode for breaking on excessive @@ -1970,7 +2018,11 @@ struct dmub_cmd_fams2_global_config { union dmub_cmd_fams2_config { struct dmub_cmd_fams2_global_config global; - struct dmub_fams2_stream_static_state stream; + struct dmub_fams2_stream_static_state stream; //v0 + union { + struct dmub_fams2_cmd_stream_static_base_state base; + union dmub_fams2_cmd_stream_static_sub_state sub_state; + } stream_v1; //v1 }; /** diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c index a3f3ff5d49ace06726b517fba87e94d3e846ab93..15ea216e903d587125f70221363a24926607f5b0 100644 --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c @@ -61,10 +61,6 @@ /* Default state size if meta is absent. */ #define DMUB_FW_STATE_SIZE (64 * 1024) -/* Default tracebuffer size if meta is absent. */ -#define DMUB_TRACE_BUFFER_SIZE (64 * 1024) - - /* Default scratch mem size. */ #define DMUB_SCRATCH_MEM_SIZE (1024) diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h index 7eefcb0f50703f7b7e9a4146678b79520437cc67..98d9e840b0e2a0b197529ddbe3b79c81438c32f5 100644 --- a/drivers/gpu/drm/amd/include/amd_shared.h +++ b/drivers/gpu/drm/amd/include/amd_shared.h @@ -401,9 +401,9 @@ struct amd_ip_funcs { int (*pre_soft_reset)(struct amdgpu_ip_block *ip_block); int (*soft_reset)(struct amdgpu_ip_block *ip_block); int (*post_soft_reset)(struct amdgpu_ip_block *ip_block); - int (*set_clockgating_state)(void *handle, + int (*set_clockgating_state)(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state); - int (*set_powergating_state)(void *handle, + int (*set_powergating_state)(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state); void (*get_clockgating_state)(void *handle, u64 *flags); void (*dump_ip_state)(struct amdgpu_ip_block *ip_block); diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_offset.h b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_offset.h similarity index 99% rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_offset.h rename to drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_offset.h index cae1a7e743235d851a321783eaa8d5aad7a73c32..73c5dd5e83d48a909ce534756211294e06d03a63 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_offset.h +++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_offset.h @@ -19,8 +19,8 @@ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ -#ifndef _dcn_2_0_3_OFFSET_HEADER -#define _dcn_2_0_3_OFFSET_HEADER +#ifndef _dcn_2_0_1_OFFSET_HEADER +#define _dcn_2_0_1_OFFSET_HEADER // addressBlock: dce_dc_dccg_dccg_dispdec diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_sh_mask.h similarity index 99% rename from drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_sh_mask.h rename to drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_sh_mask.h index ca1e1eb39256f7bbd1163a7ee39042db7b1da3f9..290d807800a64e7eec27c92d35a6f71f38688e3e 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_3_sh_mask.h +++ b/drivers/gpu/drm/amd/include/asic_reg/dcn/dcn_2_0_1_sh_mask.h @@ -18,8 +18,8 @@ * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ -#ifndef _dcn_2_0_3_SH_MASK_HEADER -#define _dcn_2_0_3_SH_MASK_HEADER +#ifndef _dcn_2_0_1_SH_MASK_HEADER +#define _dcn_2_0_1_SH_MASK_HEADER // addressBlock: dce_dc_dccg_dccg_dispdec diff --git a/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_offset.h new file mode 100644 index 0000000000000000000000000000000000000000..0e8f12728d5f48db69c75f7c51dd13eafb4767a9 --- /dev/null +++ b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_offset.h @@ -0,0 +1,29 @@ +/* + * Copyright (C) 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included + * in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ +#ifndef _umc_8_14_0_OFFSET_HEADER +#define _umc_8_14_0_OFFSET_HEADER + +#define regUMCCH0_GeccErrCntSel 0x0328 +#define regUMCCH0_GeccErrCntSel_BASE_IDX 0 +#define regUMCCH0_GeccErrCnt 0x0329 +#define regUMCCH0_GeccErrCnt_BASE_IDX 0 + +#endif diff --git a/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_sh_mask.h new file mode 100644 index 0000000000000000000000000000000000000000..5d723b5d9b87b8ce31ddc514158b27967b81a077 --- /dev/null +++ b/drivers/gpu/drm/amd/include/asic_reg/umc/umc_8_14_0_sh_mask.h @@ -0,0 +1,37 @@ +/* + * Copyright (C) 2024 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included + * in all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN + * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ +#ifndef _umc_8_14_0_SH_MASK_HEADER +#define _umc_8_14_0_SH_MASK_HEADER + +//UMCCH0_GeccErrCntSel +#define UMCCH0_GeccErrCntSel__GeccErrInt__SHIFT 0xc +#define UMCCH0_GeccErrCntSel__GeccErrCntEn__SHIFT 0xf +#define UMCCH0_GeccErrCntSel__PoisonCntEn__SHIFT 0x10 +#define UMCCH0_GeccErrCntSel__GeccErrInt_MASK 0x00003000L +#define UMCCH0_GeccErrCntSel__GeccErrCntEn_MASK 0x00008000L +#define UMCCH0_GeccErrCntSel__PoisonCntEn_MASK 0x00030000L +//UMCCH0_GeccErrCnt +#define UMCCH0_GeccErrCnt__GeccErrCnt__SHIFT 0x0 +#define UMCCH0_GeccErrCnt__GeccUnCorrErrCnt__SHIFT 0x10 +#define UMCCH0_GeccErrCnt__GeccErrCnt_MASK 0x0000FFFFL +#define UMCCH0_GeccErrCnt__GeccUnCorrErrCnt_MASK 0xFFFF0000L + +#endif diff --git a/drivers/gpu/drm/amd/include/atomfirmware.h b/drivers/gpu/drm/amd/include/atomfirmware.h index b0fc22383e2874191660b8b3f4eeed6cb7372077..0160d65f3f5e55f21d6c490a9f7b8298356a1bd7 100644 --- a/drivers/gpu/drm/amd/include/atomfirmware.h +++ b/drivers/gpu/drm/amd/include/atomfirmware.h @@ -1300,12 +1300,17 @@ struct atom_ext_display_path //usCaps enum ext_display_path_cap_def { - EXT_DISPLAY_PATH_CAPS__HBR2_DISABLE = 0x0001, - EXT_DISPLAY_PATH_CAPS__DP_FIXED_VS_EN = 0x0002, - EXT_DISPLAY_PATH_CAPS__EXT_CHIP_MASK = 0x007C, - EXT_DISPLAY_PATH_CAPS__HDMI20_PI3EQX1204 = (0x01 << 2), //PI redriver chip - EXT_DISPLAY_PATH_CAPS__HDMI20_TISN65DP159RSBT = (0x02 << 2), //TI retimer chip - EXT_DISPLAY_PATH_CAPS__HDMI20_PARADE_PS175 = (0x03 << 2) //Parade DP->HDMI recoverter chip + EXT_DISPLAY_PATH_CAPS__EXT_CHIP_MASK = 0x007E, + AMD_EXT_DISPLAY_PATH_CAPS__EXT_CHIP_MASK = 0x007E, + AMD_EXT_DISPLAY_PATH_CAPS__DP_FIXED_VS_EN = (0x01 << 1), + AMD_EXT_DISPLAY_PATH_CAPS__HDMI20_PI3EQX1204 = (0x02 << 1), + AMD_EXT_DISPLAY_PATH_CAPS__DP_EARLY_8B10B_TPS2 = (0x03 << 1), + AMD_EXT_DISPLAY_PATH_CAPS__HDMI20_TISN65DP159RSBT = (0x04 << 1), + AMD_EXT_DISPLAY_PATH_CAPS__HDMI20_PARADE_PS175 = (0x06 << 1), + EXT_DISPLAY_PATH_CAPS__DP_FIXED_VS_EN = (0x07 << 1), + EXT_DISPLAY_PATH_CAPS__HDMI20_PI3EQX1204 = (0x08 << 1), //PI redriver chip + EXT_DISPLAY_PATH_CAPS__HDMI20_TISN65DP159RSBT = (0x09 << 1), //TI retimer chip + EXT_DISPLAY_PATH_CAPS__AMD_INTERNAL = (0x0a << 1), //AMD internal customer chip placeholder }; struct atom_external_display_connection_info diff --git a/drivers/gpu/drm/amd/include/ivsrcid/vcn/irqsrcs_vcn_5_0.h b/drivers/gpu/drm/amd/include/ivsrcid/vcn/irqsrcs_vcn_5_0.h new file mode 100644 index 0000000000000000000000000000000000000000..64b553e7de1aef1384761c9a5737b5cd438fc30e --- /dev/null +++ b/drivers/gpu/drm/amd/include/ivsrcid/vcn/irqsrcs_vcn_5_0.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Copyright 2024 Advanced Micro Devices, Inc. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef __IRQSRCS_VCN_5_0_H__ +#define __IRQSRCS_VCN_5_0_H__ + +#define VCN_5_0__SRCID__UVD_TRAP 114 // 0x72 UVD_TRAP +#define VCN_5_0__SRCID__UVD_ENC_GENERAL_PURPOSE 119 // 0x77 Encoder General Purpose +#define VCN_5_0__SRCID__UVD_ENC_LOW_LATENCY 120 // 0x78 Encoder Low Latency +#define VCN_5_0__SRCID__UVD_SYSTEM_MESSAGE_INTERRUPT 124 // 0x7c UVD system message interrupt +#define VCN_5_0__SRCID__JPEG_ENCODE 151 // 0x97 JRBC Encode interrupt +#define VCN_5_0__SRCID__JPEG_DECODE 153 // 0x99 JRBC Decode interrupt +#define VCN_5_0__SRCID__JPEG1_DECODE 149 // 0x95 JRBC1 Decode interrupt +#define VCN_5_0__SRCID__JPEG2_DECODE 151 // 0x97 JRBC2 Decode interrupt +#define VCN_5_0__SRCID__JPEG3_DECODE 171 // 0xab JRBC3 Decode interrupt +#define VCN_5_0__SRCID__JPEG4_DECODE 172 // 0xac JRBC4 Decode interrupt +#define VCN_5_0__SRCID__JPEG5_DECODE 173 // 0xad JRBC5 Decode interrupt +#define VCN_5_0__SRCID__JPEG6_DECODE 174 // 0xae JRBC6 Decode interrupt +#define VCN_5_0__SRCID__JPEG7_DECODE 175 // 0xaf JRBC7 Decode interrupt +#define VCN_5_0__SRCID__JPEG8_DECODE 177 // 0xb1 JRBC8 Decode interrupt +#define VCN_5_0__SRCID__JPEG9_DECODE 178 // 0xb2 JRBC9 Decode interrupt + +#define VCN_5_0__SRCID_UVD_POISON 160 +#define VCN_5_0__SRCID_DJPEG0_POISON 161 +#define VCN_5_0__SRCID_EJPEG0_POISON 162 +#endif diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h b/drivers/gpu/drm/amd/include/kgd_pp_interface.h index 67a5de57394361e2a272be17e0a9daa907c39e8a..9189dcb6518817c742ac98ee16aa141dcce63412 100644 --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h @@ -164,6 +164,7 @@ enum amd_pp_task { }; enum PP_SMC_POWER_PROFILE { + PP_SMC_POWER_PROFILE_UNKNOWN = -1, PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT = 0x0, PP_SMC_POWER_PROFILE_FULLSCREEN3D = 0x1, PP_SMC_POWER_PROFILE_POWERSAVING = 0x2, @@ -420,7 +421,9 @@ struct amd_pm_funcs { int (*load_firmware)(void *handle); int (*wait_for_fw_loading_complete)(void *handle); int (*set_powergating_by_smu)(void *handle, - uint32_t block_type, bool gate); + uint32_t block_type, + bool gate, + int inst); int (*set_clockgating_by_smu)(void *handle, uint32_t msg_id); int (*set_power_limit)(void *handle, uint32_t n); int (*get_power_limit)(void *handle, uint32_t *limit, diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c index 9dc82f4d7c937aa6e021db4212d8deed025c4fbf..6a9e26905edfc6a5381096e175b2b8084ac72d26 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c @@ -70,13 +70,18 @@ int amdgpu_dpm_get_mclk(struct amdgpu_device *adev, bool low) return ret; } -int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block_type, bool gate) +int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, + uint32_t block_type, + bool gate, + int inst) { int ret = 0; const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs; enum ip_power_state pwr_state = gate ? POWER_STATE_OFF : POWER_STATE_ON; + bool is_vcn = (block_type == AMD_IP_BLOCK_TYPE_UVD || block_type == AMD_IP_BLOCK_TYPE_VCN); - if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state) { + if (atomic_read(&adev->pm.pwr_state[block_type]) == pwr_state && + (!is_vcn || adev->vcn.num_vcn_inst == 1)) { dev_dbg(adev->dev, "IP block%d already in the target %s state!", block_type, gate ? "gate" : "ungate"); return 0; @@ -88,7 +93,6 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block case AMD_IP_BLOCK_TYPE_UVD: case AMD_IP_BLOCK_TYPE_VCE: case AMD_IP_BLOCK_TYPE_GFX: - case AMD_IP_BLOCK_TYPE_VCN: case AMD_IP_BLOCK_TYPE_SDMA: case AMD_IP_BLOCK_TYPE_JPEG: case AMD_IP_BLOCK_TYPE_GMC: @@ -96,7 +100,12 @@ int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, uint32_t block case AMD_IP_BLOCK_TYPE_VPE: if (pp_funcs && pp_funcs->set_powergating_by_smu) ret = (pp_funcs->set_powergating_by_smu( - (adev)->powerplay.pp_handle, block_type, gate)); + (adev)->powerplay.pp_handle, block_type, gate, 0)); + break; + case AMD_IP_BLOCK_TYPE_VCN: + if (pp_funcs && pp_funcs->set_powergating_by_smu) + ret = (pp_funcs->set_powergating_by_smu( + (adev)->powerplay.pp_handle, block_type, gate, inst)); break; default: break; @@ -566,7 +575,17 @@ void amdgpu_dpm_enable_uvd(struct amdgpu_device *adev, bool enable) return; } - ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_UVD, !enable); + ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_UVD, !enable, 0); + if (ret) + DRM_ERROR("Dpm %s uvd failed, ret = %d. \n", + enable ? "enable" : "disable", ret); +} + +void amdgpu_dpm_enable_vcn(struct amdgpu_device *adev, bool enable, int inst) +{ + int ret = 0; + + ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_VCN, !enable, inst); if (ret) DRM_ERROR("Dpm %s uvd failed, ret = %d. \n", enable ? "enable" : "disable", ret); @@ -591,7 +610,7 @@ void amdgpu_dpm_enable_vce(struct amdgpu_device *adev, bool enable) return; } - ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_VCE, !enable); + ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_VCE, !enable, 0); if (ret) DRM_ERROR("Dpm %s vce failed, ret = %d. \n", enable ? "enable" : "disable", ret); @@ -601,7 +620,7 @@ void amdgpu_dpm_enable_jpeg(struct amdgpu_device *adev, bool enable) { int ret = 0; - ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_JPEG, !enable); + ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_JPEG, !enable, 0); if (ret) DRM_ERROR("Dpm %s jpeg failed, ret = %d. \n", enable ? "enable" : "disable", ret); @@ -611,7 +630,7 @@ void amdgpu_dpm_enable_vpe(struct amdgpu_device *adev, bool enable) { int ret = 0; - ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_VPE, !enable); + ret = amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_VPE, !enable, 0); if (ret) DRM_ERROR("Dpm %s vpe failed, ret = %d.\n", enable ? "enable" : "disable", ret); @@ -700,6 +719,21 @@ int amdgpu_dpm_send_rma_reason(struct amdgpu_device *adev) return ret; } +int amdgpu_dpm_reset_sdma(struct amdgpu_device *adev, uint32_t inst_mask) +{ + struct smu_context *smu = adev->powerplay.pp_handle; + int ret; + + if (!is_support_sw_smu(adev)) + return -EOPNOTSUPP; + + mutex_lock(&adev->pm.mutex); + ret = smu_reset_sdma(smu, inst_mask); + mutex_unlock(&adev->pm.mutex); + + return ret; +} + int amdgpu_dpm_get_dpm_freq_range(struct amdgpu_device *adev, enum pp_clock_type type, uint32_t *min, @@ -953,6 +987,24 @@ enum amd_dpm_forced_level amdgpu_dpm_get_performance_level(struct amdgpu_device return level; } +static void amdgpu_dpm_enter_umd_state(struct amdgpu_device *adev) +{ + /* enter UMD Pstate */ + amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_GFX, + AMD_PG_STATE_UNGATE); + amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_GFX, + AMD_CG_STATE_UNGATE); +} + +static void amdgpu_dpm_exit_umd_state(struct amdgpu_device *adev) +{ + /* exit UMD Pstate */ + amdgpu_device_ip_set_clockgating_state(adev, AMD_IP_BLOCK_TYPE_GFX, + AMD_CG_STATE_GATE); + amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_GFX, + AMD_PG_STATE_GATE); +} + int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev, enum amd_dpm_forced_level level) { @@ -973,6 +1025,10 @@ int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev, if (current_level == level) return 0; + if (!(current_level & profile_mode_mask) && + (level == AMD_DPM_FORCED_LEVEL_PROFILE_EXIT)) + return -EINVAL; + if (adev->asic_type == CHIP_RAVEN) { if (!(adev->apu_flags & AMD_APU_IS_RAVEN2)) { if (current_level != AMD_DPM_FORCED_LEVEL_MANUAL && @@ -984,35 +1040,25 @@ int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev, } } - if (!(current_level & profile_mode_mask) && - (level == AMD_DPM_FORCED_LEVEL_PROFILE_EXIT)) - return -EINVAL; - - if (!(current_level & profile_mode_mask) && - (level & profile_mode_mask)) { - /* enter UMD Pstate */ - amdgpu_device_ip_set_powergating_state(adev, - AMD_IP_BLOCK_TYPE_GFX, - AMD_PG_STATE_UNGATE); - amdgpu_device_ip_set_clockgating_state(adev, - AMD_IP_BLOCK_TYPE_GFX, - AMD_CG_STATE_UNGATE); - } else if ((current_level & profile_mode_mask) && - !(level & profile_mode_mask)) { - /* exit UMD Pstate */ - amdgpu_device_ip_set_clockgating_state(adev, - AMD_IP_BLOCK_TYPE_GFX, - AMD_CG_STATE_GATE); - amdgpu_device_ip_set_powergating_state(adev, - AMD_IP_BLOCK_TYPE_GFX, - AMD_PG_STATE_GATE); - } + if (!(current_level & profile_mode_mask) && (level & profile_mode_mask)) + amdgpu_dpm_enter_umd_state(adev); + else if ((current_level & profile_mode_mask) && + !(level & profile_mode_mask)) + amdgpu_dpm_exit_umd_state(adev); mutex_lock(&adev->pm.mutex); if (pp_funcs->force_performance_level(adev->powerplay.pp_handle, level)) { mutex_unlock(&adev->pm.mutex); + /* If new level failed, retain the umd state as before */ + if (!(current_level & profile_mode_mask) && + (level & profile_mode_mask)) + amdgpu_dpm_exit_umd_state(adev); + else if ((current_level & profile_mode_mask) && + !(level & profile_mode_mask)) + amdgpu_dpm_enter_umd_state(adev); + return -EINVAL; } diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h index 363af8990aa25762cc2d8d0eaf848d4ec292e9ad..1f5ac7e0230d292c31bb3372c46efae9eb60a024 100644 --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h @@ -397,7 +397,7 @@ int amdgpu_dpm_get_apu_thermal_limit(struct amdgpu_device *adev, uint32_t *limit int amdgpu_dpm_set_apu_thermal_limit(struct amdgpu_device *adev, uint32_t limit); int amdgpu_dpm_set_powergating_by_smu(struct amdgpu_device *adev, - uint32_t block_type, bool gate); + uint32_t block_type, bool gate, int inst); extern int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool low); @@ -446,6 +446,7 @@ void amdgpu_pm_acpi_event_handler(struct amdgpu_device *adev); void amdgpu_dpm_compute_clocks(struct amdgpu_device *adev); void amdgpu_dpm_enable_uvd(struct amdgpu_device *adev, bool enable); +void amdgpu_dpm_enable_vcn(struct amdgpu_device *adev, bool enable, int inst); void amdgpu_dpm_enable_vce(struct amdgpu_device *adev, bool enable); void amdgpu_dpm_enable_jpeg(struct amdgpu_device *adev, bool enable); void amdgpu_dpm_enable_vpe(struct amdgpu_device *adev, bool enable); @@ -601,5 +602,6 @@ int amdgpu_dpm_set_pm_policy(struct amdgpu_device *adev, int policy_type, int policy_level); ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, enum pp_pm_policy p_type, char *buf); +int amdgpu_dpm_reset_sdma(struct amdgpu_device *adev, uint32_t inst_mask); #endif diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c index 8908646ad620d409f5d3deed6392a070fcb7e8ed..67a8e22b1126d9eabaefe6d66809d63da9212612 100644 --- a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c @@ -3177,13 +3177,13 @@ static int kv_dpm_process_interrupt(struct amdgpu_device *adev, return 0; } -static int kv_dpm_set_clockgating_state(void *handle, +static int kv_dpm_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int kv_dpm_set_powergating_state(void *handle, +static int kv_dpm_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -3276,7 +3276,9 @@ static int kv_dpm_read_sensor(void *handle, int idx, } static int kv_set_powergating_by_smu(void *handle, - uint32_t block_type, bool gate) + uint32_t block_type, + bool gate, + int inst) { switch (block_type) { case AMD_IP_BLOCK_TYPE_UVD: diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c index ee23a0f897c50c13b2a28cc009ad0703cff6266f..a87dcf0974bc1cd92bb6eb80fb6aa4f70649ed18 100644 --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c @@ -7709,7 +7709,8 @@ static int si_dpm_init_microcode(struct amdgpu_device *adev) default: BUG(); } - err = amdgpu_ucode_request(adev, &adev->pm.fw, "amdgpu/%s_smc.bin", chip_name); + err = amdgpu_ucode_request(adev, &adev->pm.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s_smc.bin", chip_name); if (err) { DRM_ERROR("si_smc: Failed to load firmware. err = %d\"%s_smc.bin\"\n", err, chip_name); @@ -7849,13 +7850,13 @@ static int si_dpm_wait_for_idle(struct amdgpu_ip_block *ip_block) return 0; } -static int si_dpm_set_clockgating_state(void *handle, +static int si_dpm_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int si_dpm_set_powergating_state(void *handle, +static int si_dpm_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c index 26624a716fc6076072d7eae4e0b8c2f92de9f17a..686345f75f2645aa8a1d24af5171f06f4cf1a3e5 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c +++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c @@ -244,7 +244,7 @@ static bool pp_is_idle(void *handle) return false; } -static int pp_set_powergating_state(void *handle, +static int pp_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -267,7 +267,7 @@ static int pp_resume(struct amdgpu_ip_block *ip_block) return hwmgr_resume(hwmgr); } -static int pp_set_clockgating_state(void *handle, +static int pp_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; @@ -1227,7 +1227,9 @@ static void pp_dpm_powergate_sdma(void *handle, bool gate) } static int pp_set_powergating_by_smu(void *handle, - uint32_t block_type, bool gate) + uint32_t block_type, + bool gate, + int inst) { int ret = 0; diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c index fe24219c3bf48e784302d7b43884759385fb9352..4bd92fd782be6abe754e3d487935b4fcedd44faa 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c @@ -992,6 +992,8 @@ int atomctrl_get_smc_sclk_range_table(struct pp_hwmgr *hwmgr, struct pp_atom_ctr GetIndexIntoMasterTable(DATA, SMU_Info), &size, &frev, &crev); + if (!psmu_info) + return -EINVAL; for (i = 0; i < psmu_info->ucSclkEntryNum; i++) { table->entry[i].ucVco_setting = psmu_info->asSclkFcwRangeEntry[i].ucVco_setting; diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_powertune.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_powertune.c index 3007b054c873c9f4e68248e59463413247170af0..776d58ea63ae90002d85ffed60afd8dee399d50f 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_powertune.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_powertune.c @@ -1120,13 +1120,14 @@ static int vega10_enable_se_edc_force_stall_config(struct pp_hwmgr *hwmgr) result = vega10_program_didt_config_registers(hwmgr, SEEDCForceStallPatternConfig_Vega10, VEGA10_CONFIGREG_DIDT); result |= vega10_program_didt_config_registers(hwmgr, SEEDCCtrlForceStallConfig_Vega10, VEGA10_CONFIGREG_DIDT); if (0 != result) - return result; + goto exit_safe_mode; vega10_didt_set_mask(hwmgr, false); +exit_safe_mode: amdgpu_gfx_rlc_exit_safe_mode(adev, 0); - return 0; + return result; } static int vega10_disable_se_edc_force_stall_config(struct pp_hwmgr *hwmgr) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c index 5eae14fe79f1782e575b97dc36e73a34893232d8..8ca793c222ff2e9f702733ee896babdbd0598dc3 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -238,7 +238,8 @@ static bool is_vcn_enabled(struct amdgpu_device *adev) } static int smu_dpm_set_vcn_enable(struct smu_context *smu, - bool enable) + bool enable, + int inst) { struct smu_power_context *smu_power = &smu->smu_power; struct smu_power_gate *power_gate = &smu_power->power_gate; @@ -253,12 +254,12 @@ static int smu_dpm_set_vcn_enable(struct smu_context *smu, if (!smu->ppt_funcs->dpm_set_vcn_enable) return 0; - if (atomic_read(&power_gate->vcn_gated) ^ enable) + if (atomic_read(&power_gate->vcn_gated[inst]) ^ enable) return 0; - ret = smu->ppt_funcs->dpm_set_vcn_enable(smu, enable, 0xff); + ret = smu->ppt_funcs->dpm_set_vcn_enable(smu, enable, inst); if (!ret) - atomic_set(&power_gate->vcn_gated, !enable); + atomic_set(&power_gate->vcn_gated[inst], !enable); return ret; } @@ -345,8 +346,9 @@ static int smu_set_mall_enable(struct smu_context *smu) * smu_dpm_set_power_gate - power gate/ungate the specific IP block * * @handle: smu_context pointer - * @block_type: the IP block to power gate/ungate - * @gate: to power gate if true, ungate otherwise + * @block_type: the IP block to power gate/ungate + * @gate: to power gate if true, ungate otherwise + * @inst: the instance of the IP block to power gate/ungate * * This API uses no smu->mutex lock protection due to: * 1. It is either called by other IP block(gfx/sdma/vcn/uvd/vce). @@ -357,7 +359,8 @@ static int smu_set_mall_enable(struct smu_context *smu) */ static int smu_dpm_set_power_gate(void *handle, uint32_t block_type, - bool gate) + bool gate, + int inst) { struct smu_context *smu = handle; int ret = 0; @@ -376,10 +379,10 @@ static int smu_dpm_set_power_gate(void *handle, */ case AMD_IP_BLOCK_TYPE_UVD: case AMD_IP_BLOCK_TYPE_VCN: - ret = smu_dpm_set_vcn_enable(smu, !gate); + ret = smu_dpm_set_vcn_enable(smu, !gate, inst); if (ret) - dev_err(smu->adev->dev, "Failed to power %s VCN!\n", - gate ? "gate" : "ungate"); + dev_err(smu->adev->dev, "Failed to power %s VCN instance %d!\n", + gate ? "gate" : "ungate", inst); break; case AMD_IP_BLOCK_TYPE_GFX: ret = smu_gfx_off_control(smu, gate); @@ -724,6 +727,7 @@ static int smu_set_funcs(struct amdgpu_device *adev) break; case IP_VERSION(13, 0, 6): case IP_VERSION(13, 0, 14): + case IP_VERSION(13, 0, 12): smu_v13_0_6_set_ppt_funcs(smu); /* Enable pp_od_clk_voltage node */ smu->od_enabled = true; @@ -764,6 +768,7 @@ static int smu_early_init(struct amdgpu_ip_block *ip_block) smu->smu_baco.platform_support = false; smu->smu_baco.maco_support = false; smu->user_dpm_profile.fan_mode = -1; + smu->power_profile_mode = PP_SMC_POWER_PROFILE_UNKNOWN; mutex_init(&smu->message_lock); @@ -781,21 +786,25 @@ static int smu_set_default_dpm_table(struct smu_context *smu) struct amdgpu_device *adev = smu->adev; struct smu_power_context *smu_power = &smu->smu_power; struct smu_power_gate *power_gate = &smu_power->power_gate; - int vcn_gate, jpeg_gate; + int vcn_gate[AMDGPU_MAX_VCN_INSTANCES], jpeg_gate, i; int ret = 0; if (!smu->ppt_funcs->set_default_dpm_table) return 0; - if (adev->pg_flags & AMD_PG_SUPPORT_VCN) - vcn_gate = atomic_read(&power_gate->vcn_gated); + if (adev->pg_flags & AMD_PG_SUPPORT_VCN) { + for (i = 0; i < adev->vcn.num_vcn_inst; i++) + vcn_gate[i] = atomic_read(&power_gate->vcn_gated[i]); + } if (adev->pg_flags & AMD_PG_SUPPORT_JPEG) jpeg_gate = atomic_read(&power_gate->jpeg_gated); if (adev->pg_flags & AMD_PG_SUPPORT_VCN) { - ret = smu_dpm_set_vcn_enable(smu, true); - if (ret) - return ret; + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + ret = smu_dpm_set_vcn_enable(smu, true, i); + if (ret) + return ret; + } } if (adev->pg_flags & AMD_PG_SUPPORT_JPEG) { @@ -812,8 +821,10 @@ static int smu_set_default_dpm_table(struct smu_context *smu) if (adev->pg_flags & AMD_PG_SUPPORT_JPEG) smu_dpm_set_jpeg_enable(smu, !jpeg_gate); err_out: - if (adev->pg_flags & AMD_PG_SUPPORT_VCN) - smu_dpm_set_vcn_enable(smu, !vcn_gate); + if (adev->pg_flags & AMD_PG_SUPPORT_VCN) { + for (i = 0; i < adev->vcn.num_vcn_inst; i++) + smu_dpm_set_vcn_enable(smu, !vcn_gate[i], i); + } return ret; } @@ -1248,11 +1259,26 @@ static bool smu_is_workload_profile_available(struct smu_context *smu, return smu->workload_map && smu->workload_map[profile].valid_mapping; } +static void smu_init_power_profile(struct smu_context *smu) +{ + if (smu->power_profile_mode == PP_SMC_POWER_PROFILE_UNKNOWN) { + if (smu->is_apu || + !smu_is_workload_profile_available( + smu, PP_SMC_POWER_PROFILE_FULLSCREEN3D)) + smu->power_profile_mode = + PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT; + else + smu->power_profile_mode = + PP_SMC_POWER_PROFILE_FULLSCREEN3D; + } + smu_power_profile_mode_get(smu, smu->power_profile_mode); +} + static int smu_sw_init(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; struct smu_context *smu = adev->powerplay.pp_handle; - int ret; + int i, ret; smu->pool_size = adev->pm.smu_prv_buffer_size; smu->smu_feature.feature_num = SMU_FEATURE_MAX; @@ -1264,18 +1290,13 @@ static int smu_sw_init(struct amdgpu_ip_block *ip_block) atomic64_set(&smu->throttle_int_counter, 0); smu->watermarks_bitmap = 0; - atomic_set(&smu->smu_power.power_gate.vcn_gated, 1); + for (i = 0; i < adev->vcn.num_vcn_inst; i++) + atomic_set(&smu->smu_power.power_gate.vcn_gated[i], 1); atomic_set(&smu->smu_power.power_gate.jpeg_gated, 1); atomic_set(&smu->smu_power.power_gate.vpe_gated, 1); atomic_set(&smu->smu_power.power_gate.umsch_mm_gated, 1); - if (smu->is_apu || - !smu_is_workload_profile_available(smu, PP_SMC_POWER_PROFILE_FULLSCREEN3D)) - smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT; - else - smu->power_profile_mode = PP_SMC_POWER_PROFILE_FULLSCREEN3D; - smu_power_profile_mode_get(smu, smu->power_profile_mode); - + smu_init_power_profile(smu); smu->display_config = &adev->pm.pm_display_cfg; smu->smu_dpm.dpm_level = AMD_DPM_FORCED_LEVEL_AUTO; @@ -1800,7 +1821,7 @@ static int smu_start_smc_engine(struct smu_context *smu) static int smu_hw_init(struct amdgpu_ip_block *ip_block) { - int ret; + int i, ret; struct amdgpu_device *adev = ip_block->adev; struct smu_context *smu = adev->powerplay.pp_handle; @@ -1826,7 +1847,8 @@ static int smu_hw_init(struct amdgpu_ip_block *ip_block) ret = smu_set_gfx_imu_enable(smu); if (ret) return ret; - smu_dpm_set_vcn_enable(smu, true); + for (i = 0; i < adev->vcn.num_vcn_inst; i++) + smu_dpm_set_vcn_enable(smu, true, i); smu_dpm_set_jpeg_enable(smu, true); smu_dpm_set_vpe_enable(smu, true); smu_dpm_set_umsch_mm_enable(smu, true); @@ -2024,12 +2046,13 @@ static int smu_hw_fini(struct amdgpu_ip_block *ip_block) { struct amdgpu_device *adev = ip_block->adev; struct smu_context *smu = adev->powerplay.pp_handle; - int ret; + int i, ret; if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev)) return 0; - smu_dpm_set_vcn_enable(smu, false); + for (i = 0; i < adev->vcn.num_vcn_inst; i++) + smu_dpm_set_vcn_enable(smu, false, i); smu_dpm_set_jpeg_enable(smu, false); smu_dpm_set_vpe_enable(smu, false); smu_dpm_set_umsch_mm_enable(smu, false); @@ -2181,13 +2204,13 @@ static int smu_display_configuration_change(void *handle, return 0; } -static int smu_set_clockgating_state(void *handle, +static int smu_set_clockgating_state(struct amdgpu_ip_block *ip_block, enum amd_clockgating_state state) { return 0; } -static int smu_set_powergating_state(void *handle, +static int smu_set_powergating_state(struct amdgpu_ip_block *ip_block, enum amd_powergating_state state) { return 0; @@ -2969,9 +2992,10 @@ static int smu_read_sensor(void *handle, int *size_arg) { struct smu_context *smu = handle; + struct amdgpu_device *adev = smu->adev; struct smu_umd_pstate_table *pstate_table = &smu->pstate_table; - int ret = 0; + int i, ret = 0; uint32_t *size, size_val; if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) @@ -3017,7 +3041,13 @@ static int smu_read_sensor(void *handle, *size = 4; break; case AMDGPU_PP_SENSOR_VCN_POWER_STATE: - *(uint32_t *)data = atomic_read(&smu->smu_power.power_gate.vcn_gated) ? 0 : 1; + *(uint32_t *)data = 0; + for (i = 0; i < adev->vcn.num_vcn_inst; i++) { + if (!atomic_read(&smu->smu_power.power_gate.vcn_gated[i])) { + *(uint32_t *)data = 1; + break; + } + } *size = 4; break; case AMDGPU_PP_SENSOR_MIN_FAN_RPM: @@ -3885,3 +3915,13 @@ int smu_send_rma_reason(struct smu_context *smu) return ret; } + +int smu_reset_sdma(struct smu_context *smu, uint32_t inst_mask) +{ + int ret = 0; + + if (smu->ppt_funcs && smu->ppt_funcs->reset_sdma) + ret = smu->ppt_funcs->reset_sdma(smu, inst_mask); + + return ret; +} diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h index 3925815358ce98120038dfbf84d388cfddaf7baf..3630593bce61d6ab8fd99934fcbd7a740311674d 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h @@ -399,7 +399,7 @@ struct smu_dpm_context { struct smu_power_gate { bool uvd_gated; bool vce_gated; - atomic_t vcn_gated; + atomic_t vcn_gated[AMDGPU_MAX_VCN_INSTANCES]; atomic_t jpeg_gated; atomic_t vpe_gated; atomic_t umsch_mm_gated; @@ -1372,6 +1372,11 @@ struct pptable_funcs { */ int (*send_rma_reason)(struct smu_context *smu); + /** + * @reset_sdma: message SMU to soft reset sdma instance. + */ + int (*reset_sdma)(struct smu_context *smu, uint32_t inst_mask); + /** * @get_ecc_table: message SMU to get ECC INFO table. */ @@ -1631,6 +1636,7 @@ void amdgpu_smu_stb_debug_fs_init(struct amdgpu_device *adev); int smu_send_hbm_bad_pages_num(struct smu_context *smu, uint32_t size); int smu_send_hbm_bad_channel_flag(struct smu_context *smu, uint32_t size); int smu_send_rma_reason(struct smu_context *smu); +int smu_reset_sdma(struct smu_context *smu, uint32_t inst_mask); int smu_set_pm_policy(struct smu_context *smu, enum pp_pm_policy p_type, int level); ssize_t smu_get_pm_policy_info(struct smu_context *smu, diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h index 41cb681927e2fd6f6f14345bc234ae39ad85432d..147bfb12fd751e4d0c0cb80e7dd9bd57c2001159 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h @@ -93,7 +93,8 @@ #define PPSMC_MSG_SelectPLPDMode 0x40 #define PPSMC_MSG_RmaDueToBadPageThreshold 0x43 #define PPSMC_MSG_SelectPstatePolicy 0x44 -#define PPSMC_Message_Count 0x45 +#define PPSMC_MSG_ResetSDMA 0x4D +#define PPSMC_Message_Count 0x4E //PPSMC Reset Types for driver msg argument #define PPSMC_RESET_TYPE_DRIVER_MODE_1_RESET 0x1 diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h index a299dc4a807149d3e661adc04d96904686c1eb91..e4cd6a0d13dad8f0d1d176acbc5d4d0b96c75e3d 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h @@ -275,7 +275,8 @@ __SMU_DUMMY_MAP(RmaDueToBadPageThreshold), \ __SMU_DUMMY_MAP(SelectPstatePolicy), \ __SMU_DUMMY_MAP(MALLPowerController), \ - __SMU_DUMMY_MAP(MALLPowerState), + __SMU_DUMMY_MAP(MALLPowerState), \ + __SMU_DUMMY_MAP(ResetSDMA), #undef __SMU_DUMMY_MAP #define __SMU_DUMMY_MAP(type) SMU_MSG_##type diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c index 286777ada1dfc4cded7a729d66cf643d6463a086..19a25fdc2f5b426008cbe067875afd0223361e2f 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c @@ -1157,19 +1157,15 @@ static int sienna_cichlid_dpm_set_vcn_enable(struct smu_context *smu, int inst) { struct amdgpu_device *adev = smu->adev; - int i, ret = 0; + int ret = 0; - for (i = 0; i < adev->vcn.num_vcn_inst; i++) { - if (adev->vcn.harvest_config & (1 << i)) - continue; - /* vcn dpm on is a prerequisite for vcn power gate messages */ - if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_MM_DPM_PG_BIT)) { - ret = smu_cmn_send_smc_msg_with_param(smu, enable ? - SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, - 0x10000 * i, NULL); - if (ret) - return ret; - } + if (adev->vcn.harvest_config & (1 << inst)) + return ret; + /* vcn dpm on is a prerequisite for vcn power gate messages */ + if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_MM_DPM_PG_BIT)) { + ret = smu_cmn_send_smc_msg_with_param(smu, enable ? + SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, + 0x10000 * inst, NULL); } return ret; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c index 480cf3cb204d2dc03e62c19bf268ad0e11af69a5..189c6a32b6bdb4b065ef5f551dcd3d6503e84972 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -105,7 +105,8 @@ int smu_v11_0_init_microcode(struct smu_context *smu) return 0; amdgpu_ucode_ip_version_decode(adev, MP1_HWIP, ucode_prefix, sizeof(ucode_prefix)); - err = amdgpu_ucode_request(adev, &adev->pm.fw, "amdgpu/%s.bin", ucode_prefix); + err = amdgpu_ucode_request(adev, &adev->pm.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", ucode_prefix); if (err) goto out; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c index 2bfea740daceedb550474dae9478cfda28102d44..7bb45ff6d5c856014c5c08cf5649a6587e3a4cb0 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -103,7 +103,8 @@ int smu_v13_0_init_microcode(struct smu_context *smu) return 0; amdgpu_ucode_ip_version_decode(adev, MP1_HWIP, ucode_prefix, sizeof(ucode_prefix)); - err = amdgpu_ucode_request(adev, &adev->pm.fw, "amdgpu/%s.bin", ucode_prefix); + err = amdgpu_ucode_request(adev, &adev->pm.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", ucode_prefix); if (err) goto out; @@ -2108,18 +2109,14 @@ int smu_v13_0_set_vcn_enable(struct smu_context *smu, int inst) { struct amdgpu_device *adev = smu->adev; - int i, ret = 0; + int ret = 0; - for (i = 0; i < adev->vcn.num_vcn_inst; i++) { - if (adev->vcn.harvest_config & (1 << i)) - continue; + if (adev->vcn.harvest_config & (1 << inst)) + return ret; - ret = smu_cmn_send_smc_msg_with_param(smu, enable ? - SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, - i << 16U, NULL); - if (ret) - return ret; - } + ret = smu_cmn_send_smc_msg_with_param(smu, enable ? + SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, + inst << 16U, NULL); return ret; } diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c index ab3c93ddce46ff2292de6434455bf52173ae4e02..5b86df0c8536efe69a8afa924299c7851bc21247 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c @@ -193,6 +193,7 @@ static const struct cmn2asic_msg_mapping smu_v13_0_6_message_map[SMU_MSG_MAX_COU MSG_MAP(SelectPLPDMode, PPSMC_MSG_SelectPLPDMode, 0), MSG_MAP(RmaDueToBadPageThreshold, PPSMC_MSG_RmaDueToBadPageThreshold, 0), MSG_MAP(SelectPstatePolicy, PPSMC_MSG_SelectPstatePolicy, 0), + MSG_MAP(ResetSDMA, PPSMC_MSG_ResetSDMA, 0), }; // clang-format on @@ -304,7 +305,8 @@ static int smu_v13_0_6_init_microcode(struct smu_context *smu) amdgpu_ucode_ip_version_decode(adev, MP1_HWIP, ucode_prefix, sizeof(ucode_prefix)); - ret = amdgpu_ucode_request(adev, &adev->pm.fw, "amdgpu/%s.bin", ucode_prefix); + ret = amdgpu_ucode_request(adev, &adev->pm.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", ucode_prefix); if (ret) goto out; @@ -2716,6 +2718,27 @@ static int smu_v13_0_6_send_rma_reason(struct smu_context *smu) return ret; } +static int smu_v13_0_6_reset_sdma(struct smu_context *smu, uint32_t inst_mask) +{ + struct amdgpu_device *adev = smu->adev; + int ret = 0; + + /* the message is only valid on SMU 13.0.6 with pmfw 85.121.00 and above */ + if ((adev->flags & AMD_IS_APU) || + amdgpu_ip_version(adev, MP1_HWIP, 0) != IP_VERSION(13, 0, 6) || + smu->smc_fw_version < 0x00557900) + return 0; + + ret = smu_cmn_send_smc_msg_with_param(smu, + SMU_MSG_ResetSDMA, inst_mask, NULL); + if (ret) + dev_err(smu->adev->dev, + "failed to send ResetSDMA event with mask 0x%x\n", + inst_mask); + + return ret; +} + static int mca_smu_set_debug_mode(struct amdgpu_device *adev, bool enable) { struct smu_context *smu = adev->powerplay.pp_handle; @@ -3385,6 +3408,7 @@ static const struct pptable_funcs smu_v13_0_6_ppt_funcs = { .i2c_fini = smu_v13_0_6_i2c_control_fini, .send_hbm_bad_pages_num = smu_v13_0_6_smu_send_hbm_bad_page_num, .send_rma_reason = smu_v13_0_6_send_rma_reason, + .reset_sdma = smu_v13_0_6_reset_sdma, }; void smu_v13_0_6_set_ppt_funcs(struct smu_context *smu) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c index f4ac403b8b36009d951e777049b0ca60b5a09162..aabb9479600562bfffcc0618414065b246a68696 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c @@ -2810,4 +2810,5 @@ void smu_v13_0_7_set_ppt_funcs(struct smu_context *smu) smu->workload_map = smu_v13_0_7_workload_map; smu->smc_driver_if_version = SMU13_0_7_DRIVER_IF_VERSION; smu_v13_0_set_smu_mailbox_registers(smu); + smu->power_profile_mode = PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT; } diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c index a87040cb2f2e52ff830c4f60d24dcde078961b90..9b2f4fe1578b8db32d8cf5db0768d3d80461c1e9 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c @@ -79,7 +79,8 @@ int smu_v14_0_init_microcode(struct smu_context *smu) return 0; amdgpu_ucode_ip_version_decode(adev, MP1_HWIP, ucode_prefix, sizeof(ucode_prefix)); - err = amdgpu_ucode_request(adev, &adev->pm.fw, "amdgpu/%s.bin", ucode_prefix); + err = amdgpu_ucode_request(adev, &adev->pm.fw, AMDGPU_UCODE_REQUIRED, + "amdgpu/%s.bin", ucode_prefix); if (err) goto out; @@ -1511,29 +1512,24 @@ int smu_v14_0_set_vcn_enable(struct smu_context *smu, int inst) { struct amdgpu_device *adev = smu->adev; - int i, ret = 0; + int ret = 0; - for (i = 0; i < adev->vcn.num_vcn_inst; i++) { - if (adev->vcn.harvest_config & (1 << i)) - continue; + if (adev->vcn.harvest_config & (1 << inst)) + return ret; - if (smu->is_apu) { - if (i == 0) - ret = smu_cmn_send_smc_msg_with_param(smu, enable ? - SMU_MSG_PowerUpVcn0 : SMU_MSG_PowerDownVcn0, - i << 16U, NULL); - else if (i == 1) - ret = smu_cmn_send_smc_msg_with_param(smu, enable ? - SMU_MSG_PowerUpVcn1 : SMU_MSG_PowerDownVcn1, - i << 16U, NULL); - } else { + if (smu->is_apu) { + if (inst == 0) ret = smu_cmn_send_smc_msg_with_param(smu, enable ? - SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, - i << 16U, NULL); - } - - if (ret) - return ret; + SMU_MSG_PowerUpVcn0 : SMU_MSG_PowerDownVcn0, + inst << 16U, NULL); + else if (inst == 1) + ret = smu_cmn_send_smc_msg_with_param(smu, enable ? + SMU_MSG_PowerUpVcn1 : SMU_MSG_PowerDownVcn1, + inst << 16U, NULL); + } else { + ret = smu_cmn_send_smc_msg_with_param(smu, enable ? + SMU_MSG_PowerUpVcn : SMU_MSG_PowerDownVcn, + inst << 16U, NULL); } return ret; diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c index 6a565ce74d5b3be69fa26114f8b94629e44a2c86..5cad09c5f2ff2c3c1fd1ed727e54476835dd4c6c 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c @@ -2096,7 +2096,7 @@ static int smu_v14_0_2_enable_gfx_features(struct smu_context *smu) { struct amdgpu_device *adev = smu->adev; - if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(14, 0, 2)) + if (amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(14, 0, 2)) return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_EnableAllSmuFeatures, FEATURE_PWR_GFX, NULL); else diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_drv.c b/drivers/gpu/drm/arm/display/komeda/komeda_drv.c index d981d721e7969b55240349a073d755da0d98335b..358c1512b08799be69fab74d62df5f1c0cfead77 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_drv.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_drv.c @@ -9,7 +9,7 @@ #include #include #include -#include +#include #include #include #include "komeda_dev.h" diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c index 1e7b1fcb2848e72c19c44f11786e7eb91791ab3b..6ed504099188714e86d75abbeda4c194f5a42299 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_kms.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_kms.c @@ -63,7 +63,6 @@ static const struct drm_driver komeda_kms_driver = { .fops = &komeda_cma_fops, .name = "komeda", .desc = "Arm Komeda Display Processor driver", - .date = "20181101", .major = 0, .minor = 1, }; diff --git a/drivers/gpu/drm/arm/hdlcd_drv.c b/drivers/gpu/drm/arm/hdlcd_drv.c index 191b806624dfa55b033e6bce639a4b8fafd85f78..c3179d74f3f54667e7a00d9bb4e1793e77b2f0bb 100644 --- a/drivers/gpu/drm/arm/hdlcd_drv.c +++ b/drivers/gpu/drm/arm/hdlcd_drv.c @@ -22,8 +22,8 @@ #include #include +#include #include -#include #include #include #include @@ -233,7 +233,6 @@ static const struct drm_driver hdlcd_driver = { .fops = &fops, .name = "hdlcd", .desc = "ARM HDLCD Controller DRM", - .date = "20151021", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/arm/malidp_drv.c b/drivers/gpu/drm/arm/malidp_drv.c index fd2be80f3bf5c06f2133d886f0e23b329d0dc455..e083021e9e99caefaaf1daa1753fc952dc7f0c72 100644 --- a/drivers/gpu/drm/arm/malidp_drv.c +++ b/drivers/gpu/drm/arm/malidp_drv.c @@ -16,9 +16,9 @@ #include #include +#include #include #include -#include #include #include #include @@ -570,7 +570,6 @@ static const struct drm_driver malidp_driver = { .fops = &fops, .name = "mali-dp", .desc = "ARM Mali Display Processor driver", - .date = "20160106", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 650e450cc19b83800cf65be49c070121447a61c8..cae25ad66c749dbd36dfa040755b6f970b16da96 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -11,8 +11,8 @@ #include #include +#include #include -#include #include #include #include @@ -45,7 +45,6 @@ static const struct drm_driver armada_drm_driver = { .minor = 0, .name = "armada-drm", .desc = "Armada SoC DRM", - .date = "20120730", .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, .ioctls = armada_ioctls, .num_ioctls = ARRAY_SIZE(armada_ioctls), diff --git a/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c b/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c index b7e608ba61941dc2665568767d3928abef1cd2e7..397e677a691c2c6d199063f44358196a4569b389 100644 --- a/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c +++ b/drivers/gpu/drm/aspeed/aspeed_gfx_drv.c @@ -13,8 +13,8 @@ #include #include +#include #include -#include #include #include #include @@ -252,7 +252,6 @@ static const struct drm_driver aspeed_gfx_driver = { .fops = &fops, .name = "aspeed-gfx-drm", .desc = "ASPEED GFX DRM", - .date = "20180319", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c index 4afe4be072efc9c08848896e1b7911b43576335c..ff3bcdd1cff2a2daa88d6e85a154ef1a58a87f02 100644 --- a/drivers/gpu/drm/ast/ast_drv.c +++ b/drivers/gpu/drm/ast/ast_drv.c @@ -31,8 +31,8 @@ #include #include +#include #include -#include #include #include #include @@ -60,7 +60,6 @@ static const struct drm_driver ast_driver = { .fops = &ast_fops, .name = DRIVER_NAME, .desc = DRIVER_DESC, - .date = DRIVER_DATE, .major = DRIVER_MAJOR, .minor = DRIVER_MINOR, .patchlevel = DRIVER_PATCHLEVEL, diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h index 21ce3769bf0d2c00d1565ee787e1f5f1810880c5..6b4305ac07d4fc48b1df115408f5c983251223ad 100644 --- a/drivers/gpu/drm/ast/ast_drv.h +++ b/drivers/gpu/drm/ast/ast_drv.h @@ -43,7 +43,6 @@ #define DRIVER_NAME "ast" #define DRIVER_DESC "AST" -#define DRIVER_DATE "20120228" #define DRIVER_MAJOR 0 #define DRIVER_MINOR 1 diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index 7b209af7cf457a774237c0310bbc45bafdaf2018..fa8ad94e431af95280c365b4d7188b0fc3145b56 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -16,9 +16,9 @@ #include #include +#include #include #include -#include #include #include #include @@ -846,7 +846,6 @@ static const struct drm_driver atmel_hlcdc_dc_driver = { .fops = &fops, .name = "atmel-hlcdc", .desc = "Atmel HLCD Controller DRM", - .date = "20141504", .major = 1, .minor = 0, }; diff --git a/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c b/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c index b754947e3e001bd388087dd14ecab706eec114f9..83d711ee3a2eb8b7a491c115171b5cc12a9df066 100644 --- a/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c +++ b/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c @@ -793,7 +793,7 @@ static void anx6345_i2c_remove(struct i2c_client *client) } static const struct i2c_device_id anx6345_id[] = { - { "anx6345", 0 }, + { "anx6345" }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(i2c, anx6345_id); diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c index a2675b121fe44b96945f34215fd900f35bfde43a..6238eabd23282d5f7e59a05d267737f40211aaf3 100644 --- a/drivers/gpu/drm/bridge/analogix/anx7625.c +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c @@ -2002,8 +2002,10 @@ static int anx7625_audio_get_eld(struct device *dev, void *data, memset(buf, 0, len); } else { dev_dbg(dev, "audio copy eld\n"); + mutex_lock(&ctx->connector->eld_mutex); memcpy(buf, ctx->connector->eld, min(sizeof(ctx->connector->eld), len)); + mutex_unlock(&ctx->connector->eld_mutex); } return 0; @@ -2795,7 +2797,7 @@ static void anx7625_i2c_remove(struct i2c_client *client) } static const struct i2c_device_id anx7625_id[] = { - {"anx7625", 0}, + { "anx7625" }, {} }; diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c index 31832ba4017f1a8489c7dd653aa259be02e3183d..42248f179b69deb75754e7f438bcdc517387a728 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.c @@ -500,34 +500,6 @@ static void cdns_mhdp_hdcp_prop_work(struct work_struct *work) drm_modeset_unlock(&dev->mode_config.connection_mutex); } -int cdns_mhdp_hdcp_set_lc(struct cdns_mhdp_device *mhdp, u8 *val) -{ - int ret; - - mutex_lock(&mhdp->mbox_mutex); - ret = cdns_mhdp_secure_mailbox_send(mhdp, MB_MODULE_ID_HDCP_GENERAL, - HDCP_GENERAL_SET_LC_128, - 16, val); - mutex_unlock(&mhdp->mbox_mutex); - - return ret; -} - -int -cdns_mhdp_hdcp_set_public_key_param(struct cdns_mhdp_device *mhdp, - struct cdns_hdcp_tx_public_key_param *val) -{ - int ret; - - mutex_lock(&mhdp->mbox_mutex); - ret = cdns_mhdp_secure_mailbox_send(mhdp, MB_MODULE_ID_HDCP_TX, - HDCP2X_TX_SET_PUBLIC_KEY_PARAMS, - sizeof(*val), (u8 *)val); - mutex_unlock(&mhdp->mbox_mutex); - - return ret; -} - int cdns_mhdp_hdcp_enable(struct cdns_mhdp_device *mhdp, u8 content_type) { int ret; diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h index 334c0b8b0d4f5dbb70a85d81487fee80f28b8e03..3b6ec9c3a8d8bf321006982443d02cf511c565cd 100644 --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-hdcp.h @@ -82,9 +82,6 @@ struct cdns_hdcp_tx_public_key_param { u8 E[DLP_E]; }; -int cdns_mhdp_hdcp_set_public_key_param(struct cdns_mhdp_device *mhdp, - struct cdns_hdcp_tx_public_key_param *val); -int cdns_mhdp_hdcp_set_lc(struct cdns_mhdp_device *mhdp, u8 *val); int cdns_mhdp_hdcp_enable(struct cdns_mhdp_device *mhdp, u8 content_type); int cdns_mhdp_hdcp_disable(struct cdns_mhdp_device *mhdp); void cdns_mhdp_hdcp_init(struct cdns_mhdp_device *mhdp); diff --git a/drivers/gpu/drm/bridge/chipone-icn6211.c b/drivers/gpu/drm/bridge/chipone-icn6211.c index 9eecac457dcf5e8385f5d8e2e94c2b6b562ccb63..d47703559b0dec13fa4478d90ca29be2e899637a 100644 --- a/drivers/gpu/drm/bridge/chipone-icn6211.c +++ b/drivers/gpu/drm/bridge/chipone-icn6211.c @@ -785,7 +785,7 @@ static struct mipi_dsi_driver chipone_dsi_driver = { }, }; -static struct i2c_device_id chipone_i2c_id[] = { +static const struct i2c_device_id chipone_i2c_id[] = { { "chipone,icn6211" }, {}, }; diff --git a/drivers/gpu/drm/bridge/chrontel-ch7033.c b/drivers/gpu/drm/bridge/chrontel-ch7033.c index c83486cf6b15f105f28c27cc4fca4d24817ea335..da17f0978a791fe9bffab3fa0527e3a355d65c6b 100644 --- a/drivers/gpu/drm/bridge/chrontel-ch7033.c +++ b/drivers/gpu/drm/bridge/chrontel-ch7033.c @@ -597,7 +597,7 @@ static const struct of_device_id ch7033_dt_ids[] = { MODULE_DEVICE_TABLE(of, ch7033_dt_ids); static const struct i2c_device_id ch7033_ids[] = { - { "ch7033", 0 }, + { "ch7033" }, { } }; MODULE_DEVICE_TABLE(i2c, ch7033_ids); diff --git a/drivers/gpu/drm/bridge/ite-it6263.c b/drivers/gpu/drm/bridge/ite-it6263.c index cbabd4e20d3ebbe632a60f7cac73302815fab7d8..44af1f25034a6e3e7eca1d679c24bde38cb2e6cb 100644 --- a/drivers/gpu/drm/bridge/ite-it6263.c +++ b/drivers/gpu/drm/bridge/ite-it6263.c @@ -48,6 +48,7 @@ #define REG_COL_DEP GENMASK(1, 0) #define BIT8 FIELD_PREP(REG_COL_DEP, 1) #define OUT_MAP BIT(4) +#define VESA BIT(4) #define JEIDA 0 #define REG_DESSC_ENB BIT(6) #define DMODE BIT(7) @@ -428,12 +429,30 @@ static inline void it6263_lvds_reset(struct it6263 *it) fsleep(10000); } +static inline bool it6263_is_input_bus_fmt_valid(int input_fmt) +{ + switch (input_fmt) { + case MEDIA_BUS_FMT_RGB888_1X7X4_JEIDA: + case MEDIA_BUS_FMT_RGB888_1X7X4_SPWG: + return true; + } + return false; +} + static inline void it6263_lvds_set_interface(struct it6263 *it) { + u8 fmt; + /* color depth */ regmap_write_bits(it->lvds_regmap, LVDS_REG_2C, REG_COL_DEP, BIT8); + + if (it->lvds_data_mapping == MEDIA_BUS_FMT_RGB888_1X7X4_SPWG) + fmt = VESA; + else + fmt = JEIDA; + /* output mapping */ - regmap_write_bits(it->lvds_regmap, LVDS_REG_2C, OUT_MAP, JEIDA); + regmap_write_bits(it->lvds_regmap, LVDS_REG_2C, OUT_MAP, fmt); if (it->lvds_dual_link) { regmap_write_bits(it->lvds_regmap, LVDS_REG_2C, DMODE, DISO); @@ -714,14 +733,14 @@ it6263_bridge_atomic_get_input_bus_fmts(struct drm_bridge *bridge, *num_input_fmts = 0; - if (it->lvds_data_mapping != MEDIA_BUS_FMT_RGB888_1X7X4_JEIDA) + if (!it6263_is_input_bus_fmt_valid(it->lvds_data_mapping)) return NULL; input_fmts = kmalloc(sizeof(*input_fmts), GFP_KERNEL); if (!input_fmts) return NULL; - input_fmts[0] = MEDIA_BUS_FMT_RGB888_1X7X4_JEIDA; + input_fmts[0] = it->lvds_data_mapping; *num_input_fmts = 1; return input_fmts; @@ -878,7 +897,7 @@ static const struct of_device_id it6263_of_match[] = { MODULE_DEVICE_TABLE(of, it6263_of_match); static const struct i2c_device_id it6263_i2c_ids[] = { - { "it6263", 0 }, + { "it6263" }, { } }; MODULE_DEVICE_TABLE(i2c, it6263_i2c_ids); diff --git a/drivers/gpu/drm/bridge/ite-it6505.c b/drivers/gpu/drm/bridge/ite-it6505.c index 008d86cc562af77952d65beacc3979b8611ea6cc..0faad10ba8e448e1611a487f361d4471357f773a 100644 --- a/drivers/gpu/drm/bridge/ite-it6505.c +++ b/drivers/gpu/drm/bridge/ite-it6505.c @@ -3497,7 +3497,7 @@ static void it6505_i2c_remove(struct i2c_client *client) } static const struct i2c_device_id it6505_id[] = { - { "it6505", 0 }, + { "it6505" }, { } }; diff --git a/drivers/gpu/drm/bridge/ite-it66121.c b/drivers/gpu/drm/bridge/ite-it66121.c index 35ae3f0e8f51f768229e055a086b53a419ffcd9f..940083e5d2ddbfc56f14e2bdc6ddd0b9dd50b1f8 100644 --- a/drivers/gpu/drm/bridge/ite-it66121.c +++ b/drivers/gpu/drm/bridge/ite-it66121.c @@ -1450,8 +1450,10 @@ static int it66121_audio_get_eld(struct device *dev, void *data, dev_dbg(dev, "No connector present, passing empty EDID data"); memset(buf, 0, len); } else { + mutex_lock(&ctx->connector->eld_mutex); memcpy(buf, ctx->connector->eld, min(sizeof(ctx->connector->eld), len)); + mutex_unlock(&ctx->connector->eld_mutex); } mutex_unlock(&ctx->lock); diff --git a/drivers/gpu/drm/bridge/lontium-lt8912b.c b/drivers/gpu/drm/bridge/lontium-lt8912b.c index e265ab3c8c92937873ef4c4a81d0792ee2af2ffd..52da204f57404e63c7eadb4318b55082ae732105 100644 --- a/drivers/gpu/drm/bridge/lontium-lt8912b.c +++ b/drivers/gpu/drm/bridge/lontium-lt8912b.c @@ -815,8 +815,8 @@ static const struct of_device_id lt8912_dt_match[] = { MODULE_DEVICE_TABLE(of, lt8912_dt_match); static const struct i2c_device_id lt8912_id[] = { - {"lt8912", 0}, - {}, + { "lt8912" }, + {} }; MODULE_DEVICE_TABLE(i2c, lt8912_id); diff --git a/drivers/gpu/drm/bridge/lontium-lt9211.c b/drivers/gpu/drm/bridge/lontium-lt9211.c index c8881796fba4c6a4dddebd2a266a0c3c30825e0b..999ddebb832de1a3e4a4174c3d5d4cfb3bfcee74 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9211.c +++ b/drivers/gpu/drm/bridge/lontium-lt9211.c @@ -773,7 +773,7 @@ static void lt9211_remove(struct i2c_client *client) drm_bridge_remove(&ctx->bridge); } -static struct i2c_device_id lt9211_id[] = { +static const struct i2c_device_id lt9211_id[] = { { "lontium,lt9211" }, {}, }; diff --git a/drivers/gpu/drm/bridge/lontium-lt9611.c b/drivers/gpu/drm/bridge/lontium-lt9611.c index 1b31fdebe164063e6f3972fdf8a5801ef4c35c4e..74f726efc74613460a6eb9c41f0bbad2ab06208f 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c @@ -757,7 +757,6 @@ static enum drm_mode_status lt9611_bridge_mode_valid(struct drm_bridge *bridge, const struct drm_display_mode *mode) { struct lt9611 *lt9611 = bridge_to_lt9611(bridge); - unsigned long long rate; if (mode->hdisplay > 3840) return MODE_BAD_HVALUE; @@ -765,8 +764,7 @@ static enum drm_mode_status lt9611_bridge_mode_valid(struct drm_bridge *bridge, if (mode->hdisplay > 2000 && !lt9611->dsi1_node) return MODE_PANEL; - rate = drm_hdmi_compute_mode_clock(mode, 8, HDMI_COLORSPACE_RGB); - return bridge->funcs->hdmi_tmds_char_rate_valid(bridge, mode, rate); + return MODE_OK; } static int lt9611_bridge_atomic_check(struct drm_bridge *bridge, @@ -1235,8 +1233,8 @@ static void lt9611_remove(struct i2c_client *client) of_node_put(lt9611->dsi0_node); } -static struct i2c_device_id lt9611_id[] = { - { "lontium,lt9611", 0 }, +static const struct i2c_device_id lt9611_id[] = { + { "lontium,lt9611" }, {} }; MODULE_DEVICE_TABLE(i2c, lt9611_id); diff --git a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c index 4d1d40e1f1b4d144f4aa9de7b83bedf13dfa4436..db9a5466060b663e88d58e85f24bf2d58d74a81c 100644 --- a/drivers/gpu/drm/bridge/lontium-lt9611uxc.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611uxc.c @@ -913,8 +913,8 @@ static void lt9611uxc_remove(struct i2c_client *client) of_node_put(lt9611uxc->dsi0_node); } -static struct i2c_device_id lt9611uxc_id[] = { - { "lontium,lt9611uxc", 0 }, +static const struct i2c_device_id lt9611uxc_id[] = { + { "lontium,lt9611uxc" }, { /* sentinel */ } }; diff --git a/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c b/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c index 37f1acf5c0f83ded2fb3d83150dc05cb6cfd2432..a3dcee62e7a59e7895d39eb77575da21203cb136 100644 --- a/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c +++ b/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c @@ -318,8 +318,8 @@ static void stdp4028_ge_b850v3_fw_remove(struct i2c_client *stdp4028_i2c) } static const struct i2c_device_id stdp4028_ge_b850v3_fw_i2c_table[] = { - {"stdp4028_ge_fw", 0}, - {}, + { "stdp4028_ge_fw" }, + {} }; MODULE_DEVICE_TABLE(i2c, stdp4028_ge_b850v3_fw_i2c_table); @@ -365,8 +365,8 @@ static void stdp2690_ge_b850v3_fw_remove(struct i2c_client *stdp2690_i2c) } static const struct i2c_device_id stdp2690_ge_b850v3_fw_i2c_table[] = { - {"stdp2690_ge_fw", 0}, - {}, + { "stdp2690_ge_fw" }, + {} }; MODULE_DEVICE_TABLE(i2c, stdp2690_ge_b850v3_fw_i2c_table); diff --git a/drivers/gpu/drm/bridge/nxp-ptn3460.c b/drivers/gpu/drm/bridge/nxp-ptn3460.c index e77aab965fcf91f3e8449ff1302f6594d412f5d6..44e36ae66db487c871712432eecfcb0d68a28da4 100644 --- a/drivers/gpu/drm/bridge/nxp-ptn3460.c +++ b/drivers/gpu/drm/bridge/nxp-ptn3460.c @@ -319,8 +319,8 @@ static void ptn3460_remove(struct i2c_client *client) } static const struct i2c_device_id ptn3460_i2c_table[] = { - {"ptn3460", 0}, - {}, + { "ptn3460" }, + {} }; MODULE_DEVICE_TABLE(i2c, ptn3460_i2c_table); diff --git a/drivers/gpu/drm/bridge/sii902x.c b/drivers/gpu/drm/bridge/sii902x.c index 9be9cc5b902594ebe6e1ac29ab8684623e336796..127da22011b3235b049c38413e56d50414cf36fb 100644 --- a/drivers/gpu/drm/bridge/sii902x.c +++ b/drivers/gpu/drm/bridge/sii902x.c @@ -1239,8 +1239,8 @@ static const struct of_device_id sii902x_dt_ids[] = { MODULE_DEVICE_TABLE(of, sii902x_dt_ids); static const struct i2c_device_id sii902x_i2c_ids[] = { - { "sii9022", 0 }, - { }, + { "sii9022" }, + { } }; MODULE_DEVICE_TABLE(i2c, sii902x_i2c_ids); diff --git a/drivers/gpu/drm/bridge/sii9234.c b/drivers/gpu/drm/bridge/sii9234.c index 0c74cdc070326d03361df30b19209c1a8c4e0b7e..cd7837c9a6e00b572a3fb65e5e0c9fa884555a73 100644 --- a/drivers/gpu/drm/bridge/sii9234.c +++ b/drivers/gpu/drm/bridge/sii9234.c @@ -945,8 +945,8 @@ static const struct of_device_id sii9234_dt_match[] = { MODULE_DEVICE_TABLE(of, sii9234_dt_match); static const struct i2c_device_id sii9234_id[] = { - { "SII9234", 0 }, - { }, + { "SII9234" }, + { } }; MODULE_DEVICE_TABLE(i2c, sii9234_id); diff --git a/drivers/gpu/drm/bridge/sil-sii8620.c b/drivers/gpu/drm/bridge/sil-sii8620.c index 26b8d137bce096c7c848785da414e26dfa8d4945..28a2e1ee04b2828364c6e633399b821562fc8728 100644 --- a/drivers/gpu/drm/bridge/sil-sii8620.c +++ b/drivers/gpu/drm/bridge/sil-sii8620.c @@ -2368,8 +2368,8 @@ static const struct of_device_id sii8620_dt_match[] = { MODULE_DEVICE_TABLE(of, sii8620_dt_match); static const struct i2c_device_id sii8620_id[] = { - { "sii8620", 0 }, - { }, + { "sii8620" }, + { } }; MODULE_DEVICE_TABLE(i2c, sii8620_id); diff --git a/drivers/gpu/drm/bridge/synopsys/Kconfig b/drivers/gpu/drm/bridge/synopsys/Kconfig index ca416dab156d833fa9276bc7aad344146d955828..f3ab2f985f8ca9dc1eeac3bda6b4a31d355cd51c 100644 --- a/drivers/gpu/drm/bridge/synopsys/Kconfig +++ b/drivers/gpu/drm/bridge/synopsys/Kconfig @@ -59,3 +59,9 @@ config DRM_DW_MIPI_DSI select DRM_KMS_HELPER select DRM_MIPI_DSI select DRM_PANEL_BRIDGE + +config DRM_DW_MIPI_DSI2 + tristate + select DRM_KMS_HELPER + select DRM_MIPI_DSI + select DRM_PANEL_BRIDGE diff --git a/drivers/gpu/drm/bridge/synopsys/Makefile b/drivers/gpu/drm/bridge/synopsys/Makefile index 9869d9651ed1f6ae760f9813013fe8d4ca78301b..9dc376d220ad7308355536004ea8430c8c181da1 100644 --- a/drivers/gpu/drm/bridge/synopsys/Makefile +++ b/drivers/gpu/drm/bridge/synopsys/Makefile @@ -8,3 +8,4 @@ obj-$(CONFIG_DRM_DW_HDMI_CEC) += dw-hdmi-cec.o obj-$(CONFIG_DRM_DW_HDMI_QP) += dw-hdmi-qp.o obj-$(CONFIG_DRM_DW_MIPI_DSI) += dw-mipi-dsi.o +obj-$(CONFIG_DRM_DW_MIPI_DSI2) += dw-mipi-dsi2.o diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.c index 181c5164b23192f0b557624d73c6223032b90ec6..c686671e4850a1af75b82995185ffc3cbb22a447 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.c +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.c @@ -442,16 +442,14 @@ dw_hdmi_qp_bridge_edid_read(struct drm_bridge *bridge, } static enum drm_mode_status -dw_hdmi_qp_bridge_mode_valid(struct drm_bridge *bridge, - const struct drm_display_info *info, - const struct drm_display_mode *mode) +dw_hdmi_qp_bridge_tmds_char_rate_valid(const struct drm_bridge *bridge, + const struct drm_display_mode *mode, + unsigned long long rate) { struct dw_hdmi_qp *hdmi = bridge->driver_private; - unsigned long long rate; - rate = drm_hdmi_compute_mode_clock(mode, 8, HDMI_COLORSPACE_RGB); if (rate > HDMI14_MAX_TMDSCLK) { - dev_dbg(hdmi->dev, "Unsupported mode clock: %d\n", mode->clock); + dev_dbg(hdmi->dev, "Unsupported TMDS char rate: %lld\n", rate); return MODE_CLOCK_HIGH; } @@ -510,7 +508,7 @@ static const struct drm_bridge_funcs dw_hdmi_qp_bridge_funcs = { .atomic_disable = dw_hdmi_qp_bridge_atomic_disable, .detect = dw_hdmi_qp_bridge_detect, .edid_read = dw_hdmi_qp_bridge_edid_read, - .mode_valid = dw_hdmi_qp_bridge_mode_valid, + .hdmi_tmds_char_rate_valid = dw_hdmi_qp_bridge_tmds_char_rate_valid, .hdmi_clear_infoframe = dw_hdmi_qp_bridge_clear_infoframe, .hdmi_write_infoframe = dw_hdmi_qp_bridge_write_infoframe, }; diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.h b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.h index 2115b8ef0bd6851bd09a9e95855a23ea86881dfb..72987e6c468928f2b998099697a6f32726411557 100644 --- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.h +++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-qp.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* - * Copyright (C) Rockchip Electronics Co.Ltd + * Copyright (C) Rockchip Electronics Co., Ltd. * Author: * Algea Cao */ diff --git a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi2.c b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi2.c new file mode 100644 index 0000000000000000000000000000000000000000..d7569bf2d9c3ef1f22ac07c95d112d1c62dd67a1 --- /dev/null +++ b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi2.c @@ -0,0 +1,1030 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Copyright (c) 2024, Fuzhou Rockchip Electronics Co., Ltd + * + * Modified by Heiko Stuebner + * This generic Synopsys DesignWare MIPI DSI2 host driver is based on the + * Rockchip version from rockchip/dw-mipi-dsi2.c converted to use bridge APIs. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include