iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic

Qcom's implementation of arm,mmu-500 adds WAIT-FOR-SAFE logic to
address under-performance issues in real-time clients, such as
Display and Camera.
On receiving an invalidation request, the SMMU forwards a SAFE
request to these clients and waits for the SAFE ack signal from
them. The SAFE signal from such clients is used to qualify the
start of invalidation.
This logic is controlled by chicken bits, one each for MDP (display),
IFE0, and IFE1 (camera), that can be accessed only from secure
software on sdm845.

This configuration, however, degrades the performance of
non-real-time clients, such as USB and UFS. This happens because,
with wait-for-safe logic enabled, the hardware tries to throttle
non-real-time clients while waiting for SAFE ack signals from
real-time clients.

On MTP sdm845 devices, with wait-for-safe logic enabled at boot time
by the bootloaders, we see degraded performance of USB and UFS when
the kernel enables the SMMU stage-1 translations for these clients.
Turning off this wait-for-safe logic from the kernel gets us back the
performance of USB and UFS devices, until we revisit this when we
start seeing perf issues on display/camera on upstream-supported
SDM845 platforms.

Now, different bootloaders, with their access control policies, allow
this register access differently through secure monitor calls:
1) With one, we can issue an io-read/write secure monitor call
   (qcom-scm) to update the register, while
2) With the other, such as the one on MTP sdm845, we should use the
   specific qcom-scm command to send a request to do the complete
   register configuration.

Add a separate device tree flag for arm-smmu to identify which of the
two firmware configurations mentioned above is in use.
This change does not add code to allow type-(1) bootloaders to toggle
the safe logic using the io-read/write qcom-scm call.
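
For illustration only, the two firmware configurations described above
could be modelled roughly as follows. This is a user-space sketch, not
the driver code: the bit positions, the mock_* helpers, struct
mock_secure_world, enum fw_impl and wait_safe_set() are all hypothetical
names invented for this example, and the type-(1) path is shown only for
contrast, since this change does not implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout; the real register layout is not described here. */
#define SAFE_EN_MDP   (1u << 0)
#define SAFE_EN_IFE0  (1u << 1)
#define SAFE_EN_IFE1  (1u << 2)
#define SAFE_EN_ALL   (SAFE_EN_MDP | SAFE_EN_IFE0 | SAFE_EN_IFE1)

/* Which of the two firmware configurations is in use (hypothetical). */
enum fw_impl {
	FW_IMPL_IO_RW,     /* type (1): generic io-read/write secure call */
	FW_IMPL_SAFE_CMD,  /* type (2): dedicated command, firmware does it all */
};

/* Stand-in for the register that only secure software can access. */
struct mock_secure_world {
	uint32_t safe_reg;
};

/* Mock of a generic secure io-read call. */
static uint32_t mock_scm_io_read(const struct mock_secure_world *sw)
{
	return sw->safe_reg;
}

/* Mock of a generic secure io-write call. */
static void mock_scm_io_write(struct mock_secure_world *sw, uint32_t val)
{
	sw->safe_reg = val;
}

/* Mock of a dedicated command: firmware sets or clears all chicken bits. */
static void mock_scm_safe_toggle(struct mock_secure_world *sw, bool enable)
{
	if (enable)
		sw->safe_reg |= SAFE_EN_ALL;
	else
		sw->safe_reg &= ~SAFE_EN_ALL;
}

/* Enable or disable wait-for-safe through whichever path the firmware allows. */
static int wait_safe_set(struct mock_secure_world *sw, enum fw_impl impl,
			 bool enable)
{
	uint32_t val;

	assert(sw != NULL);

	switch (impl) {
	case FW_IMPL_IO_RW:
		/* Caller does the read-modify-write itself. */
		val = mock_scm_io_read(sw);
		val = enable ? (val | SAFE_EN_ALL) : (val & ~SAFE_EN_ALL);
		mock_scm_io_write(sw, val);
		return 0;
	case FW_IMPL_SAFE_CMD:
		/* Caller only states the intent; firmware updates the register. */
		mock_scm_safe_toggle(sw, enable);
		return 0;
	}
	return -1; /* unknown firmware configuration */
}
```

In this model a caller would start from a state where the bootloader
left all three bits set and call wait_safe_set(&sw, impl, false) once,
with impl chosen from the device tree flag.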

This change is inspired by the downstream change from Patrick Daly
to address performance issues with display and camera by handling
this wait-for-safe within separate io-pagetable ops to do TLB
maintenance. So a big thanks to him for the change.

Without this change the UFS reads are pretty slow:
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
10+0 records in
10+0 records out
10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
real    0m 22.39s
user    0m 0.00s
sys     0m 0.01s

With this change they are back to rock!
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
300+0 records in
300+0 records out
314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
real    0m 1.03s
user    0m 0.00s
sys     0m 0.54s

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Ondrej Kubik <ondrej.kubik@canonical.com>