OSDMP157C: DCMI overruns triggered by SPI5 transfers when DMA is enabled

    • #13574
      Sylvain Bourré (SylvainB)
      Participant

      Hi,

      We’re using DCMI (80MHz pixel clock, 12-bit) to connect a custom sensor (2048×256) and we can’t afford to miss any frames.
      The DCMI is assigned to CA7 cores, stock Linux driver stm32-dcmi is used.
      SPI5, assigned to CA7 cores as well, is used to send data to a mipi-dbi TFT screen (15MHz, 160×80).

      Everything is working like a charm when SPI5 uses interrupt mode.
      However, we’d like to use DMA mode for SPI5 to reduce CPU consumption: unfortunately, when using DMA a screen refresh triggers DCMI overrun errors.

      I tried to change DCMI’s DMA priority to very high: no improvement.
      Any advice/suggestion?

      As I’m a bit desperate, I tried SPI6 instead of SPI5, because that SPI instance seems closer to the CA7 cores than SPI5:
      – SPI6 in interrupt mode: TFT screen is working
      – SPI6 in DMA mode: TFT screen is not working and the MDMA driver reports a BSE (Block Size Error).

      How can I solve this transfer error?

      Thanks in advance,
      Sylvain.

      Please find below the dmesg output that shows the BSE error:
      [ 4927.476761] spi_stm32 5c001000.spi: cpol=0 cpha=0 lsb_first=0 cs_high=0
      [ 4927.476777] spi_stm32 5c001000.spi: stm32_spi_can_dma: true
      [ 4927.476817] spi_stm32 5c001000.spi: stm32_spi_can_dma: true
      [ 4927.476833] spi_stm32 5c001000.spi: transfer communication mode set to 1
      [ 4927.476848] spi_stm32 5c001000.spi: data frame of 16-bit, data packet of 2 data frames
      [ 4927.476862] spi_stm32 5c001000.spi: speed set to 8328125Hz
      [ 4927.476876] spi_stm32 5c001000.spi: transfer of 25600 bytes (12800 data frames)
      [ 4927.476888] spi_stm32 5c001000.spi: dma enabled
      [ 4927.476904] spi_stm32 5c001000.spi: Tx DMA config buswidth=2, maxburst=1
      [ 4927.476934] dma dma0chan8: hwdesc:  0xd8000000
      [ 4927.476948] dma dma0chan8: CTCR:    0x02000042
      [ 4927.476961] dma dma0chan8: CBNDTR:  0x00006400
      [ 4927.476974] dma dma0chan8: CSAR:    0xd8048000
      [ 4927.476987] dma dma0chan8: CDAR:    0x5c001020
      [ 4927.477000] dma dma0chan8: CBRUR:   0x00000000
      [ 4927.477013] dma dma0chan8: CLAR:    0x00000000
      [ 4927.477026] dma dma0chan8: CTBR:    0x00000023
      [ 4927.477038] dma dma0chan8: CMAR:    0x00000000
      [ 4927.477051] dma dma0chan8: CMDR:    0x00000000
      [ 4927.477051]
      [ 4927.477075] dma dma0chan8: vchan fbf730a5: issued
      [ 4927.477090] dma dma0chan8: CCR:     0x00000006
      [ 4927.477104] dma dma0chan8: CTCR:    0x02000042
      [ 4927.477117] dma dma0chan8: CBNDTR:  0x00006400
      [ 4927.477130] dma dma0chan8: CSAR:    0xd8048000
      [ 4927.477143] dma dma0chan8: CDAR:    0x5c001020
      [ 4927.477156] dma dma0chan8: CBRUR:   0x00000000
      [ 4927.477169] dma dma0chan8: CLAR:    0x00000000
      [ 4927.477182] dma dma0chan8: CTBR:    0x00000023
      [ 4927.477194] dma dma0chan8: CMAR:    0x00000000
      [ 4927.477207] dma dma0chan8: CMDR:    0x00000000
      [ 4927.477221] dma dma0chan8: vchan fbf730a5: started
      [ 4927.477235] spi_stm32 5c001000.spi: enable controller
      [ 4927.477259] dma dma0chan8: Transfer Err: stat=0x00000880
      [ 4927.734857] st7735s spi1.0: SPI transfer timed out
      [ 4927.738283] spi_stm32 5c001000.spi: stm32_spi_can_dma: true
      [ 4927.738307] spi_stm32 5c001000.spi: disable controller
      [ 4927.738354] spi_master spi1: failed to transfer one message from queue

    • #13575
      Aedan Cullen
      Participant

      Do you have readouts of those MDMA registers from a successful 25600-byte transfer on SPI5?

    • #13576
      Sylvain Bourré (SylvainB)
      Participant

      Aedan, below is the dmesg output showing a successful 25600-byte transfer on SPI5 (DMA):

      [  291.136104] spi_stm32 44009000.spi: cpol=0 cpha=0 lsb_first=0 cs_high=0
      [  291.136119] spi_stm32 44009000.spi: stm32_spi_can_dma: true
      [  291.136158] spi_stm32 44009000.spi: stm32_spi_can_dma: true
      [  291.136174] spi_stm32 44009000.spi: transfer communication mode set to 1
      [  291.136189] spi_stm32 44009000.spi: data frame of 16-bit, data packet of 2 data frames
      [  291.136202] spi_stm32 44009000.spi: speed set to 13054870Hz
      [  291.136216] spi_stm32 44009000.spi: transfer of 25600 bytes (12800 data frames)
      [  291.136228] spi_stm32 44009000.spi: dma enabled
      [  291.136244] spi_stm32 44009000.spi: Tx DMA config buswidth=2, maxburst=1
      [  291.136282] stm32-dma 48000000.dma-controller: vchan a624546f: txd d3e6b206[316]: submitted
      [  291.136302] dma dma1chan4: vchan a624546f: issued
      [  291.136318] dma dma1chan4: SCR:   0x00002c56
      [  291.136332] dma dma1chan4: NDTR:  0x00003200
      [  291.136345] dma dma1chan4: SPAR:  0x44009020
      [  291.136358] dma dma1chan4: SM0AR: 0xd8048000
      [  291.136370] dma dma1chan4: SM1AR: 0xd8048000
      [  291.136383] dma dma1chan4: SFCR:  0x00000021
      [  291.136397] dma dma1chan4: vchan a624546f: started
      [  291.136410] spi_stm32 44009000.spi: enable controller
      [  291.152230] spi_stm32 44009000.spi: disable controller
      [  291.152336] spi_stm32 44009000.spi: stm32_spi_can_dma: true
      [  291.152363] spi_stm32 44009000.spi: disable controller

    • #13580
      Aedan Cullen
      Participant

      Here’s the next thing I’d try:

      In your device tree node for SPI6, set the dmas property as follows:

      dmas = <&mdma1 34 0x0 0x40008 0x0 0x0 0x0>,
             <&mdma1 35 0x0 0x10040002 0x0 0x0 0x0>;

       

      Compared to stm32mp151.dtsi, which has 0x40002 as the third parameter of the TX DMA specifier, the value 0x10040002 should enable block transfer mode. If this doesn’t work (we might need to tweak more things for the block transfer to complete successfully), send the error and register dump from dmesg again. (Unfortunately, I don’t have a similar SPI display lying around to actually test on.)
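      To make the difference between the two values concrete, here is a minimal sketch that compares them bit by bit. It assumes the trigger mode field (TRGM) occupies bits 29:28 of the configuration cell, mirroring its position in CTCR; that placement and the helper name trigger_mode are assumptions for illustration, not something taken from the driver source.

```python
# Compare the stock TX config cell with the suggested one.
# Assumption: bits 29:28 of the cell map to TRGM in CTCR
# (0 = buffer transfer, 1 = block transfer).
STOCK_TX_CFG = 0x40002         # value quoted from stm32mp151.dtsi in this post
SUGGESTED_TX_CFG = 0x10040002  # value proposed in this post

TRGM_SHIFT = 28
TRGM_MASK = 0x3


def trigger_mode(cfg: int) -> int:
    """Hypothetical helper: extract the assumed TRGM field from a config cell."""
    return (cfg >> TRGM_SHIFT) & TRGM_MASK


# XOR leaves only the bits that differ between the two cells.
changed_bits = STOCK_TX_CFG ^ SUGGESTED_TX_CFG

print(f"changed bits:   0x{changed_bits:08x}")
print(f"stock TRGM:     {trigger_mode(STOCK_TX_CFG)}")
print(f"suggested TRGM: {trigger_mode(SUGGESTED_TX_CFG)}")
```

      Under that assumption, the only bit that changes is bit 28, which moves TRGM from 0 to 1; every other field in the cell stays as it was.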

       

       

       

      Full thought process, starting with what I think caused the overruns with your original SPI5 configuration:

       

      DCMI is always going to be using one of the standard DMA controllers, DMA1 or DMA2 (selected by the DMAMUX) [1].

      SPI5 will also be automatically assigned a stream by the DMAMUX driver in the same way [2]. Most likely, not all eight streams of DMA1 are occupied yet, and so both DCMI and SPI5 end up on the same DMA1 controller.

      DMA1/DMA2 only arbitrate between streams once a request is completely finished [3]. That is, DMA1 will wait until your 25600-byte display frame is completed before possibly servicing the DCMI, even if the DCMI stream has a higher priority.

      – Probably, the display frame transfer takes long enough over SPI5 that a DCMI frame is guaranteed to start before the display frame transfer is completed. Check whether this is true with whatever framerate you use for your image sensor.

      – So then the DCMI FIFO is guaranteed to overflow anytime you transfer a display frame, regardless of priority.
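      A rough back-of-the-envelope check of the timing, using only numbers from this thread: the SPI clock and transfer size from the SPI5 log, the timestamps around "enable controller" and "disable controller", and the sensor geometry from the first post. The camera figure assumes one pixel per pixel clock and no blanking, which is an assumption on my part, so treat it as a lower bound on the frame duration rather than the real frame period.

```python
# SPI display refresh: values taken from the SPI5 dmesg output.
SPI_BYTES = 25600
SPI_HZ = 13_054_870

spi_ms = SPI_BYTES * 8 / SPI_HZ * 1e3

# Time between "enable controller" and "disable controller" in the same log.
measured_ms = (291.152230 - 291.136410) * 1e3

# Camera frame: 2048 x 256 pixels at an 80 MHz pixel clock.
# Assumption: one pixel per clock, no horizontal or vertical blanking.
PIXELS_PER_FRAME = 2048 * 256
PIXEL_CLOCK_HZ = 80e6

min_frame_ms = PIXELS_PER_FRAME / PIXEL_CLOCK_HZ * 1e3

print(f"SPI refresh (computed): {spi_ms:.2f} ms")
print(f"SPI refresh (measured): {measured_ms:.2f} ms")
print(f"camera frame (minimum): {min_frame_ms:.2f} ms")
```

      The computed refresh time is about 15.7 ms, close to the roughly 15.8 ms seen in the log, while the active part of a camera frame lasts at least about 6.6 ms. If the sensor's frame period is shorter than the refresh time, every display refresh overlaps the start of a camera frame.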

       

      So we either need SPI display frame transfers to be much faster, or we need to use a separate DMA controller. Incidentally, your SPI6 configuration achieves exactly that and uses MDMA instead [1]. (The separate DMA controller is what we need, not proximity to the ARM cores [4].)

       

      MDMA breaks because TRGM[1:0] is left as 00 in CTCR (as if to configure a buffer transfer) but the data length is set in CBNDTR (for a block transfer). The TLEN (for buffer length) is left as the default one byte, which can’t be packed into a 16-bit write to the peripheral, so you get the BSE error. The MDMA driver seems to deal with the data length registers properly (you must use a block transfer above 128 bytes), but it doesn’t automatically switch to a block transfer in CTCR. In the device tree property above, I’m setting TRGM[1:0] in CTCR to 01 instead of 00. More debugging might be required after this, but you definitely need the MDMA to be in block transfer mode.
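      For reference, this is how the values in the failing dump decode. The sketch below is only an illustration: the bit positions (TRGM at 29:28, TLEN at 24:18, DSIZE at 7:6, SSIZE at 5:4 in CTCR, BNDT at 16:0 in CBNDTR, BSE at bit 11 of the error status) are my reading of the MDMA register layout and should be checked against the reference manual, and the field helper is made up for the example.

```python
# Register values copied from the failing SPI6 dmesg output.
CTCR = 0x02000042
CBNDTR = 0x00006400
ERR_STATUS = 0x00000880  # from "Transfer Err: stat=0x00000880"


def field(value: int, hi: int, lo: int) -> int:
    """Hypothetical helper: extract bits hi..lo (inclusive) from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# Assumed CTCR layout; verify against the reference manual.
trgm = field(CTCR, 29, 28)             # 0 = buffer transfer, 1 = block transfer
buffer_len = field(CTCR, 24, 18) + 1   # TLEN is stored as (bytes - 1)
dest_bytes = 1 << field(CTCR, 7, 6)    # DSIZE: 0 = 1 byte, 1 = 2 bytes, ...
src_bytes = 1 << field(CTCR, 5, 4)     # SSIZE, same encoding

# Assumed CBNDTR layout: block byte count in bits 16:0.
block_bytes = field(CBNDTR, 16, 0)

# Assumed error status layout: BSE flag at bit 11 (bit 7 is not decoded here).
bse = field(ERR_STATUS, 11, 11)

print(f"TRGM={trgm} buffer_len={buffer_len} B "
      f"src={src_bytes} B dest={dest_bytes} B")
print(f"block_bytes={block_bytes} BSE={bse}")
```

      With those assumptions the dump shows a buffer transfer (TRGM=0) with a one-byte buffer length and a two-byte destination width, alongside a 25600-byte block count and the BSE flag set, which matches the mismatch described above.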

       

      [1] per stm32mp151.dtsi.

      [2] line 110 of stm32-dmamux.c.

      [3] section 2.2.3 of ST AN4031. F7, H7, and MP1 use the same DMA IP as far as I know.

      [4] The A7 cores are not involved during DMA transfers between memory and a peripheral.

       

      Octavo, is there some sort of allow-list that the forum has for trusted users so that I could post links without being caught by the spam filter? 🙂

      • This reply was modified 10 months, 4 weeks ago by Aedan Cullen. Reason: fixed missing zero in hex
    • #13584
      Sylvain Bourré (SylvainB)
      Participant

      Aedan, thank you for your very detailed answer, it’s fantastic. You are right about every point!

      I cannot speed up the SPI display transfers, so I need to use a separate DMA controller; however, to avoid re-spinning the board, I’d like to keep using SPI5 and use DMA2 for DCMI only.

      The question is how to get DCMI to use DMA2.

      I enabled DMA for I2C1 in the hope that DCMI would end up using DMA2, and it did. Everything is working as expected: no more overruns!

      New /sys/kernel/debug/dmaengine/summary output:
      dma0 (58000000.dma-controller): number of channels: 32
      dma0chan0    | 48000000.dma-controller:ch0
      dma0chan1    | 48000000.dma-controller:ch1
      dma0chan2    | 48000000.dma-controller:ch2
      dma0chan3    | 48000000.dma-controller:ch3
      dma0chan4    | 48000000.dma-controller:ch4
      dma0chan5    | 48000000.dma-controller:ch5
      dma0chan6    | 48000000.dma-controller:ch6
      dma0chan7    | 48000000.dma-controller:ch7
      dma0chan8    | 48001000.dma-controller:ch0
      dma0chan9    | 48001000.dma-controller:ch1
      dma0chan10   | 48001000.dma-controller:ch2
      dma0chan11   | 48001000.dma-controller:ch3
      dma0chan12   | 48001000.dma-controller:ch4
      dma0chan13   | 48001000.dma-controller:ch5
      dma0chan14   | 48001000.dma-controller:ch6
      dma0chan15   | 48001000.dma-controller:ch7

      dma1 (48000000.dma-controller): number of channels: 8
      dma1chan0    | 4000e000.serial:rx (via router: 48002000.dma-router)
      dma1chan1    | 4000e000.serial:tx (via router: 48002000.dma-router)
      dma1chan2    | 4000b000.spi:tx (via router: 48002000.dma-router)
      dma1chan3    | 4000b000.spi:rx (via router: 48002000.dma-router)
      dma1chan4    | 44009000.spi:tx (via router: 48002000.dma-router)
      dma1chan5    | 44009000.spi:rx (via router: 48002000.dma-router)
      dma1chan6    | 40012000.i2c:tx (via router: 48002000.dma-router)
      dma1chan7    | 40012000.i2c:rx (via router: 48002000.dma-router)

      dma2 (48001000.dma-controller): number of channels: 8
      dma2chan1    | 4c006000.dcmi:tx

      This is really good news! But I don’t like the hackish method I used to get DCMI to use DMA2. What is the recommended method?

      I tried adding dmas = <&dma2 0 0 0x400 0xe0000001>; to my device tree node for DCMI to override dmas = <&dmamux1 75 0x400 0xe0000001>; from stm32mp151.dtsi.
      According to /sys/kernel/debug/dmaengine/summary, DCMI then uses the DMA2 controller, but it does not work.

      Thank you in advance,
      Sylvain.

    • #13589
      Aedan Cullen
      Participant

      Awesome, I’m glad you got it working. I think the reason you can’t directly set &dma2 in the device tree is that DMAMUX configuration is needed to route the request signal from DCMI to DMA2 within the SoC, and that remains unconfigured if the device tree only references the DMA controller directly. You could potentially try changing only the SPI TX DMA to &dma2 to see if that still works, since the CPU is starting the transfer; I haven’t yet looked closely to see whether it should.

      In the stm32-dmamux.c driver, it definitely seems to be an oversight that there is no provision for manually choosing a controller. It just takes the 16 total channels (eight on DMA1 followed by eight on DMA2) and assigns them sequentially, as you saw. I think ST may not have anticipated/tested DMA use for simultaneous video streams like this. The cleanest approach I can think of right now would be to improve the stm32-dmamux.c driver to add (optional) manual multiplexing control in the device tree.
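      To illustrate why enabling DMA on I2C1 pushed DCMI onto DMA2, here is a toy model of sequential allocation. It is only a sketch of the behaviour described above, not the driver's code: the allocate function is hypothetical, and the request order (DCMI asking for its channel after the other peripherals) is an assumption based on the dmaengine summary in this thread.

```python
CHANNELS_PER_CONTROLLER = 8
CONTROLLERS = ["DMA1", "DMA2"]


def allocate(requests):
    """Toy model: hand out channels in request order, filling DMA1 first.

    Returns a dict mapping each client name to the controller it lands on.
    """
    total = CHANNELS_PER_CONTROLLER * len(CONTROLLERS)
    placement = {}
    for index, name in enumerate(requests):
        if index >= total:
            raise RuntimeError("no free DMA channel left")
        placement[name] = CONTROLLERS[index // CHANNELS_PER_CONTROLLER]
    return placement


# Clients named as they appear in the dmaengine summary.
# Assumption: they request channels in this order, with DCMI last.
base = [
    "4000e000.serial:rx", "4000e000.serial:tx",
    "4000b000.spi:tx", "4000b000.spi:rx",
    "44009000.spi:tx", "44009000.spi:rx",
]
i2c = ["40012000.i2c:tx", "40012000.i2c:rx"]
dcmi = "4c006000.dcmi:tx"

without_i2c = allocate(base + [dcmi])
with_i2c = allocate(base + i2c + [dcmi])

print("without I2C1 DMA:", without_i2c[dcmi])
print("with I2C1 DMA:   ", with_i2c[dcmi])
```

      In this model DCMI shares DMA1 with SPI5 until the two I2C1 channels fill the controller, after which it spills over to DMA2. The real driver's channel numbering may differ in detail, but it shows why the workaround depends on how many other clients happen to request DMA first.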
