Ethernet port failure on 2nd stage u-boot

  • Author
    Posts
    • #9241
      Mat Maher (memaher)
      Participant

        Sorry, this might be quite a large post. It’s the strangest problem I’ve dealt with in a long while and I’m lost for ideas…

        We’ve got a custom-designed board with dual 100 Mbit Ethernet ports: MII-interfaced ports connected through LAN8710A devices.
        1. Running a standard Beaglebone Debian image with a custom device tree, eth0 is alive from the O/S side but passes no data.
        2. Pinging the device over Ethernet shows activity on the PHY activity lights, showing the PHY is receiving *something*.
        3. Performing the MII loopback tests as described in the app notes works perfectly: the PHYs are online on the MII port.
        4. Breaking into u-boot with the keyboard halt shows Ethernet as working (it performs active DHCP requests back to my host computer).

        So, I can pretty much confirm that the hardware is working, and the stage-1 u-boot configuration is happy with it. However, the moment the 2nd-stage u-boot and Linux kernel take over, the configuration breaks and Ethernet fails to work.

        There are three possibilities as I see it:
        1. my device-tree is wrong
        – it’s had a lot of changes and been validated against EVM device trees, so unlikely.

        2. u-boot is applying beaglebone device trees on top of my device tree.
        – the boot messages suggest this is happening

        3. there is some weird configuration of Linux at work

        Option 2 is currently my best guess. The boot messages show a lot of unwanted beaglebone device trees being applied.

        Apart from the simple question of “have you ever seen this before or have any suggestions where to start looking?”, my main question is

        “how can I shut down the beaglebone overlays so they aren’t overwriting my device tree?”

        Any ideas?

        Mat

        ————————————————————————————————————

         

        So,

        1. Anyone any ideas what might be going wrong?

        2. Anyone any ideas how to shut-down the beaglebone overlays?

      • #9251
        Eshtaartha Basu
        Moderator

          Hello memaher,

          Looks like emanip-arm-ctrl.dts was not successfully uploaded due to security issues. Please zip all your files and re-upload them so that we can access it.

          1. Please share complete dmesg or boot log. In the current partial boot log, we cannot see any messages related to Eth PHY.
          2. Please re-upload all device tree files so that we can review it.
          3. If possible, share schematics of Eth PHY circuitry.
          4. Have you tried using the official Octavo RED image instead of latest BeagleBoard image? If yes, did you face similar issues on RED image as well?

          Meanwhile, we’d also recommend taking a look at our Ethernet app note (https://octavosystems.com/app_notes/ethernet-am335x-system-in-package/) and other Dual Ethernet discussions on our forum.

        • #9252
          Mat Maher (memaher)
          Participant

            Some progress since, but even more confusing now…

            The good news is that it’s now working. However, to get it working, I had to connect both Ethernet ports to a valid target (including the one which isn’t actually being used)! If either Ethernet port is disconnected, then both fail to operate. Is this something you’ve come across before?

            I’ll re-upload my files (with .txt extension): please let me know if you spot anything or can shed some light on why it’s operating in this manner.

            My ultimate aim is to have the 3 ethernet ports operating as a switch (CPU as the 3rd). However, I’d settle for just one ethernet port operating (with the other disconnected) for now!

            I appreciate your support!

            Mat

          • #9255
            Mat Maher (memaher)
            Participant

              Final bit of useful information. Both the dmesg log and ethtool list the PHY as being at address 4. So, what I think is happening is that the kernel has the mapping between MII port and MDIO address mixed up. What should be:
              MII0 == MDIO Addr 0
              MII1 == MDIO Addr 4

              is actually reading

              MII0 == MDIO Addr 4
              MII1 == MDIO Addr 0

              This would explain why MII0 port fails if MDIO<4> says the port is disconnected.

              However, this being the case, you would think that just swapping the PHY addresses in the cpsw {} entries would resolve the problem. Alas not: it doesn’t make the slightest difference. ethtool STILL reports address 4!

              —————————————————————————————————-

              Settings for eth0:
              Supported ports: [ TP MII ]
              Supported link modes: 10baseT/Half 10baseT/Full
              100baseT/Half 100baseT/Full
              Supported pause frame use: Symmetric Receive-only
              Supports auto-negotiation: Yes
              Supported FEC modes: Not reported
              Advertised link modes: 10baseT/Half 10baseT/Full
              100baseT/Half 100baseT/Full
              Advertised pause frame use: No
              Advertised auto-negotiation: Yes
              Advertised FEC modes: Not reported
              Link partner advertised link modes: 10baseT/Half 10baseT/Full
              100baseT/Half 100baseT/Full
              Link partner advertised pause frame use: Symmetric
              Link partner advertised auto-negotiation: Yes
              Link partner advertised FEC modes: Not reported
              Speed: 100Mb/s
              Duplex: Full
              Port: MII
              PHYAD: 4
              Transceiver: internal
              Auto-negotiation: on
              Cannot get wake-on-lan settings: Operation not permitted
              Current message level: 0x00000000 (0)

              Link detected: yes
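A minimal sketch of how the address check above could be automated, assuming Python 3 is available on the board. The function name `parse_phyad` and the abridged sample text are illustrative only; the sample lines are copied from the printout above, and the expected address 0 is the one intended for MII0 in this post.

```python
import re

def parse_phyad(ethtool_output):
    """Return the PHYAD value from `ethtool <iface>` text output, or None if absent."""
    match = re.search(r"^\s*PHYAD:\s*(\d+)\s*$", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None

# Abridged copy of the eth0 printout above.
sample = """Settings for eth0:
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 4
    Link detected: yes
"""

expected_addr = 0  # MDIO address intended for MII0 on this board
reported = parse_phyad(sample)
print(f"reported PHYAD={reported}, expected={expected_addr}, match={reported == expected_addr}")
```

On a live system the text would come from running `ethtool eth0` rather than from a pasted string.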

            • #9256
              Neeraj Dantu
              Moderator

                Hey Memaher,

                A couple of changes are suggested for the device tree attached: emanip-arm-ctrl.txt.

                1. pinctrl-names ‘default’ and ‘sleep’ are defined but ‘sleep’ pin mux definitions are not linked in the @mac node.

                2. The pin mux definitions in ‘mii_pins_default’ set the pins for GMII and RGMII modes instead of the MII mode that they need to be set to.

                3. This is more of a clean up action, but please get rid of the pinmux nodes that are not being used for clarity.

                These changes might fix the issues you are having. In case they do not, please attach a complete log output of dmesg (the upload failed in your last post) so that we can examine how the processor is probing the PHYs and initializing the interfaces.

                In addition, note that a single 1.5K pull-up is enough on MDIO_DATA; 2 pull-ups in parallel could be stronger than necessary.

                Hope this helps. Please let us know how your debug goes.

                Neeraj

              • #9272
                Mat Maher (memaher)
                Participant

                  Thanks for the comments. I’ll make the amendments but I don’t think they are the root cause of this problem.

                  dmesg reattached, from what I can ascertain:
                  1. Both PHYs are detected correctly on MDIO ([1.171191] + [1.171200])
                  2. eth0 clearly gets brought up with MDIO:04 [17.571013].

                  Mat

                • #9281
                  Neeraj Dantu
                  Moderator

                    Memaher,

                    As you pointed out, the MDIO probe is finding 2 PHYs, but only 1 PHY is being initialized afterwards. This could be related to the pinmux declaration. Please let us know how the system behaves after the device tree updates.

                    Best,

                    Neeraj

                  • #9282
                    Mat Maher (memaher)
                    Participant

                      I made all the changes except for the allocation of MII vs GMII. Not sure where to go here, as the options are GMII, RGMII or RMII. As I’m using the full MII I didn’t want to head to RMII. Despite the MDIO problems, using GMII works fine: I get a very stable link up (but of course only when I’ve got both ethernet cables connected).

                      Interestingly, I tried breaking the cpsw_emac definitions by allocating PHY2 to the wrong address:

                      &cpsw_emac0
                      {
                          phy_id = <&davinci_mdio>, <0>;
                          phy-mode = "mii";
                          dual_emac_res_vlan = <1>;
                          ifname = "eth0";
                      };
                      &cpsw_emac1
                      {
                          /* deliberately wrong PHY address for this test */
                          phy_id = <&davinci_mdio>, <1>;
                          phy-mode = "mii";
                          dual_emac_res_vlan = <2>;
                          ifname = "eth1";
                      };

                      The end result was the same, it still allocated the eth0 PHY to address 4 (ethtool printout below).

                      This to me indicates one thing: the eth0 PHY address is being picked up as the last-found address from an automatic scan of the MDIO bus, NOT the device tree.

                      Still lost how to fix it though!

                      Mat
                      ——————————————–

                      Settings for eth0:
                      Supported ports: [ TP MII ]
                      Supported link modes: 10baseT/Half 10baseT/Full
                      100baseT/Half 100baseT/Full
                      Supported pause frame use: Symmetric Receive-only
                      Supports auto-negotiation: Yes
                      Supported FEC modes: Not reported
                      Advertised link modes: 10baseT/Half 10baseT/Full
                      100baseT/Half 100baseT/Full
                      Advertised pause frame use: No
                      Advertised auto-negotiation: Yes
                      Advertised FEC modes: Not reported
                      Link partner advertised link modes: 10baseT/Half 10baseT/Full
                      100baseT/Half 100baseT/Full
                      Link partner advertised pause frame use: Symmetric
                      Link partner advertised auto-negotiation: Yes
                      Link partner advertised FEC modes: Not reported
                      Speed: 100Mb/s
                      Duplex: Full
                      Port: MII
                      PHYAD: 4
                      Transceiver: internal
                      Auto-negotiation: on
                      Cannot get wake-on-lan settings: Operation not permitted
                      Current message level: 0x00000000 (0)

                      Link detected: yes

                    • #9283
                      Neeraj Dantu
                      Moderator

                        Mat,

                        Can you please attach the dmesg logs for two cases: 1. when both ports come up and work, and 2. when both ports do not work? You are right that the processor scans the MDIO bus to see where the PHYs are located. But, in our experience, having a wrong address in the device tree still results in the Ethernet port not working.

                        Best,

                        Neeraj

                         

                      • #9285
                        Mat Maher (memaher)
                        Participant

                          Two logs attached.

                          Later today I’ll attempt to physically change the phy address on eth0 to be higher than eth1 phy. Should confirm whether or not it’s just picking the highest number.

                        • #9289
                          Mat Maher (memaher)
                          Participant

                            OK, so my theory about the highest address doesn’t stack up. But, I can confirm that changing the PHYID within the cpsw {} device tree sections doesn’t make the slightest bit of difference to the PHYID allocation.

                            I can also confirm that the extra 1.5k pullup isn’t the issue (now removed).

                            I’ve modified the board as follows

                            ethernet#1 = PHYID 5 (this is my eth0 port)
                            ethernet#2 = PHYID 4

                            1. setting the device tree to addresses 0 & 1 respectively results in ethtool detecting PHYID=4 as eth0 (aka it doesn’t work)
                            2. setting the device tree to addresses 5 & 4 respectively results in ethtool detecting PHYID=4 on eth0 (ditto)
                            3. setting the device tree backwards to 4 & 5 respectively results in exactly the same PHYID=4.

                            Something is finding address 4 and prioritising it over everything, but I’ve no idea what. It still remains the same problem: eth0 works brilliantly, but only when a cable is plugged into eth1 – otherwise it doesn’t detect a valid link.

                          • #9290
                            Mat Maher (memaher)
                            Participant

                              and finally…

                              PHY ADDRESS = 4 is hard-coded somewhere in the system: it is NOT being read from the custom device tree.

                              I can say that now, as…

                              1. Changing my hardware’s 2nd PHY address away from 4 to 0 (and changing the device tree to match) results in

                              [ 18.311930] libphy: PHY 4a101000.mdio:04 not found
                              [ 18.353779] net eth0: phy "4a101000.mdio:04" not found on slave 0, err -19

                              So something, somewhere, is hard-coding this address and it was just coincidence that I chose adr=4 for my second phy.

                              2. Changing my hardware’s 1st PHY address to 4 (leaving the 2nd PHY at 0) results in the Ethernet port now working (regardless of device tree setting).

                              So, I now have a working Ethernet port. However, it doesn’t explain the root cause of the problem: maybe it’s the original suspicion of beaglebone overlays still being loaded on top of the custom overlay?
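A rough sketch, assuming Python 3, of pulling the address the kernel actually asked for out of "not found" messages like the two quoted in this post. The helper name `requested_phys` is illustrative. Treating the two digits after the colon as hexadecimal is based on my understanding of how the kernel formats PHY names, so take that part as an assumption.

```python
import re

# PHY names in these messages look like "<mdio-bus>:<addr>", e.g. 4a101000.mdio:04.
PHY_NAME = re.compile(r"([0-9a-f]+\.mdio):([0-9a-f]{2})")

def requested_phys(log_text):
    """Return sorted unique (bus, address) pairs from 'not found' PHY messages."""
    found = set()
    for line in log_text.splitlines():
        if "not found" not in line:
            continue
        match = PHY_NAME.search(line)
        if match:
            found.add((match.group(1), int(match.group(2), 16)))
    return sorted(found)

# The two kernel messages quoted in this post.
log = (
    '[ 18.311930] libphy: PHY 4a101000.mdio:04 not found\n'
    '[ 18.353779] net eth0: phy "4a101000.mdio:04" not found on slave 0, err -19\n'
)

for bus, addr in requested_phys(log):
    print(f"kernel requested a PHY at address {addr} on bus {bus}")
```

If the address printed here differs from the one in the custom device tree, the kernel is reading its PHY address from somewhere else.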

                            • #9291
                              Neeraj Dantu
                              Moderator

                                Mat,

                                It would be good to verify that the correct device tree and no overlays are being loaded while booting. For this, we will have to look at logs before the kernel starts on UART0. The bootloader log will show whether any overlays are being loaded.

                                Both the working and the non-working logs show only 1 Ethernet port, while you should see both eth0 and eth1 come up.

                                Note that the PHY address for Beaglebone Black is set to 0. PHY address for the RED board is set to 4. I suspect that the device tree is not loading correctly. See the model in the boot logs, “[    0.000000] OF: fdt: Machine model: Octavo Systems OSD3358-SM-RED”, while the string you have in your device tree is ‘model = "Seaeye eManip Arm controller";’.

                                You can also check the model name in command line: “cat /proc/device-tree/model”
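The same check can be scripted. This is a small sketch assuming Python 3 on the target; `decode_model` and `loaded_model_is` are hypothetical helper names. The two model strings are the ones quoted in this post.

```python
from pathlib import Path

MODEL_PATH = Path("/proc/device-tree/model")

def decode_model(raw):
    """Device tree strings are NUL-terminated; strip the terminator and decode."""
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")

def loaded_model_is(expected, path=MODEL_PATH):
    """True if the running kernel's device tree model equals `expected`."""
    return decode_model(Path(path).read_bytes()) == expected

expected = "Seaeye eManip Arm controller"  # model string from the custom device tree

if MODEL_PATH.exists():
    print("custom device tree loaded:", loaded_model_is(expected))
else:
    print("no /proc/device-tree/model here (not running on the target?)")
```

A result of `False` on the board would mean a different device tree, such as the RED one reported in the boot log, is the one actually in use.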

                                Hope this helps.

                                Neeraj

                                 

                                 

                              • #9328
                                Mat Maher (memaher)
                                Participant

                                  u-boot is loading a LOT of device trees in addition to my custom one! Is there any way to shut these down other than customising u-boot?

                                   

                                • #9337
                                  Neeraj Dantu
                                  Moderator

                                    Memaher,

                                    From the logs it looks like the following is happening:

                                    1. Processor recognizes MMC0 interface, presumably with the SD card as boot source

                                    2. Check for uEnv.txt in /boot/ folder

                                    3. Check for uEnv.txt in /(root) folder

                                    4. Loading environment from /uEnv.txt

                                    5. Checking for uEnv.txt again??

                                    6. Switching to MMC1, presumably eMMC

                                    7. Checking for /boot/uEnv.txt

                                    8. Loaded environment from /boot/uEnv.txt on eMMC

                                     

                                    Is the ‘emanip-arm-ctrl.dtb’ file present in the /boot/dtbs/[Kernel Version]/ folder of eMMC?

                                     

                                    9. Device tree is switching to dtb=am335x-boneblack-uboot-univ.dtb (line 105)

                                    10. Loading several overlays

                                    The questions are the following:

                                    1. Which boot source are you intending to boot from? (SD card/eMMC) If you are booting from the eMMC, please remove the SD card

                                    2. Which uEnv.txt file are you intending for the processor to read? Is the board reading the correct uEnv.txt file?

                                    3. Is the Device tree file present in the correct location on the desired boot source?

                                    You have both the values ‘debug:[enable_uboot_overlays=0] ...’ and ‘[enable_uboot_cape_universal-0] …’. You can remove all the other variables from your uEnv.txt file to remove any confusion for the processor.
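To see which variables a given uEnv.txt actually sets, a short sketch along these lines may help, assuming Python 3. The sample file contents are hypothetical, including the `uname_r` value; only the variable names `dtb`, `enable_uboot_overlays` and `enable_uboot_cape_universal` come from this thread.

```python
def active_settings(uenv_text):
    """Return the key=value pairs in uEnv.txt text that are not commented out."""
    settings = {}
    for line in uenv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# Hypothetical uEnv.txt contents, for illustration only.
sample_uenv = """\
# hypothetical example file
uname_r=4.14.0-example
dtb=emanip-arm-ctrl.dtb
#enable_uboot_overlays=1
#enable_uboot_cape_universal=1
"""

settings = active_settings(sample_uenv)
print("dtb:", settings.get("dtb"))
print("overlays enabled:", settings.get("enable_uboot_overlays"))
```

Running this against each uEnv.txt on the SD card and the eMMC would show which file carries the `dtb` setting and which still enables overlays.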

                                    Hope this helps.

                                    Neeraj

                                  • #9338
                                    Mat Maher (memaher)
                                    Participant

                                      I think it’s a more deep-seated problem. Looking at the startup logs:

                                      95: Using: dtb=emanip-arm-ctrl.dtb
                                      105: uboot_overlays: Switching too: dtb=am335x-boneblack-uboot-univ.dtb

                                      Looking into the u-boot source it seems that dtb= (and therefore fdtfile=) can be set to whatever in uEnv.txt, but is then automatically overridden during the i2c EEPROM check.
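A quick sketch, assuming Python 3, for spotting this kind of override in a captured bootloader log. The function name `dtb_history` is illustrative, and the sample holds just the two log lines quoted above, without their line-number prefixes.

```python
import re

DTB_ASSIGN = re.compile(r"dtb=(\S+)")

def dtb_history(boot_log):
    """Return every dtb= value mentioned in the log, in order of appearance."""
    return DTB_ASSIGN.findall(boot_log)

# The two bootloader messages quoted above.
boot_log = (
    "Using: dtb=emanip-arm-ctrl.dtb\n"
    "uboot_overlays: Switching too: dtb=am335x-boneblack-uboot-univ.dtb\n"
)

history = dtb_history(boot_log)
overridden = len(set(history)) > 1
print("dtb values seen:", history)
print("selection was overridden:", overridden)
```

The last entry in the list is presumably the device tree that gets loaded, which here is not the custom one.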

                                      Everything at the moment is pointing to needing a custom u-boot build, which I’ve been trying to avoid.

                                    • #9339
                                      Mat Maher (memaher)
                                      Participant

                                        To conclude this thread (as I’m now giving up)…

                                        In Debian 9, dtb= and/or fdtfile= statements are overwritten by the findfdt= macro during boot. This makes it almost impossible to insert a custom device tree using uEnv.txt. The options are therefore:

                                        1. modify/rebuild u-boot
                                        2. use Debian 8
                                        3. implement changes as overlays (however this limits the changes that are possible)

                                      • #9340
                                        Neeraj Dantu
                                        Moderator

                                          Memaher,

                                          The default behavior of Beagle u-boot is to check the EEPROM first, select the device tree based on the ID and then override that selection based on the variable set in uEnv.txt. You can see this behavior in the OSD3358-SM-RED image we host on the website.

                                          We believe that the multiple boot sources and uEnv.txt files that are being sourced could be causing the overrides. We have verified that modifying /boot/uEnv.txt with the correct ‘dtb’ variable name (and the corresponding dtb file sitting in /boot/dtbs/[kernel version]) and commenting out everything else in the file did not load any device tree overlays on the latest Beagle image (9.9).

                                          Please let us know if you need further assistance.

                                          Best,

                                          Neeraj
