Ethernet port failure on 2nd stage u-boot


Viewing 17 reply threads
    • #9241
      Mat Maher (memaher)
      Participant

      Sorry, this might be quite a large post. It’s the strangest problem I’ve dealt with in a long while and I’m lost for ideas…

      We’ve got a custom-designed board with dual 100 Mbit Ethernet ports: MII-interfaced ports connected through LAN8710A PHYs.
      1. Running a standard BeagleBone Debian image with a custom device tree, eth0 is up from the OS side but passes no data.
      2. Pinging the device over Ethernet shows activity on the PHY activity LEDs, so the PHY is receiving *something*.
      3. The MII loopback tests described in the app notes work perfectly: the PHYs are online on the MII port.
      4. Breaking into U-Boot at the keyboard halt shows Ethernet working (it sends active DHCP requests back to my host computer).

      So I can pretty much confirm that the hardware is working and that the stage-1 U-Boot configuration is happy with it. However, the moment the 2nd-stage U-Boot and the Linux kernel take over, the configuration breaks and Ethernet stops working.

      There are three possibilities as I see it:
      1. my device-tree is wrong
      – it has had a lot of changes and has been validated against the EVM device trees, so this is unlikely.

      2. U-Boot is applying BeagleBone device trees on top of my device tree.
      – the boot messages suggest this is happening

      3. there is some weird Linux configuration at work

      Option 2 is currently my best guess. The boot messages show a lot of unwanted BeagleBone device trees being applied.

      Apart from the simple question of “have you ever seen this before, or do you have any suggestions where to start looking?”, my main question is:

      “How can I shut down the BeagleBone overlays so they aren’t overwriting my device tree?”

      Any ideas?

      Mat

      So:

      1. Does anyone have any idea what might be going wrong?

      2. Does anyone know how to shut down the BeagleBone overlays?

    • #9251
      Eshtaartha Basu
      Moderator

      Hello memaher,

      It looks like emanip-arm-ctrl.dts was not uploaded successfully due to security restrictions. Please zip all your files and re-upload them so that we can access them.

      1. Please share the complete dmesg or boot log. The current partial boot log does not show any messages related to the Ethernet PHY.
      2. Please re-upload all device tree files so that we can review them.
      3. If possible, share the schematics of the Ethernet PHY circuitry.
      4. Have you tried using the official Octavo RED image instead of the latest BeagleBoard image? If so, did you see similar issues on the RED image as well?

      Meanwhile, we’d also recommend taking a look at our Ethernet app note (https://octavosystems.com/app_notes/ethernet-am335x-system-in-package/) and the other Dual Ethernet discussions on our forum.

    • #9252
      Mat Maher (memaher)
      Participant

      Some progress since then, but it’s even more confusing now…

      The good news is that it’s now working. However, to get it working I had to connect both Ethernet ports to a valid link partner (including the one that isn’t actually being used)! If either Ethernet port is disconnected, both fail. Is this something you’ve come across before?

      I’ll re-upload my files (with a .txt extension): please let me know if you spot anything or can shed some light on why it’s behaving this way.

      My ultimate aim is to have the 3 Ethernet ports operating as a switch (the CPU being the 3rd). However, I’d settle for just one Ethernet port working (with the other disconnected) for now!

      I appreciate your support!

      Mat

    • #9255
      Mat Maher (memaher)
      Participant

      Final bit of useful information. Both the dmesg log and ethtool list the PHY as being at address 4. So what I think is happening is that the kernel has the mapping between MII port and MDIO address swapped. It should be:
      MII0 == MDIO Addr 0
      MII1 == MDIO Addr 4

      is actually reading

      MII0 == MDIO Addr 4
      MII1 == MDIO Addr 0

      This would explain why the MII0 port fails if the PHY at MDIO address 4 reports the port as disconnected.

      However, if that were the case, you would think that just swapping the PHY addresses in the cpsw {} entries would resolve the problem. Alas not: it doesn’t make the slightest difference, and ethtool STILL reports address 4!

      —————————————————————————————————-

      Settings for eth0:
      Supported ports: [ TP MII ]
      Supported link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Supported pause frame use: Symmetric Receive-only
      Supports auto-negotiation: Yes
      Supported FEC modes: Not reported
      Advertised link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Advertised pause frame use: No
      Advertised auto-negotiation: Yes
      Advertised FEC modes: Not reported
      Link partner advertised link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Link partner advertised pause frame use: Symmetric
      Link partner advertised auto-negotiation: Yes
      Link partner advertised FEC modes: Not reported
      Speed: 100Mb/s
      Duplex: Full
      Port: MII
      PHYAD: 4
      Transceiver: internal
      Auto-negotiation: on
      Cannot get wake-on-lan settings: Operation not permitted
      Current message level: 0x00000000 (0)

      Link detected: yes
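      The PHYAD line is the field that matters in this printout. When comparing several boots or device tree variants it can help to pull it out programmatically. A minimal sketch, assuming the plain-text `ethtool eth0` layout shown above (one "Key: value" per line, wrapped values continuing on lines without a colon); the helper name `parse_ethtool` is hypothetical and not part of ethtool:

```python
def parse_ethtool(text: str) -> dict:
    """Return the 'Key: value' pairs from the output of `ethtool <iface>`.

    Continuation lines (such as the wrapped '100baseT/Half 100baseT/Full'
    under 'Supported link modes') contain no colon, so they are appended
    to the value of the previous key.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the 'Settings for eth0:' header.
        if not line or line.startswith("Settings for"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] += " " + line
    return fields
```

      Fed either printout in this thread, `parse_ethtool(...)["PHYAD"]` gives "4", which makes it easy to confirm that the address does not follow the device tree edits.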

    • #9256
      Neeraj Dantu
      Moderator

      Hey Memaher,

      A couple of changes are suggested for the attached device tree, emanip-arm-ctrl.txt:

      1. pinctrl-names ‘default’ and ‘sleep’ are both defined, but no ‘sleep’ pin mux definitions are linked in the @mac node.

      2. The pin mux definitions in ‘mii_pins_default‘ set the pins for GMII and RGMII modes instead of the MII mode they need to be set to.

      3. This is more of a clean-up action, but please remove the pinmux nodes that are not being used, for clarity.

      These changes might fix the issues you are having. If they do not, please attach a complete dmesg log (the upload in your last post failed) so that we can examine how the processor is probing the PHYs and initializing the interfaces.

      In addition, note that a single 1.5k pull-up on MDIO_DATA is enough; two pull-ups in parallel could be stronger than necessary.

      Hope this helps. Please let us know how your debug goes.

      Neeraj

    • #9272
      Mat Maher (memaher)
      Participant

      Thanks for the comments. I’ll make the amendments, but I don’t think they are the root cause of this problem.

      dmesg reattached. From what I can ascertain:
      1. Both PHYs are detected correctly on MDIO ([1.171191] and [1.171200]).
      2. eth0 is clearly brought up with MDIO:04 [17.571013].

      Mat

    • #9281
      Neeraj Dantu
      Moderator

      Memaher,

      As you pointed out, the MDIO probe finds 2 PHYs, but only 1 PHY is initialized afterwards. This could be related to the pinmux declaration. Please let us know how the system behaves after the device tree updates.

      Best,

      Neeraj

    • #9282
      Mat Maher (memaher)
      Participant

      I made all the changes except for the MII vs GMII pin mux allocation. I’m not sure where to go here, as the options are GMII, RGMII or RMII; since I’m using full MII, I didn’t want to go down the RMII route. Despite the MDIO problems, GMII works fine: I get a very stable link (but of course only when both Ethernet cables are connected).

      Interestingly, I tried breaking the cpsw_emac definitions by assigning PHY2 to the wrong address:

      &cpsw_emac0 {
          phy_id = <&davinci_mdio>, <0>;
          phy-mode = "mii";
          dual_emac_res_vlan = <1>;
          ifname = "eth0";
      };
      &cpsw_emac1 {
          phy_id = <&davinci_mdio>, <1>;
          phy-mode = "mii";
          dual_emac_res_vlan = <2>;
          ifname = "eth1";
      };

      The end result was the same: it still assigned address 4 to the eth0 PHY (ethtool printout below).

      To me this indicates one thing: the eth0 PHY address is being picked up as the last address found by an automatic scan of the MDIO bus, NOT from the device tree.

      I’m still lost as to how to fix it, though!

      Mat
      ——————————————–

      Settings for eth0:
      Supported ports: [ TP MII ]
      Supported link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Supported pause frame use: Symmetric Receive-only
      Supports auto-negotiation: Yes
      Supported FEC modes: Not reported
      Advertised link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Advertised pause frame use: No
      Advertised auto-negotiation: Yes
      Advertised FEC modes: Not reported
      Link partner advertised link modes: 10baseT/Half 10baseT/Full
      100baseT/Half 100baseT/Full
      Link partner advertised pause frame use: Symmetric
      Link partner advertised auto-negotiation: Yes
      Link partner advertised FEC modes: Not reported
      Speed: 100Mb/s
      Duplex: Full
      Port: MII
      PHYAD: 4
      Transceiver: internal
      Auto-negotiation: on
      Cannot get wake-on-lan settings: Operation not permitted
      Current message level: 0x00000000 (0)

      Link detected: yes

    • #9283
      Neeraj Dantu
      Moderator

      Mat,

      Can you please attach the dmesg logs for (1) when both ports come up and work and (2) when both ports do not work? You are right that the processor scans the MDIO bus to see where the PHYs are located. However, in our experience, a wrong address in the device tree still results in the Ethernet port not working.
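      To illustrate the scan: a Clause 22 MDIO bus has 32 possible addresses, and a driver can tell whether a PHY sits at an address by reading its two PHY identifier registers (2 and 3). With MDIO_DATA pulled up, an empty address reads back as all ones. The sketch below is a simplified model of that idea, not the kernel’s actual probe code; `read_reg` is a hypothetical stand-in for a real MDIO register read:

```python
from typing import Callable, Dict

MII_PHYSID1 = 2  # Clause 22 PHY identifier register 1
MII_PHYSID2 = 3  # Clause 22 PHY identifier register 2


def scan_mdio(read_reg: Callable[[int, int], int]) -> Dict[int, int]:
    """Probe all 32 Clause 22 addresses and return {address: phy_id}.

    `read_reg(addr, reg)` is a hypothetical callback returning a 16-bit
    register value. An unpopulated address reads back as 0xFFFF because
    nothing drives the pulled-up MDIO_DATA line.
    """
    found = {}
    for addr in range(32):
        phy_id = (read_reg(addr, MII_PHYSID1) << 16) | read_reg(addr, MII_PHYSID2)
        # In this sketch, all-ones (no driver on the bus) and all-zeros
        # are both treated as "no PHY at this address".
        if phy_id not in (0xFFFFFFFF, 0x00000000):
            found[addr] = phy_id
    return found
```

      The scan only says which addresses respond; which of them each CPSW slave attaches to is a separate step driven by the device tree, which is why a wrong address there would normally be expected to break the port.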

      Best,

      Neeraj

    • #9285
      Mat Maher (memaher)
      Participant

      Two logs attached.

      Later today I’ll try physically changing the PHY address of eth0 to be higher than that of the eth1 PHY. That should confirm whether or not it’s just picking the highest number.

    • #9289
      Mat Maher (memaher)
      Participant

      OK, so my theory about the highest address doesn’t stack up. But I can confirm that changing the PHYID within the cpsw {} device tree sections doesn’t make the slightest difference to the PHYID allocation.

      I can also confirm that the extra 1.5k pull-up isn’t the issue (now removed).

      I’ve modified the board as follows:

      ethernet#1 = PHYID 5 (this is my eth0 port)
      ethernet#2 = PHYID 4

      1. Setting the device tree to addresses 0 and 1 respectively results in ethtool detecting PHYID=4 as eth0 (i.e. it doesn’t work).
      2. Setting the device tree to addresses 5 and 4 respectively results in ethtool detecting PHYID=4 on eth0 (ditto).
      3. Setting the device tree the other way round, to 4 and 5 respectively, results in exactly the same PHYID=4.

      Something is finding address 4 and prioritising it over everything else, but I’ve no idea what. It’s still the same problem: eth0 works brilliantly, but only when a cable is plugged into eth1; otherwise it doesn’t detect a valid link.

    • #9290
      Mat Maher (memaher)
      Participant

      and finally…

      PHY ADDRESS = 4 is hard-coded somewhere in the system: it is NOT being read from the custom device tree.

      I can say that now, because…

      1. Changing my hardware’s 2nd PHY address from 4 to 0 (and changing the device tree to match) results in:

      [ 18.311930] libphy: PHY 4a101000.mdio:04 not found
      [ 18.353779] net eth0: phy "4a101000.mdio:04" not found on slave 0, err -19

      So something, somewhere, is hard-coding this address, and it was just coincidence that I chose address 4 for my second PHY.

      2. Changing my hardware’s 1st PHY address to 4 (leaving the 2nd PHY at 0) results in the Ethernet port now working (regardless of the device tree setting).
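      The kernel names PHYs as <mdio bus>:<two-digit hex address>, e.g. 4a101000.mdio:04 in the log lines above, so the address the driver is trying to attach to can be read straight from a saved dmesg. A small sketch for pulling out every such reference, assuming that naming format; the helper name `phy_refs` is hypothetical:

```python
import re

# Matches "<bus>.mdio:<hex address>" strings such as "4a101000.mdio:04".
PHY_REF = re.compile(r"\b([0-9a-f]+\.mdio):([0-9a-f]{2})\b")


def phy_refs(log: str):
    """Return (mdio_bus, phy_address) pairs found in a kernel log, in order."""
    return [(bus, int(addr, 16)) for bus, addr in PHY_REF.findall(log)]
```

      Running it over the working and failing logs side by side shows at a glance whether the requested address changes when the device tree does.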

      So I now have a working Ethernet port. However, that doesn’t explain the root cause of the problem: maybe it’s the original suspicion of BeagleBone overlays still being loaded on top of the custom device tree?

    • #9291
      Neeraj Dantu
      Moderator

      Mat,

      It would be good to verify that the correct device tree is loaded and that no overlays are being loaded during boot. For this, we will have to look at the logs on UART0 before the kernel starts. The bootloader log will show whether any overlays are being loaded.

      The logs (both working and not working) show only 1 Ethernet port, whereas you should see both eth0 and eth1 come up.

      Note that the PHY address for the BeagleBone Black is set to 0, while the PHY address for the RED board is set to 4. I suspect that the device tree is not loading correctly. The model in the boot logs is “[    0.000000] OF: fdt: Machine model: Octavo Systems OSD3358-SM-RED”, while your device tree sets model = "Seaeye eManip Arm controller";.

      You can also check the model name from the command line: “cat /proc/device-tree/model”
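      The same check can be scripted, e.g. as part of a boot test. A minimal sketch, assuming /proc/device-tree/model holds the model property as a NUL-terminated string (as device tree string properties are stored); the helper names are hypothetical:

```python
from pathlib import Path


def dt_model(path: str = "/proc/device-tree/model") -> str:
    """Return the model string of the device tree the kernel booted with.

    The property ends in a NUL byte, which is stripped before decoding.
    """
    return Path(path).read_bytes().rstrip(b"\x00").decode("utf-8", errors="replace")


def check_model(expected: str, path: str = "/proc/device-tree/model") -> bool:
    """True if the running device tree's model matches `expected`."""
    return dt_model(path) == expected
```

      On the board in this thread, `check_model("Seaeye eManip Arm controller")` would be expected to return False while the RED device tree is being loaded instead.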

      Hope this helps.

      Neeraj

    • #9328
      Mat Maher (memaher)
      Participant

      U-Boot is loading a LOT of device trees in addition to my custom one! Is there any way to shut these down other than customising U-Boot?

    • #9337
      Neeraj Dantu
      Moderator

      Memaher,

      From the logs it looks like the following is happening:

      1. The processor recognizes the MMC0 interface, presumably with the SD card as the boot source

      2. Checks for uEnv.txt in the /boot/ folder

      3. Checks for uEnv.txt in the / (root) folder

      4. Loads the environment from /uEnv.txt

      5. Checks for uEnv.txt again??

      6. Switches to MMC1, presumably the eMMC

      7. Checks for /boot/uEnv.txt

      8. Loads the environment from /boot/uEnv.txt on the eMMC

      Is the ’emanip-arm-ctrl.dtb’ file present in the /boot/dtbs/[Kernel Version]/ folder on the eMMC?

      9. The device tree switches to dtb=am335x-boneblack-uboot-univ.dtb (line 105 of the log)

      10. Several overlays are loaded

      The questions are the following:

      1. Which boot source do you intend to boot from (SD card or eMMC)? If you are booting from the eMMC, please remove the SD card.

      2. Which uEnv.txt file do you intend the processor to read? Is the board reading the correct uEnv.txt file?

      3. Is the device tree file present in the correct location on the desired boot source?

      You have both the values ‘debug:[enable_uboot_overlays=0] …’ and ‘[enable_uboot_cape_universal-0] …’. You can remove all the other variables from your uEnv.txt file to avoid confusing the processor.
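      Before pruning the file, it can help to list which settings are actually active. A minimal sketch, assuming the usual uEnv.txt layout of one name=value per line with '#' marking a commented-out line; the helper names are hypothetical, and the filter simply matches variable names containing "overlay" or "cape" (such as enable_uboot_overlays) rather than any official list:

```python
def parse_uenv(text: str) -> dict:
    """Parse uEnv.txt-style 'name=value' lines into a dict of active settings."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, commented-out lines and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only; values such as cmdline may contain more.
        name, _, value = line.partition("=")
        env[name.strip()] = value.strip()  # a later assignment wins
    return env


def overlay_settings(env: dict) -> dict:
    """Keep dtb= plus every variable whose name mentions overlays or capes."""
    return {k: v for k, v in env.items()
            if k == "dtb" or "overlay" in k or "cape" in k}
```

      Running `overlay_settings(parse_uenv(...))` on each uEnv.txt the bootloader reads (SD card and eMMC) shows which file could be re-enabling overlays or changing dtb.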

      Hope this helps.

      Neeraj

    • #9338
      Mat Maher (memaher)
      Participant

      I think it’s a more deep-seated problem. Looking at the startup logs:

      95: Using: dtb=emanip-arm-ctrl.dtb
      105: uboot_overlays: Switching too: dtb=am335x-boneblack-uboot-univ.dtb

      Looking into the U-Boot source, it seems that dtb= (and therefore fdtfile=) can be set to anything in uEnv.txt, but is then automatically overridden during the I2C EEPROM check.

      Everything at the moment points to needing a custom U-Boot build, which I’ve been trying to avoid.
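      Those two log lines make the override easy to spot in a captured serial console log. A small sketch that checks for it automatically, assuming the exact message text quoted above ("Using: dtb=" and "uboot_overlays: Switching too: dtb="); other U-Boot builds may word these differently, and the helper name `dtb_override` is hypothetical:

```python
import re

USING = re.compile(r"Using: dtb=(\S+)")
SWITCH = re.compile(r"Switching too?: dtb=(\S+)")


def dtb_override(boot_log: str):
    """Return (requested_dtb, final_dtb) from a captured U-Boot console log.

    final_dtb differs from requested_dtb when a later 'Switching too:'
    line replaced the choice. A value is None if its line is absent.
    """
    used = USING.findall(boot_log)
    switched = SWITCH.findall(boot_log)
    requested = used[0] if used else None
    final = switched[-1] if switched else requested
    return requested, final
```

      For the log above this yields ("emanip-arm-ctrl.dtb", "am335x-boneblack-uboot-univ.dtb"), i.e. the custom device tree was requested but replaced.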

    • #9339
      Mat Maher (memaher)
      Participant

      To conclude this thread (as I’m now giving up)…

      In Debian 9, dtb= and/or fdtfile= statements are overwritten by the findfdt= macro during boot. This makes it almost impossible to insert a custom device tree via uEnv.txt. The options are therefore:

      1. modify/rebuild U-Boot
      2. use Debian 8
      3. implement the changes as overlays (however, this limits which changes are possible)

    • #9340
      Neeraj Dantu
      Moderator

      Memaher,

      The default behavior of Beagle U-Boot is to check the EEPROM first, select the device tree based on the board ID, and then override that selection based on the variable set in uEnv.txt. You can see this behavior in the OSD3358-SM-RED image we host on our website.

      We believe that the multiple boot sources and uEnv.txt files being sourced could be causing the overrides. We have verified that setting the correct ‘dtb’ variable in /boot/uEnv.txt (with the corresponding dtb file in /boot/dtbs/[kernel version]) and commenting out everything else in the file resulted in no device tree overlays being loaded on the latest Beagle image (9.9).
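      The edit described above can be scripted so the original settings stay visible for reference. A minimal sketch with a hypothetical helper; it keeps uname_r active by default on the assumption that the stock Beagle uEnv.txt relies on it to select the kernel, comments out every other active line (including any existing dtb=), and appends a single dtb= line:

```python
def minimize_uenv(text: str, dtb: str, keep=("uname_r",)) -> str:
    """Return uEnv.txt content with only `keep` variables and dtb=<dtb> active."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        name = line.partition("=")[0].strip()
        if not line or line.startswith("#") or name in keep:
            out.append(raw)        # blank, already commented, or kept as-is
        else:
            out.append("#" + raw)  # disable everything else, old dtb= included
    out.append(f"dtb={dtb}")
    return "\n".join(out) + "\n"
```

      Writing the result back to the uEnv.txt that the bootloader actually reads (and removing the SD card if booting from eMMC) mirrors the setup described in the post.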

      Please let us know if you need further assistance.

      Best,

      Neeraj
