Poor Ethernet performance (AR8035)

    • #8094
      Andres Olivares (andyolivares)
      Participant

      Hi:

      As you may already know, I developed a simple board based mostly on your RED platform. I just kept the bare minimum to make a bootable board, but I left an expansion header with Ethernet signals and others for further experimentation.

      Some time ago I also built an Ethernet board to be connected to my base board, also based on what’s present on the RED platform (AR8035 chipset).

      I made all the necessary device tree modifications, and both boards “turn on” and “work”. The bootloader (U-Boot) sees something on the Ethernet port but fails to find the Ethernet PHY at address 0 (which looks “normal”, as it should be at address 4; I may need some tweaking there):

      When the kernel starts, it recognizes the chipset and loads the driver for it.

      I can even ping Google, download files, etc. The problem is that I’ve been experiencing very poor Ethernet performance. I even get some kernel panics from time to time:

      Apart from those kernel panics, the Ethernet connectivity is almost unusable because it is very slow. I checked for lost or rejected packets but found none. Everything looks normal to me.

      I’ve been tracking down the issue and it seems to happen only on some Ethernet networks. If I share my PC’s Wi-Fi Internet Connection (Windows 10) through the Ethernet cable, it seems to work just fine. But if I connect the Ethernet board directly to my router, it has very poor performance (only a few Kbps) and kernel panics start to appear from time to time.

      Do you guys have any idea what could be happening? I’m attaching the board’s schematic. If you need the board design (Gerbers or other file) let me know. The only difference between my design and the RED platform is the Ethernet connector (I used Molex 93626-3508).

      Best regards,

      Andy

      • This topic was modified 5 years ago by Andres Olivares (andyolivares). Reason: more information
    • #8103
      Erik Welsh
      Keymaster

      Andy,

      U-Boot in the BeagleBoard.org images expects the Ethernet PHY to be at address 0. As long as you don’t need Ethernet during U-Boot (i.e. no TFTP or network boot), you can just ignore that error message. If you do need Ethernet during U-Boot, you will have to modify U-Boot to use the different PHY address. I believe you would need to change the “phy_addr” value in the “cpsw_slaves” entries, next to the “cpsw_data” structure (approx. line 915), in board.c (http://git.denx.de/?p=u-boot.git;a=blob;f=board/ti/am335x/board.c).

      Looking at the performance problem, it looks like this might be related to the Ethernet driver and malformed packets.  Looking around, I found a couple of links that could help debug the problem:

      https://forums.xilinx.com/t5/Embedded-Linux/Ethernet-Lite-Linux-Driver-Bug-quot-skbuff-skb-over-panic-quot/td-p/587295

      https://e2e.ti.com/support/processors/f/791/t/530328

      My suggestion would be to first use Wireshark (https://www.wireshark.org/) to look at the network traffic on the switch to understand what packets are being sent to the OSD335x. Then you can take a look at the Ethernet driver and see if it might have an issue similar to the Xilinx driver’s. Given that everything works fine when you are routing things through your computer, this looks more like a software issue than a hardware issue.

      Additionally, what kernel version / image are you using?

      Thanks,

      Erik

    • #8158
      Andres Olivares (andyolivares)
      Participant

      Thank you, Erik, for the valuable information and suggestions. Indeed, it appears to be a software issue. I’ve been performing some tests on other networks and it works very well, with no performance issues or kernel panics. I even ran some speed tests and got the maximum speed for my current ISP service.

      Will perform more tests with Wireshark, capturing packets on the network with issues, to try to narrow down which packet (or type of packet) is causing the problem. The network that causes issues is my company network. I would guess that jumbo frames, VLAN tags, or some other “company specific” traffic may be causing the problem. My home network works like a charm.

      I’m using latest IoT image available from Beagleboard.org:
      Linux beaglebone 4.14.71-ti-r80 #1 SMP PREEMPT Fri Oct 5 23:50:11 UTC 2018 armv7l GNU/Linux

      Will keep you all posted on my findings.

      Regards,

      Andy

    • #8170
      Andres Olivares (andyolivares)
      Participant

      Hi Erik:

      For my tests, I’m using a USB-to-Ethernet adapter that came with my laptop PC. In fact, I have two of these adapters: one that came with my personal laptop, and one that came with my office laptop (both ASUS brand).

      And so far, I’ve performed the following tests:

      1. Used my personal laptop’s USB-to-Ethernet adapter to share the laptop’s Wi-Fi connection (to my home network): it works very well with no problems at all (at least no noticeable issues). I can download things at max speed, upgrade things with apt, perform network speed tests, communicate with external servers, ping, etc.

      2. Used my office laptop’s USB-to-Ethernet adapter to share laptop’s Wi-Fi connection (to my office network): it has very poor performance, quite unusable, only a few Kb/s with many errors and timeouts.

      3. Board directly connected to a GbE port on my office network switch: same as 2 but with some Kernel panics from time to time.

      I noticed that my personal laptop’s USB-to-Ethernet adapter is based on AX88772B chipset (10/100 Mbps), and that my office laptop one is based on a Realtek GbE chipset.

      So the next obvious thing to do was to test my AX88772B USB-to-Ethernet adapter on my office network, sharing the Wi-Fi through my office laptop, and guess what: it worked just fine, just like in case 1.

      In both cases where a GbE port is used (2 & 3) the network performance is very poor and unusable (with kernel panics in the case of 3). So it seems to have something to do with the Ethernet speed (or adapter chipset) rather than the network itself (I’m guessing here).

      Also, because my (small) office router is very basic and has no traffic capture capabilities, I’ve been unable to find the packet that causes the kernel panic seen in case 3.

      To that end, I enabled tcpdump on the target board to see if I could find some more information of the offending packet, but the trace looks pretty normal to me. Almost all packets are 1400 bytes in length (max) which relates to my MTU I guess.
      I’m attaching the file in case you want to take a look. Board’s IP address was 192.168.0.175.
      After the last packet (1373) in the capture.cap file I got a kernel panic, but unfortunately the offending packet did not get captured, at least not entirely, as Wireshark complains about an incomplete packet in the capture file.
      I used a tool to repair the capture file and a new last packet appeared (1374, incomplete), but its headers look normal to me too. It doesn’t seem like an overly large packet.
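
      A capture can also be checked for frame sizes and a cut-off final record without a repair tool by walking the file directly. The sketch below is only illustrative: it assumes capture.cap is a classic libpcap file (the format tcpdump typically writes) rather than pcapng, and the function name scan_pcap is made up for this example.

```python
import struct

# Magic numbers of the classic pcap format, keyed by their on-disk byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def scan_pcap(data: bytes):
    """Return (on_wire_lengths, truncated) for a classic pcap byte string."""
    if len(data) < 24 or data[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap file")
    endian = PCAP_MAGICS[data[:4]]
    offset = 24  # skip the 24-byte global header
    lengths = []
    truncated = False
    while offset < len(data):
        header = data[offset:offset + 16]
        if len(header) < 16:
            truncated = True  # file ends inside a record header
            break
        # Record header: ts_sec, ts_frac, captured length, on-wire length.
        _sec, _frac, incl_len, orig_len = struct.unpack(endian + "IIII", header)
        offset += 16
        if offset + incl_len > len(data):
            truncated = True  # file ends inside the packet bytes
            break
        lengths.append(orig_len)
        offset += incl_len
    return lengths, truncated
```

      Reading the file in binary mode and passing its bytes to scan_pcap gives the on-wire length of every complete frame, so max(lengths) shows whether anything larger than the expected MTU was captured, and the truncated flag shows whether the file ends in the middle of a record.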

      I’m very puzzled, to be honest. Do you think GbE has something to do with this? I don’t really think the USB-to-Ethernet adapters’ chipsets have anything to do with it, as there are many different chipsets in routers and switches out there. It’s hard for me to believe that the board will fail depending on the router/switch chipset.

      I have a RED board lying around and haven’t done the test with it yet. Will try to do that ASAP. Maybe there are HW problems with my custom-made Ethernet board and/or its routing. I’m attaching the Eagle BRD file in case you want to take a look.

      Thanks for any help.

      Andy

      Edit: I was unable to upload the files. Please download them from here https://www.dropbox.com/s/rorc9lsafqae4mv/eth.zip?dl=1

      • This reply was modified 5 years ago by Andres Olivares (andyolivares). Reason: attached new file
      • This reply was modified 5 years ago by Andres Olivares (andyolivares).
      • This reply was modified 5 years ago by Andres Olivares (andyolivares). Reason: added url to dropbox file
    • #8176
      Erik Welsh
      Keymaster

      Andy,

      We took a quick look at the layout and pcap files and didn’t see anything that jumped out at us. From your description, it feels more like a network issue at your office. One thing you could check is to take your office adapter and try it on your home network. There could be some small noise issues on your office network that cause the GbE adapters to perform worse than the 10/100 adapters. I don’t think it is anything inherent to GbE, but it could be a bad cable, since GbE uses all four twisted pairs in the cable while 10/100 Ethernet uses only two.

      Let us know how the testing goes.

      Thanks,

      Erik

    • #8177
      Andres Olivares (andyolivares)
      Participant

      Hi Erik:

      I did the test with my GbE adapter on my home network and got the same results as at the office: poor performance, very slow, almost unusable.

      Testing further with “ethtool”, I forced the connection to 100Mbps as follows (still with the GbE adapter):

      I did a few tests, and it worked great, just as if I had connected the board to a 100Mbps adapter.
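
      The exact command used is not reproduced in this post. As a rough sketch, pinning a link to 100Mbps full-duplex with ethtool generally takes the form built below; the interface name eth0 and both helper names are assumptions for illustration, and applying the setting needs root and the ethtool binary on the target.

```python
import subprocess


def ethtool_speed_args(iface, speed_mbps, duplex="full", autoneg=False):
    """Build the argv for `ethtool -s` to pin speed and duplex on an interface."""
    if duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    return [
        "ethtool", "-s", iface,
        "speed", str(speed_mbps),
        "duplex", duplex,
        "autoneg", "on" if autoneg else "off",
    ]


def force_speed(iface, speed_mbps):
    """Apply the setting (hypothetical helper; requires root on the target)."""
    subprocess.run(ethtool_speed_args(iface, speed_mbps), check=True)
```

      Calling force_speed("eth0", 100) would run the equivalent of "ethtool -s eth0 speed 100 duplex full autoneg off".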

      I really think this has something to do with GbE but not sure how. Maybe, as you suggested, my cable has an important role. I’m testing with a cable almost identical to this one: https://www.ebay.com/itm/AWM-E212689-STYLE-2854-32-AWG-30V-2-FEET-CAT-5E-B0816F281J-CABLE-BRAND-NEW-/202639361149 (except mine is black, but has same markings on the cable: AWM E212689 STYLE 2854 80°C 30V CABLE).

      Anyway, cabling at office allows 1000Mbps speeds with my USB GbE adapter, at least with my laptop PC, with no problems. Connecting that same cable to my test board produces poor results with some kernel panics from time to time, as stated before.

      Maybe USB GbE is more forgiving of link issues than the AR8035 chipset/driver. I don’t really know. Do you think my board’s Ethernet connector choice (Molex 93626-3508) may have something to do with it? As you said, there are no apparent routing problems or board design issues.

      Next test will be to connect the GbE USB-to-Ethernet directly to the USB host on my test board and see if the network connection works at full 1000Mbps speed directly connected to my office network.

      Will keep you posted. Thank you!

      Andy

    • #8178
      Andres Olivares (andyolivares)
      Participant

      One more thing to consider: when in GbE mode, I’m getting a lot of Rx CRC Errors:

      When in 100Mbps mode, that number does not increase.
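
      One way to watch such a counter over time is to poll the kernel’s per-interface statistics. The sketch below reads the generic rx_crc_errors file under /sys/class/net; whether the Ethernet driver on this board fills in that particular counter is an assumption (the figure quoted above may instead come from "ethtool -S"), and the function name is made up for illustration.

```python
from pathlib import Path


def read_rx_crc_errors(iface, sysfs_root="/sys/class/net"):
    """Read the cumulative RX CRC error counter the kernel exposes for iface."""
    path = Path(sysfs_root) / iface / "statistics" / "rx_crc_errors"
    return int(path.read_text().strip())
```

      Reading the value before and after a transfer (for example a large download) and subtracting the two shows whether errors accumulate at the current link speed.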

      I read on some forums that the traces from my PHY to the microprocessor could be too long for GbE speeds. Because I’m connecting the PHY through a header connector to my base board, and then to the microprocessor, they could indeed be too long.

      Do you think that may be the cause?

      Andy

    • #8188
      Andres Olivares (andyolivares)
      Participant

      Erik:

      I did the test of connecting my GbE USB-to-Ethernet adapter directly to my base test board (through the USB host connector). It was recognized as a 1000Mbps (GbE) full-duplex link and loaded as “eth1”. I connected it to my office network switch/router and it worked like a charm (unlike my Ethernet daughter board based on the AR8035 chipset).

      Just for completeness, I also did the test of connecting my Ethernet daughter board to my office network but downgrading the link to 100Mbps full-duplex, and it also worked just fine.

      So it is confirmed that, for some reason, my Ethernet daughter board does not work properly at GbE speeds. Could be the trace lengths, routing problems, noise issues due to the board-to-board connector, you name it 🙂

      But it seems like using the USB adapter at GbE speeds is more forgiving and works just fine. I assume that’s because of the way it is constructed, how the USB driver works, and other variables.

      Regards,

      Andy

    • #8196
      Erik Welsh
      Keymaster

      Andy,

      There could be some trace length issues. You might want to look at the RX bus on an oscilloscope. Focus on RX_CLK and one of the data lines to see whether the clock and data fail to line up. 100Mbps requires a 25MHz clock while 1Gbps requires a 125MHz clock, so the timing requirements are much stricter. Additionally, at 1Gbps RGMII requires data to be clocked on both the rising and falling edges.
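
      The arithmetic behind those clock figures can be sketched as follows. This is an illustrative calculation only: it assumes the usual RGMII arrangement of a 4-bit data bus that is sampled on both clock edges at 1Gbps and effectively once per clock cycle at 100Mbps, and the function name rgmii_timing is made up for this example. The window it reports is the ideal time between sampling points, before any setup, hold, or skew budget is subtracted.

```python
def rgmii_timing(clock_hz, double_data_rate, bus_width_bits=4):
    """Return (throughput_bps, sample_window_ns) for an RGMII-style bus."""
    edges = 2 if double_data_rate else 1
    throughput_bps = clock_hz * bus_width_bits * edges
    sample_window_ns = 1e9 / (clock_hz * edges)
    return throughput_bps, sample_window_ns


gbe = rgmii_timing(125e6, double_data_rate=True)    # 1Gbps mode
fast = rgmii_timing(25e6, double_data_rate=False)   # 100Mbps mode
print("1Gbps:   %.0f bit/s, %.1f ns per sample" % gbe)
print("100Mbps: %.0f bit/s, %.1f ns per sample" % fast)
```

      Under these assumptions the time available per sample shrinks from 40 ns at 100Mbps to 4 ns at 1Gbps, which is why a clock-to-data misalignment that is harmless at 100Mbps can corrupt frames at gigabit speed.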

      Thanks,

      Erik

    • #8200
      Andres Olivares (andyolivares)
      Participant

      Will try to perform the test with the oscilloscope as soon as possible. Thank you for all the help so far.

      Andy
