List:       linux-can
Subject:    Re: [patch V2 00/21] can: c_can: Another pile of fixes and improvements
From:       Alexander Stein <alexander.stein () systec-electronic ! com>
Date:       2014-04-14 8:38:04
Message-ID: 2227317.E2ytjs4Wd3 () ws-stein

On Friday 11 April 2014 08:13:09, Thomas Gleixner wrote:
> Changes since V1:
> 
> - Slightly modified version of the interrupt reduction patch
> - Included the fix for PCH / C_CAN
> - Lockless XMIT path
> - Further reduction of register access
> - Add the missing can.type setup in c_can_pci.c
> - A pile of code cleanups.
> 
> It would be nice to reduce the register access some more by relying
> completely on the status interrupt, but it turned out that the TX/RXOK
> is not reliable enough. So we need to invalidate the message objects
> in the tx softirq handling.
> 
> But the overall change of this series is that the I/O load gets
> reduced by about 45% according to perf top. Though that PCH thing
> sucks. The beaglebone manages to almost saturate the bus with short
> packets at 1Mbit while PCH fails miserably and that's solely related to
> the miserable I/O performance.
> 
> time cangen can0 -g0 -p10 -I5A5 -L0 -x -n 1000000 
> 
> arm: real	0m51.510s 	I/O read:  ~6%  I/O write: 1.5%  ~3.5s
> x86: real	1m48.533s	I/O read: ~29%  I/O write: 0.8%  ~32 s!!
> 
> That's both with HW loopback on, as my PCH does not have a
> transceiver. Granted the C_CAN in the PCH needs the double IF transfer
> to prevent the message loss versus the D_CAN in the ARM chip, but even
> that taken into account makes a whopping 16s per 1M messages vs. 3.5s
> on ARM.
> 
> w/o loopback the arm I/O read load drops to ~3.5% on the sender side
> and ~5.5% on the receiver side. The time drops to 50.5s on the
> transmit side if we do not have to get all the RX packets from HW
> loopback. On TX we have a ~10us large gap every 16 packets which is
> caused by the queue stall as we have to wait for the last
> packet in the "FIFO" to be transferred. 
> 
> It seems there is a reason why the ATOM perf events do not expose the
> stalled cpu cycles. But it's easy to figure out. You can compare the
> CAN load case with some other scenario which has 100% CPU utilization
> by running 
> 
> # perf stat -a sleep 60
> 
> The interesting part is: insns per cycle
> 
> CAN:	 0.23  insns per cycle
> Other:	 0.53  insns per cycle
> 
> I don't have comparison numbers for ARM due to not supported perf
> events, but the perf top numbers and the transfer performance tell a
> clear story.
> 
> There might be room for a few improvements, but I'm running out of
> cycles and I really want to get the IF3 DMA feature functional on the
> TI chips, but that seems to be an equally tedious reverse engineering
> problem as the rest of this.
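
As a rough cross-check of the "almost saturate the bus" remark, here is a
back-of-the-envelope sketch based on the cangen timings quoted above. It
assumes that a standard 11-bit-ID data frame with zero data bytes (as with
-I5A5 -L0) occupies 44 bits plus 3 bits of interframe space, and it ignores
stuff bits, so the real bus utilisation is somewhat higher than computed.
The names are made up for illustration:

```python
# Rough bus-utilisation estimate for the quoted cangen runs.
# Assumption: standard data frame, DLC 0, stuff bits ignored.
# SOF + ID + RTR + IDE + r0 + DLC + CRC + CRC delim + ACK + ACK delim + EOF
FRAME_BITS = 1 + 11 + 1 + 1 + 1 + 4 + 15 + 1 + 1 + 1 + 7  # = 44
IFS_BITS = 3           # interframe space between two frames
BITRATE = 1_000_000    # 1 Mbit/s


def utilisation(frames: int, seconds: float) -> float:
    """Achieved frame rate as a fraction of the theoretical maximum."""
    max_rate = BITRATE / (FRAME_BITS + IFS_BITS)  # frames per second
    return (frames / seconds) / max_rate


arm = utilisation(1_000_000, 51.510)   # "arm: real 0m51.510s"
x86 = utilisation(1_000_000, 108.533)  # "x86: real 1m48.533s"
print(f"arm: {arm:.0%}  x86: {x86:.0%}")
```

Under these assumptions the ARM run comes out at roughly nine tenths of the
theoretical frame rate and the PCH run at well under half of it.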

I ran this patchset on top of linux-can-fixes-for-3.15-20140401, both on an idle
system and under load (running iperf and I2C):
idle: 10 runs with 2 x 250'000 frames each, _no_ lost or swapped frames at all
load: 10 runs with 2 x 250'000 frames each, _no_ lost or swapped frames at all
\o/

CONFIG_CAN_C_CAN_STRICT_FRAME_ORDERING is not set. Maybe we can drop it now?

Despite that:
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>

Thanks a lot and best regards,
Alexander
-- 
Dipl.-Inf. Alexander Stein

SYS TEC electronic GmbH
Am Windrad 2
08468 Heinsdorfergrund
Tel.: 03765 38600-1156
Fax: 03765 38600-4100
Email: alexander.stein@systec-electronic.com
Website: www.systec-electronic.com
 
Managing Director: Dipl.-Phys. Siegmar Schmidt
Commercial registry: Amtsgericht Chemnitz, HRB 28082
