networking - Linux Loopback performance with TCP_NODELAY enabled


I stumbled on an interesting TCP performance issue while running performance tests that compared network performance versus loopback performance. In my case the network performance exceeded the loopback performance (1 Gig network, same subnet). In the case I am dealing with, latencies are crucial, so TCP_NODELAY is enabled. The best theory we have come up with is that TCP congestion control is holding up packets. We did some packet analysis, and we can see that packets are being held, but the reason is not obvious. Now the questions...

1) In what cases, and why, would communicating over loopback be slower than over the network?

2) When sending as fast as possible, why does toggling TCP_NODELAY have so much more of an impact on maximum throughput over loopback than over the network?

3) How can we detect and analyze TCP congestion control as a potential explanation for the poor performance?

4) Does anyone have any other theories as to the reason for this phenomenon? If yes, is there a method to prove the theory?

Here is some sample data generated by a simple point-to-point C++ app:

 Transport     Message size (bytes)  TCP_NODELAY  Send buffer (bytes)  Sender host  Receiver host  Throughput (bytes/sec)  Message rate (msgs/sec)
 TCP           128                   On           16777216             HostA        HostB          118085994               922546
 TCP           128                   Off          16777216             HostA        HostB          118072006               922437
 TCP           128                   On           4096                 HostA        HostB          11097417                86698
 TCP           128                   Off          4096                 HostA        HostB          62441935                487827
 TCP           128                   On           16777216             HostA        HostA          20606417                160987
 TCP           128                   Off          16777216             HostA        HostA          239580949               1871726
 TCP           128                   On           4096                 HostA        HostA          18053364                141041
 TCP           128                   Off          4096                 HostA        HostA          214148304               1673033
 UnixStream    128                   -            16777216             HostA        HostA          89215454                696995
 UnixDatagram  128                   -            16777216             HostA        HostA          41275468                322464
 NamedPipe     128                   -            -                    HostA        HostA          73488749                574130
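
For reference, the TCP_NODELAY and send buffer columns correspond to two socket options. The following is a minimal sketch of how a sender might set and read them back, assuming the POSIX sockets API on Linux. The helper names (`configure_sender`, `nodelay_enabled`, `effective_send_buffer`) are hypothetical and are not taken from the test app.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_NODELAY
#include <sys/socket.h>

// Hypothetical helper: applies the two knobs varied in the table to a TCP socket.
// Returns true only if both setsockopt calls succeed.
bool configure_sender(int fd, bool nodelay, int send_buffer_bytes) {
    int flag = nodelay ? 1 : 0;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) != 0)
        return false;
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &send_buffer_bytes, sizeof(send_buffer_bytes)) != 0)
        return false;
    return true;
}

// Reads back TCP_NODELAY: 1 if Nagle's algorithm is disabled, 0 if not, -1 on error.
int nodelay_enabled(int fd) {
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0)
        return -1;
    return flag != 0 ? 1 : 0;
}

// Reads back the send buffer size the kernel actually applied, or -1 on error.
int effective_send_buffer(int fd) {
    int bytes = 0;
    socklen_t len = sizeof(bytes);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) != 0)
        return -1;
    return bytes;
}
```

Reading the values back is worthwhile because the kernel may not apply the requested size as given. On Linux the value passed to SO_SNDBUF is typically doubled and capped by `net.core.wmem_max`, so a requested 16777216 bytes is not guaranteed to be the size in effect.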

Here are a few more pieces of useful information:

  • I see this issue with small messages
  • HostA and HostB both have the same hardware kit (Xeon X5550@2.67GHz, 32 cores total/128 Gig mem/1 Gig NICs)
  • The OS is RHEL 5.4 (kernel 2.6.18-164.2.1.el5)

Thank you

1) In what cases, and why, would communicating over loopback be slower than over the network?

Loopback puts the packet setup and TCP checksum calculation for both TX and RX on the same machine, so that machine needs to do 2x as much processing, while with 2 machines the TX and RX work is split between them. This can have a negative impact on loopback.

2) When sending as fast as possible, why does toggling TCP_NODELAY have so much more of an impact on maximum throughput over loopback than over the network?

I'm not sure how you've come to this conclusion, but loopback and the network are implemented differently, and if you try to push them to the limit, you will hit different issues. Loopback interfaces (as mentioned in the answer to 1) cause TX+RX processing overhead on the same machine. NICs, on the other hand, have a number of limits on how many outstanding packets they can hold in their circular buffers and so on, which cause different bottlenecks (and this varies from chip to chip, and with the switch that sits between them).

3) How can we detect and analyze TCP congestion control as a potential explanation for the poor performance?

Congestion control kicks in if there is packet loss. Are you seeing packet loss? Otherwise, you're probably hitting limits on TCP window size vs. network latency factors.
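
One way to check for loss and window limits from inside the application is to poll the kernel's per-connection counters. The sketch below assumes Linux, where `getsockopt` with `TCP_INFO` fills a `struct tcp_info`. The `TcpSnapshot` type and `read_tcp_snapshot` function are hypothetical names introduced for illustration.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>   // struct tcp_info, TCP_INFO (Linux-specific)
#include <sys/socket.h>
#include <cstring>

// Hypothetical summary of the fields most relevant to loss and window limits.
struct TcpSnapshot {
    unsigned total_retransmits;  // segments retransmitted so far (hints at loss)
    unsigned cwnd_segments;      // current congestion window, in segments
    unsigned unacked_segments;   // segments sent but not yet acknowledged
    unsigned rtt_us;             // smoothed round-trip time, in microseconds
};

// Fills `out` from the kernel's view of the connection; returns false on error.
bool read_tcp_snapshot(int fd, TcpSnapshot& out) {
    tcp_info info;
    std::memset(&info, 0, sizeof(info));
    socklen_t len = sizeof(info);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return false;
    out.total_retransmits = info.tcpi_total_retrans;
    out.cwnd_segments     = info.tcpi_snd_cwnd;
    out.unacked_segments  = info.tcpi_unacked;
    out.rtt_us            = info.tcpi_rtt;
    return true;
}
```

Sampling this periodically during a run gives a rough picture: a growing retransmit count suggests loss, while a flat retransmit count with unacknowledged segments sitting close to the congestion window would point at the window rather than loss as the limit.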

4) Does anyone have any other theories as to the reason for this phenomenon? If yes, is there a method to prove the theory?

I don't understand the phenomenon you refer to here. All I see in your table is that you have some sockets with a large send buffer - this can be perfectly legitimate. On a fast machine, your application will be capable of generating more data than the network can pump out, so I'm not sure what you're classifying as a problem here.

One final note: small messages create a bigger performance hit on your network for various reasons, such as:

  • There is a fixed per-packet overhead (for the MAC+IP+TCP headers), and the smaller the payload is, the larger the share of overhead you're going to have.
  • Many NIC limitations are relative to the number of outstanding packets, which means you'll hit NIC bottlenecks with less data when using smaller packets.
  • The network itself has per-packet overhead, so the max amount of data you can pump through the network depends on the size of the packets again.
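
The fixed per-packet overhead can be put into rough numbers. This sketch assumes one message per packet on Ethernet with IPv4 and no TCP options: 38 bytes of Ethernet framing (preamble, header, frame check sequence, inter-frame gap) plus 20 bytes of IP header and 20 bytes of TCP header. The constant and function names are hypothetical, and the figures change if TCP options such as timestamps are in use.

```cpp
#include <cstddef>

// Assumed per-packet overhead in bytes:
// Ethernet framing 38 (8 preamble + 14 header + 4 FCS + 12 inter-frame gap),
// IPv4 header 20, TCP header 20 (no options).
const std::size_t kAssumedOverheadBytes = 38 + 20 + 20;

// Fraction of the bytes on the wire that is application payload.
double payload_fraction(std::size_t payload_bytes,
                        std::size_t overhead_bytes = kAssumedOverheadBytes) {
    const std::size_t total = payload_bytes + overhead_bytes;
    if (total == 0)
        return 0.0;  // nothing on the wire at all
    return static_cast<double>(payload_bytes) / static_cast<double>(total);
}

// Upper bound on payload throughput (bytes/sec) for a given link speed.
double max_payload_bytes_per_sec(double link_bits_per_sec,
                                 std::size_t payload_bytes,
                                 std::size_t overhead_bytes = kAssumedOverheadBytes) {
    return (link_bits_per_sec / 8.0) * payload_fraction(payload_bytes, overhead_bytes);
}
```

Under these assumptions, a 128-byte payload per packet makes up roughly 62% of the bytes on the wire, against roughly 95% for a full 1460-byte payload, so on a 1 Gbit link `max_payload_bytes_per_sec(1e9, 128)` comes out well below `max_payload_bytes_per_sec(1e9, 1460)`.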
