SuperComputing 2012

Memory to Memory

No problems with memory to memory: 37.806 Gb/s

[root@sc02 ~]# fdtClient -P 7 -c 192.168.100.4  /dev/zero -d /dev/null
Oct 28, 2012 11:49:59 PM lia.util.net.common.Config <init>
INFO: Using lia.util.net.copy.PosixFSFileChannelProviderFactory as FileChannelProviderFactory
Oct 28, 2012 11:49:59 PM lia.util.net.common.Config <init>
INFO: FDT started in client mode
FDT uses *blocking* I/O mode.
INFO: Requested window size -1. Using window size: 49360
28/10 23:50:09   Net Out: 38.130 Gb/s   Avg: 38.130 Gb/s
28/10 23:50:14   Net Out: 38.128 Gb/s   Avg: 38.129 Gb/s
28/10 23:50:19   Net Out: 36.775 Gb/s   Avg: 37.678 Gb/s
28/10 23:50:24   Net Out: 38.197 Gb/s   Avg: 37.806 Gb/s
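
The Avg column above behaves roughly like a running mean of the Net Out readings. A minimal sketch of that relationship, assuming equally weighted report intervals (an assumption; FDT's own averaging method is not shown on this page). The helper name running_average is hypothetical:

```python
# Net Out readings (Gb/s) copied from the FDT client output above.
net_out_gbps = [38.130, 38.128, 36.775, 38.197]


def running_average(samples):
    """Return the cumulative mean after each sample (equal weights assumed)."""
    averages = []
    total = 0.0
    for count, sample in enumerate(samples, start=1):
        total += sample
        averages.append(total / count)
    return averages


for sample, avg in zip(net_out_gbps, running_average(net_out_gbps)):
    print(f"Net Out: {sample:.3f} Gb/s   running mean: {avg:.3f} Gb/s")
```

The running means track the reported Avg values to within a few Mb/s. Small differences are expected because the readings are rounded and the report intervals may not be exactly equal.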

Disk to Disk within a machine

MegaCli -CfgLdAdd -r0 [252:1] WT NORA DIRECT -strpsz 1024 -a0
[root@sc02 ~]# fdtCopy if=/ssd1/010Gfile_n0010.dat of=/ssd7/010Gfile_n0010.dat
[Sun Oct 28 20:56:57 PDT 2012] Current Speed = 441.648 MB/s Avg Speed: 436.479 MB/s Total Transfer: 4.277 GB

Network transfer with 7 virtual disks and 1 fdtServer, 7 parallel streams

Grand total with 7 disks, 1 FDT Server: 10.533 Gb/s

[root@sc02 ssd6]# fdtClient -P 7 -c 192.168.100.4 -fl /ssd6/filelist.txt -d /
Avg: 10.533 Gb/s 100.00% ( 00s )
FDTReaderSession ( 13f62162-4fb5-41c5-b4fb-db9af9c453f3 ) final stats:
 Started: Sun Oct 28 23:33:07 PDT 2012
 Ended:   Sun Oct 28 23:42:42 PDT 2012
 Transfer period:   09m 34s
 TotalBytes: 751619276800
 TotalNetworkBytes: 751619276800
 Exit Status: OK
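
As a cross-check, the rate can be recomputed from TotalBytes and the Transfer period in the final stats. This is a sketch that assumes decimal units (1 Gb = 10^9 bits) and a period written as minutes and seconds, as on this page. The helper names are hypothetical:

```python
import re


def period_to_seconds(period):
    """Convert a 'Transfer period' string such as '09m 34s' to seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", period.strip())
    if match is None:
        raise ValueError(f"unexpected period format: {period!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def throughput_gbps(total_bytes, period):
    """Average rate in Gb/s, assuming 1 Gb = 1e9 bits."""
    return total_bytes * 8 / period_to_seconds(period) / 1e9


# Values copied from the FDTReaderSession final stats above.
rate = throughput_gbps(751619276800, "09m 34s")
print(f"{rate:.3f} Gb/s")
```

The result comes out slightly below the 10.533 Gb/s that FDT reports. One possible reason, and this is only a guess, is that the whole-second period includes some time outside the data phase.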

Network transfer with 7 virtual disks and 7 fdtServers

The test was set up with 1 virtual disk per physical disk and 1 FDT server for every disk.

Grand total with 7 disks, 7 FDT Servers: 22.196 Gb/s

fdtClient -c 192.168.100.4 -p 54321 -d /ssd1/  /ssd1/100Gfile_x.dat > /root/fdt1.log &
fdtClient -c 192.168.100.4 -p 54322 -d /ssd2/  /ssd2/100Gfile_x.dat > /root/fdt2.log &
fdtClient -c 192.168.100.4 -p 54323 -d /ssd3/  /ssd3/100Gfile_x.dat > /root/fdt3.log &
fdtClient -c 192.168.100.4 -p 54324 -d /ssd4/  /ssd4/100Gfile_x.dat > /root/fdt4.log &
fdtClient -c 192.168.100.4 -p 54325 -d /ssd5/  /ssd5/100Gfile_x.dat > /root/fdt5.log &
fdtClient -c 192.168.100.4 -p 54326 -d /ssd6/  /ssd6/100Gfile_x.dat > /root/fdt6.log &
fdtClient -c 192.168.100.4 -p 54327 -d /ssd7/  /ssd7/100Gfile_x.dat > /root/fdt7.log &

Avg: 3.067 Gb/s 100.00% ( 00s )
FDTWriterSession ( 9920c20c-6378-4ef1-b14b-8393b5c1eafb ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:48 PDT 2012
 Transfer period:   04m 43s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 2.962 Gb/s 100.00% ( 00s )
FDTWriterSession ( 2cc841b1-0ce5-4f09-b773-794e59d0d35e ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:58 PDT 2012
 Transfer period:   04m 53s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 3.159 Gb/s 100.00% ( 00s )
FDTWriterSession ( 183111ea-8c54-4247-870f-e1f7eb8440b7 ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:42 PDT 2012
 Transfer period:   04m 37s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 3.232 Gb/s 100.00% ( 00s )
FDTWriterSession ( 9bf985f6-9260-466d-b775-99f4df5931a6 ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:33 PDT 2012
 Transfer period:   04m 28s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 3.297 Gb/s 100.00% ( 00s )
FDTWriterSession ( 45f42c87-44c8-4db8-8d29-fcc2b33a12a1 ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:31 PDT 2012
 Transfer period:   04m 26s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 3.238 Gb/s 100.00% ( 00s )
FDTWriterSession ( 5dedbe6a-949b-47ec-9d8a-2e9c0b32e5f3 ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:36 PDT 2012
 Transfer period:   04m 31s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Avg: 3.241 Gb/s 100.00% ( 00s )
FDTWriterSession ( 90247407-4dfd-417b-8022-bc3925e0f078 ) final stats:
 Started: Sun Oct 28 22:57:04 PDT 2012
 Ended:   Sun Oct 28 23:01:35 PDT 2012
 Transfer period:   04m 30s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK
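
The grand total for this test is the sum of the seven per-session averages. A short sketch of the arithmetic, with the single-server figure of 10.533 Gb/s from the earlier test included for comparison:

```python
# Per-session Avg values (Gb/s) copied from the seven final-stats blocks above.
session_avgs_gbps = [3.067, 2.962, 3.159, 3.232, 3.297, 3.238, 3.241]

# Grand total reported for the earlier test with 7 disks and 1 FDT server.
single_server_gbps = 10.533

total_gbps = sum(session_avgs_gbps)
ratio = total_gbps / single_server_gbps

print(f"Sum of 7 sessions: {total_gbps:.3f} Gb/s")
print(f"Relative to 1 FDT server: {ratio:.2f}x")
```

The sum of concurrent per-session averages is only an approximation of the aggregate rate, since the sessions finished at slightly different times.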

7 clients one server

9cee52db-3569-4b29-b894-06a536c14409 Net In: 1.246 Gb/s  Avg: 1.216 Gb/s 99.14% ( 06s )
9ea1e5bc-9c3c-41c0-8772-0e9a8845cbd5 Net In: 1.238 Gb/s  Avg: 1.219 Gb/s 99.38% ( 04s )
9f1983ee-585f-47be-9b76-5d6d6d51f688 Net In: 1.226 Gb/s  Avg: 1.217 Gb/s 99.21% ( 05s )
b55d01b7-5d32-4b9c-a222-93d114a33bf7 Net In: 1.201 Gb/s  Avg: 1.215 Gb/s 98.95% ( 07s )
13792c14-34e5-413e-9b2c-de1fded01a80 Net In: 1.246 Gb/s  Avg: 1.225 Gb/s 99.85% ( 01s )
2456eb3d-1f19-45d8-9441-6213d1f60b39 Net In: 1.209 Gb/s  Avg: 1.221 Gb/s 99.54% ( 03s )
5106e622-0934-4903-afc7-92b540759c85 Net In: 1.201 Gb/s  Avg: 1.220 Gb/s 99.46% ( 03s )
Total Net In: 8.569 Gb/s


......




FDTWriterSession ( 13792c14-34e5-413e-9b2c-de1fded01a80 ) final stats:
 Started: Mon Oct 29 10:36:24 PDT 2012
 Ended:   Mon Oct 29 10:48:11 PDT 2012
 Transfer period:   11m 46s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:11 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:12 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.
Oct 29, 2012 10:48:12 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 13792c14-34e5-413e-9b2c-de1fded01a80 ) /192.168.100.2:35549 FINISHED
Oct 29, 2012 10:48:13 AM lia.util.net.copy.FDTWriterSession handleEndFDTSession
INFO: [ FDTWriterSession ] Remote FDTReaderSession for session [ b55d01b7-5d32-4b9c-a222-93d114a33bf7 ] finished ok. Waiting for our side to finish.
Oct 29, 2012 10:48:14 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 2456eb3d-1f19-45d8-9441-6213d1f60b39 ) /192.168.100.2:35552 FINISHED


FDTWriterSession ( 2456eb3d-1f19-45d8-9441-6213d1f60b39 ) final stats:
 Started: Mon Oct 29 10:36:24 PDT 2012
 Ended:   Mon Oct 29 10:48:14 PDT 2012
 Transfer period:   11m 49s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:14 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:14 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.


FDTWriterSession ( 5106e622-0934-4903-afc7-92b540759c85 ) final stats:
 Started: Mon Oct 29 10:36:24 PDT 2012
 Ended:   Mon Oct 29 10:48:15 PDT 2012
 Transfer period:   11m 50s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:15 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:15 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.
Oct 29, 2012 10:48:15 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 5106e622-0934-4903-afc7-92b540759c85 ) /192.168.100.2:35553 FINISHED
29/10 10:48:15  7 active sessions:
9cee52db-3569-4b29-b894-06a536c14409 Net In: 326.459 Mb/s        Avg: 1.209 Gb/s 100.00% ( 00s )
9ea1e5bc-9c3c-41c0-8772-0e9a8845cbd5 Net In: 0.000 b/s   Avg: 1.209 Gb/s 100.00% ( 00s )
9f1983ee-585f-47be-9b76-5d6d6d51f688 Net In: 254.261 Mb/s        Avg: 1.209 Gb/s 100.00% ( 00s )
b55d01b7-5d32-4b9c-a222-93d114a33bf7 Net In: 681.169 Mb/s        Avg: 1.210 Gb/s 100.00% ( 00s )
Total Net In: 1.262 Gb/s


FDTWriterSession ( 9ea1e5bc-9c3c-41c0-8772-0e9a8845cbd5 ) final stats:
 Started: Mon Oct 29 10:36:25 PDT 2012
 Ended:   Mon Oct 29 10:48:16 PDT 2012
 Transfer period:   11m 51s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:16 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:16 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.
Oct 29, 2012 10:48:16 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 9ea1e5bc-9c3c-41c0-8772-0e9a8845cbd5 ) /192.168.100.2:35554 FINISHED
Oct 29, 2012 10:48:18 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 9f1983ee-585f-47be-9b76-5d6d6d51f688 ) /192.168.100.2:35551 FINISHED
Oct 29, 2012 10:48:18 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( 9cee52db-3569-4b29-b894-06a536c14409 ) /192.168.100.2:35550 FINISHED


FDTWriterSession ( 9f1983ee-585f-47be-9b76-5d6d6d51f688 ) final stats:
 Started: Mon Oct 29 10:36:24 PDT 2012
 Ended:   Mon Oct 29 10:48:18 PDT 2012
 Transfer period:   11m 53s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:18 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:18 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.


FDTWriterSession ( 9cee52db-3569-4b29-b894-06a536c14409 ) final stats:
 Started: Mon Oct 29 10:36:24 PDT 2012
 Ended:   Mon Oct 29 10:48:18 PDT 2012
 Transfer period:   11m 53s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:18 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:18 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.


FDTWriterSession ( b55d01b7-5d32-4b9c-a222-93d114a33bf7 ) final stats:
 Started: Mon Oct 29 10:36:25 PDT 2012
 Ended:   Mon Oct 29 10:48:19 PDT 2012
 Transfer period:   11m 54s
 TotalBytes: 107374182400
 TotalNetworkBytes: 107374182400
 Exit Status: OK

Oct 29, 2012 10:48:19 AM lia.util.net.copy.transport.ControlChannel run
INFO:  ControlThread for ( b55d01b7-5d32-4b9c-a222-93d114a33bf7 ) /192.168.100.2:35548 FINISHED
Oct 29, 2012 10:48:19 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] Post Processing started
Oct 29, 2012 10:48:19 AM lia.util.net.copy.FDTWriterSession doPostProcessing
INFO: [ FDTWriterSession ] No post processing filters defined/processed.

Post firmware single clients (November 1)

Summary: After a firmware upgrade, the card no longer has connections getting stuck half open (SYN_SENT). Writing to disk using the -wCount feature works; however, we run into problems with more than 2 writer threads. Prior to the firmware update, hitting these problems required a complete reboot of the machine to get connectivity back on the interface. Now, however, we see what looks like fewer threads finishing the job. Is this an FDT application problem, or are we still seeing issues with the card?

[Figure: transfer-notes-2012-11-01.png]

Details:

With a 15 writer server, the speed was cut in half halfway through the transfer.

01/11 20:16:34   Net In: 11.824 Gb/s   Avg: 16.476 Gb/s 99.90% ( 00s )

After the transfer of the files was complete, FDT continued to report lines like these:

01/11 20:16:30   Net Out: 11.686 Gb/s   Avg: 16.490 Gb/s 99.54% ( 03s )
01/11 20:16:35   Net Out: 11.770 Gb/s   Avg: 16.459 Gb/s 100.00% ( 00s )
01/11 20:16:40   Net Out: 0.000 b/s   Avg: 16.354 Gb/s 100.00% ( 00s )
01/11 20:16:45   Net Out: 0.000 b/s   Avg: 16.250 Gb/s 100.00% ( 00s )
01/11 20:16:50   Net Out: 0.000 b/s   Avg: 16.147 Gb/s 100.00% ( 00s )
01/11 20:16:55   Net Out: 0.000 b/s   Avg: 16.045 Gb/s 100.00% ( 00s )
01/11 20:17:00   Net Out: 0.000 b/s   Avg: 15.945 Gb/s 100.00% ( 00s )

until eventually killed.
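
The slowly falling Avg in the idle lines is what one would expect if Avg is total bytes divided by elapsed time and no further bytes are moving. Under that assumption (it is not confirmed by anything on this page), two idle readings are enough to estimate how long the session had been running. The function name elapsed_before_idle is hypothetical:

```python
def elapsed_before_idle(avg_first, avg_last, idle_seconds):
    """Estimate the seconds elapsed at the first idle reading.

    Assumes Avg = total_bytes / elapsed and that no bytes moved between
    the two readings, so avg_first * t == avg_last * (t + idle_seconds).
    """
    return avg_last * idle_seconds / (avg_first - avg_last)


# Avg readings at 20:16:40 and 20:17:00 from the log above, both with
# Net Out at 0.000 b/s, taken 20 seconds apart.
t = elapsed_before_idle(16.354, 15.945, 20)
print(f"Estimated elapsed time at 20:16:40: {t:.0f} s ({t / 60:.1f} min)")
```

The estimate is sensitive to rounding in the three-decimal Avg values, so it should be read as a rough figure of about 13 minutes rather than an exact duration.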

The problem with half-open connections was eliminated by the firmware upgrade. Connections stuck in SYN_SENT, such as:

tcp        0      1 ::ffff:10.20.3.104:54008    ::ffff:10.20.3.101:54321    SYN_SENT    5684/java
were fixed by moving to the 2.11.500 firmware version for the Mellanox CX3.

Using a single writer thread the transfer was completely stable.

01/11 22:33:59  Net In: 2.475 Gb/s      Avg: 10.954 Gb/s 100.00% ( 00s )

FDTWriterSession ( ff0ff011-80f1-4497-8397-72e1a18ab78b ) final stats:
 Started: Thu Nov 01 22:14:23 PDT 2012
 Ended:   Thu Nov 01 22:34:04 PDT 2012
 Transfer period:   19m 40s
 TotalBytes: 1609085802000
 TotalNetworkBytes: 1609085802000
 Exit Status: OK

Using 2 writer threads, the transfer was also completely stable:

01/11 23:04:14   Net In: 3.423 Gb/s   Avg: 21.328 Gb/s 100.00% ( 00s )

FDTWriterSession ( d9cad4f0-8987-4079-8ec0-7e8f73890afb ) final stats:
 Started: Thu Nov 01 22:54:09 PDT 2012
 Ended:   Thu Nov 01 23:04:18 PDT 2012
 Transfer period:   10m 09s
 TotalBytes: 1609085802000
 TotalNetworkBytes: 1609085802000
 Exit Status: OK
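
Both runs moved the same 1609085802000 bytes, so the transfer periods give a direct measure of how well the second writer thread scaled. A small sketch of that arithmetic, using only the periods reported above:

```python
# Transfer periods copied from the two FDTWriterSession final-stats blocks.
one_writer_seconds = 19 * 60 + 40   # 19m 40s
two_writer_seconds = 10 * 60 + 9    # 10m 09s

speedup = one_writer_seconds / two_writer_seconds
efficiency = speedup / 2            # fraction of ideal 2x scaling

print(f"Speedup with 2 writers: {speedup:.2f}x")
print(f"Scaling efficiency: {efficiency:.1%}")
```

The periods are reported in whole seconds, so the last digit of the speedup is not significant.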

Using 3 writers, there was a problem. Part way through the transfer, the first 4 connections stopped doing anything.

01/11 23:31:07   Net In: 6.523 Gb/s   Avg: 19.177 Gb/s 100.00% ( 00s )

netstat
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58233    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58236    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58234    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58237    ESTABLISHED 5898/java
tcp   14390168      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58235    ESTABLISHED 5898/java
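
In Linux netstat output the second and third columns are Recv-Q and Send-Q. For an established socket, Recv-Q counts bytes received by the kernel that the owning program has not yet read. A minimal sketch that picks out the backlogged connection from the lines above; the helper names are hypothetical, and the seven-column layout is assumed to match netstat run with the -p option:

```python
NETSTAT_OUTPUT = """\
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58233    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58236    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58234    ESTABLISHED 5898/java
tcp        0      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58237    ESTABLISHED 5898/java
tcp   14390168      0 ::ffff:10.20.3.101:54321    ::ffff:10.20.3.104:58235    ESTABLISHED 5898/java
"""


def parse_netstat_line(line):
    """Split one TCP line into named fields (7 whitespace-separated columns)."""
    proto, recv_q, send_q, local, remote, state, owner = line.split()[:7]
    return {
        "recv_q": int(recv_q),
        "send_q": int(send_q),
        "remote_port": remote.rsplit(":", 1)[1],
        "state": state,
        "owner": owner,
    }


rows = [
    parse_netstat_line(line)
    for line in NETSTAT_OUTPUT.splitlines()
    if line.startswith("tcp")
]
backlogged = [row for row in rows if row["recv_q"] > 0 or row["send_q"] > 0]
idle = [row for row in rows if row["recv_q"] == 0 and row["send_q"] == 0]

for row in backlogged:
    print(f"port {row['remote_port']}: Recv-Q={row['recv_q']} Send-Q={row['send_q']}")
print(f"{len(idle)} connections with empty queues")
```

Here only the connection from port 58235 holds unread data (about 14 MB in Recv-Q), while the other four have empty queues. That pattern may point at the receiving Java process not draining that socket, though the output alone does not prove it.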

Topic revision: r6 - 2012-11-02 - igable
 