November 8, 2011
Mark managed to dump some packets down our 100G link this morning using a test set. We didn't observe any packet loss. He was sending at 19 Gbit/s.
Files produced via dd and /dev/zero have some interesting properties when stored on the SSDs. It would appear that writing zeros to an SSD gives artificially good results. We haven't discovered the cause of this, but it definitely has an impact. Since we switched to using files produced from /dev/urandom, our disk-to-disk performance has deteriorated significantly:
In an effort to get this back to something more reasonable we pulled two drives out of scdemo08 and placed them in scdemo06 and scdemo07. This improved performance by about 12 percent. (We might be able to add a few more drives.)
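For reference, a rough sketch of how the two kinds of test file can be made with dd (the sizes and exact invocations here just mirror the 10G files used below):
# all-zero file; these compress trivially and look artificially fast on the SSDs
dd if=/dev/zero of=/ssd/10Gfile01.dat bs=10M count=1024
# incompressible file; /dev/urandom is slow, so creating these is CPU-bound rather than disk-bound
dd if=/dev/urandom of=/ssd/rbatch0/10Grandom03.dat bs=10M count=1024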
We did some more investigation and discovered that we get different results doing a pure read of all-zero files vs random files. For example:
Read a random file:     | 955.402 MB/s | 7.64 Gb/s
Read a zeroed file:     | 1.205 GB/s   | 9.64 Gb/s
Write a /dev/zero file: | 1.372 GB/s   | 10.976 Gb/s
scdemo07->scdemo00
Avg: 5.536 Gb/s
FDTReaderSession ( 2df13142-1621-496a-96be-4ded64eb9645 ) final stats:
Started: Tue Nov 08 17:10:41 PST 2011
Ended: Tue Nov 08 17:23:37 PST 2011
Transfer period: 12m 56s
TotalBytes: 536870913830
TotalNetworkBytes: 536870913830
Exit Status: OK
scdemo06->scdemo09
Avg: 6.172 Gb/s 100.00% ( 00s )
FDTReaderSession ( 7abaf554-401a-4ba3-85d0-16fe0eae35ab ) final stats:
Started: Tue Nov 08 17:02:29 PST 2011
Ended: Tue Nov 08 17:14:05 PST 2011
Transfer period: 11m 35s
TotalBytes: 536870913830
TotalNetworkBytes: 536870913830
Exit Status: OK
Writing to disk with the /dev/zero device:
[igable@scdemo00 ~]$ java -cp ~/fdt/fdt.jar lia.util.net.common.DDCopy if=/dev/zero of=/ssd/10Goutputfile5 bs=10M count=10240
[Tue Nov 08 19:09:03 PST 2011] Current Speed = 1.416 GB/s Avg Speed: 1.416 GB/s Total Transfer: 2.832 GB
[Tue Nov 08 19:09:05 PST 2011] Current Speed = 1.328 GB/s Avg Speed: 1.371 GB/s Total Transfer: 5.605 GB
[Tue Nov 08 19:09:07 PST 2011] Current Speed = 1.313 GB/s Avg Speed: 1.352 GB/s Total Transfer: 8.232 GB
[Tue Nov 08 19:09:09 PST 2011] Current Speed = 1.309 GB/s Avg Speed: 1.341 GB/s Total Transfer: 10.85 GB
[Tue Nov 08 19:09:11 PST 2011] Current Speed = 1.284 GB/s Avg Speed: 1.33 GB/s Total Transfer: 13.418 GB
[Tue Nov 08 19:09:13 PST 2011] Current Speed = 1.401 GB/s Avg Speed: 1.342 GB/s Total Transfer: 16.221 GB
[Tue Nov 08 19:09:15 PST 2011] Current Speed = 1.401 GB/s Avg Speed: 1.35 GB/s Total Transfer: 19.023 GB
[Tue Nov 08 19:09:17 PST 2011] Current Speed = 1.391 GB/s Avg Speed: 1.355 GB/s Total Transfer: 21.816 GB
[Tue Nov 08 19:09:19 PST 2011] Current Speed = 1.392 GB/s Avg Speed: 1.359 GB/s Total Transfer: 24.6 GB
[Tue Nov 08 19:09:21 PST 2011] Current Speed = 1.396 GB/s Avg Speed: 1.363 GB/s Total Transfer: 27.393 GB
[Tue Nov 08 19:09:23 PST 2011] Current Speed = 1.396 GB/s Avg Speed: 1.366 GB/s Total Transfer: 30.186 GB
[Tue Nov 08 19:09:25 PST 2011] Current Speed = 1.387 GB/s Avg Speed: 1.368 GB/s Total Transfer: 32.959 GB
[Tue Nov 08 19:09:27 PST 2011] Current Speed = 1.391 GB/s Avg Speed: 1.369 GB/s Total Transfer: 35.742 GB
[Tue Nov 08 19:09:29 PST 2011] Current Speed = 1.382 GB/s Avg Speed: 1.37 GB/s Total Transfer: 38.506 GB
[Tue Nov 08 19:09:31 PST 2011] Current Speed = 1.386 GB/s Avg Speed: 1.371 GB/s Total Transfer: 41.279 GB
[Tue Nov 08 19:09:33 PST 2011] Current Speed = 1.377 GB/s Avg Speed: 1.372 GB/s Total Transfer: 44.033 GB
[Tue Nov 08 19:09:35 PST 2011] Current Speed = 1.381 GB/s Avg Speed: 1.372 GB/s Total Transfer: 46.797 GB
[Tue Nov 08 19:09:37 PST 2011] Current Speed = 1.382 GB/s Avg Speed: 1.373 GB/s Total Transfer: 49.561 GB
[Tue Nov 08 19:09:39 PST 2011] Current Speed = 1.357 GB/s Avg Speed: 1.372 GB/s Total Transfer: 52.275 GB
^C
Total Transfer: 52.812 GBytes ( 56706990080 bytes )
Time: 38 seconds
Avg Speed: 1.372 GB/s
Read a random file (the first attempt below used the wrong path):
[igable@scdemo00 rbatch0]$ java -cp ~/fdt/fdt.jar lia.util.net.common.DDCopy if=/ssd/10Grandom03.dat of=/dev/null bs=10M count=10240
Got exception:
java.io.FileNotFoundException: /ssd/10Grandom03.dat (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(Unknown Source)
at java.io.RandomAccessFile.<init>(Unknown Source)
at lia.util.net.common.DDCopy.main(DDCopy.java:371)
[igable@scdemo00 rbatch0]$ java -cp ~/fdt/fdt.jar lia.util.net.common.DDCopy if=/ssd/rbatch0/10Grandom03.dat of=/dev/null bs=10M count=10240
[Tue Nov 08 19:16:10 PST 2011] Current Speed = 760 MB/s Avg Speed: 760 MB/s Total Transfer: 1.484 GB
[Tue Nov 08 19:16:12 PST 2011] Current Speed = 1,008.646 MB/s Avg Speed: 886.82 MB/s Total Transfer: 3.535 GB
[Tue Nov 08 19:16:14 PST 2011] Current Speed = 989.505 MB/s Avg Speed: 920.598 MB/s Total Transfer: 5.469 GB
[Tue Nov 08 19:16:16 PST 2011] Current Speed = 990 MB/s Avg Speed: 937.771 MB/s Total Transfer: 7.402 GB
[Tue Nov 08 19:16:18 PST 2011] Current Speed = 1,000 MB/s Avg Speed: 950.114 MB/s Total Transfer: 9.355 GB
Total Transfer: 10 GBytes ( 10737418240 bytes )
Time: 10 seconds
Avg Speed: 955.402 MB/s
Now read a file made with /dev/zero:
[igable@scdemo00 rbatch0]$ java -cp ~/fdt/fdt.jar lia.util.net.common.DDCopy if=/ssd/rbatch0/10Gfile01.dat of=/dev/null
[Tue Nov 08 19:40:14 PST 2011] Current Speed = 1.163 GB/s Avg Speed: 1.164 GB/s Total Transfer: 2.328 GB
[Tue Nov 08 19:40:16 PST 2011] Current Speed = 1.194 GB/s Avg Speed: 1.179 GB/s Total Transfer: 4.753 GB
[Tue Nov 08 19:40:18 PST 2011] Current Speed = 1.229 GB/s Avg Speed: 1.196 GB/s Total Transfer: 7.211 GB
[Tue Nov 08 19:40:20 PST 2011] Current Speed = 1.231 GB/s Avg Speed: 1.204 GB/s Total Transfer: 9.674 GB
Total Transfer: 10 GBytes ( 10737418240 bytes )
Time: 8 seconds
Avg Speed: 1.205 GB/s
November 7, 2011
All nodes are now configured with Write Through on the SSD logical disk. SNMP polling of the Brocade is now done at 1 min intervals.
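The polling itself just grabs the standard 64-bit interface counters off the switch; by hand that is roughly (community string and switch hostname are placeholders):
# per-port byte counters used for the throughput graphs
snmpwalk -v2c -c public brocade-switch IF-MIB::ifHCInOctets
snmpwalk -v2c -c public brocade-switch IF-MIB::ifHCOutOctets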
Figure 3: We are now seeing above 9 Gbps disk to disk on all boxes. The write on scdemo03 is the slowest in the cluster by about 1 Gb/s; it's not immediately obvious what the problem is.
Completed a 5-to-5 disk-to-disk test (a sketch of how the transfers were launched follows Table 1):
Transfer             | CFQ Rate        | CFQ Reverse Rate | NOOP Rate       | NOOP Reverse Rate
scdemo00->scdemo05   | Avg: 9.174 Gb/s | Avg: 9.435 Gb/s  | Avg: 9.544 Gb/s | Avg: 9.489 Gb/s
scdemo01->scdemo06   | Avg: 9.172 Gb/s | Avg: 9.175 Gb/s  | Avg: 9.490 Gb/s | Avg: 9.492 Gb/s
scdemo02->scdemo07   | Avg: 9.121 Gb/s | Avg: 9.173 Gb/s  | Avg: 9.490 Gb/s | Avg: 9.549 Gb/s
scdemo03->scdemo08   | Avg: 9.225 Gb/s | Avg: 8.087 Gb/s  | Avg: 9.382 Gb/s | Avg: 8.332 Gb/s
scdemo04->scdemo09   | Avg: 9.382 Gb/s | Avg: 9.170 Gb/s  | Avg: 9.545 Gb/s | Avg: 9.545 Gb/s
Table 1: Switching to the noop scheduler has added about another 5 percent improvement in performance.
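For reference, the five simultaneous transfers can be driven from one terminal with something like the following (the /ssd file layout and use of ssh are assumptions; the fdtclient arguments follow the Nov 6 example below):
# kick off the five disk-to-disk transfers in parallel, one per host pair
for pair in scdemo00:scdemo05 scdemo01:scdemo06 scdemo02:scdemo07 scdemo03:scdemo08 scdemo04:scdemo09; do
  src=${pair%%:*}; dst=${pair#*:}
  ssh $src "fdtclient -c $dst /ssd/*.dat -d /ssd/batch0" &
done
wait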
Switching to the noop scheduler:
echo noop > /sys/block/sdb/queue/scheduler
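Assuming the SSD logical disk shows up as /dev/sdb on every node, the change can be pushed out and checked cluster-wide with something like (run as root; the setting does not survive a reboot):
for h in scdemo0{0..9}; do
  # set noop and print the active scheduler; the one in use is shown in [brackets]
  ssh $h 'echo noop > /sys/block/sdb/queue/scheduler; cat /sys/block/sdb/queue/scheduler'
done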
Figure 4: Shows the improvement when moving from the CFQ to the noop kernel disk scheduler.
November 6, 2011
Figure 1: This shows preliminary results from testing on Sunday morning, Nov 6. The summary is that simultaneous iperf is good from all hosts (9.9+ Gbps). Reading from disk and writing to memory is reasonable at 7.5-8.5 Gbit/s, but I think it could do with some improvement. Disk-to-disk performance is only 5.0 Gbps and not really adequate for the test. We are using 2000 ATLAS files of around 500 MB on each node for the transfers. We had better performance with 10G files written with 'dd'. Graph pulled from Cacti for the Brocade.
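The iperf runs were plain memory-to-memory tests, roughly along these lines (exact options weren't recorded; the 10.200.0.x address is taken from the FDT output below):
# on the receiving host
iperf -s -w 2M
# on the sending host: 60 s run, report every 5 s
iperf -c 10.200.0.50 -w 2M -t 60 -i 5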
Now attempting to improve disk performance by removing LVM and changing the RAID controller to write through. Also note that scdemo00 and scdemo01 had their RAID stripe size set at 64 kB.
Delete a logical drive
[root@scdemo00 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdDel -L1 -a0
Get the enclosure device ID:
[root@scdemo00 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL
Create a logical drive
[root@scdemo00 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r0 [32:2,32:3,32:4,32:5,32:6] WT ADRA Direct -strpsz1024 -a0
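For reference, my reading of the -CfgLdAdd arguments (worth double-checking against the MegaCli documentation):
# -r0                           RAID 0 across the listed drives
# [32:2,32:3,32:4,32:5,32:6]    enclosure:slot pairs, with the enclosure ID taken from -EncInfo
# WT                            Write Through cache policy
# ADRA                          Adaptive Read Ahead
# Direct                        Direct I/O (reads are not cached on the controller)
# -strpsz1024                   1024 kB stripe size
# -a0                           adapter 0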
We see a dramatic improvement with these new disk settings, though we are still using large files.
[igable@scdemo09 ssd]$ fdtclient -c 10.200.0.50 10Gfile01.dat * -d /ssd/batch0
FDT [ 0.9.23-201107290935 ] STARTED ...
Nov 06, 2011 5:04:15 PM lia.util.net.common.Config <init>
INFO: Using lia.util.net.copy.PosixFSFileChannelProviderFactory as FileChannelProviderFactory
Nov 06, 2011 5:04:15 PM lia.util.net.common.Config <init>
INFO: FDT started in client mode
<snip>
INFO: Requested window size -1. Using window size: 43690
06/11 17:04:25 Net Out: 9.929 Gb/s Avg: 9.929 Gb/s
06/11 17:04:30 Net Out: 9.408 Gb/s Avg: 9.668 Gb/s
<snip>
06/11 17:07:45 Net Out: 9.892 Gb/s Avg: 9.597 Gb/s 93.30% ( 15s )
06/11 17:07:50 Net Out: 9.923 Gb/s Avg: 9.605 Gb/s 95.61% ( 09s )
06/11 17:07:55 Net Out: 8.892 Gb/s Avg: 9.588 Gb/s 97.68% ( 05s )
06/11 17:08:00 Net Out: 9.891 Gb/s Avg: 9.595 Gb/s 99.98% ( 00s )
Nov 06, 2011 5:08:03 PM lia.util.net.copy.FDTReaderSession handleEndFDTSession
INFO: [ FDTReaderSession ] Remote FDTWriterSession for session [ f72d49d9-db0e-4b27-9264-0247e1bc6864 ] finished OK!
06/11 17:08:05 Net Out: 90.602 Mb/s Avg: 9.384 Gb/s 100.00% ( 00s )
FDTReaderSession ( f72d49d9-db0e-4b27-9264-0247e1bc6864 ) final stats:
Started: Sun Nov 06 17:04:15 PST 2011
Ended: Sun Nov 06 17:08:06 PST 2011
Transfer period: 03m 50s
TotalBytes: 268435456000
TotalNetworkBytes: 268435456000
Exit Status: OK
Nov 06, 2011 5:08:06 PM lia.util.net.copy.FDTReaderSession doPostProcessing
INFO: [ FDTReaderSession ] Post Processing started
Nov 06, 2011 5:08:06 PM lia.util.net.copy.FDTReaderSession doPostProcessing
INFO: [ FDTReaderSession ] No post processing filters defined/processed.
[ Sun Nov 06 17:08:07 PST 2011 ] - GracefulStopper hook started ... Waiting for the cleanup to finish
[ Sun Nov 06 17:08:07 PST 2011 ] - GracefulStopper hook finished!
[ Sun Nov 06 17:08:07 PST 2011 ] FDT Session finished OK.
Now do scdemo00->scdemo09 with 1 TB:
06/11 17:42:46 Net In: 6.342 Gb/s Avg: 8.583 Gb/s 100.00% ( 00s )
Nov 06, 2011 5:42:49 PM lia.util.net.copy.transport.ControlChannel run
INFO: ControlThread for ( 35c60688-40f0-4480-8025-bde1fcee9b25 ) /10.200.0.50:55421 FINISHED
FDTWriterSession ( 35c60688-40f0-4480-8025-bde1fcee9b25 ) final stats:
Started: Sun Nov 06 17:26:44 PST 2011
Ended: Sun Nov 06 17:42:50 PST 2011
Transfer period: 16m 06s
TotalBytes: 1030792151040
TotalNetworkBytes: 1030792151040
Exit Status: OK
scdemo09->scdemo00
INFO: [ FDTReaderSession ] Remote FDTWriterSession for session [ 94f47223-967c-4275-a737-a71929e5dddb ] finished OK!
06/11 19:36:35 Net Out: 5.450 Gb/s Avg: 9.383 Gb/s 100.00% ( 00s )
FDTReaderSession ( 94f47223-967c-4275-a737-a71929e5dddb ) final stats:
Started: Sun Nov 06 19:21:55 PST 2011
Ended: Sun Nov 06 19:36:38 PST 2011
Transfer period: 14m 42s
TotalBytes: 1030792151040
TotalNetworkBytes: 1030792151040
Exit Status: OK
Figure 2: After changing the RAID configuration to write through and using large 10G files created with 'dd', we see much improved disk-to-disk throughput (as shown in the two FDT outputs immediately above). Strangely, one direction is nearly 0.8 Gbps faster than the other. I don't understand the reason for this yet.