Hi, sorry for the delay in replying. It appears that many of our issues are due to PCoIP packet loss. I am seeing a lot of messages similar to the following in the PCoIP server logs:
02/13/2014, 07:39:32.991> LVL:1 RC: 0 MGMT_PCOIP_DATA :BW: Decrease (loss) loss 0.000 active 768.4845 -> 926.2378, plateau 147.2917 -> 945.1406 min_max 147.2917 floor 207.2000
02/13/2014, 07:39:33.405> LVL:1 RC: 0 MGMT_PCOIP_DATA :BW: Decrease (loss) loss 0.004 active 930.7117 -> 798.0853, plateau 945.1406 -> 814.3727 min_max 147.2917 floor 207.2000
02/13/2014, 07:39:38.449> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: bw limit = 814, plateau = 814.4, avg tx = 78.9, avg rx = 7.6 (KBytes/s)
02/13/2014, 07:39:38.449> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/465719 T=000000/1183303/202749 (A/I/O) Loss=0.00%/3.18% (R/T)
02/13/2014, 07:39:40.609> LVL:1 RC: 0 MGMT_PCOIP_DATA :BW: Decrease (loss) loss 0.059 active 824.5322 -> 707.0364, plateau 814.3727 -> 721.4657 min_max 814.3727 floor 29.4000
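For what it's worth, the second figure in the VGMAC "Loss=" pair is the transmit-side loss (per the R/T tag on that line), so the trend can be pulled out over time with something like the line below — the file name here is just a placeholder for wherever your PCoIP server log actually lives:

grep "Loss=" pcoip_server_log.txt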
So I have been trying to investigate the network. One of the tools I've been using is iperf, set up as follows:
On my VM (the server side): iperf -s -u -l 1300 -i 1
On another VM at the same site (the client side): iperf -c 10.2.14.161 -u -l 1300
If I run this, I get good results; occasionally a single datagram arrives out of order.
[ 4] local 10.2.14.161 port 5001 connected with 10.2.20.226 port 62676
[ 4] 0.0- 1.0 sec 127 KBytes 1.04 Mbits/sec 0.151 ms 0/ 100 (0%)
[ 4] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec 0.096 ms 0/ 101 (0%)
[ 4] 2.0- 3.0 sec 128 KBytes 1.05 Mbits/sec 0.115 ms 0/ 101 (0%)
[ 4] 3.0- 4.0 sec 128 KBytes 1.05 Mbits/sec 0.111 ms 0/ 101 (0%)
[ 4] 4.0- 5.0 sec 128 KBytes 1.05 Mbits/sec 0.176 ms 0/ 101 (0%)
[ 4] 5.0- 6.0 sec 127 KBytes 1.04 Mbits/sec 0.224 ms 0/ 100 (0%)
[ 4] 6.0- 7.0 sec 128 KBytes 1.05 Mbits/sec 0.337 ms 0/ 101 (0%)
[ 4] 7.0- 8.0 sec 128 KBytes 1.05 Mbits/sec 0.433 ms 0/ 101 (0%)
[ 4] 8.0- 9.0 sec 128 KBytes 1.05 Mbits/sec 0.435 ms 0/ 101 (0%)
[ 4] 9.0-10.0 sec 128 KBytes 1.05 Mbits/sec 0.440 ms 0/ 101 (0%)
[ 4] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.508 ms 0/ 1009 (0%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
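For context, with no -b flag the iperf client sends UDP at its default target of about 1 Mbit/s, which at -l 1300 works out to roughly 101 datagrams per second: 101 × 1300 bytes × 8 bits ≈ 1.05 Mbit/s, matching the per-second lines above.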
However, if I increase the target bandwidth with the -b flag, I immediately see significant packet loss.
iperf -c 10.2.14.161 -u -l 1300 -b 100m
[ 4] local 10.2.14.161 port 5001 connected with 10.2.20.226 port 62678
[ 4] 0.0- 1.0 sec 9.62 MBytes 80.7 Mbits/sec 0.059 ms 1702/ 9462 (18%)
[ 4] 1.0- 2.0 sec 10.5 MBytes 87.8 Mbits/sec 0.022 ms 1168/ 9615 (12%)
[ 4] 2.0- 3.0 sec 10.4 MBytes 87.5 Mbits/sec 0.000 ms 1207/ 9616 (13%)
[ 4] 3.0- 4.0 sec 10.5 MBytes 88.2 Mbits/sec 0.000 ms 1130/ 9615 (12%)
[ 4] 4.0- 5.0 sec 10.2 MBytes 85.2 Mbits/sec 0.000 ms 1426/ 9616 (15%)
[ 4] 5.0- 6.0 sec 10.4 MBytes 87.4 Mbits/sec 0.000 ms 1212/ 9615 (13%)
[ 4] 6.0- 7.0 sec 10.3 MBytes 86.2 Mbits/sec 0.000 ms 1331/ 9615 (14%)
[ 4] 7.0- 8.0 sec 10.0 MBytes 84.3 Mbits/sec 0.000 ms 1514/ 9616 (16%)
[ 4] 8.0- 9.0 sec 10.1 MBytes 85.1 Mbits/sec 0.000 ms 1431/ 9615 (15%)
[ 4] 9.0-10.0 sec 9.66 MBytes 81.0 Mbits/sec 0.000 ms 1827/ 9616 (19%)
[ 4] 0.0-10.0 sec 102 MBytes 85.3 Mbits/sec 0.000 ms 13947/96002 (15%)
[ 4] 0.0-10.0 sec 1 datagrams received out-of-order
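To narrow down where the loss sets in, the target rate can be stepped through a range with a loop like this (just a sketch, assuming a Unix-style shell on the client VM and the same iperf 2 flags as above):

for bw in 5m 10m 20m 40m 60m 80m; do iperf -c 10.2.14.161 -u -l 1300 -b $bw -t 10; done

If the loss only appears above a particular rate, that rate is a reasonable estimate of the usable UDP capacity on the path.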
Is this telling me anything useful, or do I have iperf configured incorrectly? I should note that the "Configure the PCoIP session MTU" policy in the pcoip.adm group policy template is Enabled and set to 1300 bytes, hence the -l 1300 above.
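On a related note, a quick way to confirm that 1300-byte packets actually cross the path unfragmented (a sketch, assuming Windows guests; 1272 bytes of ICMP payload plus 28 bytes of IP/ICMP headers makes a 1300-byte packet):

ping -f -l 1272 10.2.14.161

If that reports "Packet needs to be fragmented but DF set", the effective path MTU is below the configured 1300 bytes.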