October 8th, 2015 23:00

Customer question about IB traffic?

Hi all,


The customer's question is about ib0 (int-a) and ib1 (int-b) throughput on their cluster. IB traffic should normally be balanced across the nodes, but the customer reports that ib0/ib1 traffic on node 1 is much lower than on node 2 or 3.

# isi statistics history --stats=node.net.iface.bytes.in.rate.0,node.net.iface.bytes.out.rate.0
# isi statistics history --stats=node.net.iface.bytes.in.rate.1,node.net.iface.bytes.out.rate.1

I have just connected to the cluster, and node 1 is reporting unbalanced rates:

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.name.0
Password:
NodeID node.net.iface.name.0
1 int-a
2 int-a
3 int-a

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.name.1
NodeID node.net.iface.name.1
1 int-b
2 int-b
3 int-b

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.bytes.in.rate.0
NodeID node.net.iface.bytes.in.rate.0
1 193.6
2 42249.8
3 139942.8
average 60795.4

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.bytes.in.rate.1
NodeID node.net.iface.bytes.in.rate.1
1 558.0
2 165.0
3 1503.2
average 742.1

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.bytes.out.rate.0
NodeID node.net.iface.bytes.out.rate.0
1 559.4
2 1135796.8
3 949409.4
average 695255.2

ISI00-EMAD11-1% sudo isi statistics query --nodes=all --stats=node.net.iface.bytes.out.rate.1
NodeID node.net.iface.bytes.out.rate.1
1 1025.2
2 0.0
3 652.2
average 559.1
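
To put a number on the imbalance, here is a minimal Python sketch that expresses each node's rate as a fraction of the cluster mean. The figures are copied by hand from the query output above; the `RATES` table and the `share_of_mean` helper are names made up for this illustration, and the unit (bytes per second) is an assumption based on the stat names.

```python
# Rates copied from the query output above, keyed by node ID.
# Assumption: the values are bytes per second, going by the stat names.
RATES = {
    "int-a in": {1: 193.6, 2: 42249.8, 3: 139942.8},
    "int-b in": {1: 558.0, 2: 165.0, 3: 1503.2},
    "int-a out": {1: 559.4, 2: 1135796.8, 3: 949409.4},
    "int-b out": {1: 1025.2, 2: 0.0, 3: 652.2},
}


def share_of_mean(per_node):
    """Return each node's rate divided by the mean rate over all nodes."""
    mean = sum(per_node.values()) / len(per_node)
    if mean == 0:
        # No traffic anywhere: report zero instead of dividing by zero.
        return {node: 0.0 for node in per_node}
    return {node: rate / mean for node, rate in per_node.items()}


for label, per_node in RATES.items():
    shares = share_of_mean(per_node)
    summary = ", ".join(f"node {n}: {s:.3f}" for n, s in sorted(shares.items()))
    print(f"{label:10s} {summary}")
```

On int-a, node 1 comes out below 1% of the cluster mean in both directions. On int-b, which carries very little traffic on any node, node 1 is not the outlier.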

In addition, I have attached an Excel sheet with samples taken over one day; its three graphs show that traffic is unbalanced on node 1.

Is there any explanation or reported bug?


Thank you

Daniel

1 Attachment

1.2K Posts

October 9th, 2015 00:00

We have seen those consistently inconsistent IB throughput statistics on OneFS 6.5, as well as on 7.0, on 7.1 and on 7.2...

So apparently nobody cares (enough)... good luck

-- Peter

205 Posts

October 10th, 2015 04:00

This seems normal to me: unless the data is laid out completely evenly and is all accessed evenly, the traffic is going to differ between nodes. A critical question might be whether all nodes have external networking connected and whether the connections are balanced there.

1.2K Posts

October 10th, 2015 04:00

The data usually /is/ laid out quite nicely, and so is the disk throughput per node.

The IB reports simply don't match (they don't even sum up correctly) between the distributed back-end disk traffic and the NAS front-end traffic; a certain subset of nodes always reports the IB flow as far too low.

-- Peter
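
The point about the counters not summing up can be checked roughly against the int-a figures from the original post. This is only a sketch under two assumptions that the thread does not confirm: that int-a carries nothing but node-to-node traffic, so every byte sent by one node should show up as received by another, and that the samples are close enough in time to compare (they came from separate commands, so an exact match is not expected). The helper name `in_out_mismatch` is invented for this example.

```python
# int-a rates copied from the original post, keyed by node ID.
INT_A_IN = {1: 193.6, 2: 42249.8, 3: 139942.8}
INT_A_OUT = {1: 559.4, 2: 1135796.8, 3: 949409.4}


def in_out_mismatch(bytes_in, bytes_out):
    """Relative gap between cluster-wide received and sent rates (0.0 = equal)."""
    total_in = sum(bytes_in.values())
    total_out = sum(bytes_out.values())
    larger = max(total_in, total_out)
    if larger == 0:
        # No traffic in either direction, so there is nothing to compare.
        return 0.0
    return abs(total_out - total_in) / larger


print(f"total in:  {sum(INT_A_IN.values()):.1f}")
print(f"total out: {sum(INT_A_OUT.values()):.1f}")
print(f"mismatch:  {in_out_mismatch(INT_A_IN, INT_A_OUT):.2f}")
```

With these samples the cluster-wide send rate on int-a is more than ten times the receive rate, which is hard to explain by sampling jitter alone and fits the observation that the IB counters do not add up.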
