
#
# See the comments in the following example.
# They explain different keywords and their meaning.
#
port-groups
 
    port-group # using port GUIDs
        name: Storage
        # "use" is just a description that is used for logging
        #  Other than that, it is just a comment
        use: SRP Targets
        port-guid: 0x10000000000001, 0x10000000000005-0x1000000000FFFA
        port-guid: 0x1000000000FFFF
    end-port-group
 
    port-group
        name: Virtual Servers
        # The syntax of the port name is as follows:
        #   "node_description/Pnum".
        # node_description is compared to the NodeDescription of the node,
        # and "Pnum" is a port number on that node.
        port-name: "vs1 HCA-1/P1, vs2 HCA-1/P1"
    end-port-group
 
    # using partitions defined in the partition policy
    port-group
        name: Partitions
        partition: Part1
        pkey: 0x1234
    end-port-group
 
    # using node types: CA, ROUTER, SWITCH, SELF (for node that runs SM)
    # or ALL (for all the nodes in the subnet)
    port-group
        name: CAs and SM
        node-type: CA, SELF
    end-port-group
 
end-port-groups
 
qos-setup
    # This section of the policy file describes how to set up SL2VL and VL
    # Arbitration tables on various nodes in the fabric.
    # However, this is not supported in OFED - the section is parsed
    # and ignored. SL2VL and VLArb tables should be configured in the
    # OpenSM options file (by default - /var/cache/opensm/opensm.opts).
end-qos-setup
 
qos-levels
 
    # Having a QoS Level named "DEFAULT" is a must - it is applied to
    # PR/MPR requests that didn't match any of the matching rules.
    qos-level
        name: DEFAULT
        use: default QoS Level
        sl: 0
    end-qos-level
 
    # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime
    qos-level
        name: WholeSet
        sl: 1
        mtu-limit: 4
        rate-limit: 5
        pkey: 0x1234
        packet-life: 8
    end-qos-level
 
end-qos-levels
 
# Match rules are scanned in order of their appearance in the policy file.
# First matched rule takes precedence.
qos-match-rules
 
    # matching by single criteria: QoS class
    qos-match-rule
        use: by QoS class
        qos-class: 7-9,11
        # Name of qos-level to apply to the matching PR/MPR
        qos-level-name: WholeSet
    end-qos-match-rule
 
    # show matching by destination group and service id
    qos-match-rule
        use: Storage targets
        destination: Storage
        service-id: 0x10000000000001, 0x10000000000008-0x10000000000FFF
        qos-level-name: WholeSet
    end-qos-match-rule
 
    qos-match-rule
        source: Storage
        use: match by source group only
        qos-level-name: DEFAULT
    end-qos-match-rule
    qos-match-rule
        use: match by all parameters
        qos-class: 7-9,11
        source: Virtual Servers
        destination: Storage
        service-id: 0x0000000000010000-0x000000000001FFFF
        pkey: 0x0F00-0x0FFF
        qos-level-name: WholeSet
    end-qos-match-rule
end-qos-match-rules

Simple QoS Policy - Details and Examples

Simple QoS policy match rules are tailored for matching PR/MPR requests of ULPs (or of an application on top of a ULP). This section contains a list of per-ULP (or per-application) match rules and the SL that should be enforced on the matched PR/MPR query. Match rules include:

Default match rule that is applied to a PR/MPR query that did not match any of the other match rules

IPoIB with a default PKey

IPoIB with a specific PKey

Any ULP/application with a specific Service ID in the PR/MPR query

Any ULP/application with a specific PKey in the PR/MPR query

Any ULP/application with a specific target IB port GUID in the PR/MPR query

Since any section of the policy file is optional, as long as the basic rules of the file are kept (such as not referring to a nonexistent port group, having a default QoS Level, etc.), the simple policy section (qos-ulps) can serve as a complete QoS policy file. The shortest policy file in this case would be as follows:

qos-ulps
    default : 0 # default SL
end-qos-ulps

It is equivalent to the previous example of the shortest policy file, and it is also equivalent to not having a policy file at all. Below is an example of a simple QoS policy with all the possible keywords:

qos-ulps
    default                               :0  # default SL
    sdp, port-num 30000                   :0  # SL for application running on
                                              # top of SDP when a destination
                                              # TCP/IP port is 30000
    sdp, port-num 10000-20000             :0
    sdp                                   :1  # default SL for any other
                                              # application running on top of SDP
    rds                                   :2  # SL for RDS traffic
    ipoib, pkey 0x0001                    :0  # SL for IPoIB on partition with
                                              # pkey 0x0001
    ipoib                                 :4  # default IPoIB partition,
                                              # pkey=0x7FFF
    any, service-id 0x6234                :6  # match any PR/MPR query with a
                                              # specific Service ID
    any, pkey 0x0ABC                      :6  # match any PR/MPR query with a
                                              # specific PKey
    srp, target-port-guid 0x1234          :5  # SRP when SRP Target is located
                                              # on a specified IB port GUID
    any, target-port-guid 0x0ABC-0xFFFFF  :6  # match any PR/MPR query
                                              # with a specific target port GUID
end-qos-ulps

Similar to the advanced policy definition, matching of PR/MPR queries is done in the order of appearance in the QoS policy file, so the first match takes precedence, except for the "default" rule, which is applied only if the query did not match any other rule. All other sections of the QoS policy file take precedence over the qos-ulps section. That is, if a policy file has both qos-match-rules and qos-ulps sections, any query is matched first against the rules in the qos-match-rules section, and only if there was no match, the query is matched against the rules in the qos-ulps section. Note that some of these match rules may overlap, so in order to use the simple QoS definition effectively, it is important to understand how each of the ULPs is matched.
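For illustration, here is a sketch of a file that combines both section types (reusing the WholeSet QoS level defined in the advanced example above); a query is checked against the qos-match-rules section first:

qos-match-rules
    qos-match-rule
        use: checked first for every PR/MPR query
        qos-class: 7-9,11
        qos-level-name: WholeSet
    end-qos-match-rule
end-qos-match-rules

qos-ulps
    # consulted only if no qos-match-rule matched
    sdp     :1
    default :0
end-qos-ulps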

IPoIB

An IPoIB query is matched by PKey or by destination GID, in which case the GID is that of the multicast group that OpenSM creates for each IPoIB partition. The default PKey for an IPoIB partition is 0x7fff, so the following three match rules are equivalent:

ipoib              :
ipoib, pkey 0x7fff :
any, pkey 0x7fff   :

SRP

The Service ID for SRP varies from storage vendor to vendor, thus an SRP query is matched by the target IB port GUID. The following two match rules are equivalent:

srp, target-port-guid 0x1234 :
any, target-port-guid 0x1234 :

Note that any of the above ULPs might contain a target port GUID in the PR query, so in order for these queries not to be recognized by the QoS manager as SRP, the SRP match rule (or any match rule that refers to the target port GUID only) should be placed at the end of the qos-ulps match rules.
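For example, keeping the target-port-guid rule last prevents it from capturing SDP or IPoIB queries that happen to carry a target port GUID (a sketch based on the rules above):

qos-ulps
    default                      :0
    sdp                          :1
    ipoib                        :4
    srp, target-port-guid 0x1234 :5 # target-port-guid rules go last
end-qos-ulps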

MPI

The SL for MPI is manually configured by the MPI administrator. OpenSM does not enforce any SL on MPI traffic, which is why MPI is the only ULP that does not appear in the qos-ulps section.

SL2VL Mapping and VL Arbitration

The OpenSM cached options file has a set of QoS-related configuration parameters that are used to configure SL2VL mapping and VL arbitration on IB ports. These parameters are:

Max VLs: the maximum number of VLs that will be on the subnet

High limit: the limit of High Priority component of VL Arbitration table (IBA 7.6.9)

VLArb low table: Low priority VL Arbitration table (IBA 7.6.9) template

VLArb high table: High priority VL Arbitration table (IBA 7.6.9) template

SL2VL: SL2VL Mapping table (IBA 7.6.6) template. It is a list of VLs corresponding to SLs 0-15 (Note that VL15 used here means drop this SL).

There are separate QoS configuration parameter sets for various target types: CAs, routers, switch external ports, and the switch's enhanced port 0. The names of such parameters are prefixed by the "qos_<type>_" string. Here is a full list of the currently supported sets:

qos_ca_ - QoS configuration parameters set for CAs.

qos_rtr_ - parameters set for routers.

qos_sw0_ - parameters set for switches' port 0.

qos_swe_ - parameters set for switches' external ports.

Here is an example of typical default values for CAs and switches' external ports (hard-coded in OpenSM initialization):

qos_ca_max_vls 15
qos_ca_high_limit 0
qos_ca_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
qos_ca_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
qos_swe_max_vls 15
qos_swe_high_limit 0
qos_swe_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0
qos_swe_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4
qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

VL arbitration tables (both high and low) are lists of VL/Weight pairs. Each list entry contains a VL number (values 0-14) and a weighting value (values 0-255), indicating the number of 64-byte units (credits) which may be transmitted from that VL when its turn in the arbitration occurs. A weight of 0 indicates that this entry should be skipped. If a list entry is programmed for VL15 or for a VL that is not supported or is not currently configured by the port, the port may either skip that entry or send from any supported VL for that entry. Note that the same VL may be listed multiple times in the high- or low-priority arbitration table, and, further, it can be listed in both tables.

The limit of the high-priority VLArb table (qos_<type>_high_limit) indicates the number of high-priority packets that can be transmitted without an opportunity to send a low-priority packet. Specifically, the number of bytes that can be sent is high_limit times 4K bytes. A high_limit value of 255 indicates that the byte limit is unbounded.

Warning

If the 255 value is used, the low priority VLs may be starved.

A value of 0 indicates that only a single packet from the high-priority table may be sent before an opportunity is given to the low-priority table. Keep in mind that ports usually transmit packets of size equal to the MTU. For instance, for a 4KB MTU a single packet will require 64 credits, so in order to achieve effective VL arbitration for packets of 4KB MTU, the weighting values for each VL should be multiples of 64. Below is an example of SL2VL and VL arbitration configuration on the subnet:

qos_ca_max_vls 15
qos_ca_high_limit 6
qos_ca_vlarb_high 0:4
qos_ca_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64
qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7
qos_swe_max_vls 15
qos_swe_high_limit 6
qos_swe_vlarb_high 0:4
qos_swe_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64
qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7

In this example, there are 8 VLs configured on the subnet: VL0 to VL7. VL0 is defined as a high-priority VL, and it is limited to 6 x 4KB = 24KB in a single transmission burst. Such a configuration would suit a VL that needs low latency and uses a small MTU when transmitting packets. The rest of the VLs are defined as low-priority VLs with different weights, while VL4 is effectively turned off.

Deployment Example

The figure below shows an example of an InfiniBand subnet that has been configured by a QoS manager to provide different service levels for various ULPs.

QoS Deployment on InfiniBand Subnet Example


Enhanced QoS

Enhanced QoS provides a higher resolution of QoS at the service level (SL). Users can configure rate limit values per SL for physical ports, virtual ports, and port groups, using the enhanced_qos_policy_file configuration parameter. Valid values of this parameter are:

Full path to the policy file through which Enhanced QoS Manager is configured

"null" - to disable the Enhanced QoS Manager (default value)

Warning

To enable Enhanced QoS Manager, QoS must be enabled in OpenSM.

Enhanced QoS Policy File

The policy file is comprised of two sections:

BW_NAMES: Used to define bandwidth setting and name (currently, rate limit is the only setting). Bandwidth names are defined using the syntax:

<bandwidth name> = <rate limit in Mbps>

Example: My_bandwidth = 50

BW_RULES: Used to define the rules that map the bandwidth setting to a specific SL of a specific GUID. Bandwidth rules are defined using the syntax:

<guid> | <port group name> = <sl>:<bandwidth name>, <sl>:<bandwidth name>...

Examples:

0x2c90000000025 = 5:My_bandwidth, 7:My_bandwidth

Port_grp1 = 3:My_bandwidth, 9:My_bandwidth

Notes:

When the rate limit is set to 0, the rate is unlimited.

Any unspecified SL in a rule will automatically be set to a rate limit of 0 (unlimited).

"default" is a well-known name which can be used to define a default rule used for any GUID with no defined rule (If no default rule is defined, any GUID without a specific rule will be configured with unlimited rate limit for all SLs).

Failure to complete policy file parsing leads to undefined behavior. The user must confirm that there are no relevant error messages in the SM log in order to ensure that the Enhanced QoS Manager is configured properly.

An empty file with only 'BW_NAMES' and 'BW_RULES' keywords configures the network with an unlimited rate limit.
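Such a minimal file would contain only:

BW_NAMES
BW_RULES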

Policy File Example

Below is an example of configuring all ports in the fabric with a rate limit of 50Mbps on SL1, except for GUID 0x2c90000000025, which is configured with a rate limit of 100Mbps on SL1. In this example, all SLs (other than SL1) are unlimited.


BW_NAMES
bw1 = 50
bw2 = 100

BW_RULES
default = 1:bw1
0x2c90000000025 = 1:bw2

QoS Configuration Examples

The following are examples of QoS configuration for different cluster deployments. Each example provides the QoS level assignment and their administration via OpenSM configuration files.

Typical HPC Example: MPI and Lustre

Assignment of QoS Levels

MPI

    Separate from I/O load

    Min BW of 70%

Storage Control (Lustre MDS)

    Low latency

Storage Data (Lustre OST)

    Min BW 30%

Administration

MPI is assigned an SL via the command line:

host1# mpirun -sl 0

OpenSM QoS policy file

 qos-ulps
	default										:0 # default SL (for MPI)
	any, target-port-guid OST1,OST2,OST3,OST4	:1 # SL for Lustre OST
	any, target-port-guid MDS1,MDS2				:2 # SL for Lustre MDS
 end-qos-ulps

Note: In this policy file example, replace OST* and MDS* with the real port GUIDs.

OpenSM options file

qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 2:1
qos_vlarb_low 0:96,1:224
qos_sl2vl 0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15

EDC SOA (2-tier): IPoIB and SRP

The following is an example of QoS configuration for a typical enterprise data center (EDC) with service oriented architecture (SOA), with IPoIB carrying all application traffic and SRP used for storage.

QoS Levels

Application traffic

    IPoIB (UD and CM) and SDP

    Isolated from storage

    Min BW of 50%

SRP

    Min BW 50%

    Bottleneck at storage nodes

Administration

OpenSM QoS policy file

 qos-ulps
	default									:0
	ipoib									:1
	sdp										:1
	srp, target-port-guid SRPT1,SRPT2,SRPT3		:2
 end-qos-ulps

Note: In this policy file example, replace SRPT* with the real SRP Target port GUIDs.

OpenSM options file

qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 1:32,2:32
qos_vlarb_low 0:1
qos_sl2vl 0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15

EDC (3-tier): IPoIB, RDS, SRP

The following is an example of QoS configuration for an enterprise data center (EDC), with IPoIB carrying all application traffic, RDS for database traffic, and SRP used for storage.

QoS Levels

Management traffic (ssh)

    IPoIB management VLAN (partition A)

    Min BW 10%

Application traffic

    IPoIB application VLAN (partition B)

    Isolated from storage and database

    Min BW of 30%

Database Cluster traffic

    RDS

    Min BW of 30%

SRP

    Min BW 30%

    Bottleneck at storage nodes

Administration

OpenSM QoS policy file

 qos-ulps 
	default 									:0 
	ipoib, pkey 0x8001							:1 
	ipoib, pkey 0x8002 							:2 
	rds 										:3 
	srp, target-port-guid SRPT1, SRPT2, SRPT3 	:4
 end-qos-ulps

Note: In this policy file example, replace SRPT* with the real SRP Target port GUIDs.

OpenSM options file

qos_max_vls 8
qos_high_limit 0
qos_vlarb_high 1:32,2:96,3:96,4:96
qos_vlarb_low 0:1
qos_sl2vl 0,1,2,3,4,5,6,7,15,15,15,15,15,15,15,15

Partition configuration file

Default=0x7fff, ipoib : ALL=full;
PartA=0x8001, sl=1, ipoib : ALL=full;

Adaptive Routing Manager and SHIELD

Adaptive Routing Manager supports advanced InfiniBand features: Adaptive Routing (AR) and Self-Healing Interconnect Enhancement for InteLligent Datacenters (SHIELD).

For information on how to set up AR and SHIELD, please refer to the HowTo Configure Adaptive Routing and SHIELD community post.

Congestion Control Manager

Congestion Control Manager works in conjunction with Congestion Control implemented on the switch. To verify whether your switch supports Congestion Control, refer to the switch's firmware release notes.

Congestion Control Manager is a Subnet Manager (SM) plug-in, i.e. a shared library (libccmgr.so) that is dynamically loaded by the Subnet Manager. It is installed as part of the Mellanox OFED installation.

The Congestion Control mechanism controls traffic entry into a network and attempts to avoid over-subscription of any of the processing or link capabilities of the intermediate nodes and networks. Additionally, it takes resource-reducing steps by reducing the rate of sending packets. Congestion Control Manager enables and configures the Congestion Control mechanism on fabric nodes (HCAs and switches).

Running OpenSM with Congestion Control Manager

Congestion Control (CC) Manager can be enabled/disabled through the SM options file. To do so, perform the following:

Create the options file. Run:

opensm -c <options-file-name>

Find the 'event_plugin_name' option in the file, and add 'ccmgr' to it.

# Event plugin name(s)
event_plugin_name ccmgr

Run the SM with the new options file: 'opensm -F <options-file-name>'

Warning

Once Congestion Control is enabled on the fabric nodes, to completely disable it you will need to actively turn it off. Running the SM without the CC Manager is not sufficient, as the hardware continues to function in accordance with the previous CC configuration.

For further information on how to turn off CC, please refer to the "Configuring Congestion Control Manager" section below.

Configuring Congestion Control Manager

Congestion Control (CC) Manager comes with a predefined set of settings. However, you can fine-tune the CC mechanism and CC Manager behavior by modifying some of the options. To do so, perform the following:

Find the 'event_plugin_options' option in the SM options file, and add the following:

# Options string that would be passed to the plugin(s)
event_plugin_options ccmgr --conf_file <cc-mgr-options-file-name>

Run the SM with the new options file: 'opensm -F <options-file-name>'.

Warning

To turn CC off, set 'enable' to 'FALSE' in the Congestion Control Manager configuration file, and run OpenSM once with this configuration.
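As a sketch, assuming the CC Manager options file uses the same space-separated key/value format as the OpenSM options file, turning CC off would look like:

enable FALSE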

For further details on the list of CC Manager options, please refer to the IB spec.

Configuring Congestion Control Manager Main Settings

To fine-tune the CC mechanism and CC Manager behavior, you can set the CC Manager main settings. To enable/disable the Congestion Control mechanism on the fabric nodes, set the following parameter:

Parameter: enable
Values: <TRUE|FALSE>
Default: TRUE

The CC Manager configures the CC mechanism behavior based on the fabric size: the larger the fabric, the more aggressive the CC mechanism is in its response to congestion. To manually modify the CC Manager behavior by providing it with an arbitrary fabric size, set the following parameter:

Parameter: num_hosts
Values: [0-48K]
Default: 0 (based on the CCT calculation on the current subnet size)

The smaller the value of this parameter, the faster the HCAs will respond to congestion and throttle the traffic. Note that if the number is too low, it will result in suboptimal bandwidth. To change the mean number of packets between marking eligible packets with a FECN, set the following parameter:

Parameter: marking_rate
Values: [0-0xffff]
Default: 0xa

You can set the minimal packet size that can be marked with a FECN; any packet smaller than this size [bytes] will not be marked with a FECN. To do so, set the following parameter:

Parameter: packet_size
Values: [0-0x3fc0]
Default: 0x200

When the number of send/receive errors or timeouts exceeds 'max_errors' in less than 'error_window' seconds, the CC Manager will abort and allow OpenSM to proceed. To control this behavior, set the following parameters:

Parameter: max_errors
Values: 0 - zero tolerance (abort configuration on first error)
Default: 5

Parameter: error_window
Values: 0 - mechanism disabled (no error checking)
Default: 5

Congestion Control Manager Options File

Option: enable
Description: Enables/disables the Congestion Control mechanism on the fabric nodes.
Values: <TRUE|FALSE>
Default Value: TRUE

Option: num_hosts
Description: Indicates the number of nodes. The CC table values are calculated based on this number.
Values: [0-48K]
Default Value: 0 (based on the CCT calculation on the current subnet size)

Option: threshold
Description: Indicates how aggressive the congestion marking should be.
Values: [0-0xf]; 0 - no packet marking, 0xf - very aggressive
Default Value: 0xf

Option: marking_rate
Description: The mean number of packets between marking eligible packets with a FECN.
Values: [0-0xffff]
Default Value: 0xa

Option: packet_size
Description: Any packet smaller than this size [bytes] will not be marked with a FECN.
Values: [0-0x3fc0]
Default Value: 0x200

Option: port_control
Description: Specifies the Congestion Control attribute for this port.
Values: 0 - QP based congestion control; 1 - SL/Port based congestion control
Default Value: 0

Option: ca_control_map
Description: An array of sixteen bits, one for each SL. Each bit indicates whether or not the corresponding SL entry is to be modified.
Default Value: 0xffff

Option: ccti_increase
Description: Sets the CC Table Index (CCTI) increase.
Default Value: 1

Option: trigger_threshold
Description: Sets the trigger threshold.
Default Value: 2

Option: ccti_min
Description: Sets the CC Table Index (CCTI) minimum.
Default Value: 0

Option: cct
Description: Sets all the CC table entries to a specified value. The first entry will remain 0, whereas the last value will be set to the rest of the table.
Default Value: 0 (when the value is set to 0, the CCT calculation is based on the number of nodes)

Option: ccti_timer
Description: Sets the given CCTI timer for all SLs.
Default Value: 0 (when the value is set to 0, the CCT calculation is based on the number of nodes)

Option: max_errors, error_window
Description: When the number of send/receive errors or timeouts exceeds 'max_errors' in less than 'error_window' seconds, the CC Manager will abort and allow OpenSM to proceed.
Values: max_errors = 0: zero tolerance - abort configuration on first error; error_window = 0: mechanism disabled - no error checking
Default Value: 5

DOS MAD Prevention

DOS MAD prevention is achieved by assigning a threshold to each agent's RX. The threshold provides a protection mechanism for host memory by limiting the agent's RX queue: incoming MADs above the threshold are dropped and are not queued to the agent's RX.


To enable DOS MAD Prevention:

Go to /etc/modprobe.d/mlnx.conf.

Add the following option to the file:

options ib_umad enable_rx_threshold=1

The threshold value can be controlled from user space via libibumad.

To change the value, use the following API:

int umad_update_threshold(int fd, int threshold);

@fd: file descriptor, agent's RX associated to this fd
@threshold: new threshold value
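The following is a minimal sketch of how this API might be used; the device name "mlx5_0", port number, and threshold value are illustrative, not prescribed by the library:

/* Sketch: update the RX threshold for a umad file descriptor. */
#include <stdio.h>
#include <infiniband/umad.h>

int main(void)
{
    /* Open a umad port; returns a file descriptor (portid) on success.
     * Device name and port number are illustrative. */
    int fd = umad_open_port("mlx5_0", 1);
    if (fd < 0) {
        perror("umad_open_port");
        return 1;
    }

    /* Set a new RX threshold for the agents associated with this fd
     * (256 is an arbitrary example value). */
    if (umad_update_threshold(fd, 256) < 0)
        fprintf(stderr, "umad_update_threshold failed\n");

    umad_close_port(fd);
    return 0;
}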

MAD Congestion Control

Warning

MAD Congestion Control is supported in both mlx4 and mlx5 drivers.

The SA Management Datagrams (MADs) are General Management Packets (GMPs) used to communicate with the SA entity within the InfiniBand subnet. The SA is normally part of the subnet manager, and it is contained within a single active instance. Therefore, congestion on the SA communication level may occur.

Congestion control is done by allowing only max_outstanding MADs, where an outstanding MAD is one that has no response yet. A FIFO queue holds the SA MADs whose sending is delayed due to max_outstanding overflow. The length of the queue is queue_size and is meant to limit the FIFO growth beyond the machine's memory capabilities. When the FIFO is full, SA MADs will be dropped, and the drops counter will increment accordingly. When time expires (time_sa_mad) for a MAD in the queue, it will be removed from the queue and the user will be notified of the item expiration.

This feature is implemented per CA port. The SA MAD congestion control values are configurable using the following sysfs entries:

/sys/class/infiniband/mlx5_0/mad_sa_cc/
├── 1
│   ├── drops
│   ├── max_outstanding
│   ├── queue_size
│   └── time_sa_mad
└── 2
    ├── drops
    ├── max_outstanding
    ├── queue_size
    └── time_sa_mad


To print the current value:

cat /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding
16

To change the current value:

echo 32 > /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding
cat /sys/class/infiniband/mlx5_0/mad_sa_cc/1/max_outstanding
32

To reset the drops counter:

echo 0 > /sys/class/infiniband/mlx5_0/mad_sa_cc/1/drops

Note: The path to the parameters is similar in the mlx4 driver:

/sys/class/infiniband/mlx4_0/mad_sa_cc/

Parameters' Valid Ranges

Parameter: max_outstanding
Range: MIN 1, MAX 2^20
Default: 16

Parameter: queue_size
Range: MIN 16, MAX 2^20
Default: 16

Parameter: time_sa_mad
Range: MIN 1 millisecond, MAX 10000
Default: 20 milliseconds

IB Router Support in OpenSM

In order to enable the IB router in OpenSM, the following parameters should be configured:

IB Router Parameters for OpenSM

Parameter: rtr_aguid_enable
Description: Defines whether the SM should create alias GUIDs required for router support for each port.
Default Value: 0 (Disabled)

Parameter: rtr_pr_flow_label
Description: Defines the flow label value to use in responses for path records related to the router.
Default Value: 0

Parameter: rtr_pr_tclass
Description: Defines the TClass value to use in responses for path records related to the router.
Default Value: 0

Parameter: rtr_pr_sl
Description: Defines the SL value to use in responses for path records related to the router.
Default Value: 0

Parameter: rtr_pr_mtu
Description: Defines the MTU value to use in responses for path records related to the router.
Default Value: 4 (IB_MTU_LEN_2048)

Parameter: rtr_pr_rate
Description: Defines the rate value to use in responses for path records related to the router.
Default Value: 16 (IB_PATH_RECORD_RATE_100_GBS)

OpenSM Activity Report

OpenSM can produce an activity report in the form of a dump file which details the different activities done in the SM. Activities are divided into subjects. The OpenSM Supported Activities table below specifies the different activities currently supported in the SM activity report. Reporting of each subject can be enabled individually using the configuration parameter activity_report_subjects:

Valid values: a comma-separated list of subjects to dump. The currently supported subjects are:

"mc" - activity IDs 1, 2 and 8

"prtn" - activity IDs 3, 4, and 5

"virt" - activity IDs 6 and 7

"routing" - activity IDs 8-12

Two predefined values can be configured as well:

    "all" - dump all subjects

    "none" - disable the feature by dumping none of the subjects

Default value: "none"

OpenSM Supported Activities

Activity ID: 1
Activity Name: mcm_member
Additional Fields: MLid, MGid, Port Guid, Join State
Comments: Join state: 1 - Join; -1 - Leave
Description: Member joined/left MC group

Activity ID: 2
Activity Name: mcg_change
Additional Fields: MLid, MGid, Change
Comments: Change: 0 - Create; 1 - Delete
Description: MC group created/deleted

Activity ID: 3
Activity Name: prtn_guid_add
Additional Fields: Port Guid, PKey, Block index, Pkey Index
Description: Guid added to partition

Activity ID: 4
Activity Name: prtn_create
Additional Fields: PKey, Prtn Name
Description: Partition created

Activity ID: 5
Activity Name: prtn_delete
Additional Fields: PKey, Delete Reason
Comments: Delete Reason: 0 - empty prtn; 1 - duplicate prtn; 2 - sm shutdown
Description: Partition deleted

Activity ID: 6
Activity Name: port_virt_discover
Additional Fields: Port Guid, Top Index
Description: Port virtualization discovered

Activity ID: 7
Activity Name: vport_state_change
Additional Fields: Port Guid, VPort Guid, VPort Index, VNode Guid, VPort State
Comments: VPort State: 1 - Down; 2 - Init; 3 - ARMED; 4 - Active
Description: VPort state changed

Activity ID: 8
Activity Name: mcg_tree_calc
Additional Fields: mlid
Description: MCast group tree calculated

Activity ID: 9
Activity Name: routing_succeed
Additional Fields: routing engine name
Description: Routing done successfully

Activity ID: 10
Activity Name: routing_failed
Additional Fields: routing engine name
Description: Routing failed

Activity ID: 11
Activity Name: ucast_cache_invalidated
Description: ucast cache invalidated

Activity ID: 12
Activity Name: ucast_cache_routing_done
Description: ucast cache routing done

Offsweep Balancing

When working with the minhop/dor/updn routing engines, the subnet manager can re-balance routing during idle time (between sweeps).

offsweep_balancing_enabled - enables/disables the feature. Examples:

    offsweep_balancing_enabled = TRUE

    offsweep_balancing_enabled = FALSE (default)

offsweep_balancing_window - defines the window, in seconds, to wait after a sweep before starting the re-balance process. Applicable only if offsweep_balancing_enabled=TRUE. Example:

offsweep_balancing_window = 180 (default)
