Chapter 2. Working with sysctl and kernel tunables
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/kernel_administration_guide/working_with_sysctl_and_kernel_tunables
2.1. What is a kernel tunable?
Kernel tunables are used to customise the behaviour of Red Hat Enterprise Linux at boot, or on demand while the system is running. Some hardware parameters are specified at boot time only and cannot be altered once the system is running. Most, however, can be altered as required and made permanent for the next boot.
2.2. How to work with kernel tunables
There are three ways to modify kernel tunables.
Using the sysctl command
By manually modifying configuration files in the /etc/sysctl.d/ directory
Through a shell, interacting with the virtual file system mounted at /proc/sys
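The third method relies on a direct mapping between a tunable's dotted name and a file under /proc/sys. The following is a minimal sketch of that mapping. The function names sysctl_key_to_path and read_tunable are hypothetical helpers, not part of any tool, and the plain dot-to-slash translation assumes that no single name component contains a dot (an interface named eth0.100 would need special handling).

```shell
# Hypothetical helper: translate a dotted tunable name into its /proc/sys path.
# Assumes no path component itself contains a dot.
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: print the current value of a tunable, if it is readable.
read_tunable() {
    path=$(sysctl_key_to_path "$1")
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "cannot read $path" >&2
        return 1
    fi
}

sysctl_key_to_path kernel.pid_max    # prints /proc/sys/kernel/pid_max
```

Writing works the same way for root, by redirecting a value into the file, and such a change lasts only until the next reboot.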
Note
Not all boot-time parameters are under the control of the sysfs subsystem. Some hardware-specific options must be set on the kernel command line. The Kernel Parameters section of this guide addresses those options.
2.2.1. Using the sysctl command
The sysctl command is used to list, read, and set kernel tunables. It can filter tunables when listing or reading and set tunables temporarily or permanently.
Listing variables
# sysctl -a
Reading variables
# sysctl kernel.version
kernel.version = #1 SMP Fri Jan 19 13:19:54 UTC 2018
Writing variables temporarily
# sysctl <tunable class>.<tunable>=<value>
Writing variables permanently
# sysctl -w <tunable class>.<tunable>=<value> >> /etc/sysctl.conf
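Appending with >> adds a new line each time the command runs, so repeated runs leave duplicate entries for the same tunable. One way to avoid that is sketched below. set_persistent is a hypothetical helper, and the demonstration writes to a scratch file rather than /etc/sysctl.conf. On a real system the target file requires root privileges, and the value still has to be loaded, for example with sysctl -p.

```shell
# Hypothetical helper: record "key = value" in a sysctl configuration file,
# replacing any earlier line for the same key instead of appending a duplicate.
set_persistent() {
    key=$1 value=$2 file=$3
    tmp=$(mktemp) || return 1
    # Keep every line whose key (text before "=", whitespace removed) differs.
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' \
            "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration against a scratch file (assumption: not the real /etc/sysctl.conf).
conf=$(mktemp)
set_persistent kernel.dmesg_restrict 0 "$conf"
set_persistent kernel.dmesg_restrict 1 "$conf"
cat "$conf"    # prints: kernel.dmesg_restrict = 1
```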
2.2.2. Modifying files in /etc/sysctl.d
To override a default at boot, you can also manually populate files in /etc/sysctl.d.
Create a new file in /etc/sysctl.d
# vim /etc/sysctl.d/99-custom.conf
Include the variables you wish to set, one per line, in the following form
<tunable class>.<tunable> = <value>
<tunable class>.<tunable> = <value>
Save the file
Either reboot the machine to make the changes take effect
or
Execute sysctl -p /etc/sysctl.d/99-custom.conf to apply the changes without rebooting
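After applying a file, it can be useful to confirm that the running values match it. The following is a rough sketch of such a check. check_conf is a hypothetical helper. It reads values directly from /proc/sys, or from the directory given as its second argument, which is an assumption added here so that the logic can be tried without touching the live system. Values are compared as plain strings, so tunables that hold several tab-separated values may be reported as mismatches.

```shell
# Hypothetical helper: compare each "key = value" line in a sysctl.d-style file
# with the value currently exposed under a /proc/sys-like directory tree.
check_conf() {
    file=$1 root=${2:-/proc/sys}
    status=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Trim surrounding whitespace from key and value.
        key=$(printf '%s' "$key" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip blank lines and comments (lines starting with # or ;).
        case $key in ''|'#'*|';'*) continue ;; esac
        path=$root/$(printf '%s' "$key" | tr '.' '/')
        current=$(cat "$path" 2>/dev/null) || { echo "MISSING $key"; status=1; continue; }
        if [ "$current" = "$value" ]; then
            echo "OK $key = $value"
        else
            echo "MISMATCH $key: file=$value running=$current"
            status=1
        fi
    done < "$file"
    return $status
}
```

Called as check_conf /etc/sysctl.d/99-custom.conf, the helper compares the file with the live system and returns a non-zero status if any entry differs or is missing.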
2.3. What tunables can be controlled?
Tunables are divided into groups by kernel subsystem. A Red Hat Enterprise Linux system has the following classes of tunables:
Table 2.1. Table of sysctl interfaces
Class   Subsystem
abi     Execution domains and personalities
crypto  Cryptographic interfaces
debug   Kernel debugging interfaces
dev     Device specific information
fs      Global and specific filesystem tunables
kernel  Global kernel tunables
net     Network tunables
sunrpc  Sun Remote Procedure Call (NFS)
user    User Namespace limits
vm      Tuning and management of memory, buffer, and cache
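Because the class is the first dotted component of a tunable's name, the output of sysctl -a can be grouped by class. The helper below is a sketch under that assumption. count_by_class is a hypothetical name, and it reads "name = value" lines on standard input.

```shell
# Hypothetical helper: count tunables per class (the text before the first dot).
count_by_class() {
    awk -F. 'NF > 1 { count[$1]++ } END { for (c in count) print c, count[c] }' | sort
}

# Typical use on a live system (output varies by kernel and loaded modules):
#   sysctl -a 2>/dev/null | count_by_class
```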
2.3.1. Network interface tunables
System administrators are able to adjust the network configuration on a running system through the networking tunables.
Networking tunables are included in the /proc/sys/net directory, which contains multiple subdirectories for various networking topics. To adjust the network configuration, system administrators need to modify the files within such subdirectories.
The most frequently used directories are:
/proc/sys/net/core/
/proc/sys/net/ipv4/
The /proc/sys/net/core/ directory contains a variety of settings that control the interaction between the kernel and networking layers. By adjusting some of those tunables, you can improve the performance of a system, for example by increasing the size of a receive queue, the maximum number of connections, or the memory dedicated to network interfaces. Note that which adjustments help depends on the individual system and the issue being addressed.
The /proc/sys/net/ipv4/ directory contains additional networking settings, which are useful when preventing attacks on the system or when using the system to act as a router. The directory contains both IP and TCP variables. For a detailed explanation of those variables, see /usr/share/doc/kernel-doc-/Documentation/networking/ip-sysctl.txt.
Other directories within the /proc/sys/net/ipv4/ directory cover different aspects of the network stack:
/proc/sys/net/ipv4/conf/ - allows you to configure each system interface in different ways, including the use of default settings for unconfigured devices and settings that override all special configurations
/proc/sys/net/ipv4/neigh/ - contains settings for communicating with a host directly connected to the system and also contains different settings for systems more than one step away
/proc/sys/net/ipv4/route/ - contains specifications that apply to routing with any interfaces on the system
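Each entry under conf/ is a directory named after an interface, alongside the special all and default entries. As a small sketch, the hypothetical helper below lists those entries. The optional argument naming an alternative directory is an assumption added here for experimentation.

```shell
# Hypothetical helper: list the per-interface directories under ipv4/conf/.
list_conf_entries() {
    dir=${1:-/proc/sys/net/ipv4/conf}
    for entry in "$dir"/*/; do
        [ -d "$entry" ] && basename "$entry"
    done
    return 0
}

# On a typical system this prints "all", "default", "lo", and one name per interface.
list_conf_entries
```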
The network tunables listed below are relevant to IPv4 interfaces and are accessible from the /proc/sys/net/ipv4/conf/{all,interface}/ directories.
Descriptions of the following parameters have been adopted from the kernel documentation sites.[1]
log_martians
Log packets with impossible addresses to kernel log.
Type     Default
Boolean  0
Enabled if one or more of conf/{all,interface}/log_martians is set to TRUE
Further Resources
What is the kernel parameter net.ipv4.conf.all.log_martians for?
Why do I see "martian source" logs in the messages file ?
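The rule for log_martians amounts to a logical OR of the conf/all value and the per-interface value. A minimal sketch follows. effective_or is a hypothetical helper name, and it takes the two values as arguments rather than reading them from the system.

```shell
# Hypothetical helper: effective value of an "enabled if either is set" tunable,
# given the conf/all value and the conf/<interface> value (each 0 or 1).
effective_or() {
    if [ "$1" -ne 0 ] || [ "$2" -ne 0 ]; then echo 1; else echo 0; fi
}

effective_or 1 0    # prints 1: logging is on for the interface because all is set
effective_or 0 0    # prints 0
```

The same combination rule is stated in this section for proxy_arp, shared_media, secure_redirects, send_redirects, and arp_filter.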
accept_redirects
Accept ICMP redirect messages.
Type     Default
Boolean  1
accept_redirects for the interface is enabled under the following conditions:
Both conf/{all,interface}/accept_redirects are TRUE (when forwarding for the interface is enabled)
At least one of conf/{all,interface}/accept_redirects is TRUE (forwarding for the interface is disabled)
For more information refer to How to enable or disable ICMP redirects
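The two conditions above can be expressed as a small decision function. This is only a sketch of the stated rule. effective_accept_redirects is a hypothetical helper that takes the forwarding state and the two accept_redirects values as arguments.

```shell
# Hypothetical helper: effective accept_redirects for an interface.
# Arguments: forwarding, conf/all value, conf/<interface> value (each 0 or 1).
effective_accept_redirects() {
    forwarding=$1 all=$2 iface=$3
    if [ "$forwarding" -ne 0 ]; then
        # Forwarding enabled: both values must be set.
        if [ "$all" -ne 0 ] && [ "$iface" -ne 0 ]; then echo 1; else echo 0; fi
    else
        # Forwarding disabled: either value is enough.
        if [ "$all" -ne 0 ] || [ "$iface" -ne 0 ]; then echo 1; else echo 0; fi
    fi
}

effective_accept_redirects 1 1 0    # prints 0
effective_accept_redirects 0 1 0    # prints 1
```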
forwarding
Enable IP forwarding on an interface.
Type     Default
Boolean  0
Further Resources
Turning on Packet Forwarding and Nonlocal Binding
mc_forwarding
Do multicast routing.
Type     Default
Boolean  0
Read only value
A multicast routing daemon is required.
conf/all/mc_forwarding must also be set to TRUE to enable multicast routing for the interface
Further Resources
For an explanation of the read only behaviour, see Why system reports "permission denied on key" while setting the kernel parameter "net.ipv4.conf.all.mc_forwarding"?
medium_id
Arbitrary value used to differentiate the devices by the medium they are attached to.
Type     Default
Integer  0
Notes
Two devices on the same medium can have different id values when the broadcast packets are received only on one of them.
The default value 0 means that the device is the only interface to its medium.
A value of -1 means that the medium is not known.
Currently, it is used to change the proxy_arp behaviour: the proxy_arp feature is enabled for packets forwarded between two devices attached to different media.
Further Resources - For examples, see Using the "medium_id" feature in Linux 2.2 and 2.4
proxy_arp
Do proxy arp.
Type     Default
Boolean  0
proxy_arp for the interface is enabled if at least one of conf/{all,interface}/proxy_arp is set to TRUE, otherwise it is disabled
proxy_arp_pvlan
Private VLAN proxy arp.
Type     Default
Boolean  0
Allow proxy arp replies back to the same interface, to support features like RFC 3069
shared_media
Send (router) or accept (host) RFC 1620 shared media redirects.
Type     Default
Boolean  1
Notes
Overrides secure_redirects.
shared_media for the interface is enabled if at least one of conf/{all,interface}/shared_media is set to TRUE
secure_redirects
Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list.
Type     Default
Boolean  1
Notes
Even if disabled, RFC1122 redirect rules still apply.
Overridden by shared_media.
secure_redirects for the interface is enabled if at least one of conf/{all,interface}/secure_redirects is set to TRUE
send_redirects
Send redirects, if router.
Type     Default
Boolean  1
Notes
send_redirects for the interface is enabled if at least one of conf/{all,interface}/send_redirects is set to TRUE
bootp_relay
Accept packets with source address 0.b.c.d destined not to this host as local ones.
Type     Default
Boolean  0
Notes
A BOOTP daemon must be enabled to manage these packets
conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay for the interface
Not implemented, see DHCP Relay Agent in the Red Hat Enterprise Linux Networking Guide
accept_source_route
Accept packets with SRR option.
Type     Default
Boolean  1
Notes
conf/all/accept_source_route must also be set to TRUE to accept packets with SRR option on the interface
accept_local
Accept packets with local source addresses.
Type     Default
Boolean  0
Notes
In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly.
rp_filter must be set to a non-zero value in order for accept_local to have an effect.
route_localnet
Do not consider loopback addresses as martian source or destination while routing.
Type     Default
Boolean  0
Notes
This enables the use of 127/8 for local routing purposes.
rp_filter
Enable source validation.
Type     Default
Integer  0
Value  Effect
0      No source validation.
1      Strict mode as defined in RFC3704 Strict Reverse Path
2      Loose mode as defined in RFC3704 Loose Reverse Path
Notes
Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDoS attacks.
If using asymmetric routing or other complicated routing, then loose mode is recommended.
The highest value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}
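The "highest value wins" rule for rp_filter can be sketched as a simple maximum. effective_rp_filter is a hypothetical helper that takes the conf/all value and the per-interface value as arguments.

```shell
# Hypothetical helper: effective rp_filter mode is the higher of the two values.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

effective_rp_filter 1 2    # prints 2 (loose mode)
effective_rp_filter 1 0    # prints 1 (strict mode)
```

Under this rule, an interface left at 0 still gets strict mode when conf/all is 1, while an interface set to 2 stays in loose mode.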
arp_filter
Type     Default
Boolean  0
Value  Effect
0      (default) The kernel can respond to arp requests with addresses from other interfaces. It usually makes sense, because it increases the chance of successful communication.
1      Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of cards (usually 1) that respond to an arp request.
Note
IP addresses are owned by the complete host on Linux, not by particular interfaces. This behaviour causes problems only for more complex setups, such as load balancing.
arp_filter for the interface is enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE
arp_announce
Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on interface
Type     Default
Integer  0
Value  Effect
0      (default) Use any local address, configured on any interface
1      Try to avoid local addresses that are not in the target’s subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When we generate the request we check all our subnets that include the target IP and preserve the source address if it is from such subnet. If there is no such subnet we select source address according to the rules for level 2.
2      Always use the best local address for this target. In this mode we ignore the source address in the IP packet and try to select local address that we prefer for talks with the target host. Such local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address. If no suitable local address is found we select the first local address we have on the outgoing interface or on all other interfaces, with the hope we receive reply for our request and even sometimes no matter the source IP address we announce.
Notes
The highest value from conf/{all,interface}/arp_announce is used.
Increasing the restriction level gives a better chance of receiving an answer from the resolved target, while decreasing the level announces more valid sender information.
arp_ignore
Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses
Type     Default
Integer  0
Value  Effect
0      (default) Reply for any local target IP address, configured on any interface.
1      Reply only if the target IP address is a local address configured on the incoming interface.
2      Reply only if the target IP address is a local address configured on the incoming interface and both it and the sender’s IP address are part of the same subnet on this interface.
3      Do not reply for local addresses configured with scope host; only resolutions for global and link addresses are replied to.
4-7    Reserved.
8      Do not reply for any local addresses.
Notes
The maximum value from conf/{all,interface}/arp_ignore is used when an ARP request is received on the {interface}.
arp_notify
Define mode for notification of address and device changes.
Type     Default
Boolean  0
Value  Effect
0      do nothing
1      Generate gratuitous arp requests when device is brought up or hardware address changes.
arp_accept
Define behaviour for gratuitous ARP frames whose IP is not already present in the ARP table
Type     Default
Boolean  0
Value  Effect
0      do not create new entries in the ARP table
1      create new entries in the ARP table.
Notes
Both reply-type and request-type gratuitous ARP frames trigger an update of the ARP table if this setting is on. If the ARP table already contains the IP address of the gratuitous ARP frame, the ARP table is updated regardless of whether this setting is on or off.
app_solicit
The maximum number of probes to send to the user space ARP daemon via netlink before dropping back to multicast probes (see mcast_solicit).
Type     Default
Integer  0
Notes
See mcast_solicit
disable_policy
Disable IPSEC policy (SPD) for this interface
Type     Default
Boolean  0
disable_xfrm
Disable IPSEC encryption on this interface, whatever the policy
Type     Default
Boolean  0
igmpv2_unsolicited_report_interval
The interval in milliseconds in which the next unsolicited IGMPv1 or IGMPv2 report retransmit takes place.
Type     Default
Integer  10000
Notes
Milliseconds
igmpv3_unsolicited_report_interval
The interval in milliseconds in which the next unsolicited IGMPv3 report retransmit takes place.
Type     Default
Integer  1000
Notes
Milliseconds
tag
Allows you to write a number, which can be used as required.
Type     Default
Integer  0
xfrm4_gc_thresh
The threshold at which we start garbage collecting for IPv4 destination cache entries.
Type     Default
Integer  1
Notes
At twice this value the system refuses new allocations.
2.3.2. Global kernel tunables
System administrators are able to configure and monitor general settings on a running system through the global kernel tunables.
Global kernel tunables are included in the /proc/sys/kernel/ directory either directly as named control files or grouped in further subdirectories for various configuration topics. To adjust the global kernel tunables, system administrators need to modify the control files.
Descriptions of the following parameters have been adopted from the kernel documentation sites.[2]
dmesg_restrict
Indicates whether unprivileged users are prevented from using the dmesg command to view messages from the kernel’s log buffer.
For further information, see Kernel sysctl documentation.
core_pattern
Specifies a core dumpfile pattern name.
Max length      Default
128 characters  "core"
For further information, see Kernel sysctl documentation.
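Because the pattern has a maximum length, a candidate value can be checked before it is written. The sketch below assumes the 128-character limit stated above. core_pattern_fits is a hypothetical helper, and the pattern shown is only an illustrative value that uses the kernel's %e (executable name) and %p (PID) specifiers.

```shell
# Hypothetical helper: succeed if a candidate core_pattern fits the
# 128-character limit given in this section.
core_pattern_fits() {
    [ "${#1}" -le 128 ]
}

core_pattern_fits '/var/crash/core.%e.%p' && echo "fits"    # prints fits
```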
hardlockup_panic
Controls the kernel panic when a hard lockup is detected.
Type     Value  Effect
Integer  0      kernel does not panic on hard lockup
Integer  1      kernel panics on hard lockup
In order to panic, the system needs to detect a hard lockup first. The detection is controlled by the nmi_watchdog parameter.
Further Resources
Kernel sysctl documentation
Softlockup detector and hardlockup detector
softlockup_panic
Controls the kernel panic when a soft lockup is detected.
Type     Value  Effect
Integer  0      kernel does not panic on soft lockup
Integer  1      kernel panics on soft lockup
By default, on RHEL7 this value is 0.
For more information about softlockup_panic, see kernel_parameters.
kptr_restrict
Indicates whether restrictions are placed on exposing kernel addresses via /proc and other interfaces.
Type     Default
Integer  0
Value  Effect
0      hashes the kernel address before printing
1      replaces printed kernel pointers with 0’s under certain conditions
2      replaces printed kernel pointers with 0’s unconditionally
To learn more, see Kernel sysctl documentation.
nmi_watchdog
Controls the hard lockup detector on x86 systems.
Type     Default
Integer  0
Value  Effect
0      disables the lockup detector
1      enables the lockup detector
The hard lockup detector monitors each CPU for its ability to respond to interrupts.
For more details, see Kernel sysctl documentation.
watchdog_thresh
Controls frequency of watchdog hrtimer, NMI events, and soft/hard lockup thresholds.
Default threshold  Soft lockup threshold
10 seconds         2 * watchdog_thresh
Setting this tunable to zero disables lockup detection altogether.
For more info, consult Kernel sysctl documentation.
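The relationship between the two thresholds is simple arithmetic, sketched here with a hypothetical helper named soft_lockup_threshold that takes a watchdog_thresh value in seconds.

```shell
# Hypothetical helper: soft lockup threshold in seconds for a given watchdog_thresh.
soft_lockup_threshold() {
    echo $(( 2 * $1 ))
}

soft_lockup_threshold 10    # prints 20 (for the default watchdog_thresh of 10 seconds)
```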
panic, panic_on_oops, panic_on_stackoverflow, panic_on_unrecovered_nmi, panic_on_warn, panic_on_rcu_stall, hung_task_panic
These tunables specify under what circumstances the kernel should panic.
To see more details about a group of panic parameters, see Kernel sysctl documentation.
printk, printk_delay, printk_ratelimit, printk_ratelimit_burst, printk_devkmsg
These tunables control logging or printing of kernel error messages.
For more details about a group of printk parameters, see Kernel sysctl documentation.
shmall, shmmax, shm_rmid_forced
These tunables control limits for shared memory.
For more information about a group of shm parameters, see Kernel sysctl documentation.
threads-max
Controls the maximum number of threads created by the fork() system call.
Min value  Max value
20         Given by FUTEX_TID_MASK (0x3fffffff)
The threads-max value is checked against the available RAM pages. If the thread structures occupy too much of the available RAM pages, threads-max is reduced accordingly.
For more details, see Kernel sysctl documentation.
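A proposed value can be checked against the documented bounds before it is written. This is a sketch only. threads_max_in_range is a hypothetical helper, and it does not model the additional reduction based on available RAM pages described above.

```shell
# Hypothetical helper: succeed if a proposed threads-max value lies within the
# documented bounds (minimum 20, maximum FUTEX_TID_MASK = 0x3fffffff).
threads_max_in_range() {
    [ "$1" -ge 20 ] && [ "$1" -le $(( 0x3fffffff )) ]
}

threads_max_in_range 30000 && echo "in range"    # prints in range
threads_max_in_range 10 || echo "too small"      # prints too small
```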
pid_max
PID allocation wrap value.
To see more information, refer to Kernel sysctl documentation.
numa_balancing
This parameter enables or disables automatic NUMA memory balancing. On NUMA machines, there is a performance penalty if remote memory is accessed by a CPU.
For more details, see Kernel sysctl documentation.
numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
These tunables detect if pages are properly placed or if the data should be migrated to a memory node local to where the task is running.
For more details about a group of numa_balancing_scan parameters, see Kernel sysctl documentation.
[1] https://www.kernel.org/doc/Documentation/
[2] https://www.kernel.org/doc/Documentation/