ip(7)
NAME
ip - Linux IPv4 protocol implementation
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
raw_socket = socket(PF_INET, SOCK_RAW, protocol);
udp_socket = socket(PF_INET, SOCK_DGRAM, protocol);
DESCRIPTION
Linux implements the Internet Protocol, version 4,
described in RFC791 and RFC1122. ip contains a level 2
multicasting implementation conforming to RFC1112. It
also contains an IP router including a packet filter.
The programmer's interface is BSD sockets compatible. For
more information on sockets, see socket(7).
An IP socket is created by calling the socket(2) function
as socket(PF_INET, socket_type, protocol). Valid socket
types are SOCK_STREAM to open a tcp(7) socket, SOCK_DGRAM
to open a udp(7) socket, or SOCK_RAW to open a raw(7)
socket to access the IP protocol directly. protocol is
the IP protocol in the IP header to be received or sent.
The only valid values for protocol are 0 and IPPROTO_TCP
for TCP sockets and 0 and IPPROTO_UDP for UDP sockets.
For SOCK_RAW you may specify a valid IANA IP protocol
defined in RFC1700 assigned numbers.
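The socket creation rules above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the helper name open_ip_socket is hypothetical and simply wraps socket(2).

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open an IPv4 socket of the given type.
 * Returns the descriptor, or -1 with errno set by socket(2). */
int open_ip_socket(int socket_type, int protocol)
{
    return socket(PF_INET, socket_type, protocol);
}
```

For example, open_ip_socket(SOCK_STREAM, 0) and open_ip_socket(SOCK_STREAM, IPPROTO_TCP) both request a TCP socket, while a mismatched pair such as SOCK_DGRAM with IPPROTO_TCP is expected to be rejected.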
When a process wants to receive new incoming packets or
connections, it should bind a socket to a local interface
address using bind(2). Only one IP socket may be bound to
any given local (address, port) pair. When INADDR_ANY is
specified in the bind call the socket will be bound to all
local interfaces. When listen(2) or connect(2) is called
on an unbound socket, the socket is automatically bound to a
random free port with the local address set to INADDR_ANY.
A bound TCP local socket address is unavailable
for some time after closing, unless the SO_REUSEADDR
flag has been set. Care should be taken when using this
flag as it makes TCP less reliable.
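A minimal sketch of binding a TCP socket to all local interfaces with SO_REUSEADDR set might look like this. The helper names bind_tcp_any and local_port are hypothetical and exist only for illustration.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP socket, set SO_REUSEADDR, and bind it
 * to INADDR_ANY on the given port (host byte order; 0 lets the kernel
 * pick a free port). Returns the descriptor, or -1 on error. */
int bind_tcp_any(uint16_t port)
{
    int on = 1;
    struct sockaddr_in sa;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the locally bound port in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

A server would typically follow bind_tcp_any() with listen(2); local_port() shows which port the kernel chose when 0 was requested.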
ADDRESS FORMAT
An IP socket address is defined as a combination of an IP
interface address and a port number. The basic IP protocol
does not supply port numbers; they are implemented by
higher level protocols like udp(7) and tcp(7). On raw
sockets sin_port is set to the IP protocol.
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
u_int16_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address. */
struct in_addr {
u_int32_t s_addr; /* address in network byte order */
};
sin_family is always set to AF_INET. This is required; in
Linux 2.2 most networking functions return EINVAL when
this setting is missing. sin_port contains the port in
network byte order. The port numbers below 1024 are called
reserved ports. Only processes with effective user id 0
or the CAP_NET_BIND_SERVICE capability may bind(2) to
these ports. Note that the raw IPv4 protocol as such has
no concept of a port; ports are only implemented by higher
protocols like tcp(7) and udp(7).
sin_addr is the IP host address. The s_addr member of
struct in_addr contains the host interface address in network
order. in_addr should only be accessed using the
inet_aton(3), inet_addr(3), inet_makeaddr(3) library functions
or directly with the name resolver (see gethostbyname(3)).
IPv4 addresses are divided into unicast, broadcast
and multicast addresses. Unicast addresses specify a
single interface of a host, broadcast addresses specify
all hosts on a network and multicast addresses address all
hosts in a multicast group. Datagrams to broadcast
addresses can only be sent or received when the SO_BROADCAST
socket flag is set. In the current implementation
connection oriented sockets are only allowed to use unicast
addresses.
Note that the address and the port are always stored in
network order. In particular, this means that you need to
call htons(3) on the number that is assigned to a port.
All address/port manipulation functions in the standard
library work in network order.
There are several special addresses: INADDR_LOOPBACK
(127.0.0.1) always refers to the local host via the loopback
device; INADDR_ANY (0.0.0.0) means any address for
binding; INADDR_BROADCAST (255.255.255.255) means any host
and has the same effect on bind as INADDR_ANY for historical
reasons.
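As an illustration of the byte order rules above, the following sketch fills a sockaddr_in from a dotted-quad string and a port given in host byte order. The helper name make_sockaddr is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build an AF_INET socket address.
 * The port is passed in host byte order and converted with htons(3);
 * inet_aton(3) stores the address in network byte order.
 * Returns 0 on success, -1 if the address string is invalid. */
int make_sockaddr(const char *dotted, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    if (inet_aton(dotted, &out->sin_addr) == 0)
        return -1;
    return 0;
}
```

Constants such as INADDR_LOOPBACK and INADDR_ANY are defined in host byte order, so they need htonl(3) before being stored in s_addr.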
SOCKET OPTIONS
IP supports some protocol specific socket options that can
be set with setsockopt(2) and read with getsockopt(2).
The socket option level for IP is SOL_IP.
IP_OPTIONS
Sets or gets the IP options to be sent with every
packet from this socket. The arguments are a
pointer to a memory buffer containing the options
and the option length. The setsockopt(2) call sets
the IP options associated with a socket. The maximum
option size for IPv4 is 40 bytes. See RFC791
for the allowed options. When the initial connection
request packet for a SOCK_STREAM socket contains
IP options, the IP options will be set automatically
to the options from the initial packet
with routing headers reversed. Incoming packets
are not allowed to change options after the connection
is established. The processing of all incoming
source routing options is disabled by default
and can be enabled by using the accept_source_route
sysctl. Other options like timestamps are still
handled. For datagram sockets, IP options can
only be set by the local user. Calling getsockopt(2)
with IP_OPTIONS puts the current IP options used
for sending into the supplied buffer.
IP_PKTINFO
Pass an IP_PKTINFO ancillary message that contains
a pktinfo structure that supplies some information
about the incoming packet. This only works for
datagram oriented sockets.
struct in_pktinfo
{
unsigned int ipi_ifindex; /* Interface index */
struct in_addr ipi_spec_dst; /* Routing destination address */
struct in_addr ipi_addr; /* Header Destination address */
};
ipi_ifindex is the unique index of the interface
the packet was received on. ipi_spec_dst is the
destination address of the routing table entry and
ipi_addr is the destination address in the packet
header. If IP_PKTINFO is passed to sendmsg(2) then
the outgoing packet will be sent over the interface
specified in ipi_ifindex with the destination
address set to ipi_spec_dst.
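The receive side of IP_PKTINFO can be sketched as below. The example sends one datagram to itself over loopback and copies the in_pktinfo structure out of the ancillary data. The helper name loopback_pktinfo, the two second receive timeout and the use of loopback are assumptions made for illustration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes struct in_pktinfo and SOL_IP */
#endif
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and fills *info with the IP_PKTINFO
 * data of a datagram sent to ourselves, or -1 on any failure. */
int loopback_pktinfo(struct in_pktinfo *info)
{
    int rc = -1, on = 1;
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 }; /* never block forever */
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    char data[16];
    union {
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align; /* forces suitable alignment */
    } ctrl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    struct cmsghdr *c;
    int fd = socket(PF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0); /* let the kernel pick a free port */

    if (setsockopt(fd, SOL_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0 ||
        sendto(fd, "x", 1, 0, (struct sockaddr *)&sa, sizeof(sa)) != 1)
        goto out;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(fd, &msg, 0) < 0)
        goto out;

    /* Walk the ancillary messages and pick out the IP_PKTINFO one. */
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(c), sizeof(*info));
            rc = 0;
        }
    }
out:
    close(fd);
    return rc;
}
```

In a real server the same control message loop would run on every recvmsg(2) call to learn which local address and interface each datagram arrived on.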
IP_RECVTOS
If enabled the IP_TOS ancillary message is passed
with incoming packets. It contains a byte which
specifies the Type of Service/Precedence field of
the packet header. Expects a boolean integer flag.
IP_RECVTTL
When this flag is set, pass an IP_TTL control
message with the time to live field of the received
packet as a byte. Not supported for SOCK_STREAM
sockets.
IP_RECVOPTS
Pass all incoming IP options to the user in an
IP_OPTIONS control message. The routing header and
other options are already filled in for the local
host. Not supported for SOCK_STREAM sockets.
IP_RETOPTS
Identical to IP_RECVOPTS but returns raw unprocessed
options with timestamp and route record
options not filled in for this hop.
IP_TOS Set or receive the Type-Of-Service (TOS) field that
is sent with every IP packet originating from this
socket. It is used to prioritize packets on the
network. TOS is a byte. There are some standard
TOS flags defined: IPTOS_LOWDELAY to minimize
delays for interactive traffic, IPTOS_THROUGHPUT to
optimize throughput, IPTOS_RELIABILITY to optimize
for reliability, IPTOS_MINCOST should be used for
"filler data" where slow transmission doesn't matter.
At most one of these TOS values can be specified.
Other bits are invalid and shall be cleared.
Linux sends IPTOS_LOWDELAY datagrams first by
default, but the exact behaviour depends on the
configured queueing discipline. Some high priority
levels may require an effective user id of 0 or the
CAP_NET_ADMIN capability. The priority can also be
set in a protocol independent way by the
(SOL_SOCKET, SO_PRIORITY) socket option (see
socket(7)).
IP_TTL Set or retrieve the current time to live field that
is sent in every packet sent from this socket.
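A minimal sketch of setting and reading back IP_TOS and IP_TTL follows. The helper names set_tos_ttl and get_ip_int_option are hypothetical; the IPTOS_* constants are assumed to come from <netinet/ip.h>.

```c
#include <netinet/in.h>
#include <netinet/ip.h> /* IPTOS_LOWDELAY and friends */
#include <sys/socket.h>

/* Hypothetical helper: set the TOS byte and the time to live used for
 * packets sent from fd. Returns 0 on success, -1 on error. */
int set_tos_ttl(int fd, int tos, int ttl)
{
    if (setsockopt(fd, SOL_IP, IP_TOS, &tos, sizeof(tos)) < 0)
        return -1;
    if (setsockopt(fd, SOL_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read an integer valued SOL_IP option.
 * Returns the value, or -1 on error. */
int get_ip_int_option(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_IP, optname, &value, &len) < 0)
        return -1;
    return value;
}
```

An interactive application might call set_tos_ttl(fd, IPTOS_LOWDELAY, 32) right after creating its datagram socket.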
IP_HDRINCL
If enabled, the user supplies an IP header in front
of the user data. Only valid for SOCK_RAW sockets.
See raw(7) for more information. When this flag is
enabled the values set by IP_OPTIONS, IP_TTL and
IP_TOS are ignored.
IP_RECVERR
Enable extended reliable error message passing.
When enabled on a datagram socket all generated
errors will be queued in a per-socket error queue.
When the user receives an error from a socket operation
the errors can be received by calling
recvmsg(2) with the MSG_ERRQUEUE flag set. The
sock_extended_err structure describing the error
will be passed in an ancillary message with the type
IP_RECVERR and the level SOL_IP. This is useful
for reliable error handling on unconnected sockets.
The received data portion of the error queue contains
the error packet.
IP uses the sock_extended_err structure as follows:
ee_origin is set to SO_EE_ORIGIN_ICMP for errors
received as an ICMP packet, or SO_EE_ORIGIN_LOCAL
for locally generated errors. ee_type and ee_code
are set from the type and code fields of the ICMP
header. ee_info contains the discovered MTU for
EMSGSIZE errors. ee_data is currently not used.
When the error originated from the network, all IP
options (IP_OPTIONS, IP_TTL, etc.) enabled on the
socket and contained in the error packet are passed
as control messages. The payload of the packet
causing the error is returned as normal data.
On SOCK_STREAM sockets, IP_RECVERR has slightly
different semantics. Instead of saving the errors
for the next timeout, it passes all incoming errors
immediately to the user. This might be useful for
very short-lived TCP connections which need fast
error handling. Use this option with care: it makes
TCP unreliable by not allowing it to recover properly
from routing shifts and other normal conditions
and breaks the protocol specification. Note
that TCP has no error queue; MSG_ERRQUEUE is illegal
on SOCK_STREAM sockets. Thus all errors are
returned by socket function return or SO_ERROR
only.
For raw sockets, IP_RECVERR enables passing of all
received ICMP errors to the application; otherwise
errors are only reported on connected sockets.
It sets or retrieves an integer boolean flag.
IP_RECVERR defaults to off.
IP_MTU_DISCOVER
Sets or receives the Path MTU Discovery setting for
a socket. When enabled, Linux will perform Path MTU
Discovery as defined in RFC1191 on this socket. The
don't fragment flag is set on all outgoing
datagrams. The system-wide default is controlled
by the ip_no_pmtu_disc sysctl for SOCK_STREAM sockets,
and disabled on all others. For non-SOCK_STREAM
sockets it is the user's responsibility
to packetize the data in MTU sized chunks and to do
the retransmits if necessary. The kernel will
reject packets that are bigger than the known path
MTU if this flag is set (with EMSGSIZE).
Path MTU discovery flags    Meaning
IP_PMTUDISC_WANT            Use per-route settings.
IP_PMTUDISC_DONT            Never do Path MTU Discovery.
IP_PMTUDISC_DO              Always do Path MTU Discovery.
When PMTU discovery is enabled the kernel automatically
keeps track of the path MTU per destination
host. When the socket is connected to a specific peer with
connect(2) the currently known path MTU can be
retrieved conveniently using the IP_MTU socket
option (e.g. after an EMSGSIZE error occurred). It
may change over time. For connectionless sockets
with many destinations the new MTU for a given
destination can also be accessed using the error
queue (see IP_RECVERR). A new error will be queued
for every incoming MTU update.
While MTU discovery is in progress initial packets
from datagram sockets may be dropped. Applications
using UDP should be aware of this and not take it
into account for their packet retransmit strategy.
To bootstrap the path MTU discovery process on
unconnected sockets it is possible to start with a
big datagram size (up to 64K-headers bytes long)
and let it shrink by updates of the path MTU.
To get an initial estimate of the path MTU connect
a datagram socket to the destination address using
connect(2) and retrieve the MTU by calling getsockopt(2)
with the IP_MTU option.
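The initial estimate described above can be sketched as follows. The helper name path_mtu_estimate is hypothetical. Note that the constant in the C headers is spelled IP_MTU_DISCOVER.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a datagram socket to the destination and
 * ask the kernel for the currently known path MTU.
 * Returns the MTU in bytes, or -1 on error. */
int path_mtu_estimate(const char *dotted, uint16_t port)
{
    int mtu = -1;
    int mode = IP_PMTUDISC_DO; /* always do Path MTU Discovery */
    socklen_t len = sizeof(mtu);
    struct sockaddr_in sa;
    int fd = socket(PF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);

    if (inet_aton(dotted, &sa.sin_addr) == 0 ||
        setsockopt(fd, SOL_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0 ||
        connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockopt(fd, SOL_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```

No packet is sent by this sketch; connecting a datagram socket only selects a route, so the value is the kernel's current knowledge and may shrink once real traffic triggers MTU updates.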
IP_MTU Retrieve the current known path MTU of the current
socket. Only valid when the socket has been connected.
Returns an integer. Only valid as a getsockopt(2).
IP_ROUTER_ALERT
Pass all to-be forwarded packets with the IP Router
Alert option set to this socket. Only valid for raw
sockets. This is useful, for instance, for user
space RSVP daemons. The tapped packets are not forwarded
by the kernel; it is the user's
responsibility to send them out again. Socket binding
is ignored; such packets are only filtered by
protocol. Expects an integer flag.
IP_MULTICAST_TTL
Sets or reads the time-to-live value of outgoing
multicast packets for this socket. It is very
important for multicast packets to set the smallest
TTL possible. The default is 1 which means that
multicast packets don't leave the local network
unless the user program explicitly requests it.
Argument is an integer.
IP_MULTICAST_LOOP
Sets or reads a boolean integer argument whether
sent multicast packets should be looped back to the
local sockets.
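Setting the multicast TTL and loopback behaviour might look like the sketch below. The helper names set_multicast_scope and get_multicast_ttl are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL of outgoing multicast packets and
 * whether they are looped back to local sockets (loop is 0 or 1).
 * Returns 0 on success, -1 on error. */
int set_multicast_scope(int fd, int ttl, int loop)
{
    if (setsockopt(fd, SOL_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    if (setsockopt(fd, SOL_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read the current multicast TTL, or -1 on error. */
int get_multicast_ttl(int fd)
{
    int ttl = 0;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, SOL_IP, IP_MULTICAST_TTL, &ttl, &len) < 0)
        return -1;
    return ttl;
}
```

On a fresh socket get_multicast_ttl() is expected to report the default of 1 described above.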
IP_ADD_MEMBERSHIP
Join a multicast group. Argument is a struct
ip_mreqn structure.
struct ip_mreqn
{
struct in_addr imr_multiaddr; /* IP multicast group address */
struct in_addr imr_address; /* IP address of local interface */
int imr_ifindex; /* interface index */
};
imr_multiaddr contains the address of the multicast
group the application wants to join or leave. It
must be a valid multicast address. imr_address is
the address of the local interface with which the
system should join the multicast group; if it is
equal to INADDR_ANY an appropriate interface is
chosen by the system. imr_ifindex is the interface
index of the interface that should join/leave the
imr_multiaddr group, or 0 to indicate any interface.
For compatibility, the old ip_mreq structure is
still supported. It differs from ip_mreqn only by
not including the imr_ifindex field. Only valid as
a setsockopt(2).
IP_DROP_MEMBERSHIP
Leave a multicast group. Argument is an ip_mreqn or
ip_mreq structure similar to IP_ADD_MEMBERSHIP.
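Joining and leaving a group with struct ip_mreqn can be sketched as follows. The helper names fill_mreqn, join_group and leave_group are hypothetical, and letting the system choose the interface (INADDR_ANY, index 0) is an assumption that may not suit multi-homed hosts.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: exposes struct ip_mreqn and SOL_IP */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an ip_mreqn for the given group.
 * ifindex 0 means any interface. Returns 0 on success, -1 if the string
 * is not a valid multicast address. */
int fill_mreqn(const char *group, int ifindex, struct ip_mreqn *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    if (inet_aton(group, &mreq->imr_multiaddr) == 0)
        return -1;
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return -1;
    mreq->imr_address.s_addr = htonl(INADDR_ANY); /* system picks interface */
    mreq->imr_ifindex = ifindex;
    return 0;
}

/* Join the group on fd. Returns 0 on success, -1 on error. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    if (fill_mreqn(group, ifindex, &mreq) < 0)
        return -1;
    return setsockopt(fd, SOL_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
}

/* Leave the group on fd. Returns 0 on success, -1 on error. */
int leave_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    if (fill_mreqn(group, ifindex, &mreq) < 0)
        return -1;
    return setsockopt(fd, SOL_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
}
```

Whether join_group() succeeds depends on the host having a suitable interface and route, so callers should check the return value and errno.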
IP_MULTICAST_IF
Set the local device for a multicast socket. Argument
is an ip_mreqn or ip_mreq structure similar to
IP_ADD_MEMBERSHIP.
When an invalid socket option is passed, ENOPROTOOPT
is returned.
SYSCTLS
The IP protocol supports the sysctl interface to configure
some global options. The sysctls can be accessed by reading
or writing the /proc/sys/net/ipv4/* files or using the
sysctl(2) interface.
ip_default_ttl
Set the default time-to-live value of outgoing
packets. This can be changed per socket with the
IP_TTL option.
ip_forward
Enable IP forwarding with a boolean flag. IP forwarding
can also be set on a per interface basis.
ip_dynaddr
Enable dynamic socket address and masquerading
entry rewriting on interface address change. This
is useful for dialup interfaces with changing IP
addresses. 0 means no rewriting, 1 turns it on and
2 enables verbose mode.
ip_autoconfig
Not documented.
ip_local_port_range
Contains two integers that define the default local
port range allocated to sockets. Allocation starts
with the first number and ends with the second number.
Note that these should not conflict with the
ports used by masquerading (although the case is
handled). Also, arbitrary choices may cause problems
with some firewall packet filters that make assumptions
about the local ports in use. The first number
should be greater than 1024, or better greater than 4096, to avoid
clashes with well known ports and to minimize firewall
problems.
ip_no_pmtu_disc
If enabled, don't do Path MTU Discovery for TCP
sockets by default. Path MTU discovery may fail if
misconfigured firewalls (that drop all ICMP packets)
or misconfigured interfaces (e.g., a point-to-point
link where both ends don't agree on the
MTU) are on the path. It is better to fix the broken
routers on the path than to turn off Path MTU
Discovery globally, because not doing it incurs a
high cost to the network.
ipfrag_high_thresh, ipfrag_low_thresh
If the amount of queued IP fragments reaches
ipfrag_high_thresh, the queue is pruned down to
ipfrag_low_thresh. Contains an integer with the
number of bytes.
ip_always_defrag
[New with Kernel 2.2.13; in earlier kernel versions
the feature was controlled at compile time by the
CONFIG_IP_ALWAYS_DEFRAG option]
When this boolean flag is enabled (not equal 0)
incoming fragments (parts of IP packets that arose
when some host between origin and destination
decided that the packets were too large and cut
them into pieces) will be reassembled (defragmented)
before being processed, even if they are
about to be forwarded.
Only enable if running either a firewall that is
the sole link to your network or a transparent
proxy; never turn it on for a normal router
or host. Otherwise fragmented communication may be
disturbed when the fragments travel over different
links. Defragmentation also has a large memory
and CPU time cost.
This is automagically turned on when masquerading
or transparent proxying is configured.
neigh/*
See arp(7).
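Reading one of these sysctls through the /proc interface can be sketched as follows. The helper name read_ipv4_sysctl is hypothetical, and it only handles entries whose first field is a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: read the first integer of
 * /proc/sys/net/ipv4/<name> into *value.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_ipv4_sysctl(const char *name, long *value)
{
    char path[256];
    FILE *f;
    int ok;
    int n = snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For instance, read_ipv4_sysctl("ip_default_ttl", &ttl) would report the default time-to-live on a system where /proc is mounted; writing a sysctl works the same way with the file opened for writing and usually requires privileges.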
IOCTLS
All ioctls described in socket(7) apply to ip.
The ioctls to configure firewalling are documented in
ipfw(7) from the ipchains package.
Ioctls to configure generic device parameters are
described in netdevice(7).
NOTES
Be very careful with the SO_BROADCAST option - it is not
privileged in Linux. It is easy to overload the network
with careless broadcasts. For new application protocols it
is better to use a multicast group instead of broadcast
ing. Broadcasting is discouraged.
Some other BSD sockets implementations provide IP_RCVDSTADDR
and IP_RECVIF socket options to get the destination
address and the interface of received datagrams. Linux has
the more general IP_PKTINFO for the same task.
ERRORS
ENOTCONN
The operation is only defined on a connected
socket, but the socket wasn't connected.
EINVAL Invalid argument passed. For send operations this
can be caused by sending to a blackhole route.
EMSGSIZE
Datagram is bigger than an MTU on the path and it
cannot be fragmented.
EACCES The user tried to execute an operation without the
necessary permissions. These include: Sending a
packet to a broadcast address without having the
SO_BROADCAST flag set. Sending a packet via a prohibit
route. Modifying firewall settings without
CAP_NET_ADMIN or effective user id 0. Binding to a
reserved port without the CAP_NET_BIND_SERVICE
capability or effective user id 0.
EADDRINUSE
Tried to bind to an address already in use.
ENOMEM and ENOBUFS
Not enough memory available.
ENOPROTOOPT and EOPNOTSUPP
Invalid socket option passed.
EPERM User doesn't have permission to set high priority,
change configuration, or send signals to the
requested process or group.
EADDRNOTAVAIL
A non-existent interface was requested or the
requested source address was not local.
EAGAIN Operation on a non-blocking socket would block.
ESOCKTNOSUPPORT
The socket is not configured or an unknown socket
type was requested.
EISCONN
connect(2) was called on an already connected
socket.
EALREADY
A connection operation on a non-blocking socket is
already in progress.
ECONNABORTED
A connection was closed during an accept(2).
EPIPE The connection was unexpectedly closed or shut down
by the other end.
ENOENT SIOCGSTAMP was called on a socket where no packet
arrived.
EHOSTUNREACH
No valid routing table entry matches the destination
address. This error can be caused by an ICMP
message from a remote router or by the local routing
table.
ENODEV Network device not available or not capable of
sending IP.
ENOPKG A kernel subsystem was not configured.
ENOBUFS, ENOMEM
Not enough free memory. This often means that the
memory allocation is limited by the socket buffer
limits, not by the system memory, but this is not
100% consistent.
Other errors may be generated by the overlaying protocols;
see tcp(7), raw(7), udp(7) and socket(7).
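As one concrete illustration of these error values, the sketch below binds two datagram sockets to the same local (address, port) pair and reports the errno of the second bind(2). The helper name second_bind_errno and the use of loopback are assumptions for illustration.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno produced by binding a second
 * UDP socket to an (address, port) pair that is already in use, 0 if
 * the second bind unexpectedly succeeds, or -1 if the setup failed. */
int second_bind_errno(void)
{
    int result = -1;
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int a = socket(PF_INET, SOCK_DGRAM, 0);
    int b = socket(PF_INET, SOCK_DGRAM, 0);

    if (a < 0 || b < 0)
        goto out;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = htons(0); /* kernel picks a free port for the first socket */

    if (bind(a, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(a, (struct sockaddr *)&sa, &len) < 0)
        goto out;

    /* sa now holds the port chosen for a; try to bind b to the same pair. */
    if (bind(b, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        result = errno;
    else
        result = 0;
out:
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return result;
}
```

With neither socket setting SO_REUSEADDR, the second bind is expected to fail with EADDRINUSE.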
VERSIONS
IP_PKTINFO, IP_MTU, IP_MTU_DISCOVER,
IP_RECVERR and IP_ROUTER_ALERT are new options in Linux
2.2.
struct ip_mreqn is new in Linux 2.2. Linux 2.0 only supported
ip_mreq.
The sysctls were introduced with Linux 2.2.
COMPATIBILITY
For compatibility with Linux 2.0, the obsolete
socket(PF_INET, SOCK_PACKET, protocol) syntax is still supported
to open a packet(7) socket. This is deprecated and
should be replaced by socket(PF_PACKET, SOCK_RAW, protocol)
instead. The main difference is the new sockaddr_ll
address structure for generic link layer information
instead of the old sockaddr_pkt.
BUGS
There are too many inconsistent error values.
The ioctls to configure IP-specific interface options and
ARP tables are not described.
AUTHORS
This man page was written by Andi Kleen.
SEE ALSO
sendmsg(2), recvmsg(2), socket(7), netlink(7), tcp(7),
udp(7), raw(7), ipfw(7).
RFC791 for the original IP specification.
RFC1122 for the IPv4 host requirements.
RFC1812 for the IPv4 router requirements.