Network Basics and Network Abstraction in Linux

Motivation

Before you learn the tools and commands for using the network in Linux, you need a basic understanding of how networks work. This unit tries to bring you up to speed quickly.

The ISO-OSI 7 Layer Model

The 7 Layer Model is used to describe networks. The IP protocol was not developed within ISO and thus only roughly fits into the model. Still, it is a good picture to have in your mind when you think about networks.

At the top you have your applications, e.g. a web browser. Below that you need definitions of how websites are encoded and transported via HTTP, and so on. At the bottom we need specifications of how data is transmitted on the wire (or wirelessly), e.g. cable definitions, voltage levels, frequencies, etc.

What we are looking at here is layer 2, which defines how data is encoded and addressed on a certain medium, and in later units also layer 3 (routing: how packets are sent between networks).

Layer 2

From the point of view of the abstraction in the operating system, there are basically 2 different kinds of physical medium:

broadcast
There is a local network where stations can send to each other, and there is also a way to send to all stations on the network. Typically an Ethernet network segment or a WiFi network.
point-to-point
Two stations are connected via a link and only those 2 stations can exchange data. Typically a dial-up connection, a network over a serial line or a virtual connection like a VPN tunnel.

The typical broadcast medium is Ethernet, and most network interfaces are of this type. In Ethernet a 6-byte address is used to address each station on the network. This is the so-called hardware address or MAC address. It is usually written as 12 hex digits grouped into bytes by colons, e.g. b0:35:9f:2a:29:7d. Each network card should have a unique MAC address. The first 3 bytes are assigned to a company and the last 3 bytes are counted up in the factory. The address mentioned belongs to an Intel card.
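To make the structure of a MAC address concrete, here is a minimal Python sketch that splits the example address into its vendor part and its per-card part. The helper name parse_mac is made up for this illustration and is not part of any Linux tool.

```python
def parse_mac(text):
    """Turn a colon-separated MAC address into its 6 raw bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated bytes, got {len(parts)}")
    return bytes(int(part, 16) for part in parts)


mac = parse_mac("b0:35:9f:2a:29:7d")
vendor_part = mac[:3]   # first 3 bytes: assigned to the manufacturer
device_part = mac[3:]   # last 3 bytes: chosen by the manufacturer per card

print("vendor part:", vendor_part.hex(":"))
print("device part:", device_part.hex(":"))
```

The sketch needs Python 3.8 or newer because of the separator argument to bytes.hex().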

Figure: Network hub

In the old days Ethernet was built with a coaxial cable that connected all computers. Today Ethernet is usually built with twisted pair cables and RJ45 connectors. The cables run to a central switch or hub that distributes the packets to the stations. A hub distributes every packet to every station. A switch is more intelligent: it learns the MAC address of each station and only forwards a packet to the computer that was addressed. Of course, broadcasts are always sent to all stations on that segment.
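The difference between a hub and a switch can be sketched as a toy simulation. This is only an illustration under simplifying assumptions (no table aging, no VLANs), and the class name LearningSwitch and the MAC addresses are invented for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of a switch that learns which MAC address sits on which port."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out on."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known station: one port only
        # Broadcast or unknown destination: flood to all other ports, like a hub.
        return [port for port in self.ports if port != in_port]


switch = LearningSwitch(4)
a, b = "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"
first = switch.receive(0, a, b)   # b is still unknown, so the frame is flooded
second = switch.receive(1, b, a)  # a was learned on port 0
third = switch.receive(0, a, b)   # b has now been learned on port 1
print(first, second, third)
```

After the first two frames the switch knows both stations, so the third frame only goes out on the port of the addressed station.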

Most of the time we want to send TCP/IP packets. Those are carried as payload within the Ethernet frame. Within the TCP/IP packet there could be, for example, an HTTP request.

Figure: Ethernet encapsulation

With IP we are already moving to layer 3.
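The encapsulation described above can be sketched in a few lines of Python. This is a simplified illustration: it only builds the 14-byte Ethernet header (destination MAC, source MAC, type field) in front of a payload and leaves out the preamble and the checksum that real hardware adds. The function names and the placeholder payload are assumptions made for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # type field value announcing an IPv4 packet as payload


def mac_to_bytes(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def build_frame(dst_mac, src_mac, payload, ethertype=ETHERTYPE_IPV4):
    """Prepend a 14-byte Ethernet header to the payload."""
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype)
    return header + payload


payload = b"<IP packet bytes would go here>"  # placeholder, not a real IP packet
frame = build_frame("18:d6:c7:f7:f3:2e", "b0:35:9f:2a:09:9d", payload)
print(len(frame), frame[:14].hex(":"))
```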

Network Abstraction in Linux

If we use a network card from a different vendor, we do not want to rewrite all our programs. So the Linux system has drivers for all the different network cards, and once the right drivers are installed we do not have to care about the particularities of each card. We only see a network interface. For the most part we also do not want to care about the details of sending packets, re-transmitting those that are lost, and so on. We just want a connection to youtube.com to watch funny hamsters dancing. The Linux kernel provides most of the needed abstraction here:

Figure: Network stack

At the bottom the Linux kernel has drivers for each type of card. Most of the protocol handling for Ethernet, IP and TCP is done in the kernel. User programs connect via a standardized library (libc) that offers them convenient functions for opening network connections, where they only need to specify the destination IP address and port.
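How little a program has to know can be shown with a short sketch. It uses Python's socket module, which wraps the socket functions of the C library. To stay self-contained it talks to a tiny echo server on the local machine instead of a real website; the helper echo_once and the message are made up for the example.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever arrives first."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


# Server side: listen on the local machine, let the kernel pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
worker.start()

# Client side: only the destination IP address and port are specified.
# Drivers, retransmissions and so on are handled by the kernel.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join(timeout=5)
server.close()
print(reply)
```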

Of course we also need tools to configure the network. The abstraction in the Linux kernel gives us so-called interfaces. Most other hardware in Unix is typically abstracted as a device that has a device file below /dev. E.g. /dev/sda could be your hard drive, while /dev/ttyUSB0 would be a serial port from a USB device. Network interfaces are different: they do not have device files and are only visible as interfaces. You can list the interfaces with:

$ ifconfig
$ ip link
$ ip addr

The ifconfig tool is actually deprecated because it no longer supports all the features of the Linux kernel. The tool to use is ip. ip link shows all your network interfaces and what type they are. ip addr also shows the IP addresses. Here is the output of both ifconfig and ip addr:

$ ifconfig
...
wlp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.79.105  netmask 255.255.255.0  broadcast 192.168.79.255
        inet6 fe80::290d:840f:e5a6:e72b  prefixlen 64  scopeid 0x20<link>
        ether b0:35:9f:2a:09:9d  txqueuelen 1000  (Ethernet)
        RX packets 210183  bytes 215596586 (205.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 91984  bytes 20003418 (19.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$ ip addr
...
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b0:35:9f:2a:09:9d brd ff:ff:ff:ff:ff:ff
    inet 192.168.9.105/24 brd 192.168.9.255 scope global dynamic noprefixroute wlp3s0
       valid_lft 6556sec preferred_lft 6556sec
    inet6 fe80::290d:840f:e5a6:e72b/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Your output will have more interfaces; here I only show one interface, wlp3s0, which is my WiFi card. You see the MAC address and the IP addresses. In the ifconfig output you also see the number of incoming and outgoing packets and the number of bytes. You can see these statistics in ip with the -s option, e.g. ip -s addr
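The same information can also be read from a program. The following is a minimal sketch, assuming a Linux system where the kernel exposes interface details under /sys/class/net; the helper names are invented for this example and missing files are simply reported as None.

```python
import socket
from pathlib import Path


def read_sysfs(iface, name):
    """Read one value for an interface from /sys/class/net, or None if unavailable."""
    try:
        return (Path("/sys/class/net") / iface / name).read_text().strip()
    except OSError:
        return None


def list_interfaces():
    """Return name, MAC address and byte counters for every interface."""
    result = []
    for _index, name in socket.if_nameindex():
        result.append({
            "name": name,
            "mac": read_sysfs(name, "address"),
            "rx_bytes": read_sysfs(name, "statistics/rx_bytes"),
            "tx_bytes": read_sysfs(name, "statistics/tx_bytes"),
        })
    return result


for iface in list_interfaces():
    print(iface)
```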

IPv4 vs IPv6

Most of the Internet still uses IP version 4 (or IPv4 for short) with its 2^32 addresses, written in the well-known form of 4 decimal numbers separated by 3 dots, e.g. 192.168.92.113. Linux has also supported IPv6 for a long time. IPv6 has 128-bit addresses, written as 8 groups of 16-bit numbers in hex, separated by colons, e.g. 2001:0db8:85a3:0000:0000:8a2e:4370:c33a
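The two address formats can be explored with Python's standard ipaddress module. This is just a small sketch using the example addresses from the text.

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.92.113")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:4370:c33a")

# Address length in bits and the resulting size of the address space.
print(v4.version, v4.max_prefixlen, 2 ** v4.max_prefixlen)
print(v6.version, v6.max_prefixlen, 2 ** v6.max_prefixlen)

# IPv6 addresses are usually shortened: leading zeros are dropped and
# one run of all-zero groups is replaced by "::".
print(v6.compressed)
print(v6.exploded)
```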

Since most companies do not want a direct network connection to the outside but use a firewall where they can also run NAT (Network Address Translation), most company networks use private IP addresses inside, and most of their infrastructure still runs on IPv4.

I will also focus on IPv4 here. Most tools in Linux also support IPv6: either you append a 6 to the name of the tool (e.g. ping6) or you use the option -6 (e.g. ip -6 addr).

IP over Ethernet

We learned that local communication between computers on an Ethernet happens via Ethernet frames that are addressed with MAC addresses, and that the TCP/IP packets are the payload of the Ethernet frames. But how does a station know which IP addresses are on the local network, and which MAC address belongs to which IP address?

The first part is handled via configuration. When an interface is set up, we tell it which range of IP addresses belongs to the local network. For this the CIDR (Classless Inter-Domain Routing) notation is used. E.g. we say that 192.168.1.0/24 is the network that should be found on a network interface. This means that all addresses whose leading 24 bits match 192.168.1 can be found on that network, while the last 8 bits (in this case the last number behind the dot) identify the individual station.

Figure: CIDR

The boundary can be on any bit, but /24 is often used. For example, we could define 10.11.12.128/25, which covers all IPs from 10.11.12.128 to 10.11.12.255. Or we could have 192.168.99.64/29 with the range 192.168.99.64 to 192.168.99.71, and so on. A /24 is often loosely called a class-C network, a /16 a class-B network and a /8 a class-A network. Everything else is classless.
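The ranges mentioned above can be checked with Python's standard ipaddress module. The helper is_local is a made-up name for this sketch; it mimics the decision a station makes when it checks whether a destination is on the local network.

```python
import ipaddress

net = ipaddress.ip_network("192.168.99.64/29")
first, last = net[0], net[-1]  # lowest and highest address in the range
print(net, "covers", first, "to", last, "=", net.num_addresses, "addresses")


def is_local(address, network):
    """True if the address falls inside the given CIDR network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


print(is_local("10.11.12.200", "10.11.12.128/25"))  # inside .128 to .255
print(is_local("10.11.12.100", "10.11.12.128/25"))  # outside the range
```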

Instead of using the CIDR / notation, the network is often specified with the netmask. The netmask is an IP address in which all bits that are part of the network are set to 1. So /24 corresponds to 255.255.255.0 and /8 corresponds to 255.0.0.0.
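The relation between the prefix length and the netmask is plain bit arithmetic. Here is a minimal sketch; the function name prefix_to_netmask is chosen for this example only.

```python
def prefix_to_netmask(prefix_len):
    """Convert a CIDR prefix length (0..32) into a dotted netmask."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    # Set the leading prefix_len bits of a 32-bit number to 1.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    # Cut the 32 bits into 4 bytes and print them as decimal numbers.
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for prefix in (8, 16, 23, 24, 27):
    print(f"/{prefix}", prefix_to_netmask(prefix))
```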

Here is a short table:

CIDR  netmask          class  example          number of addresses in the range
/24   255.255.255.0    C      192.168.3.0/24   256
/16   255.255.0.0      B      10.11.0.0/16     65536
/8    255.0.0.0        A      10.0.0.0/8       16777216
/23   255.255.254.0    -      192.168.40.0/23  512
/27   255.255.255.224  -      192.168.0.64/27  32
/0    0.0.0.0          -      0.0.0.0/0        4294967296 (the entire internet)
/32   255.255.255.255  -      10.11.12.13/32   1 (one host only)

Try to find the netmask/CIDR information in the ifconfig/ip addr output from above.

The highest IP address in each network (with all host bits, i.e. the bits that do not belong to the network part, set to 1) is usually reserved as the broadcast address and should not be used for normal stations. E.g. 192.168.0.255 in a 192.168.0.0/24.
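The broadcast address can be derived by setting all host bits of the network address to 1. The following sketch does this by hand and compares the result with what Python's ipaddress module reports. Note that hosts() also leaves out the lowest address of the range, which by convention names the network itself.

```python
import ipaddress

net = ipaddress.ip_network("192.168.0.0/24")

# hostmask has a 1 in every bit that does not belong to the network part.
by_hand = ipaddress.ip_address(int(net.network_address) | int(net.hostmask))
print(by_hand, net.broadcast_address)

usable = list(net.hosts())  # addresses left over for normal stations
print(len(usable), usable[0], usable[-1])
```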

ARP

Now for the second part of the problem: when we give an interface the address 192.168.99.17 and define a 192.168.99.0/24 network on that interface, how does the system find the MAC addresses of the other stations?

A part of the TCP/IP protocol family is responsible for this: ARP (Address Resolution Protocol). If you want to send to a station that should be on the local network (because it fits in the configured network range), you send out a broadcast asking all stations whether they have that IP address, and the station that has it answers with its MAC address.
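To give an idea of what such a question looks like on the wire, here is a sketch that builds the 28-byte body of an ARP request for IPv4 over Ethernet. It only constructs the bytes and does not send anything. The function name and the target address 192.168.99.1 are assumptions made for the example. On the network this body would travel inside an Ethernet frame addressed to the broadcast MAC ff:ff:ff:ff:ff:ff.

```python
import socket
import struct


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build the body of a 'who has target_ip?' ARP request."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # length of a hardware (MAC) address
        4,                            # length of a protocol (IPv4) address
        1,                            # operation: 1 = request
        sender_mac,                   # our own MAC address
        socket.inet_aton(sender_ip),  # our own IP address
        b"\x00" * 6,                  # target MAC: unknown, this is the question
        socket.inet_aton(target_ip),  # the IP address we are asking about
    )


my_mac = bytes.fromhex("b0359f2a099d")
request = build_arp_request(my_mac, "192.168.99.17", "192.168.99.1")
print(len(request), request.hex())
```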

A computer usually caches that information for a limited time (assumed here to be about 2 minutes; the exact timeout depends on the system) and then asks again, so if the mapping ever changes it will find out. A station can also send out ARP information on its own, a so-called gratuitous ARP. This is useful if the information changes. Also remember that the switch needs to have a table in which it records which MAC addresses can be found on which port.

The tool in Linux to find out what is in the ARP cache is called arp (ip neigh from the ip tool shows the same information). Here is an example output:

(Some tools like this are mainly useful for the root user, so run this as root. You can run it as a normal user as well, but then you might need to give the full path name, e.g. /sbin/arp or /usr/sbin/arp.)

$ arp
$ /usr/sbin/arp
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.5.1             ether   18:d6:c7:f7:f3:2e   C                     wlp3s0
192.168.5.201           ether   80:ee:73:81:a5:9e   C                     wlp3s0

So here we see that the ARP cache has 2 entries. One is the router and one is another station on the network that we have communicated with in the last few minutes. You see the IP address, the corresponding MAC address, and the network interface where it was found.
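If you want to process such output in a script, the columns can be split on whitespace. The following sketch parses the example output from above; parse_arp_output is a made-up helper name, and the parser is deliberately simple: it only handles complete Ethernet entries in exactly this layout.

```python
SAMPLE = """\
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.5.1             ether   18:d6:c7:f7:f3:2e   C                     wlp3s0
192.168.5.201           ether   80:ee:73:81:a5:9e   C                     wlp3s0
"""


def parse_arp_output(text):
    """Map each IP address to a (MAC address, interface) pair."""
    cache = {}
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5 or fields[1] != "ether":
            continue  # ignore incomplete or non-Ethernet entries
        cache[fields[0]] = (fields[2], fields[-1])
    return cache


for ip, (mac, iface) in parse_arp_output(SAMPLE).items():
    print(ip, "is at", mac, "on", iface)
```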

Loopback Interface

Each Linux (and Windows) system has a so-called loopback interface with the address 127.0.0.1. In fact the entire 127.0.0.0/8 is reserved for loopback. On this interface the computer can talk to itself. This is useful for programs that normally run network services but in some cases should only be used by programs on your own computer. The name that resolves to 127.0.0.1 should be localhost. In IPv6 the loopback address is all zeros except for the very last bit, which is 1: ::1
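Whether an address belongs to the loopback range can be checked with Python's standard ipaddress module. The addresses other than 127.0.0.1 and ::1 are arbitrary examples.

```python
import ipaddress

checks = {}
for text in ("127.0.0.1", "127.42.0.9", "::1", "192.168.5.1"):
    checks[text] = ipaddress.ip_address(text).is_loopback
    print(text, checks[text])
```

Every address in 127.0.0.0/8 counts as loopback, not only 127.0.0.1, while a normal LAN address does not.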

Private IP Space: RFC 1918

On your private network at home or in your company you often need networks that are not used in the public internet. For this you can use IP addresses from the ranges defined in RFC1918:

RFC1918 private IP addresses
range                           CIDR            could be divided into
10.0.0.0 to 10.255.255.255      10.0.0.0/8      65536 /24 networks
172.16.0.0 to 172.31.255.255    172.16.0.0/12   4096 /24 networks
192.168.0.0 to 192.168.255.255  192.168.0.0/16  256 /24 networks
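The numbers in this table follow directly from the prefix lengths: going from a /p network to /24 networks leaves 24 - p bits to choose from, so there are 2^(24 - p) such networks. A short sketch with Python's ipaddress module confirms this; the helper is_rfc1918 is a made-up name for the example.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

counts = []
for net in RFC1918:
    count = 2 ** (24 - net.prefixlen)  # number of /24 networks inside net
    counts.append(count)
    print(net, "=", net[0], "to", net[-1], "->", count, "networks with /24")


def is_rfc1918(address):
    """True if the IPv4 address lies in one of the three private ranges."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in RFC1918)


print(is_rfc1918("172.31.255.255"), is_rfc1918("172.32.0.0"))
```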

Exercises

  • Find out which interfaces are on your computer. What is your own MAC address on each of the interfaces?
  • Find out how many bytes and how many packets have been sent and received on each interface.
  • Look at your ARP cache. What are the MAC addresses of the other stations? Try to use ping IP-address to send packets to other stations and see if their MAC addresses show up in the ARP cache.