Linux Tech Support: 2007

Wednesday, October 10, 2007

Implementing NFS

NFS client and server support is actually built into the Linux kernel. The NFS server application is named rpc.nfsd and the client is rpc.mountd. There is also a quota support application named rps.rquotad. These NFS deamons are normally started at boot time from the script /etc/rc.d/init.d/nfs. Most Linux implementations include this NFS support by default.

The NFS script only operates if the /etc/exports file exists and is not empty (zero length). The /etc/exports

NFS Server Support

Dynamic sharing of directories is done by rpc.nfsd using the exportfs program that changes the /etc/exports file. The following is an example using exportfs:

exportfs clientDomainName:/a/path/name/on/the/server
exportfs -o rw  :/a/path/name/on/the/server

The first exports the directory /a/path/name/on/the/server to a specified client. In this case the domain name is clientDomainName*.foo.com. This could also be an IP address or an IP address and subnet mask. NIS group names can also be used. The directory is exported as read-only when no options specified.

The second instance of exportfs exports the same directory but allows the world to access it. The exportfs supports a number of options. In this case, the command allows read-write access.

The exportfs program is also used to remove an export. This is done using the -u option as shown below:

exportfs -u client DomainName:/a/path/name/on/the/server

The /etc/exports file is used to define exported NFS directories when NFS is started. Each line in the file defines the directory to be exported and how the directory can be accessed. The following is a sample /etc/exports file:

/home/guest     (ro)
/pub                 *.local.dom(rw) (ro)

The first allows any user read-only access to the /home/guest directory. The second allows read-write access to computers with a domain name of local.dom and read-only access to everyone else.

-----------------------------------------------------------------------------------------------

The following methods can be used to specify host names:

single host — Where one particular host is specified with a fully qualified domain name, hostname, or IP address.
wildcards — Where a * or ? character is used to take into account a grouping of fully qualified domain names that match a particular string of letters. Wildcards should not be used with IP addresses; however, it is possible for them to work accidentally if reverse DNS lookups fail.
Be careful when using wildcards with fully qualified domain names, as they tend to be more exact than expected. For example, the use of *.example.com as a wildcard allows sales.example.com to access an exported file system, but not bob.sales.example.com. To match both possibilities both *.example.com and *.*.example.com must be specified.
IP networks — Allows the matching of hosts based on their IP addresses within a larger network. For example, 192.168.0.0/28 allows the first 16 IP addresses, from 192.168.0.0 to 192.168.0.15, to access the exported file system, but not 192.168.0.16 and higher.
netgroups — Permits an NIS netgroup name, written as @, to be used. This effectively puts the NIS server in charge of access control for this exported file system, where users can be added and removed from an NIS group without affecting /etc/exports.

--------------------------------------------------------------------------------------------

NFS export Options :

ro — Mounts of the exported file system are read-only. Remote hosts are not able to make changes to the data shared on the file system. To allow hosts to make changes to the file system, the read/write (rw) option must be specified.
wdelay — Causes the NFS server to delay writing to the disk if it suspects another write request is imminent. This can improve performance by reducing the number of times the disk must be accessed by separate write commands, reducing write overhead. The no_wdelay option turns off this feature, but is only available when using the sync option.
root_squash — Prevents root users connected remotely from having root privileges and assigns them the user ID for the user nfsnobody. This effectively "squashes" the power of the remote root user to the lowest local user, preventing unauthorized alteration of files on the remote server. Alternatively, the no_root_squash option turns off root squashing. To squash every remote user, including root, use the all_squash option. To specify the user and group IDs to use with remote users from a particular host, use the anonuid and anongid options, respectively. In this case, a special user account can be created for remote NFS users to share and specify (anonuid=,anongid=), where is the user ID number and is the group ID number.

The `exportfs` Command

Every file system being exported to remote users via NFS, as well as the access level for those file systems, are listed in the /etc/exports file. When the nfs service starts, the /usr/sbin/exportfs command launches and reads this file, passes control to rpc.mountd (if NFSv2 or NFSv3) for the actual mounting process, then to rpc.nfsd where the file systems are then available to remote users.

When issued manually, the /usr/sbin/exportfs command allows the root user to selectively export or unexport directories without restarting the NFS service. When given the proper options, the /usr/sbin/exportfs command writes the exported file systems to /var/lib/nfs/xtab. Since rpc.mountd refers to the xtab file when deciding access privileges to a file system, changes to the list of exported file systems take effect immediately.

The following is a list of commonly used options available for /usr/sbin/exportfs:

-r — Causes all directories listed in /etc/exports to be exported by constructing a new export list in /etc/lib/nfs/xtab. This option effectively refreshes the export list with any changes that have been made to /etc/exports.
-a — Causes all directories to be exported or unexported, depending on what other options are passed to /usr/sbin/exportfs. If no other options are specified, /usr/sbin/exportfs exports all file systems specified in /etc/exports.
-o file-systems — Specifies directories to be exported that are not listed in /etc/exports. Replace file-systems with additional file systems to be exported. These file systems must be formatted in the same way they are specified in /etc/exports. Refer to Section 9.3.1 The /etc/exports Configuration File for more information on /etc/exports syntax. This option is often used to test an exported file system before adding it permanently to the list of file systems to be exported.
-i — Ignores /etc/exports; only options given from the command line are used to define exported file systems.
-u — Unexports all shared directories. The command /usr/sbin/exportfs -ua suspends NFS file sharing while keeping all NFS daemons up. To re-enable NFS sharing, type exportfs -r.
-v — Verbose operation, where the file systems being exported or unexported are displayed in greater detail when the exportfs command is executed.

If no options are passed to the /usr/sbin/exportfs command, it displays a list of currently exported file systems.

For more information about the /usr/sbin/exportfs command, refer to the exportfs man page.

NIS - Client and Server Configuration

Network Information Service (NIS) is the traditional directory service on *nix platforms. The setup of NIS is relatively simple when compared to other directory services like LDAP. NIS stores administrative files like /etc/passwd, /etc/hosts and so on in Berkeley DB files. This data is made available over the network to all the clients that are connected to the NIS domain.

Drawback : The network connection is not encrypted and all transactions - including passwords - are sent in clear text.

Configuring an NIS Server

Make sure the following packages are installed in your machine:
ypserv : Contains the NIS server daemon (ypserv) and the NIS password daemon (yppasswdd).
portmap : mandatory
The yppasswdd daemon enables the NIS server to change the NIS database and password database information, at the client's request. In order to change your NIS password, the yppasswdd daemon must be running on the master server. From the client, one must use yppasswd to update a password within the NIS domain.

Insert the following line in the /etc/sysconfig/network file:
NISDOMAIN=mynisdomain

Specify the networks you wish NIS to recognize in /var/yp/securenets .
Eg:
# Permit access to localhost: host 127.0.0.1 #Permit access to xyz.com network: 255.255.255.0 192.168.0.0

Insert the following lines in the /var/yp/Makefile :
NOPUSH=true # Only if you have only a master NIS server else if you have even one slave server, set it to false MERGE_GROUP=false # If you have any group passwords in /etc/gshadow that need to be merged into the NIS group map, set it to true. MERGE_PASSWD=false # Set to true if you want to merge encrypted passwords from /etc/shadow into the NIS passwd map.

Uncomment the following line :
all: passwd group hosts netid ...

If you have slave NIS servers then enter their names in /var/yp/ypservers .

Finally run the following command:
# /usr/lib/yp/ypinit -m

Configuring a slave NIS server

Install ypserv package on the slave server.
Make sure you have the name of the slave server listed in /var/yp/ypservers on the master server.
Now issue the command :
# /usr/lib/yp/ypinit -s masterserver
Make sure the NOPUSH value in the /var/yp/Makefile on the master server is set to "false". Then when the master server's databases are updated, a call to the yppush executable will be made. yppush is responsible for transferring the updated contents from the master to the slaves. Only transfers within the same domain are made with yppush.
Lastly start ypserv and yppasswdd daemons
# service ypserv start # service yppasswdd start

Configuring an NIS client

Make sure the following packages are installed on your machine:
ypbind - NIS client daemon
authconfig - used for automatic configuration of NIS client.
yp-tools: Contains utilities like ypcat, yppasswd, ypwhich and so on used for viewing and modifying the user account details within the NIS server.
portmap (mandatory)
There are two methods to configure an NIS client.
- Method 1: Manual method
  - Enter the following line in the /etc/sysconfig/network file:
    NISDOMAIN=mynisdomain
  - Append the following line in /etc/yp.conf :
    domain mynisdomain server 192.168.0.1 # replace this with your NIS server address.
  - Make sure the following lines contain 'nis' as an option in the file /etc/nsswitch.conf file:
    passwd: files nis shadow: files nis group: files nis hosts: files nis dns networks: files nis protocols: files nis publickey: nisplus automount: files nis netgroup: files nis aliases: files nisplus
  - Finally restart ypbind and portmap.
- Method 2: Run authconfig and follow directions.
To check if you have succesfully configured NIS client, execute the following :
# ypcat passwd
The output will be the contents of the /etc/passwd file residing on the NIS server having user IDs greater than or equal to 500.

Tuesday, September 25, 2007

Linux Ethernet Bonding

What is bonding?
Bonding is the same as port trunking. In the following I will use the word bonding because practically we will bond interfaces as one.

But still...what is bonding?
Bonding allows you to aggregate multiple ports into a single group, effectively combining the bandwidth into a single connection. Bonding also allows you to create multi-gigabit pipes to transport traffic through the highest traffic areas of your network. For example, you can aggregate three megabits ports (1 mb each) into a three-megabits trunk port. That is equivalent with having one interface with three megabits speed.

Where should I use bonding?
You can use it wherever you need redundant links, fault tolerance or load balancing networks. It is the best way to have a high availability network segment. A very useful way to use bonding is to use it in connection with 802.1q VLAN support (your network equipment must have 802.1q protocol implemented).

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.

Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure kernel with `make menuconfig/xconfig/config', and select
"Bonding driver support" in the "Network device support" section. It is
recommended to configure the driver as module since it is currently the only way
to pass parameters to the driver and configure more than one bonding device.

Build and install the new kernel and modules.

2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires updated ifenslave program. The
original one from extreme-linux and beowulf will not work. Kernels 2.4.12
and above include the updated version of ifenslave.c in Documentation/network
directory. For older kernels, please follow the links at the end of this file.

IMPORTANT!!!  If you are running on Redhat 7.1 or greater, you need
to be careful because /usr/include/linux is no longer a symbolic link
to /usr/src/linux/include/linux.  If you build ifenslave while this is
true, ifenslave will appear to succeed but your bond won't work.  The purpose
of the -I option on the ifenslave compile line is to make sure it uses
/usr/src/linux/include/linux/if_bonding.h instead of the version from
/usr/include/linux.

To install ifenslave.c, do:
   # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
   # cp ifenslave /sbin/ifenslave

3) Configure your system
------------------------
Also see the following section on the module parameters. You will need to add
at least the following line to /etc/conf.modules (or /etc/modules.conf):

 alias bond0 bonding

Use standard distribution techniques to define bond0 network interface. For
example, on modern RedHat distributions, create ifcfg-bond0 file in
/etc/sysconfig/network-scripts directory that looks like this:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(put the appropriate values for you network instead of 192.168.1).

All interfaces that are part of the trunk, should have SLAVE and MASTER
definitions. For example, in the case of RedHat, if you wish to make eth0 and
eth1 (or other interfaces) a part of the bonding interface bond0, their config
files (ifcfg-eth0, ifcfg-eth1, etc.) should look like this:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

(use DEVICE=eth1 for eth1 and MASTER=bond1 for bond1 if you have configured
second bonding interface).

Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. (For the case of RedHat
distros, you can do `ifup bond0' or `/etc/rc.d/init.d/network restart'.)

If the administration tools of your distribution do not support master/slave
notation in configuration of network interfaces, you will need to configure
the bonding device with the following commands manually:

   # /sbin/ifconfig bond0 192.168.1.1 up
   # /sbin/ifenslave bond0 eth0
   # /sbin/ifenslave bond0 eth1

(substitute 192.168.1.1 with your IP address and add custom network and custom
netmask to the arguments of ifconfig if required).

You can then create a script with these commands and put it into the appropriate
rc directory.

If you specifically need that all your network drivers are loaded before the
bonding driver, use one of modutils' powerful features : in your modules.conf,
tell that when asked for bond0, modprobe should first load all your interfaces :

probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe will
die in an endless recursive loop.

4) Module parameters.
---------------------
The following module parameters can be passed:

   mode=

Possible values are 0 (round robin policy, default) and 1 (active backup
policy), and 2 (XOR).  See question 9 and the HA section for additional info.

   miimon=

Use integer value for the frequency (in ms) of MII link monitoring. Zero value
is default and means the link monitoring will be disabled. A good value is 100
if you wish to use link monitoring. See HA section for additional info.

   downdelay=

Use integer value for delaying disabling a link by this number (in ms) after
the link failure has been detected. Must be a multiple of miimon. Default
value is zero. See HA section for additional info.

   updelay=

Use integer value for delaying enabling a link by this number (in ms) after
the "link up" status has been detected. Must be a multiple of miimon. Default
value is zero. See HA section for additional info.

   arp_interval=

Use integer value for the frequency (in ms) of arp monitoring.  Zero value
is default and means the arp monitoring will be disabled.  See HA section
for additional info.   This field is value in active_backup mode only.

   arp_ip_target=

An ip address to use when arp_interval is > 0.  This is the target of the
arp request sent to determine the health of the link to the target. 
Specify this value in ddd.ddd.ddd.ddd format.

If you need to configure several bonding devices, the driver must be loaded
several times. I.e. for two bonding devices, your /etc/conf.modules must look
like this:

alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 miimon=100

5) Testing configuration
------------------------
You can test the configuration and transmit policy with ifconfig. For example,
for round robin policy, you should get something like this:

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4 
         inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
         UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
         RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
         TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
         collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4 
         inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
         UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
         RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
         collisions:0 txqueuelen:100
         Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4 
         inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
         UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
         RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:100
         Interrupt:9 Base address:0x1400

Questions :
===========

1.  Is it SMP safe?

 Yes.  The old 2.0.xx channel bonding patch was not SMP safe.
 The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

 Any Ethernet type cards (you can even mix cards - a Intel
 EtherExpress PRO/100 and a 3com 3c905b, for example).
 You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

 One for each module you load. See section on module parameters for how
 to accomplish this.

4.  How many slaves can a bonding device have?

 Limited by the number of network interfaces Linux supports and the
 number of cards you can place in your system.

5.  What happens when a slave link dies?

 If your ethernet cards support MII status monitoring and the MII
 monitoring has been enabled in the driver (see description of module
 parameters), there will be no adverse consequences. This release
 of the bonding driver knows how to get the MII information and
 enables or disables its slaves according to their link status.
 See section on HA for additional information.

 For ethernet cards not supporting MII status, or if you wish to
 verify that packets have been both send and received, you may
 configure the arp_interval and arp_ip_target.  If packets have
 not been sent or received during this interval, an arp request
 is sent to the target to generate send and receive traffic. 
 If after this interval, either the successful send and/or
 receive count has not incremented, the next slave in the sequence
 will become the active slave.

 If neither mii_monitor and arp_interval is configured, the bonding
 driver will not handle this situation very well. The driver will
 continue to send packets but some packets will be lost. Retransmits
 will cause serious degradation of performance (in the case when one
 of two slave links fails, 50% packets will be lost, which is a serious
 problem for both TCP and UDP).

6.  Can bonding be used for High Availability?

 Yes, if you use MII monitoring and ALL your cards support MII link
 status reporting. See section on HA for more information.

7.  Which switches/systems does it work with?

 In round-robin mode, it works with systems that support trunking:
 
 * Cisco 5500 series (look for EtherChannel support).
 * SunTrunking software.
 * Alteon AceDirector switches / WebOS (use Trunks).
 * BayStack Switches (trunks must be explicitly configured). Stackable
   models (450) can define trunks between ports on different physical
   units.
 * Linux bonding, of course !
 
 In Active-backup mode, it should work with any Layer-II switches.

8.  Where does a bonding device get its MAC address from?

 If not explicitly configured with ifconfig, the MAC address of the
 bonding device is taken from its first slave device. This MAC address
 is then passed to all following slaves and remains persistent (even if
 the the first slave is removed) until the bonding device is brought
 down or reconfigured.
 
 If you wish to change the MAC address, you can set it with ifconfig:

   # ifconfig bond0 ha ether 00:11:22:33:44:55

 The MAC address can be also changed by bringing down/up the device
 and then changing its slaves (or their order):
 
   # ifconfig bond0 down ; modprobe -r bonding
   # ifconfig bond0 .... up
   # ifenslave bond0 eth...

 This method will automatically take the address from the next slave
 that will be added.
 
 To restore your slaves' MAC addresses, you need to detach them
 from the bond (`ifenslave -d bond0 eth0'), set them down
 (`ifconfig eth0 down'), unload the drivers (`rmmod 3c59x', for
 example) and reload them to get the MAC addresses from their
 eeproms. If the driver is shared by several devices, you need
 to turn them all down. Another solution is to look for the MAC
 address at boot time (dmesg or tail /var/log/messages) and to
 reset it by hand with ifconfig :

   # ifconfig eth0 down
   # ifconfig eth0 hw ether 00:20:40:60:80:A0

9.  Which transmit polices can be used?

 Round robin, based on the order of enslaving, the output device
 is selected base on the next available slave.  Regardless of
 the source and/or destination of the packet.

 XOR, based on (src hw addr XOR dst hw addr) % slave cnt.  This
 selects the same slave for each destination hw address.

 Active-backup policy that ensures that one and only one device will
 transmit at any given moment. Active-backup policy is useful for
 implementing high availability solutions using two hubs (see
 section on HA).

High availability
=================

To implement high availability using the bonding driver, you need to
compile the driver as module because currently it is the only way to pass
parameters to the driver. This may change in the future.

High availability is achieved by using MII status reporting. You need to
verify that all your interfaces support MII link status reporting. On Linux
kernel 2.2.17, all the 100 Mbps capable drivers and yellowfin gigabit driver
support it. If your system has an interface that does not support MII status
reporting, a failure of its link will not be detected!

The bonding driver can regularly check all its slaves links by checking the
MII status registers. The check interval is specified by the module argument
"miimon" (MII monitoring). It takes an integer that represents the
checking time in milliseconds. It should not come to close to (1000/HZ)
(10 ms on i386) because it may then reduce the system interactivity. 100 ms
seems to be a good value. It means that a dead link will be detected at most
100 ms after it goes down.

Example:

  # modprobe bonding miimon=100

Or, put in your /etc/modules.conf :

  alias bond0 bonding
  options bond0 miimon=100

There are currently two policies for high availability, depending on whether
a) hosts are connected to a single host or switch that support trunking
b) hosts are connected to several different switches or a single switch that
  does not support trunking.

1) HA on a single switch or host - load balancing
-------------------------------------------------
It is the easiest to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not Linux, but really
the switch (reboot it to ensure).

Example 1 : host to host at double speed

         +----------+                          +----------+
         |          |eth0                  eth0|          |
         | Host A   +--------------------------+  Host B  |
         |          +--------------------------+          |
         |          |eth1                  eth1|          |
         +----------+                          +----------+

 On each host :
    # modprobe bonding miimon=100
    # ifconfig bond0 addr
    # ifenslave bond0 eth0 eth1

Example 2 : host to switch at double speed

         +----------+                          +----------+
         |          |eth0                 port1|          |
         | Host A   +--------------------------+  switch  |
         |          +--------------------------+          |
         |          |eth1                 port2|          |
         +----------+                          +----------+

 On host A :                             On the switch :
    # modprobe bonding miimon=100           # set up a trunk on port1
    # ifconfig bond0 addr                     and port2
    # ifenslave bond0 eth0 eth1

2) HA on two or more switches (or a single switch without trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
port only to avoid confusing the switches.

If you need to know which interface is the active one, and which ones are
backup, use ifconfig. All backup interfaces have the NOARP flag set.

To use this mode, pass "mode=1" to the module at load time :

   # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

   alias bond0 bonding
   options bond0 miimon=100 mode=1

Example 1: Using multiple host and multiple switches to build a "no single
point of failure" solution.


               |                                     |
               |port3                           port3|
         +-----+----+                          +-----+----+
         |          |port7       ISL      port7|          |
         | switch A +--------------------------+ switch B |
         |          +--------------------------+          |
         |          |port8                port8|          |
         +----++----+                          +-----++---+
         port2||port1                           port1||port2
              ||             +-------+               ||
              |+-------------+ host1 +---------------+|
              |         eth0 +-------+ eth1           |
              |                                       |
              |              +-------+                |
              +--------------+ host2 +----------------+
                        eth0 +-------+ eth1

In this configuration, there are an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) attached to both switches each, and one or
more ports to the outside world (port3...). One an only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are not too much affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
          NIC failover (switch is not required to support trunking).


         +----------+                          +----------+
         |          |eth0                 port1|          |
         | Host A   +--------------------------+  switch  |
         |          +--------------------------+          |
         |          |eth1                 port2|          |
         +----------+                          +----------+

 On host A :                                 On the switch :
    # modprobe bonding miimon=100 mode=1     # (optional) minimize the time
    # ifconfig bond0 addr                    # for table expiration
    # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.

3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (a case of cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
allows to reduce down-time if the value of updelay has been overestimated.

Examples :

   # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
   # modprobe bonding miimon=100 mode=0 downdelay=0 updelay=5000

4) Limitations
--------------
The main limitations are :
 - only the link status is monitored. If the switch on the other side is
   partially down (e.g. doesn't forward anymore, but the link is OK), the link
   won't be disabled. Another way to check for a dead link could be to count
   incoming frames on a heavily loaded host. This is not applicable to small
   servers, but may be useful when the front switches send multicast
   information on their links (e.g. VRRP), or even health-check the servers.
   Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
   frames.

--------------------------------------------------------------------------------------

The following script (the gray area) will configure a bond interface (bond0) using two ethernet interface (eth0 and eth1). You can place it onto your on file and run it at boot time..

#!/bin/bash

modprobe bonding mode=0 miimon=100 # load bonding module

ifconfig eth0 down # putting down the eth0 interface
ifconfig eth1 down # putting down the eth1 interface

ifconfig bond0 hw ether 00:11:22:33:44:55 # changing the MAC address of the bond0 interface
ifconfig bond0 192.168.55.55 up # to set ethX interfaces as slave the bond0 must have an ip.

ifenslave bond0 eth0 # putting the eth0 interface in the slave mod for bond0
ifenslave bond0 eth1 # putting the eth1 interface in the slave mod for bond0

You can set up your bond interface according to your needs. Changing one parameters (mode=X) you can have the following bonding types:

mode=0 (balance-rr)
Round-robin policy: Transmit packets in sequential order from the first available slave through the last. This mode provides load balancing and fault tolerance.

mode=1 (active-backup)
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.

mode=2 (balance-xor)
XOR policy: Transmit based on [(source MAC address XOR'd with destination MAC address) modulo slave count]. This selects the same slave for each destination MAC address. This mode provides load balancing and fault tolerance.

mode=3 (broadcast)
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.

 Pre-requisites:
 1. Ethtool support in the base drivers for retrieving
 the speed and duplex of each slave.
 2. A switch that supports IEEE 802.3ad Dynamic link
 aggregation.
 Most switches will require some type of configuration
 to enable 802.3ad mode.

mode=5 (balance-tlb)
Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.

 Prerequisite:
 Ethtool support in the base drivers for retrieving the
 speed of each slave.

mode=6 (balance-alb)
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.

The most used are the first four mode types...

Also you can use multiple bond interface but for that you must load the bonding module as many as you need.
Presuming that you want two bond interface you must configure the /etc/modules.conf as follow:

 alias bond0 bonding
 options bond0 -o bond0 mode=0 miimon=100
 alias bond1 bonding
 options bond1 -o bond1 mode=1 miimon=100

Thursday, September 20, 2007

Linux Shell Script - Random number generation

A small note to generate random numbers in Linux scripts:

To Generate Random numbers between 0 to 10:

echo $[($RANDOM % 10)]

To Generate Random numbers between 1 to 10:

echo $[($RANDOM % 10) + 1]

To Generate Random numbers between 30 to 40:

my_random is a function which accept two integers,

#! /bin/bash

my_random()

{

number=$[($RANDOM % $2) + 1]

while [ $number -lt $1 ]

number=$[($RANDOM % $2) + 1]

done

echo $number

}

my_random 30 40

Thursday, August 30, 2007

Troubleshooting Tips

These tips are copied from some other sites. Basically I am trying to accumulate all informations needed for Linux support to this blog ....

Up to date?

If you are installing new hardware, or having trouble with anything new, do you have the right drivers for any hardware being added? Just because the OS HAS a driver doesn't mean that it is a GOOD driver- for example, SCO OSR5 had a driver for Intel Pro100 cards, but if you wanted something that WORKED, you had to go to SCO's FTP site and download a good driver.
On a similar note, do you have the current recommended patches and updates? Sometimes this is all it takes to fix your problem. It's certainly worth checking.

Evidence
The first rule of troubleshooting is to do no harm. Another way to put that is to say "don't trample all over the evidence". If you aren't careful and methodical in your approach, you may destroy clues that could help you narrow down the source of your problem. For example, Unix and Linux keep a "last accessed" date for files and directories. You can see that date by using "ls -u filename" or "ls -du directoryname". That date can be a critical clue as we'll see in a moment, but it's so easy to lose it. Try this experiment: cd /tmp and make a new directory "testdate". Do

cd /tmp
mkdir testdate
touch testdate/a
ls -lud testdate
sleep 60
ls -lud testdate
# still the same date
ls -l testdate
# nothing to see in there, but
ls -lud testdate
# now the access date has changed
The same change will take place to an ordinary file if you cat it, copy it or access it in any way (because it's the access date!).
(Confused? The "ls -lud" reads the /tmp directory to get info about "testdate". The "ls -l" looks into "testdate" so that's access.)
Needing or wanting to know when something was last accessed comes up all the time, but just a few examples might give you some ideas:
A directory was supposed to be on the backup but isn't. Did the backup program access that directory?
A misbehaving program is supposed to use a certain file during its startup. Did it?
What files does a program try to use? Knowing this can sometimes help you track down where a program is failing when it is too dumb to tell you itself.

Unix systems keep two other dates: modify, and inode change. The modified date is what you see when you use "ls -l" and is the date that the file has been changed (or created, if nobody has changed it since). The inode change date (ls -lc) reflects the time that permissions or ownership have changed (but note that anything that affects the modified time also affects this).
Some systems have a command line "stat" command that shows all three times (and more useful info) at once. Here's the output from "stat" on a Red Hat Linux system:

[tony@linux tony]$ stat .bashrc
File: ".bashrc"
Size: 124 Filetype: Regular File
Mode: (0644/-rw-r--r--) Uid: ( 500/ tony) Gid: ( 500/ tony)
Device: 3,6 Inode: 19999 Links: 1
Access: Fri Aug 18 07:07:17 2000(00000.00:00:13)
Modify: Mon Oct 25 03:51:14 1999(00298.03:16:16)
Change: Mon Oct 25 03:51:14 1999(00298.03:16:16)
Note that "Modify" and "Change" are the same. That's becaue "Change" will also reflect the time that a file was modified. However, if we used chmod or chown on this file, the "Change" date would reflect that and "Modify" would not.
You can get a similar listing on SCO, but you need to know the inode (ls -i), the filesystem the file is on, and then run fsdb on that file system. It's not usually worth the trouble, but it does also tell you where the file's disk blocks are to be found:

# cd /tmp
# l -i y
33184 -rw-r--r-- 1 root sys 143 Aug 11 14:01 y
(33184 is the inode number of the file "y")
# fsdb /dev/root
/dev/root(): HTFS File System
FSIZE = 1922407, ISIZE = 480608
33184i
i#: 33184 md: f---rw-r--r-- ln: 1 uid: 0 gid: 3 sz: 143
a0:337145 a1: 0 a2: 0 a3: 0 a4: 0 a5: 0 a6: 0
a7: 0 a8: 0 a9: 0 a10: 0 a11: 0 a12: 0
at: Tue Aug 15 11:06:35 2000
mt: Fri Aug 11 14:01:10 2000
ct: Fri Aug 11 14:01:10 2000
This tells me the three times, and also that the entire file is located in one block (block 337145). If it were larger, and the next block was not 337146, I'd also know that the file is fragmented (once you get above a9, the rules change; see BFIND.C for a brief introduction to that).
The inode time can be very illuminating: suppose a problem started yesterday, and you can see that "ls -c" shows an inode change then but no "ls -l" change: it might be that ownership or permissions have been changed and are causing your problem.
Of course, if you don't know what the ownership or permissions should be, this may not help a lot. Some systems have a database of file ownership and permissions and can report discrepancies and even return the files to their proper state. SCO Unix, for example, can use the "custom" command to verify software. Old Xenix systems had "fixperm". Linux systems using RPM can verify packages; other Linux package managers have similar capabilities. Add-on security packages like COPS and others can also be useful: although their concern is the security aspects of files changing, their watchfulness can be useful in troubleshooting contexts also.
If you need to know what files a process is using, the "lsof" command (standard with Linux, available from Skunkware for SCO) can help. Here's an example from a Linux system:

# lsof -p 1748

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
httpd 1748 root cwd DIR 8,5 4096 2 /
httpd 1748 root rtd DIR 8,5 4096 2 /
httpd 1748 root txt REG 8,6 242860 361425 /usr/sbin/httpd
httpd 1748 root mem REG 8,5 494250 216911 /lib/ld-2.2.4.so
httpd 1748 root mem REG 8,6 10007 264042 /usr/lib/apache/mod_vhost_alias.so
httpd 1748 root mem REG 8,6 8169 263597 /usr/lib/apache/mod_env.so
httpd 1748 root mem REG 8,6 17794 263604 /usr/lib/apache/mod_log_config.so
httpd 1748 root mem REG 8,6 7562 263603 /usr/lib/apache/mod_log_agent.so
httpd 1748 root mem REG 8,6 8558 263605 /usr/lib/apache/mod_log_referer.so
httpd 1748 root mem REG 8,6 8142 263596 /usr/lib/apache/mod_dir.so
httpd 1748 root mem REG 8,6 370 117962 /usr/lib/locale/en_US/LC_IDENTIFICATION
httpd 1748 root mem REG 8,5 531205 110159 /lib/i686/libpthread-0.9.so
(many lines deleted)
This tool can also list open network connections. For example, if you need to know what process are listening on port 137 , you would use:

# lsof -i :137
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
nmbd 903 root 5u IPv4 1255 UDP *:netbios-ns
nmbd 903 root 7u IPv4 1260 UDP apl:netbios-ns
Another piece of evidence that can be very helpful is when someone has logged in. The "last" comand will give you that information, including how long they were on the system. But "last" gets its data from /etc/wtmp, so you may want to get information about wtmp before running last (keep in mind that wtmp is affected by logins and logouts, so unless you were logged in BEFORE whatever problem you are looking for started, your login has changed wtmp).
How long the system itself has been up is available from "w" or "uptime". Other evidence you may want to collect before beginning a trouble search includes:
• df -v
• du -s
• memsize
System logs may be very useful. However, before you look at them, get the "ls -l", "ls -c" and "ls -u" from the logs- this tells you when the log was last used, etc.- if there's no current information and there should be, those dates are important clues..
You need to know what to look for in the logs. SCO makes it easier by using words like "CONFIG" "NOTICE" and "WARNING" that you can grep for, but on other OS'es you may have to just look manually until you can figure out what sort of key words they would use.
Another often overlooked clue is "what processes are using this file?". You get the answer to that with "fuser", which will return a list of the processes using a file ("lsof" is another very useful tool in this context).
Software problems can be tough. If you have "trace" (Linux "strace", Mac OS X "ktrace" and "kdump"), knowing what files a program tries to read and write can be very useful. A first attempt is to run the program like this:

trace badprog 2> /tmp/badprog.trace
Then examine /tmp/badprog.trace, particularly noting access, open, read and write calls. These will all have the general format like this:

_open ("/etc/default/lang", 0x0) = 4
read (4, "#\t@(#) lang.src 58.1 96/10/09 \n#\n#\tCopyr".., 1024) = 437
...
_open ("/usr/lib/lang/C/C/C/collate", 0x0) = 5
read (5, "".., 1) = 1
The return value of the call is your clue- if it's positive, it's usually OK.For example, the lines above mean that /etc/default/lang was opened correctly ( we would have got a negative number instead of 4 if it failed) and then 437 bytes were read from it- see the "read(4.." ? That 4 means it's reading /etc/default/lang. Later on it opens "/usr/lib/lang/C/C/C/collate" and the return value is "5", the next read is from that file (because it's "read(5..". As you can see, you really don't need to be a programmer or even understand much about this to be able to (possibly) find a problem here (do watch for "close", though- if you aren't paying attention you won't know what file is being read because the numbers get reused).
Sometimes it's helpful to know WHAT files got modified by an application. To find that out, this can help:

touch /tmp/rightnow
runapplication
find / -newer /tmp/rightnow

Keep it simple, stupid

The kiss ("keep it simple stupid") principle is always good to follow. Removing unnecessary hardware, making the network smaller, etc. are good examples. Dirk Hart posted this in response to a problem that involved one of those nasty sub-nets that make our brains hurt:

In this situation, I would use a netmask of 255.255.255.0 on the '486
(and on another pc to avoid confusion) and determine that the card is
indeed working on the 486. _Then_ I would worry about binary arithmetic.

'It don't mean a thing if it ain't got that ping'

Regards, Dirk

Testing Specific Systems

You may have a very good idea of what you need to concentrate on. If a printer isn't working, bad disk blocks aren't a likely suspect. However, the problem may be less obvious: data corruption in the middle of a file could be bad disk blocks, bad memory, bad cpu, bad disk controller, a bad DMA controller.. how do you narrow this stuff down? Well, honestly, sometimes it can be hellishly difficult, particularly if you are pressed for time. If you don't know where to start, and you aren't sure if the problem is hardware or software, checking the hardware is probably the best place to start. Unfortunately, some of the things you'll need to for that require that the system be in single user mode.
Single user mode does NOT mean that just one person is logged in. Specifically, it means that the system has been brought to init state 1, usually by typing "init 1" while logged in as root at the console.

Things you can check while still multi-user

Log files. On some sytems, "dmesg" gives you a lot of information, on others (SCO, for instance) it doesn't give you much and you have to look at the logs yourself. You'll find these logs in /var/log or /usr/adm (SCO). The "syslog" is often particularly useful. If printing is the problem, the printlogs are in /usr/spool/lp/logs (SCO), or the "lf" entry in /etc/printcap points you to them.
Network statistics. If the network isn't working right, all kinds of other things may be in trouble. To be good at this, you really need to understand networking more than I want to try to cover here, but I will cover some basic points:
• If you can ping 127.0.0.1, your tcp/ip is working but that says NOTHING about your NIC card. As pinging your own ip address will get re-routed to 127.0.0.1, pinging your own ip address won't prove anything about your NIC either.
• Always try netstat and arp commands with the "-n" flags to avoid name resolution lookups- this keeps everything local and aids your troubleshooting.
You can see what network ports are in use with "netstat -an" (lsof is also useful for this). Let's just take a quick look :

Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp 0 0 64.13.44.12.1254 209.167.40.69.119 ESTABLISHED
tcp 0 0 64.13.44.12.1252 63.209.14.66.119 CLOSE_WAIT
tcp 0 0 64.13.44.12.1216 216.71.1.37.80 CLOSE_WAIT
tcp 0 0 10.1.36.3.1085 10.1.36.100.23 ESTABLISHED
tcp 0 0 *.80 *.* LISTEN
What can I tell from this? First, I'm reading news from 209.167.40.69. How do I know that? See the "119" that follows that address? If I "grep 119 /etc/services, I get:

nntp 119/tcp readnews untp # USENET News Transfer Protocol
I also was reading news from another site, but I'm not presently- that's why that line says "CLOSE_WAIT" rather than "ESTABLISHED". I had a connection to a web page (the "80"- grep it from /etc/services if you don't recognize it) and I have a telnet session (23) open to 10.1.36.100. This machine also has a web-server, and it's LISTENing on port 80. Of course, none of this means anything unless you know what's SUPPOSED to be going on. So, if the web server isn't working, and you "netstat -an | grep LISTEN" and don't see *.80 in the list, that's why it isn't working. Now WHY isn't it working? If you try to start it and it fails, its logs will likely tell you what the problem is. If not, then maybe trace or strace can give you a clue.
A system that is running, but slowly, can be hard to figure out. The slowness is either coming from the CPU or the hard drive. The "sar" program (available for Linux, too) will let you figure out which pretty quickly. First try"sar 5 5". That gives you information about your CPU. Then, "sar -d 5 5" will tell you about your hard drives. The only problem is that, unless it's flat obvious (CPU at 0% idle or hard drives showing 1,000 msec in the avserv column), you don't know if what you are seeing is normal or abnormal for this machine running its typical load. If sar has been set up and employed properly, you'll have historical data (in /var/adm/sa on SCO) to compare, but otherwise you have to just use your best judgement, and if you haven't had much experience on similar systems, you just won't know. See Sarcheck Review(May 1999) for more information.
If CPU is the problem, a simple "ps -e | sort -r +2 | head" can tell you a lot, particularly if you put it in a loop like this:
while true
do
ps -e | sort -r +2 | head
sleep 10
done
If that shows a process gaining a substantial amount of time during the 10 second sleep, that process is using a lot of time- if it gained 5 seconds, for example, it is using 50% of your cpu! You might also use "ps -ef" and look at the "C" column (just before the time)- that's total CPU usage.
Skunkware has "iohog", "memhog", and "cpuhog". These can help you pinpoint misbehaving processes. Download them all from http://www.caldera.com/skunkware/osr5/vols/hog-1.1-VOLS.tar
If you are slow on a network connection, "netstat -in" and (on SCO) "llistat -l" can show you problems. Don't let DNS issues confuse you: if, for example, a telnet session is slow to connect but then works fine, that's DNS- either the connecting machine can't resolve the server's name or vice-versa. Remember that: the server is going to try to resolve the name of the client, and if it can't, there will be a delay- possibly a long delay.
Sometimes it helps to know just what equipment is in the machine. SCO has the "hw -v" command and "hwconfig", Windows has its Control Panel -> System -> Device Manager, and linux has lsdev, lspci, pnpdump and, of course, dmesg.
See Why is my system slow for more on this.
If you are having troubles, particularly during an installation, take out any hardware you don't need right now. You can always put it in later.
If you can identify a good-sized file that is not being used by anyone (check with fuser or lsof), running repeated "sum -r"'s on that file should always produce the same result. If it does not, and you are certain no-one else is modifying the file, then the hard disk, memory, disk controller, dma controller or cpu could be suspect. Gee, doesn't that narrow it down? Well, maybe we can get a little better:
The first time you sum the file, the information should be read from the hard drive directly unless it is already in cache( to effectively flush the cache, you need to know how big it is so you can sum, or just cat to /dev/null, some other, larger file which will overwrite any trace of this file). On SCO, the cache size can be found by

grep "i/o bufs" /usr/adm/messages
On linux "dmesg | grep cache" will get you what you need. If you pick your file size to be slightly less than cache, you've got a good shot at getting it all into memory (assuming no one else is using the machine just then).
The second time you sum, some portion or even all of the file will be read from memory, so if the sum's were constant after flushing cache, but not otherwise, memory would be suspect.
You might get a clue about DMA by testing other devices that use it- the floppy disk is a good candidate: if data can be reliably written and read from floppy (use sum -r to check it) but hard drive data is changing, then it's probably NOT memory or motherboard problems.

Things you need to be in single-user mode for

The first thing you will want to do is run fsck. Except for root, file systems should be unmounted when running this on them- you could umount /home while multiuser, of course, but since you are going to be checking all of them, you may as well be single-user. Be sure to check the man page; SCO, for example, doesn't do a full fsck on their modern versions unless you specify "-ofull" (see Filesystems for a more complete discussion of fsck).
What does fsck tell you? Well, if you have hard disk problems, fsck is going to trip over them, so that can be useful. Because fsck uses a fair amount of memory as it's working, memory problems will probably affect it -unfortunately, the effect may be unpleasant, so if you suspect that memory could be an issue, try creating a floppy file system first ( just "mkfs /dev/fd0" is sufficient ) and fsck it- that won't make you 100% sure that fsck isn't going to ruin your life for reasons beyond its control, but it gives you a little more confidence. I wouldn't fret over this- if you've had bad memory problems, things on your disk are likely not healthy already.
After fsck, you want to check the hard drive. SCO has "badtrk", Linux has "badblocks". Read the man pages carefully- you do NOT want to accidentally destroy your system, and these things can.

Rebooting

If your problems began after linking a new kernel, maybe you should boot with the previous version. On SCO, you automatically get "unix.old"; just type that at the Boot prompt. On Linux, you get the old kernel if you bothered to arrange for that in lilo.conf (or whatever your boot manger uses). Try hitting TAB at the boot prompt to see if you have other choices.

Logging

Maybe you still can't figure it out. Some file is getting trashed for unknown reasons and you can't get a handle on it. Perhaps you can set up some cron or background process to try to catch whatever is going on in the act. Something like this:

while true
do
sleep 300
sum -r suspect_file
fuser (or lsof) suspect file
done > /tmp/mylog &
You can get more sophisticated; if I have to leave this for days or weeks I send the sum to temporary files, compare them and delete them if nothing has changed; or I'll collect the fuser output and run it through "ps" so I can see who or what was responsible.

Seeing the invisible

Sometimes you need to see what's really going on without interpretation. For dumb terminals, you can often turn on a "monitor" mode in the terminal's setup that will then display the hex characters of any control characters rather than acting on them. For a telnet session, you can escape to the telnet prompt (generally CTRL-] ) and type "set netdata", then escape again and "set prettydump".
Another way to capture everything is to use "script". For example, "script /tmp/mystuff". Then do whatever you are doing, and when you are done, CTRL-D to end script. The /tmp/mystuff contains everything that happened, including control characters; you can examine it with vi or hd.

Tuesday, August 14, 2007

Mdadm problems adding disk back to software raid array

What you did should have worked. But since it didn’t, now would be a good time to backup anything of value on the system.

If the problem persists after a clean reboot, try adding the missing raid members after booting into rescue mode or booting using the CentOS Live-CD, but don’t search/mount/chroot the installation. Rescue mode will probably not start the raids without search/mount, so:

# mdadm -A --run /dev/md1 /dev/sda2
# mdadm /dev/md1 -a /dev/sdb2

If you still cannot add the missing members, try recreating a raid1 from rescue mode or Live-CD:

# mdadm -S /dev/md1
# mdadm -C /dev/md1 -l1 -n2 /dev/sda2 missing
# mdadm /dev/md1 -a /dev/sdb2

If that doesn’t work, overwrite the first few MB of the sdb partition(s), reboot and try adding it/them again.

# dd if=/dev/zero of=/dev/sdb2 bs=1M count=8
# init 6
# # After reboot
# mdadm /dev/md1 -a /dev/sdb2

If at any point you reassemble the raids, it would be a good idea to let them rebuild before rebooting.

Monday, May 28, 2007

100 Linux Questions

1. You attempt to use shadow passwords but are unsuccessful. What characteristic of the /etc/passwd file may cause this? Choose one: a. The login command is missing. b. The username is too long. c. The password field is blank. d. The password field is prefaced by an asterick.
2. You create a new user account by adding the following line to your /etc/passwd file. bobm:baddog:501:501:Bob Morris:/home/bobm:/bin/bash Bob calls you and tells you that he cannot logon. You verify that he is using the correct username and password. What is the problem? Choose one: a. The UID and GID cannot be identical. b. You cannot have spaces in the line unless they are surrounded with double quotes. c. You cannot directly enter the password; rather you have to use the passwd command to assign a password to the user. d. The username is too short, it must be at least six characters long.
3. Which of the following tasks is not necessary when creating a new user by editing the /etc/passwd file? Choose one: a. Create a link from the user’s home directory to the shell the user will use. b. Create the user’s home directory c. Use the passwd command to assign a password to the account. d. Add the user to the specified group.
4. You create a new user by adding the following line to the /etc/passwd file bobm::501:501:Bob Morris:/home/bobm:/bin/bash You then create the user’s home directory and use the passwd command to set his password. However, the user calls you and says that he cannot log on. What is the problem? Choose one: a. The user did not change his password. b. bobm does not have permission to /home/bobm. c. The user did not type his username in all caps. d. You cannot leave the password field blank when creating a new user.
5. When using useradd to create a new user account, which of the following tasks is not done automatically. Choose one: a. Assign a UID. b. Assign a default shell. c. Create the user’s home directory. d. Define the user’s home directory.
6. You issue the following command useradd -m bobm But the user cannot logon. What is the problem? Choose one: a. You need to assign a password to bobm’s account using the passwd command. b. You need to create bobm’s home directory and set the appropriate permissions. c. You need to edit the /etc/passwd file and assign a shell for bobm’s account. d. The username must be at least five characters long.
7. You have created special configuration files that you want copied to each user’s home directories when creating new user accounts. You copy the files to /etc/skel. Which of the following commands will make this happen? Choose one: a. useradd -m username b. useradd -mk username c. useradd -k username d. useradd -Dk username
8. Mary has recently gotten married and wants to change her username from mstone to mknight. Which of the following commands should you run to accomplish this? Choose one: a. usermod -l mknight mstone b. usermod -l mstone mknight c. usermod -u mknight mstone d. usermod -u mstone mknight
9. After bob leaves the company you issue the command userdel bob. Although his entry in the /etc/passwd file has been deleted, his home directory is still there. What command could you have used to make sure that his home directory was also deleted? Choose one: a. userdel -m bob b. userdel -u bob c. userdel -l bob d. userdel -r bob
10. All groups are defined in the /etc/group file. Each entry contains four fields in the following order. Choose one: a. groupname, password, GID, member list b. GID, groupname, password, member list c. groupname, GID, password, member list d. GID, member list, groupname, password
11. You need to create a new group called sales with Bob, Mary and Joe as members. Which of the following would accomplish this? Choose one: a. Add the following line to the /etc/group file: sales:44:bob,mary,joe b. Issue the command groupadd sales. c. Issue the command groupadd -a sales bob,mary,joe d. Add the following line to the /etc/group file: sales::44:bob,mary,joe
12. What command is used to remove the password assigned to a group?
13. You changed the GID of the sales group by editing the /etc/group file. All of the members can change to the group without any problem except for Joe. He cannot even login to the system. What is the problem? Choose one: a. Joe forgot the password for the group. b. You need to add Joe to the group again. c. Joe had the original GID specified as his default group in the /etc/passwd file. d. You need to delete Joe’s account and recreate it.
14. You need to delete the group dataproject. Which two of the following tasks should you do first before deleting the group? A. Check the /etc/passwd file to make sure no one has this group as his default group. B. Change the members of the dataproject group to another group besides users. C. Make sure that the members listed in the /etc/group file are given new login names. D. Verify that no file or directory has this group listed as its owner. Choose one: a. A and C b. A and D c. B and C d. B and D
15. When you look at the /etc/group file you see the group kmem listed. Since it does not own any files and no one is using it as a default group, can you delete this group?
16. When looking at the /etc/passwd file, you notice that all the password fields contain ‘x’. What does this mean? Choose one: a. That the password is encrypted. b. That you are using shadow passwords. c. That all passwords are blank. d. That all passwords have expired.
17. In order to improve your system’s security you decide to implement shadow passwords. What command should you use?
18. What file contains the default environment variables when using the bash shell? Choose one: a. ~/.profile b. /bash c. /etc/profile d. ~/bash
19. You have created a subdirectory of your home directory containing your scripts. Since you use the bash shell, what file would you edit to put this directory on your path? Choose one: a. ~/.profile b. /etc/profile c. /etc/bash d. ~/.bash
20. Which of the following interprets your actions when typing at the command line for the operating system? Choose One a. Utility b. Application c. Shell d. Command
21. What can you type at a command line to determine which shell you are using?
22. You want to enter a series of commands from the command-line. What would be the quickest way to do this? Choose One a. Press enter after entering each command and its arguments b. Put them in a script and execute the script c. Separate each command with a semi-colon (;) and press enter after the last command d. Separate each command with a / and press enter after the last command
23. You are entering a long, complex command line and you reach the right side of your screen before you have finished typing. You want to finish typing the necessary commands but have the display wrap around to the left. Which of the following key combinations would achieve this? Choose One a. Esc, /, Enter b. /, Enter c. ctrl-d, enter d. esc, /, ctrl-d
24. After typing in a new command and pressing enter, you receive an error message indicating incorrect syntax. This error message originated from.. Choose one a. The shell b. The operating system c. The command d. The kernel
25. When typing at the command line, the default editor is the _____________ library.
26. You typed the following at the command line ls -al /home/ hadden. What key strokes would you enter to remove the space between the ‘/’ and ‘hadden’ without having to retype the entire line? Choose one a. Ctrl-B, Del b. Esc-b, Del c. Esc-Del, Del d. Ctrl-b, Del
27. You would like to temporarily change your command line editor to be vi. What command should you type to change it?
28. After experimenting with vi as your command line editor, you decide that you want to have vi your default editor every time you log in. What would be the appropriate way to do this? Choose one a. Change the /etc/inputrc file b. Change the /etc/profile file c. Change the ~/.inputrc file d. Change the ~/.profile file
29. You have to type your name and title frequently throughout the day and would like to decrease the number of key strokes you use to type this. Which one of your configuration files would you edit to bind this information to one of the function keys?
30. In your present working directory, you have the files maryletter memo1 MyTelephoneandAddressBook What is the fewest number of keys you can type to open the file MyTelephoneandAddressBook with vi? Choose one a. 6 b. 28 c. 25 d. 4
31. A variable that you can name and assign a value to is called a _____________ variable.
32. You have installed a new application but when you type in the command to start it you get the error message Command not found. What do you need to do to fix this problem? Choose one a. Add the directory containing the application to your path b. Specify the directory’s name whenever you run the application c. Verify that the execute permission has been applied to the command. d. Give everyone read, write and execute permission to the application’s directory.
33. You telnet into several of your servers simultaneously. During the day, you sometimes get confused as to which telnet session is connected to which server. Which of the following commands in your .profile file would make it obvious to which server you are attached? Choose one a. PS1=’\h: \w>’ b. PS1=’\s: \W>’ c. PS1=’\!: \t>’ d. PS1=’\a: \n>’
34. Which of the following environment variables determines your working directory at the completion of a successful login? Choose one a. HOME b. BASH_ENV c. PWD d. BLENDERDIR
35. Every time you attempt to delete a file using the rm utility, the operating system prompts you for confirmation. You know that this is not the customary behavior for the rm command. What is wrong? Choose one a. rm has been aliased as rm -i b. The version of rm installed on your system is incorrect. c. This is the normal behavior of the newest version of rm. d. There is an incorrect link on your system.
36. You are running out of space in your home directory. While looking for files to delete or compress you find a large file called .bash_history and delete it. A few days later, it is back and as large as before. What do you need to do to ensure that its size is smaller? Choose one a. Set the HISTFILESIZE variable to a smaller number. b. Set the HISTSIZE to a smaller number. c. Set the NOHISTFILE variable to true. d. Set the HISTAPPEND variable to true.
37. In order to display the last five commands you have entered using the history command, you would type ___________.
38. In order to display the last five commands you have entered using the fc command, you would type ___________.
39. You previously ran the find command to locate a particular file. You want to run that command again. What would be the quickest way to do this? Choose one a. fc -l find fc n b. history -l find history n c. Retype the command d. fc -n find
40. Using command substitution, how would you display the value of the present working directory? Choose one a. echo $(pwd) b. echo pwd c. $pwd d. pwd | echo
41. You need to search the entire directory structure to locate a specific file. How could you do this and still be able to run other commands while the find command is still searching for your file? Choose one a. find / -name filename & b. find / -name filename c. bg find / -name filename d. &find / -name filename &
42. In order to create a file called DirContents containing the contents of the /etc directory you would type ____________.
43. What would be displayed as the result of issuing the command ps ef? Choose one a. A listing of the user’s running processes formatted as a tree. b. A listing of the stopped processes c. A listing of all the running processes formatted as a tree. d. A listing of all system processes formatted as a tree.
44. What utility can you use to show a dynamic listing of running processes? __________
45. The top utility can be used to change the priority of a running process? Another utility that can also be used to change priority is ___________?
46. What key combination can you press to suspend a running job and place it in the background?
47. You issue the command jobs and receive the following output: [1]- Stopped (tty output) pine [2]+ Stopped (tty output) MyScript How would you bring the MyScript process to the foreground? Choose one: a. fg %2 b. ctrl-c c. fg MyScript d. ctrl-z
48. You enter the command cat MyFile | sort > DirList & and the operating system displays [4] 3499 What does this mean? Choose one a. This is job number 4 and the PID of the sort command is 3499. b. This is job number 4 and the PID of the job is 3499. c. This is job number 3499 and the PID of the cat command is 4. d. This is job number 4 and the PID of the cat command is 3499.
49. You attempt to log out but receive an error message that you cannot. When you issue the jobs command, you see a process that is running in the background. How can you fix this so that you can logout? Choose one a. Issue the kill command with the PID of each running command of the pipeline as an argument. b. Issue the kill command with the job number as an argument. c. Issue the kill command with the PID of the last command as an argument. d. Issue the kill command without any arguments.
50. You have been given the job of administering a new server. It houses a database used by the sales people. This information is changed frequently and is not duplicated anywhere else. What should you do to ensure that this information is not lost? Choose one a. Create a backup strategy that includes backing up this information at least daily. b. Prepare a proposal to purchase a backup server c. Recommend that the server be made part of a cluster. d. Install an additional hard drive in the server.
51. When planning your backup strategy you need to consider how often you will perform a backup, how much time the backup takes and what media you will use. What other factor must you consider when planning your backup strategy? _________
52. Many factors are taken into account when planning a backup strategy. The one most important one is how often does the file ____________.
53. Which one of the following factors does not play a role in choosing the type of backup media to use? Choose one: a. How frequently a file changes b. How long you need to retain the backup c. How much data needs to be backed up d. How frequently the backed up data needs to be accessed
54. When you only back up one partition, this is called a ______ backup. Choose one a. Differential b. Full c. Partial d. Copy
55. When you back up only the files that have changed since the last backup, this is called a ______ backup. Choose one a. Partial b. Differential c. Full d. Copy
56. The easiest, most basic form of backing up a file is to _____ it to another location.
57. When is the most important time to restore a file from your backup? Choose one a. On a regular scheduled basis to verify that the data is available. b. When the system crashes. c. When a user inadvertently loses a file. d. When your boss asks to see how restoring a file works.
58. As a system administrator, you are instructed to backup all the users’ home directories. Which of the following commands would accomplish this? Choose one a. tar rf usersbkup home/* b. tar cf usersbkup home/* c. tar cbf usersbkup home/* d. tar rvf usersbkup home/*
59. What is wrong with the following command? tar cvfb / /dev/tape 20 Choose one a. You cannot use the c option with the b option. b. The correct line should be tar -cvfb / /dev/tape20. c. The arguments are not in the same order as the corresponding modifiers. d. The files to be backed up have not been specified.
60. You need to view the contents of the tarfile called MyBackup.tar. What command would you use? __________
61. After creating a backup of the users’ home directories called backup.cpio you are asked to restore a file called memo.ben. What command should you type?
62. You want to create a compressed backup of the users’ home directories so you issue the command gzip /home/* backup.gz but it fails. The reason that it failed is that gzip will only compress one _______ at a time.
63. You want to create a compressed backup of the users’ home directories. What utility should you use?
64. You routinely compress old log files. You now need to examine a log from two months ago. In order to view its contents without first having to decompress it, use the _________ utility.
65. Which two utilities can you use to set up a job to run at a specified time? Choose one: a. at and crond b. atrun and crontab c. at and crontab d. atd and crond
66. You have written a script called usrs to parse the passwd file and create a list of usernames. You want to have this run at 5 am tomorrow so you can see the results when you get to work. Which of the following commands will work? Choose one: a. at 5:00 wed usrs b. at 5:00 wed -b usrs c. at 5:00 wed -l usrs d. at 5:00 wed -d usrs
67. Several of your users have been scheduling large at jobs to run during peak load times. How can you prevent anyone from scheduling an at job? Choose one: a. delete the file /etc/at.deny b. create an empty file called /etc/at.deny c. create two empty files: /etc/at.deny and /etc/at.allow file d. create an empty file called /etc/at.allow
68. How can you determine who has scheduled at jobs? Choose one: a. at -l b. at -q c. at -d d. atwho
69. When defining a cronjob, there are five fields used to specify when the job will run. What are these fields and what is the correct order? Choose one: a. minute, hour, day of week, day of month, month b. minute, hour, month, day of month, day of week c. minute, hour, day of month, month, day of week d. hour, minute, day of month, month, day of week
70. You have entered the following cronjob. When will it run? 15 * * * 1,3,5 myscript Choose one: a. at 15 minutes after every hour on the 1st, 3rd and 5th of each month. b. at 1:15 am, 3:15 am, and 5:15 am every day c. at 3:00 pm on the 1st, 3rd, and 5th of each month d. at 15 minutes after every hour every Monday, Wednesday, and Friday
71. As the system administrator you need to review Bob’s cronjobs. What command would you use? Choose one: a. crontab -lu bob b. crontab -u bob c. crontab -l d. cronq -lu bob
72. In order to schedule a cronjob, the first task is to create a text file containing the jobs to be run along with the time they are to run. Which of the following commands will run the script MyScript every day at 11:45 pm? Choose one: a. * 23 45 * * MyScript b. 23 45 * * * MyScript c. 45 23 * * * MyScript d. * * * 23 45 MyScript
73. Which daemon must be running in order to have any scheduled jobs run as scheduled? Choose one: a. crond b. atd c. atrun d. crontab
74. You want to ensure that your system is not overloaded with users running multiple scheduled jobs. A policy has been established that only the system administrators can create any scheduled jobs. It is your job to implement this policy. How are you going to do this? Choose one: a. create an empty file called /etc/cron.deny b. create a file called /etc/cron.allow which contains the names of those allowed to schedule jobs. c. create a file called /etc/cron.deny containing all regular usernames. d. create two empty files called /etc/cron.allow and /etc/cron.deny
75. You notice that your server load is exceptionally high during the hours of 10 am to 2 noon. When investigating the cause, you suspect that it may be a cron job scheduled by one of your users. What command can you use to determine if your suspicions are correct? Choose one: a. crontab -u b. crond -u c. crontab -l d. crond -l
76. One of your users, Bob, has created a script to reindex his database. Now he has it scheduled to run every day at 10:30 am. What command should you use to delete this job. Choose one: a. crontab -ru bob b. crontab -u bob c. crontab -du bob d. crontab -lu bob
77. What daemon is responsible for tracking events on your system?
78. What is the name and path of the default configuration file used by the syslogd daemon?
79. You have made changes to the /etc/syslog.conf file. Which of the following commands will cause these changes to be implemented without having to reboot your computer? Choose one: a. kill SIGHINT `cat /var/run/syslogd.pid` b. kill SIGHUP `cat /var/run/syslogd.pid` c. kill SIGHUP syslogd d. kill SIGHINT syslogd
80. Which of the following lines in your /etc/syslog.conf file will cause all critical messages to be logged to the file /var/log/critmessages? Choose one: a. *.=crit /var/log/critmessages b. *crit /var/log/critmessages c. *=crit /var/log/critmessages d. *.crit /var/log/critmessages
81. You wish to have all mail messages except those of type info to the /var/log/mailmessages file. Which of the following lines in your /etc/syslogd.conf file would accomplish this? Choose one: a. mail.*;mail!=info /var/log/mailmessages b. mail.*;mail.=info /var/log/mailmessages c. mail.*;mail.info /var/log/mailmessages d. mail.*;mail.!=info /var/log/mailmessages
82. What is the name and path of the main system log?
83. Which log contains information on currently logged in users? Choose one: a. /var/log/utmp b. /var/log/wtmp c. /var/log/lastlog d. /var/log/messages
84. You have been assigned the task of determining if there are any user accounts defined on your system that have not been used during the last three months. Which log file should you examine to determine this information? Choose one: a. /var/log/wtmp b. /var/log/lastlog c. /var/log/utmp d. /var/log/messages
85. You have been told to configure a method of rotating log files on your system. Which of the following factors do you not need to consider? Choose one: a. date and time of messages b. log size c. frequency of rotation d. amount of available disk space
86. What utility can you use to automate rotation of logs?
87. You wish to rotate all your logs weekly except for the /var/log/wtmp log which you wish to rotate monthly. How could you accomplish this. Choose one: a. Assign a global option to rotate all logs weekly and a local option to rotate the /var/log/wtmp log monthly. b. Assign a local option to rotate all logs weekly and a global option to rotate the /var/log/wtmp log monthly. c. Move the /var/log/wtmp log to a different directory. Run logrotate against the new location. d. Configure logrotate to not rotate the /var/log/wtmp log. Rotate it manually every month.
88. You have configured logrotate to rotate your logs weekly and keep them for eight weeks. You are running our of disk space. What should you do? Choose one: a. Quit using logrotate and manually save old logs to another location. b. Reconfigure logrotate to only save logs for four weeks. c. Configure logrotate to save old files to another location. d. Use the prerotate command to run a script to move the older logs to another location.
89. What command can you use to review boot messages?
90. What file defines the levels of messages written to system log files?
91. What account is created when you install Linux?
92. While logged on as a regular user, your boss calls up and wants you to create a new user account immediately. How can you do this without first having to close your work, log off and logon as root? Choose one: a. Issue the command rootlog. b. Issue the command su and type exit when finished. c. Issue the command su and type logoff when finished. d. Issue the command logon root and type exit when finished.
93. Which file defines all users on your system? Choose one: a. /etc/passwd b. /etc/users c. /etc/password d. /etc/user.conf
94. There are seven fields in the /etc/passwd file. Which of the following lists all the fields in the correct order? Choose one: a. username, UID, GID, home directory, command, comment b. username, UID, GID, comment, home directory, command c. UID, username, GID, home directory, comment, command d. username, UID, group name, GID, home directory, comment
95. Which of the following user names is invalid? Choose one: a. Theresa Hadden b. thadden c. TheresaH d. T.H.
96. In order to prevent a user from logging in, you can add a(n) ________at the beginning of the password field.
97. The beginning user identifier is defined in the _________ file.
98. Which field is used to define the user’s default shell?
99. Bob Armstrong, who has a username of boba, calls to tell you he forgot his password. What command should you use to reset his command?
100. Your company has implemented a policy that users’ passwords must be reset every ninety days. Since you have over 100 users you created a file with each username and the new password. How are you going to change the old passwords to the new ones? Choose one: a. Use the chpasswd command along with the name of the file containing the new passwords. b. Use the passwd command with the -f option and the name of the file containing the new passwords. c. Open the /etc/passwd file in a text editor and manually change each password. d. Use the passwd command with the -u option.