GL314 LINUX TROUBLESHOOTING
RHEL7 SLES12
The contents of this course and all its modules and related materials, including handouts to audience members, are copyright ©2016 Guru Labs L.C.
No part of this publication may be stored in a retrieval system, transmitted or reproduced in any way, including, but not limited to, photocopy, photograph, magnetic, electronic or other record, without the prior written permission of Guru Labs.
This curriculum contains proprietary information which is for the exclusive use of customers of Guru Labs L.C., and is not to be shared with personnel other than those in attendance at this course. This instructional program, including all material provided herein, is supplied without any guarantees from Guru Labs L.C. Guru Labs L.C. assumes no liability for damages or legal action arising from the use or misuse of contents or details contained herein. Photocopying any part of this manual without prior written consent of Guru Labs L.C. is a violation of federal law. This manual should not appear to be a photocopy. If you believe that Guru Labs training materials are being photocopied without permission, please email
[email protected] or call 1-801-298-5227. Guru Labs L.C. accepts no liability for any claims, demands, losses, damages, costs or expenses suffered or incurred howsoever arising from or in connection with the use of this courseware. All trademarks are the property of their respective owners.
Version: GL314S-R7S12-H01
Table of Contents

Chapter 1 TROUBLESHOOTING METHODOLOGY . . . . 1
  The Troubleshooting Mindset . . . . 2
  Evaluating Possible Solutions . . . . 3
  Identifying and Implementing Change . . . . 4
  Define and Follow Policies . . . . 5
  Working with Others . . . . 6
  Finding Documentation . . . . 7
  Finding Help Online . . . . 9

Chapter 2 TROUBLESHOOTING TOOLS . . . . 1
  Common Troubleshooting Tools . . . . 2
  RPM Queries . . . . 3
  RPM Verification . . . . 4
  SRPM and spec Files . . . . 5
  Hardware Discovery Tools . . . . 6
  Configuring New Hardware with hwinfo . . . . 7
  strace and ltrace . . . . 8
  lsof and fuser . . . . 10
  ipcs and ipcrm . . . . 12
  iostat, mpstat, and vmstat . . . . 13
  Using hdparm to Measure . . . . 15
  Troubleshooting with the ip command . . . . 16
  Name Resolution . . . . 18
  ss/netstat and rpcinfo . . . . 20
  nmap . . . . 21
  Netcat . . . . 22
  tcpdump and wireshark . . . . 23
  Lab Tasks . . . . 26
    1. Determining the System's Configuration . . . . 27
    2. Troubleshooting with rpm . . . . 42
    3. Process Related Tools . . . . 45
    4. Network Tools . . . . 53

Chapter 3 RESCUE ENVIRONMENTS . . . . 1
  Diagnostic/Recovery . . . . 2
  Rescue Procedures . . . . 3
  Recovery: mount & chroot . . . . 4
  Recovery Examples . . . . 6
  Recovery: Network Utilities . . . . 7
  Lab Tasks . . . . 8
    1. Recovery Runlevels . . . . 9
    2. Recovering Damaged MBR . . . . 11
    3. Recover from Deleted Critical Files . . . . 15

Chapter 4 TOPIC GROUP 1 . . . . 1
  Linux Boot Process . . . . 2
  System Boot Method Overview . . . . 3
  systemd System and Service Manager . . . . 4
  Using systemd . . . . 6
  Booting Linux on PCs . . . . 8
  Troubleshooting With GRUB 2 . . . . 10
  Boot Process Troubleshooting . . . . 12
  Troubleshooting: Linux and Init . . . . 13
  Process Management . . . . 14
  Process Management Tools . . . . 15
  Troubleshooting Processes: top . . . . 16
  Filesystem Concepts . . . . 17
  Filesystem Troubleshooting . . . . 18
  Backup Concepts . . . . 20
  Backup Troubleshooting . . . . 21
  Backup Troubleshooting . . . . 22
  Lab Tasks . . . . 23
    1. Troubleshooting Problems: Topic Group 1 . . . . 24

Chapter 5 TOPIC GROUP 2 . . . . 1
  Networking Tools . . . . 2
  Linux Network Interfaces . . . . 3
  Networking Commands Review . . . . 5
  NetworkManager . . . . 6
  Networking Troubleshooting . . . . 8
  Networking Troubleshooting . . . . 9
  Virtual Interfaces/IP Aliases . . . . 10
  Network Teaming . . . . 11
  Xinetd Concepts . . . . 15
  Xinetd Troubleshooting . . . . 16
  TCP Wrappers Concepts . . . . 17
  TCP Wrappers Concepts . . . . 18
  TCP Wrappers Troubleshooting . . . . 19
  Netfilter/iptables Concepts . . . . 20
  Netfilter/iptables Troubleshooting . . . . 21
  Lab Tasks . . . . 22
    1. Troubleshooting Problems: Topic Group 2 . . . . 23

Chapter 6 TOPIC GROUP 3 . . . . 1
  X11 Concepts . . . . 2
  X11 Server Operation . . . . 3
  X11 Troubleshooting . . . . 4
  Rsyslog Concepts . . . . 6
  System Logging . . . . 7
  systemd Journal . . . . 9
  systemd Journal's journalctl . . . . 11
  Secure Logging with Journal's Log Sealing . . . . 13
  Syslog Troubleshooting . . . . 15
  RPM Concepts . . . . 16
  RPM Troubleshooting . . . . 17
  Common Unix Printing System (CUPS) . . . . 18
  CUPS Troubleshooting . . . . 20
  CUPS Troubleshooting . . . . 21
  at & cron . . . . 22
  at & cron Usage . . . . 23
  at & cron Troubleshooting . . . . 24
  Lab Tasks . . . . 25
    1. Troubleshooting Problems: Topic Group 3 . . . . 26

Chapter 7 TOPIC GROUP 4 . . . . 1
  Users and Groups . . . . 2
  Users and Groups Troubleshooting . . . . 3
  PAM Concepts . . . . 4
  PAM Troubleshooting . . . . 5
  Filesystem Quotas . . . . 6
  Quotas Troubleshooting . . . . 7
  File Access Control Lists . . . . 8
  FACL Troubleshooting . . . . 9
  SELinux Concepts . . . . 10
  SELinux Troubleshooting . . . . 12
  SELinux Troubleshooting Continued . . . . 14
  Lab Tasks . . . . 16
    1. Troubleshooting Problems: Topic Group 4 . . . . 17

Chapter 8 TOPIC GROUP 5 . . . . 1
  Kernel Modules . . . . 2
  Kernel Modules Troubleshooting . . . . 3
  Logical Volume Management . . . . 4
  Creating Logical Volumes . . . . 5
  LVM Deployment Issues . . . . 6
  VG Migration, PV Resizing & Troubleshooting . . . . 7
  Software RAID Overview . . . . 9
  RAID Troubleshooting . . . . 10
  Multipathing Overview . . . . 11
  SAN Multipathing . . . . 12
  Multipath Configuration . . . . 13
  Multipathing Best Practices . . . . 15
  LDAP and OpenLDAP . . . . 17
  Troubleshooting OpenLDAP . . . . 18
  NIS and NIS+ (YP) . . . . 20
  NIS Troubleshooting Aids . . . . 21
  Lab Tasks . . . . 22
    1. Troubleshooting Problems: Topic Group 5 . . . . 23

Chapter 9 TOPIC GROUP 6 . . . . 1
  DNS Concepts . . . . 2
  DNS Troubleshooting . . . . 3
  DNS Troubleshooting . . . . 4
  Apache Concepts . . . . 5
  Apache Troubleshooting . . . . 6
  Apache Troubleshooting . . . . 7
  FTP Concepts . . . . 8
  FTP Troubleshooting . . . . 9
  Squid Concepts . . . . 10
  Squid Troubleshooting . . . . 11
  Lab Tasks . . . . 12
    1. Troubleshooting Problems: Topic Group 6 . . . . 13

Chapter 10 TOPIC GROUP 7 . . . . 1
  Samba Concepts . . . . 2
  Samba Troubleshooting . . . . 3
  Postfix Concepts . . . . 4
  Postfix Troubleshooting . . . . 6
  Postfix Troubleshooting . . . . 8
  IMAP & POP Concepts . . . . 9
  IMAP/POP Troubleshooting . . . . 10
  MariaDB . . . . 11
  MariaDB Troubleshooting . . . . 12
  Lab Tasks . . . . 13
    1. Troubleshooting Problems: Topic Group 7 . . . . 14
Typographic Conventions

The fonts, layout, and typographic conventions of this book have been carefully chosen to increase readability. Please take a moment to familiarize yourself with them.

A Warning and Solution

A common problem with computer training and reference materials is the confusion of the numbers "zero" and "one" with the letters "oh" and "ell". To avoid this confusion, this book uses a fixed-width font that makes each letter and number distinct.

0 ⇒ The number "zero".
O ⇒ The letter "oh".
1 ⇒ The number "one".
l ⇒ The letter "ell".

Typefaces Used and Their Meanings

The following typeface conventions have been followed in this book:

fixed-width normal ⇒ Used to denote file names and directories. For example, the /etc/passwd file or the /etc/sysconfig/ directory. Also used for computer text, particularly command line output.
fixed-width italic ⇒ Indicates that a substitution is required. For example, the string stationX is commonly used to indicate that the student is expected to replace X with his or her own station number, such as station3.
fixed-width bold ⇒ Used to set apart commands. For example, the sed command. Also used to indicate input a user might type on the command line. For example, ssh -X station3.
fixed-width bold italic ⇒ Used when a substitution is required within a command or user input. For example, ssh -X stationX.
fixed-width underlined ⇒ Used to denote URLs. For example, http://www.gurulabs.com/.
variable-width bold ⇒ Used within labs to indicate a required student action that is not typed on the command line.

Occasional variations from these conventions occur to increase clarity. This is most apparent in the labs where bold text is only used to indicate commands the student must enter or actions the student must perform.
Typographic Conventions

Terms and Definitions

The following format is used to introduce and define a series of terms:

deprecate ⇒ To indicate that something is considered obsolete, with the intent of future removal.
frob ⇒ To manipulate or adjust, typically for fun, as opposed to tweak.
grok ⇒ To understand. Connotes intimate and exhaustive knowledge.
hork ⇒ To break, generally beyond hope of repair.
hosed ⇒ A metaphor referring to a Cray that crashed after the disconnection of coolant hoses. Upon correction, users were assured the system was rehosed.
mung (or munge) ⇒ Mash Until No Good: to modify a file, often irreversibly.
troll ⇒ To bait, or provoke, an argument, often targeted towards the newbie. Also used to refer to a person that regularly trolls.
twiddle ⇒ To make small, often aimless, changes. Similar to frob.

When discussing a command, this same format is also used to show and describe a list of common or important command options. For example, the following ssh options:

-X ⇒ Enables X11 forwarding. In older versions of OpenSSH that do not include -Y, this enables trusted X11 forwarding. In newer versions of OpenSSH, this enables a more secure, limited type of forwarding.
-Y ⇒ Enables trusted X11 forwarding. Although less secure, trusted forwarding may be required for compatibility with certain programs.

Representing Keyboard Keystrokes

When it is necessary to press a series of keys, the series of keystrokes will be represented without a space between each key. For example, the following means to press the "j" key three times: jjj

When it is necessary to press keys at the same time, the combination will be represented with a plus between each key. For example, the following means to press the "ctrl," "alt," and "backspace" keys at the same time: Ctrl+Alt+Backspace. Uppercase letters are treated the same: Shift+A

Line Wrapping

Occasionally content that should be on a single line, such as command line input or URLs, must be broken across multiple lines in order to fit on the page. When this is the case, a special symbol is used to indicate to the reader what has happened. When copying the content, the line breaks should not be included. For example, the following hypothetical PAM configuration should only take two actual lines:

password required /lib/security/pam_cracklib.so retry=3➥
type= minlen=12 dcredit=2 ucredit=2 lcredit=0 ocredit=2
password required /lib/security/pam_unix.so use_authtok

Representing File Edits

File edits are represented using a consistent layout similar to the unified diff format. When a line should be added, it is shown in bold with a plus sign to the left. When a line should be deleted, it is shown struck out with a minus sign to the left. When a line should be modified, it is shown twice. The old version of the line is shown struck out with a minus sign to the left. The new version of the line is shown below the old version, bold and with a plus sign to the left. Unmodified lines are often included to provide context for the edit. For example, the following describes modification of an existing line and addition of a new line to the OpenSSH server configuration file:

File: /etc/ssh/sshd_config
  #LoginGraceTime 2m
- #PermitRootLogin yes
+ PermitRootLogin no
+ AllowUsers sjansen
  #StrictModes yes
Note that the standard file edit representation may not be used when it is important that the edit be performed using a specific editor or method. In these rare cases, the editor specific actions will be given instead.
Lab Conventions

Lab Task Headers

Every lab task begins with three standard informational headers: "Objectives," "Requirements," and "Relevance". Some tasks also include a "Notices" section. Each section has a distinct purpose.

Objectives ⇒ An outline of what will be accomplished in the lab task.
Requirements ⇒ A list of requirements for the task. For example, whether it must be performed in the graphical environment, or whether multiple computers are needed for the lab task.
Relevance ⇒ A brief example of how concepts presented in the lab task might be applied in the real world.
Notices ⇒ Special information or warnings needed to successfully complete the lab task. For example, unusual prerequisites or common sources of difficulty.

Variable Data Substitutions

In some lab tasks, students are required to replace portions of commands with variable data. Variable substitutions are represented using italic fonts. For example, X and Y.

Substitutions are used most often in lab tasks requiring more than one computer. For example, if a student on station4 were working with a student on station2, the lab task would refer to stationX and stationY:

stationX$ ssh root@stationY

and each would be responsible for interpreting the X and Y as 4 and 2:

station4$ ssh root@station2

Command Prompts

Though different shells, and distributions, have different prompt characters, examples will use a $ prompt for commands to be run as a normal user (like guru or visitor), and commands with a # prompt should be run as the root user. For example:

$ whoami
guru
$ su
Password: password
# whoami
root

Occasionally the prompt will contain additional information. For example, when portions of a lab task should be performed on two different stations (always of the same distribution), the prompt will be expanded to:

stationX$ whoami
guru
stationX$ ssh root@stationY
root@stationY's password: password
stationY# whoami
root

Truncated Command Examples

Command output is occasionally omitted or truncated in examples. There are two types of omissions: complete or partial. Sometimes the existence of a command's output, and not its content, is all that matters. Other times, a command's output is too variable to reliably represent. In both cases, when a command should produce output, but an example of that output is not provided, the following format is used:

$ cat /etc/passwd
. . . output omitted . . .

In general, at least a partial output example is included after commands. When example output has been trimmed to include only certain lines, the following format is used:

$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
. . . snip . . .
clints:x:500:500:Clint Savage:/home/clints:/bin/zsh
. . . snip . . .
Lab Conventions

Distribution Specific Information

This courseware is designed to support multiple Linux distributions. When there are differences between supported distributions, each version is labeled with the appropriate base strings:

R ⇒ Red Hat Enterprise Linux (RHEL)
S ⇒ SUSE Linux Enterprise Server (SLES)
U ⇒ Ubuntu

The specific supported version is appended to the base distribution strings, so for Red Hat Enterprise Linux version 6 the complete string is: R6.

Certain lab tasks are designed to be completed on only a sub-set of the supported Linux distributions. If the distribution you are using is not shown in the list of supported distributions for the lab task, then you should skip that task. Certain lab steps are only to be performed on a sub-set of the supported Linux distributions. In this case, the step will start with a standardized string that indicates which distributions the step should be performed on. When completing lab tasks, skip any steps that do not list your chosen distribution. For example:

1) [R4] This step should only be performed on RHEL4.

Because of a bug in RHEL4's Japanese fonts...

Sometimes commands or command output is distribution specific. In these cases, the matching distribution string will be shown to the left of the command or output. For example:

$ grep -i linux /etc/*-release | cut -d: -f2
[R6]  Red Hat Enterprise Linux Server release 6.0 (Santiago)
[S11] SUSE Linux Enterprise Server 11 (i586)

Action Lists

Some lab steps consist of a list of conceptually related actions. A description of each action and its effect is shown to the right or under the action. Alternating actions are shaded to aid readability. For example, the following action list describes one possible way to launch and use xkill to kill a graphical application:

Alt+F2 ⇒ Open the "Run Application" dialog.
xkill Enter ⇒ Launch xkill. The cursor should change, usually to a skull and crossbones.
Click on a window of the application to kill. ⇒ Indicate which process to kill by clicking on it. All of the application's windows should disappear.

Callouts

Occasionally lab steps will feature a shaded line that extends to a note in the right margin. This note, referred to as a "callout," is used to provide additional commentary. This commentary is never necessary to complete the lab successfully and could in theory be ignored. However, callouts do provide valuable information such as insight into why a particular command or option is being used, the meaning of less obvious command output, and tips or tricks such as alternate ways of accomplishing the task at hand. For example:

[S10]
$ sux
Password: password
# xclock

On SLES10, the sux command copies the MIT-MAGIC-COOKIE-1 so that graphical applications can be run after switching to another user account. The SLES10 su command did not do this.
Chapter 1
TROUBLESHOOTING METHODOLOGY

Content
The Troubleshooting Mindset . . . . . . . . . . 2
Evaluating Possible Solutions . . . . . . . . . . 3
Identifying and Implementing Change . . . . . . . . . . 4
Define and Follow Policies . . . . . . . . . . 5
Working with Others . . . . . . . . . . 6
Finding Documentation . . . . . . . . . . 7
Finding Help Online . . . . . . . . . . 9
The Troubleshooting Mindset
• The Art and Science
• Knowledge is Key
• Understand Inter-dependencies
• Stay Informed
• Plan For Problems
• Practice Solving Problems
The Art and Science
Effective troubleshooting is a skill that takes time and effort to develop. It must be approached as both an art and a science. It is possible to teach general methodologies, but there is no replacement for the instinct that only experience can instill. Because of this interplay between training and experience, each person will approach troubleshooting differently. An effective troubleshooter will recognize and develop both approaches to problem solving.

Knowledge is Key

Although troubleshooting skills can be transferred from one subject to another, knowledge of the relevant material is key to solving any problem. If a person lacks necessary information, troubleshooting will take longer because it must be collected. Solving problems without knowledge of a system and its components is almost impossible.

Understand Inter-dependencies

No system can be understood fully without seeing it as both a single object and as a group of interconnected objects. Troubleshooting often involves many components, and the actual cause of a problem may be obscured by these components.

Similarly, all tools should be seen as building blocks. Because there are so many possible problems, good tools are designed so that they can be combined. An effective troubleshooter will know what tools are available and be able to combine them to solve new problems.

Stay Informed

New technologies are regularly introduced while old technologies are constantly refined. An effective troubleshooter cannot afford to rely on static knowledge of a system. New security issues may be discovered; solutions that worked in the past may no longer work in the future. Seeing changes coming instead of discovering them when something breaks makes change exciting instead of scary.

Plan for Problems

The best way to address problems is to prevent them. Nonetheless, bad things happen. When things are going well, that is the time to create an infrastructure capable of responding to inevitable problems. This could mean, for example, having sufficient backup hardware on-site and pre-configured.

Practice Solving Problems

Certain types of failures and responses can be easily predicted. Wise troubleshooters will make time to practice such scenarios regularly. Doing so will keep knowledge fresh and expose possible issues in a less stressful environment. An obvious example is backups. Many organizations perform at least occasional backups of critical data. Backups are only a partial solution, however. It is just as important, if not more so, to regularly practice data recovery.
Evaluating Possible Solutions
• Not All Solutions Are Equal
• Don't Guess
• Reduce the Number of Possibilities
• Try Simple Solutions Before Complex
• Consider Unlikely Scenarios
• Address Causes Not Symptoms
Not All Solutions Are Equal

Often, a problem will have more than one solution. Even under time critical conditions, try to consider all possible approaches. The ideal solution is both permanent and easy to implement. Some solutions include the danger of new problems in the future and should therefore be avoided. However, when speed of response is the highest priority, implementing a possibly dangerous temporary fix may be the most appropriate response. In such cases be sure to include specific plans to return and implement a long-term fix in the near future.

Don't Guess

Troubleshooting can be a very frustrating experience. It is easy to get discouraged and stop approaching the problem effectively. At a certain point, it becomes tempting to make decisions using a magic eight ball instead of logic and experience. Doing so can cause new problems to be introduced. Take time to stop and think about the probable result of an approach and the reason for trying it.

Reduce the Number of Possibilities

For every problem, there could be multiple potential causes. The faster these causes can be eliminated, the sooner the problem will be resolved. Try things that will eliminate the largest number of possibilities first.

Try Simple Solutions Before Complex Ones

It is often tempting to assume complex causes when a problem arises. The truth is that most problems have simple solutions. "When you hear hooves: think horses, not zebras." Don't waste time on slow and complex solutions until fast, simple ones have already been tried.

Consider Unlikely Scenarios

Although it is important to consider the most simple and likely scenarios first, don't ignore other possibilities. A program's strange behavior may not be the result of bugs in the software but hardware failure instead. System misconfiguration is more likely than a security breach, but the latter is a real possibility. A little paranoia can be good when troubleshooting, but stay realistic.

Address Causes Not Symptoms

A single problem can have multiple effects on a system. It is easy to focus on resolving the symptoms of the problem instead of the base problem itself. For example, cleaning a filled filesystem without trying to find out why the filesystem was filled will probably just result in a filled filesystem again in the future. If the true source of a problem is not resolved now, it is likely to return again; usually at an even worse time.
Identifying and Implementing Change
• First Understand the Problem
• Identify System Changes
• Avoid Irreversible Changes
• Verify Before Committing
First Understand the Problem
Do not try to resolve a problem without understanding what is really happening. First, check any log files and observe the system. Blindly modifying a system can make the problem more complex and/or introduce additional, possibly unrelated problems. The more the system is changed while troubleshooting, the more likely it is that any solution will address only a symptom instead of the base issue.

Identify System Changes

When a problem occurs, it is common to hear the claim that "nothing was changed, the problem just appeared". While this is possible, it is highly unlikely. The majority of problems occur as unexpected consequences of changes. For example, a sudden change could be the result of another administrator's actions, a recently updated software package, or a system resource being gradually consumed. All troubleshooting should start with the question: "How has the system changed?" A simpler question might be: "When did the system last behave as expected?" The easier it is to answer these questions, the easier it will be to resolve a problem. Even a simple, administrator maintained changelog in root's home directory can significantly speed troubleshooting.

Avoid Irreversible Changes

If an attempted solution doesn't solve the problem, or makes it worse, you want to be able to roll back the system. Include a comment and a copy of the original content above any change to a configuration file. This helps to identify which changes have already been made to the system and why. Move, copy, or rename files instead of deleting them. Have recent backups for those times when you make a mistake and need to restore a file or files. It can be tempting to blindly type commands hoping to resolve the problem quickly. Don't! Stop and think about each action while troubleshooting.

Verify Before Committing

Whenever possible, verify changes before committing them. Some services include syntax checkers that can be run before the service is restarted or reloaded with the new configuration. Examples include Samba's testparm; BIND's named-checkconf and named-checkzone; and Apache's apachectl configtest. Stepping back to re-read configuration files, re-check permissions, etc. will occasionally reveal a forgotten detail. Taking the time to do a complete check up front can save much time in the future.
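As an illustration, a quick pre-flight check of several common services might look like the following (a sketch only; it assumes these services are installed and uses typical configuration paths, with the zone file path being hypothetical):

# testparm -s
# named-checkconf /etc/named.conf
# named-checkzone example.com /var/named/example.com.zone
# apachectl configtest

Each command parses the relevant configuration and reports syntax errors before the running service is disturbed.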
Define and Follow Policies
• Basic Policy Issues
  • How problems are reported.
  • Who approves changes.
  • Who can change what.
  • How changes are coordinated.
  • When changes can be made.
  • How changes are recorded.
• Document! Document! Document!
• Keep Policies Reasonable
Define and Follow Policies

It is impossible to list all types of troubleshooting policies, but in general they should make clear how to approach problems and decide whether a possible solution is appropriate or not. Many policies may feel like useless overhead, but well defined policies make troubleshooting faster and less stressful. Treating every problem as a unique fire leaves only time to fight fires.

Some policies may be designed to ensure that new problems aren't created. Others may be designed to maintain the integrity of core business infrastructure. For example, if it is discovered that a firewall rule is interfering with a service, more secure options or elimination of the service should be explored before punching a hole in the firewall. Likewise, it may be appropriate for a lone administrator to change a corporate website without approval, but it probably isn't appropriate for any employee at a university to be able to change academic data without proper auditing. A complete set of policies should include at least:

• How problems are reported
• Who approves changes
• Who can change what
• How changes are coordinated
• When changes can be made
• How changes are recorded

Document! Document! Document!

One policy aspect that deserves special consideration is the creation of change documentation. When a system is modified in any way, the change should be recorded. Change documentation should be easily available to anyone that might later need the information. Not only may this information be needed by other administrators, after time has passed, it may be needed by the person who made the change. In addition, keeping notes while debugging complex problems can prevent wasted time. After several hours of troubleshooting, it can be hard to remember everything that has been tried and discovered.

Keep Policies Reasonable

Only implement policies that are likely to be followed. Most people consider achieving results more important than following policy. If organizational policy demands good documentation, make producing and using documentation as easy as possible. Focus on the usefulness and correctness of submitted documentation, not its spelling or layout. When a policy cannot be made reasonably painless, education is the only option. If a person does not understand the reasoning behind and benefits of a particular policy, that person is more likely to cynically ignore all organizational policy.
Working with Others
• Recognize Personal Limits
• Recognize Differences in Background
• Provide Context
• Embrace Ownership
Recognize Personal Limits
Sometimes it can be hard to recognize when another person should be involved in the troubleshooting process. This could mean asking a question, asking for help with a part of the problem, or even handing off the entire problem to someone new. It is important to understand your own capabilities and limits. Likewise, it is helpful to know what skills others have. Lack of experience is not the only type of personal limitation that must be recognized; physical limitations are also very real. Lack of sleep, illness, even hunger can result in poor performance while troubleshooting. When other options are available, don't risk making the problem worse.

Recognize Differences in Background

Subtle differences in background and objectives can complicate communication. System administrators, database administrators, and developers (just to name a few) each have unique cultures that often use the same terminology with slightly different meanings. Be sensitive to such cultural differences. Sometimes troubleshooting a problem actually means discovering a breakdown in communication resulting from mutual misunderstanding.

Provide Context

When asking for help, include sufficient context for the other person to understand your needs. Instead of asking: "Where can I find the main circuit breaker for the server room?", a better question would be: "I need to verify that the backup generators are working. How should I do that?" Don't say: "Move your application to the backup server." Instead, say: "We're running out of disk space on the primary server. I need your help to free up system resources." When another person forgets to include sufficient context, it is often helpful to ask "What's your real question?" or "What problem are you trying to solve?"

Embrace Ownership

Although it's desirable for all members of a team to have, at least, a minimal level of competency, don't fight the tendency of people to become experts on specific things over time. It's unavoidable that certain people will become more proficient with specific aspects of organizational infrastructure. Recognize other people's "ownership" and consult with them whenever possible.
Finding Documentation
• System Installed Documentation
  • /usr/share/doc/ (RHEL7)
  • /usr/share/doc/packages/ (SLES12)
  • man
  • info
• Changelogs
• Source Code
System Documentation
Most people do not realize how much information is available on a Linux system itself. This information includes man pages, info pages, the contents of documentation directories and documentation systems designed for specific programs or environments. Knowing where to find useful information is a critical part of troubleshooting.

Man pages are commonly under-appreciated. Not only do they serve as quick reminders for command options, but some programs make their complete documentation available as man pages. They can also be useful for programmers. Most core system libraries, and even some other libraries like OpenSSL, include documentation of each available function as a man page. Similarly, perl has extensive and useful documentation available as man pages.

While some man pages include simple examples, info pages are more likely to contain more extensive and useful examples. Info pages are also more likely to include useful tutorials. Info pages can be viewed with the info command. Another useful application is the pinfo command, which allows viewing of info pages and supports features like highlighting, hyperlinks and the use of a mouse.

Complex programs often include complete documentation systems. Some graphical programs include documentation under the Help menu item. GNOME and KDE both provide frameworks that can be used to search and explore most forms of installed documentation including man and info pages. Programming languages often also include custom solutions such as Perl's perldoc and Python's pydoc.
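For example, the following lookups are common starting points when hunting for installed documentation (assuming the corresponding packages and viewers are installed):

$ man -k partition
$ man 5 crontab
$ info coreutils
$ pinfo coreutils
$ perldoc -f split

The -k option (equivalent to apropos) searches man page names and descriptions, while the section number (5 here) selects the file format page rather than the command of the same name.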
Most packages include additional documentation, which is usually stored in a documentation directory. This directory can include changelogs, sample configuration files, or all of the documentation available on a project's web site.

[R7] The following applies to RHEL7 only:
On RHEL7 the directory is /usr/share/doc/. Inside that directory are versioned sub-directories for each RPM package.

[S12] The following applies to SLES12 only:
On SLES12 the directory is /usr/share/doc/packages/. Inside that directory are un-versioned sub-directories for each RPM package installed.
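For example, a package's installed documentation can be located as follows (using bash as an example package; rpm -qd lists the files a package has marked as documentation):

[R7] $ ls /usr/share/doc/bash-*
[S12] $ ls /usr/share/doc/packages/bash
$ rpm -qd bash
. . . output omitted . . .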
Changelogs

Most programs include a list of changes made each time a new version is released. This is called a changelog. Sometimes it's included in the documentation directory. Other times it can be found on the project's Web site. Unlike most proprietary software, open source software is generally very good about including specific details. Knowing what has changed, what bugs have been fixed, and what features have been added can be extremely useful when troubleshooting a specific program.
Source Code

When all else fails, the definitive source of software documentation is the source code itself. For a person with the necessary skills, the availability of source code is an empowering difference between open and closed software. For example, given a specific error message, it is possible to identify where in the code the message occurs and discover what conditions could trigger it. Even if you're not a programmer, it can be useful to look at the source code and try to understand what is happening.
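For example, to find where a particular error message originates in an unpacked source tree (the message text and directory here are hypothetical):

$ grep -rn "unable to open spool file" ~/src/example-1.2/

The matching file and line number point to the code path that emits the message, and reading the surrounding code shows which conditions trigger it.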
Finding Help Online
• Communities and Contributing
• Don't Get Side-tracked
• Mailing Lists and Forums
  • Email
  • Usenet
  • Web Forums
  • Netiquette
• Online Bug Trackers
• IRC
  • Freenode
• User Groups
• Search Engines
Communities and Contributing

One of the greatest strengths of open source software is the communities built around it. Almost every significant program has a group of developers and users that work together to help each other. In fact, the line between developers and users often blurs as users contribute bug reports, fixes, documentation, and technical support. Knowing how to find, use, and contribute to these communities is an important part of using open source software, and finding help.

However, it is easy to continuously take from the community without giving anything in return. Take some time to help others in the community. This will provide opportunities to learn while facing and helping to solve realistic problems. This experience may be invaluable when dealing with future problems. More importantly, you will be giving back to the community.

Don't Get Side-tracked

Exploring complex problems will invariably reveal new information. It is tempting to get side-tracked while researching information related to a problem. It may seem obvious, but focusing on the problem itself is important to troubleshooting. Yes, having knowledge on a wide range of subjects is a characteristic of a good troubleshooter, but resist the urge to explore new terrain instead of solving the problem at hand.

Mailing Lists

Most open source projects have Web sites, and some have Web forums and Usenet groups. However, the most important form of communication in the open source world is email lists. Many projects have multiple mailing lists, some focused on development and testing, others on providing user support. Some lists can generate a lot of traffic; subscribers may receive literally hundreds of messages per day. Using filters to organize mail is highly recommended. Before actively participating in an online community, it is a good idea to lurk in the background and learn cultural norms. Certain rules of online etiquette, known as netiquette, are common to almost all communities. Two good resources for learning proper netiquette are:

• Eric S. Raymond's "How to Ask Questions the Smart Way" [http://www.catb.org/~esr/faqs/smart-questions.html]
• The subtly humorous RFC 1855 [http://rfc.net/rfc1855.html]

Online Bug Trackers

Online bug trackers are becoming increasingly common and can be very useful when troubleshooting or reporting problems. Online bug trackers make it possible to communicate an issue directly to the developers and to monitor the problem for a resolution. Until a fix is provided, work-arounds may also be documented in the bug tracker. Instead of wasting time re-discovering a bug someone else has already found, it may already be listed in the bug tracker with a solution to the problem. Online bug trackers should be one of the first places to look online when troubleshooting.
IRC

IRC is one of the most common forms of online collaboration for programming communities. It provides live interaction with developers and users. If you have a program that you use often, you should find whether there is an IRC community dedicated to that program, and participate. This is a great way to make yourself visible to the community, and in doing so developers and users will be more inclined to help when you have a pressing need. Keep in mind that many IRC users are not always in front of their computer when you ask a question. Be patient when waiting for a response, or in understanding the response you get. It is also helpful to identify the difference between new users, and active members and developers. Finally, IRC is not entirely standardized, so different services may have different protocols and norms that need to be observed. In the open source world, the most common IRC service is freenode.net.

User Groups

Besides the communities built around specific projects, there are also communities based on geographic location. The most common of these are Linux User Groups. Local groups can also be found that focus on programming languages, such as Perl and Java, or software, such as Oracle. User groups generally have local meetings with some type of training, mailing lists for wide-ranging discussion, and social activities of some type. They can be a great way to meet people who share the same interests and will be able to help with problems in the future. A list of most Linux user groups world wide can be found at http://www.linux.org/groups/index.html and additional groups can sometimes be found by searching online.

Search Engines

Most everyone is aware that online search engines can be invaluable when troubleshooting. Tips for more effective searching include:

• Search for the exact error message. For example, "Unable to load default keyring: error=74" is much better than 'linux kernel module error'.
• Use double quotes when searching for specific strings. Otherwise, the search engine will focus on each word separately.
• A minus sign can often be added in front of a word to request that a specific word not be present. For example, one possible search might be 'best linux text editor -emacs'.
• Google includes the ability to request that only a specific site be searched. For example, 'troubleshooting class site:gurulabs.com'.
Chapter 2
TROUBLESHOOTING TOOLS

Content
Common Troubleshooting Tools . . . . . . . . . . 2
RPM Queries . . . . . . . . . . 3
RPM Verification . . . . . . . . . . 4
SRPM and spec Files . . . . . . . . . . 5
Hardware Discovery Tools . . . . . . . . . . 6
Configuring New Hardware with hwinfo . . . . . . . . . . 7
strace and ltrace . . . . . . . . . . 8
lsof and fuser . . . . . . . . . . 10
ipcs and ipcrm . . . . . . . . . . 12
iostat, mpstat, and vmstat . . . . . . . . . . 13
Using hdparm to Measure . . . . . . . . . . 15
Troubleshooting with the ip command . . . . . . . . . . 16
Name Resolution . . . . . . . . . . 18
ss/netstat and rpcinfo . . . . . . . . . . 20
nmap . . . . . . . . . . 21
Netcat . . . . . . . . . . 22
tcpdump and wireshark . . . . . . . . . . 23
Lab Tasks . . . . . . . . . . 26
  1. Determining the System's Configuration . . . . . . . . . . 27
  2. Troubleshooting with rpm . . . . . . . . . . 42
  3. Process Related Tools . . . . . . . . . . 45
  4. Network Tools . . . . . . . . . . 53
Common Troubleshooting Tools
• RPM packages
  • rpm -q, rpm -V
• Hardware detection and configuration
  • lspci, lsusb, dmidecode, biosdecode, {h,s}dparm
• Interaction with system
  • strace, ltrace
  • fuser, lsof
  • ipcs, ipcrm
  • vmstat, iostat, sar
• Networking
  • ifconfig, ip, arp, route, host, dig, getent, ss, rpcinfo, nc, tcpdump, wireshark
Common Tools
Good troubleshooting involves gathering information about a problem, and the system state. A common set of tools used to collect information, and their application to the troubleshooting process, is described in more detail in the pages that follow.

rpm ⇒ Examine/change the software installed on a system. Powerful query functions can verify packages and identify problems.
lspci, lsusb, lsscsi ⇒ Print kernel detected hardware details (e.g. PCI bus, USB).
{dmi,bios}decode ⇒ Print low level hardware details from the BIOS.
hdparm ⇒ Get and set EIDE/SATA/PATA/SAS/SAT kernel parameters.
strace, ltrace ⇒ Examine program interaction with system and library calls.
fuser, lsof ⇒ Examine files and connections opened by a process.
ipcs, ipcrm ⇒ List and remove shared resources (shared memory segments, semaphores, message queues).
vmstat, iostat, sar ⇒ Examine the state of system resources and sub-systems (e.g. CPU, memory, disk).
ip, ifconfig, route, arp ⇒ Examine or change the configuration of network interfaces, ARP mappings, and network routes.
host, dig ⇒ Perform name resolution without using the system resolver.
getent ⇒ Perform a name service database lookup (e.g. NIS).
rpcinfo ⇒ Connect to an RPC server and list information about the registered RPC services.
ss/netstat ⇒ Examine a wide variety of network related statistics, including the state of current network connections.
tcpdump, tshark/wireshark ⇒ Capture and analyze network traffic.
nc ⇒ Connect to, or listen on, a TCP or UDP port: useful for testing network services.
RPM Queries
• rpm -q or rpmquery
  • -i – basic information
  • -l – list of files
RPM Queries
The rpm command provides a number of options that can be useful when troubleshooting a system. The following examples illustrate a few such invocations. Display information and the file list of an installed package:
# rpm -qil package_name
. . . output omitted . . .

Display the changelog from a package file:

# rpm -qp --changelog package_name-version.arch.rpm
. . . output omitted . . .

Display the scripts triggered when an installed package was installed or if it were removed:

# rpm -q --scripts package_name
. . . output omitted . . .

Note that the information (and format) output by a query is highly flexible. For a list of variables that can be used when specifying the query output format, run:

# rpm -q --querytags
. . . output omitted . . .
Display the number of installed packages for each vendor:
# rpm -qa --queryformat '%{VENDOR}\n' | sort | uniq -c
      2 Dag Apt Repository, http://dag.wieers.com/apt/
     11 Guru Labs
   1618 Red Hat, Inc.
    631 SUSE LINUX Products GmbH, Nuernberg, Germany
Recently installed packages are suspect when something breaks. Display the names of the last three packages installed, along with their install date and time:
# rpm -qa --last | head -n 3
ypserv-2.32.1-2.5.x86_64              Tue Apr 7 14:50:49 2015
apache2-prefork-2.4.10-6.1.x86_64     Tue Apr 7 13:44:11 2015
mailman-2.1.17-1.18.x86_64            Tue Apr 7 13:44:10 2015
Display the names and sizes of all packages larger than 50MB:
# rpm -qa --queryformat '%{SIZE} %{NAME}\n' | awk '($1 > 52428800) {print}'
149170791 MozillaFirefox-translations
72031951 kernel-firmware
60205855 Mesa
83034266 MozillaFirefox
140374881 kernel-default
66042865 ghostscript
119762706 glibc-locale
RPM Verification
• rpm -V or rpmverify
• rpm --checksig or rpm -K
RPM Verification
When an RPM package is installed, the meta-data contained in the package header is stored in the RPM database. This meta-data includes things such as the location, size, owner, permissions and MD5 hash for the file.
The rpm command can be used to compare one or more package's existing files on the system with the package's original files at package install time. This is a quick way to discover any changes to files provided by packages. For example, consider the following output where files from the uucp package have been damaged:
# rpm -V uucp
.......T  c /etc/uucp/call
missing   c /etc/uucp/dial
..5....T  c /etc/uucp/dialcode
.M......  c /etc/uucp/passwd
.....UG.  c /etc/uucp/port
S.5....T  c /etc/uucp/sys
In this example, six files under the /etc/uucp/ directory, which are provided by the uucp package, have been modified since installation. The modification time of call has changed, dial is missing from the system, the MD5 sum, and modification time, of dialcode have changed, the mode (permissions) of passwd has been modified, port now belongs to a different user and group, and sys has changed in size, MD5 sum, and modification time. Things to consider when verifying packages:

• Changes to files do not necessarily represent corruption or problems. Consider the case of configuration files that are expected to change.
• Files provided by packages can be tagged as "no-verify". These files will not report changes when the package is verified.
• Some files are not provided by (owned by) any package and therefore changes to the files will never be detected by the rpm verify feature.
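When only one file looks suspicious, it is often quicker to work backwards from the file to its package. A short sketch, reusing a path from the example above:

# rpm -qf /etc/uucp/passwd
# rpm -V uucp
# rpm -Va

The first command reports which package owns the file, the second re-verifies just that package, and the third verifies every installed package (which can take a while on a large system).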
RPM Package Digital Signature Verification
RPMs can be digitally signed using GPG (or PGP). Signed packages offer two important advantages to end users who download them:

• They authenticate the package, assuring the user that the package comes from the vendor it is supposed to come from.
• They guarantee the package's integrity, assuring the user that the package has not been modified since the packager signed it (though RPMs also provide this guarantee using other mechanisms).
Both of these features are important on today's Trojan Horse-riddled Internet. Once the public key is imported, the signature of packages can be verified as shown in this example:
$ rpm -K package-1.0-1.x86_64.rpm
package-1.0-1.x86_64.rpm: rsa sha1 (md5) pgp md5 OK
If the package verifies, then the user knows the signature of it is that of a vendor they trust.
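The vendor's public key must be imported before signature checks can succeed. As an example, on RHEL the Red Hat release key shipped with the system can be imported and the stored keys listed as follows (other vendors ship their keys in different locations, so adjust the path accordingly):

# rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
# rpm -qa 'gpg-pubkey*'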
SRPM and spec Files
• Source packages
  • rebuild to produce new binary rpm
  • install to access source code, patches, spec file, etc.
• Spec file
  • defines meta-info about the package
  • describes how to compile package
  • describes what files to install
  • contains scripts to execute before and after installation and uninstallation
Source Packages (SRPMs or .src.rpm)

In addition to binary packages, the rpm command can also be used to work with source RPM packages (package files ending with .src.rpm). Source packages can be used to build binary RPMs and contain several different files such as:

• Original source code for the application
• Patches required to modify that source code
• Needed auxiliary files for that application (e.g. systemd unit file)
• A script specifying how to configure and compile the source code; this script is typically named package.spec

One design principle behind RPM is that packages should always be compiled from the original source code for that application. This criterion is important in the Linux community, where individual applications are developed by a large, loosely organized collection of organizations all over the Internet, and then collated into a cohesive operating system by distribution vendors.

Distribution vendors often need to modify the original source code to better integrate it into the distribution and to patch bugs. Similarly, end users often need to recompile (and occasionally need to patch) applications shipped by their distribution vendor to better suit their local environment.

Preparing RPMs from original source code and applying separate patches helps ensure long-term ease of maintenance. It also allows end-users to easily determine exactly how a package has been modified.

Spec Files

Spec files are included in each source package and describe the build process. Examining the .spec file can yield insights into how a package was patched, configured, and built. These details can prove valuable when troubleshooting the service.

Spec files are written in a syntax which interleaves a macro programming language with shell commands and with descriptive text. In the spec file, the number sign # is used to denote comments, just as in many other Unix configuration files.

The spec file consists of several closely related sections (or stanzas): Header, Prep, Build, Install, Files, Scripts and Changelog. Together, these sections define the source files and patches which make up the application, provide detailed information about the source and use of the application, instruct RPM how to compile the application, define the files which RPM needs to install when installing the application, as well as how to install those files. The spec file can also contain optional scripts which are executed before and after installation and/or uninstallation of the application.
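Working with a source package from the command line might look like the following sketch (the package name and version are hypothetical, and rpmbuild requires the rpm-build package and the application's build dependencies to be installed):

# rpm -qpl example-1.0-1.src.rpm
# rpm -ivh example-1.0-1.src.rpm
# rpmbuild --rebuild example-1.0-1.src.rpm

The first command lists the sources, patches, and spec file contained in the SRPM, the second unpacks them into the rpmbuild working tree, and the third compiles and packages new binary RPMs directly from the source package.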
Hardware Discovery Tools Manual discovery of hardware • dmesg • /var/log/dmesg • /var/log/boot.msg • /proc/ and /sys/ • udevadm • lspci, lscpu, lsscsi, lsusb • dmidecode, biosdecode • sensors, sensors-detect
Detecting New Hardware Manually
As the Linux kernel loads, it scans for hardware and then loads drivers to initialize and support the detected hardware. Examining kernel boot messages is a good way to see what hardware has been detected. You can view the current kernel messages at any time with the dmesg command. A copy of the kernel messages is made near the end of the boot sequence and stored so it can be examined long after the in memory messages have been overwritten.
[R7] The following applies to RHEL7 only:
Boot time kernel messages are kept in /var/log/dmesg.

[S12] The following applies to SLES12 only:
Boot time kernel messages are kept in /var/log/boot.msg.

The /proc/ Virtual Filesystem

The running kernel exports details about detected hardware in the /proc/ and /sys/ filesystems. You can use the cat command to display the contents of the files in these filesystems. Files and directories in /proc/ pertaining to hardware that may be useful include: cpuinfo, dma, interrupts, iomem, meminfo, bus, bus/usb/, ide/sdX/*.

In many cases, utilities exist that can extract information from files in /proc/ and display it in a more human-readable fashion. For example, instead of trying to read the raw data shown in the /proc/bus/pci/ and /proc/bus/usb/ directories, you can use the lspci and lsusb commands.
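For instance, a couple of quick checks that pull hardware facts straight from /proc/ (a minimal sketch; output varies by system):

# grep -c ^processor /proc/cpuinfo
# grep MemTotal /proc/meminfo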
UDEV Hardware Database

The udev daemon maintains an internal database of the hardware it is aware of. The udevadm command can be used to query the database by device path or name and return all the properties associated with the device. The entire database can also be exported as follows:

# udevadm info --export-db
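A single device can also be queried by name; for example (assuming a /dev/sda disk exists on the system):

# udevadm info --query=property --name=/dev/sda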
Interpreting BIOS DMI Data

Information stored in CMOS often contains low-level details about system hardware. Dump the BIOS data in human-readable format with the dmidecode or biosdecode commands:

# dmidecode | sed -n '/Memory Device/,//p' | egrep '^[[:space:]]*S(ize|peed|erial)'
        Size: 2048 MB
        Speed: 667 MHz (1.5 ns)
        Serial Number: ED1E3C43
The dmidecode command can also be used to determine the current BIOS version:
# dmidecode -s bios-version
7IET26WW (1.07)
Configuring New Hardware with hwinfo SUSE automatic detection: hwinfo Runs at boot time and detects hardware changes Automatically reconfigures system on addition or removal of hardware • Uses detection routines found in the /usr/lib64/libhd.so.* libraries • Records detected hardware in /var/lib/hardware/
Detecting and Configuring New Hardware with hwinfo
To make Linux administration easier, SLES12 provides hwinfo, a program which identifies and configures new hardware attached to the system.
hwinfo utilizes the /var/lib/hardware/unique-keys/ directory, where it creates and updates ASCII text files as additional hardware is added to the system.

Invoking hwinfo Manually

hwinfo can be run at any time (for example, to detect and configure hot-plugged devices). By default, it will scan for nearly every type of device that it is capable of detecting. Options can be passed to limit the scan to a specific subset of devices. The following examples show using hwinfo to list a few details for a few specific types of hardware:

# hwinfo --cpu --short
cpu:
  Intel(R) Xeon(R) CPU 5150 @ 2.66GHz, 2660 MHz
  Intel(R) Xeon(R) CPU 5150 @ 2.66GHz, 2660 MHz
# hwinfo --cpu
01: None 00.0: 10103 CPU
  [Created at cpu.290]
  Unique ID: rdCR.j8NaKXDZtZ6
  Hardware Class: cpu
  Arch: X86-64
  Vendor: "GenuineIntel"
  Model: 6.15.6 "Intel(R) Xeon(R) CPU 5150 @ 2.66GHz"
. . . snip . . .
# hwinfo --usb --short
hub:
  Linux 2.6.16.21-0.25-smp uhci_hcd UHCI Host Controller
  Linux 2.6.16.21-0.25-smp uhci_hcd UHCI Host Controller
  Linux 2.6.16.21-0.25-smp uhci_hcd UHCI Host Controller
  Linux 2.6.16.21-0.25-smp ehci_hcd EHCI Host Controller
  Cypress Hub
strace and ltrace
strace • traces system calls and signals • -p pid_num – attach to a running process • -o file_name – output to file • -e trace=expr – specify a filter for what is displayed • -f – follow (and trace) forked child processes • -c – show counts
ltrace • very similar to strace in functionality and options, but shows dynamic library calls made by the process • -S – show system calls • -n x – indent output for nested calls
The strace Command
The strace command can be used to intercept and record the system calls made and the signals received by a process. This allows examination of the boundary layer between user and kernel space, which can be very useful for identifying why a process is failing. Using strace to analyze how a program interacts with the system is especially useful when the source code is not readily available. In addition to its importance in troubleshooting, strace can provide deep insight into how the system operates. Any user may trace their own running processes; additionally, the root user may trace any running process. For example, the following could be used to attach to and trace the running slapd daemon:
# strace -p $(pgrep slapd)
. . . output omitted . . .

strace Output
Output from strace will correspond to either a system call or a signal. Output from a system call is composed of three components: the system call, its arguments surrounded by parentheses, and the result of the call following an equals sign. An exit status of -1 usually indicates an error. For example:
$ strace ls enterprise
execve("/bin/ls", ["ls", "enterprise"], [/* 38 vars */]) = 0
brk(0) = 0x8aa2000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
. . . snip . . .

Curly braces are used to indicate dereferenced C structures. Square brackets are used to indicate simple pointers or an array of values. Since strace often creates a large amount of output, it is often convenient to redirect it to a file. For example, the following could be used to launch the Z shell, trace any forked child processes, and record all file access to the files.trace file:
# strace -f -o files.trace -e trace=file zsh . . . output omitted . . .
Run the ls command counting the number of times each system call was made and print totals showing the number and time spent in each call (useful for basic profiling or bottleneck isolation):
# strace -c ls . . . output omitted . . .
The following example shows the three config files that OpenSSH's sshd reads as it starts. Note that strace sends its output to STDERR by default, so if you want to pipe it to other commands like grep for further filtering, you must redirect the output appropriately:
# strace -f -eopen /usr/sbin/sshd 2>&1 | grep ssh . . . output omitted . . .
[R7] The following applies to RHEL7 only:
Trace only the network related system calls as Netcat attempts to connect to a local in.telnetd service:
# strace -e trace=network nc localhost 23 . . . output omitted . . .
[S12] The following applies to SLES12 only:
Trace just the network related system calls as Netcat attempts to connect to a local telnetd service:
# strace -e trace=network netcat localhost 23
. . . output omitted . . .

The ltrace Command
The ltrace command can be used to intercept and record the dynamic calls made to shared libraries. The amount of output generated by the ltrace command can be overwhelming for some commands (especially if the -S option is used to also show system calls). You can focus the output to just the interaction between the program and some list of libraries. For example, to execute the id -Z command and show the calls made to the libselinux.so module, execute:
$ ltrace -l /lib/libselinux.so.1 id -Z
is_selinux_enabled(0xc1c7a0, 0x9f291e8, 0xc1affc, 0, -1) = 1
getcon(0x804c2c8, 0xfee80ff4, 0x804b179, 0x804c020, 0) = 0
user_u:system_r:unconfined_t
Remember that you can see what libraries a program is linked against using the ldd command.
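For example, to confirm that the id command is actually linked against the SELinux library before tracing it (a quick sketch; library paths vary by distribution):

# ldd $(which id) | grep libselinux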
lsof and fuser lsof – list files opened by processes • -a – AND multiple selection criteria together • -c – regex match against command name • -d – exclude/include file descriptors from/in output • +d – list all open files in a given directory • +D – same as above but recurses to sub-directories • -i – list open network connections • -N – list open NFS files • -U – list open Unix sockets fuser – list process IDs that have specified files open • -m – list processes with file open on the given filesystem • -k – send signal (KILL by default) to processes
The lsof Command

The lsof command lists all the files opened by processes on the system. An open file can be any of the following types: a regular file, a directory, a block special file, a character special file, an executing text reference, a library, a stream, or a network file (Internet socket, NFS file, or Unix domain socket). The lsof command has numerous options that allow you to selectively display only the types of files you want, and to format the display (making the output easily parsable by other programs). These example invocations show some of the features of the lsof command.

List all open files for processes that have a command name matching the regular expression /slap/ (this matches slapd):

# lsof -c /slap/
slapd 14904 ldap rtd DIR 8,5 4096 2 /
slapd 14904 ldap cwd DIR 8,5 4096 251905 /root
slapd 14904 ldap txt REG 8,3 1934948 1155874 /usr/sbin/slapd

More lsof examples

In each of the remaining examples of lsof usage, the output has been omitted.

List the ssh connections to or from station3:

# lsof -a -iTCP@station3 -c /ssh/

If you have updated your systems with the latest OpenSSL update, test which running processes still make use of the older library:

# lsof -d DEL | grep libssl.so.1.0.1e

List all files opened in the /var/ directory:

# lsof +d /var/

List files owned by the cron process in /var/ or any of its sub-directories:

# lsof +D /var/ -a -c /cron/

List files owned by processes matching the regex /moz/ that (-a) are also Unix sockets:

# lsof -c /moz/ -a -U

List UDP network services listening on loopback:

# lsof [email protected]

The fuser Command

The fuser command has some overlap with lsof in its functionality and is used to list the process IDs of processes accessing the specified file or filesystem.

To list all processes accessing some file in the /var/ filesystem:

# fuser -m /var/
/var/: 1632 1857rc 1935c 2153 2188 2361c 2369c 2385c
. . . snip . . .
The fuser command can also be used to send signals to the processes accessing files. For example, to send a HUP signal to each process that has open files in /var/log/:
# fuser -HUP -k /var/log/* . . . output omitted . . .
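Adding the -v option produces a verbose listing that shows the user, PID, type of access, and command name for each matching process, which is often easier to interpret than the bare PID list shown earlier; for example:

# fuser -vm /var/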
ipcs and ipcrm ipcs – list information about active IPC facilities • -u – list current IPC usage • -l – list system IPC limits • -t – list time information (when created, last used, etc.) ipcrm – delete (release) an IPC facility
The ipcs Command
The ipcs command lists information about active inter-process communication (IPC) facilities; specifically: message queues, shared memory segments and semaphore sets. This is useful when troubleshooting because some processes will fail to start (often without explanation) if they require one of these IPC mechanisms and can not allocate it. One possible reason a process might fail when attempting to obtain one of these IPC facilities is simple resource consumption. A limited number of each exists. Current usage and limits can be shown with the -u and -l options, as shown here:
# ipcs -u
------ Shared Memory Status --------
segments allocated 1
pages allocated 96
pages resident 96
. . . snip . . .
# ipcs -l
. . . snip . . .
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
. . . snip . . .
Since these IPC structures are, of necessity, shared resources, it is not always possible for a terminating application to free (release) the structure. This can lead to a build-up of open but unused structures that can result in resource exhaustion or collisions when the application tries to start later. Safely releasing these resources generally requires an excellent understanding of the programs in question. One good way to identify abandoned resources is to use the -t option to list the times associated with each IPC resource:
# ipcs -t
. . . snip . . .
------ Shared Memory Operation/Change Times --------
shmid    owner   last-op              last-changed
1114144  apache  Not set              Fri Oct  8 19:59:00 2007
1146913  apache  Sat Oct  9 14:07:44  Fri Oct  8 19:59:02 2007
. . . snip . . .

Once you have identified a resource that must be freed, you can use the ipcrm command to release the resource:
# ipcrm -s 5996557
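Shared memory segments are handled the same way: list them, then remove the abandoned segment by its shmid (the id shown is illustrative, taken from the ipcs -t example above):

# ipcs -m
# ipcrm -m 1114144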
iostat, mpstat, and vmstat iostat – report statistics for I/O and CPU • -x /dev/disk – list detailed disk statistics mpstat – more detailed CPU statistics for SMP systems vmstat – report statistics for memory, I/O, and CPU • -s – list event counters and memory statistics • -d – list per-disk read/write statistics
The iostat Command
Several commands are available to examine input and output statistics on Linux systems. Some basic I/O information is provided by vmstat, but iostat is the primary command and can show more detailed information. The following example shows summary information regarding device utilization:
# iostat -d
Device:  tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev8-0   0.39  0.95        2.55        1535456   4103392
dev8-1   0.16  4.11        4.35        6623344   7015920
dev8-2   0.42  10.41       11.52       16782014  18571160
For more detailed information about a specific disk use the -x option:
# iostat -d -x /dev/hda

The mpstat Command
Slightly more detailed CPU statistics than those provided by iostat can be obtained with the mpstat command which is usually run with a -P ALL option to display information about all processors on the system:
# mpstat -P ALL
03:29:27 PM  CPU  %user  %nice  %system  %idle  intr/s
03:29:27 PM  all  0.01   93.97  0.03     5.99   104.63
03:29:27 PM  0    0.01   93.40  0.03     6.55   104.63
03:29:27 PM  1    0.02   93.61  0.03     6.35   104.63
03:29:27 PM  2    0.01   93.73  0.02     6.24   104.63
03:29:27 PM  3    0.01   95.14  0.02     4.83   104.63
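Like the other sysstat tools, mpstat can also sample at an interval; for example, to report per-CPU statistics every 2 seconds for 5 samples:

# mpstat -P ALL 2 5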
The vmstat Command
The vmstat command is commonly used on Unix systems to report information about processes, memory, paging, block IO, traps, and CPU activity. Statistics can be displayed in a simple list (only works on a Linux kernel version >= 2.6):
# vmstat -s
2065744 total memory
1041384 used memory
321240 active memory
582176 inactive memory
1024360 free memory
91624 buffer memory
763436 swap cache
522072 total swap
0 used swap
522072 free swap
. . . snip . . .
1131122 pages paged out
0 pages swapped in
0 pages swapped out
6675348 interrupts
3150012 CPU context switches
1196790979 boot time
80591 forks
vmstat can also output statistics for a specified number of samples at a specified interval. The following example will output memory (and other) statistics every 2 seconds for a total of 10 samples:

# vmstat 2 10
procs -----------memory---------- --swap-- ---io--- --system-- -----cpu----
 r  b  swpd   free   buff   cache  si  so  bi  bo   in    cs  us sy id  wa st
 0  0  0      67584  227620 123380 0   0   68  5    46    122 2  0  97  1  0
 0  0  0      67584  227620 123380 0   0   0   0    1100  26  0  0  100 0  0
 0  0  0      67584  227620 123380 0   0   0   0    1090  21  0  0  100 0  0
. . . snip . . .
Using hdparm to Measure Disk Performance Show performance of cache reads • hdparm -T /dev/sda Show performance of device reads • hdparm -t /dev/sda
Measuring cache-read performance

You can use the hdparm program with the -T option to measure how fast a system can read data from the Linux buffer cache. Note that this does NOT read data from the drive and in no way indicates the performance of the drive. Instead, it measures the throughput of the system cache, CPU, and memory. Think of this number as the upper limit for throughput, assuming an infinitely fast drive:

# hdparm -T /dev/sda
/dev/sda:
 Timing cached reads: 2320 MB in 2.00 seconds = 1160.33 MB/sec

Measuring Device Read Performance

Using the -t option with the hdparm command measures the throughput for reading data through the buffer cache from disk. This reading gives an indication of how fast the disk can sustain a sequential data read. Note that this does NOT access the disk via normal filesystem calls. Filesystem code will impose additional overhead on this throughput:

# hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 328 MB in 3.01 seconds = 108.82 MB/sec

Getting Accurate Numbers

The hdparm(8) man page makes several recommendations regarding how to obtain the most accurate readings:

y Make sure the system has several megabytes of free memory before running the test. This ensures that the system can allocate memory to the buffer cache.
y Minimize system load, and especially other access to the drive.
y Run several tests and average the results.
y Use both the -t and -T options in the same invocation of hdparm. When used together, more accurate readings are reported because a correction factor (based on the -T readings) is applied to the -t readings.
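Following the man page advice above, a simple way to run several combined -tT tests so the results can be averaged (a minimal sketch; /dev/sda is the example device used above):

# for i in 1 2 3; do hdparm -tT /dev/sda; sleep 2; done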
Troubleshooting with the ip command ip – view and configure network settings • newer replacement for ifconfig, route, arp, and others • extensive context sensitive help • addr – view and modify layer 2 and 3 addresses Route and ARP Mappings • route – view and modify layer 3 routes • neigh – show and modify ARP cache • -r – resolve IP addresses to names • changes not persistent when rebooting
The ip Command
The ifconfig command can be used to display or change network interface settings when troubleshooting problems related to network connectivity or networked services. When troubleshooting these problems, be sure to examine the transmission statistics in addition to the basic network address and prefix values. For example, in the output shown here, the RX/TX and collision errors are likely indicative of either a saturated segment or a bad Ethernet device on the segment:

# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:07:E9:54:35:F0
      inet addr:10.100.0.254  Bcast:10.100.0.255  Mask:255.255.255.0
      inet6 addr: fe80::207:e9ff:fe54:35f0/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:981853 errors:26 dropped:0 overruns:0 frame:0
      TX packets:894937 errors:345 dropped:0 overruns:0 carrier:0
      collisions:1546 txqueuelen:1000
      RX bytes:94117868 (89.7 Mb)  TX bytes:306277992 (292.0 Mb)

The ip command replaces the functionality of ifconfig, arp, and route. It is generally more powerful than the tools that it replaces and allows lower level changes. For example, to see the layer 2 and 3 addresses assigned to the eth0 interface:

# ip addr list eth0
2: eth0: mtu 1500 qdisc pfifo_fast
    link/ether 00:08:74:46:26:9d brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.201/24 brd 10.100.0.255 scope global eth0
    inet6 fe80::208:74ff:fe46:269d/64 scope link
       valid_lft forever preferred_lft forever

To activate the eth0 network interface, and assign an IP address and prefix, run:

# ip addr add 10.200.2.4/24 dev eth0

Changes made while running ip interactively will not persist after a reboot. However, Linux distributions have boot scripts and network configuration facilities which call ip and configure network settings based on stored values.
To deactivate a link, run:
# ip link set eth0 down
To examine the cached ARP mappings:
# ip neigh list
10.100.0.254 dev eth0 lladdr 00:08:02:c4:1c:a2 REACHABLE
Traffic Routing and Addressing on an Ethernet LAN When an IP packet is generated, before it can be transmitted on a wire, it must be encapsulated inside of a layer 2 frame. The ARP protocol is used to determine the layer 2 address (MAC) that is used to send packets to a given IP. If the destination host is on the local segment, then the MAC address used will be that of the end host. If the routing process on the local host determines that the destination host is on a remote segment, then a lookup is done in the local route table for a matching route. The MAC used to reach the destination host will be the MAC corresponding to the IP address assigned to the router's interface.
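To see exactly which route, outgoing interface, and source address the kernel's lookup will select for a particular destination, the route decision can be queried directly; for example (the destination address here is purely illustrative):

# ip route get 10.200.2.4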
It is important to note that as an IP packet hops through the Internet, the layer 3 IP header source and destination IPs never change (unless the packet passes through a NAT gateway). However, that IP packet will be riding inside many different layer 2 frames.

ARP mappings
The ip command can be used to examine and change entries in the ARP cache. For example, to list all the current known IP to MAC mappings (with numerical output) run:
# ip neigh
10.100.0.254 dev eth0 lladdr 00:1b:21:5a:ea:ee REACHABLE
10.100.0.10 dev eth0 lladdr 00:1b:21:24:f9:35 REACHABLE

Routing Table
The ip command can be used to examine, create, remove, and alter entries in the routing table. For example, to show all routes in the local main table (with numerical output) run:
# ip route
10.100.0.0/24 dev eth0 proto kernel scope link src 10.100.0.9
169.254.0.0/16 dev eth0 scope link metric 1002
default via 10.100.0.254 dev eth0 proto static

To add a new default gateway:
# ip route add default via 10.0.0.254
Name Resolution Files include: • /etc/nsswitch.conf • /etc/hosts • /etc/resolv.conf Troubleshooting commands: • nslookup – query Internet name servers (deprecated) • host – newer DNS lookup utility • dig – alternate DNS lookup utility (very flexible) • getent – lookup entries via the name service switch (as defined in /etc/nsswitch.conf) • nscd – interact with local cache
Name Resolution
It is common for people to use names to reference systems instead of the assigned numeric protocol addresses. Several mechanisms exist to provide name resolution, such as local lookups in the /etc/hosts file, a DNS forwarder, or a directory service such as LDAP. When troubleshooting name resolution problems, start by asking which of the possible name resolution sources you expect to resolve the query in question, and then focus on the files and settings specific to that mechanism. Most services or commands that rely on name resolution use standard calls found in the libc.so and libresolv.so libraries. For example, when you attempt to ping a host by its FQDN, the ping command uses the gethostbyname() call found in the libc.so library. The exact actions to resolve the name would then depend on the configuration of the machine. The common sequence would be:
1. Read in the /etc/nsswitch.conf file and parse the hosts line to determine the methods and order.
2. Assuming the hosts line in /etc/nsswitch.conf reads 'hosts: files dns', load the libnss_files.so library. This reads the /etc/host.conf config file for parameters and overrides and then attempts resolution via a lookup in the /etc/hosts file.
3. If resolution has not succeeded yet (again, assuming the hosts line reads 'hosts: files dns', which is the default), load the libnss_dns.so library, which reads the /etc/resolv.conf file and then attempts resolution via DNS.
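Because getent performs lookups through this same name service switch configuration, it is a convenient way to test exactly what an application will see; for example (station1.example.com is the classroom host used elsewhere in this chapter, and the -s option, where supported, restricts the lookup to a single source such as files):

# getent hosts station1.example.com
# getent -s files hosts station1.example.com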
Troubleshooting Name Lookup Problems
Commands to test DNS include the older nslookup and the newer host and dig commands each of which can send queries directly to a name server. The nslookup command was deprecated because it was not as reliable as it should be. In fact, its advanced features have already been removed. Use host to perform a simple name to IP address lookup, suppressing use of the search list in the /etc/resolv.conf file:
# host station1.example.com.
station1.example.com has address 10.100.0.1
Use dig to perform a simple name to IP address lookup (dig never uses the search list in the /etc/resolv.conf file):
# dig station1.example.com
; <<>> DiG 9.3.1 <<>> station1.example.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- . . . snip . . .

# lsmod | awk '($3>0) {++count}; END {print count}'
Result:

Count all modules where usage count is greater than 0.

29)
Currently activated systemd targets:
# systemctl list-units --type target
UNIT                    LOAD   ACTIVE SUB    DESCRIPTION
basic.target            loaded active active Basic System
cryptsetup.target       loaded active active Encrypted Volumes
getty.target            loaded active active Login Prompts
graphical.target        loaded active active Graphical Interface
local-fs-pre.target     loaded active active Local File Systems (Pre)
local-fs.target         loaded active active Local File Systems
multi-user.target       loaded active active Multi-User System
network-online.target   loaded active active Network is Online
network.target          loaded active active Network
nfs-client.target       loaded active active NFS client services
nss-user-lookup.target  loaded active active User and Group Name Lookups
paths.target            loaded active active Paths
remote-fs-pre.target    loaded active active Remote File Systems (Pre)
remote-fs.target        loaded active active Remote File Systems
slices.target           loaded active active Slices
sockets.target          loaded active active Sockets
swap.target             loaded active active Swap
sysinit.target          loaded active active System Initialization
timers.target           loaded active active Timers

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

19 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
# runlevel
# who -r
Result:
Network
30)
IP address and subnet mask for eth0:
# ip addr show eth0
# ifconfig eth0
Result:
31)
Default gateway:
# ip route list match 0.0.0.0
# route -n | grep ^0\.0\.0\.0
Result:
32)
DNS name server IP(s):
# cat /etc/resolv.conf Result:
33)
Speed and duplex settings for eth0:
[R7] # ethtool eth0
. . . output omitted . . .
[R7] # mii-tool eth0
eth0: negotiated 100baseTx-FD flow-control, link ok
Result:
34)
Hostname:
# uname -n
# cat /etc/hostname
Result:
35)
MAC address of default gateway:
# ping -c 1 w.x.y.z; ip neigh
# arping -I eth0 w.x.y.z
Result:
Ping the gateway to populate the ARP cache, then extract the matching entry from the cache.
36)
Kernel module in use for eth0:
# dmesg | grep eth0
# lsmod
Result:
Good knowledge of what kernel modules correspond to the various network chipsets will allow you to spot the corresponding module in the list.
Software
37)
Number of packages installed:
# rpm -qa | wc -l Result:
38)
Version of rpm installed:
# rpm --version
# rpm -q rpm
Result:
Most commands support some way of identifying the version, or you can query the RPM database.
39)
Install and check the version of gcc:
[R7] # yum install -y gcc
[S12] # zypper install -n gcc
# gcc --version
# rpm -q gcc
Result:
40)
Number of Xinetd services set to start (with Xinetd):
# chkconfig --list | awk '($2=="on") {count++} END {print count}'
Result:
Use chkconfig to list all services, then count the Xinetd services that are on.
41)
Which MTA is installed:
# rpm -q sendmail postfix
[R7] # alternatives --display mta | grep link
# pgrep sendmail
# pgrep master
Result:
See which packages are installed (remember that on RHEL/FC both might be installed simultaneously). pgrep master checks for the main postfix daemon.

42)
What cron jobs are set to run:
# cat /etc/crontab; ls -R /etc/cron.*
[R7] # ls -R /var/spool/cron/
[S12] # ls -R /var/spool/cron/tabs
Result:
Check the system crontab and directories it references, then check the user cron spools.
43)
Where syslog authpriv messages get logged to:
# grep ^authpriv /etc/rsyslog.conf Result:
Check first for lines exactly matching authpriv. If none are found, then search the entire file for a matching wildcard (*) entry and look to see what file the messages are sent to.
Security and Firewall
44)
Rules in the iptables filter table:
# iptables -t filter -nL
Result:
Be sure to also take note of the policy set for each chain.
45)
TCP Wrappers denying anything:
# cat /etc/hosts.*
Result:
Focus on any active entries in the hosts.deny file.
46)
Is SSH active, and if so does it permit root logins:
# service sshd status
# grep PermitRootLogin /etc/ssh/sshd_config
# ssh root@localhost
Result:
Check first for explicit configuration of this option in the sshd config. Try to connect as root to your own local machine.

47)
After installation, is telnet enabled? If so, will it permit root to log in?
[R7] # yum install -y telnet-server telnet
[S12] # zypper install -n telnet-server
# chkconfig --list telnet
# telnet localhost
Result:
Is Xinetd running? /sbin/service xinetd status. Try to connect as root to your own local machine.
Users and Authentication
48)
What other user accounts exist:
# cat /etc/passwd Result:
Focus on the end of the file and look for regular user (nonsystem) accounts.
49)
Is the shadow file in use:
# ls /etc/shadow Result:
50)
What password hash format is in use?
[R7] # grep PASSWDALGORITHM /etc/sysconfig/authconfig
[S12] # grep CRYPT /etc/default/passwd
Result:
Or just run the authconfig program and examine the current settings. Or run yast users and look in the expert settings.

51)
Which services invoke the pam_limits.so module:
# grep -l limits /etc/pam.d/*
[R7] # grep -l system-auth /etc/pam.d/*
[S12] # grep -l common-session /etc/pam.d/*
Result:
Look for calls to the limits module in each service's PAM config. Since the limits module is listed in the system-auth file, we need to check and see which services reference it.

52)
Administrative privileges are no longer required; exit the root shell to return to an unprivileged account:
# exit
Objectives y Practice using common troubleshooting techniques with rpm. Requirements b (1 station) c (classroom server)
Lab 2
Task 2 Troubleshooting with rpm Estimated Time: 15 minutes
Relevance The RPM package management system has extremely powerful querying and verification abilities. During the course of troubleshooting it can be used to see what is installed as well as if the installation has been damaged in any way. Being adept with RPM will enable you to perform sophisticated inspections and verifications of your system.
1)
Install the Apache web server on your system:
[R7] # yum install -y httpd
. . . output omitted . . .
[S12] # zypper install -n apache2
. . . output omitted . . .
2)
Print a summarized revision history and also detailed information about the last few changes:
[R7] # rpm -q --changelog httpd | less
[S12] # rpm -q --changelog apache2 | less
. . . output omitted . . .
Type q to quit less.
3)
Examine the scripts that are triggered when the Apache package is installed or removed:
[R7] # rpm -q --scripts httpd
. . . output omitted . . .
[S12] # rpm -q --scripts apache2
. . . output omitted . . .
[R7] Very simple script that adds a user account and adds the new service into chkconfig on install, and stops and removes it (from chkconfig) on uninstall.
[S12] Complex script that, among other things, tries to convert old httpd configs to the new modular format.
4)
Verify the integrity of the files provided by the Apache package using the RPM verify capabilities:
[R7] # rpm -V httpd
[S12] # rpm -V apache2
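Any output from rpm -V means at least one attribute of a file differs from the values stored in the RPM database; each position in the flag string reports one test (S size, M mode, 5 digest, D device, L link, U user, G group, T mtime), and a "c" marks a configuration file. The line below is illustrative only, not output you should expect from your system:

S.5....T.  c /etc/httpd/conf/httpd.conf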
5)
It's possible that an intruder has modified the files owned by the package, and then updated the RPM database to cover their tracks. A "safer" verification can be done by comparing against a "known good" package. Download the Apache package from the classroom server and place it in the root user's home directory. (You can also copy the package from local media, CD-ROMs for example, if they are available).
[R7] # mount server1:/export/netinstall/RHEL7/ /mnt/
[R7] # cp /mnt/Packages/httpd-2*.rpm /root/
[S12] # mount server1:/export/netinstall/SLES12/ /mnt/
[S12] # cp /mnt/suse/x86_64/apache2-2*.rpm /root/
6)
Now that you have a local copy of the main Apache web server package, verify the installed package against the uninstalled package you downloaded from the server.
[R7] # rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[R7] # rpm -Vp /root/httpd-2*.rpm
[S12] # rpm -Vp /root/apache2-2*.rpm
Remember that if no output is given, it means that the computed values for all files owned by the package match the values stored in the RPM database.
7)
The question you must now ask is how can you trust that the package you are verifying against has not been modified since it was created for inclusion with the distribution? To solve this problem, most packages are digitally signed using the private key of the vendor (packager). View the information for the Apache package again and take note of the Signature information:
[R7] # rpm -qi httpd | grep "^Signature"
[R7] Signature : RSA/8, Mon 16 Aug 2010 11:03:35 AM MDT, Key ID 199e2f91fd431d51
[S12] # rpm -qi apache2 | grep "^Signature"
[S12] Signature : RSA/SHA256, Fri Sep 26 15:09:14 2014, Key ID 70af9e8138db7c82
8)
To allow verification of a package's digital signature, you must have the public key corresponding to the shown Key ID imported into your RPM database. Use three different query methods to view the public keys installed into the RPM database.
# rpm -qa 'gpg-pubkey*'
. . . output omitted . . .
# rpm -qg 'Public Keys'
. . . output omitted . . .
# rpm -qig 'Public Keys'
. . . output omitted . . .
All GPG public keys used by RPM are in the 'Public Keys' RPM group.
Objectives y Practice using common troubleshooting process related tools. Requirements b (1 station)
Lab 2
Task 3 Process Related Tools Estimated Time: 15 minutes
Relevance Operating systems exist to run applications. Being able to view which processes are running, what resources they are using, as well as what operations they are performing is extremely helpful in gathering information to diagnose problems.
1)
The following actions require administrative privileges. Switch to a root login shell:
$ su
Password: makeitso
2)
Ensure that the Apache web server is installed and started:
[R7] # yum install -y httpd
[R7] # systemctl start httpd
[S12] # zypper install -n apache2
[S12] # systemctl start apache2

3)
Look at the processes launched by Apache. Take note of the hierarchy by looking at the parent process IDs (ppid). Take note of the user that each process is running as. Record the PID# of the main (root owned) Apache process and the PID# of the first forked child Apache process:

# ps -eo comm,pid,ppid,user f | grep http
[R7]  httpd            2061      1 root
[R7]   \_ httpd        2064   2061 apache
[R7]   \_ httpd        2065   2061 apache
[S12] httpd2-prefork  26614      1 root
[S12]  \_ httpd2-pref  26615  26614 wwwrun
[S12]  \_ httpd2-pref  26616  26614 wwwrun
. . . snip . . .
Result:
4)
List all of the open files owned by the running Apache process:

# lsof -p main_proc_PID
. . . output omitted . . .

Replace main_proc_PID with the actual process ID recorded in the previous step. Notice that Apache has many memory mapped files corresponding to the system libraries and Apache modules that it loads. It also has a few file handles open to standard files for writing events to logs.
5)
List only the open files in /var/log/ that are owned by the main Apache process:

# lsof -ap main_proc_PID +D /var/log/
[R7]  COMMAND    PID   USER FD TYPE DEVICE SIZE/OFF NODE  NAME
[R7]  httpd      2061  root 2w  REG 253,4  531      8235  /var/log/httpd/error_log
[R7]  httpd      2061  root 7w  REG 253,4  0        8236  /var/log/httpd/access_log
[S12] COMMAND    PID   USER FD TYPE DEVICE SIZE     NODE  NAME
[S12] httpd2-pr  26614 root 2w  REG 8,6    689      40322 /var/log/apache2/error_log
[S12] httpd2-pr  26614 root 7w  REG 8,6    0        40323 /var/log/apache2/access_log

6)
An alternate way to see a listing of the standard file handles held by a process is to look in the /proc/PID/fd directory for the process. List the file handles held for standard files for the main Apache process:

# ls -l /proc/main_proc_PID/fd/
total 0
      lr-x------ 1 root root 64 Mar 31 13:30 0 -> /dev/null
      l-wx------ 1 root root 64 Mar 31 13:30 1 -> /dev/null
[R7]  l-wx------ 1 root root 64 Mar 31 13:30 2 -> /var/log/httpd/error_log
[S12] l-wx------ 1 root root 64 Mar 31 13:30 2 -> /var/log/apache2/error_log
      lrwx------ 1 root root 64 Mar 31 13:30 3 -> socket:[16366]
      lrwx------ 1 root root 64 Mar 31 13:30 4 -> socket:[16367]
      lr-x------ 1 root root 64 Mar 31 13:30 5 -> pipe:[16390]
      l-wx------ 1 root root 64 Mar 31 13:30 6 -> pipe:[16390]
[R7]  l-wx------ 1 root root 64 Mar 31 13:30 7 -> /var/log/httpd/access_log
      lr-x------ 1 root root 64 Mar 31 13:30 8 -> /dev/urandom
[S12] l-wx------ 1 root root 64 Mar 31 13:30 7 -> /var/log/apache2/access_log
7)
Use the telnet program to connect to your running Apache process and verify that it can serve content:
[R7] # yum install -y telnet
. . . output omitted . . .
# telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0 <Enter>
<Enter>

HTTP/1.1 403 Forbidden
. . . snip . . .

Press Enter on a blank line. A default install of Apache will not have an index.html file, resulting in 403 Forbidden.
8)
Verify that the access is being logged:
[R7] # cd /var/log/httpd/
[S12] # cd /var/log/apache2/
# tail -n 1 access_log
::1 - - [10/Apr/2015:14:08:58 -0600] "GET / HTTP/1.0" 403 3985 "-" "-"
9)
With the Apache process still running, remove the access_log file:
# mv access_log /tmp/
10)
Again list the open files in /var/log/ that are owned by the main Apache process:

# lsof -ap main_proc_PID +D /var/log/
COMMAND    PID   USER FD TYPE DEVICE SIZE/OFF NODE  NAME
[R7]  httpd      2061  root 2w  REG 253,4  749  8235  /var/log/httpd/error_log
[S12] httpd2-pr  26614 root 2w  REG 8,6    689  40322 /var/log/apache2/error_log

Notice that the file handle that previously pointed to the access_log is no longer present.
11)
Repeat the earlier procedure (Step 7 and Step 8) of connecting to the web server with telnet, and then attempt to view the record of your connection in the access log. Note that the log file still does not exist and that subsequently the connection event was not logged.
12)
Send a HUP signal to the running web server process (causing it to re-read its configuration file and re-open file handles to its logs):
[R7] # killall -HUP httpd
[S12] # killall -HUP httpd2-prefork

13)
Verify that a new access_log file has been created:
[R7] # cd /var/log/httpd/; ls access_log
[R7] access_log
[S12] # cd /var/log/apache2/; ls access_log
[S12] access_log

14)
Use the strace command to examine the system calls made by a web server child process when servicing a connection:

# strace -p $(pgrep httpd | sed -n '3p')
Process 15489 attached

Use the pgrep command to list the PIDs for all of the httpd processes, then have sed discard all but the third line.
Leave this strace process running while you complete the next steps.
15)
Create an index file on the web server to retrieve:
[R7] # echo hello, world > /var/www/html/index.html
[S12] # echo hello, world > /srv/www/htdocs/index.html

16)
Connect to your own web server, such as with a browser, by repeating the process shown in Step 7, or by some other means, e.g. curl. The main web server process directs connections to each of the child web server processes in a round robin fashion, so if you do not see evidence of your connection then repeat this process until you notice that the strace output shows evidence of your connection:

$ pgrep httpd | wc -l
6
$ for i in $(seq 6)
> do curl http://localhost
> done
hello, world
hello, world
hello, world
hello, world
hello, world
hello, world
hello, world
17)
Terminate the strace command and analyze the collected data to better understand how the web server process handles a connection. Specifically look for the following things:
1. Process uses the accept(2) system call to extract a request off the pending queue and create a new connected socket. The return code is the file-handle that points to the new socket.
2. The GET request to the server: how many bytes of data does the read(2) call return?
3. Get the current date and time (gettimeofday(2)) so that it can be recorded in the log message for this connection.
4. Use the stat(2) and lstat(2) calls to verify that directory and file permissions allow access, and to locate the file that will be sent back. Can you see how this search corresponds to the DirectoryIndex web server configuration command?
5. Open the file for access (open(2)). Note the flag (O_RDONLY) that causes the file to be opened read-only. Note the file-handle given as a return value.
6. The writev(2) call is used to transfer the data to the client (also see sendfile(2)). How many bytes of data were sent?
7. write(2) to a previously opened file-handle that points to the access_log file.
8. close(2) this end of the socket for sending.
9. Wait a brief period of time to see if the client has more data to send on this socket. How long will the web server process wait for data to arrive on the socket? (Hint: check the poll(2) man page and look for the time-out value.)
10. Close the file-handle for the client connection.
[R7]
1. accept4(4, {sa_family=AF_INET6, sin6_port=htons(59806), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC) = 9
   gettimeofday({1429116818, 858499}, NULL) = 0
   getsockname(9, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
   fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
   fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   gettimeofday({1429116818, 858642}, NULL) = 0
   gettimeofday({1429116818, 858678}, NULL) = 0
   gettimeofday({1429116818, 858696}, NULL) = 0
2. read(9, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 8000) = 73
3. gettimeofday({1429116818, 858777}, NULL) = 0
   gettimeofday({1429116818, 858801}, NULL) = 0
   gettimeofday({1429116818, 858818}, NULL) = 0
   gettimeofday({1429116818, 858836}, NULL) = 0
   gettimeofday({1429116818, 858853}, NULL) = 0
   gettimeofday({1429116818, 858904}, NULL) = 0
   gettimeofday({1429116818, 858924}, NULL) = 0
4. stat("/var/www/html/", {st_mode=S_IFDIR|0755, st_size=23, ...}) = 0
   gettimeofday({1429116818, 859092}, NULL) = 0
   stat("/var/www/html/index.html", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
5. open("/var/www/html/index.html", O_RDONLY|O_CLOEXEC) = 10
   gettimeofday({1429116818, 859278}, NULL) = 0
   read(9, 0x7f6e91526a28, 8000) = -1 EAGAIN (Resource temporarily unavailable)
   gettimeofday({1429116818, 859327}, NULL) = 0
   mmap(NULL, 13, PROT_READ, MAP_SHARED, 10, 0) = 0x7f6e905ce000
6. writev(9, [{"HTTP/1.1 200 OK\r\nDate: Wed, 15 A"..., 240}, {"hello, world\n", 13}], 2) = 253
   munmap(0x7f6e905ce000, 13) = 0
   gettimeofday({1429116818, 860510}, NULL) = 0
7. write(7, "::1 - - [15/Apr/2015:10:53:38 -0"..., 79) = 79
   times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 507132999
   close(10) = 0
   gettimeofday({1429116818, 860608}, NULL) = 0
   gettimeofday({1429116818, 860634}, NULL) = 0
   gettimeofday({1429116818, 860653}, NULL) = 0
   poll([{fd=9, events=POLLIN}], 1, 5000) = 1 ([{fd=9, revents=POLLIN}])
   read(9, "", 8000) = 0
   gettimeofday({1429116818, 860712}, NULL) = 0
   gettimeofday({1429116818, 860728}, NULL) = 0
8. shutdown(9, SHUT_WR) = 0
9. poll([{fd=9, events=POLLIN}], 1, 2000) = 1 ([{fd=9, revents=POLLIN|POLLHUP}])
   read(9, "", 512) = 0
10. close(9) = 0
    read(5, 0x7fff43f2956f, 1) = -1 EAGAIN (Resource temporarily unavailable)
    gettimeofday({1429116818, 860890}, NULL) = 0
    accept4(4, ^C Process 9996 detached
[S12]
1. accept4(4, {sa_family=AF_INET6, sin6_port=htons(47739), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC) = 9
   getsockname(9, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
   fcntl(9, F_GETFL) = 0x2 (flags O_RDWR)
   fcntl(9, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b53000
   gettimeofday({1429115483, 956390}, NULL) = 0
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b51000
2. read(9, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 8000) = 73
3. gettimeofday({1429115483, 956466}, NULL) = 0
   gettimeofday({1429115483, 956497}, NULL) = 0
   gettimeofday({1429115483, 956521}, NULL) = 0
   gettimeofday({1429115483, 956543}, NULL) = 0
   gettimeofday({1429115483, 956556}, NULL) = 0
4. stat("/srv/www/htdocs/", {st_mode=S_IFDIR|0755, st_size=56, ...}) = 0
   lstat("/srv", {st_mode=S_IFDIR|0755, st_size=28, ...}) = 0
   lstat("/srv/www", {st_mode=S_IFDIR|0755, st_size=35, ...}) = 0
   lstat("/srv/www/htdocs", {st_mode=S_IFDIR|0755, st_size=56, ...}) = 0
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b4f000
   stat("/srv/www/htdocs/index.html", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
   lstat("/srv/www/htdocs/index.html", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
5. open("/srv/www/htdocs/index.html", O_RDONLY|O_CLOEXEC) = 10
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b4d000
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b4b000
   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc0c0b49000
   gettimeofday({1429115483, 956980}, NULL) = 0
   read(9, 0x7fc0c0b51048, 8000) = -1 EAGAIN (Resource temporarily unavailable)
   mmap(NULL, 13, PROT_READ, MAP_SHARED, 10, 0) = 0x7fc0c0b48000
6. writev(9, [{"HTTP/1.1 200 OK\r\nDate: Wed, 15 A"..., 230}, {"hello, world\n", 13}], 2) = 243
   munmap(0x7fc0c0b48000, 13) = 0
7. write(7, "::1 - - [15/Apr/2015:10:31:23 -0"..., 79) = 79
   close(10) = 0
   poll([{fd=9, events=POLLIN}], 1, 15000) = 1 ([{fd=9, revents=POLLIN}])
   read(9, "", 8000) = 0
   gettimeofday({1429115483, 957991}, NULL) = 0
8. shutdown(9, SHUT_WR) = 0
9. poll([{fd=9, events=POLLIN}], 1, 2000) = 1 ([{fd=9, revents=POLLIN|POLLHUP}])
   read(9, "", 512) = 0
10. close(9) = 0
    read(5, 0x7fffc5683c57, 1) = -1 EAGAIN (Resource temporarily unavailable)
    accept4(4, ^C Process 21383 detached
Objectives y Practice using common network related tools. Requirements b (1 station)
Lab 2
Task 4 Network Tools Estimated Time: 25 minutes
Relevance Using a variety of networking tools to discover which process and services are running and listening for connections on a system is important. Being able to identify those processes is important for enhancing system security as well as gathering troubleshooting information.
1)
Install RPM packages required for this lab task:
[R7] # yum install -y nmap{,-ncat} wireshark
[S12] # zypper install -n netcat-openbsd nmap wireshark
. . . output omitted . . .
2)
Leave the following command running in a terminal so that you can monitor the changes made to the network configuration in the subsequent commands:
[terminal 1]# ip monitor
3)
In another terminal create a virtual interface (using the legacy ifconfig command) with a new IP address:
[terminal 2]# ifconfig eth0:1 172.16.0.X netmask 255.255.255.0 up
Replace X with your station number.
4)
Examine the output in the first terminal ([1]) to see what low level operations were involved in creating the new virtual interface. You should see something like the following:
2: eth0    inet 172.16.0.X/16 brd 172.16.255.255 scope global eth0:1
local 172.16.0.X dev eth0 table local proto kernel scope host src 172.16.0.X
broadcast 172.16.255.255 dev eth0 table local proto kernel scope link src 172.16.0.X
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.X
broadcast 172.16.0.0 dev eth0 table local proto kernel scope link src 172.16.0.X
Deleted 2: eth0    inet 172.16.0.X/16 brd 172.16.255.255 scope global eth0:1
Deleted 172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.X
Deleted broadcast 172.16.255.255 dev eth0 table local proto kernel scope link src 172.16.0.X
Deleted broadcast 172.16.0.0 dev eth0 table local proto kernel scope link src 172.16.0.X
Deleted local 172.16.0.X dev eth0 table local proto kernel scope host src 172.16.0.X
2: eth0    inet 172.16.0.X/24 brd 172.16.0.255 scope global eth0:1
local 172.16.0.X dev eth0 table local proto kernel scope host src 172.16.0.X
broadcast 172.16.0.255 dev eth0 table local proto kernel scope link src 172.16.0.X
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.X
broadcast 172.16.0.0 dev eth0 table local proto kernel scope link src 172.16.0.X
5)
Verify that the new interface shows up in the output from ifconfig and that you can ping the new address:
[terminal 2]# ifconfig
. . . output omitted . . .
[terminal 2]# ping -c 1 172.16.0.X
. . . output omitted . . .
6)
Add an additional (secondary) address to your eth0 interface using the newer ip command:
[terminal 2]# ip addr add 172.17.0.X label eth0:2 dev eth0
7)
Examine the still running ip monitor command in terminal 1 where the changes have been logged. You should see something like the following:
4: eth0    inet 172.17.0.X/32 scope global eth0:2
local 172.17.0.X dev eth0 table local proto kernel scope host src 172.17.0.X
You can now terminate the monitoring command by pressing Ctrl+c in the [terminal 1] window.
8)
Verify that the new address is listed and that you can ping it:
# ip addr show eth0
2: eth0: mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1b:21:24:fa:6e brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.X/24 brd 10.100.0.255 scope global eth0
    inet 172.16.0.X/24 brd 172.16.0.255 scope global eth0:1
    inet 172.17.0.X/32 scope global eth0:2
    inet6 fe80::21b:21ff:fe24:fa6e/64 scope link
       valid_lft forever preferred_lft forever
# ping -c 1 172.17.0.X
PING 172.17.0.X (172.17.0.X) 56(84) bytes of data.
64 bytes from 172.17.0.X: icmp_seq=0 ttl=0 time=0.041 ms
. . . snip . . .
9)
Verify that the new secondary address is shown in the output of the deprecated ifconfig command:
# ifconfig eth0
. . . output omitted . . .
If label eth0:2 had not been used with the ip command, ifconfig would not see the address.
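As an aside, a minimal illustration of that behavior, assuming a hypothetical 172.18.0.X address that is not used elsewhere in this lab:

# ip addr add 172.18.0.X/24 dev eth0      # no label given (hypothetical address)
# ifconfig eth0                           # the new address is not listed here
# ip addr show eth0                       # ...but it is listed here
# ip addr del 172.18.0.X/24 dev eth0      # clean up before continuing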
10)
Use NetCat to start a process listening on port 50001 on the new virtual and secondary addresses:
# yes virtual | nc -l 172.16.0.X 50001 &          Listening on just the virtual IP :50001
. . . output omitted . . .
# yes secondary | nc -l 172.17.0.X 50001 &        Listening on just the secondary IP :50001
. . . output omitted . . .
# yes normal | nc -lu 50001 &                     Listening on all IPs udp 50001
. . . output omitted . . .

11)
Use the ss command to show the listening processes:
# ss -tupa | grep nc
. . . output omitted . . .
12)
Use the lsof command to list the network connections in a variety of ways:
# lsof -i:50001                                   List all Internet sockets on port 50001.
COMMAND PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
nc      2388 root 3u  IPv4 18723  0t0      TCP  172.16.0.9:50001 (LISTEN)
nc      2391 root 3u  IPv4 18729  0t0      TCP  172.17.0.9:50001 (LISTEN)
nc      2394 root 3u  IPv4 18735  0t0      UDP  *:50001
# lsof -iudp:50001                                List just UDP port 50001 sockets.
COMMAND PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
nc      2394 root 3u  IPv4 18735  0t0      UDP  *:50001
# lsof -i@172.16.0.9                              List all sockets on the given address.
COMMAND PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
ntpd    1633 ntp  23u IPv4 18450  0t0      UDP  172.16.0.9:ntp
nc      2388 root 3u  IPv4 18723  0t0      TCP  172.16.0.9:50001 (LISTEN)
# lsof -a -i -c /nc/                              Internet sockets that also match Netcat's command name.
COMMAND PID  USER FD  TYPE DEVICE SIZE/OFF NODE NAME
nc      2388 root 3u  IPv4 18723  0t0      TCP  172.16.0.9:50001 (LISTEN)
nc      2391 root 3u  IPv4 18729  0t0      TCP  172.17.0.9:50001 (LISTEN)
nc      2394 root 3u  IPv4 18735  0t0      UDP  *:50001
13)
Use nmap to discover what services are listening on your machine:
# nmap -sSU -p 1-1024 -O localhost
-sS = SYN scan (half connect)
-sU = UDP scan
-O  = Activate remote host identification
-p  = Limit scan to given port range.

Starting Nmap 6.46 ( http://nmap.org ) at 2015-04-10 14:15 MDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000044s latency).
Not shown: 2042 closed ports               Other ports for other services may show, depending on
PORT    STATE SERVICE                      configuration and distribution.
22/tcp  open  ssh
25/tcp  open  smtp
80/tcp  open  http
111/tcp open  rpcbind
111/udp open  rpcbind
123/udp open  ntp
No exact OS matches for host (If you know what OS is running on it, see http://nmap.org/submit/ ).
TCP/IP fingerprint:
OS:SCAN(V=6.46%E=4%D=4/10%OT=22%CT=1%CU=1%PV=N%DS=0%DC=L%G=Y%TM=55283034%P=
OS:x86_64-suse-linux-gnu)SEQ(SP=101%GCD=1%ISR=10F%TI=Z%CI=Z%TS=8)OPS(O1=MFF
OS:D7ST11NW7%O2=MFFD7ST11NW7%O3=MFFD7NNT11NW7%O4-MFFD7ST11NW7%O5=MFFD7ST11N
OS:W7%O6=MFFD7ST11)WIN(W1=AAAA%W2=AAAA%W3=AAAA%W4=AAAA%W5=AAAA%W6=AAAA)ECN(
OS:R=Y%DF=Y%T=40%W=AAAA%O=MFFD7NNSNW7%CC=Y%Q=)T1(R=Y%DF=Y%T=40%S=O%A=S+%F=A
OS:S%RD=0%Q=)T2(R=N)T3(R=N)T4(R=Y%DF=Y%T=40%W=0%S=A%A=Z%F=R%O=%RD=0%Q=)T5(R
OS:=Y%DF=Y%T=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=)T6(R=Y%DF=Y%T=40%W=0%S=A%A=Z%F
OS:=R%O=%RD=0%Q=)T7(R=Y%DF=Y%T=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=)U1(R=Y%DF=N%
OS:T=40%IPL=164%UN=0%RIPL=G%RID=G%RIPCK=G%RUCK=G%RUD=G)IE(R=Y%DFI=N%T=40%CD
OS:=S)
Network Distance: 0 hops
OS detection performed. Please report any incorrect results at http://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 207.00 seconds
14)
Start a packet capture process running in a terminal window:
[terminal 2]# tshark -ni lo port 50001
Capturing on lo

Leave this capture process running.
15)
In another terminal window, see if nmap shows one of your still listening Netcat processes on port 50001:
# nmap -sS -p 50001 172.16.0.X
. . . snip . . .
Nmap scan report for 172.16.0.X
Host is up (0.00012s latency).
PORT      STATE SERVICE
50001/tcp open  unknown
. . . snip . . .
16)
Look at the output obtained by the still running packet capture process. The output should be similar to the following:
0.000000 172.16.0.X -> 172.16.0.X TCP 62016 > 50001 [SYN] Seq=0 Win=1024 Len=0 MSS=1460
0.000055 172.16.0.X -> 172.16.0.X TCP 50001 > 62016 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
0.000072 172.16.0.X -> 172.16.0.X TCP 62016 > 50001 [RST] Seq=1 Win=0 Len=0
Leave this capture process running.
17)
Scan again with a full connect scan:
# nmap -sT -p 50001 172.16.0.X
. . . snip . . .
50001/tcp open unknown
The service still shows that it is open (i.e. listening).
Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds
18)
Look at the output obtained by the still running packet capture process. This time you should see output similar to the following:
174.045522 172.16.0.X -> 172.16.0.X TCP 57692 > 50001 [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSval=2770032 TSecr=0 WS=6
174.045561 172.16.0.X -> 172.16.0.X TCP 50001 > 57692 [SYN, ACK] Seq=0 Ack=1 Win=32768 Len=0 MSS=16396 TSval=2770032 TSecr=2770032 WS=6
174.045588 172.16.0.X -> 172.16.0.X TCP 57692 > 50001 [ACK] Seq=1 Ack=1 Win=32832 Len=0 TSval=2770032 TSecr=2770032
174.045691 172.16.0.X -> 172.16.0.X TCP 57692 > 50001 [RST, ACK] Seq=1 Ack=1 Win=32832 Len=0 TSval=2770032 TSecr=2770032
174.045709 172.16.0.X -> 172.16.0.X TCP 50001 > 57692 [PSH, ACK] Seq=1 Ack=1 Win=32768 Len=2048 TSval=2770032 TSecr=2770032
174.045727 172.16.0.X -> 172.16.0.X TCP 57692 > 50001 [RST] Seq=1 Win=0 Len=0
This time the scan completes the TCP handshake and the Netcat process starts sending data. After a few packets have been received, the nmap process sends a RESET and terminates the connection.
19)
Repeat the exact scan one last time:
# nmap -sT -p 50001 172.16.0.X
. . . snip . . .
50001/tcp closed unknown
Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds
20)
This time the scan shows no service listening on the port. This is because the Netcat process terminated after servicing the last connection.
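If you want a listener that survives repeated connections, most netcat implementations (both nmap-ncat and netcat-openbsd) offer a keep-open option; a minimal sketch, assuming your installed nc supports -k:

# yes virtual | nc -k -l 172.16.0.X 50001 &       # -k keeps listening after each client disconnects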
Look at the output obtained by the still running packet capture process. This final time you should see output similar to the following:
331.921325 172.16.0.X -> 172.16.0.X TCP 57693 > 50001 [SYN] Seq=0 Win=32792 Len=0 MSS=16396 TSval=2927908 TSecr=0 WS=6
331.921347 172.16.0.X -> 172.16.0.X TCP 50001 > 57693 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
Terminate the command by pressing Ctrl+c in the terminal window.
21)
Connect to one of the still running Netcat listening processes:
# nc 172.17.0.X 50001
secondary
secondary
secondary
. . . snip . . .
Ctrl+c
All data sent by the other process is displayed to STDOUT.

22)
Connect to the final Netcat process still listening on UDP 50001:
# nc -u 10.100.0.X 50001
[Enter]
normal
normal
normal
. . . snip . . .
Ctrl+c
Since the UDP protocol is stateless, the remote end does not know that we have connected until some data is sent—a blank line will suffice.
23)
Administrative privileges are no longer required; exit the root shell to return to an unprivileged account:
# exit
Content
Diagnostic/Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Rescue Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Recovery: mount & chroot . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Recovery Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Recovery: Network Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Lab Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1. Recovery Runlevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2. Recovering Damaged MBR . . . . . . . . . . . . . . . . . . . . . . . 11
3. Recover from Deleted Critical Files . . . . . . . . . . . . . . . . 15
Chapter 3
RESCUE ENVIRONMENTS
Diagnostic/Recovery
Legacy Recovery Runlevels
• Legacy runlevel 1 (prepare for single user mode)
• Single user mode – S (or s or single)
systemd Recovery Targets
• systemd.unit=rescue.target (Synonyms: 1, S, s, single)
• systemd.unit=emergency.target (Synonym: emergency)
• init=/bin/ksh
Runlevels
AT&T's System V Unix implements the concept of runlevels. Each runlevel defines which services should be started or stopped. At minimum, a multi-user mode is used for the standard operation of the system, and a single user mode for system maintenance and recovery. On Linux systems using the sysvinit or upstart packages, runlevel 1 prepares for single user mode S (also known as s or single).
Single User Mode
Single user mode is intended for system maintenance and usually requires that most services be stopped. User authentication is disabled and only one tty is available for the ID 0 (root) user. If the system is booting, a minimal boot-time cleanup is performed (e.g. deleting lock files, other configuration) before switching to single user mode. When booting to single user mode, the sulogin command is used. You must enter the root password at the sulogin prompt to continue to a shell prompt. Under traditional sysvinit, runlevel 1 should be used to enter runlevel S. However, jumping straight to runlevel S is useful when there is a problem during runlevel 1. One reason for going straight to runlevel S is a missing or damaged /etc/inittab file. Systems using systemd for the initial process call this the rescue target. For compatibility, runlevel 1, or S (or s or single) will use the rescue target.
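For reference, these targets can be requested directly on the kernel command line when editing the boot entry; a minimal sketch (the surrounding kernel arguments are omitted):

systemd.unit=rescue.target       # rescue target: roughly single user mode; sulogin prompts for the root password
systemd.unit=emergency.target    # emergency target: even more minimal; the root filesystem is left read-only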
Other Than /sbin/init
The kernel loads an initial process to begin booting the system. This is launched as /sbin/init (or /etc/init, or /bin/init) and if that is not found, /bin/sh. The init process is responsible for starting all other processes in the desired runlevel. However, if init becomes damaged, it is possible to execute a different program other than init during the boot process. This is done by passing the init=/path/to/executable kernel parameter at boot. The most common alternative is to have the kernel launch a shell by passing an argument such as init=/bin/bash. With init=/bin/bash, no systemd units are activated or scripts from /etc/ are run, so some minimal configuration is required in order to access all the filesystems read-write:
# /usr/sbin/load_policy -i                    # If using SELinux
# mount -o remount,rw /
# /usr/lib/systemd/systemd-udevd --daemon
# /sbin/vgchange -ay                          # If using LVM
# mount -a
When exiting the alternative init program (such as a command shell), the kernel may panic. If so, asynchronous disk writes will be lost. To ensure that the data is written to disk, run the sync command and unmount filesystems before exiting the shell. When the kernel panics, the system will have to be power cycled.
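A minimal sketch of a safe exit sequence from an init=/bin/bash shell, assuming only the root filesystem was remounted read-write:

# sync                       # flush pending asynchronous writes
# mount -o remount,ro /      # return the root filesystem to read-only
# exit                       # the kernel may still panic; power cycle the system if it does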
Rescue Procedures
Installer Based Rescue Environments
• RHEL7: Anaconda Rescue Mode
  Options: nomount, noprobe
• SLES12: YaST Rescue Mode
Rescue Environment
Sometimes a problem may prevent running any diagnostic tools or commands to correct the problem. In that scenario, you need to boot the system into a fail-safe environment. Sometimes live CDs such as Knoppix are used, but the best practice is to use the rescue environment provided by the Linux distribution. The reason is that if a security labeling technology like SELinux is in use, the official rescue environment will preserve and properly label files created or modified during a rescue.
The SLES12 Rescue Mode
When booting from the SLES12 CD/DVD, there is a menu item labeled Rescue System. It is possible to pass kernel parameters; this is done by typing them into the provided Boot Options field at the bottom of the screen:
install=nfs://10.100.0.254/netinstall usedhcp=1 rescue=1
Typically, the rescue environment is provided on the installation DVD as a mode of the installation program.
The RHEL7 Rescue Mode
When booting from the RHEL7 Network Install and Recovery Disc, or the Installation DVD, the rescue environment is launched by selecting the Rescue Installed System option from the menu, or by hitting Esc and typing linux rescue at the boot: prompt. Additional options, such as nomount, noprobe, and standard kernel arguments can be added to the command.
• The nomount option prevents automatic mounting of any filesystems found on the system. If nomount is not passed, the rescue environment will ask the user before mounting detected filesystems.
• The noprobe option prompts the user before loading any device drivers. If desired, any network interfaces can be configured during boot.
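Combining the options described above at the boot: prompt might look like the following (a sketch based on the options listed here, not an exhaustive command line):

boot: linux rescue nomount noprobe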
Recovery: mount & chroot
Utilities
• mount
• chroot
Rescue Mode Tools
Once a system has been booted into a rescue environment, there are several tools that are commonly needed during system repair. These rescue tools can also be used outside of the rescue environment. Many tools available on a normal system are also available on the rescue CD. If a tool isn't available on the rescue CD, it can sometimes be found on a partially recovered system.
The mount Command
Recovering a system will often involve manually mounting filesystems. Normally, the mount command is able to determine missing arguments from system configuration files. During recovery, it is usually necessary to manually specify all arguments. The basic mount command consists of specifying what device should be mounted, where it should be mounted, and what type of filesystem it contains:

# mount -t ext3 /dev/hda1 /mnt/sysimage/

Sometimes it is necessary to remount a filesystem during recovery, for example to switch it from read-only to read-write mode. However, it will often not be possible to unmount the filesystem. Instead, remount the filesystem with the remount option:

# mount -o remount,rw /mnt/sysimage/

If it is necessary to mount an NFS share during recovery, the nolock option can be passed to mount. This allows NFS to be used without the normal NFS daemons running.

# mount -o nolock server1.example.com:/export/ /mnt/nfs/

The chroot Command
When booting from rescue media, the root of the filesystem will be on the boot media instead of the normal root of the system being repaired. Sometimes, it is necessary to change the root of the filesystem to be that of the system being recovered. This is accomplished with the chroot command. The chroot command launches a program in an environment that makes the specified directory appear to be the root of the filesystem. Nothing outside of the specified directory will be available to the program. If no program is specified, the default is to launch /bin/sh (relative to the new root).

[R7] The following applies to RHEL7 only:
During a RHEL7 rescue, host filesystems are automatically detected and mounted in read-write mode, or in read-only mode; skipping the detection of filesystems is also an option. When host filesystems are detected and mounted, they are available under /mnt/sysimage/. To start a shell with /mnt/sysimage/ as the filesystem root, use the command:

# chroot /mnt/sysimage/

Optionally, chroot can be given the shell to load as an argument: chroot /mnt/sysimage bash -l.
[S12] The following applies to SLES12 only:
SUSE Linux Enterprise Server does not attempt to automatically mount host filesystems. The user must instead manually mount all desired filesystems first. For example:
# mkdir /mnt/sysimage
# mount /dev/sda2 /mnt/sysimage
# mount /dev/sda3 /mnt/sysimage/home
# chroot /mnt/sysimage
If Mounting Filesystems Fail
Use mknod to create device nodes. Use fdisk -l to view partitions. If there is filesystem corruption causing the mount failure, run fsck against the filesystem, then mount (if needed) the root filesystem at /mnt/sysimage/ (and other filesystems under /mnt/sysimage/).
# mknod /dev/sda
# mknod /dev/sda1
# mknod /dev/sda2
...
# fdisk -l /dev/sda
# fsck -y /dev/sda1
# fsck -y /dev/vg0/root
...
# mount /dev/vg0/root /mnt/sysimage/
# chroot /mnt/sysimage/ bash -l
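Note that mknod normally also requires the node type and major/minor numbers; a minimal sketch for the first SCSI/SATA disk, assuming the standard major number 8:

# mknod /dev/sda  b 8 0      # whole disk (block device, major 8, minor 0)
# mknod /dev/sda1 b 8 1      # first partition
# mknod /dev/sda2 b 8 2      # second partition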
Recovery Examples
Utilities
• fsck/xfs_repair
• grub2-install
• rpm
The fsck Command
Some filesystem errors prevent the system from booting. In such cases, repairs must be performed by booting from rescue media and running fsck manually. fsck is a wrapper to the actual filesystem check utility. Two options to fsck are important to be aware of. The -f option tells fsck to force a filesystem check, even if it thinks one isn't needed. The -y option tells fsck not to ask any repair related questions, but instead assume that the answer is always yes. Filesystem checks should only be performed on unmounted filesystems.
# fsck -f -y /dev/sda1
. . . output omitted . . .
The XFS filesystem uses xfs_repair instead of fsck.
grub2-install
If the master boot record becomes damaged, for example after installing Windows on a dual-boot system, it is necessary to reinstall GRUB. This can be done one of two ways within the rescue environment. The first way is using the --root-directory=/some/dir/ option. Under this directory, grub2-install must be able to find a valid /boot/grub/ directory. For example, if the host system's root filesystem were mounted under /mnt/sysimage/, and the master boot record of /dev/sda needed to be rewritten, the command would be:
# grub2-install --root-directory=/mnt/sysimage/ /dev/sda
. . . output omitted . . .
If this does not work, alternatively, chroot can be used in combination with grub2-install. It is important to note that doing so would use the host version of grub2-install instead of the rescue media grub2-install. The commands would be similar to:
# chroot /mnt/sysimage
# grub2-install /dev/sda
. . . output omitted . . .

The rpm Command
One possible reason for being in rescue mode is a damaged or missing piece of core software. It is possible to tell rpm to use an alternative directory as the root directory. This is done using the --root option. For example:
# rpm -Uvh --root /mnt/sysimage/ coreutils-x.y.z.rpm --replacepkgs
. . . output omitted . . .
Recovery: Network Utilities
Utilities
• ip
• Default gateway
• DHCP clients
Manual IP and Netmask Configuration
In a rescue environment, it is often useful to have network access. Some rescue environments include the ability to automatically start network devices. Nonetheless, it is important to understand how to configure networking from the command line, if necessary. To enable networking, the network interface must first be configured. This is done using either the ip command or the ifconfig command. Generally, providing a network address and netmask is sufficient:

# ip addr add dev eth0 10.100.0.42/24
# ifconfig eth0 10.100.0.42 netmask 255.255.255.0

Manual Default Gateway Configuration
In order to reach hosts outside of the local network, a default gateway must be defined. This can be accomplished by using either the ip command, or the route command:

# ip route add default via 10.100.0.254
# route add default gw 10.100.0.254

Using DHCP Configuration
Usually, it is preferable to configure a network interface via DHCP. This can be accomplished with a DHCP client utility:

# dhclient eth0

[S12] The following applies to SLES12 only:
In the SLES12 rescue environment, the wicked command can be used to configure a network interface via DHCP with the following syntax:

# wicked ifup eth0

For those used to dhcpcd, a wrapper is provided.
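Whichever method is used, the resulting configuration can be verified with the ip command; a minimal sketch:

# ip -4 addr show eth0       # confirm the address and netmask
# ip route show              # confirm the default route
# ping -c 1 10.100.0.254     # confirm basic reachability (the classroom server address used in this course)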
Lab 3 Estimated Time: S12: 40 minutes R7: 40 minutes
Task 1: Recovery Runlevels Time: 10 minutes Page: 3-9 Requirements: b (1 station)
Task 2: Recovering Damaged MBR
Page: 3-11 Time: 15 minutes Requirements: b (1 station) c (classroom server)
Task 3: Recover from Deleted Critical Files Page: 3-15 Time: 15 minutes Requirements: b (1 station) c (classroom server)
Objectives
• Boot the system into systemd's rescue mode.
Requirements
(1 station)
Lab 3
Task 1: Recovery Runlevels
Estimated Time: 10 minutes
Relevance
A common way to solve boot-related problems is to enter single user mode (i.e. init S) or runlevel 1. On systems using systemd, this is the rescue target.
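On a running systemd machine the same state can be reached without rebooting; a minimal sketch:

# systemctl isolate rescue.target     # switch the running system to the rescue target
# systemctl rescue                    # equivalent shortcut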
1)
The following actions require administrative privileges. Switch to a root login shell:
$ su
Password: makeitso [Enter]
2)
[R7] This step should only be performed on RHEL7.
Reboot the system. When the boot menu appears, press an arrow key to stop the automatic timeout timer. Highlight the rescue boot option, then type e to modify and pass additional kernel arguments before booting. Append the number 1 to the end of the kernel line. Press Ctrl+x to boot the system into single user mode.

...rhgb quiet 1 Ctrl+x
3)
[S12] This step should only be performed on SLES12.
Reboot the system. When the boot menu appears, highlight the (recovery mode) boot option, then type e to modify and pass additional kernel arguments before booting. Append the number 1 to the end of the kernel line (beginning with linux). Type Ctrl+x or F10 to boot into single user mode.

...nomodeset x11failsafe 1 Ctrl+x
4)
After the kernel and basic system start-up, a login prompt will appear asking for the root password:
Give root password for maintenance (or press Control-D to continue): makeitso [Enter]
5)
Verify that the system is in runlevel 1:
# who -r
run-level 1 Apr 15 13:08

If switching runlevels, look for last= in the output to indicate the previous runlevel.
6)

List out the contents of the /etc/fstab file. Record the system's partitions, mount points and filesystem types. This information will be useful for later tasks.

# cat /etc/fstab
[R7]  /dev/mapper/vg0-root                        /      xfs   defaults  1 1
[R7]  UUID=0373436a-349a-4a66-a005-2e11e6f7a54d   /boot  xfs   defaults  1 2
[R7]  /dev/mapper/vg0-tmp                         /tmp   xfs   defaults  1 2
[R7]  /dev/mapper/vg0-var                         /var   xfs   defaults  1 2
[R7]  /dev/mapper/vg0-swap                        swap   swap  defaults  0 0
[S12] /dev/vg0/swap                               swap   swap  defaults  0 0
[S12] /dev/vg0/root                               /      xfs   defaults  1 1
[S12] /dev/sda1                                   /boot  xfs   defaults  1 2
[S12] /dev/vg0/tmp                                /tmp   xfs   defaults  1 2
[S12] /dev/vg0/var                                /var   xfs   defaults  1 2

The content of the /etc/fstab file may differ from the output shown here.

7)

Use the df (or mount) command to determine the partition, filesystem, and mount point information. Record this information for later use.

# df -hT
Filesystem            Type      Size   Used  Avail  Use%  Mounted on
/dev/mapper/vg0-root  xfs       8.0G   3.2G  4.9G   40%   /
devtmpfs              devtmpfs  351M   0     351M   0%    /dev
tmpfs                 tmpfs     372M   0     372M   0%    /dev/shm
tmpfs                 tmpfs     372M   9.5M  362M   3%    /run
tmpfs                 tmpfs     372M   0     372M   0%    /sys/fs/cgroup
/dev/mapper/vg0-var   xfs       2.0G   181M  1.9G   9%    /var
/dev/mapper/vg0-tmp   xfs       1014M  33M   982M   4%    /tmp
/dev/sda1             xfs       497M   103M  394M   21%   /boot

The output may differ from that shown here.
Objectives
• Use the rescue environment to recover a damaged MBR.
Requirements
(1 station), (classroom server)
Lab 3
Task 2: Recovering Damaged MBR
Estimated Time: 15 minutes
Relevance
Certain problems are severe enough that they render the system unbootable by normal means. In these cases, you must boot the system using an alternate boot media and possibly an alternate root filesystem to repair the system. This task teaches how to enter and use the rescue environment.
1)
The following actions require administrative privileges. Switch to a root login shell:
$ su
Password: makeitso [Enter]
2)
Proper preparation for any disk or filesystem disaster includes having a copy of the partition table and what each partition contains. At a minimum you should know where your /boot/, /, and /var/ filesystems are located. Use df to view where they are currently located:
# df
. . . output omitted . . .
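In a real environment you might also keep a copy of the boot sector itself before doing anything destructive; a minimal sketch, using a hypothetical backup path:

# dd if=/dev/sda of=/root/mbr-backup.bin bs=512 count=1      # save the MBR (boot code + partition table)
# dd if=/root/mbr-backup.bin of=/dev/sda bs=446 count=1      # restore only the boot code later, if needed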
3)
Record which partition contains /boot: Result:
4)
Record which partition contains /: Result:
5)
Record which partition contains /var: Result:
6)
Overwrite your MBR by running the command listed below. Please double-check your syntax before running the command, as simple typos here can easily render the machine inoperable (beyond the scope of this course to recover):
# dd if=/dev/zero of=/dev/sda count=1 bs=446
. . . output omitted . . .
# systemctl reboot

Carefully type this command as any typos could completely destroy your Linux installation.

7)
Your instructor will provide instructions on whether PXE is available, and if so how to PXE boot your lab system. If PXE is not available, skip to the next step. PXE boot your workstation. Most Intel systems do this with the F12 key after POST.
From the PXE menu, select Rescue Mode by typing XXX (replace with correct value from menu). This will boot to the Rescue environment. Skip to step 11 for RHEL7 systems, or 12 for SLES12 systems.
8)
[R7] This step should only be performed on RHEL7.
If you are using the installation DVD, or the Rescue CD, complete the following:
Boot from the optical media provided by your instructor.
At the menu, select Troubleshooting
At the menu, select Rescue a Red Hat Enterprise Linux system
9)
[S12] This step should only be performed on SLES12.
If you are using the "disk 1" installation media, it contains all the required components to boot into the rescue environment without requiring a network server. Otherwise, skip to step 10 to use the custom boot ISO. Boot off the media then complete the following actions:
Boot from the CD
The system should boot to a list of bootable targets.
Select Rescue System
The system should boot directly to the rescue environment.
Skip to step 12.
10)
[S12] This step should only be performed on SLES12.
If you are using a CD created from your own suse-boot.iso image, then you should complete the following actions:
Boot from the CD
The system should boot to a list of targets.
Use your arrow keys to highlight the Install option, then type the following:
install=nfs://10.100.0.254/export/netinstall/SLES12 usedhcp=1 rescue=1
11)
[R7] This step should only be performed on RHEL7.
It will then ask if it should attempt to locate and automatically mount the Linux filesystems. In many situations this will work fine and you would choose Continue. It would then mount the root filesystem at /mnt/sysimage/ and mount child filesystems as well.
In order to become versed with the steps required in the worst case scenario, do not have it attempt automatic mounting. Select Skip to go directly to the command prompt.
sh-4.2# fdisk -l

Note the partition number for your boot partition.
12)
[S12] This step should only be performed on SLES12.
The system will boot and then drop you to a prompt:
Rescue Login: root

No password will be required.
13)
If the system being rescued is using LVM based devices, it may be necessary to discover, import, activate, or create the necessary device files:
# lvmdiskscan
. . . snip . . .
/dev/sdb                   [ 74.51 GiB]
6 disks
17 partitions
0 LVM physical volume whole disks
1 LVM physical volume
# lvm vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  vg0   1   5   0 wz--n- 34.18g 25.68g
# lvm vgchange -a y vg0
  5 logical volume(s) in volume group "vg0" now active
# ls /dev/vg0/
root swap tmp var

In some cases, misconfigured external storage targets, such as iSCSI or FCoE, fail to be automatically detected. Note in this case that 1 LVM physical volume is detected.
14)
Mount the normal filesystems under /mnt/sysimage/:
# mkdir /mnt/sysimage
# mount /dev/vg0/root /mnt/sysimage/
# mount -o bind /dev /mnt/sysimage/dev/
# mount -o bind /sys /mnt/sysimage/sys/
# mount -o bind /proc /mnt/sysimage/proc/
# mount /dev/vg0/var /mnt/sysimage/var/
# mount /dev/vg0/tmp /mnt/sysimage/tmp/
# mount /dev/sda1 /mnt/sysimage/boot/
# chroot /mnt/sysimage/

If there is an x in the attributes of the lvm vgs output, then the Volume Group has been exported and will need to be imported using lvm vgimport vg0. The chroot command may not provide an indication that you are currently in a pseudo-root.

Once the root filesystem is mounted it is possible to look at /mnt/sysimage/etc/fstab to see the normal mounts. If LABEL or UUID references are used instead of the device filename, use the findfs or blkid commands to identify the device filenames.
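A minimal sketch of those commands (the UUID shown is the /boot UUID from the sample fstab earlier in this chapter):

# blkid /dev/sda1                                      # print the UUID and filesystem type of a device
# findfs UUID=0373436a-349a-4a66-a005-2e11e6f7a54d     # map a UUID back to its device file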
15)

Repair the master boot record (MBR) on the primary drive:
# grub2-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
16)
Exit the /mnt/sysimage/ isolated root. Halt the system:
# exit
# systemctl poweroff

The exit command terminates the chrooted shell.

17)

Boot the system to the hard drive. If needed, remove the rescue media used. If the steps above were performed correctly, the system can now be booted normally.
Objectives
• Boot to the rescue environment while activating networking.
• Mount an NFS share.
• Install a package using rpm --root.
Requirements
(1 station), (classroom server)
Lab 3
Task 3 Recover from Deleted Critical Files
Estimated Time: 15 minutes
Relevance
There exist many critical files that, if deleted, render the system inoperable and result in a system that cannot boot. Knowing the procedure to re-install the files will enable you to recover from such a situation.
Notices
• This lab task will require booting to rescue media as detailed in the previous lab task.
1)
Remove the /bin/sh symbolic link and the /usr/lib/systemd/systemd file (the target of the /sbin/init symbolic link), then reboot the system.
# rm -f /bin/sh /usr/lib/systemd/systemd
# systemctl reboot -f
2)
[R7] This step should only be performed on RHEL7.
Boot into the rescue environment (as described previously). Have the filesystems automatically mounted by clicking Continue.
3)
[S12] This step should only be performed on SLES12.
Boot into the rescue environment (as described previously). Mount all filesystems under /mnt/sysimage/:
# mkdir /mnt/sysimage/
# mount /dev/vg0/root /mnt/sysimage/

The device used should be the root filesystem.

4)
[S12] This step should only be performed on SLES12.
Once the root filesystem has been mounted, continue mounting all other filesystems: 3-15
# mount /dev/sda1 /mnt/sysimage/boot/
# mount /dev/vg0/var /mnt/sysimage/var/
# mount /dev/vg0/tmp /mnt/sysimage/tmp/

It may be necessary to perform a filesystem check with fsck in order to mount the other filesystems.

5)
Recreate the symbolic link for /bin/sh to /bin/bash:
# cd /mnt/sysimage/bin/
# ln -s bash sh
# cd /
6)
One of the reasons the system is failing to boot correctly is because the /sbin/init file is missing. Identify the package that contains this file:
# rpm --root /mnt/sysimage/ -qf /usr/lib/systemd/systemd
systemd-version
7)
Use the rpm --root command to verify the systemd package.
[R7]
# rpm --root /mnt/sysimage/ -V systemd
SM5....T.  c /etc/rc.d/rc.local
missing    /usr/lib/systemd/systemd
8)
Networking is required to allow access to the NFS share on server1.example.com. If DHCP is not available, skip to the next step, otherwise use DHCP to configure network access:
[R7]  # dhclient eth0
[S12] # dhcpcd eth0
[S12] eth0 up

If your network (IP, DNS, gateway) is now activated, skip to step 13.
9)
If DHCP is not available, or the network was not configured while booting to the recovery media, the address will have to be added statically:
# ip addr add 10.100.0.X/24 dev eth0
It is possible to configure network access either dynamically or statically, during the rescue environment setup phase.
10)
To enable DNS resolution create a new resolver configuration file:
# echo "nameserver 10.100.0.254" > /etc/resolv.conf
11)
Add a default route to the system:
# ip route add default via 10.100.0.254
12)
Test the network settings, connection and name resolution by pinging the classroom server:
# ping -c 1 server1.example.com
PING server1.example.com (10.100.0.254) 56(84) bytes of data.
64 bytes from server1.example.com (10.100.0.254): icmp_seq=0 ttl=64 time=0.124 ms

--- server1.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.124/0.124/0.124/0.000 ms
13)

Mount the NFS share from server1.example.com which contains the installation media and RPMs:

[R7]  # mount server1.example.com:/export/netinstall/RHEL7/Packages/ /mnt/sysimage/mnt/
[R7]  # rpm --root /mnt/sysimage/ -Uvh /mnt/sysimage/mnt/libmicrohttpd-*.x86_64.rpm
[R7]  # umount /mnt/sysimage/mnt/
[R7]  # mount server1.example.com:/export/courserepos/errata/R7/ /mnt/sysimage/mnt/
[S12] # mount server1.example.com:/export/netinstall/SLES12/ /mnt/sysimage/mnt/

14)

Attempt to reinstall the systemd package using the --root option:

[R7]  # rpm --root /mnt/sysimage/ -Uvh /mnt/sysimage/mnt/systemd-???-*.rpm
[R7]  Preparing...                ########################################### [100%]
[R7]  . . . snip . . .
[R7]  package systemd-version is already installed
[R7]  . . . snip . . .
[S12] # rpm --root /mnt/sysimage/ -Uvh /mnt/sysimage/mnt/suse/x86_64/systemd-???-*.rpm
[S12] Preparing...                ########################################### [100%]
[S12] package systemd-version is already installed

If not already updated, this will succeed (with dependencies) and the next step will be unnecessary.
15)

To reinstall an already installed package version, it is necessary to use the --replacepkgs option of the rpm command:

[R7]  # rpm --root /mnt/sysimage/ -Uvh /mnt/sysimage/mnt/systemd-???-*.rpm --replacepkgs
[R7]  . . . output omitted . . .
[S12] # rpm --root /mnt/sysimage/ -Uvh /mnt/sysimage/mnt/suse/x86_64/sysvinit-*.rpm --replacepkgs
[S12] Preparing...                ########################################### [100%]
[S12]    1:systemd-version        ########################################### [100%]

16)

Verify that the /usr/lib/systemd/systemd file was reinstalled on the system:

[R7] # rpm --root /mnt/sysimage/ -V systemd
[R7] SM5....T.  c /etc/rc.d/rc.local

17)
The missing /usr/lib/systemd/systemd file has been recovered and the /bin/sh symbolic link has been recreated. Reboot the system and remove the DVD-ROM from the DVD drive. Boot the system as normal.
# reboot
Some remote interfaces reboot back to the last bootable device. In that case, power off the system, instead of rebooting, then boot to the hard drive.
Content
Linux Boot Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
System Boot Method Overview . . . . . . . . . . . . . . . . . . . . . . 3
systemd System and Service Manager . . . . . . . . . . . . . . . . 4
Using systemd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Booting Linux on PCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Troubleshooting With GRUB 2 . . . . . . . . . . . . . . . . . . . . . . 10
Boot Process Troubleshooting . . . . . . . . . . . . . . . . . . . . . . 12
Troubleshooting: Linux and Init . . . . . . . . . . . . . . . . . . . . . . 13
Process Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Process Management Tools . . . . . . . . . . . . . . . . . . . . . . . . 15
Troubleshooting Processes: top . . . . . . . . . . . . . . . . . . . . . 16
Filesystem Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Filesystem Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . 18
Backup Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Backup Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Backup Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Lab Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1. Troubleshooting Problems: Topic Group 1 . . . . . . . . . . 24
Chapter 4
TOPIC GROUP 1
Linux Boot Process
• Several packages, scripts, and programs participate
• Great potential for problems (due to the number of components involved and the complexity of their interactions)
Booting
The boot process of a Linux system is fairly complex and involves a large number of files. When troubleshooting boot related problems it is important to have a good understanding of the exact sequence of events that occur during the boot process. This knowledge greatly improves troubleshooting skills, enabling quick resolution of boot-time problems. The following table lists important packages, configuration files, and log files related to the boot process:

Packages              kernel, mingetty, bash, systemd, [S12] aaa_base, [R7] initscripts
Configuration files   /etc/rc[0-6].d, /etc/init.d/*, /etc/fstab, /boot/grub2/grub.cfg,
                      /etc/grub.d/, /etc/default/grub, /etc/systemd/
Log files             /var/log/messages, /var/log/boot.log, [R7] /var/log/dmesg, [S12] /var/log/boot.msg
System Boot Method Overview
Runlevel-driven - AT&T System V init
• Each runlevel can have a unique defined list of services to start and stop
• Used by most commercial Unix systems and Linux distributions
Event-driven - Upstart
• Originally created for Ubuntu, also used with RHEL6
• Builds upon SysV style, launches scripts based on events
Dependency-driven - systemd
• Parallelizes as much as dependencies allow
• Unit files replace SysV init scripts
• Used in RHEL7, SLES12, Ubuntu 15.04
init Styles
After the Linux kernel runs /sbin/init, init reads its configuration file and starts all programs listed there, bringing the system up to a predefined working state, or runlevel. Two styles of init are commonly used: BSD-derived init programs only have a multi-user mode and a single-user mode, while System V-derived init programs usually have a variety of different predefined runlevels which can be selected.
System V-style init programs offer administrators more flexibility than BSD-style init programs, since they make it possible to configure the system in a variety of different ways. These different runlevels can then be selected as necessary, depending upon circumstances. Because of this flexibility, almost all Unix systems have moved away from the BSD-style init and use a variant of System V-style init. Upstart builds upon the features of the System V-style initialization scheme. It adds support for responding to events. Events may be triggered when services are started or stopped, or by other processes on the running system.
Upstart has no native notion of runlevels. The event-driven approach that has been taken to managing services and jobs makes the SysV runlevel approach obsolete. However, since a lot of work will need to take place before all software will be converted to this new mode of operation, Upstart has created SysV runlevel compatibility which is used by default.
Many Linux distributions have signaled their intent to move to systemd, as RHEL, SLES, Debian and Ubuntu already have.
Systemd Targets replace SysV runlevels
Systemd uses targets to group units together. It also maintains backwards compatibility using target aliases. Unlike runlevels, multiple targets can be active at the same time.
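A few commands illustrate the target model in practice; a minimal sketch:

# systemctl get-default                  # show the default target (analogous to the default runlevel)
# systemctl list-units --type=target     # show the targets that are currently active
# systemctl isolate multi-user.target    # switch targets (roughly: change runlevel)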
systemd System and Service Manager
Provides strict control of daemons and speedy booting
Natively uses "unit" files
• Compatible with SysV init scripts
Uses socket and D-Bus activation for starting daemons
• Aggressive parallelization with dependency support
Offers on-demand starting of daemons and daemon monitoring
• Captures all STDOUT and STDERR from daemons
• Uses Linux cgroups to track daemon processes
• Controls all environmental runtime properties for daemons
Maintains mount and automount points
systemctl - Administration command
Used in RHEL7, SLES12, Debian 8, and Ubuntu since 15.04
systemd Features
Linux systems historically used the SysV init bootup system, or more recently the Upstart system. systemd replaces these earlier systems and offers the following features and benefits:
• Compatibility with existing SysV init scripts.
• Fast booting with aggressive parallelization (avoids most strict ordering of daemon startup by using socket and D-Bus activation methods).
• Keeps track of processes using the cgroup feature of the Linux kernel (not by their PIDs).
• Records everything a daemon sends to STDOUT, STDERR, or via syslog().
• Standardizes many aspects of the bootup process and managing services, making it easier for system administrators who manage heterogeneous environments.
Unit Files
Although systemd has compatibility with SysV init scripts placed in the /etc/init.d/ directory, its native equivalents are called "unit" files. SysV init scripts are a mix of configuration settings (such as command line arguments either hardcoded into the script, or read from /etc/sysconfig/*) and shell code. In contrast, unit files are simple, declarative descriptions that are usually less than 10 lines, and do not include any code. To modify a unit file, copy it from /lib/systemd/system/ to /etc/systemd/system/ keeping the filename the same and edit it 4-4
there. The files in /etc/systemd/system/ override the /lib/systemd/system/ files and will not be touched by the package management system. An example unit file follows:

File: /lib/systemd/system/crond.service
[Unit]
Description=Command Scheduler
After=syslog.target auditd.service ypbind.service
[Service]
EnvironmentFile=/etc/sysconfig/crond
ExecStart=/usr/sbin/crond -n $CRONDARGS
[Install]
WantedBy=multi-user.target
Documentation for unit files can be found in systemd.unit(5), systemd.exec(5), and systemd.service(5).

systemd Targets
Unit files ending in .target are used for grouping units together and also define certain synchronization points used during boot up. In effect, they are a more flexible replacement for the SysV init runlevel concept. Unlike runlevels, more than one target may be active at the same time.
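To see which units a given target groups together, list its dependencies; a minimal sketch:

# systemctl list-dependencies multi-user.target    # units pulled in by the multi-user target
# ls -l /usr/lib/systemd/system/runlevel?.target   # runlevel compatibility aliases point at real targets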
Socket Activation
In the traditional SysV init bootup sequence, the order daemons are started is carefully defined so that dependencies are satisfied. For example, many daemons send syslog messages (via the socket file /dev/log) when they start, so the syslog daemon must be started before anything attempts to use it. The design systemd uses is that it creates all the sockets for all daemons in one step, and then starts all the daemons. If one daemon requires another, it will connect to the pre-created socket and send the request, which will then be queued by the Linux kernel until the other daemon is ready to dequeue all the messages and process them. The result is that daemons, even those with interdependencies, can be started in parallel.
Another benefit is that a daemon can be configured to be auto-spawned by systemd when a request is made to its socket file. Finally, because the socket files are not created by the daemon itself, a daemon can exit or crash, then be restarted without affecting client processes or dropping any request.
Filesystem Management
On a box using systemd, the /etc/fstab will only contain entries for disk based filesystems. All the kernel based filesystems are handled by .mount unit files.

$ ls -1 /lib/systemd/system/*.mount
/lib/systemd/system/dev-hugepages.mount
/lib/systemd/system/dev-mqueue.mount
/lib/systemd/system/media.mount
/lib/systemd/system/proc-fs-nfsd.mount
/lib/systemd/system/proc-sys-fs-binfmt_misc.mount
/lib/systemd/system/sys-fs-fuse-connections.mount
/lib/systemd/system/sys-kernel-config.mount
/lib/systemd/system/sys-kernel-debug.mount
/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount

Normally, disk based filesystems listed in the /etc/fstab are checked and mounted before they are used. systemd offers the alternative that filesystems can be configured to be checked and mounted when they are first accessed. This can speed boot, especially for filesystems not needed during boot such as /home. This is done by editing the /etc/fstab and adding the comment=systemd.automount mount option to the filesystem.
Tracking Processes Belonging to a Daemon
When stopping or restarting a daemon cleanly, it is important that all the running processes associated with a daemon are stopped. It is also useful for the Systems Administrator to be able to identify which processes belong to each daemon (especially given the proliferation of processes on modern Linux systems). In certain circumstances, such as crond or web server CGI processes, it can be complex to track back the inheritance. Linux has always supported the Unix feature of process groups and tracking of the parent PID; however, these decades-old methods have limitations and allow a process to "escape" supervision.
The Linux kernel 2.6.24 added a new feature called "control groups" (cgroups) that was primarily created by engineers at Google. The use of cgroups allows for labeling, control, isolation, accounting, prioritization, and resource limiting of groups of processes. By default, systemd only uses cgroups for labeling and tracking which processes belong to which daemon. Every child process inherits the cgroup of its parent and can't escape that cgroup unless it is privileged. This way systemd can kill a daemon and all processes it created with certainty. View the systemd created cgroups with systemd-cgls either in their entirety or for an individual service:

$ systemd-cgls
. . . output omitted . . .
$ systemd-cgls systemd:/system/dbus.service
systemd:/system/dbus.service:
- 627 /bin/dbus-daemon --nofork --systemd-activation
- 656 /usr/libexec/polkit-1/polkitd --no-debug
- 708 /usr/sbin/modem-manager

With systemd, when users log in to interactive sessions, all processes for that user are placed into a cgroup unique to that login session. This way all processes for users are tracked, and systemd can be configured to terminate all user processes on logout so that straggler processes don't hang around. This is done with the following edit:

File: /etc/systemd/logind.conf
[Login]
+ KillUserProcesses=yes
Using systemd
Starting and Stopping a service:
• systemctl start unit_name.service
• systemctl stop unit_name.service
• systemctl restart unit_name.service
Enabling and Disabling a service:
• systemctl enable unit_name.service
• systemctl disable unit_name.service
• systemctl mask unit_name.service
Listing services and their state:
• systemctl status unit_name.service
• systemctl list-unit-files --type=service
Modern systemctl assumes the .service suffix if it is omitted
Enabling a Service
When enabling a service, a symlink is created in the .wants/ directory of the default target (e.g. multi-user.target.wants/). When using systemctl to do this, it helpfully prints to the screen what it is doing. For example, the following enables the autofs.service:
# systemctl enable autofs
ln -s •/usr/lib/systemd/system/autofs.service• •/etc/systemd/system/multi-user.target.wants/autofs.service•

Disabling a Service
To disable a service you can either use disable or mask. The difference is with mask, it isn't possible to start the service manually. For example:
# systemctl disable autofs
rm •/etc/systemd/system/multi-user.target.wants/autofs.service•
# systemctl mask autofs
ln -s •/dev/null• •/etc/systemd/system/autofs.service•
# systemctl start autofs
Failed to issue method call: Unit autofs.service is masked.

To undo a mask operation, use unmask:
# systemctl unmask autofs
rm •/etc/systemd/system/autofs.service•
Checking on a Service

The status option is quite powerful. It shows the current status, a list of all the processes, and the last 10 lines of logged output from the daemon. For example:
# systemctl status crond
crond.service - Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled)
   Active: active (running) since Thu, 05 Jul 2012 13:33:53 -0600; 4h 16min ago
 Main PID: 647 (crond)
   CGroup: name=systemd:/system/crond.service
           \ 647 /usr/sbin/crond -n
Jul 05 13:33:53 mentor.gurulabs.com /usr/sbin/crond[647]: (CRON) INFO (running with inotify support)

Obtain a list of services that failed at boot:
# systemctl --failed
Obtain the status of a service on a remote server:
# systemctl -H [email protected] status autofs

Listing all Services
To see a list of all possible services and what their state is, use the following command:
# systemctl list-unit-files --type=service
UNIT FILE                  STATE
abrt-ccpp.service          enabled
abrt-oops.service          enabled
abrt-vmcore.service        enabled
abrtd.service              enabled
accounts-daemon.service    disabled
acpid.service              enabled
arp-ethers.service         enabled
iscsi.service              masked
. . . snip . . .
Booting Linux on PCs
Booting is a critical and complex sequence
• Linux can have very infrequent reboots (long uptimes)
• Little opportunity for SysAdmin to become familiarized
• Familiarity required for troubleshooting bootup errors
Main Actors
• System BIOS or UEFI
• Sector 0 of boot device (BIOS) or UEFI boot manager shim
• GRand Unified Bootloader (GRUB)
• Initial ramdisk
• Linux kernel
• /sbin/init launches bootup scripts from /etc/
System Boot Procedure
Understanding the exact sequence of events that occur during a boot of Linux greatly improves troubleshooting skills, enabling quick resolution of boot-time problems. The Extensible Firmware Interface (EFI, originally the Intel Boot Initiative) provides a replacement for the traditional BIOS, initially supporting Itanium. HP worked on updating the LILO boot loader in 2003 with EFI support as a separate project: ELILO. On newer Linux systems, GRUB 2 is used instead of ELILO. Unified EFI (UEFI) was created as an industry-wide specification to replace the Intel-only EFI specification. The GUID Partition Table (GPT) format is included in the UEFI specification and, unlike MBR, supports disks/LUNs larger than 2TB and a straightforward approach to partition types.
Linux supports the traditional BIOS/MBR mode of booting as well as BIOS/GPT and UEFI/GPT. The new UEFI secure boot feature has been used on PCs since Windows 8. Though many Intel64 systems allow for disabling secure boot, commercial Linux distributions provide options for booting Linux with UEFI secure boot.
BIOS/MBR Boot
Following is the Linux boot process using BIOS/MBR:
1. System BIOS performs these tasks:
   • Power-On Self Test (POST)
   • Initial hardware setup and configuration
   • Loads option ROM from add-in cards (SCSI, SAN HBA, RAID)
   • Selects boot device and executes MBR from device
2. First stage GRUB (boot.img) in MBR (446 bytes); it loads:
   • stage1.5 GRUB (core.img) using int13 BIOS calls, which has embedded GRUB 2 modules needed to access /boot.
3. Second stage GRUB
   • Reads and uses configuration file or displays GRUB command prompt
   • Loads initial ram disk (usually specified)
   • Loads, decompresses, and executes selected Linux kernel from hard drive with command line arguments
4. Linux kernel
   • Initializes and configures hardware using drivers statically compiled into the kernel
   • Decompresses the initramfs image and mounts it
   • Runs init script from initramfs image
   • init script loads kernel modules and performs tasks necessary to mount the real root filesystem
   • Mounts the partition passed to the kernel by the boot loader via the root= kernel command-line option (usually read only) as the root partition, replacing the initrd
   • Executes /sbin/init
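As a quick sanity check that first stage GRUB is actually present in the MBR, the first 512-byte sector can be copied out and inspected. A sketch, assuming the boot disk is /dev/sda; the file command should report that the copy is a boot sector, and on many systems it will also mention GRUB:

# dd if=/dev/sda of=/tmp/mbr.bin bs=512 count=1
# file /tmp/mbr.bin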
UEFI/GPT Boot

Following is the Linux boot process using UEFI/GPT, which differs from BIOS/MBR boot until the second stage GRUB is loaded:
1. UEFI firmware performs five tasks:
   • Power-On Self Test (POST)
   • Loads UEFI option ROM from add-in cards (SCSI, SAN HBA, RAID)
   • UEFI boot manager consults NVRAM variables for default UEFI boot entry
   • Firmware initializes the hardware required for booting
   • UEFI boot manager executes UEFI application from default boot entry
2. UEFI SecureBoot shim (shim.efi) on EFI System Partition (ESP); it loads:
   • stage1.5 GRUB (grubx64.efi) which has embedded GRUB 2 modules needed to access /boot.
   • Note that the shim is signed in case SecureBoot is enabled. It is used in either case.
3. Second stage GRUB and subsequent steps proceed as before.

System Boot Daemons
systemd is a replacement for the traditional init daemon. On a systemd system, /sbin/init is a symlink to /lib/systemd/systemd. It is an asynchronous, highly parallelized system that uses target dependencies for ordering. When it starts, it activates all dependencies of the /etc/systemd/system/default.target, which is a symlink for either multi-user.target or graphical.target. See bootup(7). [S12] The following applies to SLES12 only:
On SLES12, there are several other things that init takes care of:
• Runs /etc/init.d/boot.d/ scripts (S symlinks to /etc/init.d/boot.*) on boot
• Runs /etc/init.d/rc#.d/ scripts with the start argument for the relevant runlevel
• Runs /etc/init.d/after.local only at boot time
• Launches getty programs (e.g. mingetty)

UEFI Secure Boot
Secure Boot is a security standard to prevent "unauthorized" software, rootkits and low-level malware from being loaded during the boot process. It is usually enabled on modern desktop PCs, but not on servers. Examine the appropriate UEFI NVRAM variable to determine if SecureBoot is enabled. A value of 1 indicates it is enabled:
# od -An -t u1 /sys/firmware/efi/vars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c/data
1

On a Linux system using Secure Boot, the following restrictions apply:
• Kernel modules must be signed
• kexec and thus kdump are disabled
• Debug interfaces such as SystemTap are disabled
• Access to /dev/kmem and /dev/mem is blocked, even for root
• Suspend to disk hibernation is disabled
• Low level hardware interfaces such as I/O ports and PCI BAR access is disabled
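Where the mokutil utility (part of the shim/MOK tooling) happens to be installed, it offers a more readable check than dumping the NVRAM variable directly; a sketch, assuming mokutil is available:

# mokutil --sb-state
SecureBoot enabled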
Troubleshooting With GRUB 2
Troubleshooting GRUB 2 problems
• GRUB 2 not installed in MBR
• Configuration typos
  /boot/grub2/grub.cfg
Troubleshooting kernel problems
• Bad kernel parameters
• Wrong kernel file
Boot Process – Common Problems
It is easy to recognize when the boot process fails. Finding the culprit of the problem may be a challenge. The most valuable tool is a solid understanding of the boot process.

GRUB 2 Problems
One of the most common problems with GRUB 2 occurs when dual-booting Linux and Windows. The Windows installer overwrites GRUB 2 in the master boot record (MBR); Windows fails to check for non-Windows installations before installing its own bootloader. To solve this problem, assuming /boot/grub2/grub.cfg is in place, GRUB 2 must be reinstalled. If possible, enter the GRUB 2 menu on boot and change to the GRUB 2 shell with the c command. Set the root to the partition that contains the /boot/grub2/ subdirectory (hd0 is the first storage device), and then load the configuration file. For example:
grub> set root=(hd0,msdos1)
grub> configfile /grub2/grub.cfg
This will launch the menu using the indicated configuration file. Once the system is booted successfully, reinstall the boot loader to the MBR. The grub2-install command is the standard way to install GRUB 2 from the shell (from recovery media, if needed):
# grub2-install /dev/sda
. . . output omitted . . .
Boot problems can occur if there are typographical errors or syntax errors in the GRUB 2 configuration file. Fortunately, GRUB 2 is versatile enough to allow editing of the configuration at boot time, which helps to resolve these types of problems. However, these edits made at boot time are not written to the permanent configuration file. You must edit the configuration file once the system is up and running to make the changes permanent. (See grub2-mkconfig(1), the /etc/grub.d/ configuration files and /etc/default/grub.)

Bad Kernel Parameters
The kernel needs to be able to mount a root filesystem before launching the only program it starts directly (e.g. init). The kernel relies on the bootloader to pass the root= parameter identifying the correct device to mount, and will panic if that device is not usable; for instance, if the root=UUID= or root=LABEL= value identifying the root partition is missing or incorrect. The mem= parameter is used to help the kernel find the correct amount of memory on a few systems where there could be trouble detecting this correctly on its own. One common mistake is to think that this parameter takes values of a larger scale than it does. For example, passing mem=1024 tells the kernel that there is only 1024 bytes of memory in the system. Use mem=1024M to indicate 1G.
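For reference, a well-formed kernel line, as it might appear in the GRUB 2 configuration or at the boot-time edit prompt, could look like the following sketch; the kernel version and root device are purely illustrative, and note the unit suffix on mem=:

linux /vmlinuz-3.12.28-4-default root=/dev/sda2 ro mem=1024M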
Wrong Kernel File

Specifying the wrong filename in the bootloader configuration for the compressed kernel image files will also result in a boot failure. Typically, the compressed kernel will have a name like vmlinuz-3.12.28-4-default and be located in the /boot/ directory. (The z means it is a compressed kernel image.)
Boot Process Troubleshooting
Filesystems
• Mounting issues, labels – /etc/fstab
• fsck
Filesystems
Mounting filesystems by raw device name (e.g. /dev/sda1) can lead to problems if the device names shift, (which can occur when adding a new drive, or when a drive fails). To avoid the problems with devices being assigned a different device file name, filesystems can be mounted using a label, or UUID, instead. The label and UUID for a filesystem are stored in the superblock and can be listed with a variety of commands. The following example shows using tune2fs and blkid to discover the label and UUID of an ext3 filesystem and then mount using those values:
# tune2fs -l /dev/sda10 | egrep "name|UUID"
Filesystem volume name:   /test
Filesystem UUID:          34f93944-d482-4c34-843f-c952b7f900be
# blkid /dev/sda10
/dev/sda10: LABEL="/test" UUID="34f93944-d482-4c34-843f-c952b7f900be" SEC_TYPE="ext2" TYPE="ext3"
# mount LABEL=/test /mnt
# mount UUID="34f93944-d482-4c34-843f-c952b7f900be" /mnt

Even if mounting by filesystem labels, as opposed to device files, naming conflicts can still occur. For example, if you were to install a second hard drive to allocate additional space for your /var/ filesystem, there may be conflicts between the old partition and the new one if both had a filesystem label of /var. It's important to make sure each partition has a unique label. Mismatched labels can be repaired using the e2label command. One way to greatly minimize the potential problem of duplicate labels is to use filesystem UUIDs instead.

If a filesystem encounters corruption of some kind, this is usually detected at boot time and fsck is automatically run to scan and fix any filesystem errors. (fsck is a wrapper to the filesystem specific tool.) If the error is more serious, the system boot script will drop you into single-user mode to fix it manually, if possible.

# fsck -y /dev/hda5

Modern Linux distributions employ journaling filesystems, which greatly reduce the possibility of filesystem corruption and make real-time error recovery possible.
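To mount by UUID persistently, the same value can be referenced in /etc/fstab; a sketch reusing the UUID from the example above, with an illustrative mount point and options:

File: /etc/fstab
+ UUID=34f93944-d482-4c34-843f-c952b7f900be   /test   ext3   defaults   1 2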
Troubleshooting: Linux and Init
Once the kernel has mounted the root filesystem, the init program is started:
• /sbin/init
• /etc/init
• /bin/init
• /bin/sh
• init=
Kernel to Userland Hand-off
When the kernel is finished performing initialization, and the initial RAM disk is processed, it starts the userland by executing /sbin/init. The kernel has fall-backs in case the binary is not found as shown in this code found in the kernel source: File: init/main.c
/*
 * We try each of these until one succeeds.
 *
 * The Bourne shell can be used instead of init if we are
 * trying to recover a really broken machine.
 */
if (execute_command) {
        run_init_process(execute_command);
        printk(KERN_WARNING "Failed to execute %s. Attempting "
                            "defaults...\n", execute_command);
}
run_init_process("/sbin/init");
run_init_process("/etc/init");
run_init_process("/bin/init");
run_init_process("/bin/sh");
panic("No init found. Try passing init= option to kernel");

When booting, if the error message "No init found" is seen, then either the root filesystem is horribly damaged or the kernel is mounting the incorrect filesystem as root. Note that the if (execute_command) check is for processing of the init= option kernel parameter.
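As a recovery technique, the init= parameter can point the kernel at a shell instead of init. A sketch of a GRUB 2 entry edited at boot time (press e on the menu entry, append init=/bin/sh to the linux line, then boot the edited entry); the kernel version and root device shown are illustrative:

linux /vmlinuz-3.12.28-4-default root=/dev/sda2 ro init=/bin/sh

Once at the shell prompt, the root filesystem is typically read-only and can be remounted read-write with mount -o remount,rw / before making repairs.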
Process Management
Processes can cause problems for other processes or the system
• Memory consumption
• CPU consumption
• Disk space consumption
Process Management
Ideally all processes on a system would be completely bug-free and would be well behaved in gracefully dealing with whatever conflicts arose on the system. Unfortunately real world applications and processes rarely resemble this ideal model. Processes can, due to bugs or perhaps just poor design and normal operation, misbehave and cause problems for other processes or for the system itself. Process management is an essential aspect of tracking down, isolating, and troubleshooting the many problems which may manifest themselves on Linux systems. Understanding how processes interact with the system and other processes on Linux allows you to identify processes that are adversely affecting the system and take the correct action. It also allows you to set the necessary limits on system resources to help minimize the damage of a "run-away" process.

Packages Related to Process Management
The following list of packages contains commands commonly used in process management: procps, psmisc, coreutils, util-linux.

Commands Related to Process Management
The following list of commands are examples of commands commonly used in process management: ps, top, kill, pgrep, pkill, nice, killall, free, and w.
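As a simple illustration of limiting the damage of a run-away process before it starts, per-process resource caps can be set with the shell's ulimit built-in; the values below are arbitrary examples (roughly 1 GiB of address space, in kilobytes, and 600 CPU seconds) applying to children of the current shell:

$ ulimit -v 1048576
$ ulimit -t 600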
Process Management Tools
General tools
• top, htop
• iotop
• ps
• kill
• killall
• pgrep
Process Management – Common Problems
Dealing with processes that consume resources in an excessive manner can be tricky. Thankfully, many tools exist that will help you track down the culprit process. General Process Management Tools
The top command is often the first tool used to find a process that is consuming resources on a system. The top command shows a sorted listing of currently-running processes. You can alter the sorting criteria by pressing certain keys (e.g. sorting by memory usage, accumulated CPU time, etc.).
For more general information about currently running processes, use the ps command. The Linux ps command attempts to maintain compatibility with System V and BSD versions. You can use the ps -ef or ps aux commands to list all processes running. Use the kill command or one of its variants to send signals to processes. The killall command is convenient for sending signals to all processes with a given name. The pkill command can use a variety of criteria for identifying a process, as well as the process name. For example:
# pkill firefox
The top command conveniently lets you send signals to processes with the k key. For a complete list of what keys you may use within top, press h or ?. Newer versions of top allow more options for defining the sorting criteria.

The pgrep command is a useful substitute for commands like ps -ef | grep regex. For example, to list the process IDs for all processes with names matching the regular expression httpd, run:
# pgrep httpd
6907
6908
6909
6910
6911
6912
[R7] The following applies to RHEL7 only:
For RHEL7, the htop command is available from Fedora's Extra Packages for Enterprise Linux (EPEL).

[S12] The following applies to SLES12 only:
For SLES12, the htop command is available from the openSUSE Build Service.
Troubleshooting Processes: top
• Process consumes excessive memory
• Process consumes excessive CPU resources
• Process consumes excessive disk space
Processes That Consume Excessive Memory
If you discover that memory is being consumed excessively, use the top command and sort by memory usage (press F, then n). This will help you find the offending process so you can kill it or take another course of action.

Processes That Consume Excessive CPU Resources

If you discover CPU resources are being consumed excessively, use the top command and sort the listing by percentage of CPU time used (press F, then k).

Processes That Consume Excessive Disk Space
If you discover disk resources are being consumed excessively, you can use top and look for processes with D in the status field. This indicates un-interruptable sleep, which typically indicates that the running process is waiting for I/O resources, in this case the disk. One thing to note is that it is difficult to kill a process that is waiting for I/O because the kernel is designed to prevent processes from dying if they have open file handles in order to prevent inconsistencies in the stored data.
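One way to spot such processes is to list only tasks whose state field starts with D; a sketch using standard ps output fields (wchan shows the kernel function the process is sleeping in):

$ ps -eo pid,stat,wchan:20,comm | awk '$2 ~ /^D/'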
Filesystem Concepts
Considerations
• creation
• repair
• tuning
• monitoring

Filesystems

Linux has a variety of filesystem options. While all filesystems share certain common problems, each has its own advantages and challenges. A solid understanding of the solutions available will result in a more reliable, secure, efficient and adaptable system.

related packages: dosfstools, e2fsprogs, xfsprogs, btrfsprogs [R7]; btrfsprogs, reiserfs [S12]
binaries: /usr/sbin/mkfs, /usr/sbin/mkswap, /usr/sbin/fsck, /usr/sbin/tune2fs, /usr/sbin/e2label, /usr/sbin/xfs_fsr, /usr/sbin/btrfstune, /usr/sbin/swapon, /usr/sbin/swapoff, /usr/bin/chown, /usr/bin/chmod, /usr/bin/df, /usr/bin/du; [S12] /usr/bin/chkstat, /sbin/reiserfstune

Laying Down Filesystems

Each filesystem type has its own set of tools for creation, configuration, and repair. Filesystems can be created with the mkfs command, a wrapper to the specific creation utility of each filesystem's own tool, or by running the appropriate creation utility directly, such as mkfs.xfs, mke2fs, or mkfs.btrfs.

Repair and Tuning

Filesystem integrity can be checked and repaired using fsck, which is also a wrapper to the filesystem specific check utilities. Many aspects of the filesystem can be changed even after creation. All extended filesystem versions can be configured using tune2fs and e2label. XFS can be administered with xfs_admin, and Btrfs administration includes the btrfstune and btrfs commands.

[S12] The following applies to SLES12 only:
chkstat checks and sets file permissions.

Filesystem Monitoring

Most filesystems can only store a limited amount of data and number of files. Ownership and file permissions must be correct for the system to be secure and usable. Disk space should be monitored to ensure system reliability. Swap files and partitions are not like other filesystems. Because swap is used for virtual memory instead of storing files, none of the normal filesystem concepts apply to swap. Swap is basically limited to creation, activation and deactivation.
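A minimal sketch of laying down a filesystem and a swap area, assuming spare partitions /dev/sdb1 and /dev/sdb2; the mkfs wrapper shown invokes mkfs.xfs behind the scenes:

# mkfs -t xfs /dev/sdb1
# mkswap /dev/sdb2
# swapon /dev/sdb2
# swapoff /dev/sdb2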
Filesystem Troubleshooting
Common Problems
• Filled filesystem
• Exhausted inodes
• File permissions
• Filesystem corruption
• Avoiding a reboot
Filled Filesystem
A full filesystem can result in many strange errors. If a filesystem is full, some software will lock up because it can't write to its log file. Other programs will blindly throw away potentially critical data while saving a file. The best solution is to avoid such problems by using tools such as df and du to monitor disk usage. Careful partitioning can also prevent problems: for example, by putting web server logs and database files on separate partitions.
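When a filesystem does fill, du combined with sort quickly shows where the space went; a sketch assuming /var is the affected filesystem (-x keeps du from crossing into other mounted filesystems):

# du -xh --max-depth=1 /var | sort -h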
Use the lsof command to identify open files that have been deleted, and are thus taking up the space that has otherwise been freed by being deleted:
$ lsof | grep deleted
less 3289 guru 4r REG 253,2 986006016 131 /tmp/foo (deleted)

Exhausted Inodes
On both ext2 and ext3 it is possible to have disk space available but not be able to create new files because all of the filesystem's inodes are in use. This is most likely on a system with many small files, such as a mail server using maildir. The only solution is to create a new filesystem with more inodes. The df -i command can be used to monitor inode usage. XFS, Btrfs, JFS, and ReiserFS are able to dynamically add inodes and do not suffer from this problem.

Filesystem Corruption
Following a system crash or power failure, it is important to perform a filesystem integrity check. By default, the extended filesystem performs periodic integrity checks, which can take a long time on large filesystems. This can be adjusted (i.e. the maximum mount count/interval) using a command like the following for the extended filesystem:
# tune2fs -c 0 -i 0 /dev/sda1
. . . output omitted . . .

File Permissions
It is easy to accidentally change important file and directory permissions when modifying multiple files at once. For example, the execute bit has very different meaning for directories than it does for files. Certain system tools require the SUID bit in order to provide services to ordinary users. The /tmp/ directory requires the sticky bit be set to prevent users from clobbering each other's files. Restoring execute only to directories after a "recursive-chmod-gone-wrong" is relatively easy:
# chmod -R +X broken_directory
The rpm command can also be used to fix broken permissions and user and group file ownership:
# rpm -qa --setperms
# rpm -qa --setugids
Detecting and correcting other permission changes is harder, but can be eased somewhat with tools such as rpm -V (verify), Tripwire, or AIDE.
[S12] The following applies to SLES12 only:
SLES12 also includes chkstat, a tool which can check and repair permissions on a handful of critical files. It bases its checks on information in /etc/permissions and /etc/permissions.d/*. By default, SLES12 calls chkstat during boot to reset permissions on all files for which it has information.

Avoiding a reboot
Standard partitioning tools, such as those based on libparted, the sfdisk command, and others, use the ioctl(2) system call to inform the kernel that a change has been made to the partition table. However, in cases where the drive is in use (e.g. a mounted partition), this system call may fail, especially when using the util-linux package's fdisk command.
To avoid the need to reboot, use the sfdisk or partprobe commands to solve the problem. For example:
# sfdisk -R /dev/sda
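The partprobe command is used in the same way and simply takes the affected disk as its argument (again assuming /dev/sda):

# partprobe /dev/sda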
The partx command, as well as addpart and delpart, can still be used to update what the kernel sees:
# partx -a /dev/sda
This may cause warnings when partx attempts to update existing entries. If removing a partition, delete instead of add:
# partx -d /dev/sda
Backup Concepts
While troubleshooting, good backups can be useful for:
• Historical information on the system
• Rapid roll-back to a known good state
• Worst-case recovery scenarios
Backups
Sometimes no amount of creative troubleshooting can resolve an issue. A partial or full system recovery may be the only option. Even if recovery isn't needed, a historical record of the system can be useful when diagnosing an issue.
Periodically transferring files to backup media is only part of a successful backup strategy. Periodic simulation of real-world recovery scenarios is a vital part of any complete backup strategy.
Careful planning and regular verification of the established procedures can prevent unpleasant surprises when (not if) disaster strikes.

related packages: coreutils, tar, star, cpio, pax, rsync; openssh-clients [R7]; openssh [S12]
binaries: /usr/bin/cp, /usr/bin/tar, /usr/bin/star, /usr/bin/cpio, /usr/bin/pax, /usr/bin/rsync, /usr/bin/scp

Effective planning requires attention to detail, and the right tools depend on the recovery scenarios you wish to support. For example, the star command is preferred by some because of its speed and support for alternative archive formats. However, it isn't available on all platforms that will be used for recovery, therefore it would probably be better to go with the more widely used GNU tar.
Backup Troubleshooting
• Include archival options
• Remember special files
• Regularly simulate recovery
Include Archival Options
Some tools provide special "archival" options. For example, cp should generally be called with either -a or -dp in backup scripts. Otherwise the copy will be missing important details like ownership, modification timestamp and regular files vs. links.
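For example, a simple archival copy of /etc might look like the following sketch; the destination directory is illustrative:

# cp -a /etc /backups/etc-$(date +%F)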
Remember Special Files

A complete Unix system consists of more than normal files and directories. A backup tool that doesn't properly handle symbolic and hard links can quickly get in trouble. For example, unlike cp, scp doesn't have a -d option. As a result, it's easy to create a file structure that causes scp to create deeply nested, infinite copies of the same files. A better solution is to use rsync or a combination of tar and scp. Device files are another type of special file. While cpio is able to archive block and character devices, tar can not. If you want to perform a full system backup, including /dev/, then cpio, pax, or star must be used to archive at least portions of the system.
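A sketch of the rsync approach mentioned above, where the backup host and destination path are illustrative; -a preserves symbolic links, permissions, ownership, timestamps and device files, -H additionally preserves hard links, and -x stays on one filesystem:

# rsync -aHx /home/ backuphost:/backups/home/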
Regularly Simulate Recovery

Every backup strategy, no matter how simple or complex, should be periodically tested. If the strategy includes off-site storage, testing it should include retrieval under realistic conditions. If the restore is likely to happen on a machine that consistently runs at 90% disk capacity, testing should occur under similar conditions. Otherwise, you run the risk of not noticing problems like insufficient room to bring back a cpio file before extracting, or a failover server that had a hard drive cannibalized.
Most experienced system administrators have war stories from failed backup strategies. It could be operators that inserted the same tape every night, or years of useless archived backups in the vault because a script was never switched from development to production mode. These stories can be both hilarious and tragic depending which side of the punch line you end up on.
Backup Troubleshooting
• Document all procedures
• Don't waste resources
Document All Procedures
Both backup and restoration procedures should be well documented. Whoever is called to perform an emergency recovery at two in the morning with upper management watching shouldn't have to guess where the backups have been going and how to get them back. To ensure that documentation is accurate and useful, simulated recoveries should be performed using existing documentation instead of personal knowledge of the system.

Don't Waste Resources
Every party knowledgeable in the services provided on a system should be involved in at least the planning of backups. For example, it is impossible to back up most database servers simply by copying their data files. Because the files are often large, and the relationship between files complex, even if an entire data file can be copied before it is changed, it may not be valid in relationship to other database files. Instead, the database must provide an export of the data, or be informed that a backup is taking place. If the chosen solution is periodic data exports, there is no reason to include the data files in a system backup. Likewise, if a group of servers are exact duplicates of one another, it is only necessary to take a full backup of one and any minor deltas on the others. Without thoughtful planning, it is easy to waste precious resources such as storage space, network bandwidth, CPU cycles or personnel time.
Lab 4
Estimated Time: S12: 120 minutes, R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 1
Page: 4-24   Time: 120 minutes
Requirements: b (1 station)
Objectives
• Practice troubleshooting common system errors.
• Learn the tsmenu program in Topic Group 0.
• Solve all troubleshooting scenarios in Topic Group 1.
Requirements
b (1 station)
Lab 4
Task 1 Troubleshooting Problems: Topic Group 1
Estimated Time: 120 minutes
Relevance

Troubleshooting scenario scripts were installed on your system as part of the classroom setup process. You use these scripts to break your system in controlled ways, and then you troubleshoot the problem and fix the system.
1)
As the root user, invoke the tsmenu program:
# tsmenu
2)
The first time the troubleshooting framework is started, you are required to confirm some information about your system:
Confirm the distribution of Linux is correct by selecting Yes and then press Enter.
Select OK and then press Enter. You are presented with the 'Select Troubleshooting Group' screen.
3)
This first break system scenario is a simple HOW-TO for the tsmenu program itself. Its function is to familiarize you with the usage of the program: Use the UP and DOWN arrow keys to select 'Troubleshooting Group #0'.
Use the LEFT and RIGHT arrow keys to select OK and press Enter to continue. You are taken to the 'Select Scenario Category' screen.
4)
Pick the category of problem that you want:
Select the 'Learn: Example learning scenarios' category.
Select OK and press Enter to continue. You are taken to the 'Select Scenario Script' screen.
5)
Pick the specific troubleshooting script that you want to run: Select the 'learn-01.sh' script, select OK, and press Enter. You are taken to the 'Break system?' screen.
6)
The 'Break system?' screen provides you with a more detailed description of the scenario and asks whether you want to proceed and "break the system now?". Read the description of the problem and then select Yes and press Enter. You are taken to the 'SYSTEM IS BROKEN!' screen. Select OK and press Enter. The system is now locked on the selected scenario and will not permit you to run another scenario until the current scenario is solved.
Depending on the scenario, a reboot may be required before the problem is noticed. In these cases, the system will reboot automatically when you select OK.
7)
You can re-read the description of the scenario two different ways. First, the description is written into a text file. Display the contents of this file:
# cat /problem.txt
. . . output omitted . . .
/problem.txt is a symbolic link to /boot/problem.txt. Especially with boot scenarios, this allows the file to be accessed when the root filesystem is not available, such as through the boot loader menu.
8)
You can also get information about the currently locked problem by re-running the tsmenu program. As the root user, launch the tsmenu program:
# tsmenu
You are taken to the 'You're in learn-01.sh (Example scenario)' screen.
9)
Use the UP and DOWN arrow keys to select the 'Description' menu item, then select OK and press Enter to continue. The scenario description text is displayed.
10)
Select the 'Hints' menu item, then select OK and press Enter. The 'Hints' screen is displayed. Notice that the total number of hints available is indicated, and that past hints (if already displayed) are also shown in the list.
11)
Select the 'Hints' menu item several times until you have seen all of the available hints for the learn-01.sh scenario. Each time a hint is accessed, it will be stored in the problem.txt for later reference.
12)
The 'Check' menu item is used to check if the currently locked troubleshooting scenario has been correctly solved. Select 'Check', then select OK and press Enter.
You are presented with the message 'ERROR: Scenario not completed'. This indicates that the conditions required by the script have not yet been met. If you feel that you have solved the problem, then you may need to carefully review the requirements as listed in the 'Description'. If you are still unsure about how to proceed then you should consult with the instructor. Select Cancel and press Enter to exit the tsmenu program.
13)
Solve this problem by creating the required file:
# touch /root/solved
14)
Launch the tsmenu program again.
# tsmenu
15)
Each time the tsmenu program launches, it checks to see if you have solved the current problem. If you leave the tsmenu program open, then you can check a problem at any time by following these steps:
Select the 'Check' menu item.
Select OK and press Enter. You are presented with a 'SUCCESS: Scenario completed' message.
Select OK and press Enter. You are taken back to the main 'Select troubleshooting group' screen.
Select Cancel and press Enter to exit the tsmenu program.
16)
You should now proceed to complete the troubleshooting scenarios associated with topic group 1 using the same basic procedure as shown in the previous steps.
Content
Networking Tools  2
Linux Network Interfaces  3
Networking Commands Review  5
NetworkManager  6
Networking Troubleshooting  8
Networking Troubleshooting  9
Virtual Interfaces/IP Aliases  10
Network Teaming  11
Xinetd Concepts  15
Xinetd Troubleshooting  16
TCP Wrappers Concepts  17
TCP Wrappers Concepts  18
TCP Wrappers Troubleshooting  19
Netfilter/iptables Concepts  20
Netfilter/iptables Troubleshooting  21
Lab Tasks  22
1. Troubleshooting Problems: Topic Group 2  23
Chapter 5
TOPIC GROUP 2
Networking Tools
• ip
• ifconfig and route
• tcpdump
• wireshark
• ethtool
Networking
The usefulness of a system generally increases when the system is connected to a network. Setting up network devices, obtaining an IP address, network routing and the potential problems associated with networked systems are part of the day-to-day duties for many systems administrators.
related packages: net-tools, iproute2, tcpdump, iputils, wireshark, ethtool; [R7] mtr
binaries: /sbin/ifconfig, /usr/sbin/ifup, /sbin/ifdown, /sbin/route, /sbin/ip, /bin/ping, /usr/bin/tracepath, /usr/bin/traceroute, /usr/sbin/ethtool, /sbin/dhclient; [R7] /sbin/mii-tool, /usr/sbin/NetworkManager; [S12] /usr/sbin/wicked
configs: /etc/hostname; [R7] /etc/sysconfig/network-scripts/*; [S12] /etc/sysconfig/network/*
log: /var/log/messages
systemd: [R7] /etc/init.d/NetworkManager; [S12] /usr/lib/systemd/system/wickedd.service
Linux Network Interfaces
Historical and Virtual Machine Network Interface names, e.g. ethX
• X corresponds to the instance number starting at 0
Modern Interface Names based on firmware, topology and location information
• Goal: Provide consistent, predictable interface names
• systemd-udev Interface Naming Scheme used by default
  Uses five separate, ordered schemes
• Alternate biosdevname Interface Naming Scheme
  Used if requested biosdevname=1 or by default on Dell servers
Listing network interfaces via /sys
• ls /sys/class/net
Linux Network Interfaces
Linux Network Interface Naming
The Linux operating system handles networking through virtual devices called interfaces. For most practical purposes, an interface is a network connection, such as a connection to an Ethernet network. Keep in mind that several IPs can be bound to any single physical networking device through the creation of virtual interfaces or aliases. The system can be configured with several virtual interfaces all accepting traffic destined for different IP addresses. This can be accomplished with only one physical network connection and several network interfaces bound to the Ethernet connection.
Traditionally interface devices are assigned names based on the type of network connection with which they are associated. Ethernet interface names begin with eth followed by a number, starting from zero, which represents the instance of that device in the machine. Thus, eth0 is the name of the first Ethernet interface, eth1 is the name of the second Ethernet interface, ppp3 is the name of the fourth PPP interface, slip0 is the first SLIP interface, isdn1 is the second ISDN interface, tr0 is the first Token Ring adapter, and fddi0 is the first FDDI adapter.
Today, new naming schemes are used by default such as systemd-udev or BIOSDEVNAME. These new schemes have a common goal of providing consistent, predictable interface names tied to physical topology information exposed by the BIOS/firmware.

Local Loopback Interface
The loopback interface (lo) is a special network interface that points to the local machine. This interface is useful for testing basic networking. It is also useful when testing client-server IP applications (such as a web server), since it means the service will always have an IP address which can be used. Some applications regularly use the localhost address (on the loopback interface) to talk to other applications on the local system. For example, a daemon that wants to send email and connects to the MTA through the loopback interface.
The loopback interface is assigned all IP addresses in the 127.0.0.0/8 netblock, though typically it is represented as having the host address 127.0.0.1.
The systemd-udev Interface Naming Scheme
By default, the systemd-udev interface naming scheme is now used. With this scheme, all interface names have a two character prefix depending on the type of hardware.
en ⇒ Ethernet
wl ⇒ Wireless LAN (WLAN)
ww ⇒ Wireless Wide Area Network (WWAN)
After the two character prefix, the rest of the interface name is set by following five ordered schemes based on what information is exposed by the BIOS/firmware. In order, the first scheme able to be applied is used.
oX ⇒ On-board device index number (e.g., eno1)
sX ⇒ PCI Express hotplug slot index number (e.g., ens1)
pXsX ⇒ PCI geographical location (e.g., enp3s1)
xMAC_ADDRESS ⇒ MAC address, however not used by default (e.g., enxf0def14f36e6)
Traditional name (e.g., ethX) ⇒ Only used if all previous methods failed
To disable the BIOSDEVNAME scheme, add biosdevname=0 to the GRUB_CMDLINE_LINUX variable in the /etc/default/grub file.
To view the information available to UDEV for a network interface, run the following command:
# udevadm info /sys/class/net/eno1
P: /devices/pci0000:00/00:00:01.0/000:01:00.0/net/eno1
E: DEVPATH=/devices/pci0000:00/00:00:01.0/000:01:00.0/net/eno1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=Ethernet 10G 4P X540/I350 rNDC
E: ID_MODEL_ID=0x1528
E: ID_NET_LABEL_ONBOARD=enNIC1
E: ID_NET_NAME_MAC=enxbc305befbf10
E: ID_NET_NAME_ONBOARD=eno1
E: ID_NET_NAME_PATH=enp1s0f0
E: ID_OUI_FROM_DATABASE=Intel Corporation
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Intel Corporation
E: ID_VENDOR_ID=0x8086
E: IFINDEX=48
E: INTERFACE=eno1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/eno1
E: TAGS=:systemd:
E: USEC_INITIALIZED=351223

To disable the systemd-udev interface naming scheme, add net.ifnames=0 to the GRUB_CMDLINE_LINUX variable in the /etc/default/grub file.

BIOSDEVNAME Interface Naming Scheme
The BIOSDEVNAME scheme was created by Dell to provide a consistent, deterministic naming scheme where network interface names correspond to physical labeling for embedded NICs, and PCI slot and port numbers for addon cards. This happens automatically
for Dell servers and systems that have specifically requested this scheme using biosdevname=1 during installation and also have SMBIOS type 41 records and type 9 records in the BIOS SMI tables (see dmidecode(8), i.e. dmidecode -t 41). Virtual machines still use the historical ethX scheme.
UDEV rules call the biosdevname command to convert kernel names to the new BIOSDEVNAME scheme. In debug mode, this command will show all detected network interfaces with both their BIOS and kernel names:
# biosdevname -d
BIOS device: p3p1
Kernel name: p3p1
Permanent MAC: D4:BE:D9:90:DE:0F
Assigned MAC : D4:BE:D9:90:DE:0F
Driver: tg3
Driver version: 3.119
Firmware version: sb
Bus Info: 0000:02:00.0
PCI name : 0000:02:00.0
PCI Slot : 3
Index in slot: 1
Manually Controlling the Interface Name

[R7] The following applies to RHEL7 only:
By convention the interface configuration file should have a suffix that matches the desired interface name (e.g., /etc/sysconfig/network-scripts/ifcfg-eth0), however the filename itself doesn't control anything. The HWADDR variable inside the file is used to associate the file with a network interface, and the DEVICE variable is used to manually set the name. The DEVICE variable is normally initialized using the systemd-udev scheme during installation. However, changing the DEVICE variable afterwards will cause the interface name to be changed on the next reboot. Any software configuration files, such as firewall rules, that reference the old name must be updated to the new name.
Networking Commands Review
Interface management
• ip addr|link
• ifconfig
Route management
• ip route
• route
Packet Capture and Analysis
Basic Network Interface Management
Configuration information for all network devices is stored in text files contained in the /etc/sysconfig directory. The /etc/init.d/network Init script will use the information in these files to setup static IP addresses or dynamic (DHCP or BOOTP) addresses for a network interface. If you make any changes to these files, restart the network service using the Init script.
# systemctl restart network

[R7] The following applies to RHEL7 only:
On Red Hat Enterprise Linux, the configuration for Ethernet device eth0 is stored in /etc/sysconfig/network-scripts/ifcfg-eth0. Documentation listing all the valid options that can be set for a network interface is found in the /usr/share/doc/initscripts-*/sysconfig.txt file. [S12] The following applies to SLES12 only:
On SUSE Linux Enterprise Server, the configuration for Ethernet device eth0 is stored in the /etc/sysconfig/network/ifcfg-eth0 file. Files and reference documentation are found in the /usr/share/doc/packages/sysconfig/Contents file.

Interface Management
Perhaps the most useful tool for diagnosing networking problems is the ifconfig program. Running ifconfig with no options will display the current networking settings including IP address, adapter MAC
addresses, etc. You may also use ifconfig to change parameters of the network configuration as shown in the following example:
# ifconfig eth1 192.168.74.12 netmask 255.255.255.0
Increasingly, the more capable ip command is used to manage IP aliases, kernel routing, tunnels and policies. The following example shows using ip to list addresses assigned to all interfaces on the system:
# ip addr show
# ip addr add 192.168.74.12/24 dev eth1
# ip link show

IP Route Management
Running ip with the route argument will display the current routing table. The following example uses the ip route command to show how to delete and add a default route:
# ip route del default
# ip route add default via 192.168.74.1

Packet Capture
When a network problem isn't easily explained, it's often useful to see what packets are going across the network itself. The tcpdump and wireshark tools are indispensable for doing this.
NetworkManager
Tools
• nmcli
  nm-settings(5)
• GUI: nm-connection-editor
• TUI: nmtui
Manual ifcfg changes require nmcli c reload
systemd services
• NetworkManager.service
• NetworkManager-dispatcher.service
R7: NetworkManager-config-server RPM
• Configures NM to not run DHCP on newly added NICs by default
• Configures NM to ignore link state on statically configured NICs
NetworkManager
NetworkManager was introduced by Red Hat to dynamically detect and configure network connections. The capability is especially useful to wireless and laptop users. NetworkManager works in GNOME by loading nm-applet when the NetworkManager service is running.
NetworkManager can be disabled on a per interface basis by updating or adding NM_CONTROLLED=•no• to the appropriate ifcfg-LABEL file.
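For example, to exclude a statically managed interface from NetworkManager control on RHEL7, a line like the following would be added (the interface name and path are illustrative):

File: /etc/sysconfig/network-scripts/ifcfg-eth0
+ NM_CONTROLLED=no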
Individual connections can be configured with the nm-connection-editor GUI, which can be used for everything from defining DNS settings and PPP connections to IPSec VPN tunnels and Wi-Fi settings. The nmtui command line utility is also provided.

NetworkManager Dispatcher
NetworkManager supports an interface that allows a script to be placed in /etc/NetworkManager/dispatcher.d/ to configure environmental settings when an interface is brought up, or down.
Two positional arguments are passed to the script: ($1) the name of the network interface and ($2) the action, either up or down. The following will signal Postfix to attempt immediate redelivery of mail when any interface comes up:

File: /etc/NetworkManager/dispatcher.d/fire-off-mail.sh
+ #!/bin/bash
+ [ "$2" = "up" ] && postfix flush > /dev/null 2>&1
nmcli

Red Hat Enterprise Linux defaults to using NetworkManager for its network configuration. The nmcli command is a utility for persistent configuration of network settings through NetworkManager. It is also useful for scripting configuration of networking devices. For example, a script could contain the following:
# nmcli connection show eth0 | grep method
ipv4.method    auto
ipv6.method    link-local
# nmcli connection modify eth0 ipv4.address 10.100.0.25/24
# nmcli connection modify eth0 ipv4.gateway 10.100.0.254
# nmcli connection modify eth0 ipv4.dns "10.100.0.254,8.8.8.8"
# nmcli connection modify eth0 +ipv4.dns 8.8.4.4
# nmcli connection modify eth0 connection.autoconnect yes
# nmcli connection modify eth0 ipv4.method manual
# nmcli connection reload
# nmcli connection show eth0 | grep method
ipv4.method    manual
ipv6.method    link-local
Associating New Devices with Connections
Network devices (most commonly physical interfaces) are not the same as connections, which are logical profiles that use one or more devices. When additional network interfaces are added to a system, they will not initially be associated with any connection. Add them as shown in the following example:
# nmcli device status
DEVICE  TYPE      STATE         CONNECTION
eth0    ethernet  connected     eth0
eth1    ethernet  disconnected  --
lo      loopback  unmanaged     --
# nmcli connection show
NAME  UUID                                  TYPE            DEVICE
eth0  31113e7a-08b0-4cfc-85eb-ddf6405007ad  802-3-ethernet  eth0
# nmcli connection add type ethernet ifname eth1 con-name eth1
Connection 'eth1' (65882cfe-a13c-4ba7-8a16-867d360e3f64) successfully added.
# nmcli connection show
NAME  UUID                                  TYPE            DEVICE
eth0  31113e7a-08b0-4cfc-85eb-ddf6405007ad  802-3-ethernet  eth0
eth1  65882cfe-a13c-4ba7-8a16-867d360e3f64  802-3-ethernet  eth1
Networking Troubleshooting
Adapter link
Address and subnet mask
Gateway issues
Adapter Link
The most obvious (and most commonly overlooked) thing to check when troubleshooting a networking problem is the link indicator on the physical network interface. Is the link light lit? Has the Ethernet connector slid out of the port? This is all too often the culprit of connectivity issues. If you do not have physical access to the system, or if the network interface doesn't have a link indicator light, use the ethtool command and examine its output for link detection.
You can use ethtool to determine (and modify) the speed and duplex settings at which your Ethernet interface is operating, and whether it has a link:
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 24
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x00000001 (1)
        Link detected: yes
[R7] The following applies to RHEL7 only:
In some cases, the ethtool command has difficulty with older hardware. The legacy mii-tool command is available for querying such interfaces.
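A minimal sketch of checking link state with mii-tool (the interface name is an illustrative assumption, and the exact output wording varies by driver):

# mii-tool eth0
eth0: negotiated 100baseTx-FD, link ok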
Address and Subnet Mask

Network problems can also be caused by address and subnet mask errors. Because these consist of strings of numbers, it can be difficult to immediately spot a typographical error. Make sure that the default gateway address lies inside your configured subnet.
Gateway Issues

A common problem is a system that can't access any systems outside of its local network. This is usually caused by the system not having a default route in its routing table. Use the ip route command to determine whether there is a default route. If the system obtains its network configuration from a DHCP server, it's possible that the DHCP server is not configured to hand out a default gateway address. If the network configuration is static, check the network configuration file to ensure that it has the correct default gateway assigned.
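For example, a quick check for a missing default route, and adding one temporarily for testing, might look like this sketch (the gateway address is an illustrative assumption):

# ip route show | grep ^default
# ip route add default via 192.168.74.1

A static default gateway is typically made persistent with a GATEWAY= entry in the interface's ifcfg file on RHEL7, or in /etc/sysconfig/network/routes on SLES12, rather than with the ip command.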
Networking Troubleshooting
Kernel module problems
DHCP client problems
DNS issues
Kernel Module Problems
When an Ethernet device fails to come up even though the hardware is present, there may be a problem in a /etc/modprobe.d/ configuration file, which can override the kernel's normal procedure and define which module should be loaded to support a device. For example, the following entry would cause the kernel to attempt to use the Intel EtherExpress 10/100 driver for the eth5 card (which would fail if the card actually used another chipset):

File: /etc/modprobe.d/eepro.conf
alias eth5 eepro100

DHCP Client Problems
Trouble may occur when using the DHCP protocol to automatically obtain an IP address from a DHCP server. To help diagnose these problems, examine the system logs and look in the files where the DHCP client stores the information it obtains from the DHCP server.
Some DHCP servers only allow specific systems to request IP addresses. Access is usually limited by the MAC address of the Ethernet adapter. Use the ip link command to find the MAC addresses assigned to Ethernet adapters. If using ifconfig, look at the HWaddr label.
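A quick way to see an adapter's MAC address (the interface name and address shown are illustrative assumptions):

# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:04:00:04 brd ff:ff:ff:ff:ff:ff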
[R7] The following applies to RHEL7 only:

On RHEL7, the DHCP client information is stored in the /var/lib/dhclient/ directory. The DHCP client file is named dhclient-$INTERFACE.leases.
[S12] The following applies to SLES12 only:
On SLES12, the DHCP client directory is /var/lib/wicked/ and the DHCP client file is named lease-$INTERFACE.dhcp-ipv4.xml.
DNS Issues
Many networking problems can be caused by incorrect client-side DNS settings. For example, if you find you can ping a numeric IP address but cannot ping its domain name equivalent, name resolution may be the culprit. Check the contents of the /etc/resolv.conf file for accurate information.

[R7] The following applies to RHEL7 only:
On RHEL7 systems, the NetworkManager daemon will overwrite manual changes to /etc/resolv.conf. Instead, enter at least one of DNS1, DNS2, and DNS3 (used in that order) into the appropriate network interface configuration script in the /etc/sysconfig/network-scripts/ directory (e.g. /etc/sysconfig/network-scripts/ifcfg-p3p1).
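A minimal sketch of such an entry, assuming an interface configuration file named ifcfg-p3p1 and illustrative name server addresses:

File: /etc/sysconfig/network-scripts/ifcfg-p3p1
+ DNS1=10.100.0.254
+ DNS2=8.8.8.8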
Virtual Interfaces/IP Aliases
Virtual Interfaces
• Displayed with the ip and ifconfig commands
IP Aliases
• Only displayed with the ip command
IP Aliases and Virtual Interfaces
Historically, Linux provided support for multiple IP addresses on a single physical interface by using virtual interfaces. Virtual interfaces can be configured by using the ip or ifconfig commands. Virtual interfaces are designated by a colon and then a virtual interface label, prefixed by the physical interface. For example, ethX:Y where ethX is the physical interface and Y is the virtual interface label. If you have multiple virtual interfaces configured, the ifconfig command can display both the physical and virtual interface information.
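For example, bringing up a virtual interface with ifconfig might look like the following sketch (the interface label and address are illustrative assumptions):

# ifconfig eth0:1 192.168.74.13 netmask 255.255.255.0 up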
The ip program also lets you assign multiple IP addresses to a physical interface. The ip command supports both the virtual interface method and the IP aliases method. The IP aliases method is handled in a different way than virtual interfaces. IP aliases and virtual interfaces are completely different, and ifconfig output does not include information about IP aliases. This can and does cause confusion for system administrators who are familiar with virtual interfaces but not IP aliases. The following sequence of commands and output is an example of adding an IP alias and the inability of ifconfig to show that alias address:
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:07:E9:54:35:F0
          inet addr:10.100.0.254  Bcast:10.100.0.255  Mask:255.255.255.0
. . . snip . . .
# ip addr add 10.100.0.247/24 dev eth0
# ip -4 addr list eth0
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:07:e9:54:35:f0 brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.254/24 brd 10.100.0.255 scope global eth0
    inet 10.100.0.247/24 scope global secondary eth0
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:07:E9:54:35:F0
          inet addr:10.100.0.254  Bcast:10.100.0.255  Mask:255.255.255.0
. . . snip . . .
Network Teaming
Major components
• team → kernel module
• teamd → userspace daemon compiled with a runner which provides control logic
• teamdctl → to monitor and configure
Configuration via: nmtui, nmcli, teamd, ifcfg-* files, NetworkManager GUI
Teaming Architecture
Red Hat Enterprise Linux offers an alternative implementation of interface aggregation called Network Teaming. In the end it accomplishes the same thing, but it has several improvements over the traditional Linux bonding implementation. With Network Teaming, most of the logic is pushed into userspace, which gives it more flexibility; however, it still has a kernel component, the team module, that handles the performance-sensitive portion. The team kernel module uses a lockless Tx/Rx path for much lower overhead compared to the bonding kernel module.
The following is a list of the currently available runners which can be used with teamd. Note that these are basically equivalent to the various modes implemented in the traditional bonding module:

broadcast ⇒ data is transmitted over all ports
round-robin ⇒ data is transmitted over all ports in turn
active-backup ⇒ one port or link is used while others are kept as a backup
loadbalance ⇒ active Tx load balancing with BPF-based Tx port selectors
lacp ⇒ implements the 802.3ad Link Aggregation Control Protocol
team ⇒ Small kernel module that implements fast handling of packet flows. Listens for communications from userspace applications via the Netlink API.
teamd ⇒ Userspace daemon built on the libteam library that contains the common logic for interface aggregation.
runner ⇒ Userspace code compiled into a teamd instance which implements the specific load-balancing and active-backup logic.
teamdctl ⇒ Command to examine configuration, change the state of ports (active/backup), add/remove ports, etc.
Configuration via Direct Editing of ifcfg-* Files

First define the team interface:

File: /etc/sysconfig/network-scripts/ifcfg-team0
+ DEVICE=team0
+ DEVICETYPE=Team
+ ONBOOT=yes
+ BOOTPROTO=none
+ IPADDR=10.100.0.3
+ PREFIX=24
+ TEAM_CONFIG='{"runner": {"name": "activebackup"}, "link_watch": {"name": "ethtool"}}'
Then create configs for each slave interface, substituting the MAC address and device name as appropriate:
File: /etc/sysconfig/network-scripts/ifcfg-team0-port0
+ DEVICE=eth0
+ HWADDR=52:54:00:04:00:04
+ DEVICETYPE=TeamPort
+ ONBOOT=yes
+ TEAM_MASTER=team0
+ TEAM_PORT_CONFIG='{"prio": 100}'

File: /etc/sysconfig/network-scripts/ifcfg-team0-port1
+ DEVICE=eth1
+ HWADDR=52:54:00:04:00:05
+ DEVICETYPE=TeamPort
+ ONBOOT=yes
+ TEAM_MASTER=team0
+ TEAM_PORT_CONFIG='{"prio": 100}'
Configuration via nmcli

First create a new teaming connection and add ports to it:
# nmcli con add type team ifname team0 con-name team0
Connection 'team0' (6991240c-c9b5-4591-93ff-2f7a5ecf0f40) successfully added.
# nmcli con show
NAME         UUID                                  TYPE            DEVICE
team0        6991240c-c9b5-4591-93ff-2f7a5ecf0f40  team            team0
# nmcli con add type team-slave con-name team0-port0 ifname eth0 master team0
Connection 'team0-port0' (54c66033-da90-49d4-8bf0-42e62413ba5b) successfully added.
# nmcli con add type team-slave con-name team0-port1 ifname eth1 master team0
Connection 'team0-port1' (35c4ce56-5c5e-4e95-8716-576363b5df82) successfully added.
# nmcli con mod team0 team.config "{runner: {name: activebackup}, link_watch: {name: ethtool}}"
# nmcli con show
NAME         UUID                                  TYPE            DEVICE
team0-port1  35c4ce56-5c5e-4e95-8716-576363b5df82  802-3-ethernet  --
team0-port0  54c66033-da90-49d4-8bf0-42e62413ba5b  802-3-ethernet  --
team0        6991240c-c9b5-4591-93ff-2f7a5ecf0f40  team            team0
The teaming interface will come up when at least one of the ports assigned to it is brought up:
# ip link list team0
5: team0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether 7a:15:55:92:64:58 brd ff:ff:ff:ff:ff:ff
# nmcli connection up team0-port0
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/6)
# nmcli connection up team0-port1
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/7)
# ip link list team0
5: team0: mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 52:54:00:04:00:03 brd ff:ff:ff:ff:ff:ff
Configure other properties on the teaming interface as needed. For example:
# nmcli con mod team0 ipv4.method manual ipv4.addresses "10.100.0.3/24 10.100.0.254"
# nmcli device show team0
GENERAL.DEVICE:      team0
GENERAL.TYPE:        team
GENERAL.HWADDR:      52:54:00:04:00:13
GENERAL.STATE:       100 (connected)
GENERAL.CONNECTION:  team0
GENERAL.CON-PATH:    /org/freedesktop/NetworkManager/ActiveConnection/10
IP4.ADDRESS[1]:      ip = 10.100.0.3/24, gw = 10.100.0.254
IP4.DNS[1]:          10.100.0.254
IP4.DOMAIN[1]:       example.com

Files will be created automatically in /etc/sysconfig/network-scripts/ to store the configuration.
The team interface can be brought up, and the state verified, by running:
# ifup team0
# teamdctl team0 state view -v
setup:
  runner: activebackup
  kernel team mode: activebackup
  D-BUS enabled: yes
  ZeroMQ enabled: no
  debug level: 0
  daemonized: no
  PID: 4902
  PID file: /var/run/teamd/team0.pid
ports:
  eth0
    ifindex: 2
    addr: 52:54:00:04:00:04
    ethtool link: 0mbit/halfduplex/up
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        link up delay: 0
        link down delay: 0
  eth1
    ifindex: 3
    addr: 52:54:00:04:00:04
    ethtool link: 0mbit/halfduplex/up
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        link up delay: 0
        link down delay: 0
runner:
  active port: eth1
# teamdctl team0 state item set runner.active_port eth0
Xinetd Concepts
Super server that listens on behalf of other services
Replaces earlier inetd
The Xinetd Super Daemon
The extended Internet services daemon (Xinetd) is a super daemon that manages several network services. Common services managed by the xinetd program include IMAP & POP servers, FTP servers, talk servers and time servers. xinetd controls services and can apply access control lists (ACLs) to a service. ACLs can be used to determine which hosts can access the services from the network and what times of the day a service is available. xinetd also leverages TCP wrappers to provide even more fine-grained control.

package:        xinetd
port/protocol:  arbitrary
binary:         /usr/sbin/xinetd
configs:        /etc/xinetd.conf and /etc/xinetd.d/*, [R7] /etc/sysconfig/xinetd
log:            /var/log/messages
Init:           /usr/lib/systemd/system/xinetd.service
user/group:     runs as root and may launch other daemons under other users as configured
Xinetd uses individual configuration files for each service that it controls. If changes are made to a service's configuration file, the xinetd daemon will need to be restarted. Use systemctl to restart xinetd:
# systemctl restart xinetd
The main configuration file for the xinetd daemon is the /etc/xinetd.conf file. This file contains global parameters that affect the behavior of the xinetd daemon. The file may also contain global parameters that affect all services controlled by Xinetd. Service-specific configuration files are stored in the /etc/xinetd.d/ directory (this directory is specified in the /etc/xinetd.conf file, identified by the includedir directive). Each service managed by Xinetd can be enabled or disabled by editing its corresponding configuration file and modifying the disable = directive. If the disable = directive is missing from the file, the service will be enabled. The chkconfig command can list the status of a service, or it can be used to control which xinetd services are enabled or disabled.
# chkconfig talk on
# chkconfig talk --list
talk            on
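As a reference, a service file under /etc/xinetd.d/ typically looks roughly like the following sketch (the tftp service and its settings are illustrative assumptions, not the stock file):

File: /etc/xinetd.d/tftp
service tftp
{
        socket_type  = dgram
        protocol     = udp
        wait         = yes
        user         = root
        server       = /usr/sbin/in.tftpd
        server_args  = -s /var/lib/tftpboot
        disable      = yes
}

Setting disable = no (or using chkconfig as shown above) enables the service the next time xinetd is restarted or reloaded.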
Xinetd Troubleshooting
Configuration file errors
Consider other problems upstream (network, firewall, etc.)
Service port(s) may already be in use
Xinetd – Common Problems
The following is a list of common problems:

Configuration File Errors
Most Xinetd problems stem from configuration errors. The xinetd daemon does a good job of logging configuration parsing errors, so examining the /var/log/messages file is a good way to quickly identify most configuration related errors.

Consider Other Problems Upstream
Consider problems upstream in the network traffic's path when a service doesn't appear to respond as it should. Consider firewall (e.g. Netfilter/iptables) rules that may be preventing requests from getting to an Xinetd managed service. Also consider whether TCP wrappers may be preventing access to a service.

Service Port(s) May Already Be In Use
If a port used by a service is already being used by another service or program, then the service will not be able to bind to the port and will fail. As a result, that service will not be accessible. If this happens, messages will be recorded in /var/log/messages such as:
bind failed (Address already in use (errno = 98))
Tools like netstat, lsof or fuser can display a list of in use (bound) ports and the processes and services that are using those ports:
# fuser -v 25/tcp
                     USER        PID ACCESS COMMAND
25/tcp:              root       2902 f.... sendmail
# kill 2902
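The ss command provides similar information; a quick sketch of finding the process bound to a port (the port number is illustrative):

# ss -tlnp | grep :25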
TCP Wrappers Concepts
TCP Wrappers Architecture
• /etc/hosts.allow
• /etc/hosts.deny
• Allow by default
libwrap.so
daemon_list syntax
TCP Wrappers
TCP Wrappers provides a general security method for limiting access to services based on remote IP address. TCP Wrappers creates a centralized administrative point where access to all services can be managed. It works by "wrapping around" existing applications.

related packages:  [R7] tcp_wrappers, setup; [S12] tcpd, netcfg
binaries:          /usr/sbin/tcpd, /lib/libwrap.so
configs:           /etc/hosts.allow, /etc/hosts.deny
logs:              /var/log/secure, /var/log/messages

Before incoming network connections get passed to the appropriate daemon, they are processed by TCP Wrappers, which checks two files, /etc/hosts.allow and /etc/hosts.deny, and then either drops the connection or passes it on to the appropriate service. /etc/hosts.allow is used to specify which hosts can connect to specific services, while /etc/hosts.deny is used to specify which hosts cannot.

When incoming connections are wrapped, the hosts.allow and hosts.deny files are checked for a match. A match means that both the daemon_list and client_list pair match. It is important to note that both files may be examined. The hosts.allow file is checked before the hosts.deny file. First match wins and no further checking of rules is done. If no matches are found, then access is granted.

The libwrap.so Library

The network ACL capabilities of TCP Wrappers can be compiled into a program via the libwrap.so library. This allows a service to get all the benefits of TCP Wrappers without performing a chained launch via tcpd (which may not even be possible).

The daemon_list Syntax

The daemon_list should be one (typical) or more daemon process names. When wrapping xinetd services, the daemon_list should be the name of the binary used in the server attribute. If the service doesn't have a server attribute, usually an internal or redirect service, then the daemon_list should be the service name.

When wrapping a libwrap.so linked binary, such as sshd, the proper daemon_list value is usually the binary file name.
TCP Wrappers Concepts
client_list syntax
Syntax Shortcuts
• ALL
• LOCAL
The client_list Syntax
The client_list is used to match the client's source IP address. This can be specified in a variety of ways:
• A domain starting with a period. For example, .gurulabs.com. This will match any IP for which a reverse DNS lookup resolves to a host in the gurulabs.com domain.
• A partial IP address terminated with a period. For example, 192.168. will match all IP addresses starting with 192.168. Note that a single complete IP can also be used.
• An IPv4 network in the form n.n.n.n/m.m.m.m. For example, 192.168.32.0/255.255.255.0. Note that CIDR notation is not supported.
• An IPv6 network in the form [n:n:n:n:n:n:n:n]/m. For example, [3ffe:505:2:1::]/64. Note that CIDR notation is required.
• The wildcards * and ? can be used to match hostnames or IP addresses.

Example TCP Wrappers Rules

File: /etc/hosts.allow
sshd: 10.100.0.5
in.telnetd: 192.168. 10.5.2.0/255.255.255.0
in.ftpd sshd: .linuxtraining.com
Syntax Shortcuts In hosts.{allow,deny} Files
To ease and shorten rule writing, TCP Wrappers defines many wildcard shortcuts that can be used. The most commonly used wildcard is the ALL wildcard. It always matches. For example:

File: /etc/hosts.allow
ALL: 127. [::1]
sshd: ALL
The LOCAL wildcard matches any host whose name does not contain a dot. This means that hosts within your same domain would match.
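As a minimal sketch, a rule using LOCAL (the telnet service is an illustrative choice) could look like:

File: /etc/hosts.allow
in.telnetd: LOCAL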
TCP Wrappers Troubleshooting
General Troubleshooting
• Can affect many services
• Logged to syslog
journalctl
Rsyslog: /var/log/
File Parsing Order/Rule Syntax
Incorrect Daemon Name
Unwrapped Applications
Xinetd Conflicts
General Troubleshooting
Because most network services are linked against TCP wrappers, problems with services are often caused by misconfigurations in TCP wrappers, not in the services themselves. It is important to understand which network services use TCP wrappers and to get in the habit of checking the /etc/hosts.allow and /etc/hosts.deny files early in the troubleshooting process.
On RHEL7/SLES12, TCP wrappers passes all log messages to syslog, which can be viewed using the journalctl command. If using Rsyslog, the /var/log/messages file should be examined if errors are occurring when trying to access network services. Syntax errors in the configuration files are also sent to syslog. [R7] The following applies to RHEL7 only:
On Red Hat Enterprise Linux, with Rsyslog, the /var/log/secure file should be examined for authpriv facility messages, such as authentication errors.

File Parsing Order & Rule Syntax
It is important to understand the order in which the TCP wrapper configuration files are parsed: first hosts.allow, then hosts.deny. If a service is not matched in either of these two files, access to it is allowed. Many system administrators misunderstand this check process and create rules that are either too restrictive or too open.
One common problem is the misuse (or overuse) of the ALL shortcut. Because TCP wrappers match on the first encountered rule, general rules listed first may mask out later rules. As an extreme example, if ALL: ALL were placed in the hosts.allow file, the hosts.deny file would never be checked:
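A sketch of that extreme (and dangerous) rule, shown only to illustrate the masking effect:

File: /etc/hosts.allow
ALL: ALL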
Incorrect Daemon Name

TCP Wrappers matches the server attribute for each service in the daemon_list. In most cases this is the name of the binary that is executed, not the name of the service.

Unwrapped Applications
Not all network services use TCP wrappers for access control; the most notable examples are Apache and Samba. Because of the complexity of these applications, access control has been built in. A very common problem is trying to use TCP Wrappers to lock down programs which do not use it.

Xinetd Conflicts
Xinetd is used to manage many network services. It has built-in access control support of its own, in addition to being linked against libwrap. Because xinetd is compiled with TCP wrappers support, applications using xinetd can use either TCP Wrappers' access control or Xinetd's access controls (or both). TCP Wrappers' access control list is checked first and then Xinetd's list is checked. When troubleshooting, check that one access control method is not interfering with the other.
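For comparison, Xinetd's own access control is expressed inside the service block; a sketch (the service name and addresses are illustrative assumptions):

File: /etc/xinetd.d/tftp
service tftp
{
        . . . snip . . .
        only_from    = 10.100.0.0/24
        no_access    = 10.100.0.99
}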
Netfilter/iptables Concepts
iptables
• -t – table to modify or display
• -L – display chain/table contents
• -A – append a rule to a chain
• -I – insert a rule in a chain
• -D – remove a rule from a chain
Netfilter/iptables
Netfilter is the standard Linux firewall solution. It has proven to be both robust and flexible. Because it is so capable, it can also be very complicated. When a system has problems providing or connecting to network services, iptables should be one of the first suspects.

related packages:  iptables, [R7] iptables-services, firewalld, [S12] SuSEfirewall2
binaries:          /usr/sbin/iptables, /usr/sbin/iptables-restore, /usr/sbin/iptables-save, /usr/sbin/ip6tables, /usr/sbin/ip6tables-restore, /usr/sbin/ip6tables-save, [R7] /usr/bin/firewall-cmd, [S12] /usr/sbin/SuSEfirewall2
configs:           [R7] /etc/sysconfig/iptables-config, /etc/sysconfig/ip6tables-config; [S12] /etc/sysconfig/scripts/
log:               /var/log/messages
data directory:    /usr/lib64/xtables/
Init:              /usr/lib/systemd/system/iptables.service
The iptables and ip6tables commands are used to interact with Netfilter, which is inside the kernel.
Netfilter/iptables Troubleshooting
Identifying bad rules
• Enable logging
• Examine alternate tables
• Disable IP Tables
Chain Policy
iptables – Common Problems
The following is a list of common problems:

Enable Logging
It doesn't take much for Netfilter to contain more rules than can be easily understood by scanning the output of iptables -L. In such cases, logging can be invaluable. However, it is easy for Netfilter to flood syslog when logging is enabled. Any time a logging rule is added to a chain, rate limiting should be used. For example, to enable logging before a suspect entry in the INPUT chain, with a limit of 6 messages per minute, the command would be:
# iptables -I INPUT 5 -m limit --limit 6/m -j LOG

Examining Alternate Tables
Netfilter includes more than just the default filter table. A list of all tables that have been activated on your system can be found in /proc/net/ip_tables_names. While troubleshooting iptables, don't forget to check the contents of these other tables.
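A quick sketch of listing the active tables and then inspecting one of them (the nat table is an illustrative choice):

# cat /proc/net/ip_tables_names
nat
filter
# iptables -t nat -L -n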
Disabling Netfilter

Sometimes the only way to be completely sure that Netfilter is not involved in a problem is to completely disable it. Obviously, care should be taken to not leave a system vulnerable for extended periods of time.
To disable Netfilter, use iptables to flush all the rules and ensure that the default policies allow traffic. Use the -F option to flush each table and then use the -P option to make sure each chain has a default policy of ACCEPT. It's also a good idea to remove any user-defined chains that may exist with the -X option, though this is not strictly necessary in order to disable Netfilter:
# iptables -F
# iptables -t nat -F
# iptables -t mangle -F
# iptables -X
# iptables -t nat -X
# iptables -t mangle -X
# iptables -P INPUT ACCEPT
# iptables -P FORWARD ACCEPT
# iptables -P OUTPUT ACCEPT
# iptables -t nat -P PREROUTING ACCEPT
# iptables -t nat -P OUTPUT ACCEPT
# iptables -t nat -P POSTROUTING ACCEPT
# iptables -t mangle -P PREROUTING ACCEPT
# iptables -t mangle -P INPUT ACCEPT
# iptables -t mangle -P FORWARD ACCEPT
# iptables -t mangle -P OUTPUT ACCEPT
# iptables -t mangle -P POSTROUTING ACCEPT
Lab 5
Estimated Time:
S12: 120 minutes    R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 2
Page: 5-23    Time: 120 minutes
Requirements: b (1 station) c (classroom server)
Objectives
• Practice troubleshooting related to: networking, xinetd, TCP wrappers, and iptables
Lab 5
Requirements b (1 station) c (classroom server)
Task 1: Troubleshooting Problems: Topic Group 2
Estimated Time: 120 minutes
Relevance Practice solving problems to make it easier to diagnose and fix them in the real-world.
1) Enter the troubleshooting environment with tsmenu.
# tsmenu
2) Execute each of the scripts within Troubleshooting Group #2.
Chapter 6
TOPIC GROUP 3

Content
X11 Concepts .......................................... 2
X11 Server Operation ................................... 3
X11 Troubleshooting .................................... 4
Rsyslog Concepts ....................................... 6
System Logging ......................................... 7
systemd Journal ........................................ 9
systemd Journal's journalctl .......................... 11
Secure Logging with Journal's Log Sealing ............. 13
Syslog Troubleshooting ................................ 15
RPM Concepts .......................................... 16
RPM Troubleshooting ................................... 17
Common Unix Printing System (CUPS) .................... 18
CUPS Troubleshooting .................................. 20
CUPS Troubleshooting .................................. 21
at & cron ............................................. 22
at & cron Usage ....................................... 23
at & cron Troubleshooting ............................. 24
Lab Tasks 25
1. Troubleshooting Problems: Topic Group 3 ............ 26
X11 Concepts
MIT's The X Window System
• X.Org Foundation based on XFree86
X11
The implementation of the X Window System, popularly known as X, was provided by the XFree86 Project during the 90s. Modern distributions use the version provided by the X.Org Foundation. The switch to the Xorg implementation has resulted in a renewed vitality for X.

related package(s):  [R7] xorg-x11-server-Xorg, [S12] xorg-x11-server
port/protocol:       177/tcp, 177/udp – XDMCP; 6000/tcp – X display
binaries:            /usr/bin/X, /usr/bin/Xorg, /usr/bin/xinit, /usr/bin/startx, /usr/bin/xauth, /usr/bin/xhost, /usr/bin/xconsole, /usr/sbin/gdm
configs:             /etc/X11/xorg.conf
logs:                /var/log/Xorg.*.log, $HOME/.xsession-errors, /tmp/xses-$USER*
X11 Server Operation
Starting X11
• multi-user.target
• graphical.target
X Security
Starting X11
When a Linux system is set to boot to a multi-user mode without an X display manager, users must start their own X sessions using startx. The startx command is basically a user-friendly wrapper around xinit.
Similar to runlevel 5, the systemd graphical target runs an X "display manager". The GNOME or LightDM display manager provides session management and a graphical login screen. (Systems running KDE use kdm, and the X Window System also provides xdm as a simple option.)

Running Multiple X Sessions
It is possible to have more than one X session running on a machine at a time. Each session is identified by a display name. If a user has an active X session, its name will be in the $DISPLAY variable. This name has two parts; the first is the hostname of the machine where the session is running; when no hostname is included, localhost is assumed. The second part is the display number. The default display is :0.
X Security
If it has permission, a process can connect to an X session on a remote system. Two types of X session security can be used. The original xhost command is not very fine grained; it can only grant or revoke connection privileges for an entire IP address. The newer xauth command creates a shared secret that a process must be able to present before it is allowed to connect to a session; this allows access to be granted only to specific users. These xauth shared secrets (typically called cookies) are usually stored in the ~/.Xauthority file. Although X provides these modest security features, its network protocol is unencrypted and insecure. SSH can be configured to forward X connections. It does so by tunneling the X connection from an X client, which connects to the user's sshd process (on the remote system), through the SSH encrypted data stream to the X server on the ssh client side.
When a user connects to a remote machine, the $DISPLAY variable may appear to be set to display on the remote system, but if SSH is configured correctly any graphical applications will actually be displayed on the user's local system. Additionally, the ssh client and server ensure that appropriate xauth cookies are set on both ends, and remove them from the remote system when the SSH connection is terminated.
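A minimal sketch of X forwarding in action (the host and user names are illustrative assumptions, and the forwarded display number assigned by sshd may differ):

$ ssh -X guru@server1.example.com
guru@server1.example.com's password:
$ echo $DISPLAY
localhost:10.0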
X11 Troubleshooting
Using an Alternate Display
Transferring the $DISPLAY
Transferring the ~/.Xauthority Cookie
Changing the hostname
Pruning /tmp/
Session specific log
• /var/log/Xorg.#.log
Display configuration
• /etc/X11/xorg.conf
Typically, the display should auto-configure without this file.
X11 – Common Problems
The following is a list of common problems:

Using an Alternate Display
By default, startx will try to use the default display :0. If an X session is already started and the user wishes to start a second session, startx must be told to use an alternate display:
# startx -- :1
. . . output omitted . . .

Transferring the $DISPLAY
When using the su command to switch to another account, the - (or -l or --login) option should usually be included to get the user's full environment. However, doing so replaces the $DISPLAY variable. One solution is to echo $DISPLAY and explicitly set it after switching to a root owned shell.

Transferring the ~/.Xauthority Cookie
When using su to switch to another account, the ~/.Xauthority cookie isn't transferred. This doesn't matter when switching to the root user, but does when switching to an unprivileged user. One solution is to copy the ~/.Xauthority file.
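A sketch of one way to do this, assuming the session owner is guru and the target account is dev1 (both names are illustrative):

# cp ~guru/.Xauthority ~dev1/.Xauthority
# chown dev1 ~dev1/.Xauthority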
Changing the Hostname
The ~/.Xauthority file can contain multiple cookies. Each cookie is tied to a hostname. If the hostname is changed during an active X session, applications won't be able to find the right cookie and X will refuse to allow them to connect. As a result, the user won't be able to open any new applications. One surprising but realistic scenario for the hostname changing is inserting a wireless card. Most wireless cards are configured to use DHCP, and DHCP can be configured to accept a hostname from the DHCP server. This may be a desired behavior during boot, but probably not afterward.
The easiest solution is to set the hostname back to what it originally was by running hostname old_hostname. Because many other services depend on the hostname, it is generally best to reboot the system after changing the hostname. Pruning /tmp/
In order to start an X session, X must be able to create certain files in /tmp/. If it is not able to, X will not work correctly. The most common cause of such a problem is /tmp/ being full (or the user is over quota limits for /tmp/). The solution is deleting old files in /tmp/.
/var/log/Xorg.#.log Check the /var/log/Xorg.0.log file (where 0 is the X session number) for the session log output. Check the DISPLAY variable for the session number (number after colon):
$ echo $DISPLAY
:0.0
Note that towards the top of the log is a key for interpreting messages in the log file:
$ grep -A 2 Markers: /var/log/Xorg.0.log
Markers: (--) probed, (**) from config file, (==) default setting,
         (++) from command line, (!!) notice, (II) informational,
         (WW) warning, (EE) error, (NI) not implemented, (??) unknown.

(EE) marks fatal errors.

Display Configuration
In general, the /etc/X11/xorg.conf file is not needed. If it is present, and configuring the display is problematic, rename the file:
# mv /etc/X11/xorg.conf /etc/X11/xorg.conf.bkp
If the auto-detected configuration is unable to properly detect and configure the display (at least sufficiently to use a graphical tool to fine-tune resolution and color depth), an xorg.conf file should be created. An initial configuration can be generated by running Xorg -configure while Xorg is not already running. Check the xorg.conf(5) manual page for details. In most cases, a blank or distorted display is a symptom of an unsupported, or perhaps faulty, display card. Use the nomodeset kernel parameter, passed by the boot loader, to see if a clean display results. A VESA X Window System display driver might also work where the chipset-specific driver does not. Often, a proprietary driver provided by the manufacturer can add support, or improved support, for running X with the video chip in question. Finally, check that the /etc/X11/xorg.conf.d/ directory does not contain a file that is causing problems.
Rsyslog Concepts
Accepts and routes logging messages
• /etc/rsyslog.conf
• /etc/sysconfig/rsyslog
• /etc/sysconfig/syslog
Rsyslog
The rsyslogd daemon is responsible for routing system log messages to local files or over the network to a centralized logging server.

related packages:  rsyslog
port/protocol:     514/udp, 6514/tcp
binaries:          /usr/sbin/rsyslogd
configs:           /etc/rsyslog.conf, [R7] /etc/sysconfig/rsyslog, [S12] /etc/sysconfig/syslog
log:               Self-referential
data directory:    /var/log/
systemd:           /usr/lib/systemd/system/rsyslog.service

The Rsyslog daemon is configured in the /etc/rsyslog.conf file. This file specifies log facilities, priorities, and actions for the corresponding facilities and priorities. The actions can be local files or remote logging servers.

/etc/rsyslog.conf Syntax

facility.priority    action

Here is an example of an entry in the /etc/rsyslog.conf:

File: /etc/rsyslog.conf
mail.*    /var/log/maillog

After making any changes to the /etc/rsyslog.conf file, restart the Rsyslog daemon:

# systemctl restart rsyslog

To configure a Linux machine to accept log messages from other systems, load the plugin to support listening for remote logs:

File: /etc/rsyslog.conf
+ $ModLoad imudp
+ $UDPServerAddress *
+ $UDPServerRun 514
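For instance, a minimal sketch of forwarding all messages to a central log host (the hostname is an illustrative assumption; a single @ sends via UDP, @@ via TCP):

File: /etc/rsyslog.conf
+ *.* @loghost.example.com:514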
System Logging
Sysklogd
• Standard through RHEL5
• syslogd and klogd
Rsyslog
• Enhanced drop-in replacement for sysklogd
• rsyslogd
systemd Journal
• Wide adoption in modern Linux distributions
• Often used in combination with Rsyslog
Some daemons bypass syslog
• Typically log directly into a subdirectory under /var/log/
• Apache and Samba are examples
Logging Daemon Implementations
The original Unix system log daemon was written for Sendmail and was a standard part of Berkeley Unix (BSD). An enhancement called Sysklogd was dominant on Linux systems until recent years. It provided syslogd, and a separate klogd for interpreting kernel messages, and was configured in /etc/syslog.conf. It has been replaced on some systems with Syslog-NG. However, enterprise distributions now ship Rsyslog, a drop-in replacement for Sysklogd.
With the adoption of systemd, the primary logging services have shifted from being provided by a Syslog implementation to the systemd Journal. The new journal can operate independently, or can be configured to selectively pass messages to an Rsyslog process. Rsyslog can in turn be configured to selectively pass messages (for example those from a remote host) back into the journal.

Logging Architecture
The following illustrations show the flow of log data between the various system components for a system using just Rsyslog, and for the newer architecture introduced by running both the systemd Journal and Rsyslog together as found on modern Linux distributions.

[Diagram: Traditional Logging Architecture: the kernel's printk messages are read from /proc/kmsg by the imklog module, while user space daemons write to /dev/log, which is read by the imuxsock module; Rsyslog routes both into the log files under /var/log/*.]
[Diagram: New Logging Architecture Incorporating the systemd Journal: daemons write to /dev/log and /run/systemd/journal/stdout, and the kernel's printk messages arrive via /dev/kmsg; the systemd Journal stores them in /run/log/journal/* or /var/log/journal/* (viewed with journalctl, kernel messages with dmesg) and can also write to /dev/console; Rsyslog exchanges messages with the journal through the imjournal and omjournal modules and the journal socket, writes files under /var/log/* (viewed with tail, less, ...), and sends or receives remote syslog over TCP/UDP port 514 via im{tcp,udp}.]
systemd Journal
Journal adds many new features, and complements syslog
• Forwards to Syslog by default in systemd versions less than 216
Several different approaches to integrating rsyslog and the journal are available
Volatile storage by default, persistent storage possible
Configurable in /etc/systemd/journald.conf
Journal Features
The traditional Syslog style logging daemons had many deficiencies. Messages generated before the logging daemon was started (early boot and initial RAM disk processes) were not captured. Messages written to STDOUT or STDERR were not captured. Logs were unstructured, unindexed text. The new systemd Journal addresses these issues and also adds several other useful features such as:
• Can collect messages from: early boot, kernel, and user processes (STDOUT, STDERR, or native Journal API)
• Logs are structured binary data with extensive meta-data
• Logs are indexed on all fields allowing for rapid filtering and retrieval
• Built-in log rotation
• Supports cryptographic sealing and verification of logs for integrity
• Includes a powerful command for accessing log data (journalctl)
• Integrates well with Rsyslog allowing bi-directional passing of log messages between the services; can also be used as a replacement for Rsyslog

Enabling Persistent Storage for Journal
By default, the journal logs data to the /run/log/journal/MACHINE_ID/ directory, which is on a tmpfs filesystem and consequently volatile. The location and method of storage is controlled via the Storage= option in the /etc/systemd/journald.conf file. The default setting of Storage=auto will automatically begin storing log data in the /var/log/journal/ directory if that directory exists.

# mkdir -p /var/log/journal/
# systemctl restart systemd-journald
Once persistent log storage is enabled, settings in the journald.conf control the total amount of space the logs are allowed to consume, and the rotation intervals. See the man page journald.conf(5) for details. The total space in use by journal can be seen by running:
# journalctl --disk-usage
Journals take up 712.3M on disk.
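As a sketch of the kind of retention tuning journald.conf allows (the values are illustrative assumptions, not defaults):

File: /etc/systemd/journald.conf
+ SystemMaxUse=500M
+ MaxRetentionSec=1month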
rsyslog and Journal Integration

Two different approaches can be used for Syslog integration. With early versions of systemd, the journal was configured to forward messages to rsyslog in a push architecture, using the default ForwardToSyslog=yes setting. Currently the more efficient pull architecture is used: rsyslog uses imjournal to pull messages from the journal in a structured format instead of reading the journal's unstructured output socket, by using this configuration:

File: /etc/rsyslog.conf
$ModLoad imjournal        # provides access to the systemd journal
$OmitLocalLogging on      # don't use the journal's output socket
In the recommended configuration, the systemd Journal is what receives the syslog messages from local applications, by listening on the /dev/log socket. This is configured in the [Socket] section of the /usr/lib/systemd/system/systemd-journald.socket unit file. Then, as previously noted, rsyslog pulls the messages from the journal using the imjournal input module.
File: /usr/lib/systemd/system/systemd-journald.socket
[Socket]
ListenStream=/run/systemd/journal/stdout
ListenDatagram=/run/systemd/journal/socket
ListenDatagram=/dev/log
SocketMode=0666
PassCredentials=yes
PassSecurity=yes
ReceiveBuffer=8M
This also means that installing rsyslog is optional; a pure journal configuration will work transparently for the syslog-using applications on the box.

Receiving network syslog messages into the journal
One feature the journal doesn't have is the ability to receive standard syslog messages over the network. For this type of configuration, set up rsyslog to receive the messages and then forward them to the journal using the omjournal output module. For example:

File: /etc/rsyslog.conf
module(load="imudp")       # input module for UDP syslog
module(load="omjournal")   # output module for journal
input(type="imudp" port="514" ruleset="writeToJournal")
ruleset(name="writeToJournal") { action(type="omjournal") }
systemd Journal's journalctl
systemd Journal is an indexed binary database
journalctl used to query database
• Without any parameters, shows full journal contents, oldest first
• Advanced filtering available
• Uses less to display information
No wrapping by default, use arrow keys to scroll horizontally
Set SYSTEMD_LESS environment variable to control less behavior
Accessing Logs with journalctl
The journalctl command is used to access log data stored by the journal. If called without any options it will display all log data, automatically piped to the less pager with the oldest log entries at the top. Remember that less allows for both vertical and horizontal paging, and that long log lines can be read by scrolling to the right or left as needed with the arrow keys. In order to wrap long lines, set the SYSTEMD_LESS environment variable to FRXMK, which omits the S that is normally used by default. For example:
# SYSTEMD_LESS=FRXMK journalctl
Normal users will only have access to their own user log data (if any). Members of the adm group or the root user have full access. The output of journalctl is similar to traditional Syslog format but is visually styled to highlight important data, and time-stamps apply time-zone offset to show the local time. Powerful filtering options are
supported to specify exactly what log data is desired (see man journalctl(1) for details). Examples of useful options include:
-n num ⇒ Show the specified number of log events
-r ⇒ Reverse output showing most recent events at the top
-o output_format ⇒ Almost a dozen output formats ranging from the most minimal cat, to verbose, which shows all meta-data fields
-f ⇒ Shows most recent log events and continues to display new entries as they are generated (similar to tail -f logfile)

Filtering Logs
One of the major advantages of the journal is its ability to quickly filter logs based on any of the indexed meta-data fields. This allows an administrator to quickly extract the relevant log data without resorting to extensive error-prone grep pipelines. The following examples show some of the filtering methods (see man journalctl(1) for more details).
Show logs, in reverse chronological order for a specific systemd unit:
# journalctl -r -u libvirtd
-- Logs begin at Fri 2013-07-26 15:31:10 MDT, end at Mon 2015-04-20 13:05:12 MDT. --
Dec 10 08:34:05 localhost dnsmasq[3589]: started, version 2.68 cachesize 150
Dec 10 08:33:59 localhost systemd[1]: Started Virtualization daemon.
. . . snip . . .
Show details of how many boots are stored in the logs and what periods of time they cover. Then display logs for only the previous boot:
# journalctl --list-boots | head -n 2
-90 afc3137b82ad4650b95c231b62fbb636 Wed 2014-06-18 12:07:03 MDT Thu 2014-06-19 18:20:43 MDT
-89 afc3137b82ad4650b95c231b62fbb636 Wed 2014-06-18 12:07:03 MDT Thu 2014-06-19 18:20:43 MDT
# journalctl -b -1
. . . output omitted . . .
Display log entries that have collected since the most recent boot:
# journalctl -b
. . . output omitted . . .
Show logs only for events with priority levels of emergency, alert, critical, or error, and were generated by the sshd command:
# journalctl -p 0..3 _COMM=sshd
-- Logs begin at Wed 2014-06-18 12:07:03 MDT, end at Wed 2014-10-15 15:13:31 MDT. --
Oct 14 16:28:08 dhcp93.hq.gurulabs.com sshd[24936]: error: Couldn't create pid file "/var/run/sshd.pid": Permission denied

Show all possible values that can be used with the _COMM match based on what is in the journal:
# journalctl -F _COMM
sudo
pkexec
. . . snip . . .
Show all possible match fields that can be used:
# journalctl <Tab><Tab>
_AUDIT_LOGINUID=    ERRNO=    _PID=    _SYSTEMD_SESSION=
. . . snip . . .
Show logs for the specified time period only for a specific systemd unit:
# journalctl --since "2014-7-20" --until "2014-7-22" -u crond
-- Logs begin at Wed 2014-06-18 12:07:03 MDT, end at Wed 2014-10-15 15:39:36 MDT. --
Jul 20 10:58:54 localhost systemd[1]: Started Command Scheduler.
Jul 20 10:58:54 localhost crond[636]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 60% if used.)

Show the last 20 log events for the graphical session of a specified user:
# journalctl -r -n 20 _UID=50002 _COMM=gnome-session
-- Logs begin at Wed 2014-06-18 12:07:03 MDT, end at Wed 2014-10-15 17:44:41 MDT. --
Oct 15 17:32:39 localhost gnome-session[1596]: ,["alt4.stun.l.google.com","19302"]
Oct 15 17:32:39 localhost gnome-session[1596]: [19883:218] 0x3d758dca5a0: C->F: ["jf",[["stun.l.google.com","19302"]
. . . snip . . .
Secure Logging with Journal's Log Sealing
Attackers typically modify log files to cover their tracks
Forward Secure Sealing (FSS) enables detection of log modifications
• Uses the Seekable Sequential Key Generators cryptographic primitive
• Most recent 15 minutes not protected by default
  Configurable interval
• Still consider remote logging
  FSS serves as a red flag only
Detecting Log Modification with Log Sealing
In the aftermath of system security compromise, logs become an essential resource in determining when and how the system security was breached. However, if an attacker obtained root level access, then any data on the system could have been modified and subsequently locally stored logs can not be trusted. The most common solution to this problem is for a copy of all log data to be forwarded to a secure remote system. However, the additional
complexity and cost associated with remote logs is not always acceptable. The systemd Journal allows the local logs to be periodically "sealed" in such a way that they can later be verified for integrity and an alert produced if modifications are detected. To initiate log sealing, first determine the sealing interval (default 15min). Then create the key pair (sealing and verification):
# journalctl --setup-keys --interval=30m
Generating seed...
Generating key pair...
Generating sealing key...
The new key pair has been generated. The secret sealing key has been written to the following local file. This key file is automatically updated when the sealing key is advanced. It should not be used on multiple hosts.

/var/log/journal/3495c675d461450ba68decb774b69083/fss

Please write down the following secret verification key. It should be stored at a safe location and should not be saved locally on disk.

d8c8a7-149a6b-86c299-b5ae20/bfb78-6b49d200
The sealing key is automatically changed every 30min. The keys have been generated for host nazgul.gurulabs.com/3495c675d461450ba68decb774b69083. To verify the logs later run the following:
# journalctl --verify --verify-key=d8c8a7-149a6b-86c299-b5ae20/bfb78-6b49d200
PASS: /var/log/journal/3495c675d461450ba68decb774b69083/[email protected]
PASS: /var/log/journal/3495c675d461450ba68decb774b69083/[email protected]
PASS: /var/log/journal/3495c675d461450ba68decb774b69083/[email protected]
1c47f18: invalid entry item (11/23 offset: 000000 37%
Invalid object contents at 1c47f18: Bad message
File corruption detected at /var/log/journal/3495c675d461450ba68decb774b69083/[email protected]:1c47f18 (of 33554432 bytes, 88%).
FAIL: /var/log/journal/3495c675d461450ba68decb774b69083/[email protected] (Bad message)
PASS: /var/log/journal/3495c675d461450ba68decb774b69083/[email protected]
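Note that FSS sealing only applies to a persistent journal. The following is a minimal sketch of the related journald settings (option names taken from journald.conf(5); the values shown are assumptions, not course defaults):

File: /etc/systemd/journald.conf
+ Storage=persistent
+ Seal=yes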
Syslog Troubleshooting Synchronous logging performance Syntax checking Directories and paths must exist
Synchronous Logging Performance

Typically, Syslog runs without problems. On very busy servers, however, synchronous logging may degrade system performance. Synchronous logging writes (syncs) each message to the file on disk as it comes in. This was the default with Sysklogd, which required a "-" prepended to the destination file name to disable it; modern Syslog daemons write asynchronously by default.

Under Rsyslog, to enable synchronous logging, add the $ActionFileEnableSync directive with a value of on above all log rules:

File: /etc/rsyslog.conf
+ $ActionFileEnableSync on
  authpriv.*      /var/log/secure
  mail.*          -/var/log/maillog

Under Syslog-NG, disable asynchronous writes using the following example configuration line:

File: /etc/syslog-ng/syslog-ng.conf
destination mailerr { file("/var/log/mail.err" fsync(yes)); };

Syntax Checking

The Syslog and Rsyslog daemons do not validate the syntax of the configuration file. Common syntax errors include using white space between the facility name and priority instead of a period.

The syslog-ng daemon performs a syntax check of its configuration when it starts. If it finds an error, it will refuse to start. Unfortunately, it is not able to catch all types of errors. Do not assume that because the syslog-ng daemon started there are no errors in the configuration file (a manual configuration check is sketched at the end of this section).

Directories Must Exist

Syslog daemons will not create new directories. If a destination file specified in the Syslog configuration does not exist, the daemon can automatically create it (if configured to do so). If a directory does not exist, the daemon will report a warning but continue, losing all messages intended for that destination.
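Because none of these daemons catches every mistake, it can help to validate a configuration by hand before restarting the service. A sketch, assuming reasonably current rsyslog and syslog-ng versions:

# rsyslogd -N1
# syslog-ng --syntax-only

The first performs a level-1 configuration check for rsyslog without starting the daemon; the second parses the syslog-ng configuration and reports errors only.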
RPM Concepts
RPM
RPM Database
• /var/lib/rpm/
Utilities
• rpm
• rpmbuild
• rpm2cpio
The RPM Package Manager was developed by Red Hat to improve package management on Linux. RPM allows administrators to easily install, remove, upgrade, and verify software packages on the system.

packages        rpm, rpm-build
binaries        rpm, rpmbuild, rpm2cpio
configs         /etc/rpm/, /usr/lib/rpm/
data directory  /var/lib/rpm/

RPM Package Files
An RPM package file consists of two components. The first is the metadata, which contains a description of the package itself, helper scripts for installing the package, and a list of file attributes. The second is a cpio archive file that contains all the files to be installed.
Package Manager Database
When RPM packages are installed on a system, information about the rpm package is recorded in the RPM database. The RPM database is typically located in the /var/lib/rpm/ directory. The database is in the Berkeley DB4 (Sleepycat) format. The RPM database contains a list of all installed rpm packages and all the files which belong to those packages. This allows for easy upgrade or uninstallation of a package at a later time. The RPM database also tracks properties of each file such as its correct size, timestamp, and a cryptographic checksum, ensuring that the file's correctness can be verified at a later date. The RPM database also contains dependency information for every package, ensuring that administrators trying to install new packages can be certain that any required libraries or files exist on the system. It also ensures administrators who remove packages from the system will not break existing packages.

RPM Utilities
Several utilities are supplied for use with RPM. The basic utility used for most RPM administrative tasks is the rpm command. The rpmbuild command is used to produce new RPM package files. The rpm2cpio command is provided for conversion of RPM package files into a standard archive file format.
RPM Troubleshooting
Dependency Resolution
• Follow the dependency chain
• Installation tools: gpk-update-viewer
  RHEL7: gpk-application, yum
  SLES12: zypper
• rpm --nodeps (be careful)
Database Corruption
• Remove index files
• rpm --rebuilddb
RPM Configuration Files
RPM has several configuration files in the /usr/lib/rpm/ directory that primarily define macros used when running RPM commands or when preparing RPM package files. RPM configuration files are also found in the /etc/rpm/ directory.

Dependency Resolution
One of the most common problems that system administrators face is dealing with the complex inter-dependencies that packages can have. Most RPMs contain a list of packages that must already be installed on the system. Some of these dependencies include libraries, programming languages or other programs. Before an RPM can be installed, all of the packages that it depends on must be installed (or be installed with the same rpm command invocation). Each of those packages in turn may have dependencies; the list goes on. In order for one RPM to be installed, an administrator might have to download and install ten other RPMs. If this is done manually then it can be a tedious and time consuming process.
There are several solutions to this problem. The first is to manually identify and install each of the required packages. For RPMs with minimal dependencies, this is a valid solution. However, for more complicated packages, a more sophisticated tool should be used that can resolve, download and install additional, depended upon packages.
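For example, the higher-level tools on the distributions covered in this course resolve and install dependencies automatically (the package names are only illustrative):

# yum install httpd
# zypper install apache2

The first applies to RHEL7, the second to SLES12; both download and install any packages the requested package depends on.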
Another possibility, which should be avoided if at all possible, is to use the rpm --nodeps option. This ignores all dependency checks when installing an RPM. However, if the package claims that it requires something, and that dependency is not resolved, then the installed package will, most likely, not function correctly.

RPM Database Corruption
It is possible for the RPM database to become corrupted if the rpm command crashes or if attempting to install a bad package. Common symptoms of database corruption are the rpm command hanging, incorrect information being reported about installed RPMs, or error messages from rpm. The database, which is located in /var/lib/rpm/, can be rebuilt by running this command:
# rpm --rebuilddb
Newer versions of RPM support verbose flags (-v or -vv) that show status while rebuilding the database.
# rpm -vv --rebuilddb . . . output omitted . . .
Rebuilding the database takes several minutes. If problems still occur, RPM database index files may need to be removed before rebuilding again:
# rm /var/lib/rpm/__db.* # rpm --rebuilddb
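After rebuilding, a quick sanity check (purely illustrative) is to confirm that queries complete cleanly again:

# rpm -qa | wc -l
# rpm -q rpm

Both commands should finish promptly and without database error messages.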
Common Unix Printing System (CUPS) Can be configured and administered via: • Command line • Web interface • system-config-printer • SLES12: yast2 printer
CUPS
The Common Unix Printing System (CUPS) is the advanced printing system that has replaced LPD and LPRng as the de facto printing and printer management software for modern Linux distributions. CUPS was developed by Easy Software Products (ESP) "to promote a standard printing solution for all UNIX vendors and users." In February 2007, Apple Inc. purchased ESP and hired the creator of CUPS. Apple has stated that CUPS will still be released under the GPL2/LGPL2 license and that the original creator will continue to develop and support CUPS.

Communicating with CUPS
Several command-line facilities are available for submitting, querying and managing printers, print jobs, remote queues and other aspects of CUPS. CUPS implements the Internet Printing Protocol (IPP), which is based on HTTP. You can connect to the IPP port (port 631) of a CUPS server with a web browser to manage every aspect of CUPS within a web environment.

package         cups
port/protocol   631/tcp,udp – Internet Printing Protocol (IPP)
                515/tcp,udp – Legacy LPR protocol
binaries        /usr/sbin/cupsd, /usr/bin/lprm, /usr/bin/lpq, /usr/bin/lpstat, /usr/bin/lpr, /usr/bin/lpoptions, /usr/bin/lppasswd, /usr/sbin/lpadmin, /usr/sbin/lpinfo, /usr/sbin/lpmove, /usr/sbin/accept, /usr/sbin/reject, /usr/sbin/cupsenable, /usr/sbin/cupsdisable
config          /etc/cups/cupsd.conf, /etc/cups/printers.conf, /etc/xinetd.d/cups-lpd, /etc/cups/*
log             /var/log/cups/access_log, /var/log/cups/error_log
data directory  /var/spool/cups/
Init            /usr/lib/systemd/system/cups.service
user/group      root/root
CUPS Configuration and Administration
CUPS provides a complete set of command line utilities for administration as well as a web interface (listening on http://localhost:631 by default). The KDE System Settings' Printer Configuration module is an easy-to-use, complete printer administration tool.
Red Hat's system-config-printer program, for printer configuration and management, is common on most Linux distributions. [S12] The following applies to SLES12 only:
SLES12 provides the yast2 printer command (module) for printer configuration and management.
CUPS Troubleshooting
Printer queue is disabled
• By default, any errors cause the queue to be stopped
  Configurable globally, or per print queue
Printer is spewing garbage
CUPS – Common Problems
CUPS is complex and may manifest complex problems. As with many services on Linux, some of the most useful troubleshooting clues can be found in log files.

Printer Queue is Disabled

Use the lpstat command to check the status of print queues and printers. lpstat has several options; the -t option causes it to show all available status information about all printers, printer classes, and currently queued jobs.

# lpstat -t
scheduler is running
system default destination: hpdj952c
device for hpdj952c: usb:/dev/usb/lp0
hpdj952c accepting requests since Jan 01 00:00
printer hpdj952c disabled since Jan 01 00:00
        Paused

If a print queue is disabled, use the appropriate CUPS command to re-enable it. For example, to enable the paused printer shown in the above output, you could run:

# /usr/sbin/cupsenable hpdj952c

Error Policy

By default, CUPS will stop the queue after any errors and wait for manual intervention using cupsenable. In /etc/cups/cupsd.conf, the top-level directive ErrorPolicy retry-job can be set so that any newly created printers will automatically re-enable and retry jobs based on the JobRetryInterval setting; a global configuration sketch follows at the end of this section. This setting can also be set on a per printer/queue basis with:

# lpadmin -p printername -o printer-error-policy=retry-job

Printer is Spewing Garbage

Using an inappropriate printer filter will usually result in output from the printer that can only be defined as garbage. Usually, you want to dequeue the print job before the printer wastes all its paper.

Use the lpq command or the lpstat command to list jobs currently in the print queue.

# lpstat -a hpdj952c
. . . output omitted . . .
# lpq -P hpdj952c
Rank   Owner   Job   File(s)      Total Size
1st    guru    220   resume.pdf   24737 bytes

To remove a job from the queue, feed the job number to the lprm command:

# lprm 220

This would remove print job number 220 from the default print queue.
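Returning to the error policy discussed above, the same behavior can be made the global default for newly created queues. A sketch using the manual's configuration-file convention (the one-minute retry interval is an arbitrary example value):

File: /etc/cups/cupsd.conf
+ ErrorPolicy retry-job
+ JobRetryInterval 60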
CUPS Troubleshooting Local queue won't print from remote CUPS clients Remote Unix machine can not print to local CUPS printer
Local Queue Not Accepting Jobs from Remote CUPS Clients
If a local queue will not accept print jobs from remote CUPS clients, it is usually an issue with permissions. Make sure the cupsd.conf file allows access to the printer from the network address or Ethernet interface the remote system is on: File: /etc/cups/cupsd.conf
Order Deny,Allow
Deny From All
Allow From 127.0.0.1
AuthType None
Allow from @IF(eth0)
Also check firewall settings (using iptables) that may block traffic on port 631.

Remote Unix Machine Can Not Print to CUPS Printer
Legacy Unix systems send print jobs to print servers using the lpd protocol, not IPP. CUPS can accept lpd jobs via the cups-lpd service, which is usually managed by Xinetd. Use the chkconfig command to enable it or edit the /etc/xinetd.d/cups-lpd file.
# chkconfig cups-lpd on
Check that the network port (515) is not blocked.
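A quick way to confirm that the LPD compatibility service is actually listening and that no local firewall rule is in the way (a sketch; output will vary by system):

# ss -tln | grep ':515'
# iptables -nL | grep -E '515|631'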
at & cron Automated and scheduled tasks • crond • crontab -e • at • atq • atrm
at & cron
The atd and crond daemons provide the ability to schedule one-time or recurring execution of tasks or commands.

packages    at, cronie; [R7] crontabs
binaries    /usr/bin/crontab, /usr/bin/at, /usr/bin/atrm, /usr/bin/atq, /usr/sbin/atd
            [R7] /usr/sbin/crond
            [S12] /usr/sbin/cron
configs     /etc/crontab, /etc/cron.d/*, /etc/cron.{hourly,daily,weekly,monthly}, /etc/at.{allow,deny}
user data   /var/spool/cron/, /var/spool/at/
logs        [R7] /var/log/cron
            [S12] /var/log/messages
Init        /usr/lib/systemd/system/atd.service
            [R7] /usr/lib/systemd/system/crond.service
            [S12] /usr/lib/systemd/system/cron.service
user/group  root/root; jobs run as submitter's UID/GID
at & cron Usage crontab • -e – edit crontab • -r – remove crontab • -l – list crontab at atq atrm
Using cron
The crontab command can be used with the -e option to edit crontab files. Output from cron tasks is emailed by the cron daemon, if command output is not redirected within the cron task specification.

System administrators can add files to the /etc/cron.d/ directory. These files are automatically handled by the crond daemon as additional system crontabs.

System administrators can add executable scripts and programs to the /etc/cron.hourly/, /etc/cron.daily/, /etc/cron.weekly/ and /etc/cron.monthly/ directories to have those files executed automatically by the cron daemon at those intervals. The schedule for execution of scripts in these directories is specified by commands in the main system crontab, /etc/crontab.

The crontab command accepts the -l option to list jobs scheduled for the current user. Use the -r option to remove all cron jobs for that user. The crontab command does not affect system crontabs.

# crontab -l
# (/tmp/crontab.26641 installed on Mon Aug  9 13:50:57 2004)
# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
0 4 * * * webalizer -c /www/utahisps.com/web.conf

See the crontab(5) man page for detailed syntax for a crontab file.

Using at

Use the at command to schedule a command or series of commands to run once at a certain time and date.

# at 0500 tomorrow
at> rm -rf /var/www/html/logs/old/*
at> <EOT>
Job 234 at 2004-11-23 05:00

Use the atq command to list all jobs currently scheduled for future execution.

# atq
234   2004-11-23 05:00 root
229   2004-11-15 07:33 root

Use atrm to remove a job from the at queue.

# atrm 234

System administrators can restrict which users may use the at command by populating the /etc/at.deny or /etc/at.allow files with usernames.
at & cron Troubleshooting
Misspecified date/time
Execute bit not set on custom scripts
Non-redirected output
Accidental deletion of crontab
Users blocked via {at,cron}.{allow,deny}
at & cron – Common Problems
Most at and cron problems are caused by failure to specify the jobs, or their schedule, properly.

Misspecified Date/Time
When using the crontab -e command to edit jobs for cron execution, crontab will attempt to validate the format of your entries, but it is possible to enter some date and time specifications that will never occur (like February 31).
Execute Bit Not Set on Custom Scripts

One common mistake is to forget to add execute permission to a file placed in one of the /etc/cron.daily/, /etc/cron.hourly/, /etc/cron.weekly/ or /etc/cron.monthly/ directories.

Non-Redirected Output

If cron jobs are run often and generate lots of output, consider redirecting the output to a file in the job specification. Otherwise, someone will receive many large email messages. Redirecting normal, non-error output to /dev/null is very common, as well as being a good, basic practice (see the sketch after this section). Be careful to not be too over-zealous about this, though.

Accidental Deletion of a crontab

Another common problem with cron jobs is accidentally typing crontab -r instead of crontab -e, thus deleting the user's crontab. Because the r and e keys are adjacent to each other on the keyboard, it is easy to press the wrong key. For this reason, many users forgo the use of the crontab -e command for editing their user crontab. Instead, they keep a file in their home directory, which they edit when a change to their crontab is desired. The contents of a file can be installed as the user's new crontab using:

$ crontab filename

Users Blocked Via /etc/{at,cron}.{allow,deny}

A common problem with the at command occurs when the /etc/at.allow file exists. If the file /etc/at.allow exists, then only those users who are listed in it will have the ability to run the at command. If the file /etc/at.deny exists and /etc/at.allow does not, users specifically listed in this file are prevented from running at. If neither file exists, then all users are denied access to at, except root, of course.

The behavior of crond with its /etc/cron.allow and /etc/cron.deny files is the same as for atd. If the file /etc/cron.allow exists, then only those users who are listed in it will be able to use the crontab command. If the file /etc/cron.deny exists and /etc/cron.allow does not, users specifically listed in this file are prevented from running crontab. If neither file exists, then all users are denied access to crontab, except root, of course.
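As an example of the output-redirection advice above, a user crontab might discard routine output and suppress mail entirely (the script paths and schedules are made up; MAILTO behavior depends on the cron implementation):

MAILTO=""
*/10 * * * *  /usr/local/bin/poll-status.sh >/dev/null 2>&1
30 2 * * *    /usr/local/bin/nightly-report.sh >>/var/log/nightly-report.log 2>&1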
Lab 6 Estimated Time: S12: 120 minutes R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 3
Page: 6-26 Time: 120 minutes Requirements: b (1 station) c (classroom server) d (graphical environment)
Lab 6

Objectives
Practice troubleshooting related to: X11, syslog, RPM, Printing, CUPS, at and cron

Requirements
b (1 station) c (classroom server) d (graphical environment)

Relevance
Practice solving problems to make it easier to diagnose and fix them in the real world.

Task 1: Troubleshooting Problems: Topic Group 3
Estimated Time: 120 minutes

1) Enter the troubleshooting environment with tsmenu.

   # tsmenu

2) Execute each of the scripts within Troubleshooting Group #3.
Content Users and Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Users and Groups Troubleshooting . . . . . . . . . . . . . . . . . . . 3 PAM Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 PAM Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Filesystem Quotas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Quotas Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 File Access Control Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 FACL Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 SELinux Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 SELinux Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 SELinux Troubleshooting Continued . . . . . . . . . . . . . . . . . 14 Lab Tasks 16 1. Troubleshooting Problems: Topic Group 4 . . . . . . . . . . 17
Chapter 7
TOPIC GROUP 4
Users and Groups User Files • /etc/passwd, /etc/shadow Group Files • /etc/group Commands • useradd, userdel, usermod • groupadd, groupdel, groupmod • passwd, gpasswd, chage • chsh, vipw, vipw -g • RHEL7 graphical tool: system-config-users • SLES12 graphical tool: yast/yast2
Users and Groups
z u s e j.h r u i F d t t iu r t e e b z o R m e : n to d @ e t r s e n b e c o i r L sz. e r fu
Linux is a truly multi-user operating system, supporting multiple users and groups. User and group administration consists of several files; primarily /etc/passwd, /etc/shadow and /etc/group.

package   [R7] shadow-utils, setup
          [S12] shadow
binaries  useradd, userdel, usermod
          groupadd, groupdel, groupmod
          passwd, gpasswd, chage
          [R7] system-config-users
          [S12] yast/yast2
configs   /etc/passwd, /etc/shadow, /etc/group, /etc/login.defs, /etc/default/useradd
log       [R7] /var/log/secure
          [S12] /var/log/messages
Important Files
/etc/passwd ⇒ Stores information such as the username, UID, primary GID, home directory, and shell.
/etc/shadow ⇒ Was created to keep the hashed password out of a world readable file. Contains each account's hashed password and
also password and account expiration settings/information.
/etc/group ⇒ Defines all groups on the system, their GID's and a list of who is a member of each group.
/etc/login.defs and /etc/default/useradd ⇒ Defines defaults for both the user and group account manipulation commands. Examples of options found in these files include: UID and GID ranges, whether (and where) home directories are automatically created, and what hash type to use for passwords.
Commands
Although these files can be edited by hand, a more convenient and safer way is to use the user and group administration tools. The intuitively named useradd, usermod and userdel commands allow user accounts to be created, modified, and deleted. The groupadd, groupdel and groupmod commands do the same for group administration. Using vipw allows for editing /etc/passwd directly, or with -g the /etc/group database. For unprivileged users, the chsh command can change the default shell. The chage command changes password and account expiration settings in the /etc/shadow file. The -l option provides a listing of the current expiration settings. Most Linux distributions come with graphical tools to manage users and groups.
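A short sketch tying these commands together (the account name and values are made up for illustration):

# useradd -m -s /bin/bash jdoe
# passwd jdoe
# chage -l jdoe
# usermod -L jdoe

The last command locks the account; usermod -U reverses it.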
Users and Groups Troubleshooting Authentication Issues • Incorrect authentication system setup (local, LDAP, NIS, etc.) • Bad password • Locked account • Expired account/password • Typo in PAM configuration Post-Authentication Issues • Invalid shell • PAM limits (resource or access time limits)
Users and Groups – Common Problems
The majority of problems relating to users and groups deal with the authentication process. Because there are so many possible problems, it is best to break them up into two categories: authentication and post-authentication issues.

Authentication Issues
There are several reasons why a user cannot authenticate to a system. The first is that the authentication system—whether it be through the local passwd system, NIS or LDAP—is not configured correctly. Most of these problems are related to PAM's configuration, which are discussed elsewhere.
The next step in resolving an authentication problem is often to verify that the user is entering the correct password. It may be necessary to reset the password with the passwd command. If the root user's password is unknown, one way to reset it is to boot into single user mode and change the password there. Another way is to access a rescue environment. Maybe the user's account is locked or the account or the password has expired. The -S option to passwd shows the account lock state:
# passwd -S emcnabb
Password locked.

The chage command reports account or password expiration:

# chage -l emcnabb
Minimum: 0
Maximum: 99999
Warning: 7
Inactive: -1
Last Change: Aug 15, 2004
Password Expires: Never
Password Inactive: Never
Account Expires: Oct 23, 2004

An account can be unlocked with passwd -u and expiration times can be changed with the chage command.

Post-Authentication Issues

After the user has authenticated, problems can still occur that prevent the user from successfully logging in. Few error messages are given when one of these problems occurs; usually, the user is logged out immediately after logging in. The most common issue is that the user's shell in /etc/passwd is incorrect. Changing the shell field to something invalid was, historically, a common method to lock accounts. There are also many PAM settings that could prevent a user from gaining system access after being authenticated. These include system resource limits and access times that are configured in files within the /etc/security/ directory.
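Returning to the locked and expiring account shown above, clearing those conditions might look like the following sketch (the dates and limits are arbitrary examples):

# passwd -u emcnabb
# chage -E -1 emcnabb
# chage -M 90 -W 7 emcnabb

This unlocks the account, removes the account expiration date, and sets a 90 day maximum password age with a 7 day warning period.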
PAM Concepts Pluggable Authentication Modules • Invented by Sun Microsystems • Allows programs to share a common authentication framework • "Pluggable" design allows new authentication methods, such as smart cards, to be supported without modifying the original program • Administrators can easily customize authentication policies of programs that support PAM • Implementations exist for Linux, Solaris, HP-UX, FreeBSD, etc.
PAM
The first PAM (Pluggable Authentication Modules) implementation was created by Sun Microsystems to provide a common framework for user authentication. Programs that use PAM for authentication can easily be customized and upgraded without modifying the program itself. By combining multiple PAM modules, fine grained system policies can be implemented with relatively minimal effort.

packages  pam, pam_krb5, pam_pkcs11
          [S12] pam-modules, pam_mount, pam_radius, pam_smb, pam_ssh, yast2-pam
configs   /etc/pam.d/, /etc/security/
log       [R7] /var/log/secure
          [S12] /var/log/messages

PAM Configuration
If the /etc/pam.d/ directory exists then the /etc/pam.conf file is ignored. Policies for each program supporting PAM are found in /etc/pam.d/ by program name. In other words, the policy for the su command would be found in /etc/pam.d/su file. Policies are created by sequencing PAM modules. The modules are found in /lib/security/. Their individual configuration files (if the given module has one) are usually found in the /etc/security/ directory. For example, the PAM limits module is /lib/security/pam_limits.so and its configuration file is /etc/security/limits.conf. Policies consist of multiple aspects or groups of related management functions: auth, password, account and session. Various control values can be applied to each function. These are required, requisite, sufficient and optional. Most PAM modules come with a README.module_name file describing their functionality and the supported options.
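As a reminder of the format, the following is a minimal sketch of what a policy file can look like (the module sequence is illustrative, not a recommended stock policy):

File: /etc/pam.d/su
auth      sufficient   pam_rootok.so
auth      required     pam_unix.so
account   required     pam_unix.so
password  required     pam_unix.so
session   required     pam_limits.so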
PAM Troubleshooting Typographical errors in config files Using a PAM module in the wrong role Creating overly restrictive or permissive policies
Typographical Errors In Config Files
The easiest mistake to make with PAM is a simple typographical error. If PAM can't understand a policy, it will generally fail and possibly deny all access, even locking out all users including the root user. The best way to spot such problems is to review the relevant log files. Most configuration errors in PAM files will cause fairly specific errors to be logged.
[R7] The following applies to RHEL7 only:
On RHEL7 systems, PAM errors are sent to the /var/log/secure file. [S12] The following applies to SLES12 only:
Unfortunately, on SLES12 PAM error messages get mixed with other system events in /var/log/messages. Finding error messages may require searching for the string "PAM" included with each message.

Using a PAM Module In the Wrong Role
Another common mistake is to try to use a PAM module in a role it doesn't support. For example, using a module that only supports account in a session statement. The best way to avoid this type of error is to carefully consult the documentation of modules when creating new PAM config entries. Creating Overly Restrictive or Permissive Policies
It is possible to create overly restrictive policies, effectively locking a user out. At the same time, care should be taken not to create too permissive a policy. For example, it is possible to carefully lock down all ttys while forgetting that users can also access the system through a pts.
Filesystem Quotas Limit use of filesystem resources • per filesystem • hard/soft • user/group • blocks/inodes Commands • quotacheck • quotaon • edquota • repquota • quota
Quotas
On systems that have users and services that may exhaust storage resources, quotas impose soft and hard limits on how many data blocks and filesystem inodes a user or group is allowed to use.
package         quota
binaries        /usr/sbin/quotacheck, /usr/sbin/edquota, /usr/sbin/setquota, /usr/sbin/xfs_quota, /usr/sbin/quotastats, /usr/sbin/repquota, /usr/sbin/warnquota, /usr/sbin/convertquota, /usr/sbin/quotaoff, /usr/sbin/quotaon, /usr/bin/quota
configs         /etc/quotatab, /etc/warnquota.conf, /etc/quotagrpadmins, /etc/fstab
data directory  Applicable filesystem base directory contains aquota.user and/or aquota.group files.
Init            /usr/lib/systemd/system/systemd-quotacheck.service
                /usr/lib/systemd/system/quotaon.service
                [R7] /usr/lib/systemd/system/nfs-rquotad.service
user/group      root/root

To enable quotas on a filesystem, the filesystem must be mounted with the usrquota and/or grpquota filesystem options. These options should be listed in the fourth column within the /etc/fstab file to make them persistent across reboots. The special data files aquota.user and aquota.group must be created using the quotacheck command:

# quotacheck -cmug /home

Quotas can then be turned on with the quotaon command:

# quotaon /home

Quotas for users and groups can be edited with the edquota command:

# edquota joe
. . . output omitted . . .

The root user can get a summary of quotas on a filesystem with the repquota command. Individual users can view their own quotas by using just the quota command.
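For reference, a sketch of an /etc/fstab entry carrying the quota mount options in its fourth column (the device, mount point, and filesystem type are illustrative):

File: /etc/fstab
/dev/vg0/home   /home   ext4   defaults,usrquota,grpquota   1 2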
Quotas Troubleshooting Missing or corrupt quota database Filesystem not mounted with quota options Quotas set too small Quota database not consistent with actual usage
Missing or Corrupt Quota Database
Initial problems with quotas are usually the result of improper setup. For example, if the aquota.user or aquota.group files are not created, quotas can not be used with the extended filesystem.

Filesystem Not Mounted with Quota Options
If filesystems are not mounted with the usrquota or grpquota filesystem options, quotas will not be enforced.

Quotas Set Too Small
Managing a system with quotas often involves modifying quotas for users or programs that demonstrate a need for a larger quota. Use the edquota or setquota command to accomplish this.

Quota Database Not Consistent With Actual Usage
Although the Linux quota system is robust, inconsistencies may occur on heavily used systems. Periodic system maintenance to verify and rebuild the quota data files should be scheduled.
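Such maintenance might look like the following sketch (the filesystem is illustrative; run it during a quiet period):

# quotaoff /home
# quotacheck -vug /home
# quotaon /home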
It is worth noting that the quotacheck command must examine every file in a quota-enabled filesystem. On large filesystems with many files, this may be a lengthy process.
File Access Control Lists FACLs • Assign different permissions to additional users and additional groups on a given file or directory Create/Modify • setfacl -m [modify a FACL] Delete • setfacl -x [delete a specified FACL] • setfacl -b [delete all FACLs on specified file(s)] View • getfacl [list all FACLs] • ls -l [indicates FACL presence]
FACLs
Every file on Unix systems has one owner and one group for which specific permissions (read, write, and execute) are assigned. If multiple users need access to the same file, they must be members of the group with permissions for that file.

This arrangement, while simple to understand and administer, is not very flexible. What if there is more than one group of users who each need to have different permissions for the same file? With the standard Unix user / group scheme, you would have to create multiple copies of the same file and try to keep them synchronized whenever one changes. Obviously, this is not a realistic practice.

What is needed to provide this flexibility is the ability to assign permissions for multiple users and/or groups to a single file. File Access Control Lists, or FACLs, are lists of additional users (and/or groups) and their respective permissions, attached to a single file.

package   acl
binaries  ls, getfacl, setfacl

Creating, Modifying & Deleting FACLs

The -m option to the setfacl command is used to create and/or modify FACLs on one or more files and/or directories. For example, the following command would grant read-write access for the charlotte user to the ship-log.txt file:

# setfacl -m u:charlotte:rw /depts/recv/ship-log.txt

FACLs can be deleted using the -x option as shown in the following example:

# setfacl -x u:charlotte /depts/payroll/payrates.txt

Viewing FACLs

The ls command indicates that FACLs are present on a file by appending a + to the flags. In the following output, the ship-log.txt file has some FACL set on it while the other.txt file does not:

# ls -l
total 1
-rw-rw----+ root root 3738 Dec 29 17:37 ship-log.txt
-rw-------  root root  251 Nov  3 9:01 other.txt

The output of getfacl provides details about any FACLs set on a file:

# getfacl /depts/recv/ship-log.txt
# file: ship-log.txt
# owner: root
# group: root
user::rw-
user:charlotte:rw-
group::r--
mask::rwx
other::---
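The slide also lists setfacl -b for removing every FACL at once; a quick sketch using the same illustrative file:

# setfacl -b /depts/recv/ship-log.txt
# getfacl /depts/recv/ship-log.txt

After the first command, getfacl should show only the base user::, group::, and other:: entries.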
FACL Troubleshooting Obstructive Ancestor Directory Permissions • Execute bit(s) not set • No Read access Missing FACL Support • Omitting acl option in /etc/fstab or filesystem's default mount options • Kernel support missing for filesystem type • Kernel support not compiled in for filesystem type • Trying to use FACLs on older kernels Empty FACL Mask(s)
FACLs – Common Problems
Almost all problems with FACLs manifest themselves because something that should be accessible with the currently set FACLs is not.

Obstructive Ancestor Directory Permissions

If the proper execute bits are not set on a directory, some or all users (other than root, of course) will not be able to cd into that directory. This also prevents those users from entering sub-directories. If the read bit corresponding to a particular user, group, or other is not set, users will not be able to read the directory's contents. However, if the execute bit is set on a sub-directory and the user knows the name of the sub-directory, they can cd directly into that sub-directory, provided the execute bit is also set on the parent directory.

Missing FACL Support

FACL support must be compiled into the kernel or the kernel module for each filesystem type. Look for the ACL option in the kernel configuration.

With ACL support compiled into the kernel or kernel module for a particular filesystem type, FACLs can still only be used when the filesystem is mounted with the acl mount option.

FACL Masks

In most places where a mask is used, the mask value indicates which items are to be blocked from being used. For example, a umask with a value of 002 indicates that the system will not affect the permissions for the owner (user) or group, will never assign write permission for everyone, and will not affect read or execute permissions for everyone.

A FACL mask works in the opposite way. If the mask is set to rwx, then read, write and execute permissions on the file will be honored. If the ACL mask is set to r-x, then write permission will not be granted to additional users and/or groups, even if set in the FACL. The mask never affects the standard Unix owner/group/other permissions. For normal day-to-day operations, there is no need for the FACL mask to be set to anything other than rwx. The most common use for some other mask setting is when there is a need to temporarily block everyone from a number of files and directories. Perhaps several user accounts are compromised and you may not know which ones were affected.
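A sketch of using the mask to temporarily block the additional FACL entries on a file (the path is illustrative):

# setfacl -m m::--- /depts/recv/ship-log.txt
# setfacl -m m::rwx /depts/recv/ship-log.txt

The first command empties the mask so additional users and groups lose access; the second restores it so the FACL entries are honored again.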
SELinux Concepts What is SELinux? SELinux Policies SELinux Configuration Tools
What is SELinux?
Security Enhanced Linux (SELinux) is a set of kernel extensions built on the Linux Security Module (LSM) framework that provides the ability to implement mandatory access control (MAC) security models for Linux systems. SELinux security policy describes the allowed actions for processes and the kernel will block or log any attempted deviations from what is permitted by the currently loaded policy.

SELinux Policies
When the NSA first released the SELinux kernel patches, they also included an example policy. The company Tresys Technology now maintains the SELinux reference policy: http://oss.tresys.com/projects/refpolicy. Specific Linux distributions generally create security policy (based on the reference policy) that is customized to integrate well with their specific set of packages and system configuration, and implement their desired security goals. The SELinux framework supports a wide variety of potential security policies including those based on the concepts of Type Enforcement, Role-based Access Control, and Multi-level Security. [R7] The following applies to RHEL7 only:
The default policy shipped with Red Hat Enterprise Linux is called the targeted policy. It provides security policy for a subset of the system's services (with new services being added with each new targeted policy). With the targeted policy, parts of the system not
covered by the policy are grouped into an "unconfined" security domain and are essentially unaffected by SELinux. This allows additional applications for which policy does not yet exist to still function normally.

SELinux Configuration Tools
SELinux provides a basic set of tools to adjust and manipulate the underlying policy. Many of these tools are part of the policycoreutils package and provide simple behavior adjustments to the current SELinux policy. Assuming the SELinux kernel extensions are loaded, SELinux can be placed in one of two modes: Enforcing or Permissive. In Enforcing mode, the system will prevent processes from violating security policy and will log details of what was denied (note that certain permitted actions are also logged). In Permissive mode, the system will only log each action in violation of policy, but will not actually block the action. The setenforce command can be used to switch between these two modes, and both the getenforce and sestatus commands will report the current mode:
# setenforce 0
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          enforcing
Policy version:                 21
Policy from config file:        targeted
SELinux extensions must be enabled at boot time (so that processes started early in the boot process can be assigned the correct security label). Once enabled, the SELinux extensions cannot be disabled, so a reboot is required to switch between these two states. The semanage and setsebool commands can modify most SELinux settings.
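For instance, service-related booleans can be inspected and toggled without editing policy sources (the boolean name is only an example):

# getsebool httpd_enable_homedirs
httpd_enable_homedirs --> off
# setsebool -P httpd_enable_homedirs on

The -P option makes the change persistent across reboots.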
[R7] The following applies to RHEL7 only:
The system-config-selinux tool provides a comprehensive graphical interface for modifying SELinux settings, mostly equivalent to the semanage and setsebool commands.
SELinux Troubleshooting Is it SELinux? Decoding SELinux denial messages
Is it SELinux?
Usually the quickest way to identify if a problem is caused by SELinux is to momentarily switch the system from Enforcing mode into Permissive mode and attempt the action again. If the action still fails then SELinux is not at fault and troubleshooting efforts can turn elsewhere. Once SELinux has been identified as the cause of the fault, the next logical step is to examine the logs for details of the denial event. If the audit daemon auditd is running then SELinux log messages will be sent to the /var/log/audit/audit.log file; otherwise they will be handled by Syslog and sent to the /var/log/messages file.

Decoding SELinux Denial Messages
Even though the logs describe exactly what interaction was denied by SELinux, a good deal of knowledge and experience is needed for an administrator to be able to read the log and determine the correct course of action. This can lead to frustration and ultimately administrators disabling SELinux. In some cases, SELinux is deliberately silent about denials due to dontaudit policy statements. To log all denials (e.g. during troubleshooting) run the following to rebuild the policy without dontaudit:
# semodule -DB
To re-enable the dontaudit rules, rebuild the policy again:
# semodule -B
To make denial messages easier to understand and identify, Red Hat ships the setroubleshoot package, which provides an easier to read description of what was denied, and suggests actions that could "solve" the problem. [R7] The following applies to RHEL7 only:
When started, the setroubleshootd daemon monitors the log denial events and places additional descriptive text regarding the event in the logs. If a user is logged into a graphical session on the system then the setroubleshoot browser applet will notify the user of the event. This applet provides a simplified description of the event as well as showing what command(s) can be run to "solve" the problem. Careful thought should be given before running the suggested commands as they may disable protective features of SELinux. Keep in mind that the goal of the setroubleshoot tools is to "make things work" not to "make things secure". The sealert command can be used from the command line to get the same detailed information.
SELinux Logs

The following example denial output was generated by attempting to have the Apache web process read content from a file with the wrong context set:
# tail /var/log/audit/audit.log
type=AVC msg=audit(1221063986.617:333): avc: denied { getattr } for pid=1074 comm="httpd" path="/var/www/html/foo" dev=dm-1 ino=499078 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file
type=SYSCALL msg=audit(1221063986.617:333): arch=c000003e syscall=4 success=no exit=-13 a0=7f6a59a55528 a1=7fff602a9ec0 a2=7fff602a9ec0 a3=0 items=0 ppid=1071 pid=1074 auid=50002 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=1 comm="httpd" exe="/usr/sbin/httpd" subj=unconfined_u:system_r:httpd_t:s0 key=(null)
. . . output omitted . . .
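When auditd is running, the ausearch utility can extract just the AVC records rather than paging through the raw file. A sketch (the time-range option comes from the audit package):

# ausearch -m avc -ts recent
. . . output omitted . . .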
[R7] The following applies to RHEL7 only:
# tail /var/log/messages
. . . snip . . .
Sep 10 10:26:26 anakin setroubleshoot: SELinux is preventing the httpd from using potentially mislabeled files (/var/www/html/foo). For complete SELinux messages. run sealert -l 2057ea76-0d25-4004-8637-d1c24538181a
Sep 10 10:26:26 anakin setroubleshoot: SELinux is preventing the httpd from using potentially mislabeled files (/var/www/html/foo). For complete SELinux messages. run sealert -l 2057ea76-0d25-4004-8637-d1c24538181a
# sealert -l 2057ea76-0d25-4004-8637-d1c24538181a
Summary:
SELinux is preventing the httpd from using potentially mislabeled files (/var/www/html/foo).

Detailed Description:
SELinux has denied httpd access to potentially mislabeled file(s) (/var/www/html/foo). This means that SELinux will not allow httpd to use these files. It is common for users to edit files in their home directory or tmp directories and then move (mv) them to system directories. The problem is that the files end up with the wrong file context which confined applications are not allowed to access.

Allowing Access:
If you want httpd to access this files, you need to relabel them using restorecon -v '/var/www/html/foo'. You might want to relabel the entire directory using restorecon -R -v '/var/www/html'.

Additional Information:
Source Context     unconfined_u:system_r:httpd_t:s0
Target Context     unconfined_u:object_r:user_tmp_t:s0
Target Objects     /var/www/html/foo [ file ]
. . . output omitted . . .
SELinux Troubleshooting Continued Incorrect File Context • determining correct context • setting contexts • shared contexts Updating File Context Database Allowing a Service on an Alternate Port
Fixing File Contexts
Files and processes each have a security context or label. The SELinux policy describes the allowed interactions between these contexts. If the context is not correctly set (wrong value or no value) on a file, then processes attempting to interact with the file can be denied access.

Most services protected by the targeted SELinux policy now have manual pages (man service_name_selinux) that describe the correct use of types for that service. Once the correct type has been identified, use the chcon command to set the file's type to the correct value:

# cd /var/www/html
# ls -Z index.html
-rw-r--r-- root root unconfined_u:object_r:user_tmp_t:s0 index.html
# chcon -t httpd_sys_content_t index.html
# ls -Z index.html
-rw-r--r-- root root unconfined_u:object_r:httpd_sys_content_t:s0 index.html

For larger sets of files (e.g. entire nested directories) with differing contexts, or if the correct context for a file is not known, the restorecon command can be used. restorecon uses contexts stored within the policy and can apply contexts based on those stored values. Running semanage fcontext -l will list the filesystem paths and corresponding file contexts used by restorecon. For example, the following would set contexts for all files under the /var/www/html directory:

# restorecon -R /var/www/html

Allowing Multiple Services to Access Files

If security dictates that only a single service should be able to interact with a file, then the SELinux policy will define a service-specific type to use for the file(s). For example, a directory being shared by the Apache server is normally assigned the httpd_sys_content_t type, whereas a directory being shared by Samba is assigned the samba_share_t type instead.

In addition to the service specific types, the targeted SELinux policy provides types that allow files to be accessed by multiple services (Apache, Samba, NFS, FTP, rsync). The public_content_t and public_content_rw_t types can be used to allow these services read only and write access respectively. When troubleshooting file access problems, remember that a service can also be denied access to a file due to service related SELinux booleans, regular file permissions, or even the service's configuration.
Updating File Context Database
When a new file is created or when a file's context is reset using the restorecon command, the context applied to the file is determined by the file context database stored within the policy. This database can be updated so that the correct file context is applied to files in non-standard, user selected, locations. Changes can be made with the semanage command and are stored in the /etc/selinux/targeted/contexts/files/file_contexts.local file. The following example shows how initially a newly created directory uses the default_t type and the effect of adding a custom entry to the file context database. Notice how both restorecon and initial file creation contexts are affected by the entry:
# mkdir /shared; cd /shared
# touch example_file
# ls -Z example_file
-rw-r--r-- root root unconfined_u:object_r:default_t:s0 example_file
# semanage fcontext -a -t public_content_rw_t "/shared(/.*)?"
# restorecon -R /shared/
# ls -Z example_file
-rw-r--r-- root root system_u:object_r:public_content_rw_t:s0 example_file
# touch example2_file
# ls -Z example2_file
-rw-r--r-- root root unconfined_u:object_r:public_content_rw_t:s0 example2_file

Allowing a Service on an Alternate Port
SELinux policy defines what type of network sockets can be opened or used by applications running within a protected security context (domain). The list of ports and protocols allowed for various types can be viewed and modified with semanage. The following example demonstrates adding a new entry that would permit the web server process running in the httpd_t security domain to bind to TCP port 8100:
# semanage port -l | grep ^http_port_t
http_port_t                    tcp      80, 443, 488, 8008, 8009, 8443
# semanage port -a -t http_port_t -p tcp 8100
# semanage port -l | grep ^http_port_t
http_port_t                    tcp      8100, 80, 443, 488, 8008, 8009, 8443
# cat /etc/selinux/targeted/modules/active/ports.local
# This file is auto-generated by libsemanage
# Do not edit directly.
portcon tcp 8100 system_u:object_r:http_port_t:s0
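If a locally added port mapping is no longer needed, the same tool removes it. A minimal sketch, assuming the TCP port 8100 entry added above (only locally added entries can be deleted this way):

# semanage port -d -t http_port_t -p tcp 8100
# semanage port -l | grep ^http_port_t
http_port_t                    tcp      80, 443, 488, 8008, 8009, 8443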
Lab 7 Estimated Time: S12: 120 minutes R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 4 Page: 7-17 Time: 120 minutes Requirements: b (1 station) c (classroom server)
Lab 7
Task 1: Troubleshooting Problems: Topic Group 4
Estimated Time: 120 minutes
Objectives
• Practice troubleshooting related to: user and group accounts, PAM, disk quotas, FACLs
Requirements
b (1 station) c (classroom server)
Relevance Practice solving problems to make it easier to diagnose and fix them in the real-world.
1) Enter the troubleshooting environment with tsmenu.

# tsmenu

2) Execute each of the scripts within Troubleshooting Group #4.
Content
Kernel Modules ............................................. 2
Kernel Modules Troubleshooting ............................. 3
Logical Volume Management .................................. 4
Creating Logical Volumes ................................... 5
LVM Deployment Issues ...................................... 6
VG Migration, PV Resizing & Troubleshooting ................ 7
Software RAID Overview ..................................... 9
RAID Troubleshooting ....................................... 10
Multipathing Overview ...................................... 11
SAN Multipathing ........................................... 12
Multipath Configuration .................................... 13
Multipathing Best Practices ................................ 15
LDAP and OpenLDAP .......................................... 17
Troubleshooting OpenLDAP ................................... 18
NIS and NIS+ (YP) .......................................... 20
NIS Troubleshooting Aids ................................... 21
Lab Tasks .................................................. 22
1. Troubleshooting Problems: Topic Group 5 ................. 23
Chapter 8
TOPIC GROUP 5
Kernel Modules
Kernel modules overview
Module commands
• lsmod
• modinfo
• modprobe/depmod
• insmod/rmmod
/etc/modprobe.d/
Kernel Modules
The Linux kernel's modular design allows for additional functionality to be enabled on the fly. Support for kernel features can be compiled directly into the kernel or as modules that can be loaded and unloaded as needed. The kernel, initial ramdisk, and boot loader files are all located under the /boot/ directory. Kernel modules are located in the /lib/modules/$(uname -r) directory.

related packages
[R7] kmod, kernel-devel, kernel-{doc,headers,tools}
[S12] kernel-{default,firmware,source}, kernel-{default,xen}-base, kernel-{syms,xen}
binaries
/boot/{vmlinuz*,initrd*}, /usr/bin/kmod
Symbolic links to kmod: /sbin/insmod, /sbin/modprobe, /sbin/depmod, /sbin/modinfo, /sbin/rmmod
[R7] /usr/bin/lsmod [S12] /usr/sbin/lsmod
configs
/boot/config*, /boot/grub2/grub.cfg
logs
/usr/bin/dmesg, /var/log/messages, [S12] /dev/tty10
data directory
/lib/modules/
docs
[R7] /usr/share/doc/kernel-doc*/Documentation/
[S12] /usr/src/linux/Documentation/

The kmod list (or lsmod) command lists all modules that are currently loaded into the kernel. Information about a module (such as supported parameters) can be found with modinfo modulename.

Additional modules can be loaded with insmod modulename or modprobe modulename. Generally, it is better to use modprobe rather than insmod. modprobe will load dependency modules automatically, determined by the /lib/modules/kernel-version/modules.dep file. The depmod command can be used at any time to create the file and is configured to run automatically each time the system boots.

The /etc/modprobe.d/ directory contains configuration files (ending in .conf) that configure aliases and other parameters:

File: /etc/modprobe.d/foo.conf
alias eth0 e100
alias eth1 c59x
options sonypi minor=250
options snd cards_limit=1
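As a quick illustration of the module commands described above (the e1000 driver is used only as an example module; any module shipped with the kernel could be substituted):

# modprobe e1000
# lsmod | grep e1000
. . . output omitted . . .
# modinfo -p e1000
. . . output omitted . . .
# modprobe -r e1000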
Kernel Modules Troubleshooting
Incorrect kernel parameters and aliases
Modules not installed or module directory misnamed
Module dependency database not updated
The Kernel ABI and 3rd party kernel modules
Parameters in /etc/modprobe.d/ files not taking effect
• Is the module in question being loaded in the initial ram disk? If so, rebuild the initial ram disk: dracut -f
Incorrect Kernel Parameters and Aliases
It is important to edit the /etc/modprobe.d/*.conf files with care. Incorrect module parameters may cause modules to crash or not load. Verify that aliases are correct. This is especially critical when mapping an interface name (e.g. eth0) with a module name (e.g. e1000), though this is often done better with Udev.

Modules Not Installed or Module Directory Misnamed
When compiling a kernel from scratch, make sure to run make modules_install to copy the modules to the appropriate sub-directory of /lib/modules/.
Make sure the loaded kernel's modules are in /lib/modules/ in a directory that is named the same as the kernel's release number. If you are unsure what the exact kernel version is for the currently running kernel, use the uname -r command which reports the release number.

Module Dependency Database Not Updated
There are many complex dependencies between modules in the kernel. In order for a module to know what other modules to load, a small "database" needs to be built. This is located in the /lib/modules/$(uname -r)/modules.* files.
Kernel RPMs come with a pre-configured module dependency database. When a customized kernel is built and installed (with modules_install) the database is also created. If additional modules
are added to the module directory, the database can be re-created by running the depmod command.
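For example, after copying an additional module into the tree for the running kernel, the dependency database can be rebuilt and its timestamp checked (a minimal sketch):

# depmod -a
# ls -l /lib/modules/$(uname -r)/modules.dep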
The Kernel ABI and 3rd Party Kernel Modules

Kernel modules are compiled specifically against an exact kernel release and will not work or load with a different kernel release. The reason is that while the kernel provides strict userspace compatibility (user applications written and compiled for Linux 1.0 in 1994 still run today) and a reasonably stable kernel API (kernel module source code interface), the kernel ABI (kernel module binary interface) is not guaranteed. This is for security, performance, and maintainability reasons documented in the stable_api_nonsense.txt file. Because of the ABI compatibility issue, using 3rd party external kernel modules can be extra work as they have to be recompiled or possibly updated for each new kernel installed on the system. It is much easier to stick with modules supplied with the kernel, and the Linux development model strongly encourages this. All kernel modules include a vermagic header field that is used as the comparison check against the running kernel. It can be viewed with the modinfo command:
$ modinfo -F vermagic ext4 3.10.0-229.1.2.el7.x86_64 SMP mod_unload modversions
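Comparing the vermagic string against the running kernel's release confirms whether a module was built for that kernel; in this sketch the two values match, so the module will load:

$ uname -r
3.10.0-229.1.2.el7.x86_64
$ modinfo -F vermagic ext4 | awk '{print $1}'
3.10.0-229.1.2.el7.x86_64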
Logical Volume Management Hierarchy of concepts • Volume Groups • Physical Volumes • Physical Extents • Logical Extents • Logical Volumes • Filesystems
LVM
To ease the pain of filesystem management, Linux has support for logical volumes. The Linux LVM introduces an abstraction layer between the physical disk and the filesystem, which provides benefits such as allowing filesystems to be easily resized, allowing filesystems to use discontinuous disk space, and even to span physical disks. In the 2.6 kernel, LVM2 was introduced and operates on top of a low level volume manager called device mapper. LVM on Linux is implemented through a few conceptual layers. The physical hard drives are called physical volumes. These physical volumes are divided into arbitrary fixed-size physical extents. Logical volumes are constructed from arbitrary pools of physical extents. Filesystems are then built on top of logical volumes. Volume groups are logical management concepts—they're the pool of all physical volumes which supply physical extents for logical volumes.
One of the nicest features of LVM is that it truly abstracts the filesystem from the physical device, making it extremely easy to manipulate filesystems created on top of logical volumes in all sorts of ways that are not possible with traditional partitions. Filesystems that support hot-resize can be grown or shrunk on the fly, new disks can be added on the fly, logical volumes can be moved from one set of underlying physical drives to another, and so forth. In most cases, these operations can even be done while the filesystem is mounted and in use.
packages
lvm
binaries
pvchange, pvcreate, pvdisplay, pvscan, vgcfgbackup, vgchange, vgck, vgcreate, vgdisplay, vgextend, vgmerge, vgreduce, vgremove; vgrename, vgscan; lvcreate, lvremove, lvdisplay, lvscan, lvextend, lvchange, lvreduce, lvrename, lvresize
config
/etc/lvm/lvm.conf
log
/var/log/messages
Creating Logical Volumes Create partitions on drives • partition type 0x8E • fdisk Create Physical Volumes • pvcreate Create Volume Group • vgcreate Create Logical Volumes • lvcreate Create filesystems • mkfs
Creating the LVM Partitions

Creating new logical volumes from free space is fairly straight-forward. First, physical partitions which will be contributing to the logical volume(s) are created and set to type 0x8E. Reboot, if necessary (better yet, use the partprobe command). If using LVM1 run vgscan to create and populate the necessary files and directories under the /etc/ directory. This process is performed automatically with LVM2.

Physical Volumes

Physical volumes can be a disk partition, a whole disk, a meta-device (RAID device) or a loopback file; any block device may be used. Physical volumes are created with the pvcreate command. The normal usage is:

# pvcreate /dev/disk ...
pvcreate -- physical volume "/dev/disk" created

Volume Groups

A volume group contains physical volumes which provide physical extents. Physical extents can vary in size and are defined at volume group creation time (the default is 4MB). Physical extents are allocated to logical volumes. Commonly, the PE size is increased to support better performance. Specifying a physical extent size is done with the -s option. To create a VG with physical extents 32MB in size:

# vgcreate -s 32M vgname /dev/disk ...

Logical Volumes

Multiple logical volumes can be created in a volume group using the lvcreate command. Each logical volume has a unique name that can be specified or automatically generated. The two most commonly used options are -L to specify the size, and -n to specify the name. For example:

# lvcreate -L 650M -n lvname vgname
lvcreate -- rounding size up to PE boundary
lvcreate -- doing automatic backup of "vgname"
lvcreate -- LV "/dev/vgname/lvname" created

Filesystem Creation

Filesystems are created on top of Logical Volumes; this is done with the mkfs command. For example:

# mkfs -t ext3 /dev/vg_name/lv_name
. . . output omitted . . .

To make the new logical volume and filesystem available at system boot time, add the appropriate entries to the /etc/fstab file.
LVM Deployment Issues LVM provides management not redundancy Deployment expectations • switching to LVM when running out of space • plan to deploy LVM initially • allow for future expansion needs Careful data migration methodology • verify current backups exist first!
Storage Management vs. Resiliency
Many people have made the mistake of thinking that LVM takes the place of RAID, or that the two are mutually exclusive. On the contrary, RAID & LVM are complementary technologies which work quite well together. The purpose of RAID is to provide redundancy with the goal of resilience in the face of storage hardware failures. However, RAID does not ease the task of managing storage; in fact, it often makes storage management more complicated.

The Linux LVM implementation is capable of providing some redundancy in the form of Physical Extent based mirroring. However, this technique cuts the total manageable storage capacity of the Volume Group in half. The best strategy is to use both tools together, letting the strengths of each complement the other, with the reliability provided by RAID as the foundation for LVM.

LVM Deployment Planning
Unfortunately, most people discover LVM only when they reach the point where their current partition scheme proves to be woefully inadequate. They may have large amounts of disk space free in one partition with other critical and heavily used partitions nearly full. Without LVM, reconfiguring the partition layout can prove very difficult. With LVM, if all of the existing storage space is already allocated to Logical Volumes, then additional hard drives can be added to the pool. This provides additional storage to allocate as the administrator sees fit. It is a generally accepted best practice to always try to keep 5% – 15% of the available PEs unallocated. This provides a buffer for future needs, without forcing the administrator to add another hard drive. This free space can also be used temporarily, for things like snapshot volumes.

Migration Methodology

The best way to convert storage from simple partitioning to managed Logical Volumes is to add new storage space. The new drive(s) become the first PVs in the new Volume Group. Create some LVs to contain as much of the data found in the old partitions as possible. Size these LVs to be just barely big enough for the data they will contain. After copying the data over to each new Logical Volume, make sure to verify that the copy is complete before deleting the old partition. If several old partitions can be deleted at once, then the space they consumed could be used to create one LVM partition. Add these new PVs to the Volume Group. Once all of the old partitions are gone, begin extending the Logical Volumes to more comfortable sizes, since they were left with little to no free space during the migration.
VG Migration, PV Resizing & Troubleshooting
Migrating Volume Groups
• vgexport/vgimport
• vgscan
Growing PVs and disk upgrades
LVM Troubleshooting
• Missing PVs: pvscan
• Prematurely deleted logical volumes: vgmknodes
• Metadata damage: vgcfgbackup/vgcfgrestore
Migrating Volume Groups
Consider the scenario where an LVM volume group needs to be moved from one system to another. To do this, move the underlying block devices, containing all of the LVM volume group's PVs, to the new system. This scenario illustrates why having two LVM volume groups per system provides greater flexibility — one for the OS and one for application data. This provides the flexibility to move only the VG with application data or both. Moving a VG is easier if the underlying hard disks contain PVs for only one VG.
Before the PVs are moved however, the Volume Group(s) needs to be deactivated and exported using vgchange and vgexport. Be sure to update /etc/fstab if needed to remove references as well. For example:
[old system]# umount filesystems as needed
[old system]# vi /etc/fstab
[old system]# vgchange -a n SQLdataVG
[old system]# vgexport SQLdataVG
Now the storage device(s) containing the PV(s) are moved to the new system. In a SAN environment moving the PVs from one system to another is very easily done via the SAN management interface. With direct attached hard drives, those drives would have to be physically moved. Once the block devices have been moved to the new system, scan, import, activate the Volume Group(s), and edit /etc/fstab if needed.
[new system]# vgscan
[new system]# vgimport SQLdataVG
[new system]# vgchange -a y SQLdataVG
[new system]# vi /etc/fstab
Growing a PV
Consider the scenario where a PV has grown in size. This might be because the underlying block device originates from a SAN LUN or hardware RAID volume that was increased in size. In that case, use pvresize to make LVM aware of the new PV size.
# pvresize -v /dev/sdb
    Using physical volume(s) on command line
    Archiving volume group "dataVG" metadata (seqno 14).
    Resizing physical volume /dev/sdb from 6822 to 7001 extents.
    Resizing volume "/dev/sdb" to 917765472 sectors.
    Updating physical volume "/dev/sdb"
    Creating volume group backup "/etc/lvm/backup/dataVG"
    Physical volume "/dev/sdb" changed
  1 physical volume(s) resized

If the underlying disk contains partitions (e.g. a /boot/ partition and a second partition that is the PV), then the PV partition size must be grown before running pvresize. This is usually done by deleting the partition table entry and then re-adding a partition table entry that has the same starting point as the original partition, but a new ending point corresponding with the end of the newly enlarged block device. This requires the PV partition to be the last partition on the drive.
Upgrading a Hard Drive on a Single Drive System

Consider the scenario where a Linux system has a single 320GB hard drive using LVM and it must be upgraded to a 512GB hard drive. Assume that the original hard drive is partitioned in the typical fashion with a /dev/sda1 for /boot/ and a /dev/sda2 PV that fills the rest of the drive.

Power off the system, and add the new drive so that both the old and new drives are attached to the system simultaneously as /dev/sda (new drive) and /dev/sdb (old drive). This may require the use of an external drive enclosure. Now boot the system to the rescue environment or a Live CD. Make sure that the filesystems from the old drive are not mounted. From within the rescue environment, deactivate the VG, and then dd the old drive to the new drive.

# vgchange -a n
# dd if=/dev/sdb of=/dev/sda bs=4096

Now use fdisk to modify the partition table on the new drive so that the second partition (the PV) goes to the end of the drive while maintaining the original starting point. This is usually done by deleting the partition table entry and then re-adding a partition table entry that has the same starting point as the original partition, but an end point corresponding with the end of the new disk. Use fdisk with the -u option for exact sector sizing. Once this is done, power off the system and remove the old drive from the system. Power on the system. It should boot off of the new drive. Once the system comes up, use pvresize to grow the PV to fill the new, larger partition.

# pvresize -v /dev/sda2

LVM — Common Problems

Most problems with LVM are self-inflicted. This is because LVM is quite robust and its on-disk metadata structures are where LVM is managed. In LVM2, except for backups, the /etc/lvm/ files are not really needed. If the device files in /dev/ are not showing up, run vgscan with the --mknodes option.

Recovering a Damaged Volume Group

The best and easiest way to recover a damaged Volume Group is by using vgcfgrestore to restore the VG's metadata (a.k.a. the Volume Group Descriptor Area or VGDA) from the backup created with the vgcfgbackup command:

# vgcfgrestore -f /etc/lvm/backup/file

The backup file used by vgcfgrestore must be created from the Volume Group while it is healthy with vgcfgbackup. The backups created by vgcfgbackup are placed in the /etc/lvm/backup/ directory. The text backup file will be named vgname.conf. Old backup files will be renamed to vgname.conf.1.old and so forth.

Missing PVs

To replace a missing PV, start with the vgdisplay command with the --partial and the --verbose options. Substitute a new PV of the same size as a lost PV with the pvcreate command with the --restorefile and --uuid options. Repeat for each missing PV, then use vgcfgrestore to restore the Volume Group's metadata.

Recovering Deleted Logical Volumes

The only reliable way to recover a deleted Logical Volume is to use the vgcfgrestore command to restore from a backup that had the now missing LV. If the Logical Volume is still part of the Volume Group, but the device node is missing, use the vgmknodes command to recreate it correctly, using:

# vgmknodes vgname
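A minimal sketch of the missing PV replacement described above (the device name is illustrative; the UUID of the lost PV is taken from the backup file):

# pvcreate --uuid "UUID-of-lost-PV" --restorefile /etc/lvm/backup/vgname /dev/sdc
# vgcfgrestore vgname
# vgchange -a y vgname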
Software RAID Overview Partitions use type 0xFD
mdadm • /etc/mdadm.conf (not required) • mdadm -C|--create • mdadm -A|--assemble • mdadm --manage [-R|--run|-S|--stop] • mdadm --manage [-a|--add|-r|--remove]
RAID
RAID is a method of combining many physical disks to create one virtual disk. These virtual disks can be faster, larger and more reliable than physical disks. RAID partitions must be type 0xFD.

packages
mdadm
binaries
/usr/sbin/mdadm
configs
/etc/mdadm.conf
log
/var/log/messages
kernel
/proc/mdstat
mdadm
mdadm is a command used to create, manage, and monitor software RAID devices. Unlike the older raidtools package, which depended on the /etc/raidtab configuration file, mdadm, by default, does not
use a configuration file, and can perform most of its functions without one. mdadm can use the configuration file /etc/mdadm.conf. In fact, it is needed to start software RAID devices containing filesystems other than /. If software RAID is used for the / filesystem then that RAID device is activated from the initial ramdisk during boot.
Software RAID Superblocks
When Linux software RAID was first created all the meta-information about RAID devices was stored in the /etc/raidtab. In order to activate a RAID device the file was required. This fragile situation was rectified as Linux Software RAID evolved through the introduction of RAID superblocks.
Software RAID superblocks (metadata) are created on each block component-device that is part of the RAID device. This metadata provides a unique id for each block device, the type of RAID, status, and other details. The superblocks enable the activation of a RAID device without any external config file. Use mdadm to scan all partitions on the system for RAID superblocks, and display a brief summary of all RAID devices found, with either of the following equivalent commands:
# mdadm -Ebs -c partitions
# mdadm --examine --brief --scan --config=partitions . . . output omitted . . .
The following demonstrates creating a RAID 5 array with a spare using existing Linux Software RAID partitions:

# mdadm -C /dev/md0 -a yes -l 5 -n 3 -x 1 /dev/sd{a,b,c,d}1
RAID Troubleshooting
Improper use of RAID
Activating a RAID device with missing configuration file
Prepare for the inevitable
RAID – Common Problems
Because of the complexity of RAID, there is the possibility for many problems. Improper Use of RAID
RAID supports multiple types of configuration:
RAID 0 ⇒ (striping) provides fast data access by striping data across multiple physical devices.
RAID 1 ⇒ (mirroring) provides data integrity by putting exact copies of the same data on multiple physical devices.
RAID 4 ⇒ (striping with dedicated parity drive) provides data integrity while requiring less physical disk space than RAID 1, but more than RAID 5.
RAID 5 ⇒ (striping with parity) tries to find a middle ground, providing data integrity while requiring less physical disk space than RAID 1 to achieve the same logical disk size.
RAID 6 ⇒ (double striping with parity) provides extra reliability at the expense of disk space. Allows two drives to fail compared to only one with RAID 5.

Compound RAID levels can be achieved by combining RAID types. For example RAID 1+0—sometimes referred to as RAID 10 (ten)—can be used to achieve both data integrity and speed by mirroring all data then striping across the mirror. Each configuration has advantages and disadvantages.
Missing /etc/mdadm.conf
RAID devices created without persistent superblocks require a RAID configuration file to start. However, modern software RAID devices using superblocks can be activated without any configuration file using mdadm. For example to start /dev/md0 and /dev/md1 use:
# mdadm -A -c partitions -m 0 /dev/md0
. . . output omitted . . .
# mdadm -A -c partitions -m 1 /dev/md1
. . . output omitted . . .
Prepare for the Inevitable
Although RAID can increase reliability, hardware failure is unavoidable. Eventually, it will be necessary to add or remove physical devices from a RAID system. Taking the time to practice beforehand on non-critical systems will result in greater confidence and speed responding to critical failures. Such advice may seem unnecessary, but experience has shown otherwise.
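A typical practice drill on a non-critical array, using the mdadm --manage options shown earlier (device names are illustrative):

# mdadm --manage /dev/md0 -f /dev/sdd1    # mark a member as failed
# cat /proc/mdstat                        # watch the spare take over and rebuild
# mdadm --manage /dev/md0 -r /dev/sdd1    # remove the failed member
# mdadm --manage /dev/md0 -a /dev/sde1    # add a replacement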
Multipathing Overview Multipath Components • dm-multipath.ko • multipathd • multipath • /etc/multipath.conf
Multipathing
Linux supports device-mapper multipathing. This is where remote storage block devices that are reachable through multiple paths (portals) are managed as a single device.

related packages
[R7] device-mapper-multipath, device-mapper-multipath-libs
[S12] multipath-tools
binaries
multipath, multipathd, mpathpersist, kpartx, [R7] mpathconf
configs
/etc/multipath.conf, /etc/multipath/
data directory
/lib/modules/$(uname -r)/kernel/drivers/md/dm-multipath.ko
[R7] /usr/share/doc/device-mapper-multipath-*/multipath.conf
[S12] /usr/share/doc/packages/multipath.conf.annotated
SAN Multipathing
Aggregating multiple paths to storage is referred to as multipathing
Provides redundancy and increased throughput
device-mapper provides vendor-neutral multipathing configuration and consists of:
• dm-multipath (kernel module)
• multipath (command)
• multipathd (daemon)
• kpartx (command)
Multipathing creates a /dev/mapper/name device file
• Name is the WWID, auto-created friendly name, or user defined friendly name
Once configured for a LUN, only use its multipath device
Multipathing Overview
The connection between a server and the underlying storage is referred to as a path. Normally, a path consists of the connection from the server to the storage controller via the host bus adapter (HBA). If the path is severed, I/O failure occurs. A path may be severed if an HBA, network cable, networking hardware, or even an entire SAN fails. Multipathing exists in order to protect against such failure. Multipathing allows for the system to use multiple physical paths to a LUN simultaneously in order to provide redundancy, increased throughput, or a combination of the two. Device-mapper multipathing supports any type of block device, though the most commonly used are Fibre Channel Protocol (FCP), Fibre Channel over Ethernet (FCoE), ATA over Ethernet (AoE), and Internet Small Computer System Interface (iSCSI). Device-mapper Multipathing
Many different vendor-specific multipath implementations exist resulting in difficult configuration. Device-mapper multipathing exists in order to provide a consistent method of configuring multipathing under Linux. Regardless of the vendor hardware in use, device-mapper creates a block device under /dev/mapper/ for each LUN attached to the system.
Device-mapper Multipath Components
Device-mapper multipathing consists of several notable components: dm-multipath ⇒ kernel module responsible for making routing decisions under normal operation and during path failure.
multipath ⇒ command used for initial configuration, listing, and viewing multipathed devices.
multipathd ⇒ daemon that monitors paths, marks failed paths, reactivates restored paths, adds and removes device files as needed, and can be used to monitor and manage individual paths. kpartx ⇒ command used to create device-mapper entries for partitions on a multipathed LUN. It is invoked automatically when the multipath command is used.
Multipath Configuration /etc/multipath.conf Configuration file sections • defaults • blacklist • blacklist_exceptions • devices • multipaths Define a blacklist if *not* using the find_multipaths setting Sample config: multipath.conf.annotated mpathconf tool generates multipath.conf
/etc/multipath.conf
Both the multipath command and multipathd are configured using the /etc/multipath.conf file. The configuration file is only used during configuration of device-mapper multipathing. If updated, the multipath command must be run in order to reconfigure the multipathed devices. The file consists of five sections:
defaults ⇒ System-level default configuration
blacklist ⇒ Black listed devices - a list of devices which should not be controlled by device-mapper multipathing
blacklist_exceptions ⇒ Exceptions to the blacklist - individual devices which should be managed by device-mapper multipathing even if they exist in the blacklist
devices ⇒ settings to be applied to individual storage controller devices
multipaths ⇒ fine-tune configuration of individual LUNs

Detailed explanations of possible configuration options and values may be found in the multipath.conf.annotated file under the /usr/share/doc/ directory tree. Options not configured use default values.

Multipath Blacklisting
Care should be taken to include a complete blacklist in /etc/multipath.conf of all the block devices which should not be controlled by the device-mapper multipathing if the find_multipaths setting is not used. A good starting point includes the following
configuration:
File: /etc/multipath.conf
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^sd[a-z][0-9]*"
}
Manual Configuration
In earlier versions of multipath, or in newer versions when find_multipaths is not used, multipath attempts to create multipath devices for every non-blacklisted device. This requires a proper blacklist definition. The procedure for first time setup is as follows:
# mpathconf --enable                       # creates /etc/multipath.conf
# vi /etc/multipath.conf                   # modify blacklist and configure
# mpathconf --enable --with_multipathd y
# multipath -F                             # flush unused multipath device maps
# multipath -v2                            # scan non-blacklisted devices
create: mpatha (3600140574eff7bc56a747faaa52d508a) LIO-ORG
size=25G features='0' hwhandler='0' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| `- 6:0:0:0 sdb 8:16 undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 7:0:0:0 sdc 8:32 undef ready running

The --with_multipathd y parameter to mpathconf persistently enables and starts the daemon. After any changes to the
multipath.conf file, run systemctl reload multipathd.

Automatic Configuration

A new feature of multipath is to have it do the right thing, by default, for the typical scenarios without requiring any manual editing of the configuration file. This is done using the find_multipaths setting. With that setting, a blacklist is *not* needed. Instead, whenever there are two or more devices with paths to the same LUN, it creates a multipath device. It will also create a multipath device for a LUN if there was a previously configured multipath device for that LUN. This logic makes configuration simpler. The procedure for first time setup is as follows:

# mpathconf --enable --find_multipaths y --with_multipathd y
Starting multipathd daemon:                                [  OK  ]
# multipath -F                             # flush unused multipath device maps
# multipath -v2                            # scans for LUNs with multiple paths
create: mpatha (3600140574eff7bc56a747faaa52d508a) LIO-ORG
size=25G features='0' hwhandler='0' wp=undef
|-+- policy='round-robin 0' prio=1 status=undef
| `- 6:0:0:0 sdb 8:16 undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 7:0:0:0 sdc 8:32 undef ready running

The multipath device name to LUN mappings are stored in the /etc/multipath/bindings file.

Verifying Configuration

Once multipathing has been configured, the multipath command can be used to display information about multipathed devices. For example:

# multipath -ll
mpatha (3600140574eff7bc56a747faaa52d508a) dm-5 LIO-ORG,IBLOCK
size=25G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 6:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:0:0 sdc 8:32 active ready running

If satisfied with the completed configuration, the multipathd should be started and turned on persistently. For example:

# systemctl start multipathd
# systemctl enable multipathd

The multipath command can also be used to display a list of the currently-blacklisted devices. For example:

# multipath -v3 -ll | grep blacklist
. . . output omitted . . .

Controlling Device File Properties

The owner, UID, GID, and name of the multipath device file can be controlled using a UDEV rule. First bring up the multipath using user_friendly_name and determine what the DM_UUID UDEV ENV is set to for the multipath device:

# udevadm info /dev/mapper/mpatha | grep DM_UUID
E: DM_UUID=mpath-36001405956a7ebcb7ccd8d

Then create an UDEV rule using that DM_UUID value, for example:

File: /etc/udev/rules.d/99-mpath.rules
+ ENV{DM_UUID}=="mpath-36001405956a7ebcb7ccd8d", OWNER:="oracle", GROUP:="dba", MODE:="660", SYMLINK+="ORAlun01"

This can be useful in situations where an unprivileged application, such as the Oracle Database, needs read/write access to the block device.

LVM on top of Multipath Devices

When using multipath devices as LVM physical volumes, you should configure LVM to ignore the individual paths, otherwise you will get spurious error messages. This is done by adjusting the global_filter in the lvm.conf file. For the example scenario where LVM should only examine /dev/sda (the local disk) and device mapper created devices use:

File: /etc/lvm/lvm.conf
+ global_filter = [ "a|/dev/sda.*|", "a|/dev/disk/by-id/dm.*|", "r|.*|" ]
Multipathing Best Practices
What naming method should be used?
• user_friendly_names — easier to type
• WWIDs — no ambiguity
Multipathing boot considerations
• Use _netdev fstab option for iSCSI or FCoE LUNs
Using LVM on top of multipath device?
• Filter out underlying devices in /etc/lvm/lvm.conf
Consistent multipath device names across multiple machines
• Use same Udev rules on all machines, or if possible
• RHEL7: Use same /etc/multipath/bindings file
Underlying paths should fail quickly
Use queue_if_no_path for critical LUNs
Multipathing Best Practices
In order to ease troubleshooting, device-mapper may be configured to create human-readable device files under /dev/mapper/ instead of files named after the WWID.
Device-mapper may be told to create device files with names such as /dev/mapper/mpatha, enabled with the user_friendly_names option in /etc/multipath.conf (enabled by default when using mpathconf). For example: File: /etc/multipath.conf
defaults {
+    user_friendly_names yes
}
Instead, if more control over the name is desired for a particular LUN, use a Udev rule.

Boot Considerations
With some types of mounts, such as SMB or NFS, the first column of /etc/fstab makes it obvious that the mount point is network-based so no special mount options are required. Since multipathed devices are often network-based, yet appear as local device files in /dev/mapper/, a special mount option _netdev must be used to inform system start up to perform the mount operation for the device.
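For example, an /etc/fstab entry for a filesystem on a multipathed LUN might look like the following (the device name and mount point are illustrative):

File: /etc/fstab
+ /dev/mapper/mpatha  /srv/data  xfs  _netdev  0 0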
When configuring /etc/fstab to persistently mount multipathed LUNs, do not attempt to mount using filesystem labels as the system will attempt to read labels from the underlying block devices rather than the multipathed device. Instead, use either the LVM logical volume name or the multipath WWID, Udev alias, or user_friendly_names.

Partitioning Multipathed LUNs
While it's possible to partition multipathed LUNs, care must be taken to make certain they are made available to the host OS. If partitioned LUNs are desired, the partitions should be created prior to setting up multipathing, then kpartx should be run to detect the partitions and create device-mapper entries for the partitions. Rather than partitioning the LUN directly, it's usually better to use the LUN as a physical volume for LVM and partition out the available space using LVM if multiple partitions are required.
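If partitions on a multipathed LUN are required, kpartx creates the corresponding device-mapper entries. A minimal sketch using the mpatha device from the earlier examples (depending on the kpartx version the partition may appear as mpatha1 or mpathap1):

# kpartx -a /dev/mapper/mpatha
# ls /dev/mapper/mpatha*
/dev/mapper/mpatha  /dev/mapper/mpatha1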
Consistent Device Names

When using the same LUN multipath device across multiple systems, as is commonly done in a cluster, it can ease administration work if the multipath device names are the same across all the systems. If all the systems have identical multipath configurations, having identical Udev rules files accomplishes this goal. [R7] The following applies to RHEL7 only:
Having an identical /etc/multipath/bindings on all the systems is another way to accomplish this goal.
Failure Timeouts

When using multipath, it is desirable to have the underlying paths detect failures quickly, and once detected, fail quickly. This allows the multipath layer to move pending I/O requests to another path and resume normal operations quickly. How this is done is specific to the particular type of connection used, for example, with FC HBA cards this is usually done with kernel module parameters.
If a multipath device contains a critical filesystem (e.g. /), then use the queue_if_no_path option so that I/O will be queued at the multipath layer until a path returns. This is like a hard NFS mount.

Path selection
In the commented out defaults section example of /etc/multipath.conf, the path_selector is set to "round-robin 0". The round-robin selector uses each path in a path group equally. Recent dm-multipath implementations also provide the queue-length and service-time path selectors, and use service-time as the default. These new algorithms allow for path selection based on the number of outstanding I/O requests, or observed service time.

LUN resizing
When a LUN has been resized, be sure to rescan for each underlying device (determined with multipath -ll).
# echo 1 > /sys/block/sdX/device/rescan
Once the underlying block devices have been rescanned, *then* resize the multipath device:
# multipathd -k "resize map multipath_device"
Only do this if all paths are active and there are no queued commands.

Resetting Multipath Configuration
To reset or remove the multipath configuration from the machine, first stop using any multipath devices, and then run the following commands:
# systemctl stop multipathd
# systemctl disable multipathd
# rm /etc/multipath.conf /etc/multipath/*
# multipath -F
LDAP and OpenLDAP Lightweight Directory Access Protocol • Network protocol and hierarchical data model • Schema defines permitted data layout • LDIF represents entries and changes as text file OpenLDAP Server • slapd • Support multiple database backends • Provides replication through syncrepl OpenLDAP Tools • Online tools: ldapsearch, ldapadd, etc. • Offline tools: slapadd, slapindex, etc.
Lightweight Directory Access Protocol
related packages
[R7] openldap, openldap-clients, openldap-servers
[S12] openldap2, openldap2-client, yast2-ldap
LDAP stores data in a hierarchy, or tree. Each entry has a unique path, or distinguished name (DN). Entries can have various attributes based on their type or objectClass. Schema defines valid entries and attributes. Entries can be exported, imported, and modified using the LDIF text format. Data is searched and modified using both online, and offline, command tools, and through client programs that use LDAP.
Configuration is divided into three hierarchical sections, each successive section overriding the previous: Global, Backend, and Database. OpenLDAP supports multiple database backends, indexing, and access control mechanisms (specific to general). Configuration can be stored in a text file, referred to as slapd.conf. Alternatively, configuration can be stored in an LDAP directory, referred to as cn=config or slapd.d.
port/protocol
tcp/389 [ldap], tcp/636 [ldaps]
Online binaries
/usr/bin/ldapadd, /usr/bin/ldapdelete, /usr/bin/ldapmodify, /usr/bin/ldapmodrdn, /usr/bin/ldappasswd, /usr/bin/ldapsearch, [R7] /usr/sbin/slapd [S12] /usr/lib/openldap/slapd
Offline binaries
/usr/sbin/slapadd, /usr/sbin/slapcat, /usr/sbin/slapindex, /usr/sbin/slaptest
configs
/etc/openldap/ldap.conf, /etc/openldap/slapd.conf, /etc/openldap/schema/*, /etc/openldap/slapd.d/
log
syslog local4 facility
data directory
/var/lib/ldap/
user/group
ldap/ldap
Troubleshooting OpenLDAP
View log messages: journalctl -u slapd
• RHEL7 - Or, must configure Rsyslog to write local4 messages
• SLES12 - local0-7 facility logging done by default to /var/log/localmessages
Configure OpenLDAP to log to syslog
• Add loglevel to /etc/openldap/slapd.conf
• (re)start slapd
Configuration files are leading whitespace sensitive
• Used to continue a previous line (much like LDIF files)
Sensitive comment style
• A hash (#) only begins a comment if it is the first non-whitespace character on a line
Permission/Ownership of Database Files
When using the slapadd command as root to bulk import data, be sure to fix the permissions on the database files afterwards:
# chown -R ldap.ldap /var/lib/ldap
If SSL is required for access, it is common for client configurations to neglect enabling SSL in the ldap.conf file, for the server certificate to be improperly signed (or generated), or for the key to have incorrect permissions (only root should own and have access to the key). This will result in access problems to the LDAP data.

slapd.conf Comment and Whitespace Issues
Unsurprisingly, the slapd.conf file treats whitespace like an LDIF file. Namely, whitespace at the start of a line indicates a continuation of the previous line. Be sure that during editing of the configuration file you don't accidentally introduce leading whitespace. For example, the following would be a syntax error:

File: /etc/openldap/slapd.conf
rootdn  "cn=Manager,dc=gurulabs,dc=com"
 rootpw  AtVamNatno
The # character only starts a comment if it is the first non-whitespace character on a line. Attempts to comment out only part of a line, such as the following example, will result in a syntax error:
File: /etc/openldap/slapd.conf
→ sasl-secprops noanonymous,noplain #,noactive
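Syntax problems such as these can be caught before restarting the daemon by running the offline slaptest check (shown here for the slapd.conf style of configuration):

# slaptest -u -f /etc/openldap/slapd.conf
config file testing succeeded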
Syslog Considerations
When logging is turned on, the slapd daemon does its logging via syslog using the local4 facility by default. [R7] The following applies to RHEL7 only:
On RHEL7, this facility is not recorded anywhere by default. Modify the /etc/rsyslog.conf file by adding a new line to write messages using the local4 facility, and asynchronous writes, to the slapd log file: File: /etc/rsyslog.conf + local4.*
-/var/log/slapd
Since this new log file /var/log/slapd does not exist, it must be created and rsyslogd restarted:
# touch /var/log/slapd
# /etc/init.d/rsyslog restart

[S12] The following applies to SLES12 only:
On SLES12, all local0-7 facilities are logged to the file /var/log/localmessages by default in an asynchronous fashion. This is where the OpenLDAP log messages will appear.
OpenLDAP Logging

Troubleshooting any service is difficult at best without any feedback from the daemon(s) or client(s). The supplied OpenLDAP configuration does not enable logging. Therefore, the first step when trying to troubleshoot LDAP services is to enable logging.
Edit the /etc/openldap/slapd.conf file to instruct slapd to write messages to syslog and to specify what categories of messages to write. This is done with the loglevel directive.
The value given to the loglevel directive ranges from 1 to 4095 and is selected by adding the values in the table together for each item for which logging is desired.

loglevel  description
1         trace function calls
2         debug packet handling
4         heavy trace debugging
8         connection management
16        print out packets sent and received
32        search filter processing
64        configuration file processing
128       access control list processing
256       stats log connections/operations/results
512       stats log entries sent
1024      print communication with shell backends
2048      entry parsing
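For example, to log connection/operation statistics together with configuration file processing (256 + 64 = 320), a line such as the following could be added to slapd.conf (the value is chosen only for illustration):

File: /etc/openldap/slapd.conf
+ loglevel 320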
NIS and NIS+ (YP)
Network Information Service
• RPC service for sharing Unix authentication data
• Network Information Service Plus (NIS+)
  Proprietary version replacing YP/NIS
Tools and Files
• /etc/nsswitch.conf
• /usr/lib/yp/
• /var/yp/
• /etc/yp.conf
• /etc/defaultdomain (RHEL7: /etc/sysconfig/network)
• ypserv, ypbind, ypinit, rpcbind, rpcinfo
Network Information Service
Developed by Sun Microsystems in the mid-80s, Sun Yellow Pages, later called Network Information service (NIS), provided a service sharing the user and group databases on a remote server that could be used for authentication by clients. Coupled with the Network File System (NFS), this provided a local area network authentication solution. Related NIS IETF RFCs include 2307, 3898. NIS+ is primarily proprietary, and not implemented outside Solaris. NIS uses RPC and portmap.
In 2012, Oracle Corporation discontinued both NIS and NIS+ on Solaris 11.1. Maintenance of NIS, and to some degree NIS+, still continues for Linux, (see http://www.linux-nis.org/). NIS and NIS+ have mostly been replaced with LDAP.
related packages
ypbind, ypserv, yp-tools
Related binaries
ypbind, ypcat, ypinit, ypserv, yppasswdd, yppush, ypwhich, rpcbind, rpcinfo, ypdomainname, rpc.ypxfrd
Configuration Files
/etc/yp.conf, /etc/nsswitch.conf, /etc/defaultdomain, /var/yp/securenets, [R7] /etc/sysconfig/network
data directory
/var/yp/
NIS Troubleshooting Aids Debug rpcbind • rpcinfo -p Debug NIS • ypcat • ypwhich
Troubleshooting NIS
Most often, problems with NIS are related to the RPC-based nature of the NIS protocol. RPC is commonly filtered by host firewalls. Furthermore, many hosts do not run the rpcbind daemon by default. To eliminate RPC as a source of NIS problems, first verify that rpcbind is running on both client and server systems:

$ pgrep -lf rpcbind
3294 rpcbind
. . . snip . . .

If it is running on both client and server, make sure that the client can communicate with rpcbind on the server by running the following command on the client:

$ rpcinfo -p server
   program vers proto   port
    100000    2   tcp    111  portmapper
. . . snip . . .

Make sure that the server can communicate with rpcbind on the client by running the following command on the server:

$ rpcinfo -p client
   program vers proto   port
    100000    2   tcp    111  portmapper
. . . snip . . .

If RPC seems functional, a couple of commands can be used to test basic NIS functionality. This command can be run on a client host to verify that it is bound to an NIS server:

$ ypwhich
. . . output omitted . . .

In addition, the client can display the contents of the NIS maps shared by the server. This can be done by running this command where map is the name of the NIS map to be displayed:

$ ypcat map
. . . output omitted . . .
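rpcinfo can also query the ypserv RPC service on the server directly, and ypwhich -m lists the maps the bound server advertises (the hostname is illustrative):

$ rpcinfo -u server ypserv
program 100004 version 1 ready and waiting
program 100004 version 2 ready and waiting
$ ypwhich -m
. . . output omitted . . .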
Lab 8 Estimated Time: S12: 120 minutes R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 5 Page: 8-23 Time: 120 minutes Requirements: b (1 station) c (classroom server)
Lab 8
Task 1: Troubleshooting Problems: Topic Group 5
Estimated Time: 120 minutes
Objectives
• Practice troubleshooting related to: kernel modules, LVM, RAID, and LDAP
Requirements
b (1 station) c (classroom server)
Relevance Practice solving problems to make it easier to diagnose and fix them in the real-world.
1)
Enter the troubleshooting environment with tsmenu.
# tsmenu
2)
Execute each of the scripts within Troubleshooting Group #5.
Bonus
3)
NIS is offered as a bonus, but is not required.
8-23
Content
DNS Concepts  2
DNS Troubleshooting  3
DNS Troubleshooting  4
Apache Concepts  5
Apache Troubleshooting  6
Apache Troubleshooting  7
FTP Concepts  8
FTP Troubleshooting  9
Squid Concepts  10
Squid Troubleshooting  11
Lab Tasks  12
1. Troubleshooting Problems: Topic Group 6  13
Chapter 9
TOPIC GROUP 6
DNS Concepts
• Resolves Internet host names to IP addresses
• Client/Server based
• Diagnostic commands
  • dig
  • host

DNS

Domain Name Service (DNS) is a crucial service for Internet use as it provides the name resolution service that translates domain names to numeric IP addresses and vice versa.

packages: bind, bind-utils, [R7] bind-chroot, bind-dyndb-ldap, [S12] bind-chrootenv, bind-doc
port/protocol: 53/tcp,udp – domain
binaries: /usr/sbin/named, /usr/sbin/named-{checkconf,checkzone}, /usr/sbin/rndc, /usr/sbin/rndc-confgen, /usr/sbin/dnssec-{keygen,signzone}, /usr/bin/dig, /usr/bin/host, /usr/bin/nslookup, /usr/bin/nsupdate
configs: /etc/resolv.conf, /etc/sysconfig/named, /etc/named.conf, /etc/rndc.conf
log: /var/log/messages
data directory: [R7] /var/named/, [S12] /var/lib/named/
user/group: named/named
DNS consists of servers and clients. A Linux system can be either or both. A widely used DNS server is the Berkeley Internet Name Domain (BIND) and is included with most Linux distributions, although there are many other DNS server alternatives.
The command-line tools dig and host are available for diagnosing DNS issues. These tools query DNS servers and report the results.
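For example, dig can query a particular DNS server directly, which helps separate local resolver problems from server problems; a sketch, with the server address and host name as placeholders:

$ dig @192.168.1.1 station1.example.com A
. . . output omitted . . .

If the same query succeeds against one server but fails against another, the problem lies with the failing server rather than with the client's /etc/resolv.conf.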
# host -t MX redhat.com
redhat.com mail is handled by 10 mx1.redhat.com.
redhat.com mail is handled by 10 mx3.redhat.com.
redhat.com mail is handled by 20 mx2.redhat.com.

BIND can be controlled through systemd:
# systemctl restart named
[R7] The following applies to RHEL7 only:
When the bind-chroot package is installed on RHEL7, the service is named named-chroot instead of named.
DNS Troubleshooting
• Can't resolve domain names
• Changes to zone not propagating
• Clients cannot access DNS server
• Changes to /etc/named.conf not affecting named
• Validating named configuration files

DNS – Common Problems

The following is a list of common problems:

Cannot Resolve Domain Names

The first issue to tackle is network connectivity to the name server. If there are no network problems interfering with the connection to the server, check the contents of /etc/resolv.conf.

Changes To Zone Not Propagating

Check the serial number in the zone record; if it isn't incremented when making changes to the record, other servers will not recognize that there is an updated record.

Check the BIND log messages for syntax errors introduced by the record changes. If syntax errors are detected, BIND will not load the updated zone record.

Perhaps the most common syntax error is failing to put a trailing period at the end of fully qualified domain names. The following line would be erroneous in a zone file because it is missing a period at the end of bar.domain.com:

foo    IN    CNAME    bar.domain.com

Changes To /etc/named.conf Not Affecting named

Linux can place BIND in a change rooted environment for additional security. In this environment, the BIND server only sees the files it needs and cannot see any of the other files on the system. Because of this, files previously found under the /var/ and /etc/ directories are now found in the change rooted path.

[R7] The following applies to RHEL7 only:
The base directory that a change rooted BIND process uses is: /var/named/chroot/

[S12] The following applies to SLES12 only:
The base directory that a change rooted BIND process uses is: /var/lib/named/

Validating Named Configuration Files

BIND comes with two utilities for checking the syntax of the named.conf and zone files. The named-checkconf and named-checkzone commands can quickly identify errors.
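For instance (a sketch; the zone name and file path are placeholders, and the output shown is abbreviated):

# named-checkconf /etc/named.conf
# named-checkzone domain.com /var/named/domain.com.zone
zone domain.com/IN: loaded serial 2014051901
OK

named-checkconf prints nothing when the configuration is syntactically clean; named-checkzone reports the serial it loaded or the location of the first error.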
Clients Cannot Access DNS Server
When a DNS server is running but other hosts can't query the server, the IP address and/or port the server is listening on may not be correct. Check the listen-on parameter in the named.conf file.
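A minimal sketch of the relevant named.conf options (the addresses are placeholders; out of the box BIND frequently listens only on the loopback interface):

File: /etc/named.conf
options {
    listen-on port 53 { 127.0.0.1; 192.168.1.10; };
    allow-query { localhost; 192.168.1.0/24; };
};

Also confirm that the allow-query access list permits the client networks, since a listening server can still refuse queries.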
DNS Troubleshooting Cannot control remote BIND server with rndc • Matching keys • Access controls preventing communication
Cannot Control Remote BIND Server with rndc
Version 9 of BIND allows remote administrative control of the server using the rndc command. If rndc can't communicate with the BIND server, verify that both rndc and named are loading a valid key configuration.

[R7] The following applies to RHEL7 only:
On RHEL7 systems, the /etc/rndc.conf file specifies the key used by the rndc program.

[S12] The following applies to SLES12 only:
On SLES12 systems, the /etc/named.d/rndc-access.conf file specifies the key used by the rndc program.

Connections to named Blocked by Configuration

Another thing that prevents connections from rndc is an incorrect controls section in the named.conf file. The following example of a controls block grants access from localhost, using the key labeled rndckey, and to the host 192.16.1.25 using the key labeled key2:

File: named.conf
controls {
    inet 127.0.0.1 allow { localhost; } keys { rndckey; };
    inet 192.16.1.25 allow { 192.16.1.22; } keys { key2; };
};
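Once the key and controls configuration agree on both ends, a quick sanity check (a sketch) is to request the server's status; if the key or controls block is still wrong, the command fails with a connection or permission error rather than printing status output:

# rndc status
. . . output omitted . . .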
Apache Concepts
• Web (HTTP) server
  • httpd.conf [main config file]
• RPM Package and File names
  • RHEL7 names packages: httpd-*
  • SLES12 names packages: apache2-*

About Apache

Apache is the most widely used web (HTTP) server software in the world.

When loading, the Apache server reads the httpd.conf configuration file to determine how it should run, what applicable permission settings to apply, what sites to serve, and what modules should be loaded. Apache uses the apachectl command for handling the daemon.

packages: [R7] httpd, httpd-manual, httpd-devel, mod_ssl, mod_dav_svn, [S12] apache2, apache2ctl, apache2-doc, apache2-example-pages, apache2-devel, yast2-http-server, apache2-prefork, apache2-worker, apache2-mod_perl, apache2-mod_php5, apache2-mod_python, apache2-mod-apparmor
port/protocol: 80/tcp, 443/tcp (typically)
binary: [R7] /usr/sbin/httpd, [S12] /usr/sbin/httpd2
config: [R7] /etc/httpd/conf/httpd.conf, /etc/httpd/conf.d/*, /etc/httpd/conf.modules.d/*, [S12] /etc/apache2/*conf, /etc/apache2/conf.d/*
log: [R7] /var/log/httpd/*, [S12] /var/log/apache2/*
data directory: [R7] /var/www/html/, [S12] /srv/www/htdocs/
user/group: [R7] apache/apache, [S12] wwwrun/wwwrun
9-5
Apache Troubleshooting
• Validating configuration syntax
• Path errors
• Apache user issues

Apache – Common Problems

Most Apache problems are the result of configuration errors. Apache usually won't start if there are syntax errors or if a file path specified in a directive does not exist.

Validating Configuration Syntax

Apache includes a utility called apachectl which can be used to manage the httpd daemon (start, stop, check status, etc.). apachectl is also used to validate the syntax of a configuration file using the configtest argument.

[S12] The following applies to SLES12 only:
On SUSE Linux Enterprise Server, this is named apache2ctl.

# apachectl configtest
Syntax error on line 273 of /etc/httpd/conf/httpd.conf:
DocumentRoot must be a directory

Path Errors

Another common error found in Apache is a directive that contains a path name referring to a file or directory that does not exist. Apache checks path directives when starting and will fail to start if any path does not exist. The configtest syntax validation checks some paths, but not all. Other error messages will be printed to the screen when you start or restart Apache:

# apachectl start
Starting httpd: (2)No such file or directory: httpd: could not open error log file /etc/httpd/mylogs/error_log.
Unable to open logs

Apache User Issues

Apache initially starts as root and immediately changes to a less privileged user. It is crucial that the user Apache runs as can read the files, and access the directories, served by Apache.

[R7] The following applies to RHEL7 only:
On RHEL7 systems, Apache runs as user apache by default.

[S12] The following applies to SLES12 only:
On SLES12 systems, Apache runs as user wwwrun by default.
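One way to verify this, sketched here with the RHEL7 defaults (substitute wwwrun and /srv/www/htdocs/ on SLES12), is to try reading the served content as the Apache user:

# sudo -u apache ls /var/www/html/
# sudo -u apache cat /var/www/html/index.html

If either command reports "Permission denied", adjust the ownership or permissions of the files and of every directory in the path so the Apache user can reach them.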
Apache Troubleshooting
• Handlers, Filters, and Modules
• More information

Handlers, Filters and Modules

Apache determines how to process requested files and programs by directives listed in the configuration file(s). For example, to configure Apache to process files ending with a file suffix of .shtml for server-side includes, these directives could be used in the configuration file:

File: httpd.conf
+ AddType text/html .shtml
+ AddOutputFilter INCLUDES .shtml

These two directives tell Apache the MIME type (text/html) and that a special filter (INCLUDES) should be used when parsing .shtml files. Another common example of an Apache directive is processing CGI scripts. To have Apache process files that end with the extension .cgi as CGI applications, add directives similar to the following example to Apache's configuration file.

File: httpd.conf
+ AddHandler cgi-script .cgi

More Information

For more information about the current version of Apache configuration directives, see http://httpd.apache.org/docs-2.2/mod/directives.html.

[R7] The following applies to RHEL7 only:
The httpd-manual package will install the documentation to the /var/www/manual/ directory. The Apache configuration directives can be found at http://localhost/manual/mod/directives.html.

[S12] The following applies to SLES12 only:
The apache2-doc package will install the documentation to the /usr/share/apache2/manual/ directory. The Apache configuration directives can be found at /usr/share/apache2/manual/mod/directives.html.
FTP Concepts
• vsftpd – The Very Secure FTP Daemon
  • Default FTP daemon in most Linux distros
  • Highly secure, high performance
  • FTP daemon recommended by SANS
FTP
vsftpd, the Very Secure FTP Daemon, is now the default FTP server for most major Linux distributions. vsftpd is designed for high security and performance.

package: vsftpd
port/protocol: 21/ftp, 20/ftp-data
binary: /usr/sbin/vsftpd
configs: [R7] /etc/vsftpd/vsftpd.conf, [S12] /etc/vsftpd.conf
log: /var/log/vsftpd.log, [R7] /var/log/xferlog
data directory: [R7] /var/ftp/, [S12] /srv/ftp
user/group: ftp/ftp
vsftpd can be run in either of two different modes. It can be run stand-alone, as a daemon which directly listens for incoming network requests, or it can be managed by the Xinetd daemon whenever necessary to answer an incoming FTP request.
vsftpd has one configuration file, vsftpd.conf. This configuration file is used to specify basic options about how vsftpd should operate, such as what directories can be downloaded and how download attempts should be logged by the server.
FTP Troubleshooting
• No anonymous upload
• File and directory permissions

FTP – Common Problems

The main configuration file for vsftpd is vsftpd.conf. It is easy to understand and well commented. Out of the box, vsftpd is set up as an anonymous FTP server. For security reasons, it is critically important to make sure that anonymous upload is not accidentally turned on. Careful consideration should be given before deploying a publicly accessible FTP server that allows uploads (to prevent it from being abused).
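The directives to review are sketched below (illustrative values, not necessarily the shipped defaults; on SLES12 the file is /etc/vsftpd.conf):

File: /etc/vsftpd/vsftpd.conf
anonymous_enable=YES
# leave anonymous uploads disabled unless they are explicitly required
anon_upload_enable=NO
anon_mkdir_write_enable=NO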
File and Directory Permissions

A common problem when configuring FTP servers is dealing with permissions. Remember that you must consider both the filesystem permissions and also any permissions the FTP server is imposing. As a general rule, files must be readable by the user that the daemon is running as; directories must be readable and executable.

[R7] The following applies to RHEL7 only:
Don't forget the SELinux security context of files. The public_content_t type enforcement is needed for read access, public_content_rw_t is needed for write access to incoming directories and the allow_ftpd_anon_write boolean needs to be enabled:
# setsebool -P allow_ftpd_anon_write on

See the ftpd_selinux(8) manual for details.
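To label an upload directory, something like the following can be used (a sketch; the path is a placeholder, and on RHEL7 the semanage command is provided by the policycoreutils-python package):

# semanage fcontext -a -t public_content_rw_t "/var/ftp/incoming(/.*)?"
# restorecon -R -v /var/ftp/incoming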
9-9
Squid Concepts
• Popular proxy server: caching, access control, logging
• Mostly used for web proxying, though Squid supports proxying of other protocols as well
About Squid
Squid is a full-featured caching proxy server, primarily used for HTTP(S) & FTP. It can be used to limit access to web resources, to accelerate web access via caching, as an incoming web accelerator (so-called httpd-accelerator mode) or provide simple web proxy services. Squid has a comprehensive security model based on access control list (ACL) rules.

package: squid
port/protocol: 3128/tcp
binary: /usr/sbin/squid, /usr/sbin/squidclient
config: /etc/squid/squid.conf
log: /var/log/squid
data directory: [R7] /var/spool/squid/ (for caching), [S12] /var/cache/squid/ (for caching)
user/group: squid/squid
Squid is configured with the /etc/squid/squid.conf file. The default configuration file provided by the squid RPM configures it to listen for requests from localhost and nothing else.
9-10
The most common modification to the configuration file is to permit connections from other hosts. For example, to configure Squid to accept connections from two specific networks two lines are added to the squid.conf file. File: /etc/squid/squid.conf + acl our_networks src 192.168.1.0/24 192.168.2.0/24 + http_access allow our_networks
The first line creates a new source IP ACL (src) labeled our_networks and associates the networks 192.168.1.0/24 and 192.168.2.0/24 to the label. The second line grants HTTP access to whatever is defined by the our_networks acl. It is important that this line comes before the default http_access statement that denies all access other than localhost. If modifications are made to the squid.conf file, Squid should be reloaded or restarted to pickup these changes:
# systemctl restart squid
. . . output omitted . . .
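After a reload or restart, the squidclient utility shipped with the squid package offers a quick end-to-end check that the proxy is answering; a sketch, with the URL as a placeholder:

$ squidclient -h localhost -p 3128 http://www.example.com/
. . . output omitted . . .

An HTTP response (even an access-denied error page) proves the proxy is listening; a connection refused or timeout points back at the daemon or the http_port setting.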
Squid Troubleshooting
• Caching considerations: RAM used, disk space used
• Corrupt cache directory

Squid – Common Problems

Most Squid problems are ACL-related and can be addressed by editing the squid.conf file.

Caching Issues

If Squid is configured to do caching, consider the amount of memory and storage space dedicated to caching. The cache_mem directive in squid.conf dictates how much RAM to use. The cache_dir directive specifies a directory to use for the cache data. The first parameter after the directory name is the amount of disk space to use. The rule of thumb is that this should be about 20% lower than the raw disk space you are willing to give to Squid.

File: /etc/squid/squid.conf
+ cache_dir ufs /var/spool/squid 3192 16 256

The last two parameters tell Squid how many sub-directories to create under the cache directory. These sub-directories are where cached objects will actually be stored.

Corrupt Cache Directory

The Squid cache can become corrupted. Power failure, a bug in the Squid code, or a filesystem issue are the usual culprits. When this happens, Squid cannot restart until the cache file structures are rebuilt. Recursively delete the contents of the cache_dir and start Squid again:

# systemctl stop squid
# rm -rf /var/spool/squid/*
# squid -z
. . . output omitted . . .
# systemctl start squid
9-11
Lab 9 Estimated Time: S12: 120 minutes R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 6 Page: 9-13 Time: 120 minutes Requirements: b (1 station) c (classroom server)
9-12
Lab 9
Task 1: Troubleshooting Problems: Topic Group 6
Estimated Time: 120 minutes
Objectives
• Practice troubleshooting related to: DNS, Apache, vsftpd, and Squid
Requirements
b (1 station) c (classroom server)
Relevance
Practice solving problems to make it easier to diagnose and fix them in the real world.

1) Enter the troubleshooting environment with tsmenu.

# tsmenu

2) Execute each of the scripts within Troubleshooting Group #6.
9-13
Content
Samba Concepts  2
Samba Troubleshooting  3
Postfix Concepts  4
Postfix Troubleshooting  6
Postfix Troubleshooting  8
IMAP & POP Concepts  9
IMAP/POP Troubleshooting  10
MariaDB  11
MariaDB Troubleshooting  12
Lab Tasks  13
1. Troubleshooting Problems: Topic Group 7  14
Chapter 10
TOPIC GROUP 7
Samba Concepts
• smbd: file and print capabilities, authentication services
• nmbd: NetBIOS name services, computer browser services

Samba

Samba is a network file and print server for the SMB protocol. It allows Unix systems to inter-operate with Microsoft Windows clients and servers.

related packages: samba, samba-client, samba-winbind, samba-common, samba-winbind-clients, [S12] samba-doc, yast2-samba-client, yast2-samba-server
port/protocol: 139/tcp,udp [NETBIOS session], 138/tcp,udp [NETBIOS datagram], 137/tcp,udp [NETBIOS name service], 445/tcp,udp [SMB over TCP/IP]
binaries: /usr/sbin/smbd, /usr/sbin/nmbd, /usr/bin/smbstatus, /usr/bin/smbpasswd, /usr/bin/net, /usr/bin/nmblookup, /usr/bin/smbclient, /usr/bin/testparm, /usr/bin/pdbedit, and many more...
configs: /etc/samba/smb.conf, /etc/samba/smbpasswd, /etc/samba/smbusers, /etc/samba/*tdb, /etc/samba/lmhosts, /etc/sysconfig/samba
log: /var/log/samba/*
data directory: Specified in /etc/samba/smb.conf
user/group: root/root
Samba Daemons
Samba has two primary daemons that provide its various services:
smbd ⇒ Provides the file and print capabilities and also handles authentication services.
nmbd ⇒ Provides NetBIOS name registration and name resolution services and advertises what NetBIOS based services are registered on the system.
The main Samba configuration file is located in /etc/samba/smb.conf. The file is well commented. Additional information is available via man smb.conf. Samba is managed using the systemctl command:
# systemctl start smb
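Once the daemons are up, a quick functional check, sketched here with placeholder host and account names, is to list the shares the server offers:

$ smbclient -L localhost -U username
. . . output omitted . . .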
Samba Troubleshooting
• Debugging Tools
  • testparm
  • Adjustable log levels
• Printing Issues
• Share Permissions
• Authentication Problems

Samba – Common Problems

Samba problems can be complex because the task of providing transparent file, print, and authentication services between Microsoft Windows and Linux systems is so complex. Fortunately, because of Samba's popularity, there are many online resources and documentation available for troubleshooting problems.

The testparm command is a syntax checker for Samba's configuration files. After making changes to the smb.conf file, use testparm to validate the syntax. Validation can save time and identify syntactical errors that may exist in the configuration file. Samba can write very detailed messages to log files. The level of detail can be adjusted with the log level directive in the smb.conf file.

Printing Issues

One of the most common problems when configuring a Samba print server is having the wrong printing subsystem selected. An example of this is having CUPS installed, but using LPRng in the Samba configuration file. As CUPS becomes the standard print system provided by distributions, this problem will become less common. Make sure that each printer share has the printable option set.

Another important note to remember is that the spool directory for the printer must be writable by the Samba user.

Share Permissions

The combination of share and filesystem permissions will determine the level of access a user or group has to the files being shared. When a user attempts to connect to a share, the share permissions are checked first. Share permissions may specify a list of users or groups that are allowed to connect to the share. Share permissions may specify the level of access that a user has (read only, read-write, etc). After a user connects to the share and attempts a filesystem operation, the underlying filesystem permissions are checked.

Forgetting about the interactions between these two different security mechanisms may cause administrative headaches. For example, a user is denied write access to a share even though it appears that they should have sufficient file and directory permissions to complete the operation.

Authentication Problems

Samba allows for many backend authentication mechanisms, all of which have their own set of potential problems. Obviously, if one of these authentication systems isn't working properly, such as an LDAP server being down or mis-configured, Samba will not authenticate correctly and will deny users access to shares.

A common example of this type of problem is that the Samba user does not exist on the system, or the SMB password is out of sync with the Unix password.
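Two quick checks along these lines, sketched with a placeholder account name: list the accounts Samba knows about, then add the user to Samba's password database (or reset the SMB password) so it matches an existing Unix account:

# pdbedit -L
# smbpasswd -a username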
10-3
Postfix Concepts
• Advantages of Postfix
  • Fast
  • Secure
  • Simplified

Postfix

Postfix is developed by Wietse Venema, the security expert who also created TCP Wrappers. He wrote Postfix while working at IBM's TJ Watson Research facility. In collaboration with Dan Farmer, Venema also developed the System Administrator Tool for Analyzing Networks (SATAN) and The Coroner's Toolkit (TCT).

Wietse initially started the Postfix project in an attempt to provide an alternative to the widely-used Sendmail program. Sendmail is notorious for its difficult configuration syntax. Sendmail has also had a very poor security record. Postfix attempts to be fast, easy to administer and secure, while at the same time being Sendmail compatible.

Postfix processes multiple requests before terminating, which helps to reduce process creation overhead. Postfix is said to be up to three times as fast as its nearest competitor. Postfix uses multiple layers of defense to protect the local system. Almost every Postfix daemon can run in a chroot jail with low, fixed privileges. Postfix is designed to behave well under stress and the software will back off if the local system runs out of memory or disk space.

package: postfix
port/protocol: 25/tcp (smtp), 465/tcp (smtps)
binaries: /usr/sbin/post{alias,cat,conf,drop,fix,kick}, /usr/sbin/post{lock,log,map,queue,super}, /usr/sbin/smtp-sink, /usr/sbin/smtp-source, /usr/sbin/sendmail, /usr/bin/mailq, /usr/bin/newaliases, [R7] /usr/bin/rmail.postfix, [S12] /usr/sbin/mkpostfixcert, /usr/sbin/qmqp-source
configs: /etc/postfix/*, /etc/aliases
log: [R7] /var/log/maillog, [S12] /var/log/mail
data directory: /var/spool/postfix/
user/group: User: postfix; Group: mail; [R7] Group: postdrop; [S12] Group: maildrop
[R7] The following applies to RHEL7 only:
RHEL7 uses alternatives; as a result, Postfix and Sendmail can be installed at the same time. When Postfix is enabled, the *.postfix binaries will be linked without the .postfix extension. Because care has been taken to make Postfix compatible with Sendmail, programs that assume the existence of /usr/sbin/sendmail should work fine with the Postfix version of the sendmail command instead.
[S12] The following applies to SLES12 only:
SLES12 does not allow Postfix and Sendmail to be installed at the same time because certain files conflict with each other. However, the YaST installation tool can be used to painlessly switch between available MTAs; it handles the dependencies and conflicts at the same time.
10-5
Postfix Troubleshooting
• Logging
• Comments
• New aliases

Logging

Postfix uses the syslog interface for all message logging. It sends all operations related messages with a facility of mail (messages about the daemon process itself may appear in /var/log/messages).

By default, modern Syslog daemons log all messages asynchronously (meaning that the disk write cache is used). This prevents slowdowns from occurring even under heavy logging load.

[R7] The following applies to RHEL7 only:
Red Hat Enterprise Linux uses the /var/log/maillog file for email logs.

[S12] The following applies to SLES12 only:
There are four email related log files that are used:

/var/log/mail ⇒ All email related log messages.
/var/log/mail.info ⇒ Just info level email related log messages.
/var/log/mail.warn ⇒ Just warn level email related log messages.
/var/log/mail.err ⇒ Just err level email related log messages.

Comments

When working with comments and multi-line options, be aware that behavior is different depending upon whether the comment is preceded by white space or not. Comments beginning with white space before the # extend only to the end of the line. For example:

File: /etc/postfix/main.cf
mydestination = $myhostname, localhost.$mydomain,
    #localhost.localdomain, $mydomain, www.$mydomain,
    verdande.$mydomain, bartholomew.$mydomain,
    mail.$mydomain, ns.$mydomain, ftp.$mydomain

This would be interpreted by Postfix as:

# postconf mydestination
mydestination = $myhostname, localhost.$mydomain, verdande.$mydomain, bartholomew.$mydomain, mail.$mydomain, ns.$mydomain, ftp.$mydomain

Comments beginning without white space before the # proceed to the end of the option. For example:

File: /etc/postfix/main.cf
mydestination = $myhostname, localhost.$mydomain,
# localhost.localdomain, $mydomain, www.$mydomain,
    verdande.$mydomain, bartholomew.$mydomain,
    mail.$mydomain, ns.$mydomain, ftp.$mydomain

This would be interpreted by Postfix as:

# postconf mydestination
mydestination = $myhostname, localhost.$mydomain
This allows great flexibility when configuring Postfix, but requires that care be exercised in the use of white space. Also, remember that for # to be interpreted as a comment, it must be the first non-white space on the line. For example:

File: /etc/postfix/main.cf
mydestination = $myhostname, localhost.$mydomain,
    localhost.localdomain, $mydomain, www.$mydomain,
    verdande.$mydomain, #bartholomew.$mydomain,
    mail.$mydomain, ns.$mydomain, ftp.$mydomain

Here is what Postfix sees:

# postconf mydestination
mydestination = $myhostname, localhost.$mydomain, localhost.localdomain, $mydomain, www.$mydomain, verdande.$mydomain, #bartholomew.$mydomain mail.$mydomain, ns.$mydomain, ftp.$mydomain
Postfix actually interprets #bartholomew.$mydomain as a local destination. The probable intention of using the hash symbol in #bartholomew.$mydomain was to remove or comment it out from the mydestination variable. Again the hash symbol must be the first non-white space on the line to be interpreted as a comment.
Because of the confusion which might arise from this, Postfix will log a warning whenever its configuration files contain a pound sign that is not the first non-white space on a configuration line. New Aliases
Postfix supports multiple alias files. In general, however, the only one used is either /etc/aliases or /etc/postfix/aliases. Changes to hash based alias files do not immediately take effect. Instead, the Postfix alias database must be rebuilt. This can be done by running the newaliases command, which will scan all alias files, or with the postalias command, which must be told the specific file to use:
# newaliases
# postalias hash:/etc/aliases
10-7
Postfix Troubleshooting
• Not configuring the root alias
• Relaying
• Un-configured domains

Not Configuring the root Alias

Important notifications are regularly sent to the root account. An alias should probably be added to forward these messages to a real person.

Relaying

By default, Postfix will only relay for members of the same subnet. Most likely, this setting needs modification. Some servers will not want to trust members of their subnet. Others will probably want to trust their entire organization. Servers that allow non-trustworthy systems to relay stand a good chance of being blacklisted after spammers discover and abuse them. To control which IP addresses are trusted, mynetworks_style and mynetworks can be modified in the /etc/postfix/main.cf file. For example, to disable all relaying so that only localhost can send mail, set:

File: /etc/postfix/main.cf
+ mynetworks_style = host
+ mynetworks = 127.0.0.1

Un-configured Domains

Postfix will reject mail for any domains it has not been configured to accept. By default, it will only accept mail for localhost and the hostname of the server. It can be configured to accept mail for other domains by adding the domain to mydestination, relay_domains, or a virtual host configuration. If added to mydestination, a local user account or alias must exist for the addressee. Obviously, if added to relay_domains, Postfix will try to relay the message on to the actual server for that domain.
10-8
IMAP & POP Concepts
• IMAP
• POP
• IMAP and POP Implementations
  • Dovecot
  • RHEL7: Cyrus IMAP

IMAP & POP

Two standard protocols are used to deliver messages to email clients: IMAP and POP. Under Linux there are many different server implementations that support these protocols. Some of the more popular servers include Courier IMAP, Cyrus IMAP, and Dovecot.

related packages: dovecot, [R7] cyrus-imapd, cyrus-imapd-utils
port/protocol: 143/tcp [imap], 993/tcp [imaps], 110/tcp [pop3], 995/tcp [pop3s]
binaries: Dovecot: /usr/sbin/dovecot; [R7] Cyrus: /usr/lib/cyrus-imapd/*, /usr/lib/cyrus/bin/*
configs: Dovecot: /etc/dovecot/dovecot.d/*.conf; [R7] Cyrus: /etc/cyrus.conf, /etc/imapd.conf
logs: /var/log/messages, /var/log/secure
data directory: Cyrus: /var/lib/imap/
user/group: Cyrus: cyrus/mail; Dovecot: dovecot/dovecot
10-9
IMAP/POP Troubleshooting
• IMAP/POP enabling
• xinetd/TCP wrappers
• Incorrect SSL/TLS certificate
• Debugging with telnet

IMAP & POP – Common Problems

IMAP and POP servers are simple and straightforward to configure, but difficulty can arise when they are integrated within a complex mail system. Problems can occur between the IMAP/POP service and the MTA (Sendmail or Postfix), MDA (Procmail, Maildrop, Sieve), authentication system (SASL), and/or mail clients.

The IMAP/POP daemons can be started through either SysV Init scripts or Xinetd (depending on the version of the server or the distribution it is running on). Make sure that the service has been started correctly. Using a port scanning tool such as nmap is a quick way of seeing if the port is open (a service is listening).

xinetd/TCP wrappers

Remember that Xinetd and TCP Wrappers restrictions apply to most IMAP/POP server implementations.

Incorrect SSL/TLS Certificate

When using IMAP/POP over SSL or TLS, make sure that a valid certificate is being used. Many clients will not properly connect with an incorrect or default certificate.

Debugging with telnet

For advanced troubleshooting of IMAP/POP servers, it is possible to use telnet to connect to the IMAP or POP port and issue commands directly to the server:

# telnet localhost 110
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
+OK POP3 mail.example.com v2003.83rh server ready
USER emcnabb
+OK User name accepted, password please
PASS mypa$$wd
+OK Mailbox open, 3 messages
LIST
+OK Mailbox scan listing follows
1 3078
2 3370
3 3648
.
QUIT
+OK Sayonara
Connection closed by foreign host.
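As noted above, a port scan gives a similarly quick view of which IMAP/POP ports are actually listening; a sketch against the local host:

# nmap -p 110,143,993,995 localhost
. . . output omitted . . .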
MariaDB
• MariaDB overview
• Commands
  • mysql
  • mysqldump
  • mysqladmin
• Configuration
  • RHEL7: /var/log/mariadb/mariadb.log
  • SLES12: /var/log/mysql/mysqld.log
  • mysql database

MariaDB

As a drop-in replacement for MySQL, MariaDB is a high performance relational database, commonly used in LAMP stacks. It integrates well with many other software projects which require transactional data storage.

related packages: mariadb, mariadb-{server,libs}, [S12] mariadb-{client,tools}
binaries: /usr/bin/mysql, /usr/bin/mysqldump, /usr/bin/mysqladmin
configs: /etc/my.cnf, /etc/my.cnf.d/, mysql database
logs: [R7] /var/log/mariadb/mariadb.log, [S12] /var/log/mysql/mysqld.log
data directory: /var/lib/mysql/

Data from a MariaDB database can be exported to a plain-text format with the mysqldump command:

$ mysqldump -u username -p database > backup.sql

Data that was dumped by mysqldump can be reloaded into the database server by redirecting the plain-text file into the mysql command:

$ mysql -u username -p database < backup.sql

The mysqladmin command can be used to view the current status of the MariaDB server and which processes the MariaDB server is executing, change user passwords, and flush the logs.

MariaDB supports only online modification of data stored in the database. Modification of data can happen through the mysql command-line client, both interactively and non-interactively.
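For example (a sketch; the account name is a placeholder and the password is prompted for):

$ mysqladmin -u root -p status
$ mysqladmin -u root -p processlist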
10-11
MariaDB Troubleshooting MariaDB troubleshooting overview Common Problems • Incorrect root password • Invalid database file permissions • Typographical Errors In Config Files • Full Filesystem • MariaDB already running
Incorrect root password
To reset MariaDB's root password, stop any currently running MariaDB servers. Next, start MariaDB from the command-line, passing it the --skip-grant-tables option. This will allow root to log in without the need for a password. The root password can then be changed using the mysql client:
# systemctl stop mariadb
# mysqld_safe --skip-grant-tables &
# mysql mysql
MariaDB [mysql]> UPDATE user SET Password=PASSWORD("new_password") WHERE User='root';
. . . output omitted . . .
MariaDB [mysql]> \q
Bye
# pkill mysqld
# systemctl start mariadb

Permission/Ownership of Database Files
The /var/lib/mysql/ directory is used to store the socket file, among others. If you encounter the error "Can't connect to local MySQL server through socket...", try changing the mode (e.g. chmod 755 /var/lib/mysql/), and verify the directory is owned by the same user and group as the MariaDB server process (generally mysql). With systemd, this is determined by the systemd unit file.
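A quick way to check the current mode and ownership (a sketch):

# ls -ld /var/lib/mysql/
. . . output omitted . . .

The listing should show the directory owned by the user and group the server runs as (generally mysql), with a mode that lets clients reach the socket.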
10-12
Typographical Errors In Config Files
An easy mistake to make with MariaDB is a simple typographical error. If MariaDB can't understand a configuration option, it will generally fail to start. The best way to spot such problems is to review the output of journalctl.
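For example, to see only MariaDB's messages from the current boot (a sketch):

# journalctl -u mariadb -b
. . . output omitted . . .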
[R7] The following applies to RHEL7 only:
On Red Hat Enterprise Linux systems, MariaDB errors are sent to the /var/log/mariadb/mariadb.log file.

[S12] The following applies to SLES12 only:
On SUSE Linux Enterprise Server systems, MariaDB errors are sent to the /var/log/mysql/mysqld.log file.
Lab 10 Estimated Time: S12: 120 minutes R7: 120 minutes
Task 1: Troubleshooting Problems: Topic Group 7 Page: 10-14 Time: 120 minutes Requirements: b (1 station) c (classroom server)
10-13
Lab 10
Task 1: Troubleshooting Problems: Topic Group 7
Estimated Time: 120 minutes
Objectives
• Practice troubleshooting related to: Samba, MariaDB, Postfix, and IMAP/POP services
Requirements
b (1 station) c (classroom server)
Relevance
Practice solving problems to make it easier to diagnose and fix them in the real world.

1) Enter the troubleshooting environment with tsmenu.

# tsmenu

2) Execute each of the scripts within Troubleshooting Group #7.